Secrets Management in System Design: Vaults, Encryption, Rotation & Zero-Trust (Visualized)
Every production system holds sensitive credentials: database passwords, API keys, TLS certificates, OAuth tokens. How you store, access, rotate, and audit those secrets is the difference between a secure system and a breach waiting to happen. This guide covers the full secrets management lifecycle with live animations.
Secrets management is the discipline of securely storing, distributing, rotating, and auditing sensitive credentials (such as database passwords, API keys, TLS certificates, and OAuth tokens) so that only authorized services can access them, and every access is traceable. Without deliberate secrets management, credentials end up hard-coded in source code, committed to version control, or baked into container images, creating persistent security vulnerabilities that are difficult to remediate after a breach.
Modern distributed systems can have hundreds of services each needing dozens of credentials. Manual secret distribution (copying passwords into config files or setting environment variables by hand) does not scale and makes rotation painful. A proper secrets management system centralizes the storage, enforces access controls, and provides programmatic APIs so services can retrieve secrets dynamically at runtime rather than at build time.
What Counts as a Secret?
A secret is any piece of data that grants access to a protected resource or proves identity, and whose exposure would let an unauthorized party impersonate a legitimate principal. Common categories include:
- Database credentials: usernames and passwords for PostgreSQL, MySQL, MongoDB, Redis, and similar stores.
- API keys and tokens: third-party service credentials (Stripe, Twilio, SendGrid) and internal service-to-service tokens.
- TLS/SSL certificates and private keys: the key material that proves a server's identity to clients.
- Encryption keys: symmetric keys used to encrypt data at rest.
- OAuth client secrets: the shared secret between an application and an identity provider such as Auth0 or Google.
- SSH private keys: used for server access and Git operations.
Each category has different rotation cadences and risk profiles, but all share the same management requirements: encrypted storage, narrow access, full audit trail, and automated rotation.
Why Secrets Must Not Live in Code or Committed .env Files
Hard-coding a secret in source code, or committing a .env file to a repository, is the single most common cause of credential leaks. Once a secret is in version control, it remains in the history indefinitely even after deletion. Git history scanners, automated bots, and archive.org mirrors can surface it years later. GitHub's secret scanning flags known credential formats once they have been pushed, which is reactive rather than preventive unless push protection is also enabled to block the push.
The correct model is to never store a secret as a static artifact. Secrets should live only in a dedicated secret store, retrieved at runtime by an authenticated service identity. The running process holds the secret in memory for as long as needed, then discards it. The source code repository should contain only references (paths, secret names, environment variable names), never values.
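The "references, not values" rule can be sketched in a few lines. This is a minimal illustration, assuming the deployment platform injects the secret into the process environment at runtime; the helper names `require_secret` and `redact` and the variable name used in the usage note are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not injected at runtime."""


def require_secret(env_name: str) -> str:
    """Return the secret injected under env_name, or fail fast.

    Only the *name* of the variable lives in source control; the value
    is supplied by the runtime (orchestrator, vault agent, CI secret store).
    """
    value = os.environ.get(env_name)
    if not value:
        # Never fall back to a hard-coded default: that would put a secret in code.
        raise MissingSecretError(f"secret {env_name!r} is not set")
    return value


def redact(value: str) -> str:
    """Safe representation for logs: length only, never the value itself."""
    return f"<redacted:{len(value)} chars>"
```

A service would call `require_secret("PAYMENT_DB_PASSWORD")` at startup and log only `redact(...)` of the result, so a missing secret stops the process early and the value never reaches a log line.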
Secret Stores and Vaults
A secret store (also called a vault) is a hardened, dedicated service that stores secrets encrypted at rest, enforces access-control policies, provides a cryptographically authenticated API, and emits an immutable audit log of every access. Services authenticate to the vault using a machine identity (a cloud IAM role, a Kubernetes service account, a client certificate, or a short-lived AppRole token) and receive only the secrets their policy explicitly permits.
| Tool | Type | Strengths | Typical Use Case |
|---|---|---|---|
| HashiCorp Vault | Self-hosted or HCP | Dynamic secrets, fine-grained policies, PKI engine, multi-cloud | On-prem, multi-cloud, strict compliance |
| AWS Secrets Manager | Managed (AWS) | Native IAM integration, automatic RDS rotation, CloudTrail logs | AWS-native workloads |
| AWS KMS | Managed (AWS) | Key management, envelope encryption, HSM backing | Encryption key management, not raw secrets |
| Azure Key Vault | Managed (Azure) | Certificates, keys, secrets, RBAC integration | Azure workloads |
| GCP Secret Manager | Managed (GCP) | Versioning, IAM binding per secret, regional replication | GCP workloads |
| Doppler | SaaS | Developer UX, CI/CD integrations, sync to cloud | Startups, fast iteration |
HashiCorp Vault is the most feature-rich self-hostable option. Its secrets engines are plugins that generate credentials on demand: the database engine connects to PostgreSQL and creates a temporary user with a TTL; the PKI engine signs X.509 certificates; the AWS engine vends short-lived IAM credentials. The application never stores a long-lived password; it gets a credential that expires in minutes.
Authenticating to the Vault: Dynamic Short-Lived Secrets
The critical insight behind modern secrets management is the concept of dynamic secrets: instead of storing a static password forever, the vault generates a unique, time-limited credential the moment a service requests it. Each credential is associated with a lease that carries a TTL (time-to-live). When the lease expires, the credential is automatically revoked at the source system. If a dynamic credential is compromised, the attacker's window is limited to the remaining lease time, often just 15 to 60 minutes.
The service must renew its lease before expiry (like a hotel key-card being extended) or request a fresh credential. This keeps the system continuously authenticated without ever holding a permanent secret.
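The lease lifecycle described above can be modelled with a small in-memory simulation. This is a sketch only, not Vault's implementation: `ToyDynamicSecrets` and its method names are hypothetical, the clock is passed in explicitly so the behaviour is easy to follow, and the maximum-TTL cap on renewals is an assumption added for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float  # seconds on the caller-supplied clock


class ToyDynamicSecrets:
    """In-memory stand-in for a vault's dynamic-secrets engine (hypothetical)."""

    def __init__(self, ttl_seconds: float, max_ttl_seconds: float):
        self.ttl = ttl_seconds
        self.max_ttl = max_ttl_seconds
        self._issued_at: dict[str, float] = {}
        self._leases: dict[str, Lease] = {}

    def issue(self, role: str, now: float) -> Lease:
        # Every request gets its own credential, never a shared static one.
        username = f"{role}-{secrets.token_hex(4)}"
        lease = Lease(username, secrets.token_urlsafe(24), now + self.ttl)
        self._leases[username] = lease
        self._issued_at[username] = now
        return lease

    def is_valid(self, username: str, now: float) -> bool:
        lease = self._leases.get(username)
        return lease is not None and now < lease.expires_at

    def renew(self, username: str, now: float) -> Lease:
        # Renewal only works while the lease is still live, and never
        # extends past the maximum TTL measured from issue time.
        if not self.is_valid(username, now):
            raise PermissionError("lease expired; request a fresh credential")
        lease = self._leases[username]
        hard_limit = self._issued_at[username] + self.max_ttl
        lease.expires_at = min(now + self.ttl, hard_limit)
        return lease
```

With `ttl_seconds=900`, a credential issued at time 0 stops being valid at 900 unless the service renews it first; a stolen copy is only useful until the current lease runs out.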
Encryption at Rest: Envelope Encryption and DEK/KEK
Secrets stored in a vault must themselves be encrypted at rest. The industry-standard pattern is envelope encryption, which uses two layers of keys to solve a fundamental chicken-and-egg problem: how do you protect the key that encrypts your data?
The mechanism uses two key types. A Data Encryption Key (DEK) is a randomly generated symmetric key (typically AES-256) used to encrypt the actual secret or data blob. It is unique per secret. A Key Encryption Key (KEK), also called a master key, is stored in a Hardware Security Module (HSM) or a managed KMS (Key Management Service) and is used to wrap (encrypt) the DEK. Only the encrypted DEK is stored alongside the ciphertext; the plaintext DEK never touches persistent storage. When decryption is needed, the application calls KMS to unwrap the DEK, uses it to decrypt the data in memory, then discards the DEK. Rotating the master key only requires re-wrapping the DEKs; the ciphertext does not change.
AWS KMS, Google Cloud KMS, and Azure Key Vault all implement envelope encryption under the hood. When you call aws kms generate-data-key, KMS returns both the plaintext DEK (use it, then discard) and the encrypted DEK (store it). Later, aws kms decrypt unwraps the DEK so you can decrypt your data. The plaintext DEK never persists; only the encrypted copy does.
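The DEK/KEK flow can be traced end to end with a toy model. Everything here is an assumption made for illustration: `ToyKMS` is a hypothetical in-memory stand-in for a real KMS, and `_toy_cipher` is a deliberately simple XOR keystream that is NOT secure and only stands in for an authenticated cipher such as AES-256-GCM so the example runs on the standard library alone.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Illustration only, not secure.

    XOR is symmetric, so the same function both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds the KEK; callers only ever persist wrapped DEKs (hypothetical)."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        dek = secrets.token_bytes(32)
        return dek, self.wrap(dek)  # plaintext DEK (use, discard), wrapped DEK (store)

    def wrap(self, dek: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        return nonce + _toy_cipher(self._kek, nonce, dek)

    def unwrap(self, wrapped: bytes) -> bytes:
        nonce, body = wrapped[:16], wrapped[16:]
        return _toy_cipher(self._kek, nonce, body)

    def rotate_kek(self, wrapped_deks: list[bytes]) -> list[bytes]:
        # Rotation re-wraps the DEKs under a new KEK; data ciphertext is untouched.
        deks = [self.unwrap(w) for w in wrapped_deks]
        self._kek = secrets.token_bytes(32)
        return [self.wrap(d) for d in deks]


def encrypt_secret(kms: ToyKMS, plaintext: bytes) -> dict:
    dek, wrapped_dek = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    record = {
        "wrapped_dek": wrapped_dek,
        "nonce": nonce,
        "ciphertext": _toy_cipher(dek, nonce, plaintext),
    }
    del dek  # drop the local reference; only the wrapped DEK is persisted
    return record


def decrypt_secret(kms: ToyKMS, record: dict) -> bytes:
    dek = kms.unwrap(record["wrapped_dek"])
    return _toy_cipher(dek, record["nonce"], record["ciphertext"])
```

The stored record never contains the plaintext DEK, and after `rotate_kek` the `ciphertext` field is byte-for-byte unchanged while decryption still succeeds, which is the property that makes master-key rotation cheap.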
Secret Rotation
Even well-protected secrets should be rotated regularly. Rotation limits the blast radius of an undetected leak: a credential that was silently stolen six months ago becomes useless after rotation. It also ensures that principals which should no longer have access (terminated employees, decommissioned services) lose it automatically.
The challenge with rotation is zero-downtime transition. A naive approach (delete old password, set new one) creates a window where old clients fail before they receive the new credential. The robust pattern is a three-stage overlap rotation:
- Stage 1, create new credential: the secret store creates a new credential at the source (e.g., creates a new DB user or generates a new API key).
- Stage 2, dual active: both the old and new credentials are valid simultaneously. Running services use the old one; new instances pick up the new one.
- Stage 3, retire old: once all instances have cycled to the new credential (verified by a grace period or explicit health check), the old credential is revoked.
This is the pattern AWS Secrets Manager uses for RDS rotation via Lambda.
AWS Secrets Manager automates this entire cycle for supported databases. You configure a rotation schedule (e.g., every 30 days) and a Lambda function that performs the three-stage rotation. Services that retrieve secrets via the SDK automatically get the latest version without any code changes or deployments.
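The three-stage overlap can be sketched as a small state machine. This is an illustrative model rather than the Secrets Manager rotation Lambda: `ToyDatabase` and `OverlapRotation` are hypothetical names, and the count of clients still on the old credential is assumed to come from a health check or connection metrics.

```python
class ToyDatabase:
    """Tracks which passwords the source system currently accepts (hypothetical)."""

    def __init__(self, initial_password: str):
        self.accepted = {initial_password}

    def login(self, password: str) -> bool:
        return password in self.accepted


class OverlapRotation:
    """Rotate a credential without a window where valid clients are rejected."""

    def __init__(self, db: ToyDatabase, current: str):
        self.db = db
        self.current = current
        self.previous = None

    def create_new(self, new_password: str) -> None:
        # Stages 1 and 2: add the new credential while the old one stays valid.
        self.db.accepted.add(new_password)
        self.previous, self.current = self.current, new_password

    def retire_old(self, clients_still_on_old: int) -> bool:
        # Stage 3: revoke only after every client has moved to the new credential.
        if self.previous is None or clients_still_on_old > 0:
            return False
        self.db.accepted.discard(self.previous)
        self.previous = None
        return True
```

Between `create_new` and a successful `retire_old`, both passwords log in, so instances can pick up the new value at their own pace; the old one is revoked only when the caller reports that nothing depends on it.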
Least-Privilege Access and Audit Logging
Least privilege means each service should be able to read only the specific secrets it needs, and nothing else. In HashiCorp Vault, this is expressed as policies written in HCL:
# Vault policy: payment-service can only read its own DB credentials
path "secret/data/payment-service/db" {
  capabilities = ["read"]
}
# payment-service CANNOT read secrets for auth-service or admin
# path "secret/data/auth-service/*" { ... } <- not granted

In AWS, this maps to IAM policies restricting secretsmanager:GetSecretValue to specific ARNs. Every call to retrieve a secret is captured in an audit log: in Vault's audit device, AWS CloudTrail, or GCP Cloud Audit Logs. The log entry records the caller identity, timestamp, secret name, source IP, and result (allowed/denied). This makes it possible to answer questions like: which services accessed the database password last week, and did any access happen from an unexpected IP?
Audit logs should be shipped to an immutable, append-only store (e.g., AWS S3 with Object Lock, or a SIEM) so they cannot be tampered with even by a compromised service.
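Deny-by-default access checks and an audit trail fit together naturally in code. The sketch below is an assumption-heavy illustration: the `POLICIES` table, `audited_access`, and `verify_chain` are hypothetical, `fnmatch` globbing is a simplification and not Vault's path-matching rules, and the hash chain is one way to make tampering detectable, not a replacement for an immutable store.

```python
import fnmatch
import hashlib
import json

# Hypothetical policy table: service -> {path pattern -> allowed capabilities}.
POLICIES = {
    "payment-service": {"secret/data/payment-service/db": {"read"}},
    "auth-service": {"secret/data/auth-service/*": {"read"}},
}

AUDIT_LOG: list[dict] = []


def is_allowed(service: str, path: str, capability: str) -> bool:
    # Deny by default: access requires an explicit matching rule.
    rules = POLICIES.get(service, {})
    return any(
        fnmatch.fnmatchcase(path, pattern) and capability in caps
        for pattern, caps in rules.items()
    )


def _entry_hash(prev_hash: str, body: dict) -> str:
    payload = prev_hash + json.dumps(body, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def audited_access(service, path, capability, source_ip, timestamp) -> bool:
    """Evaluate the request and record it, whether allowed or denied."""
    allowed = is_allowed(service, path, capability)
    body = {
        "caller": service,
        "timestamp": timestamp,
        "secret": path,
        "capability": capability,
        "source_ip": source_ip,
        "result": "allowed" if allowed else "denied",
    }
    prev_hash = AUDIT_LOG[-1]["hash"] if AUDIT_LOG else ""
    AUDIT_LOG.append({**body, "hash": _entry_hash(prev_hash, body)})
    return allowed


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["hash"] != _entry_hash(prev_hash, body):
            return False
        prev_hash = entry["hash"]
    return True
```

Denied requests are logged as carefully as allowed ones, since a burst of denials from one caller is often the first sign of a compromised service probing for access.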
Secrets in CI/CD Pipelines
CI/CD pipelines are a high-risk surface for secret exposure. Build logs are often stored and surfaced to many users; environment variables can be printed by careless debug statements; artifacts may be cached with credentials embedded. Best practices for pipeline secret hygiene:
- Use native secret injection: GitHub Actions Secrets, GitLab CI/CD Variables, and CircleCI Environment Variables encrypt secrets and mask them from logs.
- Prefer OIDC federation: modern CI systems support OpenID Connect tokens that let the pipeline authenticate to cloud providers (AWS, GCP) as a trusted identity, receiving short-lived credentials rather than storing long-lived API keys.
- Never print secrets: add lint rules or git pre-commit hooks that detect patterns like print(password) or console.log(secret).
- Scan before push: tools like trufflehog, detect-secrets, and gitleaks scan commits for credential patterns and can block the push if found.
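To make the "scan before push" idea concrete, here is a deliberately small pattern scanner. It is a sketch of the general approach, not how trufflehog, detect-secrets, or gitleaks are implemented; the rule names and the `scan_text` helper are hypothetical, and real tools ship far larger rule sets plus entropy checks.

```python
import re

# A few illustrative rules; production scanners use many more.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded-password": re.compile(
        r"(?i)\b(?:password|passwd|secret)\s*=\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, rule))
    return findings
```

A pre-commit hook could run `scan_text` over the staged diff and exit non-zero when the list is non-empty. Note that a line reading a value by name, such as a lookup of an environment variable, does not match the hard-coded password rule because no quoted literal is assigned.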
Secrets Management Tool Comparison
| Capability | HashiCorp Vault | AWS Secrets Manager | Doppler |
|---|---|---|---|
| Hosting | Self-hosted or HCP | Fully managed | SaaS |
| Dynamic secrets | Yes (DB, PKI, AWS, SSH) | RDS only | No |
| Automatic rotation | Via plugins | Native for RDS/Redshift | Manual or webhook |
| Audit log | Vault audit device | CloudTrail | Activity log |
| Envelope encryption | Transit secrets engine + KMS | KMS-backed | AES-256 at rest |
| Access control | HCL policies + AppRole + K8s | IAM policies + RBAC | Project-based RBAC |
| Multi-cloud | Yes | AWS only | Yes (sync to any) |
| Free tier | Free self-hosted Community edition | First 30 days free | Free for individuals |
Frequently Asked Questions
What is the difference between a secret store and a key management service (KMS)?
A secret store (like HashiCorp Vault or AWS Secrets Manager) is optimized for storing and retrieving arbitrary credentials: database passwords, API keys, certificates. A Key Management Service (like AWS KMS or Google Cloud KMS) is optimized for cryptographic operations: generating keys, encrypting and decrypting small payloads, and managing key lifecycle and rotation of encryption keys specifically. The two are complementary: a secret store typically uses a KMS to encrypt the secrets it holds (envelope encryption). You would use a KMS directly when you need to perform encryption operations in your application code, and a secret store when you need to securely distribute credentials to services.
How do services authenticate to the secret store without a pre-shared secret?
This is the classic bootstrapping problem and it is solved differently depending on environment. In cloud environments, services authenticate using their cloud IAM identity: an EC2 instance profile, an EKS service account bound to an IAM role, or a GCP service account. These identities are provisioned by the infrastructure layer (not the application), so no secret is ever distributed manually. In on-premises environments, HashiCorp Vault's AppRole method provides a role ID (non-secret, embeddable in config) and a secret ID (short-lived, delivered via a trusted orchestrator like Terraform or a CI/CD pipeline). Kubernetes-native workloads use the Kubernetes auth method, where the pod presents its Kubernetes service account JWT to Vault, which validates it against the Kubernetes API server.
How often should secrets be rotated?
Rotation frequency should be proportional to the risk profile of the secret. Dynamic secrets (issued per-request by Vault's database engine) are effectively rotated with every use; their TTL may be 15 minutes to 1 hour. Long-lived API keys for third-party services should rotate every 30 to 90 days; for IAM access keys, the CIS AWS Foundations Benchmark sets the bar at 90 days or less. TLS certificates typically have 90-day validity (Let's Encrypt) or 1-year validity for internal CAs, and should be rotated before expiry with automated tooling like cert-manager. Database passwords in production should rotate every 30 days at minimum, automated via a scheduler. The key principle: the rotation cadence must be shorter than the window in which a stolen credential could go undetected.
A secret that never rotates, never expires, and lives in a config file is not a secret; it is a liability waiting to be discovered. Treat every credential as ephemeral by design.
- alokknight Engineering
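A rotation schedule is easy to enforce once the maximum ages are written down as data. The sketch below assumes a hypothetical `MAX_AGE` table whose values are taken from the upper bounds in the answer above; the category names and the `rotation_due` helper are illustrative, not part of any tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table; tune the values to your own risk profile.
MAX_AGE = {
    "third-party-api-key": timedelta(days=90),
    "database-password": timedelta(days=30),
}


def rotation_due(kind: str, last_rotated: datetime, now: datetime) -> bool:
    """True when the secret has reached or passed its maximum allowed age."""
    return now - last_rotated >= MAX_AGE[kind]


def overdue(inventory: dict[str, tuple[str, datetime]], now: datetime) -> list[str]:
    """Names of secrets in the inventory that need rotating, sorted for stable output."""
    return sorted(
        name
        for name, (kind, last_rotated) in inventory.items()
        if rotation_due(kind, last_rotated, now)
    )
```

A scheduled job could run `overdue` against the secret store's metadata and either trigger rotation or alert, so the cadence is checked by a machine rather than remembered by a person.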
