Secrets in Serverless


Serverless applications and cloud functions often need to communicate with an upstream API or service. Perhaps they require a username and password to connect to a database, an API key to talk to an upstream service, or a certificate to authenticate to an API. This raises questions like: How do I manage secrets in serverless environments? How do I get credentials into my serverless lambda or cloud function? How can I use secrets in AWS Lambda or Google Cloud Functions?

This post describes common patterns and approaches for managing secrets in serverless, including the benefits and drawbacks of each approach. The code samples are available in a variety of languages on GitHub at sethvargo/secrets-in-serverless.

IAM

Before diving into the world of secrets management, it is important to ask ourselves if we need to manage secrets at all. Most cloud providers offer robust IAM controls which allow restricting access to specific authorized services and APIs. In essence, the function is granted the ability to talk to a service or API, and there is a basis of trust in the cloud provider to authenticate and authorize based on IAM policies.

Consider, for example, connecting a function to Cloud Bigtable. Instead of trying to inject a username, password, or API key into the cloud function, you can instead grant permission to read from Bigtable via a service account and GCP IAM will handle authentication and authorization for you.
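To make the "nothing to inject" point concrete, here is a minimal sketch of how code running on Google Cloud can obtain a short-lived access token for its attached service account from the metadata server. Client libraries normally do this for you behind the scenes. The helper names build_token_request and fetch_access_token are hypothetical, and the actual fetch only succeeds when the code runs inside Google Cloud.

```python
import json
import urllib.request

# Metadata server endpoint that returns an access token for the service
# account attached to the function. Reachable only from inside Google Cloud.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request():
    """Build the metadata server request (hypothetical helper).

    Note that no username, password, or API key appears anywhere.
    """
    # The metadata server rejects requests without this header.
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_access_token(timeout=5):
    """Return a short-lived access token. Only works on Google Cloud."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

The token is tied to the service account, so what the function may do is decided entirely by the IAM roles granted to that account.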

It is also possible to leverage IAM across cloud providers. For example, you can create an OIDC provider on AWS that allows AWS to trust Google Cloud as an authentication provider. Once configured, you could grant a Cloud Function permission to pull data from a private S3 bucket without needing to manually generate an AWS access key pair and securely inject it into the Cloud Function.
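As a rough sketch of what that exchange looks like on the wire, the snippet below builds a request to the AWS STS AssumeRoleWithWebIdentity action, which trades an OIDC token for temporary AWS credentials. In practice an AWS SDK would make this call for you. The helper name build_assume_role_url, the role ARN, the session name, and the token placeholder are all illustrative assumptions.

```python
import urllib.parse

STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_assume_role_url(role_arn, session_name, web_identity_token):
    """Build an STS AssumeRoleWithWebIdentity request URL.

    The call is not signed with AWS credentials; the OIDC token issued
    by the other cloud provider is the only proof of identity.
    """
    query = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
    })
    return f"{STS_ENDPOINT}?{query}"


# Hypothetical values, for illustration only.
url = build_assume_role_url(
    "arn:aws:iam::123456789012:role/s3-reader",
    "cloud-function",
    "<google-issued-oidc-token>",
)
```

The response contains temporary credentials scoped to the role, so no long-lived AWS access key ever needs to be stored in the function.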

Even with robust IAM models, there are still situations where you need to inject secrets or credentials into a cloud function. Perhaps you operate a Postgres cluster, use OpenFaaS or Knative on-premises, or depend on an outdated technology that does not support IAM. In these cases, you will need to inject secrets or credentials into your serverless function at runtime.

Environment Variables

One of the most common methods for injecting secrets into serverless applications is with environment variables. Almost every application, platform, and service is able to read an environment variable, so it guarantees a reasonable support matrix and is a well-understood piece of 12 factor applications.

Here is an example serverless function that retrieves its configuration from environment variables.

import os

username = os.environ['DB_USER']
password = os.environ['DB_PASS']

def F(request):
    return f'{username}:{password}'

A user or CI system would submit this function using the CLI or API.

$ gcloud alpha functions deploy envvars \
    --runtime python37 \
    --entry-point F \
    --set-env-vars DB_USER=my-user,DB_PASS=s3cr3t \
    --trigger-http

When the function boots, it reads its configuration from the environment variables set at deploy time and stores the result in memory. We can invoke this function by visiting its HTTPS endpoint or by calling it through the CLI.

$ gcloud functions call envvars
my-user:s3cr3t

This was quick and easy... but is it secure?

While this approach is simple and straightforward, it comes with considerable security drawbacks: the secrets exist in plaintext in the environment. Any library or dependency running inside the process has access to the environment, and this weakness has already been exploited multiple times. Unfortunately, it is trivial for a malicious library author to inject this type of vulnerability into an otherwise helpful utility package.
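To show how little effort this takes, here is a deliberately harmless sketch of a utility function that quietly reads the environment while doing its advertised job. The names harvest_environment, left_pad, and CAPTURED are hypothetical, and nothing leaves the process; a real attack would transmit the captured values to a remote server.

```python
import os
import re

# Variable names that commonly hold credentials. An attacker could just
# as easily take every variable.
SUSPICIOUS = re.compile(r"(PASS|SECRET|TOKEN|KEY)", re.IGNORECASE)

# Stand-in for "sent to the attacker"; kept in memory for this demo.
CAPTURED = []


def harvest_environment(environ=None):
    """Return the environment variables whose names look sensitive."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if SUSPICIOUS.search(k)}


def left_pad(text, width):
    """A helpful-looking utility with a hidden side effect."""
    CAPTURED.append(harvest_environment())
    return text.rjust(width)
```

Calling left_pad returns the padded string as expected, so the caller has no reason to suspect that the plaintext password was read along the way.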

Environment variables are a great approach for storing configuration, but not secrets. Most programming languages are successful because they contain a vast ecosystem of third party tools and libraries, many of which are authored by individual contributors. Not all of those contributors may have your application's security interests at heart. Unless you are rigorously auditing your code and all its dependencies (and its dependencies' dependencies, and its dependencies' dependencies' dependencies, and ...), you should avoid storing secret or sensitive information in environment variables. Additionally, most cloud providers do not consider the environment of a function to be "secret". Anyone with read-only permissions on the cloud or lambda function can see its environment variables, making it clear that we should not be storing any sensitive or secret information in environment variables.

To be absolutely, unequivocally clear, you should not store secret or sensitive information in environment variables in plaintext.

Encrypted Environment Variables

Another common approach for injecting secrets into serverless applications is to use encrypted environment variables. Before an application or function is launched, the secrets are encrypted into ciphertext (encrypted strings) and stored in environment variables. Applications then decrypt the ciphertext at boot, giving the function access to the plaintext values.

s3cr3t                 -> KMS encryption -> CiQAePa3VEpDBjS2acf...
CiQAePa3VEpDBjS2acf... -> KMS decryption -> s3cr3t

This approach solves the issue of a rogue third party dependency submitting a raw dump of the environment, but it introduces new challenges, most notably:

  • Serverless applications must be authenticated to decrypt the environment variables
  • Serverless applications now must be aware of how to decrypt the environment variables

We can solve the initial authentication problem by leveraging cloud provider authentication. Instead of trying to authenticate the application, we authenticate service calls made from the application using the cloud provider's IAM or metadata service. For example, on Google Cloud, you can grant permission to the Cloud Functions runtime service account to decrypt data. The API calls to Google Cloud from inside the serverless function are authenticated through that service account. This removes the need for injecting initial authentication, like an API key or JWT, as plaintext into the application for service-to-service communications.

Making serverless applications aware of how to decrypt environment variables is a more challenging problem to solve. Ultimately this logic must be included in the application, or else it is susceptible to the same problems described earlier with plaintext environment variables. For example, we cannot leverage a wrapper, because then all parts of the serverless application would have access to the plaintext data. The decryption must be handled by our applications, which means our applications must be aware of how to decrypt data. Thankfully, most cloud providers include an SDK or client-side library for popular languages.
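The difference from a wrapper can be sketched in a few lines: the application decrypts into a private mapping and never writes plaintext back into the environment. In this sketch, load_secrets is a hypothetical helper and decrypt is a hypothetical callable standing in for a real KMS client call. The fake_decrypt function uses base64 purely so the example runs offline; base64 is an encoding, not encryption.

```python
import base64


def load_secrets(environ, names, decrypt):
    """Decrypt the named variables without modifying the environment.

    `decrypt` is a hypothetical callable (ciphertext -> plaintext) that
    stands in for a real KMS client call.
    """
    return {name: decrypt(environ[name]) for name in names}


def fake_decrypt(ciphertext):
    """Offline stand-in for KMS. base64 is NOT encryption."""
    return base64.b64decode(ciphertext).decode("utf-8")


# The environment holds only "ciphertext" (here, a fake encoding).
environ = {"DB_PASS": base64.b64encode(b"s3cr3t").decode("ascii")}

# Plaintext lives only in this mapping, owned by the application code.
secrets = load_secrets(environ, ["DB_PASS"], fake_decrypt)
```

A dependency that dumps the environment after this runs still sees only the encrypted value, which is the property a wrapper that re-exports plaintext would lose.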

Below is an example for decrypting environment variables at boot with Python and Google Cloud Functions. A similar technique could be used for AWS or Azure (with different client libraries of course). This decrypts the values during boot and stores the plaintext values in-memory.

import base64
import os
import googleapiclient.discovery

crypto_key_id = os.environ['KMS_CRYPTO_KEY_ID']

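def decrypt(client, s):
    # Ask Cloud KMS to decrypt the base64-encoded ciphertext.
    response = client \
        .projects() \
        .locations() \
        .keyRings() \
        .cryptoKeys() \
        .decrypt(name=crypto_key_id, body={"ciphertext":s}) \
        .execute()

    # KMS returns the plaintext base64-encoded. Strip the trailing
    # newline that echo added when the value was encrypted.
    return base64.b64decode(response['plaintext']).decode('utf-8').strip()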


kms_client = googleapiclient.discovery.build('cloudkms', 'v1')

username = decrypt(kms_client, os.environ['DB_USER'])
password = decrypt(kms_client, os.environ['DB_PASS'])

def F(request):
    return f'{username}:{password}'

Prior to launching the function, encrypt the plaintext using Google Cloud KMS.

$ echo "s3cr3t" | gcloud kms encrypt \
    --location=global \
    --keyring=serverless-secrets \
    --key=app1 \
    --ciphertext-file=- \
    --plaintext-file=- \
    | base64

CiQAePa3VEpDBjS2acf...

Grant the function IAM permissions to decrypt these values through KMS. This avoids the need to inject the "first secret" into the function, since it is automatically authenticated via its attached service account.

$ gcloud iam service-accounts create app1-kms-decrypter
$ gcloud kms keys add-iam-policy-binding app1 \
    --location global \
    --keyring serverless-secrets \
    --member "serviceAccount:app1-kms-decrypter@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" \
    --role roles/cloudkms.cryptoKeyDecrypter

Instead of injecting the plaintext environment variable values, inject the encrypted ciphertext values into the environment when launching the cloud or lambda function, along with the service account which has permissions to decrypt those values. At the time of this writing, Cloud Functions IAM is in private alpha. You can request GCP IAM alpha access or wait until the public beta is available.

$ gcloud alpha functions deploy encrypted-envvars \
    --runtime python37 \
    --entry-point F \
    --service-account app1-kms-decrypter@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
    --set-env-vars KMS_CRYPTO_KEY_ID=projects/${GOOGLE_CLOUD_PROJECT}/locations/global/keyRings/serverless-secrets/cryptoKeys/app1,DB_USER=CiQAePa3VEjcuknRhLX...,DB_PASS=CiQAePa3VEpDBjS2ac... \
    --trigger-http

When the serverless function boots, it will read the encrypted environment variable values, decrypt them using Google Cloud KMS, and store the plaintext values in memory. If a dependency dumps the environment and sends it to an untrusted source, the attackers will only have the encrypted values. They will not be able to decrypt the encrypted values because they do not have permission to access that KMS key. The function still behaves as before, but it is less susceptible to being compromised via an environment dump.

$ gcloud functions call encrypted-envvars
my-user:s3cr3t

This approach trades a bit of complexity for added security. The application now has specialized code to decrypt values at boot. This makes it less susceptible to an environment dump attack, but it does tightly couple the application to the KMS provider. While many cloud providers offer a generous free tier, requests to KMS can also nominally increase the costs of running the function.

Google Cloud Storage

Update: There is an easier way to do this now. See the section below on Berglas for information on an open source tool that uses Google Cloud Storage and Google Cloud KMS to help with secrets management.

On Google Cloud, another approach is to leverage Google Cloud Storage for storing secrets and sensitive information. Data is always encrypted at rest on Cloud Storage, and it can also optionally be encrypted with a customer-supplied (CSEK) or customer-managed (CMEK) encryption key. Under this model, secrets are uploaded to Cloud Storage in plaintext, encrypted at rest on Cloud Storage, and access is tightly controlled via IAM permissions. It is also possible to enable object versioning on the bucket to keep a history of secrets for auditing or compliance reasons.

This approach should not be considered a replacement for a robust secrets management solution, but it offers a low barrier to entry, especially for development and staging environments where security requirements may be less strict than in production.

Create a Cloud Storage bucket to store the secrets.

$ gsutil mb gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets

Revoke the default bucket permissions. By default, anyone with access to the Google Cloud project can view objects in the bucket. The following commands make it so that only the bucket owner and explicitly granted users can access objects inside.

$ gsutil defacl set private gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets
$ gsutil acl set -r private gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets

Write some secrets into the bucket. Even though they are being committed as plaintext, they are encrypted at rest, and access is tightly controlled via IAM.

$ gsutil -h 'Content-Type: application/json' cp - gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets/app1 <<< '{"username":"my-user", "password":"s3cr3t"}'

Create a service account with permission to access the data in the encrypted Cloud Storage bucket. Notice that this only grants permission to access that single secrets file, not the entire bucket.

$ gcloud iam service-accounts create app1-gcs-reader
$ gsutil iam ch serviceAccount:app1-gcs-reader@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com:legacyObjectReader \
    gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets/app1
$ gsutil iam ch serviceAccount:app1-gcs-reader@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com:legacyBucketReader \
    gs://${GOOGLE_CLOUD_PROJECT}-serverless-secrets

The function downloads and parses the file from Cloud Storage, saving those values in memory for the lifetime of the function.

import os
import json
from google.cloud import storage

blob = storage.Client() \
    .get_bucket(os.environ['STORAGE_BUCKET']) \
    .get_blob('app1') \
    .download_as_string()

parsed = json.loads(blob)

username = parsed['username']
password = parsed['password']

def F(request):
    return f'{username}:{password}'
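The function above assumes the downloaded file is well-formed. As a small hardening sketch, the parsing step could check that the expected keys are present before the function starts serving requests. The helper name parse_secrets and the REQUIRED_KEYS tuple are hypothetical; the key names match the JSON uploaded earlier.

```python
import json

# Keys the function expects to find in the secrets file.
REQUIRED_KEYS = ("username", "password")


def parse_secrets(payload):
    """Parse a JSON secrets payload and check that required keys exist.

    Accepts str or bytes, so the raw download can be passed in directly.
    """
    parsed = json.loads(payload)
    if not isinstance(parsed, dict):
        raise ValueError("secrets payload must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        # Report key names only; never include secret values in errors.
        raise ValueError(f"secrets payload is missing keys: {missing}")
    return parsed
```

Failing at boot with a clear message is usually easier to debug than a KeyError on the first request, and the error text deliberately avoids echoing any secret values into logs.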

Deploy this function with the service account. This gives the cloud function IAM permission to read the secret from Cloud Storage. At the time of this writing, Cloud Functions IAM is in private alpha. You can request GCP IAM alpha access or wait until the public beta is available.

$ gcloud alpha functions deploy gcs \
    --runtime python37 \
    --entry-point F \
    --service-account app1-gcs-reader@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
    --set-env-vars STORAGE_BUCKET=${GOOGLE_CLOUD_PROJECT}-serverless-secrets \
    --trigger-http

Access the function at the published URL.

$ gcloud functions call gcs
my-user:s3cr3t

This function is retrieving its secrets at runtime from a Google Cloud Storage bucket, which is encrypted by default and can optionally be encrypted with a customer-managed or customer-supplied key. This approach provides a low barrier to entry with reasonable security guarantees for accessing secret or sensitive information in a cloud lambda or serverless function.

Berglas

Berglas is an open source tool for Google Cloud that automates the process of creating and managing a Cloud KMS key and Cloud Storage bucket. The source code is available on GitHub.

Create a Cloud Storage bucket and Cloud KMS key with the most restrictive permissions:

$ berglas bootstrap --bucket my-bucket

Create a secret:

$ berglas create my-bucket/my-secret abcd1234 \
    --key $KMS_KEY

Access a secret:

$ berglas access my-bucket/my-secret

You can also use Berglas across a variety of Google Cloud products and services such as Cloud Functions, Cloud Run, Kubernetes Engine, Cloud Build, and more. For more information, please see the README and examples on GitHub.

HashiCorp Vault

HashiCorp Vault is a popular tool for secrets management. Under this model, each function authenticates to the secrets provider using machine-supplied information like instance metadata or a service account. Vault verifies that information and issues a time-based batch token with the permissions attached. Functions then use this token to authenticate further requests to Vault.
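The two-step flow can be sketched against Vault's HTTP API without any client library. This is only an outline under stated assumptions: the auth method is assumed to be mounted at "gcp", the role name and secret path are placeholders, and how the function obtains its signed JWT is left out. The helper names are hypothetical, and the requests are built but not sent.

```python
import json
import urllib.request


def build_login_request(vault_addr, role, jwt, mount="gcp"):
    """Build the login request that exchanges a signed JWT for a token.

    The mount path "gcp" is an assumption; use your own auth mount.
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_from_login(response_body):
    """Extract the issued token from a Vault login response."""
    return json.loads(response_body)["auth"]["client_token"]


def build_secret_request(vault_addr, token, path):
    """Build a read request authenticated with the issued token."""
    return urllib.request.Request(
        f"{vault_addr}/v1/{path}",
        headers={"X-Vault-Token": token},
    )
```

Each request object can be passed to urllib.request.urlopen once a Vault server is reachable. The function never holds a long-lived credential, only the time-limited token Vault returns.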

These examples are a bit more complex, so please see the code in sethvargo/secrets-in-serverless.

Final Thoughts

None of these methods will fully stop a determined attacker. Ultimately serverless function authors need to audit and secure their dependency trees appropriately. Even with these approaches, there are still unmitigated attack vectors like shared memory access for dependent libraries or social engineering attacks. However, taking these steps can help secure your serverless lambda functions from rudimentary attacks and hacking attempts.

There are still a number of unexplored topics in this post like storing secrets on the filesystem, virtual volume mounts, and other vendor-specific secrets management solutions. Hopefully I will have time to write about those in a future post, but each comes with its own tradeoffs and complexities just like the examples shown here.

Thank you for reading. I hope this post helped shed some light on the state of secrets management in serverless lambda and cloud functions. You can try these techniques on Google Cloud Functions today. If you have any questions, please tweet at me.

About Seth

Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.