cfgpkg, or how to use age encryption with YAML

I have a case where I need to store a YAML file in an S3 bucket where it’s available to be copied to a server as part of an automated deployment process. The YAML file contains a few sensitive values, like API keys, so it’d be best to keep those values encrypted.

I’d prefer not to have to encrypt the entire file, because that makes it more difficult to work with and update. It would be nice if I could safely keep a local copy of the file, update it as needed, and only have to deal with encryption when I’m adding or changing a sensitive value.

I wasn’t aware of anything that could encrypt and decrypt just a subset of values in a YAML file, and I didn’t look too hard for one, since this seemed like a fun problem to solve. Let’s write a script.

age reached its 1.0.0 release recently, and it looks like a good tool for the job. It has a simple command line interface, and there are binaries for multiple platforms. Plus, age uses asymmetric encryption, which will let us keep a copy of the public key locally for encrypting new and updated values without having to worry about having a private key (capable of decrypting) lying around.

First, the requirements:

Encrypt sensitive values locally and leave others alone
Decrypt sensitive values on the server during deployment

That’s it, really. We should be able to open our .yml file in a text editor, add a key/value pair, mark it as sensitive, and then run the script to encrypt the sensitive value.

Both requirements imply the script will need a way to identify sensitive values: it needs to know what to encrypt and what to decrypt. Since config keys aren’t likely to include an exclamation mark, that seems like a good way to mark sensitive keys. Let’s do it like this: API_KEY!. And let’s wrap encrypted values in an age “tag” like this: <age>c2VjcmV0</age>.
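As a rough sketch of these two conventions, the patterns below classify a line as carrying a sensitive key or an encrypted value. The function names are made up for illustration, and the key character class (letters, digits, underscore) is an assumption about what config keys look like.

```shell
# Hypothetical helpers (names are illustrative only).

# succeeds if the line has a key ending in "!", e.g. "API_KEY!: abc"
is_sensitive_line() {
  printf '%s\n' "$1" | grep -Eq '^[# ]*[A-Za-z0-9_]+! *:'
}

# succeeds if the line's value is wrapped in <age>...</age>
is_encrypted_line() {
  printf '%s\n' "$1" | grep -Eq '^[# ]*[A-Za-z0-9_]+ *: *<age>.*</age>'
}

is_sensitive_line 'API_KEY!: secret_key_abc123' && echo "sensitive"
is_encrypted_line 'API_KEY: <age>c2VjcmV0</age>' && echo "encrypted"
is_sensitive_line 'APP_HOST: app.example.com' || echo "plain"
```

The leading `[# ]*` lets both checks match indented and commented-out lines as well.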

It’s probably easiest to visualize what we’re going for, so let’s start with a simple YAML file at each state we want to handle: (1) the original file with a cleartext sensitive value, (2) the encrypted version that will be stored where a deployment script can get it, and (3) the final decrypted file for our application to use.

The original file:

APP_HOST: app.example.com
API_KEY!: secret_key_abc123

The encrypted file:

APP_HOST: app.example.com
API_KEY: <age>c2VjcmV0X2tleV9hYmMxMjMK</age>

The decrypted file:

APP_HOST: app.example.com
API_KEY: secret_key_abc123

The original file and decrypted file are identical except that the decrypted file does not contain a ! character marking the sensitive key.

Because the script needs to run locally and on our server, let’s write it as a Bash script. Ruby would surely make for simpler code, but that would introduce a deployment dependency our app may not already have.

Let’s start writing some code. At first we can ignore validation and usage instructions and just keep it simple. We know we need to be able to tell the program what to do (encrypt or decrypt), what YAML file to operate on, and what encryption key to use, so let’s start with a simple outline.

#!/bin/bash
cmd="$1"
yaml_file="$2"
if [[ "$cmd" == "enc" ]]; then
  public_key="$3"
  # TODO: encrypt sensitive values in $yaml_file
else
  private_key_file="$3"
  # TODO: decrypt encrypted values in $yaml_file
fi

Now would be a good time to briefly see how to interact with the age program using stdin and stdout:

# generate a key pair
$ age-keygen -o key.txt
Public key: age1abc123...

# encrypt and base64-encode a string using
# public key from age-keygen output
$ echo "secret" | age -r age1abc123... | base64
xyz789...

# base64-decode and decrypt string using
# private key from age-keygen run
$ echo "xyz789..." | base64 -d | age -d -i key.txt
secret
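Because encryption only needs the public key, it can be handy to pull it back out of an existing key file. Here is a minimal sketch, assuming the key file still contains the "# public key:" comment line that age-keygen writes; the function name is hypothetical and the key values are placeholders. (Running age-keygen -y key.txt should print the same value by deriving it from the private key.)

```shell
# Hypothetical helper: print the public key recorded in an age key file.
# Assumes the "# public key: age1..." comment written by age-keygen
# has not been removed from the file.
public_key_from_file() {
  sed -n 's/^# public key: //p' "$1" | head -n 1
}

# Example with a stand-in key file (placeholder values, not real keys)
demo_key="$(mktemp)"
printf '%s\n' '# created: 2021-09-06T00:00:00Z' \
              '# public key: age1abc123' \
              'AGE-SECRET-KEY-1PLACEHOLDER' > "$demo_key"
public_key_from_file "$demo_key"   # prints: age1abc123
rm -f "$demo_key"
```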

Next, let’s implement the encryption part. sed seems like a good tool to use for our in-place file editing.¹ We need to identify each sensitive key, remove the ! key suffix, and encrypt the value and wrap it in our <age>...</age> marker. To keep things simple we’ll assume sensitive values are on a single line with their keys. Indentation won’t matter.² And to avoid accidentally leaving a sensitive value on a commented-out line, let’s process those lines too.

So, let’s first find and extract sensitive keys into an array. (Note that I’m writing this on macOS, so I’ll use sed -E, which is like sed -r on Linux. We’ll address this in our script later. I’m also including extra line breaks to make the script more readable without horizontal scrolling.)

# extract keys ending with '!' from the
# YAML file, including commented-out lines
while IFS='' read -r line; do
  keys+=("$line")
done < <( \
  grep -E '^[# ]*\w+! *:' "$yaml_file" | sed -E \
    '/[a-zA-Z0-9_]+! *:/ s/^[# ]*([a-zA-Z0-9_]+)! *:.*/\1/' \
)
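To see what this step produces, the same grep and sed pipeline can be wrapped in a function and run against a throwaway file. This is only a sketch: the function name and the OLD_TOKEN sample key are invented, and it spells out the character class instead of relying on \w.

```shell
# Hypothetical wrapper around the extraction pipeline shown above.
extract_sensitive_keys() {
  grep -E '^[# ]*[A-Za-z0-9_]+! *:' "$1" \
    | sed -E 's/^[# ]*([A-Za-z0-9_]+)! *:.*/\1/'
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
APP_HOST: app.example.com
API_KEY!: secret_key_abc123
  # OLD_TOKEN!: secret_token_xyz789
EOF

extract_sensitive_keys "$sample"
# prints:
#   API_KEY
#   OLD_TOKEN
rm -f "$sample"
```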

Now let’s iterate over our $keys array, removing each key’s ! suffix and encrypting its value.

# encrypt values for extracted keys,
# and remove the '!' from the keys
for key in "${keys[@]}"; do
  # find the key and extract its value
  val="$(grep -E "^[# ]*$key! *:" "$yaml_file" \
       | head -n 1 | sed -E "s/^[# ]*$key! *: *(.*)/\1/")"
  # encrypt the value and encode as single-line Base64
  # (GNU base64 wraps output at 76 columns, so strip newlines)
  enc="$(printf '%s' "$val" \
       | age -r "$public_key" | base64 | tr -d '\n')"
  # replace the "key!: val" line with "key: <age>...</age>"
  [[ -n "$enc" ]] && sed -E -i \
    "/<age>/! s%^([# ]*$key)!( *: *).*%\1\2<age>$enc</age>%" \
    "$yaml_file"
done
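The substitution is the densest part of that loop, so here it is in isolation. This is a sketch only: the function name is hypothetical, the ciphertext is a fixed placeholder so that age isn’t needed, and sed writes to stdout instead of editing in place. The % character works as the delimiter because Base64 output never contains it, and the /<age>/! address skips lines that are already encrypted.

```shell
# Hypothetical helper: rewrite "KEY!: value" as "KEY: <age>ENC</age>".
# $1 = key name, $2 = Base64 ciphertext, input on stdin.
mark_encrypted() {
  sed -E "/<age>/! s%^([# ]*$1)!( *: *).*%\1\2<age>$2</age>%"
}

printf '%s\n' 'APP_HOST: app.example.com' 'API_KEY!: secret_key_abc123' \
  | mark_encrypted API_KEY 'PLACEHOLDER+/='
# prints:
#   APP_HOST: app.example.com
#   API_KEY: <age>PLACEHOLDER+/=</age>
```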

Ok, not too bad, as shell scripts go. How about the decryption part? It’ll be similar to encryption—we just need to build our $keys array by looking for encrypted values instead of sensitive keys.

# extract keys with <age> values from the
# YAML file, including commented-out lines
while IFS='' read -r line; do
  keys+=("$line")
done < <( \
  grep -E '^[# ]*\w+ *: *<age>.*</age>' "$yaml_file" | sed -E \
    '/[a-zA-Z0-9_]+:/ s/^[# ]*([a-zA-Z0-9_]+) *:.*/\1/' \
)

Then we’ll Base64-decode and decrypt each value.

# decrypt values for extracted keys
for key in "${keys[@]}"; do
  enc="$(grep -E "^[# ]*$key *:" "$yaml_file" \
       | sed -E 's:.*<age>(.*)</age>.*:\1:')"
  val="$(base64 -d <<< "$enc" | age -d -i "$private_key_file")"
  [[ -n "$val" ]] && sed -E -i \
    "s%^([# ]*$key *: *).*%\1$val%" "$yaml_file"
done
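One thing to watch for in that last substitution: the decrypted value is pasted straight into the replacement side of a sed expression, where &, \ and the % delimiter all have special meaning. Below is a hedged sketch of one way to guard against that, using a hypothetical escape_sed_replacement helper that is not part of the script above. It still assumes the value is a single line.

```shell
# Hypothetical helper: backslash-escape the characters that are special
# in a sed replacement when "%" is the delimiter: "\", "&", and "%".
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's/[\\&%]/\\&/g'
}

val='p&ss%word'
safe="$(escape_sed_replacement "$val")"
printf '%s\n' 'API_KEY: <age>PLACEHOLDER</age>' \
  | sed -E "s%^(API_KEY *: *).*%\1$safe%"
# prints: API_KEY: p&ss%word
```

Without the escaping step, the & in this value would be replaced by the whole matched line and the % would end the expression early.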

Before putting it all together, let’s make a pair of helper functions that invoke sed with the arguments it expects on the OS the script is running on. (I originally saw this in dehydrated.)

# use `-r` or `-E` depending on platform
_sed() {
  if [[ "$(uname)" = "Linux" ]]; then
    sed -r "${@}"
  else
    sed -E "${@}"
  fi
}
# use `-i` or `-i ''` depending on platform
_sed_i() {
  if [[ "$(uname)" = "Linux" ]]; then
    sed -r -i "${@}"
  else
    sed -E -i '' "${@}"
  fi
}
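Checking uname works for stock macOS and Linux, but it guesses wrong when GNU sed is installed on a Mac and found first on the PATH. A possible alternative, sketched here and not taken from the original script, is to probe the sed binary itself: GNU sed accepts --version, while BSD sed rejects it. The function name is hypothetical.

```shell
# Hypothetical variant of _sed_i that inspects sed instead of the OS.
_sed_i_probe() {
  if sed --version >/dev/null 2>&1; then
    sed -r -i "$@"        # GNU-style: -i takes no separate argument
  else
    sed -E -i '' "$@"     # BSD-style: -i needs a (possibly empty) suffix
  fi
}

demo="$(mktemp)"
echo 'API_KEY!: secret_key_abc123' > "$demo"
_sed_i_probe 's/^([A-Z_]+)!/\1/' "$demo"
cat "$demo"   # prints: API_KEY: secret_key_abc123
rm -f "$demo"
```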

Let’s put together what we have so far, using our new _sed and _sed_i functions and adding some docs and input validation:

#!/bin/bash

# Encrypts and decrypts sensitive values in a YAML file.
#
# Values can be encrypted locally, such as in preparation
# for placing the file where it's available to a deployment
# process, and then decrypted on a server during deployment.
#
# Sub commands:
#
#   enc: identifies sensitive key-value pairs (any key
#        ending with "!"), then encrypts sensitive values
#        and removes trailing "!" from keys
#
#   dec: decrypts encrypted values
#
# Usage:
#
#   cfgpkg enc <config-file> <public-key>
#   cfgpkg dec <config-file> <key-file>
#
# Examples:
#
#   cfgpkg enc app.yml age1abc123...
#   cfgpkg dec app.yml /path/to/age.key


_sed() {
  if [[ "$(uname)" = "Linux" ]]; then
    sed -r "${@}"
  else
    sed -E "${@}"
  fi
}

_sed_i() {
  if [[ "$(uname)" = "Linux" ]]; then
    sed -r -i "${@}"
  else
    sed -E -i '' "${@}"
  fi
}

cmd="$1"
yaml_file="$2"

# exit if command, file, and key were not specified
if [[ $# -ne 3 ]] || [[ "$cmd" != "enc" && "$cmd" != "dec" ]]; then
  >&2 echo 'Usage: cfgpkg <enc|dec> <yaml-file> <pub-key/key-file>'
  exit 1
elif [[ ! -f "$yaml_file" ]]; then
  >&2 echo "$yaml_file does not exist"
  exit 1
fi


if [[ "$cmd" == "enc" ]]; then
  public_key="$3"
  # extract keys ending with '!' from the
  # YAML file, including commented-out lines
  while IFS='' read -r line; do
    keys+=("$line")
  done < <( \
    grep -E '^[# ]*\w+! *:' "$yaml_file" | _sed \
      '/[a-zA-Z0-9_]+! *:/ s/^[# ]*([a-zA-Z0-9_]+)! *:.*/\1/' \
  )
  # encrypt values for extracted keys,
  # and remove the '!' from the keys
  for key in "${keys[@]}"; do
    # find the key and extract its value
    val="$(grep -E "^[# ]*$key! *:" "$yaml_file" \
         | head -n 1 | _sed "s/^[# ]*$key! *: *(.*)/\1/")"
    # encrypt the value and encode as single-line Base64
    # (GNU base64 wraps output at 76 columns, so strip newlines)
    enc="$(printf '%s' "$val" \
         | age -r "$public_key" | base64 | tr -d '\n')"
    # replace the "key!: val" line with "key: <age>...</age>"
    [[ -n "$enc" ]] && _sed_i \
      "/<age>/! s%^([# ]*$key)!( *: *).*%\1\2<age>$enc</age>%" \
      "$yaml_file"
  done
else
  private_key_file="$3"
  # extract keys with <age> values from the
  # YAML file, including commented-out lines
  while IFS='' read -r line; do
    keys+=("$line")
  done < <( \
    grep -E '^[# ]*\w+ *: *<age>.*</age>' "$yaml_file" | _sed \
      '/[a-zA-Z0-9_]+:/ s/^[# ]*([a-zA-Z0-9_]+) *:.*/\1/' \
  )
  # decrypt values for extracted keys
  for key in "${keys[@]}"; do
    enc="$(grep -E "^[# ]*$key *:" "$yaml_file" \
         | _sed 's:.*<age>(.*)</age>.*:\1:')"
    val="$(base64 -d <<< "$enc" | age -d -i "$private_key_file")"
    [[ -n "$val" ]] && _sed_i \
      "s%^([# ]*$key *: *).*%\1$val%" "$yaml_file"
  done
fi
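The script assumes age and base64 are on the PATH. A small, hedged addition (not part of the script above) would be to fail early with a clear message when one is missing. The name require_cmd is hypothetical.

```shell
# Hypothetical guard: report any missing commands and return non-zero.
require_cmd() {
  local missing=0 name
  for name in "$@"; do
    if ! command -v "$name" >/dev/null 2>&1; then
      >&2 echo "cfgpkg: required command not found: $name"
      missing=1
    fi
  done
  return "$missing"
}

# In cfgpkg this could run right after the argument checks:
#   require_cmd age base64 grep sed || exit 1
require_cmd grep sed && echo "dependencies OK"
```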

Ok, we have a working, (*nix) platform-independent script. I named it “cfgpkg” (config packager). Let’s give it a try using the sample YAML from the beginning of this post to ensure it works end-to-end. (You’ll need to install age if you haven’t already.)

Create our .yml file with sample data and generate our age key:

$ echo 'APP_HOST: app.example.com' > app.yml
$ echo 'API_KEY!: secret_key_abc123' >> app.yml
$ age-keygen -o age.key
Public key: age1abc123...

Encrypt the sensitive value in our .yml file:

$ cfgpkg enc app.yml age1abc123...
$ cat app.yml
APP_HOST: app.example.com
API_KEY: <age>c2VjcmV0X2tleV9hYmMxMjMK</age>

Decrypt the sensitive value in our .yml file:

$ cfgpkg dec app.yml age.key
$ cat app.yml
APP_HOST: app.example.com
API_KEY: secret_key_abc123

There are two features I’d like to add and one known bug to fix, but we’ll save these for another time:

[feature] the ability to “undo” an encryption run, essentially restoring our .yml file to its original pre-encryption state (same as the decrypted state, but with the ! sensitive key markers), possibly as a rev sub command
[feature] the ability to extract a single encrypted value without modifying the file, possibly as an ext sub command
[bug] duplicate sensitive keys cause a problem, including on commented-out lines, because sed replaces every matching occurrence it finds

I plan to update this post when these three things have been addressed. If you find other bugs or have an idea for a useful feature, feel free to email me.
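Until the duplicate-key bug is fixed, one stopgap might be to detect duplicates up front and refuse to encrypt. This is a sketch of that idea and not necessarily the eventual fix; the function name is hypothetical.

```shell
# Hypothetical check: print sensitive keys that appear more than once
# (commented-out lines included). Succeeds only if there are none.
duplicate_sensitive_keys() {
  local dups
  dups="$(grep -E '^[# ]*[A-Za-z0-9_]+! *:' "$1" \
    | sed -E 's/^[# ]*([A-Za-z0-9_]+)! *:.*/\1/' \
    | sort | uniq -d)"
  [ -z "$dups" ] && return 0
  printf '%s\n' "$dups"
  return 1
}

sample="$(mktemp)"
printf '%s\n' 'API_KEY!: new_value' '# API_KEY!: old_value' > "$sample"
duplicate_sensitive_keys "$sample" \
  || echo "refusing to encrypt: duplicate keys"
rm -f "$sample"
```

For the sample file this prints API_KEY followed by the refusal message.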

Notes

¹ When initially researching, I discovered sed has an e (execute) command that allows you to do substitutions using the output of a shell command that runs for each match. This did exactly what I needed, but I’ll save you the time you might have otherwise spent before finding this line in the docs: “This is a GNU sed extension.” This means it won’t work on the version of sed that ships with macOS, so I opted against using it here.
² If your YAML file contains a multi-line value, and one of the lines in that value starts with what looks like a sensitive key followed by a colon, this script will incorrectly process that line. To keep things simple, our script will process the file as text using regular expressions. In other words, it won’t use a YAML parser, because that would introduce a dependency.