Avoid Storing Sensitive Data in Image Layers

In this article, we'll explore why you should never store sensitive data in Docker image layers. We'll use the Docker CLI to build an image from a Dockerfile that writes and then deletes a secret file, and show how that secret can still be recovered from the image.

First, let's break down the Dockerfile we'll be working with:

FROM alpine
RUN touch /secret.txt
RUN echo "sensitive-data" > /secret.txt
RUN rm /secret.txt

  • The FROM instruction starts the build from the Alpine image, whose filesystem layer becomes the base layer of our image.

  • The first RUN instruction creates an empty file. Because it changes the filesystem, Docker records the change as a new layer.

  • The second RUN instruction writes data into the file, resulting in another new layer.

  • Finally, the last RUN instruction removes the file. The deletion itself is recorded in a new layer; the earlier layers are left unchanged.
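To make these layer mechanics concrete without Docker, here is a simplified simulation in plain shell. The function name simulate_layers and the file names layer1.tar to layer3.tar are hypothetical. Each tarball mirrors the filesystem change of one RUN step, and the deletion is modeled with a .wh. whiteout entry, the marker the OCI image layer format uses to record a removed file. Real layers produced by Docker may contain additional entries.

```shell
#!/bin/sh
# Hypothetical simulation, NOT Docker's implementation: each tarball mirrors
# the filesystem change made by one RUN step of the Dockerfile above.
set -eu

simulate_layers() {
  work=$1
  mkdir -p "$work/l1" "$work/l2" "$work/l3"

  # Layer 1: RUN touch /secret.txt -> an empty file
  : > "$work/l1/secret.txt"
  tar -cf "$work/layer1.tar" -C "$work/l1" secret.txt

  # Layer 2: RUN echo "sensitive-data" > /secret.txt -> the file with content
  echo "sensitive-data" > "$work/l2/secret.txt"
  tar -cf "$work/layer2.tar" -C "$work/l2" secret.txt

  # Layer 3: RUN rm /secret.txt -> only an empty whiteout marker is recorded;
  # layer2.tar is never modified
  : > "$work/l3/.wh.secret.txt"
  tar -cf "$work/layer3.tar" -C "$work/l3" .wh.secret.txt
}

work=$(mktemp -d)
simulate_layers "$work"
tar -tf "$work/layer3.tar"               # lists only .wh.secret.txt
tar -xOf "$work/layer2.tar" secret.txt   # prints sensitive-data
```

The "delete" step only adds a marker on top of the stack; the bytes written by the previous step are still sitting in their own layer.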

Now, let's build the image using the Docker CLI:

$ docker build . -t secret

We've successfully created a new image from the Dockerfile.

If you publish this image to a popular registry like Docker Hub, keep in mind that anyone who can pull the image can read every file in every one of its layers. Docker stores each layer separately and never modifies a layer once it has been created, so even though a later layer removes secret.txt, the earlier layer still contains the sensitive data.

Let's see that in action:

$ docker run --rm -it secret /bin/sh 
/ # ls
bin    etc    lib    mnt    proc   run    srv    tmp    var
dev    home   media  opt    root   sbin   sys    usr
/ # exit

When you run a container from the image, secret.txt is nowhere to be found in the filesystem. However, the file is still embedded in one of the image layers.

Let's verify that by exporting the image to a tar archive with the docker save command and extracting it:

$ mkdir secret-data
$ cd secret-data
$ docker save secret -o secret-image-unpack.tar
$ tar -xf secret-image-unpack.tar
$ ls
35c8305da895b7eb8329795724dca7b962df13090760e034de2061003e428ffa
47d192fd3df9c48cc1e2cc06900bddb9ac9417d0c4c039ab04288876ae449983.json
5bd981e7c2ca18bdef505759d56db73e93fe31fbf20c60c7adfc2ecefecc8a0e
619fc296dd43f00de5c3529dd4e2f33d1221379eee560ff273a7f35042759523
f60ab9307adae1629c41c2bec5e9efa0c105538846bbb31d529b1569f9959f63
manifest.json
repositories
secret-image-unpack.tar

The extracted archive contains one directory per layer, the image's config file, and a few metadata files. Let's go through them step by step:

  • manifest.json names the image's config file and tags, and lists the layers in order, from the base layer (f60ab...) to the last layer (35c8...).
$ cat manifest.json | jq
[
  {
    "Config": "47d192fd3df9c48cc1e2cc06900bddb9ac9417d0c4c039ab04288876ae449983.json",
    "RepoTags": [
      "secret:latest"
    ],
    "Layers": [
      "f60ab9307adae1629c41c2bec5e9efa0c105538846bbb31d529b1569f9959f63/layer.tar",
      "5bd981e7c2ca18bdef505759d56db73e93fe31fbf20c60c7adfc2ecefecc8a0e/layer.tar",
      "619fc296dd43f00de5c3529dd4e2f33d1221379eee560ff273a7f35042759523/layer.tar",
      "35c8305da895b7eb8329795724dca7b962df13090760e034de2061003e428ffa/layer.tar"
    ]
  }
]
  • The config file (47d192...json) describes how the image was built and configured, including its build history.
$ cat 47d192fd3df9c48cc1e2cc06900bddb9ac9417d0c4c039ab04288876ae449983.json  | jq '.history'
[
  {
    "created": "2023-09-28T21:19:27.686110063Z",
    "created_by": "/bin/sh -c #(nop) ADD file:756183bba9c7f4593c2b216e98e4208b9163c4c962ea0837ef88bd917609d001 in / "
  },
  {
    "created": "2023-09-28T21:19:27.801479409Z",
    "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
    "empty_layer": true
  },
  {
    "created": "2023-11-07T12:32:37.493886117+05:30",
    "created_by": "RUN /bin/sh -c touch /secret.txt # buildkit",
    "comment": "buildkit.dockerfile.v0"
  },
  {
    "created": "2023-11-07T12:32:37.961070242+05:30",
    "created_by": "RUN /bin/sh -c echo \"sensitive-data\" > /secret.txt # buildkit",
    "comment": "buildkit.dockerfile.v0"
  },
  {
    "created": "2023-11-07T12:32:38.419003725+05:30",
    "created_by": "RUN /bin/sh -c rm /secret.txt # buildkit",
    "comment": "buildkit.dockerfile.v0"
  }
]

As you can see, the history alone already reveals the secret: because the value was written literally in a RUN instruction, it appears in the created_by field of the echo step.
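A known value can often be spotted without unpacking individual layers, because its bytes sit directly inside the archive. The sketch below assumes the layer.tar files are uncompressed tar archives, as their names suggest; if they were compressed, the raw bytes would not match. The function name find_secret is hypothetical, and it only finds values you already know, so treat it as a spot check rather than a secret scanner.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list every file under an extracted
# `docker save` archive that contains a known secret value.
# Assumes layer tarballs are uncompressed, so file contents appear verbatim.
set -eu

find_secret() {
  dir=$1      # directory where the archive was extracted, e.g. secret-data
  needle=$2   # the literal value to look for
  # -r: recurse, -l: print only matching file names,
  # -a: treat binary files (tarballs) as text, -F: fixed string, not a regex
  grep -rlaF -- "$needle" "$dir" | sort
}
```

From inside the secret-data directory you could run:

$ find_secret . sensitive-data

For this image it would be expected to flag the config JSON (through the history), the layer.tar written by the echo step, and the original secret-image-unpack.tar, which contains both.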

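To find the layer that holds the file itself, match the history against the manifest: history entries marked "empty_layer": true (such as the CMD step) don't produce a layer, and the remaining entries correspond, in order, to the layers listed in manifest.json. The echo step is the third entry that produces a layer, so the third layer, 619fc, is where we wrote the sensitive data.
Let's extract that layer's tar file and see what it contains: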

$ cd 619fc296dd43f00de5c3529dd4e2f33d1221379eee560ff273a7f35042759523
$ ls
json  layer.tar  VERSION
$ tar -xf layer.tar
$ ls
etc  json  layer.tar  secret.txt  VERSION
$ cat secret.txt 
sensitive-data

The file is still stored in this layer even though a later layer deleted it. Anyone who can pull the image can recover it simply by unpacking the archive, so never put anything into an image, whether in a file or in a build command, that you wouldn't be willing to publish.
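To audit a whole image rather than a single layer, the following sketch walks the layer tarballs and reports which ones add a given path and which ones only record its deletion through a .wh. whiteout entry (the OCI layer convention for removed files). The function name trace_path is hypothetical, and it assumes entry names are relative paths such as secret.txt or etc/passwd; an optional leading ./ is stripped before matching.

```shell
#!/bin/sh
# Sketch (hypothetical helper): for each layer tarball given, report whether
# it adds the target path or deletes it via a ".wh.<name>" whiteout entry.
set -eu

trace_path() {
  target=$1; shift    # path inside the image, without the leading "/"
  # Build the whiteout name: same directory, basename prefixed with ".wh."
  case $target in
    */*) whiteout="${target%/*}/.wh.${target##*/}" ;;
    *)   whiteout=".wh.$target" ;;
  esac
  for layer in "$@"; do
    # List entry names once and normalize an optional leading "./"
    entries=$(tar -tf "$layer" | sed 's|^\./||')
    if printf '%s\n' "$entries" | grep -qxF -- "$target"; then
      echo "adds:    $layer"
    fi
    if printf '%s\n' "$entries" | grep -qxF -- "$whiteout"; then
      echo "deletes: $layer"
    fi
  done
}
```

Passing the layers in manifest order with jq, from inside secret-data:

$ trace_path secret.txt $(jq -r '.[0].Layers[]' manifest.json)

For this image you would expect the touch and echo layers to be reported as adding secret.txt and the last layer as deleting it, which is exactly why the data survives.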

I hope this gives you some insight into how Docker builds an image and stores it as a series of layers. Sensitive information must be handled outside the image layers: deleting it in a later build step does not remove it from the image.

If you are interested in Docker and containers in general, let's connect on Twitter (@narharistwt) and learn together.