Optimizing Dockerfiles with Multi-Stage Builds

Streamline Your Containerization Process for Efficient and Lightweight Docker Images

Introduction: Docker has revolutionized the way we package and distribute applications. However, as your Docker images grow in size, optimizing them becomes crucial. In the past, developers had to resort to manual cleanup techniques or use separate Dockerfiles for development and production. Fortunately, the introduction of multi-stage builds has simplified the process, allowing you to create efficient and streamlined Dockerfiles while reducing complexity. In this article, we'll explore the power of multi-stage builds and how they can benefit your Docker workflows.

Docker Image Optimization

Before Multi-Stage Builds: Previously, developers would create separate Dockerfiles for development and production environments. The development Dockerfile contained all the necessary dependencies and tools, while the production Dockerfile was slimmed down, containing only the application and its required dependencies. This approach required manual cleanup and often involved complex scripting to keep the image size small and manageable.

The Builder Pattern: Maintaining two Dockerfiles in this way, usually tied together by a build script that compiled the application in the first image and copied the artifact into the second, became known as the builder pattern. Alongside it, developers kept image size down by chaining multiple commands into a single RUN instruction with &&, so that build tools could be installed, used, and removed within one layer. Both techniques were error-prone and hard to maintain, and often led to long and convoluted Dockerfiles.
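As a rough sketch of the single-RUN technique, a Dockerfile from before multi-stage builds might have looked something like the following. The source file hello.c and the output path are placeholders chosen for illustration, not taken from a real project:

```dockerfile
# Sketch of the pre-multi-stage approach; hello.c is a hypothetical source file.
FROM alpine:latest
COPY hello.c /tmp/hello.c
# Install the toolchain, compile, and remove the toolchain in a single RUN
# instruction, so the compiler never ends up in a committed layer.
RUN apk add --no-cache --virtual .build-deps build-base \
    && gcc -o /usr/local/bin/hello /tmp/hello.c \
    && apk del .build-deps
CMD ["hello"]
```

Splitting these steps across separate RUN instructions would leave the toolchain in an earlier layer, which is why the whole chain had to be kept in one instruction and why such Dockerfiles were fragile to edit.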

Introducing Multi-Stage Builds: Multi-stage builds emerged as a solution to the challenges faced by developers in optimizing Dockerfiles. With multi-stage builds, you can use multiple FROM statements in a single Dockerfile, each representing a separate build stage. This approach allows you to selectively copy artifacts between stages, discarding unnecessary resources and reducing the final image size.

Example Dockerfile: Let's consider an example to illustrate the power of multi-stage builds. Suppose we have an application written in Go that requires two stages: a build stage and a production stage. Here's how a Dockerfile utilizing multi-stage builds would look:

# syntax=docker/dockerfile:1

FROM golang:1.16
WORKDIR /go/src/github.com/yourusername/yourapp/
RUN go get -d -v golang.org/x/net/html
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/yourusername/yourapp/app ./
CMD ["./app"]

In this example, the Dockerfile consists of two stages: a build stage and a production stage.

The build stage uses the golang:1.16 base image and compiles the Go application. The production stage starts from alpine:latest, installs the CA certificates the application needs at runtime, and copies in only the compiled binary from the build stage. Because the Go toolchain and the source code stay behind in the build stage, the final image is significantly smaller.

You only need the single Dockerfile. No need for a separate build script. Just run docker build .

docker build -t alexellis2/href-counter:latest .

The end result is the same tiny production image that the builder pattern produced, with a significant reduction in complexity. You don’t need to create any intermediate images, and you don’t need to extract any artifacts to your local system.

How does it work? The second FROM instruction starts a new build stage with the alpine:latest image as its base. The COPY --from=0 line copies just the built artifact from the previous stage into this new stage. The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.

Benefits of Multi-Stage Builds:

  1. Smaller Image Size: Multi-stage builds allow you to create lean and efficient Docker images by discarding unnecessary artifacts and dependencies. This results in smaller image sizes, reducing storage requirements and network transfer times.

  2. Simplified Dockerfile: With multi-stage builds, you no longer need to maintain separate Dockerfiles for different environments. You can consolidate your build stages into a single Dockerfile, improving readability and maintainability.

  3. Faster Builds: With BuildKit, the default builder in recent Docker releases, only the stages that the target stage depends on are executed, and independent stages can be built in parallel. This can shorten build times and improve developer productivity.

Naming Build Stages: By default, stages are unnamed, and you refer to them by integer index, starting with 0 for the first FROM instruction. To make a Dockerfile clearer and more robust, you can name a stage by adding AS <name> to its FROM instruction and then use that name in later instructions. Because the reference no longer depends on position, the build keeps working even if the stages are reordered later.

# syntax=docker/dockerfile:1

FROM golang:1.16 AS builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html  
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .

FROM alpine:latest  
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app ./
CMD ["./app"]

This example improves the previous one by naming the stages and using the name in the COPY instruction. This means that even if the instructions in your Dockerfile are re-ordered later, the COPY doesn’t break.

Stopping at a Specific Build Stage: When you build your image, you don’t necessarily need to build the entire Dockerfile including every stage. You can specify a target build stage. The following command assumes you are using the previous Dockerfile but stops at the stage named builder:

docker build --target builder -t alexellis2/href-counter:latest .

A few scenarios where this might be useful are:

  • Debugging a specific build stage

  • Using a debug stage with all debugging symbols or tools enabled, and a lean production stage

  • Using a testing stage in which your app gets populated with test data, but building for production using a different stage which uses real data.
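To make the debug and production scenario more concrete, here is a minimal sketch of how such a Dockerfile could be laid out. The stage names debug and production, the source file main.c, and the choice of gdb and strace as debugging tools are all assumptions made for illustration:

```dockerfile
# syntax=docker/dockerfile:1

# Shared build stage; main.c is a hypothetical source file.
FROM alpine:latest AS builder
RUN apk --no-cache add build-base
COPY main.c /src/main.c
RUN gcc -o /binary /src/main.c

# Hypothetical debug stage: the binary plus troubleshooting tools.
# Selected with: docker build --target debug .
FROM alpine:latest AS debug
RUN apk --no-cache add gdb strace
COPY --from=builder /binary /binary
CMD ["/binary"]

# Hypothetical production stage: the binary only.
# As the last stage, it is what a plain "docker build ." produces.
FROM alpine:latest AS production
COPY --from=builder /binary /binary
CMD ["/binary"]
```

Both final stages copy the same artifact from builder, so the compile step is shared and only the surrounding tooling differs between the two images.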

Use an external image as a “stage”

When using multi-stage builds, you aren’t limited to copying from stages you created earlier in your Dockerfile. You can use the COPY --from instruction to copy from a separate image, either using the local image name, a tag available locally or on a Docker registry, or a tag ID. The Docker client pulls the image if necessary and copies the artifact from there. The syntax is:

COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf

Use a previous stage as a new stage

You can pick up where a previous stage left off by referring to it when using the FROM directive. For example:

# syntax=docker/dockerfile:1

FROM alpine:latest AS builder
RUN apk --no-cache add build-base

FROM builder AS build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp

FROM builder AS build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp
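One way the outputs of such sibling stages might be brought together is a final stage that copies from each of them. The sketch below repeats the stages above and adds a hypothetical final stage. The destination paths app1 and app2 are illustrative, and the libstdc++ package is installed on the assumption that the binaries are dynamically linked against the C++ runtime:

```dockerfile
# syntax=docker/dockerfile:1

FROM alpine:latest AS builder
RUN apk --no-cache add build-base

FROM builder AS build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp

FROM builder AS build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp

# Hypothetical final stage, not part of the example above.
FROM alpine:latest
# The compiler is left behind, but the C++ runtime library is still needed.
RUN apk --no-cache add libstdc++
COPY --from=build1 /binary /usr/local/bin/app1
COPY --from=build2 /binary /usr/local/bin/app2
```

Without a final stage like this, a plain docker build . produces the image of the last stage in the file, which in the example above is build2.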

Conclusion: Multi-stage builds have changed the way developers optimize Dockerfiles. By leveraging multiple build stages, you can create lean and efficient Docker images without sacrificing readability and maintainability. With reduced image sizes and improved build times, multi-stage builds are a valuable addition to any Docker workflow. So, embrace the power of multi-stage builds and streamline your containerization process.