Building an arm64 container for Apache Druid for your Apple Silicon


The Apache Druid container image published on Docker Hub is linux/amd64 only. On Apple Silicon (M1 or M2 chips) it has to run under emulation, which is slow.

Fortunately, it is easy to build your own image from the binary distribution and the existing druid.sh.

All of this is available as a Dockerfile and a build script in the druid-m1 repository. The build.sh script builds an arm64 image for the Druid version that the Dockerfile is set to download.

A linux/amd64 container-based deployment of Apache Druid on Apple M1 silicon takes about 2 minutes (1:58.58, tested on an Apple M1 Max with 64GB of memory and 32GB allocated) to start and become available for processing. The same deployment using images built for linux/arm64 takes only 18 seconds (0:17.79) to become available.
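
One way to take this kind of startup measurement yourself is to poll a health check until it succeeds and report the elapsed time. The sketch below is an assumption about how such a timing could be done, not the method behind the numbers above; the function name time_until_ready is made up for illustration, and the usage comment assumes the Druid router is published on its default port 8888.

```shell
# Run a probe command once per second until it succeeds, then print the
# elapsed whole seconds. Gives up with exit status 1 after $1 seconds.
time_until_ready() {
  limit="$1"; shift
  start="$(date +%s)"
  until "$@" >/dev/null 2>&1; do
    if [ $(( $(date +%s) - start )) -ge "$limit" ]; then
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# Usage (assumed setup): start the containers, then immediately run
#   time_until_ready 300 curl -fsS http://localhost:8888/status/health
# /status/health is the health endpoint that Druid processes expose.
```

Passing the probe as a command keeps the timer independent of curl, so any readiness check can be substituted.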

If you just need an arm64/v8 image, download the druid-m1 project and run the build.sh script. If you want to know a little about how it was put together, read on.
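
For orientation, a build script for this purpose can be very small. The following is a hypothetical sketch, not the actual build.sh from druid-m1: the function names and the image name druid-arm64 are placeholders, and it assumes the Dockerfile declares the version on a line of the form ARG DRUID_VERSION=0.23.0.

```shell
# Read the default value of `ARG DRUID_VERSION=...` from a Dockerfile.
druid_version() {
  sed -n 's/^ARG DRUID_VERSION=//p' "$1" | head -n 1
}

# Print the docker command that would build an arm64 image tagged with
# that version. "druid-arm64" is a placeholder image name.
build_command() {
  version="$(druid_version "$1")"
  echo "docker build --platform linux/arm64 -t druid-arm64:${version} ."
}

# Usage: inspect the command first, then execute it from the project directory.
#   build_command Dockerfile
#   build_command Dockerfile | sh
```

Printing the command instead of running it directly makes it easy to check the tag and platform before a long build starts.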

Image

The process of creating this image isn’t complicated. Three major pieces went into its creation.

OS Architecture

First, find and use base images that are published for arm64/v8. Both “openjdk:11-jre-slim” and “busybox” have arm64/v8 images.
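
To check whether some other base image is published for arm64, its manifest list can be inspected. This is a small sketch, assuming the docker CLI is installed and that the manifest JSON lists each platform with an "architecture" field; the helper name has_arm64 is hypothetical.

```shell
# Succeeds when the manifest JSON on stdin lists an arm64 platform.
has_arm64() {
  grep -Eq '"architecture"[[:space:]]*:[[:space:]]*"arm64"'
}

# Usage (requires docker and network access):
#   docker manifest inspect openjdk:11-jre-slim | has_arm64 && echo "arm64 available"
#   docker manifest inspect busybox | has_arm64 && echo "arm64 available"
```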


Software Installation

The Dockerfile downloads and installs the Druid binary distribution, and it downloads and uses the druid.sh startup script that is maintained by the Apache Druid project.

ARG DRUID_VERSION=0.23.0
ADD https://dlcdn.apache.org/druid/${DRUID_VERSION}/apache-druid-${DRUID_VERSION}-bin.tar.gz /tmp
ADD https://raw.githubusercontent.com/apache/druid/${DRUID_VERSION}/distribution/docker/druid.sh /druid.sh
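
ADD with a remote URL copies the file as-is and does not unpack it, so the downloaded tarball still has to be extracted. Here is a minimal sketch of that step in shell, assuming the archive unpacks to apache-druid-<version>/ and that the result should be reachable at /opt/druid (the path used by the pull-deps step further down). The function name install_druid is hypothetical, and the real Dockerfile may do this differently.

```shell
# Unpack a Druid binary tarball under a prefix and expose it at <prefix>/druid.
install_druid() {
  version="$1"; tarball="$2"; prefix="$3"
  mkdir -p "$prefix"
  tar -xzf "$tarball" -C "$prefix"
  # Version-independent path so later steps do not need the version number.
  ln -sfn "$prefix/apache-druid-$version" "$prefix/druid"
}

# Usage inside a RUN step (assumed paths):
#   install_druid "$DRUID_VERSION" "/tmp/apache-druid-$DRUID_VERSION-bin.tar.gz" /opt
```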

Druid Extensions

Druid extensions are added with the pull-deps tool that ships with Druid. For this build, the kafka-emitter extension is included, but others are easy to add.

RUN \
	java -cp "/opt/druid/lib/*" \
		-Ddruid.extensions.directory="/opt/druid/extensions/" \
		-Ddruid.extensions.hadoopDependenciesDir="/opt/druid/hadoop-dependencies/" \
		org.apache.druid.cli.Main tools pull-deps --no-default-hadoop \
		-c "org.apache.druid.extensions.contrib:kafka-emitter"
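
Adding more extensions comes down to passing one extra -c coordinate per extension to pull-deps. The helper below is a sketch of that idea; extension_flags is a hypothetical name, and the second coordinate in the usage comment (prometheus-emitter) is only an illustrative choice that is not part of this build.

```shell
# Print one "-c <coordinate>" pair per line for each extension given.
extension_flags() {
  for coordinate in "$@"; do
    printf '%s %s\n' -c "$coordinate"
  done
}

# Usage (same java invocation and -D options as the RUN step above):
#   java -cp "/opt/druid/lib/*" \
#     org.apache.druid.cli.Main tools pull-deps --no-default-hadoop \
#     $(extension_flags \
#         "org.apache.druid.extensions.contrib:kafka-emitter" \
#         "org.apache.druid.extensions.contrib:prometheus-emitter")
```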

Why The Difference?

The Dockerfile that is part of Apache Druid is all about building the software from source. Since this image is built after a release has been published, its approach is to use the fact that the binaries are already available for download.

New To Druid?

If you are new to Druid and want to see what it can do, check out the druid-late demonstration within dev-local-demos. It leverages a container-based ecosystem provided at dev-local. Update the .env file within the druid folder to point to your individually built arm64 image.

Reach Out

Please contact us if you would like to talk about online analytic processing or event-streaming.