In one of my earlier posts, we looked at how to create a Spark application that reads data from a CSV file. In this post, we'll take a look at how to create a Docker image for the Spark Financial Analysis application so that it can easily be run inside a container.

The code discussed in this post is available on GitHub. It extends the Spark Financial Analysis application that we created in my earlier post.

Let’s take a look at the steps required to create a Docker image for this application and push the image to Docker Hub.

  • Update build.gradle to create a shadow (fat/uber) jar that bundles the application together with all of its dependent jars. For this we use the Shadow plugin.
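apply plugin: 'java'
apply plugin: 'idea'
// The Shadow plugin must also be on the buildscript classpath (that block is not shown here).
apply plugin: 'com.github.johnrengelman.shadow'

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
    // Assumption: the original repository list was not preserved; Maven Central hosts these artifacts.
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '2.2.0'
    compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '2.2.0'
    compile group: 'org.apache.spark', name: 'spark-sql_2.10', version: '2.2.0'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

jar {
    manifest {
        // Set this to the fully qualified name of the application's main class.
        attributes 'Main-Class': ''
    }
}

shadowJar {
    // Allow archives with more than 65535 entries.
    zip64 true
}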
  • Create a Dockerfile directly under the project root containing the following commands. This Dockerfile is used to build the Docker image for the Spark Financial Analysis application. Docker images are layered, with each new image built on top of a base image; in this case we use openjdk as the base. We create one directory for the build artifact and another for the input CSV file containing the loan data, and copy both into the image. Finally, we define the entry point that runs the application.
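# Base image: OpenJDK 8 JRE on Alpine Linux
FROM openjdk:8-jre-alpine
# Directory for the application jar
RUN mkdir -p /opt/apps/spark
COPY build/libs/SparkFinancialAnalysis-1.0-SNAPSHOT-all.jar /opt/apps/spark/
# Directory for the input data, then the loan data CSV itself
RUN mkdir -p /bigdata
COPY src/main/resources/LoanStats_2017Q2.csv /bigdata/LoanStats_2017Q2.csv
# List the copied jar in the build output as a sanity check
RUN cd /opt/apps/spark && ls -ltr
ENTRYPOINT ["java", "-jar", "/opt/apps/spark/SparkFinancialAnalysis-1.0-SNAPSHOT-all.jar"]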
  • Run the build using the following command:

./gradlew shadowJar
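
Because the image starts the application with `java -jar`, the fat jar has to exist at the path the Dockerfile copies from, and its manifest has to name a main class. The following is a minimal sketch of such a check, assuming the Info-ZIP `unzip` tool is installed; the function names `manifest_main_class` and `check_fat_jar` are made up for this illustration and are not part of the project.

```shell
# Print the Main-Class value from manifest text on stdin (prints nothing if absent).
# Note: values wrapped across several manifest lines are not rejoined in this sketch.
manifest_main_class() {
    tr -d '\r' | sed -n 's/^Main-Class: *//p'
}

# Succeed only if the jar exists, is non-empty, and its manifest names a main class.
check_fat_jar() {
    jar_path="$1"
    [ -s "$jar_path" ] || { echo "missing jar: $jar_path" >&2; return 1; }
    main_class="$(unzip -p "$jar_path" META-INF/MANIFEST.MF | manifest_main_class)"
    [ -n "$main_class" ] || { echo "no Main-Class in $jar_path" >&2; return 1; }
    echo "$main_class"
}
```

It could be called as `check_fat_jar build/libs/SparkFinancialAnalysis-1.0-SNAPSHOT-all.jar` before building the image.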

  • Run the Docker build command to create the new image. Here we tag the new image as version 1.0.

docker build -t amansardana/spark-financial-analysis:1.0 .

  • Run the application

docker run --name=spark-financial-analysis-app amansardana/spark-financial-analysis:1.0
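
An image reference has the form `repository:tag`, where the repository may contain a namespace (`amansardana/...`) and the tag defaults to `latest` when omitted. The reference passed to `docker run` therefore has to match the one given to `docker build -t`. The sketch below shows how such a reference breaks down; `split_image_ref` is a hypothetical helper written for this post, not a Docker command, and it does not handle digest references (`name@sha256:...`).

```shell
# Split an image reference into repository and tag; the tag defaults to "latest".
split_image_ref() {
    ref="$1"
    last="${ref##*/}"   # final path component, e.g. spark-financial-analysis:1.0
    case "$last" in
        *:*) tag="${last##*:}"; repo="${ref%:*}" ;;
        *)   tag="latest";      repo="$ref" ;;
    esac
    printf 'repository=%s tag=%s\n' "$repo" "$tag"
}
```

For example, `split_image_ref amansardana/spark-financial-analysis:1.0` reports the repository `amansardana/spark-financial-analysis` with tag `1.0`, whereas a reference written with a colon in place of the slash would be read as repository `amansardana` with the rest as its tag.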

  • The new Docker image can be published to Docker Hub so that anyone can pull it and run the application in a container.

docker push amansardana/spark-financial-analysis:1.0
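
The build, image and push steps above can be tied together in one small script so that the same image reference is used throughout. This is only a sketch: the names `IMAGE`, `TAG`, `DRY_RUN`, `run` and `release` are illustrative and do not come from the project, and pushing assumes you are already logged in to Docker Hub.

```shell
#!/bin/sh
# Image coordinates; override them from the environment if needed.
IMAGE="${IMAGE:-amansardana/spark-financial-analysis}"
TAG="${TAG:-1.0}"

# Print the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Build the fat jar, build the image, and push it; stop at the first failure.
release() {
    run ./gradlew shadowJar &&
    run docker build -t "$IMAGE:$TAG" . &&
    run docker push "$IMAGE:$TAG"
}
```

Calling `DRY_RUN=1 release` only prints the three commands, which is a cheap way to check the image reference before anything is built or pushed.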

