In one of my earlier posts, we looked at how to create a Spark application that reads data from a CSV file. In this post, we’ll look at how to create a Docker image for the Spark Financial Analysis application so that it can easily be run inside a container.
The code discussed in this post is available on GitHub. It extends the Spark Financial Analysis application that we created in my earlier post.
Let’s take a look at the steps required to create a Docker image for this application and push the image to Docker Hub.
- Update build.gradle to create a shadow (fat/uber) jar that bundles the application together with all of its dependent jars. For this we use the Shadow plugin.
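// The Shadow plugin must be on the build script classpath before it can be applied.
// Assumption: version 2.0.1 is shown here; use the release that matches your Gradle version.
buildscript {
    repositories {
        maven { url 'https://plugins.gradle.org/m2/' }
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.1'
    }
}

apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'com.github.johnrengelman.shadow'

sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '2.2.0'
    compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '2.2.0'
    compile group: 'org.apache.spark', name: 'spark-sql_2.10', version: '2.2.0'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

// The shadowJar task inherits this manifest, so the fat jar is runnable with "java -jar".
jar {
    manifest {
        attributes 'Main-Class': 'com.financial.analysis.spark.SparkFinancialAnalysisMain'
    }
}

// Allow more than 65535 entries in the archive; the Spark dependencies exceed that limit.
shadowJar {
    zip64 true
}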
- Create a Dockerfile directly under the project root containing the following commands. This Dockerfile is used to build the Docker image for the Spark Financial Analysis application. Docker images are layered, with new images built on top of base images; in this case we use openjdk as our base image. We create one directory for the build artifact and another for the input CSV file containing the loan data. Finally, we define the entry point that runs the application.
FROM openjdk:8-jre-alpine
MAINTAINER asardana.com
RUN mkdir -p /opt/apps/spark
COPY build/libs/SparkFinancialAnalysis-1.0-SNAPSHOT-all.jar /opt/apps/spark
RUN ls -ltr
RUN mkdir -p /bigdata
COPY src/main/resources/LoanStats_2017Q2.csv /bigdata/LoanStats_2017Q2.csv
RUN cd /opt/apps/spark && ls -ltr
ENTRYPOINT ["java", "-jar", "/opt/apps/spark/SparkFinancialAnalysis-1.0-SNAPSHOT-all.jar"]
- Build the fat jar using the following command
./gradlew shadowJar
- Run the Docker build command to create the new image. Here we tag the new image as version 1.0
docker build -t amansardana/spark-financial-analysis:1.0 .
- Run the application
docker run --name=spark-financial-analysis-app amansardana/spark-financial-analysis:1.0
- The new Docker image can be published to Docker Hub so that anyone can pull it and run the application in a container anywhere.
docker push amansardana/spark-financial-analysis:1.0
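
The steps above can also be chained in a small wrapper script. What follows is only a minimal sketch, not part of the original project: the `DRY_RUN` and `PUSH` switches and the `run` helper are hypothetical names introduced for illustration, and the `--rm` flag is an addition so that the container name can be reused on the next run. The image and container names are the ones used in the steps above.

```shell
#!/bin/sh
# Sketch: build the fat jar, build the image, run it, and optionally push it.
# DRY_RUN, PUSH and run() are hypothetical helpers added for this example.
set -eu

IMAGE="${IMAGE:-amansardana/spark-financial-analysis:1.0}"
CONTAINER="${CONTAINER:-spark-financial-analysis-app}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
PUSH="${PUSH:-0}"         # 1 = also push the image to Docker Hub

# Print the command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run ./gradlew shadowJar
run docker build -t "$IMAGE" .
# --rm (an addition to the original command) removes the container on exit,
# so the same --name can be used again on the next run.
run docker run --rm --name="$CONTAINER" "$IMAGE"
if [ "$PUSH" = "1" ]; then
    run docker push "$IMAGE"
fi
```

With the defaults, the script only prints the Gradle and Docker commands it would run. Setting `DRY_RUN=0` executes them from the project root, and `PUSH=1` additionally pushes the image, which assumes you are already logged in to Docker Hub.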