In this post we'll look at how to scale a Spring Batch application on AWS using the remote partitioning technique. Spring Batch applications can be scaled by running multiple processes in parallel on remote machines that work independently on the partitioned data. There is a master step that knows how to partition the data and … Continue reading Scaling Spring Batch Application on AWS with remote partitioning
Category: Big Data
Financial Data Analysis using Kafka, Storm and MariaDB
In my previous posts, we looked at how to integrate Kafka and Storm for streaming loan data and cleansing the data before ingesting it into the processing pipeline for aggregation. We also looked at how to leverage Liquibase for managing the relational database in the form of immutable scripts that can be version controlled. This fits … Continue reading Financial Data Analysis using Kafka, Storm and MariaDB
Financial Data Analysis – Kafka, Storm and Spark Streaming
In my earlier posts, we looked at how Spark Streaming can be used to process the streaming loan data and compute the aggregations using Spark SQL. We also looked at how the data can be stored in the file system for future batch analysis. We discussed how Spark can be integrated with Kafka to ingest the … Continue reading Financial Data Analysis – Kafka, Storm and Spark Streaming
Stream Processing using Storm and Kafka
In my earlier post, we looked at how Kafka can be integrated with Spark Streaming for processing the loan data. In the Spark Streaming process, we cleanse the data to remove invalid records before aggregating it. We could potentially cleanse the data in the pipeline prior to streaming the loan records in … Continue reading Stream Processing using Storm and Kafka
Financial Data Analysis using Kafka and Spark Streaming
In my earlier posts on Apache Spark Streaming, we looked at how data can be processed using Spark to compute the aggregations and also stored in a compressed format like Parquet for future analysis. We also looked at how data can be published and consumed using Apache Kafka, which is a distributed message … Continue reading Financial Data Analysis using Kafka and Spark Streaming
Introduction to Stream Processing using Apache Spark
In my previous post, we looked at how Apache Spark can be used to ingest and aggregate data using Spark SQL in batch mode. There are different ways to create the Dataset from the raw data, depending on whether the schema of the ingested data is already known in advance (RDD of Java … Continue reading Introduction to Stream Processing using Apache Spark
Analyzing financial data with Apache Spark
With the rise of big data processing in the enterprise world, it's quite evident that Apache Spark has become one of the most popular frameworks for processing large amounts of data in both batch mode and real time. This article won't go into an overview of Apache Spark since there are already many good … Continue reading Analyzing financial data with Apache Spark