Financial Data Analysis – Kafka, Storm and Spark Streaming

In my earlier posts, we looked at how Spark Streaming can be used to process streaming loan data and compute aggregations using Spark SQL. We also looked at how the data can be stored in the file system for future batch analysis. We discussed how Spark can be integrated with Kafka to ingest the …

Financial Data Analysis using Kafka and Spark Streaming

In my earlier posts on Apache Spark Streaming, we looked at how data can be processed using Spark to compute aggregations and stored in a compressed format like Parquet for future analysis. We also looked at how data can be published and consumed using Apache Kafka, which is a distributed message …

Introduction to Stream Processing using Apache Spark

In my previous post, we looked at how Apache Spark can be used to ingest and aggregate data using Spark SQL in batch mode. There are different ways to create the Dataset from the raw data, depending on whether the schema of the ingested data is known in advance (RDD of Java …