Gokul Prabagaren

Engineering Manager


Challenges of Spark Application Coexisting with NoSQL Databases

Capital One is the first US bank to exit its on-premises data centers and move completely to the cloud. As part of modernizing our applications in Capital One Card Rewards, we built a custom transaction-processing application from the ground up on open-source technologies such as Spark, MongoDB, and Cassandra. This application processes millions of customer transactions daily, earning customers millions of miles, cash rewards, and points every day. While building it, we ran into many challenging issues in having a Spark application process data from MongoDB and Cassandra backends to serve customers. This talk focuses on a few of those issues, their impact, and how to mitigate them. It covers the following issues:

  • Why the order of Cassandra key columns matters and how it affects querying
  • How Cassandra batching helps and why it works well with Spark partitions
  • The importance of Cassandra data modeling and its implications after MVP/deployment
  • How to manage MongoDB connections at the JVM level
  • Implications of the partitioner used by the MongoSpark connector
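As an illustration of the first topic: in the common case (no secondary index and no `ALLOW FILTERING`), a CQL query can be served by primary key only when the whole partition key is restricted and the clustering columns are restricted as a prefix in their declared order, with a range allowed only on the last restricted clustering column. The following is a simplified model of that rule, not Cassandra's own validation logic; it ignores indexes, `ALLOW FILTERING`, `token()` queries, and multi-column restrictions, and the table and column names (`account_id`, `txn_date`, `txn_id`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryKey:
    """A CQL primary key: partition key columns plus clustering columns in declared order."""
    partition: tuple[str, ...]
    clustering: tuple[str, ...]


# Hypothetical table: PRIMARY KEY ((account_id), txn_date, txn_id)
TXN_KEY = PrimaryKey(partition=("account_id",), clustering=("txn_date", "txn_id"))


def is_key_order_query(pk: PrimaryKey, restrictions: dict[str, str]) -> bool:
    """Return True if the WHERE restrictions follow the primary key order.

    `restrictions` maps a column name to "eq", "in" or "range".
    Simplified model: ignores indexes, ALLOW FILTERING, token() and
    multi-column restrictions.
    """
    key_columns = set(pk.partition) | set(pk.clustering)
    if not set(restrictions) <= key_columns:
        return False  # a non-key column would need an index or ALLOW FILTERING

    # Every partition key column must be pinned so Cassandra knows which partitions to read.
    if any(restrictions.get(col) not in ("eq", "in") for col in pk.partition):
        return False

    # Clustering columns must be restricted as a prefix, in declared order.
    prefix_closed = False
    for col in pk.clustering:
        op = restrictions.get(col)
        if op is None:
            prefix_closed = True  # later clustering columns can no longer be restricted
        elif prefix_closed:
            return False  # an earlier column was skipped or already had a range
        elif op == "range":
            prefix_closed = True  # a range must be the last clustering restriction
    return True
```

With this hypothetical key, `{"account_id": "eq", "txn_date": "range"}` passes, while `{"account_id": "eq", "txn_id": "eq"}` fails because it skips `txn_date`. This is why choosing the key column order up front, around the queries the Spark job will actually run, matters so much.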

We faced every one of these issues in our own application, and the talk walks through each of them in the context of a Spark/MongoDB/Cassandra environment. Anyone running Spark applications with MongoDB and Cassandra backends can benefit from this talk.
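To make the batching topic concrete: one common pattern is to write each Spark partition separately (for example from a function passed to `foreachPartition`) and group its rows so that every Cassandra batch targets a single partition key, because batches that span many partitions put extra work on the coordinator. The Spark Cassandra Connector can do similar grouping on its own when it writes; this sketch only illustrates the idea for a hand-written writer. It is plain Python rather than any connector API, and `single_partition_batches`, `partition_key`, and `max_rows` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def single_partition_batches(
    rows: Iterable[Row],
    partition_key: Callable[[Row], Hashable],
    max_rows: int,
) -> Iterator[list[Row]]:
    """Split one Spark partition's rows into batches that each hit one Cassandra partition.

    Buffers the whole Spark partition in memory; a production writer would
    bound this buffer.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")

    # Group rows by their Cassandra partition key, keeping first-seen key order.
    groups: dict[Hashable, list[Row]] = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)

    # Emit each key's rows in chunks of at most max_rows.
    for key_rows in groups.values():
        for start in range(0, len(key_rows), max_rows):
            yield key_rows[start:start + max_rows]
```

Each yielded list could then be sent as one unlogged batch; that write call is left out because it depends on the driver in use.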