In the world of diginomics driven by fintech and mortgage services, business teams ask for information on the effectiveness of a product offering as soon as the product launches (day 1). How do we deliver on that demand in a fast-paced, compliant, and complex environment while ensuring relevant acceptance and adoption? In this discussion, we share our innovative approach to data engineering with integrated best practices in data management, IoT, and process automation.
The data application runs its processes in a highly distributed, memory-intensive framework to reduce processing time. We discuss the common, generic, self-learning data engineering framework we deployed to handle and manage all data integration challenges from multiple sources with one solution. The reusable, extensible program executes against multidimensional, semistructured XML data sets and leverages Jupyter notebooks (via Anaconda) as well as core Apache components of the Hortonworks Data Platform, such as Spark, Hive, Oozie, and Zeppelin. By leveraging PySpark and related tooling, the modular architecture provides faster, easier data processing at lower development and maintenance cost.
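To make the XML-flattening step concrete, here is a minimal sketch of the kind of transformation such a framework performs on semistructured records. The element names (`loan`, `rate`, `borrower`) and the sample document are purely illustrative assumptions, not the real schema, and the sketch uses Python's standard library in place of the production PySpark job.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the kind of nested, semistructured XML the
# framework might ingest; the schema here is invented for illustration.
SAMPLE = """
<loans>
  <loan id="L1"><rate>3.5</rate><borrower><state>CA</state></borrower></loan>
  <loan id="L2"><rate>4.1</rate><borrower><state>TX</state></borrower></loan>
</loans>
"""

def flatten_loans(xml_text):
    """Flatten nested loan records into flat dicts, one row per loan."""
    root = ET.fromstring(xml_text)
    rows = []
    for loan in root.findall("loan"):
        rows.append({
            "id": loan.get("id"),
            "rate": float(loan.findtext("rate")),
            "state": loan.findtext("borrower/state"),  # nested path lookup
        })
    return rows

rows = flatten_loans(SAMPLE)
# In a PySpark pipeline, rows like these could feed
# spark.createDataFrame(rows) and then be persisted to Hive.
```

In the actual deployment this logic would run distributed across the cluster (for example, via Spark executors reading from HDFS) rather than in a single process, which is where the reduction in processing time comes from.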