Data Integration: 2013's Top 3 Trends
Big data, Hadoop, and the need for faster performance will be at the forefront of data integration this year.
By Jorge A. Lopez, Senior Manager, Data Integration, Syncsort
In 2012, many IT organizations continued to view data integration as a major obstacle to achieving strategic business objectives. In some cases, DI tools once viewed as a means of simplifying data integration have become an obstacle for IT organizations needing faster, more efficient, and more cost-effective ETL. In fact, a 2012 research report by the analyst firm Enterprise Strategy Group found data integration complexity was the number one data analytics challenge cited by survey respondents.
Additionally, we saw many IT organizations begin to look to new solutions such as Hadoop to address these challenges. In 2013, as big data tips toward the mainstream and becomes more interactive, we expect it to further accelerate the need for faster performance and greater scalability. We also expect to see a rise in hand coding as a means to complete integration tasks, even though it adds complexity. Together, these trends will ultimately push organizations to redefine their data integration strategies and seek new approaches.
2013 Trend #1: Big data becomes the "new normal"
The past two years created high expectations for big data because new approaches offered a better way to tame and harness the rapidly growing volume of multi-structured data. However, many are still trying to figure out how to leverage big data in their environment. Although 2012 was primarily centered on experimentation with tools such as Hadoop and NoSQL databases, this year we expect to see a growing shift toward mainstream adoption, where big data becomes the new normal and is no longer perceived as "big."
With this shift, organizations will become more aware of their data challenges and will need to understand how to separate all the noise from the value, ensuring that they have the information that they really need. One way organizations can take advantage of this new normal is to perform a cost-benefit analysis. Big data only makes sense for businesses as long as it is affordable. Although open source frameworks such as Hadoop can offer very low entry points, they can quickly become expensive because they require highly specialized skills. Therefore, it will be important to choose the right technologies to get the maximum efficiency from hardware infrastructure while increasing staff productivity by leveraging the skills already present across your business.
2013 Trend #2: Hand coding becomes more prevalent
In 2012, we saw a growing need for high-performance data integration as conventional data integration tools continued to struggle to keep pace with the increasing performance and scalability requirements of big data. Because these tools underperformed, many organizations were forced to rely on constant manual tuning to achieve and sustain acceptable performance. As a result, organizations have reverted to hand coding to get the job done. In 2013, we'll continue to see a spike in hand coding as organizations seek alternative approaches to meet performance and scalability requirements. As organizations retreat to manual coding, they will face development and maintenance challenges, especially as big data continues to raise the requirements bar.
Although Hadoop is often seen as an ideal solution, it can also spawn new manual coding as big data replaces conventional SQL code with Java code, Pig, and Hive for MapReduce. This points to a major challenge that nearly every organization working with Hadoop is facing -- Hadoop is not easy to implement. Developing MapReduce jobs and tuning sort operations requires very specific skills that are not only expensive but also very difficult to find. For many organizations, this skills gap is a significant barrier to achieving scalability and ease of use, in turn driving requirements to accelerate performance and reduce the complexity of Hadoop deployments.
These goals can be achieved by using friendly graphical user interfaces that leverage existing IT skills and highly scalable, self-tuning engines to help reduce the complexities of designing for performance. By adopting this approach, organizations can easily leverage the Hadoop framework without the need to learn how to manually develop MapReduce jobs.
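For a sense of what hand-developing a MapReduce job involves, here is a minimal sketch in plain Python rather than on Hadoop itself. It mimics the map, shuffle, and reduce steps for a task that a single SQL statement (roughly `SELECT region, SUM(amount) ... GROUP BY region`) would otherwise express. The records, the field names, and the function names are all hypothetical and chosen only for illustration; a real Hadoop job would also need job configuration, serialization, and cluster tuning that this sketch leaves out.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input records: (region, sale_amount)
RECORDS = [("east", 100), ("west", 250), ("east", 50), ("west", 25), ("north", 10)]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    region, amount = record
    yield region, amount


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


totals = run_job(RECORDS)
print(totals)  # {'east': 150, 'north': 10, 'west': 275}
```

Even in this toy form, the developer has to write and maintain three separate steps for what is one declarative query in SQL, which is the kind of added effort described above.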
2013 Trend #3: Big data blurs batch and real-time processing
With the introduction of Apache Drill and Cloudera Impala, 2012 saw the union of big data and real-time processing, making the interactive analysis and ad hoc query of big data possible -- a previously unimaginable feat. With batch and real-time processing now going hand in hand, we expect to see big data make a definitive move toward the interactive market. This evolution of the decision support environment results from new requirements in terms of velocity, variety, and volume of data.
In the future, organizations will need near-real-time data at the finest level of detail -- with performance being the key to keeping that interactive capability sustainable. For example, a leading independent mobile advertising organization depends on analyzing large volumes of data in near-real time to provide the best and most effective advertising space to its customers. Challenged by rapid data growth and exponentially growing infrastructure costs, the company migrated critical data integration processes from application servers to a high-performance ETL solution that enabled increased business agility and a faster, more scalable architecture. As a result, data integration processes that took an hour could now be completed in ten minutes. This approach also saved the company an estimated $125,000 in deferred hardware purchases due to more efficient processing.
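The run-time improvement in this example can be put in more familiar terms with a quick calculation. The sketch below uses only the two figures given above (an hour before, ten minutes after); the function name is hypothetical.

```python
def speedup(before_minutes, after_minutes):
    """Return (speedup factor, fraction of run time saved)."""
    factor = before_minutes / after_minutes
    saved = 1 - after_minutes / before_minutes
    return factor, saved


# Figures from the example: one hour before, ten minutes after.
factor, saved = speedup(60, 10)
print(f"{factor:.0f}x faster, {saved:.0%} less run time")  # 6x faster, 83% less run time
```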
This evolution of big data into the interactive space will prove to be extremely valuable in many existing cases, such as analyzing sensor data from smart grids in near-real time to alert utility companies about potential power outages before they happen, so preventive measures can be taken. More important, this will open the doors to new use cases for big data.
As big data becomes the new normal, and as data management techniques originally defined to tackle the "big data" challenge spread across the business regardless of the size or type of data, a high-performance data integration strategy will be crucial to ensuring the faster time to insight that drives business results.
Jorge A. Lopez is senior manager, data integration at Syncsort. You can contact the author at firstname.lastname@example.org.