Syncsort, a global leader in Big Data and mainframe software, today announced the results of its second annual Hadoop survey. The survey shows that as more organizations move from Hadoop experimentation to production to realize the full potential of big data analytics, they will focus on three top areas in 2016.
Syncsort polled over 250 respondents including data architects, IT managers, developers, business intelligence/data analysts and data scientists, with a majority (66 percent) coming from organizations with revenues over $100 million. Participants were from a broad range of industries including financial services, healthcare, government and retail.
Based on the survey results, there are three key trends Syncsort anticipates in 2016:
Apache Spark will move from a talking point into deployment. Nearly 70 percent of respondents are most interested in Apache Spark, surpassing interest in all other compute frameworks, including the recognized incumbent, MapReduce (55 percent). While Syncsort expects MapReduce will still be the prevalent compute framework in production, the high level of interest should translate into more Spark deployments, mostly running on Hadoop.
Offloading from expensive platforms into Hadoop will continue to increase in numbers and scope. Sixty-three percent of respondents feel Hadoop will help them increase business/IT agility, 55 percent expect it to increase operational efficiency and reduce costs, and over 51 percent want to leverage it to make more data available for business use across the entire organization. These findings are consistent with Syncsort customer use cases that should continue to gain steam in 2016, including Mainframe and Enterprise Data Warehouse (EDW) offload to Hadoop.
A growing number of companies will look to leverage Hadoop for advanced use cases. More than half of respondents see Hadoop as a way to innovate, using data from social media and IoT, and applying predictive analytics and visualization for greater insights about their business. Hadoop has yet to be leveraged for mobile apps and software, as only 4.9 percent of respondents reported utility for these use cases.
“As Hadoop adoption becomes mainstream, the number of applications in production increases and the use cases, frameworks and data sources become more varied and complex. Organizations realize significant benefits from Hadoop; however, they also cite challenges in keeping up with new tools and skills, connectivity and data movement, and unforeseen costs,” said Tendü Yoğurtçu, General Manager of Syncsort’s Big Data business. “A single software environment to access all enterprise data and manage the entire data pipeline will be critical for organizations to maximize the ROI on their Big Data projects, especially as the demand for real-time analytics in industries such as financial services, healthcare, telecommunications, and retail increases.”
Based on further customer feedback, Syncsort predicts two additional trends in 2016:
More organizations will leverage streaming, real-time data sources. The best business decisions often require the most recent data available. Popular use cases include fraud detection, analytics on telemetry and security data, insurance claim validation, and the IoT.
Data governance and security will be major areas of focus as organizations move to production deployments. More organizations will move towards adopting a “Hadoop first” approach to data management – skipping traditional and more expensive platforms and applying metadata, lineage, security, and other data management measures on Hadoop from the start.
“The ability to combine real-time data sources with batch data will create even more insights for businesses, and predictive analytics will play a critical role in this,” continued Yoğurtçu. “The ability to transform and prepare data in flight will be more important, eliminating the need for staging increasing volumes of data. Though challenging, this will also create an opportunity to deliver next generation data integration products, future-proofing users’ applications while taking advantage of highly scalable and distributed platforms like Apache Hadoop and Spark, on-premise or in the cloud.”