Data Plane Offering Marks New Phase for Hortonworks
The new Data Plane Service (DPS) that Hortonworks unveiled today at the Strata Data Conference marks the start of a new phase for the publicly traded company, according to Arun Murthy, chief product officer and co-founder of the company.
If the Hortonworks Data Platform (HDP), the company’s distribution of Apache Hadoop, was its first stage, and its Apache NiFi-based stream processing system, Hortonworks DataFlow (HDF), was the second, then the launch of DPS marks the third, according to Murthy.
“It’s the third leg of the stool, but it’s a service rather than a product,” he tells Datanami. “You go to our service, then you point our service to your data assets and workloads, and we can start to manage it.”
The new Web-based DPS offering provides two main capabilities. First, it lets customers manage the security and governance of data wherever it might be. That could be in a Hadoop cluster, a stream processing system, or an enterprise data warehouse (EDW) sitting on premises, in the public cloud, or in a hybrid mix of both.
Any product that integrates with the APIs for Apache Ranger and Apache Atlas can be managed via DPS. “Anything that can talk to Atlas and Ranger, we can go access and service and configure,” Murthy says. “As long as they can leverage open source standards for security and governance or metadata management — Atlas, Ranger, and so on — we can access that and give you really great services on top of them.”
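To make the integration point concrete, here is a minimal sketch (not Hortonworks code) of how a management layer might address the open REST APIs of Apache Atlas (metadata) and Apache Ranger (security policies). The endpoint paths reflect the projects' public REST APIs; the host names and helper functions are assumptions for illustration.

```python
# Sketch: building requests against the Atlas and Ranger REST APIs,
# the two integration points Murthy cites for DPS.
from urllib.parse import urlencode

def atlas_search_url(base, type_name):
    """Build an Atlas v2 basic-search URL for entities of a given type."""
    return f"{base}/api/atlas/v2/search/basic?{urlencode({'typeName': type_name})}"

def ranger_policies_url(base, service_name=None):
    """Build a Ranger public-API URL listing security policies."""
    url = f"{base}/service/public/v2/api/policy"
    if service_name:
        url += "?" + urlencode({"serviceName": service_name})
    return url

# A management service could iterate over every registered cluster and
# pull metadata and policies through these two hooks:
print(atlas_search_url("http://atlas.example.com:21000", "hive_table"))
print(ranger_policies_url("http://ranger.example.com:6080", "hadoopdev"))
```

Any engine that exposes its metadata and authorization through these same APIs, whether or not it ships with HDP, becomes manageable from the same control plane.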
Second, DPS allows users to dynamically spin up cloud or on-premises clusters to process the data that’s managed via the Ranger and Atlas API hooks. The DPS offering includes built-in capabilities for common tasks, such as securely uploading data from on-premises systems to the cloud or moving data from one cloud to another. DPS gets its workload deployment capabilities via hooks into the Cloudbreak offering that Hortonworks obtained with its acquisition of SequenceIQ two years ago.
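The cluster spin-up described above amounts to submitting a declarative request to a provisioner such as Cloudbreak. The sketch below illustrates the shape of such a request; every field name here is invented for illustration and does not reflect the real Cloudbreak API.

```python
# Hypothetical cluster-provisioning request, of the kind a service like
# DPS might hand to a provisioner. Field names are illustrative only.
cluster_request = {
    "name": "etl-burst-01",
    "cloud": "aws",            # target environment: public cloud or on-premises
    "nodeCount": 8,
    "blueprint": "hdp-spark",  # which stack to deploy on the new cluster
    "autoTerminate": True,     # tear the cluster down when the workload completes
}

def validate(req):
    """Minimal sanity check before submitting the request."""
    required = {"name", "cloud", "nodeCount", "blueprint"}
    missing = required - req.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

print(validate(cluster_request))  # True
```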
“Obviously it ties HDF and HDP together, but it’s certainly our intent to go beyond HDP and HDF,” Murthy says. “Data Plane is a cloud-based service that delivers an extensible platform to reliably manage data and workloads…spin up clusters, and apply consistent security and governance policies across streaming data, tabular structured data, unstructured data and so on, regardless of where the data resides.”
DPS is not a product that you install, Murthy says, but rather a Web-based app store where customers can select different data management and processing items from an a la carte menu. “We want this to be extensible to more than just Hortonworks products,” he says. “We want you to be able to plug in multiple sources of product; it may be data in S3 or data in Azure or data in your enterprise data warehouse.”
While the company hasn’t had a data plane product, per se, until now, it has been working on data plane concepts for the past two years, Murthy says. At the vendor’s Hadoop Summit in June 2016, Hortonworks executives laid out a clear vision for a federated data plane that allows customers to manage data and workloads wherever they might reside, from the data center to the IoT edge, with Atlas, Ranger, and Ambari providing the integration points.
This federated data plane idea goes somewhat against a core concept in Hadoop: that data should be centralized in one giant repository, with various applications brought to it for processing (i.e., “bring the compute to the data” rather than vice versa). Instead of physically storing all your data in one place, as many Hadoop vendors have recommended, a federated data plane leaves the various datasets where they reside and connects them through logical views. The management layer on top of that is often called a data fabric.
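The "logical view" idea can be sketched with a simple catalog that maps one logical dataset name to many physical locations, so data is addressed without being moved. The names and locations below are invented for illustration.

```python
# Minimal sketch of a federated catalog: datasets stay in place, and a
# logical name resolves to every physical location backing it.
catalog = {}

def register(logical_name, location):
    """Attach a physical location (cluster path, cloud bucket, EDW table) to a logical dataset."""
    catalog.setdefault(logical_name, []).append(location)

def resolve(logical_name):
    """Return every physical location backing a logical dataset, without moving data."""
    return catalog.get(logical_name, [])

register("sales", "hdfs://onprem-cluster/warehouse/sales")
register("sales", "s3://corp-lake/sales/")
register("sales", "edw://teradata/prod.sales")

print(resolve("sales"))
```

A data fabric layers governance, security, and processing on top of exactly this kind of mapping, rather than on one consolidated store.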
“So instead of having one data lake to rule them all, it seems like enterprises need to manage multiple ponds and lakes and oceans of data,” Murthy says. “I love how Forrester talks about this notion of a data fabric, and if you like, the data plane is an instantiation of that concept.”
Forrester analyst Noel Yuhanna has been instrumental in defining data fabrics for the industry. Earlier this year, Yuhanna wrote a report on data fabrics that describes how they essentially combine a disparate collection of technologies to address key pain points in big data projects — such as data access, discovery, transformation, integration, security, governance, lineage, and orchestration — in a cohesive and self-service manner.
“The solution must be able to process and curate large amounts of structured, semi-structured, and unstructured data stored in big data platforms such as Apache Hadoop, MPP EDWs, NoSQL, Apache Spark, in-memory technologies, and other related commercial and open source platforms, including Apache projects,” Yuhanna wrote. “In addition, it must leverage big data technologies such as Spark, Hadoop, and in-memory as a compute and storage layer to assist the big data fabric with aggregation, transformation, and curation processing.”
For Hortonworks, DPS represents the start of a new line of services that gives customers the ability to manage data wherever it might be. The service will expose “pluggable interfaces” that let Hortonworks and its ecosystem of partners offer a variety of data management services.
“What we’re hearing from enterprises is, we’ve been growing a lot of tooling and a lot of key technology,” Murthy says. “But I think what’s missing in the market is almost a fabric, if you will, over all this.”