- SQL stream builder integrates with the Shared Data Experience (SDX) framework.
- Cloudera developed the SDX framework to enforce governance and security policies across CDF.
- Data analysts can now query streaming data without learning a programming language.
Eventador offers integration with the Cloudera DataFlow (CDF) streaming platform. CDF provides a common framework for processing streaming data using the open source Apache Flink, Kafka Streams, or Spark Structured Streaming engines. Previously, the only way to query that data was with programming tools based on Java or Scala. Now a data analyst can query CDF data without having to understand code syntax, said Dinesh Chandrasekhar, head of product marketing for Cloudera.
The SQL stream builder enables analysts to develop views of query results, which can be exposed to other applications via REST application programming interfaces (APIs). It also integrates with the Shared Data Experience (SDX) framework, which Cloudera developed to enforce governance and security policies across CDF.
Requirement of real-time data streams:
Despite the growing range of programming languages employed for data analysis, SQL remains the dominant language for querying data in the enterprise. As the need to query data streams in real time grows, organizations want to be able to extend SQL to, for example, identify anomalies and problems in processes that might indicate potential fraud, according to Chandrasekhar.
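To make the idea of an anomaly-spotting SQL query concrete, here is a minimal sketch in Python. It is not SQL Stream Builder syntax: it uses SQLite from the standard library as a stand-in and runs once over a static table, whereas a streaming SQL engine would evaluate a similar query continuously as events arrive. The `payments` table, the sample events, the 60-second window, and the threshold of three events are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical payment events: (account, amount, event time in seconds).
EVENTS = [
    ("acct-1", 20.0, 5),
    ("acct-1", 35.0, 20),
    ("acct-1", 15.0, 41),
    ("acct-1", 50.0, 58),
    ("acct-2", 12.0, 10),
    ("acct-2", 80.0, 70),
    ("acct-1", 25.0, 95),
]

# Group events into fixed-width time windows per account and keep only
# the windows with more events than the allowed maximum.
QUERY = """
SELECT account,
       (ts / :width) * :width AS window_start,
       COUNT(*)               AS n_events,
       SUM(amount)            AS total
FROM payments
GROUP BY account, ts / :width
HAVING COUNT(*) > :max_events
ORDER BY window_start, account
"""


def find_suspicious_windows(events, width=60, max_events=3):
    """Return (account, window_start, n_events, total) for busy windows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE payments (account TEXT, amount REAL, ts INTEGER)"
        )
        conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", events)
        params = {"width": width, "max_events": max_events}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_suspicious_windows(EVENTS):
        print(row)
```

With the sample data, only the first minute of activity on `acct-1` exceeds the threshold. The point of the sketch is that the detection logic lives entirely in the SQL text, which is the part an analyst would write.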
The growing need to query streaming data is driven by digital business transformation initiatives that process and analyze data in real time through platforms such as Spark and Kafka. Analysts may need to launch an ad hoc query against that data to resolve a pressing issue before the data is stored in a relational database. “Data has a shelf life,” says Chandrasekhar.
Rather than finding a developer to write the query in Java or another programming language, an analyst can launch a SQL query immediately on their own. Previously, that query might never have been launched because finding a developer to write the code would take too much time and effort. In general, more data than ever is being processed and analyzed in real time, both at the points where it is created and consumed and as it moves between applications. Cloudera is making the case that much of that data will land in a data warehouse based on an open-source distribution of Hadoop. In the last few years, however, SQL-compatible data lakes based on proprietary platforms managed by cloud service providers have been gaining traction at the expense of platform providers based on Hadoop.
More SQL-compatible tools:
Cloudera is adding one more SQL-compatible tool to its portfolio, making it easier to query data residing in Hadoop and in other frameworks such as Apache Spark, which are typically deployed on top of Hadoop. It is not yet clear to what degree these capabilities will enable Cloudera to counter the recent success of its competitors. Yet as a provider of a data warehouse platform based on open-source software, Cloudera appeals to IT organizations that have decided to avoid proprietary software as far as possible.
Regardless of the tool used to analyze it, more data than ever is being generated, and it is being generated faster. The degree to which humans will analyze data as it is generated in real time remains to be seen. Many of the digital processes that organizations try to analyze occur in milliseconds. That is too fast for a human being to keep up with without some form of help from artificial intelligence.
A lot of the data residing in streaming platforms can be queried. The challenge is knowing how to structure those SQL queries and when to launch them.