Cloudera Streaming Analytics 1.15 for Cloudera 7.3.1

3 minute read

The Data In Motion team is pleased to announce the release of Cloudera Streaming Analytics 1.15 for Cloudera 7.3.1. This release focuses on enhancing the user experience and adding new important features to the product, and includes improvements to SQL Stream Builder as well as a rebase to Apache Flink 1.20.1.

See What’s New.

Release Highlights

Cloudera platform support
- Cloudera Streaming Analytics 1.15 is supported on Cloudera 7.3.1.100 (Cumulative Hot Fix 1). Ensure that you review the 7.3.1.100 Release Notes and Support Matrix to understand which operating system, database, and JDK versions are supported for Cloudera Streaming Analytics as well.
Rebase to Apache Flink 1.20
- Streaming analytics deployments, including SQL Stream Builder, now support Apache Flink 1.20. For more information on what is included in the Apache Flink 1.20 version, see the Apache Flink 1.20 Release Announcement and Release Notes.
Support for batch-mode queries in Cloudera SQL Stream Builder
- Users can select and use “batch” as the runtime mode for production execution mode jobs that are running in an isolated Flink cluster. For more information, see Executing SQL jobs in production mode.
OpenTelemetry Metrics Reporter (Technical Preview)
- Cloudera Streaming Analytics now includes, in Technical Preview, the OpenTelemetry Metrics Reporter to aggregate metrics to a third-party tool using open standards. To learn more about using the OpenTelemetry reporter with Flink, see the documentation for Flink and Cloudera SQL Stream Builder, as well as the Apache Flink Documentation.
Python for table transformations and webhook connector
- Support in Python UDFs for table transformations and the webhook connector have been added to Cloudera SQL Stream Builder and Flink. To learn more, see the Webhook Connector documentation

Please see the Release Notes for the complete list of fixes and improvements.

Getting to the New Release

To upgrade to Cloudera Streaming Analytics 1.15.0 on Cloudera on premises, check out this upgrade guide. If you are using Cloudera on cloud, refer to the Upgrade advisor documentation.

Use Cases

Event-Driven Applications: Stateful applications that ingest events from one or more event streams and react to incoming events by triggering computations, state updates, or external actions.

Apache Flink excels in handling the concept of time and state for these applications and can scale to manage very large data volumes (up to several terabytes) with exactly once consistency guarantees. Moreover, Apache Flink’s support for event-time, highly customizable window logic, and fine-grained control of time as provided by the ProcessFunction enable the implementation of advanced business logic. Moreover, Apache Flink features a library for Complex Event Processing (CEP) to detect patterns in data streams.

However, Apache Flink’s outstanding feature for event-driven applications is its support for savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or adapt its scale, or multiple versions of an application can be started for A/B testing.

Examples:
- Fraud detection
- Anomaly detection
- Rule-based alerting
- Business process monitoring
- Web application (social network)
Data Analytics Applications: With a sophisticated stream processing engine, analytics can also be performed in real-time. Streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are written to an external database or maintained as internal state. A dashboard application can read the latest results from the external database or directly query the internal state of the application.

Apache Flink supports streaming as well as batch analytical applications.

Examples:
- Quality monitoring of telco networks
- Analysis of product updates & experiment evaluation in mobile applications
- Ad-hoc analysis of live data in consumer technology
- Large-scale graph analysis
Data Pipeline Applications: Streaming data pipelines serve a similar purpose as Extract-transform-load (ETL) jobs. They transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they can read records from sources that continuously produce data and move it with low latency to their destination.

Examples:
- Real-time search index building in e-commerce
- Continuous ETL in e-commerce