Data Flow

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage. Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data. This enables rapid application delivery because developers can focus on app development, not infrastructure management.

OCI Data Flow demo (1:30)

Ronin and Oracle improve cancer care and deliver on an AI bill of rights

Discover how Ronin leveraged OCI Data Flow with Apache Spark to build a future where every clinical decision is rooted in data, personalized for a given individual, and rendered efficiently with confidence.

Get the tech details

Integrating and Preparing Data for Data Science

Watch the Oracle Developer Live event and see how to use Data Integration and Data Flow to optimize how data is used.

Watch the video (38:31)

Try an Oracle Cloud Data Flow workshop

Learn how Data Flow makes running Spark applications easy, secure, and simple.

Access workshop

Data Flow features

Managed infrastructure

OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs are complete. Storage and security are also managed, which means less work is required for creating and managing Spark applications for big data analysis.

Easier cluster management

With OCI Data Flow, there are no clusters to install, patch, or upgrade, which saves time and operational costs for projects.

Simplified capacity planning

OCI Data Flow runs each Spark job in private dedicated resources, eliminating the need for upfront capacity planning.

Lower costs

With OCI Data Flow, IT only needs to pay for the infrastructure resources that Spark jobs use while they are running.

Advanced streaming support capabilities

Spark Streaming with zero management, automatic fault tolerance, and automatic patching.

Enable continuous processing

With Spark Streaming support, you gain capabilities for continuous retrieval and continuous availability of processed data. OCI Data Flow handles the heavy lifting of stream processing with Spark, along with the ability to perform machine learning on streaming data using MLlib. OCI Data Flow supports OCI Object Storage and any Kafka-compatible streaming source, including OCI Streaming, as data sources and sinks.
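As a rough illustration of the paragraph above, the following sketch shows how a Spark Structured Streaming job might read from a Kafka-compatible source and write to Object Storage. The broker address, topic name, and function names are placeholders chosen for this example, not values from Oracle. `start_stream` assumes an existing `SparkSession` with the Kafka connector available, and the authentication options that a real Kafka-compatible service would need are omitted.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Return reader options for Spark Structured Streaming's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_stream(spark, options, output_path, checkpoint_path):
    """Start a streaming query that copies Kafka records to Parquet files.

    `spark` is an existing SparkSession. `output_path` and `checkpoint_path`
    are storage locations supplied by the caller (assumed, for example, to be
    Object Storage paths).
    """
    records = (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
    return (
        records.writeStream.format("parquet")
        .option("path", output_path)
        # The checkpoint lets a restarted query resume where it left off.
        .option("checkpointLocation", checkpoint_path)
        .start()
    )


# Placeholder values for illustration only.
options = kafka_source_options("broker.example.com:9092", "events")
```

Keeping the source options in a plain dictionary makes them easy to inspect or extend before a Spark session exists.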

Automatic fault tolerance

Spark handles late-arriving data due to outages and can catch up backlogged data over time with watermarking—a Spark feature that maintains, stores, and then aggregates late data—without needing to manually restart the job. OCI Data Flow automatically restarts your application when possible and your application can simply continue from the last checkpoint.
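To make the watermarking idea more concrete, here is a simplified, pure-Python model of it. It is not Spark's implementation: Spark advances the watermark once per micro-batch, whereas this sketch updates it after every event. The function name and the sample timestamps are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_with_watermark(events, window, allowed_lateness):
    """Count events per tumbling window, dropping events behind the watermark.

    events: event-time datetimes in arrival order.
    The watermark is the latest event time seen so far minus allowed_lateness.
    An event is dropped when its window ended before the watermark.
    """
    epoch = datetime(1970, 1, 1)
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time in events:
        # Align the event to the start of its tumbling window.
        window_start = epoch + ((event_time - epoch) // window) * window
        if max_event_time is not None:
            watermark = max_event_time - allowed_lateness
            if window_start + window < watermark:
                dropped.append(event_time)
                continue
        counts[window_start] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return dict(counts), dropped


def at(hour, minute):
    return datetime(2024, 1, 1, hour, minute)


# 12:03 arrives late but within the allowed lateness; 12:02 arrives too late.
arrivals = [at(12, 1), at(12, 4), at(12, 15), at(12, 3), at(12, 31), at(12, 2)]
counts, dropped = count_with_watermark(
    arrivals, window=timedelta(minutes=10), allowed_lateness=timedelta(minutes=10)
)
```

In this run the 12:00 window ends with three events, including the late 12:03 record, while the 12:02 record is discarded because its window had already fallen behind the watermark.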

Cloud native authentication

OCI Data Flow streaming applications can use cloud native authentication via resource principals so applications can run longer than 24 hours.

Cloud native security and governance

Leverage unmatched security from Oracle Cloud Infrastructure. Authentication, isolation, and all other critical points are addressed. Protect business-critical data with the highest levels of security.

Granular security

OCI Data Flow makes native use of Oracle Cloud's Identity and Access Management system for controlled data and access, so data stays secure.

Managed resources

Set quotas and limits to manage resources available to OCI Data Flow and control costs.

Try a free hands-on lab

Simplified operations

OCI Data Flow simplifies common operational tasks like log management and access to operational UIs, freeing up developer time to focus on building applications.

Increased visibility

OCI Data Flow makes it easy to see what Spark users are doing by aggregating operational information into a single, searchable UI.

Simple debugging and diagnostics

Tracking down logs and tools to troubleshoot a Spark job can take hours—but not with a consolidated view of log output, Spark history server, and more.

Avoid future costs

Sort, search, and filter to investigate historic applications to better address expensive jobs and avoid unnecessary expenditures.

Manage runaway Spark jobs

Administrators can easily discover and stop live Spark jobs that are running for too long or consuming too many resources and driving up costs.
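The kind of check an administrator might apply can be sketched in a few lines of Python. The record fields used here (`id`, `state`, `started_at`) and the sample values are hypothetical stand-ins for whatever run metadata is available. They are not the service's actual schema.

```python
from datetime import datetime, timedelta


def find_runaway_runs(runs, now, max_duration):
    """Return IDs of in-progress runs older than max_duration, longest first."""
    overdue = [
        run for run in runs
        if run["state"] == "IN_PROGRESS" and now - run["started_at"] > max_duration
    ]
    # The earliest start time corresponds to the longest-running job.
    overdue.sort(key=lambda run: run["started_at"])
    return [run["id"] for run in overdue]


# Hypothetical run records for illustration only.
now = datetime(2024, 1, 1, 12, 0)
runs = [
    {"id": "run-a", "state": "IN_PROGRESS", "started_at": datetime(2024, 1, 1, 6, 0)},
    {"id": "run-b", "state": "IN_PROGRESS", "started_at": datetime(2024, 1, 1, 11, 30)},
    {"id": "run-c", "state": "SUCCEEDED", "started_at": datetime(2024, 1, 1, 1, 0)},
    {"id": "run-d", "state": "IN_PROGRESS", "started_at": datetime(2024, 1, 1, 3, 0)},
]
runaway = find_runaway_runs(runs, now, max_duration=timedelta(hours=4))
```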

Simplified development

Big data ecosystems require many moving parts and integrations—but OCI Data Flow is compatible with existing Spark investments and big data services, making it easy to manage the service and deliver its results where they’re needed.

Compatible with existing applications

Migrate existing Spark applications from Hadoop or other big data services.

Secure output management

Automatically and securely capture and store Spark jobs' output, and then access it through the UI or REST APIs to make analytics available.

Control with REST APIs

All aspects of OCI Data Flow can be managed using simple REST APIs, from application creation to execution to accessing results of Spark jobs.
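As a hedged sketch of what driving the service over REST could look like, the snippet below assembles (but does not send) a request to start a run of an existing application. The endpoint pattern, the `20200129` API version, and the body field names are assumptions that should be checked against the current API reference. The OCIDs are placeholders, and the request signing that OCI requires is left out.

```python
import json


def build_create_run_request(region, compartment_id, application_id,
                             display_name, arguments=None):
    """Assemble an unsigned HTTP request description for starting a run.

    The URL pattern and field names are assumptions for illustration.
    """
    url = f"https://dataflow.{region}.oci.oraclecloud.com/20200129/runs"
    body = {
        "compartmentId": compartment_id,
        "applicationId": application_id,
        "displayName": display_name,
    }
    if arguments:
        body["arguments"] = list(arguments)
    return {
        "method": "POST",
        "url": url,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


# Placeholder identifiers for illustration only.
request = build_create_run_request(
    region="us-ashburn-1",
    compartment_id="ocid1.compartment.oc1..example",
    application_id="ocid1.dataflowapplication.oc1..example",
    display_name="nightly-etl",
    arguments=["--date", "2024-01-01"],
)
```

In practice an SDK or CLI would usually handle signing and endpoint selection. The point here is only that a run is described by a small JSON document.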

Oracle Cloud Infrastructure Data Flow Reduces Cost by 75%

With Oracle Cloud Infrastructure Data Flow, we met client SLAs by reducing the time needed for data processing by 75% and by reducing the cost by more than 300%.
Arun Nimmala, Delivery Director Global Services Integration and Analytics Architecture, Oracle

OCI Data Flow key benefits

Accelerate workflows with NVIDIA RAPIDS

OCI Data Flow supports the NVIDIA RAPIDS Accelerator for Apache Spark, which helps accelerate data science, machine learning, and AI workflows.

ETL offload

Data Flow manages ETL offload by overseeing Spark jobs, optimizing cost, and freeing up capacity.

Active archive

Data Flow's output management capabilities optimize the ability to query data using Spark.

Unpredictable workloads

Resources can be automatically shifted to handle unpredictable jobs and lower costs. A dashboard provides a view of usage and budget for future planning purposes.

Machine learning model training

Spark and machine learning developers can use Spark’s machine learning library and run models more efficiently using Data Flow.

Spark Streaming

Gain Spark Streaming support with zero management, automatic patching, and automatic fault tolerance with end-to-end, exactly-once guarantees.

Read about some of the above use cases

Resources

Related cloud products

Oracle Cloud Infrastructure Data Science

End-to-end machine learning

See product details

Oracle Cloud Infrastructure Data Catalog

Self-service data discovery

See product details

Oracle Autonomous Data Warehouse

Cloud data warehouse service

See product details

Oracle Cloud Infrastructure Object Storage

Build your data lake

See product details

Get started with OCI Data Flow

Sign up for a free trial

Get Training

Learn about Oracle Cloud Infrastructure Data Flow.

Watch the video (10:25)

Hands-on lab

Experience the live product hands-on for free.

Start the lab

Contact sales

Talk to a team member about Oracle Cloud Infrastructure Data Flow.

Get in touch

Resources

Free tutorials

Free workshop

Get started with Spark Streaming

Data Flow samples
