Azure Data Factory (ADF)
Azure Data Factory is a cloud-based data integration service provided by Microsoft Azure. It allows you to create, schedule, and manage data pipelines that move and transform data from a wide range of sources to destinations.
ADF does not store data itself. Instead, it lets you create data-driven workflows that orchestrate the movement of data between supported data stores, and then process that data using compute services in other regions or in an on-premises environment. ADF also provides both programmatic and UI-based mechanisms for monitoring and managing these workflows.
Data Compression: During the Copy activity, you can compress data and write it in compressed form to the target data store. This feature helps optimize bandwidth usage when copying data.
Extensive Connectivity Support for Different Data Sources: Azure Data Factory provides a broad library of connectors, which is useful when you need to pull data from, or write data to, many different kinds of data stores.
Custom Event Triggers: Azure Data Factory lets you automate data processing using custom event triggers, so a pipeline runs automatically when a specified event occurs, such as a file arriving in Blob Storage (see the trigger sketch after this list).
Data Preview and Validation: The Copy activity provides tools for previewing and validating data, helping you ensure that data is read from the source and written to the target data store correctly.
Customizable Data Flows: Azure Data Factory lets you create customizable data flows, so you can add your own actions or steps for data processing.
Integrated Security: Azure Data Factory offers integrated security features, such as Azure Active Directory integration and role-based access control over data flows, which tighten security during data processing and protect your data.
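As a concrete illustration of custom event triggers, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. It is a sketch under assumptions rather than production code: the resource group, factory, storage account, container, and the pipeline name CopyPipeline are all hypothetical placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger, PipelineReference, TriggerPipelineReference, TriggerResource,
)

# Hypothetical placeholders -- substitute your own resource names.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fire whenever a new blob lands in the (hypothetical) "landing" container,
# and run the (hypothetical) pipeline "CopyPipeline" in response.
trigger = BlobEventsTrigger(
    events=["Microsoft.Storage.BlobCreated"],
    blob_path_begins_with="/landing/blobs/",
    scope=("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
           "/providers/Microsoft.Storage/storageAccounts/<storage-account>"),
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="CopyPipeline"))],
)
adf_client.triggers.create_or_update(
    "<resource-group>", "<data-factory>", "NewFileTrigger",
    TriggerResource(properties=trigger))
```

Note that after being published, a trigger must also be started (triggers.begin_start in recent SDK versions) before it begins firing.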
The Data Factory service lets you create data pipelines that move and transform data and then run those pipelines on a specified schedule (hourly, daily, weekly, and so on). This means the data consumed and produced by workflows is time-sliced, and you can specify the pipeline mode as scheduled (for example, once a day) or one-time.
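To make the scheduling model concrete, the following hedged sketch (same Python SDK; all resource and pipeline names are hypothetical placeholders) attaches a daily schedule trigger to a pipeline:

```python
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Run the (hypothetical) DailyCopyPipeline once a day, starting tomorrow.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime.utcnow() + timedelta(days=1), time_zone="UTC")
trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="DailyCopyPipeline"))])
adf_client.triggers.create_or_update(
    "<resource-group>", "<data-factory>", "DailySchedule",
    TriggerResource(properties=trigger))
```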
Azure Data Factory pipelines (data-driven workflows) perform three broad steps.
Step 1: Connect and Collect
Connect to all the required sources of data and processing, such as SaaS services, file shares, FTP, and web services. Then move the data to a centralized location for subsequent processing: the Copy activity in a data pipeline moves data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
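As an illustration of this step, here is a minimal Copy activity sketch with the azure-mgmt-datafactory Python SDK, modeled on the pattern in Microsoft's quickstart. The datasets InputBlob and OutputBlob are hypothetical and assumed to be defined already.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Copy from a source blob dataset to a centralized "staging" blob dataset.
# "InputBlob" and "OutputBlob" are hypothetical datasets defined beforehand.
copy = CopyActivity(
    name="CopyToStaging",
    inputs=[DatasetReference(reference_name="InputBlob")],
    outputs=[DatasetReference(reference_name="OutputBlob")],
    source=BlobSource(),
    sink=BlobSink(),
)
adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory>", "IngestPipeline",
    PipelineResource(activities=[copy]))
```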
Step 2: Transform and Enrich
Once data is present in a centralized data store in the cloud, it is transformed using compute services such as HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
Step 3: Publish
Deliver transformed data from the cloud to on-premises sources such as SQL Server, or keep it in your cloud storage for consumption by BI and analytics tools and other applications.
Azure Data Factory is composed of the following key components:
- Pipelines
- Activities
- Datasets
- Linked services
- Data Flows
- Integration Runtimes
These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
Pipelines
A data factory might have one or more pipelines. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task. For example, a pipeline can contain a group of activities that ingests data from an Azure blob, and then runs a Hive query on an HDInsight cluster to partition the data.
The benefit is that the pipeline lets you manage the activities as a set instead of managing each one individually. The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel, as the sketch below shows.
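Here is a hedged sketch of that chaining in the Python SDK, mirroring the blob-ingest-then-Hive example above; all activity, dataset, and linked-service names are hypothetical:

```python
from azure.mgmt.datafactory.models import (
    ActivityDependency, BlobSink, BlobSource, CopyActivity, DatasetReference,
    HDInsightHiveActivity, LinkedServiceReference, PipelineResource,
)

# Ingest raw data from a blob (hypothetical datasets "RawBlob"/"StagedBlob").
ingest = CopyActivity(
    name="IngestFromBlob",
    inputs=[DatasetReference(reference_name="RawBlob")],
    outputs=[DatasetReference(reference_name="StagedBlob")],
    source=BlobSource(), sink=BlobSink())

# Runs only after IngestFromBlob succeeds (sequential chaining);
# omit depends_on to let activities run independently in parallel.
partition = HDInsightHiveActivity(
    name="PartitionWithHive",
    linked_service_name=LinkedServiceReference(reference_name="HDInsightCluster"),
    script_path="scripts/partition.hql",
    script_linked_service=LinkedServiceReference(reference_name="ScriptStorage"),
    depends_on=[ActivityDependency(
        activity="IngestFromBlob", dependency_conditions=["Succeeded"])])

pipeline = PipelineResource(activities=[ingest, partition])
# Publish with adf_client.pipelines.create_or_update(...) as in the earlier sketch.
```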
Activities
An activity represents a processing step in a pipeline. For example, you might use a copy activity to copy data from one data store to another. Similarly, you might use a Hive activity, which runs a Hive query on an Azure HDInsight cluster, to transform or analyze your data. Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.
Datasets
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
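A minimal sketch of defining a dataset with the Python SDK: the linked service name and paths are hypothetical, and the optional compression setting (which ties back to the Data Compression feature above) assumes a recent azure-mgmt-datafactory release where DatasetCompression accepts a type.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, DatasetCompression, DatasetResource, LinkedServiceReference,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A dataset only points at the data; the connection itself lives in the
# linked service it references (here, the hypothetical "AzureStorageLS").
blob_ds = AzureBlobDataset(
    linked_service_name=LinkedServiceReference(reference_name="AzureStorageLS"),
    folder_path="staging/sales",
    file_name="daily.csv",
    compression=DatasetCompression(type="GZip"),  # optional: gzip the copied data
)
adf_client.datasets.create_or_update(
    "<resource-group>", "<data-factory>", "StagedSalesBlob",
    DatasetResource(properties=blob_ds))
```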
Linked services
Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources. Think of it this way: a linked service defines the connection to the data source, and a dataset represents the structure of the data. For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account, while an Azure Blob dataset specifies the blob container and the folder that contains the data.
Linked services are used for two purposes in Data Factory (see the sketch after this list):
- To represent a data store that includes, but isn't limited to, a SQL Server database, Oracle database, file share, or Azure Blob storage account.
- To represent a compute resource that can host the execution of an activity. For example, the HDInsightHive activity runs on an HDInsight Hadoop cluster.
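Here is the sketch referenced above: a minimal example that registers an Azure Storage linked service with the Python SDK (the connection string and names are placeholders), which the earlier dataset sketch can then reference by name.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService, LinkedServiceResource, SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The linked service carries the connection information (like a connection
# string); datasets built on top of it only describe the data's structure.
ls = AzureStorageLinkedService(connection_string=SecureString(
    value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"))
adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory>", "AzureStorageLS",
    LinkedServiceResource(properties=ls))
```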
Data flows
Create and manage graphs of data transformation logic that you can use to transform data of any size. You can build up a reusable library of data transformation routines and execute those processes in a scaled-out manner from your ADF pipelines. Data Factory executes your logic on a Spark cluster that spins up and spins down when you need it, so you never have to manage or maintain clusters.
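Data flows are typically authored in ADF's visual designer, but a pipeline invokes one through an Execute Data Flow activity. A hedged sketch, assuming a mapping data flow named TransformSales already exists in the factory:

```python
from azure.mgmt.datafactory.models import (
    DataFlowReference, ExecuteDataFlowActivity, PipelineResource,
)

# Invoke an existing (hypothetical) mapping data flow named "TransformSales".
# ADF provisions the Spark cluster that runs it and tears it down afterwards.
run_flow = ExecuteDataFlowActivity(
    name="RunTransformSales",
    data_flow=DataFlowReference(reference_name="TransformSales"))
pipeline = PipelineResource(activities=[run_flow])
# Publish with adf_client.pipelines.create_or_update(...) as in the earlier sketches.
```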
Integration Runtime
In Data Factory, an activity defines the action to be performed, and a linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services: it's referenced by the linked service or activity, and it provides the compute environment where the activity either runs or gets dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
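To make that relationship concrete, the hedged sketch below registers a self-hosted integration runtime and points an on-premises SQL Server linked service at it through connect_via. All names and the connection string are hypothetical, and the runtime node itself is installed on an on-premises machine and joined with a key obtained separately.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeReference, IntegrationRuntimeResource,
    LinkedServiceResource, SelfHostedIntegrationRuntime, SqlServerLinkedService,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "<resource-group>", "<data-factory>"

# Register a self-hosted integration runtime for on-premises connectivity.
adf_client.integration_runtimes.create_or_update(
    rg, df, "OnPremIR",
    IntegrationRuntimeResource(properties=SelfHostedIntegrationRuntime()))

# A linked service reached through that runtime references it via connect_via.
ls = SqlServerLinkedService(
    connection_string="Server=<host>;Database=<db>;Integrated Security=True",
    connect_via=IntegrationRuntimeReference(reference_name="OnPremIR"))
adf_client.linked_services.create_or_update(
    rg, df, "OnPremSqlServer", LinkedServiceResource(properties=ls))
```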
Take the first step towards data-led growth by partnering with MSA Infotech. Whether you seek tailored solutions or expert consultation, we are here to help you harness the power of data for your business. Contact us today and let’s embark on this transformative data adventure together. Get a free consultation today!
We utilize data to transform ourselves, our clients, and the world.
Partnerships with leading data platforms and certified talent