Course Overview

In this course, students will learn about data engineering as it applies to building batch and real-time analytical solutions with Azure data platform technologies. Students will begin by understanding the core compute and storage technologies used to build an analytical solution, and will learn how to interactively explore data stored in files in a data lake. They will learn the ingestion techniques available for loading data using the Apache Spark capability found in Azure Synapse Analytics or Azure Databricks, as well as Azure Data Factory or Azure Synapse pipelines, and the ways data can be transformed using those same technologies. Students will understand the importance of implementing security to ensure that data is protected at rest and in transit. Finally, they will learn how to build a real-time analytical solution for processing streaming data.

Key Learning Areas

  • Introduction to Azure Synapse Analytics
  • Explore Azure Databricks
  • Introduction to Azure Data Lake Storage
  • Get started with Azure Stream Analytics
  • Use Azure Synapse serverless SQL pool to query files in a data lake
  • Create a lake database in Azure Synapse Analytics
  • Secure data and manage users in Azure Synapse serverless SQL pools
  • Use Apache Spark in Azure Databricks
  • Use Delta Lake in Azure Databricks
  • Analyze data with Apache Spark in Azure Synapse Analytics
  • Integrate SQL and Apache Spark pools in Azure Synapse Analytics
  • Use data loading best practices in Azure Synapse Analytics
  • Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline
  • Integrate data with Azure Data Factory or Azure Synapse Pipeline
  • Perform code-free transformation at scale with Azure Data Factory or Azure Synapse Pipeline
  • Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse Pipeline
  • Plan hybrid transactional and analytical processing using Azure Synapse Analytics
  • Implement Azure Synapse Link with Azure Cosmos DB
  • Secure a data warehouse in Azure Synapse Analytics
  • Configure and manage secrets in Azure Key Vault
  • Implement compliance controls for sensitive data
  • Enable reliable messaging for Big Data applications using Azure Event Hubs

Course Outline

Introduction to Azure Synapse Analytics

  • Identify the business problems that Azure Synapse Analytics addresses
  • Describe core capabilities of Azure Synapse Analytics
  • Determine when to use Azure Synapse Analytics

Explore Azure Databricks

  • Provision an Azure Databricks workspace
  • Identify core workloads and personas for Azure Databricks
  • Describe key concepts of an Azure Databricks solution

Introduction to Azure Data Lake Storage

  • Decide when you should use Azure Data Lake Storage Gen2
  • Create an Azure storage account by using the Azure portal
  • Compare Azure Data Lake Storage Gen2 and Azure Blob storage
  • Explore the stages for processing big data by using Azure Data Lake Storage
  • List the supported open-source platforms
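
To give a flavor of this module, here is a minimal sketch of working with the hierarchical namespace of Azure Data Lake Storage Gen2 from Python. It assumes the azure-storage-file-datalake and azure-identity packages; the account, container, and path names are hypothetical.

```python
# A minimal sketch of ADLS Gen2 usage; account and paths are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",  # hypothetical account
    credential=DefaultAzureCredential(),
)

# Create a file system (container) and upload a file to a directory path --
# directories are first-class objects under the hierarchical namespace.
fs = service.create_file_system(file_system="raw")
file_client = fs.create_file("landing/sales.csv")
file_client.upload_data(b"id,amount\n1,42\n", overwrite=True)
```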

Get Started with Azure Stream Analytics

  • Understand data streams
  • Understand event processing
  • Get started with Azure Stream Analytics

Use Azure Synapse Serverless SQL Pool to Query Files in a Data Lake

  • Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
  • Query CSV, JSON, and Parquet files using a serverless SQL pool (see the sketch after this list)
  • Create external database objects in a serverless SQL pool
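
As a concrete taste of the OPENROWSET pattern this module teaches, here is a minimal sketch of querying Parquet files through the serverless endpoint from Python. It assumes pyodbc and the Microsoft ODBC Driver 18; the workspace, container, and path names are hypothetical.

```python
# A minimal sketch of querying lake files via the serverless SQL endpoint.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # hypothetical workspace
    "Database=master;Authentication=ActiveDirectoryInteractive;"
)

# OPENROWSET reads the files in place -- no tables or loading required.
sql = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales
"""
for row in conn.cursor().execute(sql):
    print(row)
```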

Create a Lake Database in Azure Synapse Analytics

  • Understand lake database concepts and components
  • Describe database templates in Azure Synapse Analytics
  • Create a lake database

Secure Data and Manage Users in Azure Synapse Serverless SQL Pools

  • Choose an authentication method in Azure Synapse serverless SQL pools
  • Manage users in Azure Synapse serverless SQL pools
  • Manage user permissions in Azure Synapse serverless SQL pools
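
A minimal sketch of Microsoft Entra user management in a serverless SQL pool, reusing the pyodbc pattern above but connected to a user database (these statements do not run in master); the principal and database names are hypothetical.

```python
# A minimal sketch of user and permission management; names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=salesdb;Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()
cursor.execute("CREATE USER [analyst@contoso.com] FROM EXTERNAL PROVIDER")
cursor.execute("ALTER ROLE db_datareader ADD MEMBER [analyst@contoso.com]")
conn.commit()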

Use Apache Spark in Azure Databricks

  • Describe key elements of the Apache Spark architecture
  • Create and configure a Spark cluster
  • Describe use cases for Spark
  • Use Spark to process and analyze data stored in files
  • Use Spark to visualize data
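
A minimal sketch of loading and aggregating a file with Spark in an Azure Databricks notebook. In a notebook `spark` is predefined; getOrCreate() keeps the sketch self-contained. The mount path is hypothetical.

```python
# A minimal sketch of processing a CSV with Spark on Databricks.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read a CSV from mounted lake storage, inferring column types.
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("/mnt/raw/sales.csv"))  # hypothetical mount path

# Aggregate, then print; in Databricks, display(summary) renders a chart.
summary = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))
summary.show()
```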

Use Delta Lake in Azure Databricks

  • Describe core features and capabilities of Delta Lake
  • Create and use Delta Lake tables in Azure Databricks
  • Create Spark catalog tables for Delta Lake data
  • Use Delta Lake tables for streaming data
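
Continuing from the sketch above (same `spark` session and `df` DataFrame), here is a minimal sketch of the Delta Lake patterns listed here. Delta Lake is preinstalled on Databricks clusters; the paths and table name are hypothetical.

```python
# Write the DataFrame as a Delta table at a lake path.
df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")

# Register a Spark catalog table over the Delta location.
spark.sql("CREATE TABLE IF NOT EXISTS sales USING DELTA LOCATION '/mnt/delta/sales'")

# The same table can serve as a streaming source.
stream_df = spark.readStream.format("delta").load("/mnt/delta/sales")
```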

Analyze Data with Apache Spark in Azure Synapse Analytics

  • Identify core features and capabilities of Apache Spark
  • Configure a Spark pool in Azure Synapse Analytics
  • Run code to load, analyze, and visualize data in a Spark notebook
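
A minimal sketch of exploring lake data in a Synapse Spark notebook, where `spark` is predefined; the abfss:// path is hypothetical.

```python
# A minimal sketch of loading Parquet from the lake in a Synapse Spark pool.
df = spark.read.load(
    "abfss://raw@mydatalake.dfs.core.windows.net/sales/*.parquet",  # hypothetical path
    format="parquet",
)
df.printSchema()
df.describe().show()  # quick summary statistics before deeper analysis
```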

Integrate SQL and Apache Spark Pools in Azure Synapse Analytics

  • Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
  • Understand the use cases for SQL and Spark pool integration
  • Authenticate in Azure Synapse Analytics
  • Transfer data between SQL and Spark pools in Azure Synapse Analytics
  • Authenticate between Spark and SQL pools in Azure Synapse Analytics
  • Integrate SQL and Spark pools in Azure Synapse Analytics
  • Externalize the use of Spark pools within an Azure Synapse workspace
  • Transfer data outside the Synapse workspace using SQL authentication
  • Transfer data outside the Synapse workspace using the PySpark connector
  • Transform data in Apache Spark and write back to a SQL pool in Azure Synapse Analytics
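
A minimal sketch of the round trip this module covers. It assumes the dedicated SQL pool connector that ships with Synapse Spark pools and its PySpark synapsesql method (an assumption worth verifying against current documentation); the pool, schema, and table names are hypothetical.

```python
# Read a dedicated SQL pool table into Spark (assumes the built-in Synapse
# connector; `spark` is the notebook session; names are hypothetical).
df = spark.read.synapsesql("SQLPool01.dbo.FactSales")

# Transform in Spark, then write the result back to the SQL pool.
clean = df.filter(df.Amount > 0)
clean.write.synapsesql("SQLPool01.dbo.FactSalesClean")
```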

Use Data Loading Best Practices in Azure Synapse Analytics

  • Understand data loading design goals
  • Explain loading methods into Azure Synapse Analytics
  • Manage source data files
  • Manage singleton updates
  • Set up dedicated data loading accounts
  • Manage concurrent access to Azure Synapse Analytics
  • Implement Workload Management
  • Simplify ingestion with the Copy Activity
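
One of the loading methods this module covers is the T-SQL COPY statement; here is a minimal sketch of issuing it from Python against a dedicated SQL pool. The endpoint, table, and storage path are hypothetical; the target table is assumed to exist, and storage authentication may require a CREDENTIAL clause depending on your setup.

```python
# A minimal sketch of bulk loading with COPY INTO via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"  # hypothetical dedicated pool endpoint
    "Database=SQLPool01;Authentication=ActiveDirectoryInteractive;"
)
conn.cursor().execute("""
    COPY INTO dbo.StageSales
    FROM 'https://mydatalake.blob.core.windows.net/raw/sales/*.parquet'
    WITH (FILE_TYPE = 'PARQUET')
""")
conn.commit()
```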

Petabyte-Scale Ingestion with Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • List the data factory ingestion methods
  • Describe data factory connectors
  • Exercise: Use the data factory copy activity
  • Exercise: Manage the self-hosted integration runtime
  • Exercise: Set up the Azure integration runtime
  • Understand data ingestion security considerations
  • Knowledge check

Integrate Data with Azure Data Factory or Azure Synapse Pipeline

  • Understand Azure Data Factory
  • Describe data integration patterns
  • Explain the data factory process
  • Understand Azure Data Factory components
  • Azure Data Factory security
  • Set up Azure Data Factory
  • Create linked services
  • Create datasets
  • Create data factory activities and pipelines
  • Manage integration runtime
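
A minimal sketch of starting a pipeline run programmatically with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are hypothetical.

```python
# A minimal sketch of triggering an ADF pipeline run from Python.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="00000000-0000-0000-0000-000000000000",  # hypothetical
)

run = client.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    pipeline_name="CopySalesPipeline",
    parameters={"loadDate": "2024-01-01"},
)
print(run.run_id)
```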

Perform Code-Free Transformation at Scale with Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • Explain Data Factory transformation methods
  • Describe Data Factory transformation types
  • Use Data Factory mapping data flow
  • Debug mapping data flow
  • Use Data Factory wrangling data flows
  • Use compute transformations within Data Factory
  • Integrate SQL Server Integration Services packages within Data Factory
  • Knowledge check

Orchestrate Data Movement and Transformation in Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • Understand data factory control flow
  • Work with data factory pipelines
  • Debug data factory pipelines
  • Add parameters to data factory components
  • Integrate a Notebook within Azure Synapse Pipelines
  • Execute data factory packages
  • Knowledge check
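
A minimal sketch of polling a run's status, which is the starting point for debugging pipelines; it reuses the hypothetical `client` and `run` from the sketch in the data integration module above.

```python
# Fetch the current state of a pipeline run.
status = client.pipeline_runs.get(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    run_id=run.run_id,
)
print(status.status)  # e.g. Queued, InProgress, Succeeded, Failed
```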

Plan Hybrid Transactional and Analytical Processing using Azure Synapse Analytics

  • Describe Hybrid Transactional / Analytical Processing patterns
  • Identify Azure Synapse Link services for HTAP

Implement Azure Synapse Link with Azure Cosmos DB

  • Configure an Azure Cosmos DB Account to use Azure Synapse Link
  • Create an analytical store enabled container
  • Create a linked service for Azure Cosmos DB
  • Analyze linked data using Spark
  • Analyze linked data using Synapse SQL
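
A minimal sketch of reading the Cosmos DB analytical store from a Synapse Spark notebook over Synapse Link; the linked service and container names are hypothetical, and `spark` is the notebook session.

```python
# A minimal sketch of querying the analytical store via Synapse Link.
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbOrders")  # hypothetical
      .option("spark.cosmos.container", "orders")               # hypothetical
      .load())

df.groupBy("status").count().show()  # analytics without touching the transactional store
```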

Secure a Data Warehouse in Azure Synapse Analytics

  • Understand network security options for Azure Synapse Analytics
  • Configure Conditional Access
  • Configure Authentication
  • Manage authorization through column and row level security
  • Manage sensitive data with Dynamic Data Masking (see the sketch after this list)
  • Implement encryption in Azure Synapse Analytics
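
A minimal sketch of masking a sensitive column with dynamic data masking, reusing the pyodbc connection pattern from the loading sketch above; the table and column names are hypothetical.

```python
# Apply a built-in masking function (default(), email(), partial(), ...)
# to a column in the data warehouse; names are hypothetical.
conn.cursor().execute("""
    ALTER TABLE dbo.DimCustomer
    ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()')
""")
conn.commit()
```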

Configure and Manage Secrets in Azure Key Vault

  • Explore proper usage of Azure Key Vault
  • Manage access to an Azure Key Vault
  • Explore certificate management with Azure Key Vault
  • Configure a Hardware Security Module Key-generation solution
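
A minimal sketch of storing and reading a secret with the azure-keyvault-secrets SDK; the vault URL and secret name are hypothetical.

```python
# A minimal sketch of secret management with Azure Key Vault.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # hypothetical vault
    credential=DefaultAzureCredential(),
)
client.set_secret("storage-account-key", "s3cr3t-value")
print(client.get_secret("storage-account-key").value)
```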

Implement Compliance Controls for Sensitive Data

  • Plan and implement data classification in Azure SQL Database (see the sketch after this list)
  • Understand and configure row-level security and dynamic data masking
  • Understand the usage of Microsoft Defender for SQL
  • Explore how Azure SQL Database Ledger works

Enable Reliable Messaging for Big Data Applications using Azure Event Hubs

  • Create an event hub using the Azure CLI
  • Configure applications to send or receive messages through the event hub
  • Evaluate performance of event hub using the Azure portal
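
A minimal sketch of sending events with the azure-eventhub SDK. The connection string and hub name are hypothetical; in practice the string comes from a shared access policy on the hub created with the CLI.

```python
# A minimal sketch of publishing a batch of events to an event hub.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://mynamespace.servicebus.windows.net/;"
             "SharedAccessKeyName=send;SharedAccessKey=<key>",  # placeholder
    eventhub_name="telemetry",
)
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"deviceId": 1, "reading": 21.5}'))
    producer.send_batch(batch)
```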

Who Benefits

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Prerequisites

Participants taking part in Data Engineering on Microsoft Azure DP-203 training should have knowledge of cloud computing and data concepts.

It is beneficial for participants to have completed the following courses:

  • AZ-900 Microsoft Azure Fundamentals
  • DP-900 Microsoft Azure Data Fundamentals