Course Overview

In this course, students will learn about data engineering as it applies to building batch and real-time analytical solutions with Azure data platform technologies. Students will begin by understanding the core compute and storage technologies used to build an analytical solution, and will learn how to interactively explore data stored in files in a data lake. They will learn the ingestion techniques available for loading data using the Apache Spark capability found in Azure Synapse Analytics or Azure Databricks, as well as Azure Data Factory or Azure Synapse pipelines, and the ways data can be transformed using those same technologies. Students will understand the importance of implementing security to ensure that data is protected at rest and in transit. Finally, they will learn how to build a real-time analytical solution for processing streaming data.

Key Learning Areas

  • Introduction to Azure Synapse Analytics
  • Explore Azure Databricks
  • Introduction to Azure Data Lake Storage
  • Get started with Azure Stream Analytics
  • Use Azure Synapse serverless SQL pool to query files in a data lake
  • Create a lake database in Azure Synapse Analytics
  • Secure data and manage users in Azure Synapse serverless SQL pools
  • Use Apache Spark in Azure Databricks
  • Use Delta Lake in Azure Databricks
  • Analyze data with Apache Spark in Azure Synapse Analytics
  • Integrate SQL and Apache Spark pools in Azure Synapse Analytics
  • Use data loading best practices in Azure Synapse Analytics
  • Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline
  • Integrate data with Azure Data Factory or Azure Synapse Pipeline
  • Perform code-free transformation at scale with Azure Data Factory or Azure Synapse Pipeline
  • Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse Pipeline
  • Plan hybrid transactional and analytical processing using Azure Synapse Analytics
  • Implement Azure Synapse Link with Azure Cosmos DB
  • Secure a data warehouse in Azure Synapse Analytics
  • Configure and manage secrets in Azure Key Vault
  • Implement compliance controls for sensitive data
  • Enable reliable messaging for Big Data applications using Azure Event Hubs

Course Outline

Introduction to Azure Synapse Analytics

  • Identify the business problems that Azure Synapse Analytics addresses
  • Describe core capabilities of Azure Synapse Analytics
  • Determine when to use Azure Synapse Analytics

Explore Azure Databricks

  • Provision an Azure Databricks workspace
  • Identify core workloads and personas for Azure Databricks
  • Describe key concepts of an Azure Databricks solution

Introduction to Azure Data Lake Storage

  • Decide when you should use Azure Data Lake Storage Gen2
  • Create an Azure storage account by using the Azure portal
  • Compare Azure Data Lake Storage Gen2 and Azure Blob storage
  • Explore the stages for processing big data by using Azure Data Lake Storage
  • List the supported open-source platforms
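
To give a flavor of this module, here is a minimal sketch of working with the hierarchical namespace of Azure Data Lake Storage Gen2 from Python. It assumes the azure-storage-file-datalake and azure-identity packages; the account, container, and path names are hypothetical.

```python
# A minimal sketch of ADLS Gen2 usage; account and paths are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",  # hypothetical account
    credential=DefaultAzureCredential(),
)

# Create a file system (container) and upload a file to a directory path --
# directories are first-class objects under the hierarchical namespace.
fs = service.create_file_system(file_system="raw")
file_client = fs.create_file("landing/sales.csv")
file_client.upload_data(b"id,amount\n1,42\n", overwrite=True)
```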

Get Started with Azure Stream Analytics

  • Understand data streams
  • Understand event processing
  • Get started with Azure Stream Analytics

Use Azure Synapse Serverless SQL Pool to Query Files in a Data Lake

  • Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
  • Query CSV, JSON, and Parquet files using a serverless SQL pool (see the sketch after this list)
  • Create external database objects in a serverless SQL pool
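
As a concrete taste of the OPENROWSET pattern this module teaches, here is a minimal sketch of querying Parquet files through the serverless endpoint from Python. It assumes pyodbc and the Microsoft ODBC Driver 18; the workspace, container, and path names are hypothetical.

```python
# A minimal sketch of querying lake files via the serverless SQL endpoint.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # hypothetical workspace
    "Database=master;Authentication=ActiveDirectoryInteractive;"
)

# OPENROWSET reads the files in place -- no tables or loading required.
sql = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales
"""
for row in conn.cursor().execute(sql):
    print(row)
```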

Create a Lake Database in Azure Synapse Analytics

  • Understand lake database concepts and components
  • Describe database templates in Azure Synapse Analytics
  • Create a lake database

Secure Data and Manage Users in Azure Synapse Serverless SQL Pools

  • Choose an authentication method in Azure Synapse serverless SQL pools
  • Manage users in Azure Synapse serverless SQL pools
  • Manage user permissions in Azure Synapse serverless SQL pools
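
A minimal sketch of Microsoft Entra user management in a serverless SQL pool, reusing the pyodbc pattern above but connected to a user database (these statements do not run in master); the principal and database names are hypothetical.

```python
# A minimal sketch of user and permission management; names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=salesdb;Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()
cursor.execute("CREATE USER [analyst@contoso.com] FROM EXTERNAL PROVIDER")
cursor.execute("ALTER ROLE db_datareader ADD MEMBER [analyst@contoso.com]")
conn.commit()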

Use Apache Spark in Azure Databricks

  • Describe key elements of the Apache Spark architecture
  • Create and configure a Spark cluster
  • Describe use cases for Spark
  • Use Spark to process and analyze data stored in files
  • Use Spark to visualize data
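
A minimal sketch of loading and aggregating a file with Spark in an Azure Databricks notebook. In a notebook `spark` is predefined; getOrCreate() keeps the sketch self-contained. The mount path is hypothetical.

```python
# A minimal sketch of processing a CSV with Spark on Databricks.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read a CSV from mounted lake storage, inferring column types.
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("/mnt/raw/sales.csv"))  # hypothetical mount path

# Aggregate, then print; in Databricks, display(summary) renders a chart.
summary = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))
summary.show()
```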

Use Delta Lake in Azure Databricks

  • Describe core features and capabilities of Delta Lake
  • Create and use Delta Lake tables in Azure Databricks
  • Create Spark catalog tables for Delta Lake data
  • Use Delta Lake tables for streaming data
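
Continuing from the sketch above (same `spark` session and `df` DataFrame), here is a minimal sketch of the Delta Lake patterns listed here. Delta Lake is preinstalled on Databricks clusters; the paths and table name are hypothetical.

```python
# Write the DataFrame as a Delta table at a lake path.
df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")

# Register a Spark catalog table over the Delta location.
spark.sql("CREATE TABLE IF NOT EXISTS sales USING DELTA LOCATION '/mnt/delta/sales'")

# The same table can serve as a streaming source.
stream_df = spark.readStream.format("delta").load("/mnt/delta/sales")
```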

Analyze Data with Apache Spark in Azure Synapse Analytics

  • Identify core features and capabilities of Apache Spark
  • Configure a Spark pool in Azure Synapse Analytics
  • Run code to load, analyze, and visualize data in a Spark notebook
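
A minimal sketch of exploring lake data in a Synapse Spark notebook, where `spark` is predefined; the abfss:// path is hypothetical.

```python
# A minimal sketch of loading Parquet from the lake in a Synapse Spark pool.
df = spark.read.load(
    "abfss://raw@mydatalake.dfs.core.windows.net/sales/*.parquet",  # hypothetical path
    format="parquet",
)
df.printSchema()
df.describe().show()  # quick summary statistics before deeper analysis
```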

Integrate SQL and Apache Spark Pools in Azure Synapse Analytics

  • Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
  • Understand the use cases for SQL and Spark pool integration
  • Authenticate in Azure Synapse Analytics
  • Transfer data between SQL and Spark pools in Azure Synapse Analytics
  • Authenticate between Spark and SQL pools in Azure Synapse Analytics
  • Integrate SQL and Spark pools in Azure Synapse Analytics
  • Externalize the use of Spark pools within an Azure Synapse workspace
  • Transfer data outside the Synapse workspace using SQL authentication
  • Transfer data outside the Synapse workspace using the PySpark connector
  • Transform data in Apache Spark and write back to a SQL pool in Azure Synapse Analytics
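
A minimal sketch of the round trip this module covers. It assumes the dedicated SQL pool connector that ships with Synapse Spark pools and its PySpark synapsesql method (an assumption worth verifying against current documentation); the pool, schema, and table names are hypothetical.

```python
# Read a dedicated SQL pool table into Spark (assumes the built-in Synapse
# connector; `spark` is the notebook session; names are hypothetical).
df = spark.read.synapsesql("SQLPool01.dbo.FactSales")

# Transform in Spark, then write the result back to the SQL pool.
clean = df.filter(df.Amount > 0)
clean.write.synapsesql("SQLPool01.dbo.FactSalesClean")
```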

Use Data Loading Best Practices in Azure Synapse Analytics

  • Understand data loading design goals
  • Explain loading methods into Azure Synapse Analytics
  • Manage source data files
  • Manage singleton updates
  • Set up dedicated data loading accounts
  • Manage concurrent access to Azure Synapse Analytics
  • Implement Workload Management
  • Simplify ingestion with the Copy Activity
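
One of the loading methods this module covers is the T-SQL COPY statement; here is a minimal sketch of issuing it from Python against a dedicated SQL pool. The endpoint, table, and storage path are hypothetical; the target table is assumed to exist, and storage authentication may require a CREDENTIAL clause depending on your setup.

```python
# A minimal sketch of bulk loading with COPY INTO via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"  # hypothetical dedicated pool endpoint
    "Database=SQLPool01;Authentication=ActiveDirectoryInteractive;"
)
conn.cursor().execute("""
    COPY INTO dbo.StageSales
    FROM 'https://mydatalake.blob.core.windows.net/raw/sales/*.parquet'
    WITH (FILE_TYPE = 'PARQUET')
""")
conn.commit()
```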

Petabyte-Scale Ingestion with Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • List the data factory ingestion methods
  • Describe data factory connectors
  • Exercise: Use the data factory copy activity
  • Exercise: Manage the self-hosted integration runtime
  • Exercise: Set up the Azure integration runtime
  • Understand data ingestion security considerations
  • Knowledge check

Integrate Data with Azure Data Factory or Azure Synapse Pipeline

  • Understand Azure Data Factory
  • Describe data integration patterns
  • Explain the data factory process
  • Understand Azure Data Factory components
  • Azure Data Factory security
  • Set up Azure Data Factory
  • Create linked services
  • Create datasets
  • Create data factory activities and pipelines
  • Manage integration runtime
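
A minimal sketch of starting a pipeline run programmatically with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are hypothetical.

```python
# A minimal sketch of triggering an ADF pipeline run from Python.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="00000000-0000-0000-0000-000000000000",  # hypothetical
)

run = client.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    pipeline_name="CopySalesPipeline",
    parameters={"loadDate": "2024-01-01"},
)
print(run.run_id)
```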

Perform Code-Free Transformation at Scale with Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • Explain Data Factory transformation methods
  • Describe Data Factory transformation types
  • Use Data Factory mapping data flow
  • Debug mapping data flow
  • Use Data Factory wrangling data flows
  • Use compute transformations within Data Factory
  • Integrate SQL Server Integration Services packages within Data Factory
  • Knowledge check

Orchestrate Data Movement and Transformation in Azure Data Factory or Azure Synapse Pipeline

  • Introduction
  • Understand data factory control flow
  • Work with data factory pipelines
  • Debug data factory pipelines
  • Add parameters to data factory components
  • Integrate a Notebook within Azure Synapse Pipelines
  • Execute data factory packages
  • Knowledge check
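
A minimal sketch of polling a run's status, which is the starting point for debugging pipelines; it reuses the hypothetical `client` and `run` from the sketch in the data integration module above.

```python
# Fetch the current state of a pipeline run.
status = client.pipeline_runs.get(
    resource_group_name="rg-data",
    factory_name="adf-demo",
    run_id=run.run_id,
)
print(status.status)  # e.g. Queued, InProgress, Succeeded, Failed
```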

Plan Hybrid Transactional and Analytical Processing using Azure Synapse Analytics

  • Describe Hybrid Transactional / Analytical Processing patterns
  • Identify Azure Synapse Link services for HTAP

Implement Azure Synapse Link with Azure Cosmos DB

  • Configure an Azure Cosmos DB Account to use Azure Synapse Link
  • Create an analytical store enabled container
  • Create a linked service for Azure Cosmos DB
  • Analyze linked data using Spark
  • Analyze linked data using Synapse SQL
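
A minimal sketch of reading the Cosmos DB analytical store from a Synapse Spark notebook over Synapse Link; the linked service and container names are hypothetical, and `spark` is the notebook session.

```python
# A minimal sketch of querying the analytical store via Synapse Link.
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbOrders")  # hypothetical
      .option("spark.cosmos.container", "orders")               # hypothetical
      .load())

df.groupBy("status").count().show()  # analytics without touching the transactional store
```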

Secure a Data Warehouse in Azure Synapse Analytics

  • Understand network security options for Azure Synapse Analytics
  • Configure Conditional Access
  • Configure Authentication
  • Manage authorization through column and row level security
  • Manage sensitive data with Dynamic Data Masking (see the sketch after this list)
  • Implement encryption in Azure Synapse Analytics
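
A minimal sketch of masking a sensitive column with dynamic data masking, reusing the pyodbc connection pattern from the loading sketch above; the table and column names are hypothetical.

```python
# Apply a built-in masking function (default(), email(), partial(), ...)
# to a column in the data warehouse; names are hypothetical.
conn.cursor().execute("""
    ALTER TABLE dbo.DimCustomer
    ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()')
""")
conn.commit()
```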

Configure and Manage Secrets in Azure Key Vault

  • Explore proper usage of Azure Key Vault
  • Manage access to an Azure Key Vault
  • Explore certificate management with Azure Key Vault
  • Configure a Hardware Security Module Key-generation solution
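
A minimal sketch of storing and reading a secret with the azure-keyvault-secrets SDK; the vault URL and secret name are hypothetical.

```python
# A minimal sketch of secret management with Azure Key Vault.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # hypothetical vault
    credential=DefaultAzureCredential(),
)
client.set_secret("storage-account-key", "s3cr3t-value")
print(client.get_secret("storage-account-key").value)
```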

Implement Compliance Controls for Sensitive Data

  • Plan and implement data classification in Azure SQL Database (see the sketch after this list)
  • Understand and configure row-level security and dynamic data masking
  • Understand the usage of Microsoft Defender for SQL
  • Explore how Azure SQL Database Ledger works

Enable Reliable Messaging for Big Data Applications using Azure Event Hubs

  • Create an event hub using the Azure CLI
  • Configure applications to send or receive messages through the event hub
  • Evaluate performance of event hub using the Azure portal
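
A minimal sketch of sending events with the azure-eventhub SDK. The connection string and hub name are hypothetical; in practice the string comes from a shared access policy on the hub created with the CLI.

```python
# A minimal sketch of publishing a batch of events to an event hub.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://mynamespace.servicebus.windows.net/;"
             "SharedAccessKeyName=send;SharedAccessKey=<key>",  # placeholder
    eventhub_name="telemetry",
)
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"deviceId": 1, "reading": 21.5}'))
    producer.send_batch(batch)
```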

Who Benefits

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience is data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Prerequisites

Participants taking part in Data Engineering on Microsoft Azure DP-203 training should have knowledge of cloud computing and data concepts.

It is beneficial for participants to have completed the following courses:

  • AZ-900 Microsoft Azure Fundamentals
  • DP-900 Microsoft Azure Data Fundamentals