DP-203: Data Engineering on Microsoft Azure

Question 81

HOTSPOT -
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area: (exhibit image not reproduced)

Box 1: succeed

Box 2: failed
Example: suppose a pipeline has three activities, where Activity1 has a success path to Activity2 and a failure path to Activity3. If Activity1 fails, Activity2 is skipped and Activity3 runs on the failure path. Even when Activity3 succeeds, the pipeline reports failure. The presence of the success path alongside the failure path changes the outcome the pipeline reports, even though exactly the same activities execute as when only the failure path exists (in which case the pipeline would report success).

Activity1 fails, Activity2 is skipped, and Activity3 succeeds: the pipeline reports failure.
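
This behavior follows the documented leaf-activity rule: a pipeline succeeds only if all of its leaf (final) activities succeed, and a skipped leaf activity is evaluated by looking at its parent activity instead. The following minimal Python sketch models that rule for the example above (a simplified illustration, not Data Factory code; the activity names and statuses are taken from the scenario):

# Simplified model of the documented pipeline-outcome rule.
def evaluate(activity, status, parents):
    # A skipped activity inherits its outcome from its parent activity.
    if status[activity] == "Skipped":
        return evaluate(parents[activity], status, parents)
    return status[activity]

# Activity1 --on success--> Activity2, Activity1 --on failure--> Activity3
parents = {"Activity2": "Activity1", "Activity3": "Activity1"}
status = {"Activity1": "Failed", "Activity2": "Skipped", "Activity3": "Succeeded"}

leaves = ["Activity2", "Activity3"]  # activities with no downstream activity
outcomes = [evaluate(leaf, status, parents) for leaf in leaves]
print("Succeeded" if all(o == "Succeeded" for o in outcomes) else "Failed")
# Prints "Failed": the skipped Activity2 is evaluated as its parent
# Activity1, which failed, so the pipeline reports failure.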

Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/

Question 82

You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
● Wrangling data flow
● Notebook
● Copy
● Jar
Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Azure Synapse Analytics
Azure HDInsight
Azure Machine Learning
Azure Data Factory
Azure Databricks




Answer is Azure Data Factory and Azure Databricks

1. Wrangling data flows are supported only by Azure Data Factory, not by Azure Synapse Analytics.

2. The Notebook and Jar activities require Azure Databricks.

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/wrangling-overview
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-jar

Question 83

You plan to create an Azure Data Factory pipeline that will include a mapping data flow.
You have JSON data containing objects that have nested arrays.
You need to transform the JSON-formatted data into a tabular dataset. The dataset must have one row for each item in the arrays.

Which transformation method should you use in the mapping data flow?
new branch
unpivot
alter row
flatten




Answer is flatten

Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.
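
The transformation itself is configured in the mapping data flow designer; as a rough analogy only, the same unrolling can be sketched in Python with pandas.json_normalize (the sample data below is illustrative, not from the question):

# Analogous denormalization in Python: each element of the nested
# "orders" array becomes its own output row.
import pandas as pd

data = [
    {"customer": "Contoso", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"customer": "Fabrikam", "orders": [{"sku": "A1", "qty": 5}]},
]

flat = pd.json_normalize(data, record_path="orders", meta=["customer"])
print(flat)
#   sku  qty  customer
# 0  A1    2   Contoso
# 1  B7    1   Contoso
# 2  A1    5  Fabrikam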

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-flatten

Question 84

You have an Azure Data Factory pipeline that is triggered hourly.
The pipeline has had 100% success for the past seven days.
The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error.

ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'.
Account: 'contosoproduksouth'. Filesystem: wwi.
Path: 'BIKES/CARBON/year=2021/month=01/day=10/hour=06'.
ErrorCode: 'PathNotFound'.
Message: 'The specified path does not exist.'.
RequestId: '6d269b78-901f-001b-4924-e7a7bc000000'.
TimeStamp: 'Sun, 10 Jan 2021 07:45:05

What is a possible cause of the error?
The parameter used to generate year=2021/month=01/day=10/hour=06 was incorrect.
From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.
From 06:00 to 07:00 on January 10, 2021, the file format of data in wwi/BIKES/CARBON was incorrect.
The pipeline was triggered too early.




Answer is From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.

The error message reports a missing path ('PathNotFound'), which matches this answer: no data landed in wwi/BIKES/CARBON for the 06:00 to 07:00 window, so the folder for that hour was never created. The initial run plus two retries 15 minutes apart makes three attempts in total, which explains why the final error was generated at 07:45.
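
A folder path like the one in the error is typically generated from the trigger window start time. The exact expression used by the pipeline is not shown in the question, so the following Python sketch is an assumption for illustration only:

from datetime import datetime

# Hypothetical derivation of the folder path from the 06:00 window start.
window_start = datetime(2021, 1, 10, 6, 0)
path = window_start.strftime("BIKES/CARBON/year=%Y/month=%m/day=%d/hour=%H")
print(path)  # BIKES/CARBON/year=2021/month=01/day=10/hour=06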

Question 85

You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.

What should you use?
the Activity log blade for the Data Factory resource
Pipeline runs in the Azure Data Factory user experience
the Resource health blade for the Data Factory resource
Azure Data Factory activity runs in Azure Monitor




Answer is Azure Data Factory activity runs in Azure Monitor

Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
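
Once diagnostic settings route Data Factory logs to a Log Analytics workspace, the failures can be queried well beyond 45 days. A minimal sketch using the azure-monitor-query package (the workspace ID is a placeholder, and this assumes the resource-specific ADFPipelineRun table is being populated):

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
query = """
ADFPipelineRun
| where Status == 'Failed'
| project TimeGenerated, PipelineName, RunId, Status
"""
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=180),  # beyond Data Factory's 45-day retention
)
for table in response.tables:
    for row in table.rows:
        print(row)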

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor

Question 86

Which of the following terms refers to the scale of compute being used in Azure Synapse Analytics?
RTU
DWU
DTU




Answer is DWU

DWU refers to Data Warehouse Units, the measure of compute scale assigned to an Azure Synapse Analytics dedicated SQL pool (formerly Azure SQL Data Warehouse). RTU refers to the Request Unit (RU) throughput measure of Azure Cosmos DB, and DTU (Database Transaction Unit) is the compute scale unit of Azure SQL Database.

Question 87

You have an Azure Synapse Analytics database that contains a dimension table named Stores, which holds store information. There are 263 stores nationwide. Store information is retrieved in more than half of the queries issued against this database; these queries include staff information per store, sales information per store, and finance information. You want to improve the performance of these queries by configuring the table geometry of the Stores table. Which table geometry is appropriate for the Stores table?
Round Robin
Non Clustered
Replicated table




Answer is Replicated table

A replicated table is the appropriate table geometry because the Stores table is small (Microsoft recommends replicated tables for tables smaller than 2 GB compressed), and a full copy of the table is cached on every compute node of Azure Synapse Analytics, which improves the performance of the frequent queries that join to it. A round-robin distribution is a table geometry that is useful for initial data loads. Non clustered is not a valid table geometry in Azure Synapse Analytics.
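
For reference, a minimal sketch of the DDL that creates such a table in a dedicated SQL pool. The column list is an illustrative assumption, and the T-SQL is carried as a Python string so it can be executed with any SQL client (for example pyodbc's cursor.execute):

# Illustrative DDL for a replicated dimension table (assumed columns).
STORES_DDL = """
CREATE TABLE dbo.Stores
(
    StoreId   INT           NOT NULL,
    StoreName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);
"""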

Question 88

What is the default port for connecting to an enterprise data warehouse in Azure Synapse Analytics?
TCP port 1344
UDP port 1433
TCP port 1433




Answer is TCP port 1433

The default port for connecting to the SQL endpoint of Azure Synapse Analytics is TCP port 1433.
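
A minimal connection sketch in Python, assuming pyodbc and ODBC Driver 18 for SQL Server are installed; the server, database, and credentials are placeholders. Note the explicit TCP port 1433 in the SERVER value:

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<your-workspace>.sql.azuresynapse.net,1433;"  # default TCP port 1433
    "DATABASE=<your-database>;"
    "UID=<user>;PWD=<password>;"
    "Encrypt=yes;"  # encrypted transport (see also Question 90)
)
print(conn.execute("SELECT @@VERSION").fetchone()[0])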

Question 89

You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity?
Azure - SSIS
Azure IR
Self-hosted
Pipelines




Answer is Azure IR

The Azure integration runtime (Azure IR) is used when copying data between two Azure data platform services, such as Azure Data Lake Storage Gen2 and Azure Synapse Analytics. The Azure-SSIS IR is used to lift and shift existing SSIS workloads, while the self-hosted IR is used for data movement between private networks and the cloud.

Question 90

Encrypted communication is turned on automatically when connecting to an Azure SQL Database or Azure Synapse Analytics. True or False?
True
False




Answer is True

True. Azure SQL Database and Azure Synapse Analytics enforce encryption (SSL/TLS) at all times for all connections.
