DP-203: Data Engineering on Microsoft Azure

Question 81

HOTSPOT -
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area: (exhibit image not reproduced)

Box 1: succeed

Box 2: failed
Example: suppose a pipeline has three activities, where Activity1 has a success path to Activity2 and a failure path to Activity3. If Activity1 fails, Activity2 is skipped and Activity3 runs on the failure path. Even when Activity3 succeeds, the pipeline reports failure. The presence of the success path alongside the failure path changes the outcome the pipeline reports, even though exactly the same activities execute as when only the failure path exists (in which case the pipeline would report success).

Activity1 fails, Activity2 is skipped, and Activity3 succeeds: the pipeline reports failure.
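
This behavior follows the documented leaf-activity rule: a pipeline succeeds only if all of its leaf (final) activities succeed, and a skipped leaf activity is evaluated by looking at its parent activity instead. The following minimal Python sketch models that rule for the example above (a simplified illustration, not Data Factory code; the activity names and statuses are taken from the scenario):

# Simplified model of the documented pipeline-outcome rule.
def evaluate(activity, status, parents):
    # A skipped activity inherits its outcome from its parent activity.
    if status[activity] == "Skipped":
        return evaluate(parents[activity], status, parents)
    return status[activity]

# Activity1 --on success--> Activity2, Activity1 --on failure--> Activity3
parents = {"Activity2": "Activity1", "Activity3": "Activity1"}
status = {"Activity1": "Failed", "Activity2": "Skipped", "Activity3": "Succeeded"}

leaves = ["Activity2", "Activity3"]  # activities with no downstream activity
outcomes = [evaluate(leaf, status, parents) for leaf in leaves]
print("Succeeded" if all(o == "Succeeded" for o in outcomes) else "Failed")
# Prints "Failed": the skipped Activity2 is evaluated as its parent
# Activity1, which failed, so the pipeline reports failure.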

Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/

Question 82

You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
● Wrangling data flow
● Notebook
● Copy
● Jar
Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Azure Synapse Analytics
Azure HDInsight
Azure Machine Learning
Azure Data Factory
Azure Databricks




Answer is Azure Data Factory and Azure Databricks

1. Wrangling data flows are supported only by Azure Data Factory, not by Azure Synapse Analytics.

2. The Notebook and Jar activities require Azure Databricks.

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/wrangling-overview
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-databricks-jar

Question 83

You plan to create an Azure Data Factory pipeline that will include a mapping data flow.
You have JSON data containing objects that have nested arrays.
You need to transform the JSON-formatted data into a tabular dataset. The dataset must have one row for each item in the arrays.

Which transformation method should you use in the mapping data flow?
new branch
unpivot
alter row
flatten




Answer is flatten

Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.
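
The transformation itself is configured in the mapping data flow designer; as a rough analogy only, the same unrolling can be sketched in Python with pandas.json_normalize (the sample data below is illustrative, not from the question):

# Analogous denormalization in Python: each element of the nested
# "orders" array becomes its own output row.
import pandas as pd

data = [
    {"customer": "Contoso", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"customer": "Fabrikam", "orders": [{"sku": "A1", "qty": 5}]},
]

flat = pd.json_normalize(data, record_path="orders", meta=["customer"])
print(flat)
#   sku  qty  customer
# 0  A1    2   Contoso
# 1  B7    1   Contoso
# 2  A1    5  Fabrikam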

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-flatten

Question 84

You have an Azure Data Factory pipeline that is triggered hourly.
The pipeline has had 100% success for the past seven days.
The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error.

ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'.
Account: 'contosoproduksouth'. Filesystem: wwi.
Path: 'BIKES/CARBON/year=2021/month=01/day=10/hour=06'.
ErrorCode: 'PathNotFound'.
Message: 'The specified path does not exist.'.
RequestId: '6d269b78-901f-001b-4924-e7a7bc000000'.
TimeStamp: 'Sun, 10 Jan 2021 07:45:05

What is a possible cause of the error?
The parameter used to generate year=2021/month=01/day=10/hour=06 was incorrect.
From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.
From 06:00 to 07:00 on January 10, 2021, the file format of data in wwi/BIKES/CARBON was incorrect.
The pipeline was triggered too early.




Answer is From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.

The error message reports a missing path ('PathNotFound'), which matches this answer: no data landed in wwi/BIKES/CARBON for the 06:00 to 07:00 window, so the folder for that hour was never created. The initial run plus two retries 15 minutes apart makes three attempts in total, which explains why the final error was generated at 07:45.
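
A folder path like the one in the error is typically generated from the trigger window start time. The exact expression used by the pipeline is not shown in the question, so the following Python sketch is an assumption for illustration only:

from datetime import datetime

# Hypothetical derivation of the folder path from the 06:00 window start.
window_start = datetime(2021, 1, 10, 6, 0)
path = window_start.strftime("BIKES/CARBON/year=%Y/month=%m/day=%d/hour=%H")
print(path)  # BIKES/CARBON/year=2021/month=01/day=10/hour=06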

Question 85

You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.

What should you use?
the Activity log blade for the Data Factory resource
Pipeline runs in the Azure Data Factory user experience
the Resource health blade for the Data Factory resource
Azure Data Factory activity runs in Azure Monitor




Answer is Azure Data Factory activity runs in Azure Monitor

Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
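
Once diagnostic settings route Data Factory logs to a Log Analytics workspace, the failures can be queried well beyond 45 days. A minimal sketch using the azure-monitor-query package (the workspace ID is a placeholder, and this assumes the resource-specific ADFPipelineRun table is being populated):

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
query = """
ADFPipelineRun
| where Status == 'Failed'
| project TimeGenerated, PipelineName, RunId, Status
"""
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=180),  # beyond Data Factory's 45-day retention
)
for table in response.tables:
    for row in table.rows:
        print(row)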

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor

Question 86

Which of the following terms refers to the scale of compute being used in Azure Synapse Analytics?
RTU
DWU
DTU




Answer is DWU

DWU refers to Data Warehouse Units, the measure of compute scale assigned to an Azure Synapse Analytics dedicated SQL pool (formerly Azure SQL Data Warehouse). RTU refers to the Request Unit (RU) throughput measure of Azure Cosmos DB, and DTU (Database Transaction Unit) is the compute scale unit of Azure SQL Database.

Question 87

You have an Azure Synapse Analytics database that contains a dimension table named Stores, which holds store information. There are 263 stores nationwide. Store information is retrieved in more than half of the queries issued against this database; these queries include staff information per store, sales information per store, and finance information. You want to improve the performance of these queries by configuring the table geometry of the Stores table. Which table geometry is appropriate for the Stores table?
Round Robin
Non Clustered
Replicated table




Answer is Replicated table

A replicated table is the appropriate table geometry because the Stores table is small (Microsoft recommends replicated tables for tables smaller than 2 GB compressed), and a full copy of the table is cached on every compute node of Azure Synapse Analytics, which improves the performance of the frequent queries that join to it. A round-robin distribution is a table geometry that is useful for initial data loads. Non clustered is not a valid table geometry in Azure Synapse Analytics.
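
For reference, a minimal sketch of the DDL that creates such a table in a dedicated SQL pool. The column list is an illustrative assumption, and the T-SQL is carried as a Python string so it can be executed with any SQL client (for example pyodbc's cursor.execute):

# Illustrative DDL for a replicated dimension table (assumed columns).
STORES_DDL = """
CREATE TABLE dbo.Stores
(
    StoreId   INT           NOT NULL,
    StoreName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);
"""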

Question 88

What is the default port for connecting to an enterprise data warehouse in Azure Synapse Analytics?
TCP port 1344
UDP port 1433
TCP port 1433




Answer is TCP port 1433

The default port for connecting to the SQL endpoint of Azure Synapse Analytics is TCP port 1433.
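
A minimal connection sketch in Python, assuming pyodbc and ODBC Driver 18 for SQL Server are installed; the server, database, and credentials are placeholders. Note the explicit TCP port 1433 in the SERVER value:

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<your-workspace>.sql.azuresynapse.net,1433;"  # default TCP port 1433
    "DATABASE=<your-database>;"
    "UID=<user>;PWD=<password>;"
    "Encrypt=yes;"  # encrypted transport (see also Question 90)
)
print(conn.execute("SELECT @@VERSION").fetchone()[0])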

Question 89

You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity?
Azure - SSIS
Azure IR
Self-hosted
Pipelines




Answer is Azure IR

The Azure integration runtime (Azure IR) is used when copying data between two Azure data platform services, such as Azure Data Lake Storage Gen2 and Azure Synapse Analytics. The Azure-SSIS IR is used to lift and shift existing SSIS workloads, while the self-hosted IR is used for data movement between private networks and the cloud.

Question 90

Encrypted communication is turned on automatically when connecting to an Azure SQL Database or Azure Synapse Analytics. True or False?
True
False




Answer is True

True. Azure SQL Database and Azure Synapse Analytics enforce encryption (SSL/TLS) at all times for all connections.
