DP-203: Data Engineering on Microsoft Azure

Question 351

You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a tumbling window, and you set the window size to 10 seconds.

Does this meet the goal?
Yes
No




Answer is Yes

Tumbling windows are a series of fixed-size, non-overlapping, contiguous time intervals, so each event belongs to exactly one 10-second window and every tweet is counted exactly once.
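A minimal Python sketch (not Stream Analytics query syntax; the timestamps are made up) of how 10-second tumbling windows partition a stream so that each tweet is counted exactly once:

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet arrival times.
events = [
    datetime(2024, 1, 1, 12, 0, 1),
    datetime(2024, 1, 1, 12, 0, 4),
    datetime(2024, 1, 1, 12, 0, 9),
    datetime(2024, 1, 1, 12, 0, 12),
    datetime(2024, 1, 1, 12, 0, 27),
]

WINDOW_SECONDS = 10

# Each event maps to exactly one fixed, non-overlapping 10-second interval,
# so every tweet is counted once.
counts = Counter(int(e.timestamp()) // WINDOW_SECONDS for e in events)

for bucket, count in sorted(counts.items()):
    start = datetime.fromtimestamp(bucket * WINDOW_SECONDS)
    print(f"window starting {start:%H:%M:%S}: {count} tweet(s)")
```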


Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics

Question 352

You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a session window that uses a timeout size of 10 seconds.

Does this meet the goal?
Yes
No




Answer is No

A session window groups events that arrive within the timeout of each other, so the window length varies and is not a fixed 10 seconds. Instead use a tumbling window: tumbling windows are a series of fixed-size, non-overlapping, contiguous time intervals.
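To see why a 10-second session timeout does not give fixed 10-second counts, here is a rough Python sketch (made-up timestamps; Stream Analytics session windows also take a maximum duration, omitted here) that merges events into one session while the gap between consecutive events stays within the timeout:

```python
from datetime import datetime, timedelta

# Hypothetical tweet arrival times a few seconds apart.
events = [datetime(2024, 1, 1, 12, 0, s) for s in (0, 6, 13, 19, 40)]

TIMEOUT = timedelta(seconds=10)

# A session window keeps extending while new events arrive within the
# timeout, so the resulting windows have variable length rather than a
# fixed 10 seconds.
sessions = []
for e in events:
    if sessions and e - sessions[-1][-1] <= TIMEOUT:
        sessions[-1].append(e)
    else:
        sessions.append([e])

for s in sessions:
    print(f"{s[0]:%H:%M:%S} to {s[-1]:%H:%M:%S}: {len(s)} tweet(s)")
```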

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics

Question 353

You are designing an Azure Stream Analytics solution that will analyze Twitter data.
You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a hopping window that uses a hop size of 5 seconds and a window size of 10 seconds.

Does this meet the goal?
Yes
No




Answer is No

A hopping window with a 5-second hop and a 10-second size produces overlapping windows, so a tweet can be counted more than once. Instead use a tumbling window: tumbling windows are a series of fixed-size, non-overlapping, contiguous time intervals.
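A rough Python sketch (made-up event times) showing that a 10-second hopping window with a 5-second hop produces overlapping windows, so the same tweet can be counted twice:

```python
# Hypothetical tweet arrival times, in seconds from an arbitrary origin.
events = [2, 7, 12]

HOP, SIZE = 5, 10

# Windows start every HOP seconds and cover SIZE seconds, so consecutive
# windows overlap and the same tweet can fall into two different windows.
for start in range(0, 20, HOP):
    hits = [e for e in events if start <= e < start + SIZE]
    print(f"[{start:2d}s, {start + SIZE:2d}s): {len(hits)} tweet(s) {hits}")
```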

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics

Question 354

You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.

Does this meet the goal?
Yes
No




Answer is No

An Azure Synapse Analytics dedicated SQL pool cannot run R scripts from a stored procedure (sp_execute_external_script is not supported there), so the transformation step of this solution cannot be executed.

Question 355

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
● A workload for data engineers who will use Python and SQL.
● A workload for jobs that will run notebooks that use Python, Scala, and SQL.
● A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
● The data engineers must share a cluster.
● The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
● All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers, and a High Concurrency cluster for the jobs.

Does this meet the goal?
Yes
No




Answer is No

● The data engineers (Python and SQL) must share a cluster, so they should get a High Concurrency cluster.
● The jobs workload runs notebooks in Python, Scala, and SQL, so it needs a Standard cluster, because High Concurrency clusters do not support Scala.
● Each data scientist performs ad hoc analysis in Scala and R, so each needs their own Standard cluster (with auto-termination after 120 minutes), again because High Concurrency clusters do not support Scala.

The proposed solution assigns a High Concurrency cluster to the jobs workload even though those notebooks use Scala, so it does not meet the goal.
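As an illustration only, a cluster request for one of the data scientists might look like the following. The field names follow the shape of the Databricks Clusters API create request, but the runtime version and node type are placeholders:

```python
import json

# Hypothetical request body in the shape used by the Databricks Clusters API
# (clusters/create). spark_version and node_type_id are placeholders.
data_scientist_cluster = {
    "cluster_name": "ds-adhoc-1",
    "spark_version": "<runtime-version>",  # placeholder
    "node_type_id": "<node-type>",         # placeholder
    "num_workers": 2,
    # Enforces the required shutdown after 120 minutes of inactivity.
    "autotermination_minutes": 120,
    # No High Concurrency profile: the data scientists need Scala and R,
    # so a Standard cluster is used.
}

print(json.dumps(data_scientist_cluster, indent=2))
```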

Reference:
https://stackoverflow.com/questions/65869399/high-concurrency-clusters-in-databricks

Question 356

You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.
Does this meet the goal?
Yes
No




Answer is No

R scripts cannot be run in Azure Synapse Analytics. sp_execute_external_script is available only in SQL Server 2016 (13.x) and later and in Azure SQL Managed Instance, so a stored procedure in the data warehouse cannot execute the R script.

Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/r-developers-guide

Question 357

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.
You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: In an Azure Synapse Analytics pipeline, you use a data flow that contains a Derived Column transformation.
Does this meet the goal?
Yes
No




Answer is Yes

Data flows are available in both Azure Data Factory and Azure Synapse Analytics pipelines. Use the Derived Column transformation to generate new columns in your data flow (such as a DateTime column recording when the data was loaded) or to modify existing fields.
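Conceptually, the Derived Column transformation appends a computed column to each row as it flows through the mapping data flow. A rough pandas equivalent (not ADF data flow syntax; the sample data and column names are made up):

```python
from datetime import datetime, timezone

import pandas as pd

# Rows as they might arrive from a file in container1 (made-up sample data).
df = pd.DataFrame({"tweet_id": [1, 2, 3], "text": ["a", "b", "c"]})

# Equivalent of a Derived Column step: append a computed DateTime column so
# the load time is stored alongside the source data when it lands in Table1.
df["LoadDateTime"] = datetime.now(timezone.utc)

print(df)
```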

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column

Question 358

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.
You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that has an additional DateTime column.
Does this meet the goal?
Yes
No




Answer is No

An external table in a serverless SQL pool only exposes the columns that already exist in the source files; it does not add a DateTime column when the data is loaded into Table1. Instead use the Derived Column transformation in a data flow to generate new columns or to modify existing fields.

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column

Question 359

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.
You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: In an Azure Synapse Analytics pipeline, you use a Get Metadata activity that retrieves the DateTime of the files.

Does this meet the goal?
Yes
No




Answer is No

The Get Metadata activity only retrieves the file metadata. Its output would still have to be passed as a parameter to the ingestion step, for example by processing each file inside a ForEach loop, before the DateTime could be stored as a column in Table1, so this solution by itself does not meet the goal.
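Conceptually (plain Python rather than pipeline JSON; the paths and column name are made up), the metadata has to be carried into the load step, for example by looping over the files and stamping each row with the file's modified DateTime:

```python
import os
from datetime import datetime, timezone

import pandas as pd

# Made-up staging paths; in the pipeline these would come from the
# Get Metadata activity's output.
staged_files = ["staging/tweets_2024-01-01.csv", "staging/tweets_2024-01-02.csv"]

frames = []
for path in staged_files:  # the ForEach loop over the metadata output
    df = pd.read_csv(path)
    # Stamp every row with the file's last-modified DateTime before loading.
    df["FileModifiedDateTime"] = datetime.fromtimestamp(
        os.path.getmtime(path), tz=timezone.utc
    )
    frames.append(df)

result = pd.concat(frames)
```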

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

Question 360

You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure Databricks notebook, and then inserts the data into the data warehouse.

Does this meet the goal?
Yes
No




Answer is Yes

You can execute R code in an Azure Databricks notebook and invoke that notebook from an Azure Data Factory pipeline by using the Databricks Notebook activity (see the "Databricks Notebook activity" section in the first reference below).

Reference:
https://docs.microsoft.com/en-US/azure/data-factory/transform-data

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/sparkr/overview
