DP-203: Data Engineering on Microsoft Azure

Question 331

You manage a Microsoft Azure SQL Data Warehouse Gen 2.
Users report slow performance when they run commonly used queries. Users do not report performance changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?
Cache used percentage
Local tempdb percentage
DWU percentage
CPU percentage
Data IO percentage




Answer is Cache used percentage

The Gen2 storage architecture automatically tiers your most frequently queried columnstore segments in a cache residing on NVMe-based SSDs designed for Gen2 data warehouses. Greater performance is realized when your queries retrieve segments that reside in the cache. You can monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the Gen2 cache.

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache

Question 332

You are implementing automatic tuning mode for Azure SQL databases.
Automatic tuning mode is configured as shown in the following table.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
No - No - No
No - No - Yes
No - Yes - No
No - Yes - Yes
Yes - No - No
Yes - No - Yes
Yes - Yes - No
Yes - Yes - Yes




Answer is Yes - Yes - No


Automatic tuning options can be independently enabled or disabled per database, or they can be configured on SQL Database servers and applied to every database that inherits settings from the server. SQL Database servers can inherit the Azure defaults for automatic tuning settings. The Azure defaults at this time are: FORCE_LAST_GOOD_PLAN enabled, CREATE_INDEX enabled, and DROP_INDEX disabled.
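For reference, these settings map to T-SQL roughly as sketched below (a minimal sketch; the ON/OFF values simply mirror the defaults described above):

-- Inherit the server (or Azure) defaults for the current database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING = INHERIT;

-- Or set the individual options explicitly
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = OFF);

-- Review the desired and actual state of each option
SELECT name, desired_state_desc, actual_state_desc
FROM sys.database_automatic_tuning_options;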

References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning

Question 333

You are designing a new Lambda architecture on Microsoft Azure.
The real-time processing layer must meet the following requirements:

Ingestion:
Stream processing:
Analytical data store:
You need to identify the correct technologies to build the Lambda architecture using minimal effort.
Which technologies should you use?





Answer is B - A - D

Box 1: Azure Event Hubs
This portion of a streaming architecture is often referred to as stream buffering. Options include Azure Event Hubs, Azure IoT Hub, and Kafka.
Incorrect answers: not HDInsight with Kafka. Azure Functions need a trigger defined in order to run. There is a limited set of supported trigger types, and Kafka is not one of them.

Box 2: Azure Stream Analytics
Azure Stream Analytics provides a managed stream processing service based on perpetually running SQL queries that operate on unbounded streams. You can also use open source Apache streaming technologies like Storm and Spark Streaming in an HDInsight cluster.

Box 3: Azure Synapse Analytics
Azure Synapse Analytics provides a managed service for large-scale, cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can also be used to serve data for analysis.
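To illustrate the stream processing layer, a minimal Stream Analytics query is sketched below. The input alias eventhub-input, the output alias synapse-output, and the DeviceId column are hypothetical names defined in the job, not details taken from the question:

SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO [synapse-output]                                      -- output configured to an Azure Synapse Analytics table
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime    -- Event Hubs input
GROUP BY DeviceId, TumblingWindow(minute, 5)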

References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/

Question 334

You are creating a managed data warehouse solution on Microsoft Azure.
You must use PolyBase to retrieve data that is stored in Parquet format in Azure Blob storage and load it into a large table called FactSalesOrderDetails.
You need to configure Azure SQL Data Warehouse to receive the data.

Which four actions should you perform in sequence?




Answer is F-E-A-C


Step 1: Create a master key on the database
Create a master key on the database. This is required to encrypt the credential secret.
Step 2: Create an external data source for Azure Blob storage
Create an external data source with CREATE EXTERNAL DATA SOURCE.
Step 3: Create an external file format to map the Parquet files.
Create an external file format with CREATE EXTERNAL FILE FORMAT.
FORMAT_TYPE: type of format in Hadoop (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
Step 4: Create the external table FactSalesOrderDetails
To query the data in your Hadoop data source, you must define an external table to use in Transact-SQL queries.
Create an external table pointing to data stored in Azure storage with CREATE EXTERNAL TABLE.
Note: PolyBase is a technology that accesses and combines both non-relational and relational data, all from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.
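The four steps translate into T-SQL roughly as follows. This is a sketch only: the credential, the external object names, the column list of FactSalesOrderDetails, and the final CTAS load are placeholder assumptions, not part of the question.

-- Step 1: master key (required to encrypt the credential secret)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- A database scoped credential for the storage account (placeholder secret)
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- Step 2: external data source for Azure Blob storage
CREATE EXTERNAL DATA SOURCE AzureBlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- Step 3: external file format that maps the Parquet files
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

-- Step 4: external table over the Parquet data (illustrative columns)
CREATE EXTERNAL TABLE dbo.FactSalesOrderDetails_Ext (
    SalesOrderId INT,
    ProductId    INT,
    OrderQty     INT,
    LineTotal    DECIMAL(19, 4)
)
WITH (
    LOCATION = '/salesorderdetails/',
    DATA_SOURCE = AzureBlobStorage,
    FILE_FORMAT = ParquetFileFormat
);

-- Load into the distributed table with CTAS
CREATE TABLE dbo.FactSalesOrderDetails
WITH (DISTRIBUTION = HASH(SalesOrderId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.FactSalesOrderDetails_Ext;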

References:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-azure-blob-storage

Question 335

You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes.

Which type of slowly changing dimension (SCD) should you use?
Type 0
Type 1
Type 2
Type 3




Answer is Type 2

A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
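As a minimal sketch of what such a table can look like (the table and column names are illustrative, not taken from the question):

-- Illustrative Type 2 dimension: a surrogate key per version, a validity range, and a current-row flag
CREATE TABLE dbo.DimCustomer (
    CustomerKey         INT IDENTITY(1, 1) NOT NULL,  -- surrogate key, unique per version
    CustomerBusinessKey INT            NOT NULL,      -- natural key from the source system
    CustomerName        NVARCHAR(100)  NOT NULL,
    City                NVARCHAR(50)   NOT NULL,
    StartDate           DATE           NOT NULL,      -- start of the version's validity
    EndDate             DATE           NULL,          -- NULL (or a sentinel date) for the current version
    IsCurrent           BIT            NOT NULL       -- easy filter for current members
);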

Incorrect Answers:
B: A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.
D: A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history, rather than storing additional rows to track each change like in a Type 2 SCD.

Reference:
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

Question 336

You are designing a star schema for a dataset that contains records of online orders. Each record includes an order date, an order due date, and an order ship date.
You need to ensure that the design provides the fastest query times of the records when querying for arbitrary date ranges and aggregating by fiscal calendar attributes.

Which two actions should you perform?
A. Create a date dimension table that has a DateTime key.
B. Use built-in SQL functions to extract date attributes.
C. Create a date dimension table that has an integer key in the format of YYYYMMDD.
D. In the fact table, use integer columns for the date fields.
E. Use DateTime columns for the date fields.




Answers are
C. Create a date dimension table that has an integer key in the format of YYYYMMDD.
D. In the fact table, use integer columns for the date fields.
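A minimal sketch of the two answers combined (table and column names are illustrative): the date dimension carries an integer YYYYMMDD key plus the fiscal attributes, and the fact table stores the three dates as integer keys that join to it.

CREATE TABLE dbo.DimDate (
    DateKey        INT      NOT NULL,  -- e.g. 20240315 (YYYYMMDD)
    FullDate       DATE     NOT NULL,
    FiscalYear     SMALLINT NOT NULL,
    FiscalQuarter  TINYINT  NOT NULL,
    FiscalMonth    TINYINT  NOT NULL
);

CREATE TABLE dbo.FactOnlineOrders (
    OrderDateKey   INT NOT NULL,       -- joins to DimDate.DateKey
    DueDateKey     INT NOT NULL,
    ShipDateKey    INT NOT NULL,
    SalesAmount    DECIMAL(19, 4) NOT NULL
);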



Question 337

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
The enterprise architecture team at your company identifies the following standards for Databricks environments:
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?
Yes
No




Answer is Yes

We need a Standard cluster for the jobs to support Scala. A High Concurrency cluster does not support Scala.

Note: Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

References:
https://docs.azuredatabricks.net/clusters/configure.html

Question 338

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
The enterprise architecture team at your company identifies the following standards for Databricks environments:
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?
Yes
No




Answer is No

There is no need for a High Concurrency cluster for each data scientist.
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

References:
https://docs.azuredatabricks.net/clusters/configure.html

Question 339

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.

Does this meet the goal?
Yes
No




Answer is No

Instead, convert the files to compressed delimited text files.

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data

Question 340

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.

Does this meet the goal?
Yes
No




Answer is Yes

All file formats have different performance characteristics. For the fastest load, use compressed delimited text files.
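As an illustration, a gzip-compressed delimited text file can be loaded with the COPY statement roughly as follows (a sketch only; the table name, storage path, and field terminator are placeholders):

COPY INTO dbo.StagingTable
FROM 'https://<account>.blob.core.windows.net/<container>/data/*.csv.gz'
WITH (
    FILE_TYPE = 'CSV',
    COMPRESSION = 'GZIP',
    FIELDTERMINATOR = '|',
    FIRSTROW = 2
);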

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
