DP-203: Data Engineering on Microsoft Azure

Question 331

You manage a Microsoft Azure SQL Data Warehouse Gen 2.
Users report slow performance when they run commonly used queries. Users do not report performance changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?
Cache used percentage
Local tempdb percentage
DWU percentage
CPU percentage
Data IO percentage




Answer is Cache used percentage

The Gen2 storage architecture automatically tiers your most frequently queried columnstore segments in a cache residing on NVMe-based SSDs designed for Gen2 data warehouses. Greater performance is realized when your queries retrieve segments that reside in the cache. You can monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the Gen2 cache.

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache

Question 332

You are implementing automatic tuning mode for Azure SQL databases.
Automatic tuning mode is configured as shown in the following table.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
No - No - No
No - No - Yes
No - Yes - No
No - Yes - Yes
Yes - No - No
Yes - No - Yes
Yes - Yes - No
Yes - Yes - Yes




Answer is Yes - Yes - No


Automatic tuning options can be independently enabled or disabled per database, or they can be configured on SQL Database servers and applied to every database that inherits settings from the server. SQL Database servers can inherit the Azure defaults for automatic tuning settings. The Azure defaults at this time are: FORCE_LAST_GOOD_PLAN enabled, CREATE_INDEX enabled, and DROP_INDEX disabled.
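For reference, these settings map to T-SQL roughly as sketched below (a minimal sketch; the ON/OFF values simply mirror the defaults described above):

-- Inherit the server (or Azure) defaults for the current database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING = INHERIT;

-- Or set the individual options explicitly
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = OFF);

-- Review the desired and actual state of each option
SELECT name, desired_state_desc, actual_state_desc
FROM sys.database_automatic_tuning_options;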

References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning

Question 333

You are designing a new Lambda architecture on Microsoft Azure.
The real-time processing layer must meet the following requirements:

Ingestion:
Stream processing:
Analytical data store:
You need to identify the correct technologies to build the Lambda architecture using minimal effort.
Which technologies should you use?





Answer is B - A - D

Box 1: Azure Event Hubs
This portion of a streaming architecture is often referred to as stream buffering. Options include Azure Event Hubs, Azure IoT Hub, and Kafka.
Incorrect answers: not HDInsight with Kafka. Azure Functions need a trigger defined in order to run. There is a limited set of supported trigger types, and Kafka is not one of them.

Box 2: Azure Stream Analytics
Azure Stream Analytics provides a managed stream processing service based on perpetually running SQL queries that operate on unbounded streams. You can also use open source Apache streaming technologies like Storm and Spark Streaming in an HDInsight cluster.

Box 3: Azure Synapse Analytics
Azure Synapse Analytics provides a managed service for large-scale, cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can also be used to serve data for analysis.
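To illustrate the stream processing layer, a minimal Stream Analytics query is sketched below. The input alias eventhub-input, the output alias synapse-output, and the DeviceId column are hypothetical names defined in the job, not details taken from the question:

SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO [synapse-output]                                      -- output configured to an Azure Synapse Analytics table
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime    -- Event Hubs input
GROUP BY DeviceId, TumblingWindow(minute, 5)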

References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/

Question 334

You are creating a managed data warehouse solution on Microsoft Azure.
You must use PolyBase to retrieve data that is stored in Parquet format in Azure Blob storage and load it into a large table called FactSalesOrderDetails.
You need to configure Azure SQL Data Warehouse to receive the data.

Which four actions should you perform in sequence?




Answer is F-E-A-C


Step 1: Create a master key on the database
Create a master key on the database. This is required to encrypt the credential secret.
Step 2: Create an external data source for Azure Blob storage
Create an external data source with CREATE EXTERNAL DATA SOURCE.
Step 3: Create an external file format to map the Parquet files.
Create an external file format with CREATE EXTERNAL FILE FORMAT.
FORMAT_TYPE: type of format in Hadoop (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
Step 4: Create the external table FactSalesOrderDetails
To query the data in your Hadoop data source, you must define an external table to use in Transact-SQL queries.
Create an external table pointing to data stored in Azure storage with CREATE EXTERNAL TABLE.
Note: PolyBase is a technology that accesses and combines both non-relational and relational data, all from within SQL Server. It allows you to run queries on external data in Hadoop or Azure blob storage.
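The four steps translate into T-SQL roughly as follows. This is a sketch only: the credential, the external object names, the column list of FactSalesOrderDetails, and the final CTAS load are placeholder assumptions, not part of the question.

-- Step 1: master key (required to encrypt the credential secret)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- A database scoped credential for the storage account (placeholder secret)
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- Step 2: external data source for Azure Blob storage
CREATE EXTERNAL DATA SOURCE AzureBlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- Step 3: external file format that maps the Parquet files
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

-- Step 4: external table over the Parquet data (illustrative columns)
CREATE EXTERNAL TABLE dbo.FactSalesOrderDetails_Ext (
    SalesOrderId INT,
    ProductId    INT,
    OrderQty     INT,
    LineTotal    DECIMAL(19, 4)
)
WITH (
    LOCATION = '/salesorderdetails/',
    DATA_SOURCE = AzureBlobStorage,
    FILE_FORMAT = ParquetFileFormat
);

-- Load into the distributed table with CTAS
CREATE TABLE dbo.FactSalesOrderDetails
WITH (DISTRIBUTION = HASH(SalesOrderId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.FactSalesOrderDetails_Ext;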

References:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-azure-blob-storage

Question 335

You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes.

Which type of slowly changing dimension (SCD) should you use?
Type 0
Type 1
Type 2
Type 3




Answer is Type 2

A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
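As a minimal sketch of what such a table can look like (the table and column names are illustrative, not taken from the question):

-- Illustrative Type 2 dimension: a surrogate key per version, a validity range, and a current-row flag
CREATE TABLE dbo.DimCustomer (
    CustomerKey         INT IDENTITY(1, 1) NOT NULL,  -- surrogate key, unique per version
    CustomerBusinessKey INT            NOT NULL,      -- natural key from the source system
    CustomerName        NVARCHAR(100)  NOT NULL,
    City                NVARCHAR(50)   NOT NULL,
    StartDate           DATE           NOT NULL,      -- start of the version's validity
    EndDate             DATE           NULL,          -- NULL (or a sentinel date) for the current version
    IsCurrent           BIT            NOT NULL       -- easy filter for current members
);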

Incorrect Answers:
B: A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.
D: A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history, rather than storing additional rows to track each change like in a Type 2 SCD.

Reference:
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

Question 336

You are designing a star schema for a dataset that contains records of online orders. Each record includes an order date, an order due date, and an order ship date.
You need to ensure that the design provides the fastest query times of the records when querying for arbitrary date ranges and aggregating by fiscal calendar attributes.

Which two actions should you perform?
A. Create a date dimension table that has a DateTime key.
B. Use built-in SQL functions to extract date attributes.
C. Create a date dimension table that has an integer key in the format of YYYYMMDD.
D. In the fact table, use integer columns for the date fields.
E. Use DateTime columns for the date fields.




Answers are
C. Create a date dimension table that has an integer key in the format of YYYYMMDD.
D. In the fact table, use integer columns for the date fields.
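A minimal sketch of the two answers combined (table and column names are illustrative): the date dimension carries an integer YYYYMMDD key plus the fiscal attributes, and the fact table stores the three dates as integer keys that join to it.

CREATE TABLE dbo.DimDate (
    DateKey        INT      NOT NULL,  -- e.g. 20240315 (YYYYMMDD)
    FullDate       DATE     NOT NULL,
    FiscalYear     SMALLINT NOT NULL,
    FiscalQuarter  TINYINT  NOT NULL,
    FiscalMonth    TINYINT  NOT NULL
);

CREATE TABLE dbo.FactOnlineOrders (
    OrderDateKey   INT NOT NULL,       -- joins to DimDate.DateKey
    DueDateKey     INT NOT NULL,
    ShipDateKey    INT NOT NULL,
    SalesAmount    DECIMAL(19, 4) NOT NULL
);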



Question 337

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
The enterprise architecture team at your company identifies the following standards for Databricks environments:
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?
Yes
No




Answer is Yes

We need a Standard cluster for the jobs to support Scala. A High Concurrency cluster does not support Scala.

Note: Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

References:
https://docs.azuredatabricks.net/clusters/configure.html

Question 338

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
The enterprise architecture team at your company identifies the following standards for Databricks environments:
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?
Yes
No




Answer is No

There is no need for a High Concurrency cluster for each data scientist.
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.

References:
https://docs.azuredatabricks.net/clusters/configure.html

Question 339

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.

Does this meet the goal?
Yes
No




Answer is No

Instead, convert the files to compressed delimited text files.

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data

Question 340

You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.

Does this meet the goal?
Yes
No




Answer is Yes

All file formats have different performance characteristics. For the fastest load, use compressed delimited text files.
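As an illustration, a gzip-compressed delimited text file can be loaded with the COPY statement roughly as follows (a sketch only; the table name, storage path, and field terminator are placeholders):

COPY INTO dbo.StagingTable
FROM 'https://<account>.blob.core.windows.net/<container>/data/*.csv.gz'
WITH (
    FILE_TYPE = 'CSV',
    COMPRESSION = 'GZIP',
    FIELDTERMINATOR = '|',
    FIRSTROW = 2
);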

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
