DP-203: Data Engineering on Microsoft Azure


Question 151

You are designing a database for an Azure Synapse Analytics dedicated SQL pool to support workloads for detecting ecommerce transaction fraud.
Data will be combined from multiple ecommerce sites and can include sensitive financial information such as credit card numbers.
You need to recommend a solution that meets the following requirements:
- Users must be able to identify potentially fraudulent transactions.
- Users must be able to use credit cards as a potential feature in models.
- Users must NOT be able to access the actual credit card numbers.

What should you include in the recommendation?
Transparent Data Encryption (TDE)
row-level security (RLS)
column-level encryption
Azure Active Directory (Azure AD) pass-through authentication




Answer is column-level encryption

Use Always Encrypted to secure the required columns. You can configure Always Encrypted for individual database columns containing your sensitive data.
Always Encrypted is a feature designed to protect sensitive data, such as credit card numbers or national identification numbers (for example, U.S. social security numbers), stored in Azure SQL Database or SQL Server databases.
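A minimal sketch of what the column-level protection could look like, using the Always Encrypted column syntax from the Azure SQL Database / SQL Server documentation referenced below. The table, column, and key names are hypothetical, and the column encryption key (CEK_CreditCard) is assumed to have been created beforehand from a column master key.

-- Hypothetical schema: CreditCardNumber is encrypted at the column level, so it can
-- still be used as an opaque feature value (deterministic encryption supports
-- equality comparisons and grouping) without exposing the plaintext number.
CREATE TABLE dbo.Transactions
(
    TransactionId    BIGINT         NOT NULL,
    SiteId           INT            NOT NULL,
    Amount           DECIMAL(18, 2) NOT NULL,
    CreditCardNumber CHAR(16) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_CreditCard,   -- assumed to exist already
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);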

Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine

Question 152

You have an on-premises data warehouse that includes two fact tables. Both tables have the following columns: DateKey, ProductKey, RegionKey.
There are 120 unique product keys and 65 unique region keys.

Queries that use the data warehouse take a long time to complete.
You plan to migrate the solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.

What should you recommend?




Answer is a hash-distributed table. Two distribution columns are commonly argued for:
1. ProductKey, because the tables are larger than 2 GB and ProductKey is used extensively in joins.
2. RegionKey, because "the table size on disk is more than 2 GB" and you should choose a distribution column that "is not used in WHERE clauses. This could narrow the query to not run on all the distributions."

Round-robin tables are useful for improving loading speed.
Consider using the round-robin distribution for your table in the following scenarios:
- When getting started as a simple starting point since it is the default
- If there is no obvious joining key
- If there is no good candidate column for hash distributing the table
- If the table does not share a common join key with other tables
- If the join is less significant than other joins in the query
- When the table is a temporary staging table
Note: A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
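A minimal sketch of a hash-distributed fact table in a dedicated SQL pool, using the columns named in the question; the table name, measure column, and the choice of ProductKey as the distribution column are illustrative assumptions only.

-- Hypothetical DDL: rows with the same ProductKey hash to the same distribution,
-- so joins and aggregations on ProductKey avoid data movement across the 60 distributions.
CREATE TABLE dbo.FactSales
(
    DateKey     INT            NOT NULL,
    ProductKey  INT            NOT NULL,
    RegionKey   INT            NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH (ProductKey),
    CLUSTERED COLUMNSTORE INDEX
);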

Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choosing-a-distribution-column

Question 153

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
- TransactionType: 40 million rows per transaction type
- CustomerSegment: 4 million rows per customer segment
- TransactionMonth: 65 million rows per month
- AccountType: 500 million rows per account type

You have the following query requirements:
- Analysts will most commonly analyze transactions for a given month.
- Transaction analysis will typically summarize transactions by transaction type, customer segment, and/or account type.
You need to recommend a partition strategy for the table to minimize query times.

On which column should you recommend partitioning the table?
CustomerSegment
AccountType
TransactionType
TransactionMonth




Answer is TransactionMonth

Analysts will most commonly query transactions for a given month. Transaction analysis then summarizes by transaction type, customer segment, and account type together rather than by any single one of those columns, so partitioning on one of them would provide little benefit. Partitioning on TransactionMonth aligns with the most common filter, so it is the answer.

In most cases data is partitioned on a date column that is closely tied to the order in which the data is loaded into the SQL pool.
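A minimal sketch of the corresponding DDL, assuming hypothetical column types and boundary dates; partitioning on TransactionMonth lets the monthly queries benefit from partition elimination while the other columns remain available for aggregation.

-- Hypothetical DDL: one partition per month; queries filtered to a given month
-- scan only that partition (partition elimination).
CREATE TABLE dbo.FactTransactions
(
    TransactionId    BIGINT         NOT NULL,
    TransactionMonth DATE           NOT NULL,
    TransactionType  INT            NOT NULL,
    CustomerSegment  INT            NOT NULL,
    AccountType      INT            NOT NULL,
    Amount           DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH (TransactionId),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( TransactionMonth RANGE RIGHT FOR VALUES
        ('2024-01-01', '2024-02-01', '2024-03-01') )   -- extend with one boundary per month
);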

Reference:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool

Question 154

Which of the following technologies typically provide an ingestion point for data streaming in an event processing solution that uses static data as a source?
IoT Hubs
Event Hubs
Azure Blob storage




Answer is Azure Blob storage

Azure Blob storage provides an ingestion point for data streaming in an event processing solution that uses static data as a source.

IoT hubs provide an ingestion point for data streaming in an event processing solution that uses streaming data from an IoT device as a source.

Event hubs provide an ingestion point for data streaming in an event processing solution that uses streaming data from an application as a source.
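A minimal sketch of how static data in Blob storage typically enters a Stream Analytics job, here as a reference-data input joined against a live Event Hubs stream; the input and output aliases and column names are hypothetical.

-- Hypothetical Stream Analytics query: [blob-reference] is static data stored in
-- Azure Blob storage, joined against the streaming input [eventhub-stream].
SELECT
    s.TransactionId,
    s.Amount,
    r.StoreName
INTO
    [sql-output]
FROM
    [eventhub-stream] s TIMESTAMP BY EventEnqueuedUtcTime
JOIN
    [blob-reference] r
    ON s.StoreId = r.StoreId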

Question 155

Applications that publish messages to Azure Event Hubs very frequently will get the best performance using Advanced Message Queuing Protocol (AMQP) because it establishes a persistent socket. True or False?
True
False




Answer is True

Publishers can use either HTTPS or AMQP. AMQP opens a socket and can send multiple messages over that socket.

Question 156

By default, how many partitions will a new Event Hub have?
1
2
3
4
8




Answer is 4

Event Hubs default to 4 partitions. Partitions are the buckets within an Event Hub. Each publication will go into only one partition. Each consumer group may read from one or more than one partition.

Question 157

If an Event Hub goes offline before a consumer group can process the events it holds, those events will be lost. True or False?
True
False




Answer is False

Events are persistent. Each consumer group has its own cursor maintaining its position within the partition. The consumer groups can resume processing at their cursor position when the Event Hub is again available.

Question 158

Which job input consumes data streams from applications at low latencies and high throughput?
Azure Blob
Event Hubs
IoT hubs




Answer is Event Hubs

Event Hubs consumes data streams from applications at low latency and high throughput. Azure Blob storage stores static data that can be rerun through a streaming job. Azure IoT Hub is a data stream ingestion service that consumes events from IoT devices and also provides bi-directional communication between Azure and IoT devices.

Question 159

Streaming Analytics Query Language is a subset of which query language?
WQL
TSQL
JSON




Answer is TSQL

Streaming Analytics Query Language is a subset of Transact-SQL. WQL is the Windows Management Instrumentation query language. JSON is an open standard file format.
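To illustrate the T-SQL-like syntax, here is a minimal sketch of a Stream Analytics query; the input/output aliases and column names are hypothetical.

-- Hypothetical query: the SELECT / INTO / FROM / GROUP BY shape mirrors T-SQL,
-- with streaming-specific additions such as TIMESTAMP BY and windowing functions.
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    COUNT(*)         AS EventCount
INTO
    [powerbi-output]
FROM
    [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(second, 60)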

Question 160

You are a Data Engineer for Contoso. You want to view key health metrics of your Stream Analytics jobs. Which tool in Streaming Analytics should you use?
Diagnostics
Alerts
Dashboards




Answer is Dashboards

Dashboards are used to view the key health metrics of your Stream Analytics jobs. Diagnostic logging is turned off by default and can help with root-cause analysis in production deployments. Alerts enable proactive detection of issues in Stream Analytics.
