DP-203: Data Engineering on Microsoft Azure

Question 161

You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.

Which input type should you use for the reference data?
Azure Cosmos DB
Azure Event Hubs
Azure Blob storage
Azure IoT Hub




Answer is Azure Blob storage

Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for Reference Data.
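For example, a Blob-backed reference input can be joined to the streaming input like a regular table. A minimal sketch (the input aliases SensorStream and DeviceReference are hypothetical names configured on the job):

```sql
-- Join the streaming input to a Blob-backed reference input.
-- Reference data joins need no DATEDIFF condition, unlike stream-to-stream joins.
SELECT
    s.DeviceId,
    s.Reading,
    r.DeviceName
FROM
    SensorStream s TIMESTAMP BY EventTime
JOIN
    DeviceReference r
    ON s.DeviceId = r.DeviceId
```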

References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data

Question 162

You develop data engineering solutions for a company.
You need to ingest and visualize real-time Twitter data by using Microsoft Azure.

Which three technologies should you use? Each correct answer presents part of the solution.
Event Grid topic
Azure Stream Analytics Job that queries Twitter data from an Event Hub
Azure Stream Analytics Job that queries Twitter data from an Event Grid
Logic App that sends Twitter posts which have target keywords to Azure
Event Grid subscription
Event Hub instance




Answers are:
Azure Stream Analytics Job that queries Twitter data from an Event Hub
Logic App that sends Twitter posts which have target keywords to Azure
Event Hub instance


You can use an Azure Logic App to send tweets to an event hub, and then use a Stream Analytics job to read from the event hub and send the results to Power BI.
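The Stream Analytics step of that pipeline could be as simple as the following sketch (the input alias TwitterStream and output alias PowerBIOutput are hypothetical names configured on the job):

```sql
-- Count tweets per keyword in one-minute tumbling windows and write to Power BI.
SELECT
    Keyword,
    COUNT(*) AS TweetCount
INTO
    PowerBIOutput
FROM
    TwitterStream TIMESTAMP BY CreatedAt
GROUP BY
    Keyword,
    TumblingWindow(minute, 1)
```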

References:
https://community.powerbi.com/t5/Integrations-with-Files-and/Twitter-streaming-analytics-step-by-step/td-p/9594

Question 163

You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
You need to reduce the latency.

Which two actions should you perform?
Add a pass-through query.
Add a temporal analytic function.
Scale out the query by using PARTITION BY.
Convert the query to a reference query.
Increase the number of streaming units.




Answers are:
Scale out the query by using PARTITION BY.
Increase the number of streaming units.


Scaling a Stream Analytics job takes advantage of partitions in the input or output. Partitioning lets you divide data into subsets based on a partition key. A process that consumes the data (such as a Streaming Analytics job) can consume and write different partitions in parallel, which increases throughput.

Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your job. This capacity lets you focus on the query logic and abstracts the need to manage the hardware to run your Stream Analytics job in a timely manner.
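A partitioned version of the clusterID query might look like this sketch (assuming the input is partitioned; PartitionId is the built-in partition column):

```sql
-- Each input partition is read and aggregated independently, so the work
-- can spread across the allocated Streaming Units.
SELECT
    clusterID,
    COUNT(*) AS EventCount
FROM
    Input TIMESTAMP BY EventTime
    PARTITION BY PartitionId
GROUP BY
    PartitionId,
    clusterID,
    TumblingWindow(minute, 1)
```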

References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption

Question 164

You are monitoring an Azure Stream Analytics job.
You discover that the Backlogged Input Events metric is increasing slowly and is consistently non-zero.
You need to ensure that the job can handle all the events.

What should you do?
Change the compatibility level of the Stream Analytics job.
Increase the number of streaming units (SUs).
Create an additional output stream for the existing input stream.
Remove any named consumer groups from the connection and use $default.




Answer is Increase the number of streaming units (SUs).

Backlogged Input Events: the number of input events that are backlogged. A non-zero value for this metric implies that the job isn't able to keep up with the number of incoming events. If the value is slowly increasing or consistently non-zero, you should scale out the job by increasing the number of Streaming Units.

Note: Streaming Units (SUs) represents the computing resources that are allocated to execute a Stream Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your job.

Reference:
https://docs.microsoft.com/bs-cyrl-ba/azure/stream-analytics/stream-analytics-monitoring

Question 165

A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.

Which two actions should you perform?
Implement event ordering
Scale the SU count for the job up
Implement Azure Stream Analytics user-defined functions (UDF)
Scale the SU count for the job down
Implement query parallelization by partitioning the data output
Implement query parallelization by partitioning the data input




Answers are:
Scale the SU count for the job up
Implement query parallelization by partitioning the data input


Scale out the query by allowing the system to process each input partition separately.
A Stream Analytics job definition includes inputs, a query, and output. Inputs are where the job reads the data stream from.

References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization

Question 166

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes.

Which windowing function should you use?
a five-minute Sliding window
a five-minute Session window
a five-minute Tumbling window
a five-minute Hopping window that has a one-minute hop




Answer is a five-minute Tumbling window

Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
SELECT
   TimeZone,
   COUNT(*) AS Count
FROM
   TwitterStream TIMESTAMP BY CreatedAt
GROUP BY
   TimeZone,
   TumblingWindow(second, 10)

Incorrect Answers:
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
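As a sketch, a Hopping window that is 10 seconds long but advances every 5 seconds (stream and column names as in the Tumbling example above):

```sql
-- HoppingWindow(unit, windowsize, hopsize): windows overlap, so a single
-- event can appear in more than one result set.
SELECT
    TimeZone,
    COUNT(*) AS Count
FROM
    TwitterStream TIMESTAMP BY CreatedAt
GROUP BY
    TimeZone,
    HoppingWindow(second, 10, 5)
```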

Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

Question 167

You need to implement complex stateful business logic within an Azure Stream Analytics service.
Which type of function should you create in the Stream Analytics topology?
JavaScript user-defined functions (UDFs)
Azure Machine Learning
JavaScript user-defined aggregates (UDA)




Answer is JavaScript user-defined aggregates (UDA)

Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which enable you to implement complex stateful business logic. Within a UDA you have full control of the state data structure, state accumulation, state decumulation, and aggregate result computation.
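Once the JavaScript UDA is registered under the job's Functions, it is invoked from the query with the uda prefix. A minimal sketch (the function name medianDelay and the column names are hypothetical):

```sql
-- Call a JavaScript user-defined aggregate inside a windowed aggregation.
SELECT
    DeviceId,
    uda.medianDelay(DelayInSeconds) AS MedianDelay
FROM
    Input TIMESTAMP BY EventTime
GROUP BY
    DeviceId,
    TumblingWindow(minute, 5)
```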

References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-javascript-user-defined-aggregates

Question 168

A company plans to analyze a continuous flow of data from a social media platform by using Microsoft Azure Stream Analytics. The incoming data is formatted as one record per row. You need to create the input stream. How should you complete the REST API segment?
A - A
A - B
A - C
B - C
B - A
C - B
C - A
C - C




Answer is A - B

Box 1: CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record.
JSON and AVRO are not formatted as one record per row.

Box 2: "type":"Microsoft.ServiceBus/EventHub",
Properties include "EventHubName"
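Put together, the completed input definition would resemble this sketch (a config fragment only; the namespace, hub, and policy values are placeholders, not real names):

```json
{
  "properties": {
    "type": "Stream",
    "serialization": {
      "type": "Csv",
      "properties": { "fieldDelimiter": ",", "encoding": "UTF8" }
    },
    "datasource": {
      "type": "Microsoft.ServiceBus/EventHub",
      "properties": {
        "serviceBusNamespace": "<namespace>",
        "eventHubName": "<event-hub-name>",
        "sharedAccessPolicyName": "<policy-name>",
        "sharedAccessPolicyKey": "<policy-key>"
      }
    }
  }
}
```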

References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-inputs
https://en.wikipedia.org/wiki/Comma-separated_values

Question 169

You are implementing Azure Stream Analytics functions.
Which windowing function should you use for each requirement?
A-B-C
B-A-A
C-D-B
D-A-C
A-C-D
B-C-D
C-D-A
D-A-B




Answer is D-A-C

Box 1: Tumbling
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.

Box 2: Hopping
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.

Box 3: Sliding
Sliding window functions, unlike Tumbling or Hopping windows, produce an output only when an event occurs. Every window will have at least one event and the window continuously moves forward by an ε (epsilon). Like hopping windows, events can belong to more than one sliding window.
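A Sliding window sketch in the same style as the other examples (stream and column names hypothetical):

```sql
-- SlidingWindow(second, 10): output is produced only when the set of events
-- inside the trailing 10-second window changes.
SELECT
    Topic,
    COUNT(*) AS TweetCount
FROM
    TwitterStream TIMESTAMP BY CreatedAt
GROUP BY
    Topic,
    SlidingWindow(second, 10)
```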


References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

Question 170

You are processing streaming data from vehicles that pass through a toll booth.
You need to use Azure Stream Analytics to return the license plate, vehicle make, and hour the last vehicle passed during each 10-minute window.

How should you complete the query?
Check the answer section




WITH LastInWindow AS
(
   SELECT
      MAX(Time) AS LastEventTime
   FROM
      Input TIMESTAMP BY Time
   GROUP BY
      TumblingWindow(minute, 10)
)
SELECT
   Input.License_plate,
   Input.Make,
   Input.Time
FROM
   Input TIMESTAMP BY Time
INNER JOIN LastInWindow
   ON     DATEDIFF(minute, Input, LastInWindow) BETWEEN 0 AND 10
      AND Input.Time = LastInWindow.LastEventTime

Box 1: MAX
MAX returns the maximum value of the expression, here the latest event time (LastEventTime) within each window.

Box 2: TumblingWindow
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.

Box 3: DATEDIFF
DATEDIFF is a date-specific function that compares and returns the time difference between two DateTime fields, for more information, refer to date functions.

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
