You develop data engineering solutions for a company.
You need to ingest and visualize real-time Twitter data by using Microsoft Azure.
Which three technologies should you use? Each correct answer presents part of the solution.
Event Grid topic
Azure Stream Analytics Job that queries Twitter data from an Event Hub
Azure Stream Analytics Job that queries Twitter data from an Event Grid
Logic App that sends Twitter posts which have target keywords to Azure
Event Grid subscription
Event Hub instance
Answers are:
Azure Stream Analytics Job that queries Twitter data from an Event Hub
Logic App that sends Twitter posts which have target keywords to Azure
Event Hub instance
You can use Azure Logic Apps to send tweets to an event hub, and then use a Stream Analytics job to read from the event hub and send the data to Power BI.
You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
You need to reduce the latency.
Which two actions should you perform?
Add a pass-through query.
Add a temporal analytic function.
Scale out the query by using PARTITION BY.
Convert the query to a reference query.
Increase the number of streaming units.
Answers are:
Scale out the query by using PARTITION BY.
Increase the number of streaming units.
Scaling a Stream Analytics job takes advantage of partitions in the input or output. Partitioning lets you divide data into subsets based on a partition key. A process that consumes the data (such as a Stream Analytics job) can consume and write different partitions in parallel, which increases throughput.
Streaming Units (SUs) represent the computing resources that are allocated to execute a Stream Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your job. This capacity lets you focus on the query logic and abstracts away the need to manage the hardware that runs your Stream Analytics job in a timely manner.
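The partitioning idea above can be sketched in plain Python. This is only an illustration of why a partition key allows parallel work, not how Stream Analytics is implemented; the function names, the hash-based routing, and the sample events are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_events(events, key, partition_count):
    """Route each event to a partition by hashing its key (hypothetical scheme)."""
    partitions = defaultdict(list)
    for event in events:
        partitions[hash(event[key]) % partition_count].append(event)
    return partitions


def count_by_cluster(events):
    """Per-partition work: count events for each clusterID."""
    return Counter(event["clusterID"] for event in events)


def parallel_counts(events, partition_count=4):
    """Process every partition independently, then merge the partial results."""
    partitions = partition_events(events, "clusterID", partition_count)
    with ThreadPoolExecutor(max_workers=partition_count) as pool:
        partial_results = list(pool.map(count_by_cluster, partitions.values()))
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged


# Made-up sample data: 100 events spread over 10 cluster IDs.
events = [{"clusterID": i % 10, "value": i} for i in range(100)]
totals = parallel_counts(events)
```

Because every event with a given clusterID lands in the same partition, the partial counts never overlap and can be merged without coordination between workers.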
You are monitoring an Azure Stream Analytics job.
You discover that the Backlogged Input Events metric is increasing slowly and is consistently non-zero.
You need to ensure that the job can handle all the events.
What should you do?
Change the compatibility level of the Stream Analytics job.
Increase the number of streaming units (SUs).
Create an additional output stream for the existing input stream.
Remove any named consumer groups from the connection and use $default.
Answer is Increase the number of streaming units (SUs).
Backlogged Input Events: Number of input events that are backlogged. A non-zero value for this metric implies that your job isn't able to keep up with the number of incoming events. If this value is slowly increasing or consistently non-zero, you should scale out your job by increasing the Streaming Units.
Note: Streaming Units (SUs) represent the computing resources that are allocated to execute a Stream Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your job.
A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.
Which two actions should you perform?
Scale the SU count for the job up
Implement query parallelization by partitioning the data output
Implement query parallelization by partitioning the data input
Answers are:
Scale the SU count for the job up
Implement query parallelization by partitioning the data input
Scale out the query by allowing the system to process each input partition separately.
A Stream Analytics job definition includes inputs, a query, and output. Inputs are where the job reads the data stream from.
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes.
Which windowing function should you use?
a five-minute Sliding window
a five-minute Session window
a five-minute Tumbling window
a five-minute Hopping window that has a one-minute hop
Answer is a five-minute Tumbling window
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
SELECT
    TimeZone,
    COUNT(*) AS Count
FROM TwitterStream
    TIMESTAMP BY CreatedAt
GROUP BY
    TimeZone,
    TumblingWindow(second, 10)
Incorrect Answers:
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
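The difference between the two window types can be checked with a small Python simulation. It is a sketch under stated assumptions, not Stream Analytics code: timestamps are plain integers in seconds, each window is identified by its end time, and a window is taken to include events after its start up to and including its end. The function names and sample timestamps are made up for the example.

```python
from collections import Counter


def tumbling_counts(timestamps, size):
    """Count events per tumbling window, keyed by window end time."""
    counts = Counter()
    for t in timestamps:
        end = -(-t // size) * size  # round t up to the next multiple of size
        counts[end] += 1
    return dict(counts)


def hopping_counts(timestamps, size, hop):
    """Count events per hopping window; an event may fall in several windows."""
    counts = Counter()
    for t in timestamps:
        end = -(-t // hop) * hop  # first window end at or after t
        while end - size < t:     # window (end - size, end] still contains t
            counts[end] += 1
            end += hop
    return dict(counts)


# Made-up event times in seconds.
timestamps = [10, 70, 290, 310]

five_minute_tumbling = tumbling_counts(timestamps, 300)
hop_equals_size = hopping_counts(timestamps, 300, 300)
one_minute_hop = hopping_counts(timestamps, 300, 60)
```

With a hop equal to the window size, the hopping result matches the tumbling result. With a one-minute hop, each event is counted in five overlapping windows, so the job would emit a result every minute rather than every five minutes.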
You need to implement complex stateful business logic within an Azure Stream Analytics service.
Which type of function should you create in the Stream Analytics topology?
JavaScript user-defined functions (UDFs)
Azure Machine Learning
JavaScript user-defined aggregates (UDA)
Answer is JavaScript user-defined aggregates (UDA)
Azure Stream Analytics supports user-defined aggregates (UDA) written in JavaScript, which enable you to implement complex stateful business logic. Within a UDA you have full control of the state data structure, state accumulation, state decumulation, and aggregate result computation.
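The stateful pattern described above can be sketched as follows. A real UDA is written in JavaScript; this Python class only mirrors the four responsibilities named in the explanation (state structure, accumulation, decumulation, result computation). The class and method names are hypothetical and are not the Stream Analytics API.

```python
class AverageAggregate:
    """Illustrative stateful aggregate that keeps a running average."""

    def __init__(self):
        self.init()

    def init(self):
        # State data structure: a running total and an event count.
        self.total = 0.0
        self.count = 0

    def accumulate(self, value):
        # State accumulation: an event enters the window.
        self.total += value
        self.count += 1

    def deaccumulate(self, value):
        # State decumulation: an event leaves the window.
        self.total -= value
        self.count -= 1

    def compute_result(self):
        # Aggregate result computation from the current state.
        return self.total / self.count if self.count else None


aggregate = AverageAggregate()
for reading in (10, 20, 30):
    aggregate.accumulate(reading)
average_of_three = aggregate.compute_result()

aggregate.deaccumulate(10)  # the oldest reading slides out of the window
average_after_removal = aggregate.compute_result()
```

A plain user-defined function has no such state between calls, which is why the aggregate form is the one suited to stateful logic.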
A company plans to analyze a continuous flow of data from a social media platform by using Microsoft Azure Stream Analytics. The incoming data is formatted as one record per row.
You need to create the input stream.
How should you complete the REST API segment?
A - A
A - B
A - C
B - C
B - A
C - B
C - A
C - C
Answer is A - B
Box 1: CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record.
JSON and AVRO are not formatted as one record per row.
Box 2: "type":"Microsoft.ServiceBus/EventHub",
Properties include "EventHubName"
You are implementing Azure Stream Analytics functions.
Which windowing function should you use for each requirement?
A-B-C
B-A-A
C-D-B
D-A-C
A-C-D
B-C-D
C-D-A
D-A-B
Answer is D-A-C
Box 1: Tumbling
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
Box 2: Hopping
Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
Box 3: Sliding
Sliding window functions, unlike Tumbling or Hopping windows, produce an output only when an event occurs. Every window will have at least one event and the window continuously moves forward by an ε (epsilon). Like hopping windows, events can belong to more than one sliding window.
You are processing streaming data from vehicles that pass through a toll booth.
You need to use Azure Stream Analytics to return the license plate, vehicle make, and hour the last vehicle passed during each 10-minute window.
How should you complete the query?
Check the answer section
WITH LastInWindow AS
(
    SELECT
        MAX(Time) AS LastEventTime
    FROM
        Input TIMESTAMP BY Time
    GROUP BY
        TumblingWindow(minute, 10)
)
SELECT
    Input.License_plate,
    Input.Make,
    Input.Time
FROM
    Input TIMESTAMP BY Time
    INNER JOIN LastInWindow
    ON DATEDIFF(minute, Input, LastInWindow) BETWEEN 0 AND 10
    AND Input.Time = LastInWindow.LastEventTime
Box 2: TumblingWindow
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
Box 3: DATEDIFF
DATEDIFF is a date-specific function that compares two DateTime fields and returns the time difference between them.
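The logic of the query (find the latest timestamp in each 10-minute tumbling window, then return the matching event) can be traced with a short Python sketch. It assumes windows aligned to the Unix epoch that include their end time; the helper names and the sample vehicles are invented for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_end(timestamp, size):
    """End of the tumbling window that contains the timestamp."""
    elapsed = timestamp - EPOCH
    windows = -(-elapsed // size)  # ceiling division on timedeltas
    return EPOCH + windows * size


def last_in_each_window(events, size=timedelta(minutes=10)):
    """Return the latest event of every window, ordered by window end."""
    last = {}
    for event in events:
        end = window_end(event["Time"], size)
        if end not in last or event["Time"] > last[end]["Time"]:
            last[end] = event
    return [last[end] for end in sorted(last)]


# Made-up toll booth events.
events = [
    {"License_plate": "ABC123", "Make": "Toyota", "Time": datetime(2024, 1, 1, 12, 1)},
    {"License_plate": "DEF456", "Make": "Honda", "Time": datetime(2024, 1, 1, 12, 9)},
    {"License_plate": "GHI789", "Make": "Ford", "Time": datetime(2024, 1, 1, 12, 14)},
]
last_vehicles = last_in_each_window(events)
```

One difference from the query: the join returns every event that shares the latest timestamp in a window, whereas this sketch keeps only one event per window.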