A company purchases IoT devices to monitor manufacturing machinery. The company uses an Azure IoT Hub to communicate with the IoT devices.
The company must be able to monitor the devices in real time.
You need to design the solution.
What should you recommend?
Azure Data Factory instance using Azure Portal
Azure Data Factory instance using Azure PowerShell
Azure Stream Analytics cloud job using Azure Portal
Azure Data Factory instance using Microsoft Visual Studio
Answer is Azure Stream Analytics cloud job using Azure Portal
In a real-world scenario, you could have hundreds of these sensors generating events as a stream. Ideally, a gateway device would run code to push these events to Azure Event Hubs or Azure IoT Hub. Your Stream Analytics job would ingest these events from Event Hubs and run real-time analytics queries against the streams. To create a Stream Analytics job: in the Azure portal, select + Create a resource from the left navigation menu, and then select Stream Analytics job under Analytics.
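As a rough sketch of what such a job computes, a Stream Analytics tumbling-window aggregate can be approximated in plain Python. The event shape (timestamp, value) and the 60-second window are assumptions for illustration, not the actual job definition:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Group (timestamp, value) events into fixed, non-overlapping windows
    and average each one - a crude stand-in for a Stream Analytics
    TUMBLINGWINDOW aggregate."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts // window_seconds].append(value)
    return {w * window_seconds: sum(v) / len(v)
            for w, v in sorted(buckets.items())}

# Four sensor readings spanning two one-minute windows
events = [(0, 10.0), (30, 14.0), (65, 20.0), (90, 22.0)]
print(tumbling_window_avg(events))  # -> {0: 12.0, 60: 21.0}
```

In the real service the same aggregation would be written declaratively in the Stream Analytics query language, with IoT Hub as the input and a sink such as Power BI as the output.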
You are designing an anomaly detection solution for streaming data from an Azure IoT hub. The solution must meet the following requirements:
- Send the output to Azure Synapse.
- Identify spikes and dips in time series data.
- Minimize development and configuration effort.
Which should you include in the solution?
Azure Databricks
Azure Stream Analytics
Azure SQL Database
Answer is Azure Stream Analytics
You can identify anomalies by routing data via IoT Hub to a built-in ML model in Azure Stream Analytics.
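Stream Analytics exposes this as the built-in AnomalyDetection_SpikeAndDip function, so no model training is needed. Conceptually, spike-and-dip detection resembles the z-score sketch below; the window size and threshold are illustrative assumptions, not the service's actual algorithm:

```python
import statistics

def spikes_and_dips(values, window=5, threshold=3.0):
    """Flag points whose z-score against the preceding window exceeds the
    threshold - a simplified stand-in for Stream Analytics'
    AnomalyDetection_SpikeAndDip ML function."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history) or 1e-9  # avoid divide-by-zero
        z = (values[i] - mean) / stdev
        if abs(z) > threshold:
            anomalies.append((i, values[i]))
    return anomalies

steady = [10.0, 10.1, 9.9, 10.0, 10.2]
print(spikes_and_dips(steady + [25.0]))  # -> [(5, 25.0)]
```

The point of the question is that Stream Analytics gives you this capability with a single query function, which is why it minimizes development and configuration effort compared with building the same logic in Databricks.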
You are a data engineer. You are designing a Hadoop Distributed File System (HDFS) architecture. You plan to use Microsoft Azure Data Lake as a data storage repository.
You must provision the repository with a resilient data schema. You need to ensure the resiliency of Azure Data Lake Storage. What should you use for each of the three boxes? (A = DataNode, B = NameNode)
A - A - A
A - A - B
A - B - B
B - A - A
B - A - B
B - B - A
B - B - B
Answer is B - A - A
Box 1: NameNode
An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.
Box 2: DataNode
The DataNodes are responsible for serving read and write requests from the file system's clients.
Box 3: DataNode
The DataNodes perform block creation, deletion, and replication upon instruction from the NameNode.
Note: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
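The NameNode's role described above (splitting files into blocks and mapping blocks to DataNodes) can be sketched as a toy model. The block size, replica count, and round-robin placement are simplifying assumptions, not HDFS's actual placement policy:

```python
import itertools

class NameNode:
    """Toy model of HDFS metadata: the NameNode maps each file to its
    blocks, and each block to the DataNodes holding its replicas."""

    def __init__(self, datanodes, block_size=128, replication=3):
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}          # filename -> [(block_id, [replica nodes])]
        self._ids = itertools.count()

    def create_file(self, name, size):
        n_blocks = -(-size // self.block_size)   # ceiling division
        blocks = []
        for _ in range(n_blocks):
            bid = next(self._ids)
            # Round-robin replica placement across the DataNodes
            start = bid % len(self.datanodes)
            replicas = [self.datanodes[(start + r) % len(self.datanodes)]
                        for r in range(self.replication)]
            blocks.append((bid, replicas))
        self.block_map[name] = blocks
        return blocks

nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
print(nn.create_file("/logs/day1", size=300))  # 3 blocks, 3 replicas each
```

The replication factor is what gives HDFS its resiliency: losing one DataNode leaves two replicas of every block it held, and the NameNode instructs surviving DataNodes to re-replicate.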
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb.
You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.
You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.
One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.
SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Alice';
What will be returned by the query?
24
an error
a null value
Answer is an error
Once a database has been created by a Spark job, you can create tables in it with Spark that use Parquet as the storage format. Table names will be converted to lower case and need to be queried using the lower-case name. These tables will immediately become available for querying by any of the Azure Synapse workspace Spark pools. The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in serverless SQL pool. Because the table was registered as myparquettable, querying it as mytestdb.dbo.myParquetTable from the serverless SQL pool fails, so the query returns an error.
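The failure mode can be illustrated with a minimal sketch. The catalog dictionary below is a deliberately simplified model of the synchronized metadata, not the actual Synapse implementation:

```python
# Spark lower-cases table names when it registers them, so the
# synchronized serverless SQL metadata only contains the lower-case name.
spark_catalog = {}

def register_table(name):
    spark_catalog[name.lower()] = "parquet files"

def query(name):
    # Simplified model: the serverless pool resolves the identifier
    # exactly as written, so the original mixed-case name is not found.
    if name not in spark_catalog:
        raise LookupError(f"Invalid object name '{name}'")
    return spark_catalog[name]

register_table("myParquetTable")
print(query("myparquettable"))    # succeeds
# query("myParquetTable")         # would raise LookupError -> "an error"
```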
You are planning the deployment of Azure Data Lake Storage Gen2.
You have the following two reports that will access the data lake:
- Report1: Reads three columns from a file that contains 50 columns.
- Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times.
What should you recommend for each report?
Report1: Parquet - a column-oriented binary file format; reading three of 50 columns touches only those columns' data.
Report2: Avro - a row-based format with a logical timestamp type, well suited to retrieving a single record.
Row-based text formats such as CSV or TSV would not minimize read times for either report.
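The trade-off between the two layouts can be sketched with plain Python data structures; a dict of lists stands in for columnar (Parquet-style) storage and a list of dicts for row-based (Avro-style) storage. This is an illustration of the access patterns only, not of either file format:

```python
def read_columns_columnar(table, wanted):
    """Columnar layout: {column -> list of values}. Reading a subset of
    columns touches only those lists (why Parquet suits Report1)."""
    return {col: table[col] for col in wanted}

def read_record_rowwise(rows, key, value):
    """Row layout: list of dicts. Each record is stored contiguously, so a
    single-record lookup reads whole rows (why Avro suits Report2)."""
    for row in rows:
        if row[key] == value:
            return row
    return None

# 50 columns stored column-by-column; only 3 are read
columnar = {f"col{i}": [i, i * 10] for i in range(50)}
print(read_columns_columnar(columnar, ["col0", "col1", "col2"]))

# Records stored row-by-row; one is fetched by timestamp
rows = [{"ts": 1, "v": "a"}, {"ts": 2, "v": "b"}]
print(read_record_rowwise(rows, "ts", 2))  # -> {'ts': 2, 'v': 'b'}
```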
An application will use Microsoft Azure Cosmos DB as its data solution. The application will use the Cassandra API to support a column-based database type that uses containers to store items.
You need to provision Azure Cosmos DB.
Which container name and item name should you use? Each correct answer presents part of the solution.
collection
rows
graph
entities
table
Answer is rows and table.
Rows, because, depending on which API you use, an Azure Cosmos item can represent either a document in a collection, a row in a table, or a node or edge in a graph. The following table shows the mapping of API-specific entities to an Azure Cosmos item:

Cosmos entity: Azure Cosmos item
SQL API: Document
Cassandra API: Row
Azure Cosmos DB API for MongoDB: Document
Gremlin API: Node or edge
Table API: Item
Table, because an Azure Cosmos container is specialized into API-specific entities, as shown in the following table:

Cosmos entity: Azure Cosmos container
SQL API: Container
Cassandra API: Table
Azure Cosmos DB API for MongoDB: Collection
Gremlin API: Graph
Table API: Table
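The per-API naming can be captured as a lookup table. The mapping below reflects the entity names in the explanation above (per Microsoft's documentation); the dictionary itself is just an illustrative encoding:

```python
# API-specific names for the generic Cosmos "container" and "item" entities.
COSMOS_ENTITIES = {
    "SQL API":       {"container": "Container",  "item": "Document"},
    "Cassandra API": {"container": "Table",      "item": "Row"},
    "MongoDB API":   {"container": "Collection", "item": "Document"},
    "Gremlin API":   {"container": "Graph",      "item": "Node or edge"},
    "Table API":     {"container": "Table",      "item": "Item"},
}

# The pairing the question asks for:
print(COSMOS_ENTITIES["Cassandra API"])  # -> {'container': 'Table', 'item': 'Row'}
```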
You are developing a data engineering solution for a company. The solution will store a large set of key-value pair data by using Microsoft Azure Cosmos DB.
The solution has the following requirements:
Data must be partitioned into multiple containers.
Data containers must be configured separately.
Data must be accessible from applications hosted around the world.
The solution must minimize latency.
You need to provision Azure Cosmos DB. What should you do?
Configure account-level throughput.
Provision an Azure Cosmos DB account with the Azure Table API. Enable geo-redundancy.
Configure table-level throughput.
Replicate the data globally by manually adding regions to the Azure Cosmos DB account.
Provision an Azure Cosmos DB account with the Azure Table API. Enable multi-region writes.
Answer is Provision an Azure Cosmos DB account with the Azure Table API. Enable multi-region writes.
Scale read and write throughput globally. You can enable every region to be writable and elastically scale reads and writes all around the world. The throughput that your application configures on an Azure Cosmos database or a container is guaranteed to be delivered across all regions associated with your Azure Cosmos account. The provisioned throughput is guaranteed by financially backed SLAs.
You want to ensure that there is 99.999% availability for the reading and writing of all your data. How can this be achieved?
By configuring reads and writes of data in a single region.
By configuring reads and writes of data for multi-region accounts with multi-region writes.
By configuring reads and writes of data for multi-region accounts with single-region writes.
Answer is "By configuring reads and writes of data for multi-region accounts with multi region writes."
By configuring reads and writes of data for multi-region accounts with multi region writes, you can achieve 99.999% availability
Question 199
What are the three main advantages to using Cosmos DB?
Cosmos DB offers global distribution capabilities out of the box.
Cosmos DB provides a minimum of 99.99% availability.
Cosmos DB response times of read/write operations are typically in the order of 10s of milliseconds.
All of the above.
Answer is All of the above.
All of the above. Cosmos DB offers global distribution capabilities out of the box, provides a minimum of 99.99% availability, and delivers read/write response times typically in the order of tens of milliseconds.
Question 200
You are a data engineer wanting to make the data that is currently stored in a Table Storage account located in the West US region available globally. Which Cosmos DB model should you migrate to?
Gremlin API
Cassandra API
Table API
Mongo DB API
Answer is Table API
The Table API Cosmos DB model will enable you to provide global availability of your Table storage account data. The Gremlin API is used for graph databases, the Cassandra API is used to store data from Cassandra databases, and the MongoDB API is used to store MongoDB databases.