You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks environments:
The data engineers must share a cluster.
The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databrick clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a High Concurrency cluster for the jobs.
Does this meet the goal?
Yes
No
Answer is No
-Data scientist should have their own cluster and should terminate after 120 mins - STANDARD
-Cluster for Jobs should support scala - STANDARD
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Create a remote service binding pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement.
Does the solution meet the goal?
Yes
No
Answer is No
You need to create an external file format and external table from an external data source, instead from a remote service binding pointing.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one query, and two outputs.
Does this meet the goal?
Yes
No
Answer is No
We need one reference data input for LocationIncomes, which rarely changes.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, two queries, and four outputs.
Does this meet the goal?
Yes
No
Answer is Yes
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, on for in-store customers, and one for online customers. For each query two outputs is needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
A workload for data engineers who will use Python and SQL
A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team at your company identifies the following standards for Databricks environments:
The data engineers must share a cluster.
The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databrick clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers, and a High Concurrency cluster for the jobs.
Does this meet the goal?
Yes
No
Answer is No
We need a High Concurrency cluster for the data engineers and the jobs.
Note:
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
3. Create an external file format and external table using the external data source
4. Load the data using the CREATE TABLE AS SELECT statement
Does the solution meet the goal?
Yes
No
Answer is Yes
It is not necessary to convert the parquet files to CSV files.
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two outputs.
Does this meet the goal?
Yes
No
Answer is No
We need one reference data input for LocationIncomes, which rarely changes.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two outputs.
Yes
No
Answer is No
We need one reference data input for LocationIncomes, which rarely changes
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, two queries, and four outputs.
Does this meet the goal?
Yes
No
Answer is Yes
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, on for in-store customers, and one for online customers.
For each query two outputs is needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.
You are developing a solution that will use Azure Stream Analytics. The solution will accept an Azure Blob storage file named Customers. The file will contain both in-store and online customer details. The online customers will provide a mailing address.
You have a file in Blob storage named LocationIncomes that contains median incomes based on location. The file rarely changes.
You need to use an address to look up a median income based on location. You must output the data to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input, one query, and two outputs.
Does this meet the goal?
Yes
No
Answer is No
We need one reference data input for LocationIncomes, which rarely changes.
We need two queries, on for in-store customers, and one for online customers.
For each query two outputs is needed.
Note: Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly.