DP-203: Data Engineering on Microsoft Azure


Question 1

Which role works with Azure Cognitive Services, Cognitive Search, and the Bot Framework?
A data engineer
A data scientist
An AI engineer




Answer is An AI engineer

Artificial intelligence (AI) engineers work with AI services such as Cognitive Services, Cognitive Search, and the Bot Framework.

Data engineers are responsible for the provisioning and configuration of both on-premises and cloud data platform technologies.
Data scientists perform advanced analytics to help drive value from data.

Question 2

Azure Databricks encapsulates which Apache technology?
Apache HDInsight
Apache Hadoop
Apache Spark




Answer is Apache Spark

Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure.

Apache HDInsight does not exist; Azure HDInsight is a fully managed, full-spectrum, open-source analytics service for enterprises. HDInsight is a cloud service that makes it easy, fast, and cost-effective to process massive amounts of data.

Apache Hadoop is the original open-source framework for distributed processing and analysis of big data sets on clusters.

Question 3

Which security feature does Azure Databricks not support?
Azure Active Directory
Shared Access Keys
Role-based access




Answer is Shared Access Keys

Shared Access Keys are a security feature used within Azure storage accounts. Azure Active Directory and Role-based access are supported security features in Azure Databricks.

Question 4

Which of the following Azure Databricks components provides support for R, SQL, Python, Scala, and Java?
MLlib
GraphX
Spark Core API




Answer is Spark Core API

The Spark Core API provides support for R, SQL, Python, Scala, and Java in Azure Databricks.

MLlib is the Machine Learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives.

GraphX provides graphs and graph computation for a broad scope of use cases from cognitive analytics to data exploration.
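As an illustration of that multi-language support, here is a minimal PySpark sketch, assuming it runs in a Databricks notebook where the spark session is provided automatically and the view name is hypothetical; the same work could equally be driven from Scala, Java, R, or SQL.

# Build a small DataFrame with the core DataFrame API (Python binding).
df = spark.range(1, 6).withColumnRenamed("id", "value")

# Expose it to SQL; equivalent code could be written in Scala, Java, or R.
df.createOrReplaceTempView("numbers")
spark.sql("SELECT value, value * value AS squared FROM numbers").show()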

Question 5

Which Notebook format is used in Databricks?
DBC
.notebook
.spark




Answer is DBC

The DBC file type is the supported Databricks notebook format. There is no .notebook or .spark file format available.

Question 6

Which browsers are recommended for best use with Databricks Notebook?
Chrome and Firefox
Microsoft Edge and IE 11
Safari and Microsoft Edge




Answer is Chrome and Firefox

Chrome and Firefox are the browsers recommended by Databricks. Microsoft Edge and IE 11 are not recommended because of faulty rendering of iFrames, although Safari is an acceptable browser.

Question 7

How do you connect your Spark cluster to Azure Blob storage?
By calling the .connect() function on the Spark Cluster.
By mounting it
By calling the .connect() function on the Azure Blob




Answer is By mounting it

By mounting it. Mounts require Azure credentials such as SAS keys and give access to a virtually infinite store for your data. The .connect() function is not a valid method.
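As a hedged illustration only, the sketch below mounts a Blob storage container from a Databricks notebook using dbutils.fs.mount with a SAS token. The storage account, container, secret scope, and mount point names are all hypothetical, and dbutils and spark are provided by the notebook environment.

storage_account = "mystorageaccount"                           # hypothetical account name
container = "mycontainer"                                      # hypothetical container name
sas_token = dbutils.secrets.get(scope="demo", key="blob-sas")  # hypothetical secret scope/key

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
    mount_point="/mnt/demo",
    extra_configs={
        f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net": sas_token
    }
)

# Once mounted, the container behaves like part of the Databricks file system.
display(dbutils.fs.ls("/mnt/demo"))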

Question 8

How does Spark connect to databases like MySQL, Hive and other data stores?
JDBC
ODBC
Using the REST API Layer




Answer is JDBC

JDBC. JDBC stands for Java Database Connectivity and is a Java API for connecting to databases such as MySQL and Hive, as well as other data stores. ODBC is not an option, and the REST API Layer is not available.
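For illustration, a minimal PySpark sketch of a JDBC read follows; the host, database, table, user, and secret names are hypothetical, and the MySQL JDBC driver is assumed to be installed on the cluster.

jdbc_url = "jdbc:mysql://mysql-server.example.com:3306/salesdb"  # hypothetical endpoint

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "orders")                               # hypothetical table
      .option("user", "reader")                                  # hypothetical user
      .option("password", dbutils.secrets.get(scope="demo", key="mysql-pwd"))
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .load())

df.show(5)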

Question 9

How do you specify parameters when reading data?
Using .option() during your read allows you to pass key/value pairs specifying aspects of your read
Using .parameter() during your read allows you to pass key/value pairs specifying aspects of your read
Using .keys() during your read allows you to pass key/value pairs specifying aspects of your read




Answer is "Using .option() during your read allows you to pass key/value pairs specifying aspects of your read"

Using .option() during your read allows you to pass key/value pairs specifying aspects of your read. For instance, options for reading CSV data include header, delimiter, and inferSchema.
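For example, a CSV read using those options might look like the following sketch (the path is hypothetical and spark is the notebook's session):

df = (spark.read
      .option("header", "true")        # first row holds column names
      .option("delimiter", ",")        # field separator
      .option("inferSchema", "true")   # sample the file to derive column types
      .csv("/mnt/demo/sales.csv"))     # hypothetical mounted path

df.printSchema()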

Question 10

By default, how are corrupt records dealt with when using spark.read.json()?
They appear in a column called "_corrupt_record"
They get deleted automatically
They throw an exception and exit the read operation




Answer is "They appear in a column called _corrupt_record"

They appear in a column called "_corrupt_record". They are not deleted automatically, nor do they throw an exception and exit the read operation.
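A minimal sketch of inspecting such records follows; the path is hypothetical, and the column name is controlled by spark.sql.columnNameOfCorruptRecord, which defaults to "_corrupt_record".

df = spark.read.json("/mnt/demo/events.json")   # hypothetical path containing some malformed lines
df.cache()                                      # avoids a Spark restriction on queries that reference only _corrupt_record

bad_rows = df.filter(df["_corrupt_record"].isNotNull())
bad_rows.show(truncate=False)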
