DP-100: Designing and Implementing a Data Science Solution on Azure
Question 191
You are creating a classification model for a banking company to identify possible instances of credit card fraud. You plan to create the model in Azure Machine Learning by using automated machine learning.
The training dataset that you are using is highly unbalanced.
You need to evaluate the classification model.
Which primary metric should you use?
A. normalized_mean_absolute_error
B. AUC_weighted
C. accuracy
D. normalized_root_mean_squared_error
E. spearman_correlation
Answer is AUC_weighted, which is a classification metric.
Note: AUC is the Area under the Receiver Operating Characteristic Curve. Weighted is the arithmetic mean of the score for each class, weighted by the number of true instances in each class.
Incorrect Answers:
A: normalized_mean_absolute_error is a regression metric, not a classification metric.
C: Accuracy is misleading on a highly unbalanced dataset. When comparing approaches to imbalanced classification problems, consider metrics beyond accuracy, such as recall, precision, and AUROC; switching the metric you optimize for during parameter or model selection is often enough to achieve acceptable performance on the minority class.
D: normalized_root_mean_squared_error is a regression metric, not a classification metric.
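Example (a minimal sketch of setting the primary metric through the SDK; the training_data variable, the label column name 'is_fraud', and the compute target are assumptions for illustration):
from azureml.train.automl import AutoMLConfig

# training_data (a TabularDataset), the label column name and the compute target are assumed
automl_config = AutoMLConfig(task='classification',
                             primary_metric='AUC_weighted',   # robust choice for the imbalanced fraud data
                             training_data=training_data,
                             label_column_name='is_fraud',
                             compute_target=compute_target,
                             n_cross_validations=5)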
Question 192
You train a model and register it in your Azure Machine Learning workspace. You are ready to deploy the model as a real-time web service.
You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the deployment fails because an error occurs when the service runs the entry script that is associated with the model deployment.
You need to debug the error by iteratively modifying the code and reloading the service, without requiring a re-deployment of the service for each code update.
What should you do?
Modify the AKS service deployment configuration to enable application insights and re-deploy to AKS.
Create an Azure Container Instances (ACI) web service deployment configuration and deploy the model on ACI.
Add a breakpoint to the first line of the entry script and redeploy the service to AKS.
Create a local web service deployment configuration and deploy the model to a local Docker container.
Register a new version of the model and update the entry script to load the new version of the model from its registered path.
Answer is Create a local web service deployment configuration and deploy the model to a local Docker container.
A local web service deployment lets you debug the entry script iteratively: after you modify the script, you call reload() on the LocalWebservice to pick up the change in seconds, without redeploying the service. The Azure Machine Learning guidance for working around common Docker deployment errors with Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) recommends debugging locally in this way before deploying to a remote target (see the local deployment sketch after the deployment steps below).
The recommended and most up-to-date approach for model deployment is the Model.deploy() API with an Environment object as an input parameter. In this case the service creates a base Docker image for you during the deployment stage and mounts the required models, all in one call. The basic deployment tasks are:
1. Register the model in the workspace model registry.
2. Define Inference Configuration:
a. Create an Environment object based on the dependencies you specify in the environment yaml file, or use one of the curated environments.
b. Create an inference configuration (InferenceConfig object) based on the environment and the scoring script.
3. Deploy the model to Azure Container Instance (ACI) service or to Azure Kubernetes Service (AKS).
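Example (a minimal sketch of the local debugging loop; the registered model name 'fraud-model', the score.py entry script, and the conda file name are assumptions):
from azureml.core import Workspace, Environment, Model
from azureml.core.model import InferenceConfig
from azureml.core.webservice import LocalWebservice

ws = Workspace.from_config()
model = Model(ws, name='fraud-model')                              # assumed registered model name

env = Environment.from_conda_specification('debug-env', 'conda_dependencies.yml')
inference_config = InferenceConfig(entry_script='score.py', environment=env)

# Deploy to a local Docker container instead of AKS
deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, 'local-debug-service', [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# After editing score.py, reload the service without redeploying
service.reload()
print(service.run('{"data": [[0, 1, 2]]}'))                        # sample payload for illustration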
Question 193
You are a data scientist working for a bank and have used Azure ML to train and register a machine learning model that predicts whether a customer is likely to repay a loan.
You want to understand how your model is making its predictions and must be sure that the model does not violate government regulations, such as denying loans based on where an applicant lives.
You need to determine the extent to which each feature in the customer data is influencing predictions.
What should you do?
A. Enable data drift monitoring for the model and its training dataset.
B. Score the model against some test data with known label values and use the results to calculate a confusion matrix.
C. Use the Hyperdrive library to test the model with multiple hyperparameter values.
D. Use the interpretability package to generate an explainer for the model.
E. Add tags to the model registration indicating the names of the features in the training dataset.
Answer is Use the interpretability package to generate an explainer for the model.
When you compute and visualize model explanations, you are not limited to an existing explanation for an automated ML model; you can also generate an explanation for your model against different test data and compute engineered feature importance, which quantifies how much each feature influences the model's predictions.
Incorrect Answers:
A: In the context of machine learning, data drift is the change in model input data that leads to model performance degradation. It is one of the top reasons where model accuracy degrades over time, thus monitoring data drift helps detect model performance issues.
B: A confusion matrix is used to describe the performance of a classification model. Each row displays the instances of the true, or actual class in your dataset, and each column represents the instances of the class that was predicted by the model.
C: Hyperparameters are adjustable parameters you choose for model training that guide the training process. The HyperDrive package helps you automate choosing these parameters.
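Example (a minimal sketch using the interpretability package; model, x_train, x_test, feature_names and class_names are assumed to exist from the training step):
from interpret.ext.blackbox import TabularExplainer

# model, x_train, x_test, feature_names and class_names are assumed from training
explainer = TabularExplainer(model,
                             x_train,
                             features=feature_names,
                             classes=class_names)

# Global explanation: how much each feature influences predictions overall
global_explanation = explainer.explain_global(x_test)
print(global_explanation.get_feature_importance_dict())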
Question 194
You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework.
You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model.
You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score.
Which three actions must you perform?
A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
B. Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
E. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
F. Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.
Answers are A, D, and F.
A and D:
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE
Optimize the runs to maximize "accuracy". Make sure to log this value in your training script.
Note:
primary_metric_name: The name of the primary metric to optimize. It must exactly match the name of the metric logged by the training script.
primary_metric_goal: Either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE; it determines whether the primary metric is maximized or minimized when evaluating the runs.
F: The training script calculates the val_accuracy and logs it as "accuracy", which is used as the primary metric.
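Example (a minimal sketch tying the three actions together; the search space and the estimator variable are assumptions, while the script name comes from the question):
# In bird_classifier_train.py (action F): log validation accuracy under the key 'accuracy'
from azureml.core import Run
run = Run.get_context()
run.log('accuracy', float(val_accuracy))   # val_accuracy is computed earlier in the script

# In the driver code (actions A and D)
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling, choice

param_sampling = RandomParameterSampling({'--learning_rate': choice(0.001, 0.01, 0.1)})  # assumed search space

hyperdrive_config = HyperDriveConfig(estimator=estimator,                 # estimator runs bird_classifier_train.py
                                     hyperparameter_sampling=param_sampling,
                                     primary_metric_name='accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20)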
Question 195
You create a multi-class image classification deep learning model that uses a set of labeled images. You create a script file named train.py that uses the PyTorch 1.3 framework to train the model.
You must run the script by using an estimator. The code must not require any additional Python libraries to be installed in the environment for the estimator. The time required for model training must be minimized.
You need to define the estimator that will be used to run the script.
Which estimator type should you use?
TensorFlow
PyTorch
SKLearn
Estimator
Answer is PyTorch
For PyTorch, TensorFlow and Chainer tasks, Azure Machine Learning provides respective PyTorch, TensorFlow, and Chainer estimators to simplify using these frameworks.
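Example (a minimal sketch of the PyTorch estimator for train.py; the source directory and compute target are assumptions):
from azureml.train.dnn import PyTorch

# The PyTorch estimator pre-installs the framework, so no additional libraries are required
estimator = PyTorch(source_directory='training-scripts',   # assumed folder containing train.py
                    entry_script='train.py',
                    compute_target=gpu_cluster,             # assumed GPU compute target
                    framework_version='1.3',
                    use_gpu=True)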
Question 196
You are a lead data scientist for a project that tracks the health and migration of birds. You create a multi-class image classification deep learning model that uses a set of labeled bird photographs collected by experts.
You have 100,000 photographs of birds. All photographs use the JPG format and are stored in an Azure blob container in an Azure subscription.
You need to access the bird photograph files in the Azure blob container from the Azure Machine Learning service workspace that will be used for deep learning model training. You must minimize data movement.
What should you do?
Create an Azure Data Lake store and move the bird photographs to the store.
Create an Azure Cosmos DB database and attach the Azure Blob storage containing the bird photographs to the database.
Create and register a dataset by using the TabularDataset class that references the Azure blob storage containing the bird photographs.
Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service.
Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
Answer is Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service.
We recommend creating a datastore for an Azure Blob container. When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace.
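Example (a minimal sketch of registering the existing blob container as a datastore so no data is moved; the container name, storage account, and key are placeholders):
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register the existing blob container that already holds the photographs
bird_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name='bird_photos',
                                                         container_name='bird-images',       # assumed container name
                                                         account_name='birdstorageaccount',  # assumed storage account
                                                         account_key='<storage-account-key>')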
Question 197
You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use this dataset in a training script.
You create a variable that references the dataset using the following code:
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the training_data dataset.
Answer is inputs = [training_ds.as_named_input('training_ds')], which sets the estimator's inputs property.
Example:
# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")

# Create an estimator that uses the remote compute and passes the dataset as an input
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],
                          compute_target=cpu_cluster,
                          conda_packages=['pandas', 'ipykernel', 'matplotlib'],
                          pip_packages=['azureml-sdk', 'argparse', 'pyarrow'],
                          entry_script='diabetes_training.py')
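Inside the entry script, the named input can then be retrieved from the run context; a minimal sketch (the input name 'diabetes' matches the example above):
from azureml.core import Run

run = Run.get_context()
dataset = run.input_datasets['diabetes']   # named input defined on the estimator
df = dataset.to_pandas_dataframe()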
Question 198
You are creating a new Azure Machine Learning pipeline using the designer.
The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file.
You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort.
Which module should you add to the pipeline in Designer?
Convert to CSV
Enter Data Manually
Import Data
Dataset
Answer is Dataset
The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to data that lives in or is accessible from a datastore or at a web URL. The Dataset class is abstract, so you will create an instance of either a FileDataset (referring to one or more files) or a TabularDataset that's created from one or more files with delimited columns of data.
Example:
from azureml.core import Dataset
iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-dataset/iris.csv')])
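Because the CSV in this question is published on a website, the same factory method also accepts a web URL directly; a minimal sketch with a placeholder URL:
from azureml.core import Dataset

# Placeholder URL for illustration
web_csv_dataset = Dataset.Tabular.from_delimited_files(path='https://example.com/data/training-data.csv')
web_csv_dataset = web_csv_dataset.register(workspace=ws, name='training-data-csv')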
Question 199
You have a comma-separated values (CSV) file containing data from which you want to train a classification model.
You are using the Automated Machine Learning interface in Azure Machine Learning studio to train the classification model. You set the task type to Classification.
You need to ensure that the Automated Machine Learning process evaluates only linear models.
What should you do?
Add all algorithms other than linear ones to the blocked algorithms list.
Set the Exit criterion option to a metric score threshold.
Clear the option to perform automatic featurization.
Clear the option to enable deep learning.
Set the task type to Regression.
Answer is Add all algorithms other than linear ones to the blocked algorithms list.
Automated Machine Learning tries a range of algorithms for the selected task type; blocking every non-linear algorithm restricts the search to linear models only. Automatic featurization controls preprocessing and feature engineering, not which algorithm families are trained, so clearing it would not prevent non-linear models from being evaluated.
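In the SDK the equivalent setting is the blocked_models parameter of AutoMLConfig; the sketch below blocks a few common non-linear algorithms (the exact model name strings available depend on the SDK version, so treat the list as illustrative):
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='classification',
                             training_data=training_data,            # assumed TabularDataset
                             label_column_name='label',              # assumed label column
                             # Block non-linear algorithms so only linear models are evaluated
                             blocked_models=['LightGBM', 'XGBoostClassifier',
                                             'RandomForest', 'ExtremeRandomTrees',
                                             'GradientBoosting', 'KNN', 'DecisionTree'])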
Question 200
You create a deep learning model for image recognition on Azure Machine Learning service using GPU-based training.
You must deploy the model to a context that allows for real-time GPU-based inferencing.
You need to configure compute resources for model inferencing.
Which compute type should you use?
Azure Container Instance
Azure Kubernetes Service
Field Programmable Gate Array
Machine Learning Compute
Answer is Azure Kubernetes Service
You can use Azure Machine Learning to deploy a GPU-enabled model as a web service. Deploying a model on Azure Kubernetes Service (AKS) is one option.
The AKS cluster provides a GPU resource that is used by the model for inference.
Inference, or model scoring, is the phase where the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages on highly parallelizable computation.
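Example (a minimal sketch of provisioning a GPU-enabled AKS cluster and deploying to it; the VM size, cluster name, model, and inference configuration are assumptions):
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import AksWebservice
from azureml.core.model import Model

# Provision an AKS cluster with GPU nodes (assumed VM size)
prov_config = AksCompute.provisioning_configuration(vm_size='Standard_NC6')
aks_target = ComputeTarget.create(workspace=ws, name='gpu-aks', provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)

# Request a GPU for the web service so inference runs on the GPU
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=4, gpu_cores=1)
service = Model.deploy(ws, 'gpu-image-recognition', [model], inference_config, deployment_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)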