DP-100: Designing and Implementing a Data Science Solution on Azure Certification Dump Questions Answers Examples

DP-100: Designing and Implementing a Data Science Solution on Azure


Question 131

You build a binary classification model using the Azure Machine Learning Studio Two-Class Neural Network module.
You are preparing to configure the Tune Model Hyperparameters module for the purpose of tuning accuracy for the model.

Which of the following are valid parameters for the Two-Class Neural Network module?

Depth of the tree

Random number seed

Optimization tolerance

The initial learning weights diameter

Lambda

Number of learning iterations

Project to the unit-sphere

Answers are:
Random number seed
The initial learning weights diameter
Number of learning iterations

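According to Microsoft's Azure documentation for the Two-Class Neural Network component, Random number seed, The initial learning weights diameter, and Number of learning iterations are all configurable parameters. Depth of the tree, Optimization tolerance, Lambda, and Project to the unit-sphere are not among that component's settings; they belong to other learners, such as tree-based, logistic regression, and support vector machine modules. Reference: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/two-class-neural-network?view=azureml-api-2#how-to-configure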

Question 132

You are in the process of constructing a deep convolutional neural network (CNN). The CNN will be used for image classification.
You notice that the CNN model you constructed displays hints of overfitting.
You want to make sure that overfitting is minimized and that the model converges to an optimal fit.

Which of the following is TRUE with regards to achieving your goal?

You have to add an additional dense layer with 512 input units, and reduce the amount of training data.

You have to add L1/L2 regularization, and reduce the amount of training data.

You have to reduce the amount of training data and make use of training data augmentation.

You have to add L1/L2 regularization, and make use of training data augmentation.

You have to add an additional dense layer with 512 input units, and add L1/L2 regularization.

Answer is You have to add L1/L2 regularization, and make use of training data augmentation.

When a deep CNN model displays hints of overfitting, it means that the model is too complex and has learned to fit the training data too closely. One way to minimize overfitting is to add regularization to the model, which adds a penalty term to the loss function, encouraging the model to choose simpler solutions.

L1/L2 regularization adds a penalty term to the loss function that discourages the model from using large weights in the network. This has the effect of reducing the complexity of the model and can help prevent overfitting.

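Data augmentation is another effective technique to minimize overfitting. It involves applying random transformations to the training data, such as random rotations or translations, to create new training examples that are similar to the original ones. This helps the model to generalize better to unseen data. Reducing the amount of training data would make overfitting worse, and an additional dense layer adds capacity to the model rather than removing it, which rules out the other options.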
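As a rough illustration of the two techniques, the sketch below adds an L2 penalty to a loss value and creates an augmented copy of an image by flipping it horizontally. It is plain Python for readability, not the CNN from the question; the function names, the `lam` strength, and the toy image are assumptions made for this example.

```python
import random


def l2_penalty(weights, lam):
    """Return lam * sum(w^2), the L2 term added to the training loss."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=0.01):
    """Total loss = data loss + L2 penalty, so large weights cost more."""
    return data_loss + l2_penalty(weights, lam)


def random_horizontal_flip(image, rng, p=0.5):
    """Augmentation: mirror every row of a 2-D image with probability p."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


# Hypothetical weights and data loss, chosen only for illustration.
weights = [0.5, -2.0, 1.5]
total = regularized_loss(data_loss=0.30, weights=weights, lam=0.01)

# A toy 2x3 "image"; p=1.0 forces the flip so the effect is visible.
image = [[1, 2, 3],
         [4, 5, 6]]
flipped = random_horizontal_flip(image, random.Random(0), p=1.0)
```

In a deep learning framework the same ideas are normally expressed through built-in weight regularizers and image augmentation layers rather than hand-written helpers like these.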

Reference:
https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d

Question 133

You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM).

What should you recommend?

Rattle

TensorFlow

Weka

Scikit-learn

Answer is TensorFlow

TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to provide a convenient front-end API for building applications with the framework.
TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations.

Incorrect Answers:
Rattle is an R-based analytical tool that gets you started with data analytics and machine learning.
Weka is visual data mining and machine learning software written in Java.
Scikit-learn is one of the most useful libraries for machine learning in Python. It is built on NumPy, SciPy and matplotlib, and contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, but it is not a deep learning framework.

Reference:
https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html

Question 134

You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.

Which module should you use?

Split Data

Load Trained Model

Assign Data to Clusters

Group Data into Bins

Answer is Split Data

The Split Data module in Azure Machine Learning Studio can be used to divide a dataset into two distinct datasets. The module allows you to specify the fraction of the dataset that you want to include in each of the two datasets.

The other modules are not used to divide data into two distinct datasets. The Load Trained Model module loads a trained machine learning model into Azure Machine Learning Studio. The Assign Data to Clusters module assigns data points to clusters. The Group Data into Bins module groups data points into bins.
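The idea behind a fractional split can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the Split Data module itself; the function name, the 0.7 fraction, and the seed are assumptions chosen for the example.

```python
import random


def split_data(rows, fraction=0.7, seed=42):
    """Randomly split rows into two lists; the first gets `fraction` of them."""
    shuffled = list(rows)
    # A fixed seed makes the random split repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))
first, second = split_data(rows, fraction=0.7, seed=42)
```

Every row lands in exactly one of the two outputs, which is what makes the resulting datasets distinct.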

Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data

Question 135

You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.

Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.

Does the solution meet the goal?

Yes

No

Answer is Yes

"You need to analyze a full dataset" means that you cannot drop rows or columns. Replacing missing values with the column median may change the cardinality of a column, but dimensionality changes only when feature columns are added or removed. Median replacement is therefore a valid method in this case, and the answer is "Yes".


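For example, if each line below lists the values of one column and a dash marks a missing value:

1 | 5 | - | - | 7 | 3 |
- | 0 | 2 | 2 | 7 | 4 |
2 | 6 | 9 | - | 2 | - |
3 | - | - | - | 7 | - |

replacing each missing value with the median of its column would lead to

1 | 5 | 4 | 4 | 7 | 3 |
2 | 0 | 2 | 2 | 7 | 4 |
2 | 6 | 9 | 4 | 2 | 4 |
3 | 5 | 5 | 5 | 7 | 5 |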

Hence
- missing data cleaned
- dimensionality preserved
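The example above can be reproduced with a short sketch using Python's standard `statistics` module, treating each line of the example as the values of one column. The helper name `fill_with_median` is an assumption made for this illustration.

```python
from statistics import median


def fill_with_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


# Each inner list is one column from the example; None marks a missing value.
columns = [
    [1, 5, None, None, 7, 3],
    [None, 0, 2, 2, 7, 4],
    [2, 6, 9, None, 2, None],
    [3, None, None, None, 7, None],
]
cleaned = [fill_with_median(col) for col in columns]
```

The number of columns and the number of values per column are unchanged; only the missing entries are filled in.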

Question 136

You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation.
You have already configured a k parameter as the number of splits. You now have to configure the k parameter for the cross-validation with the usual value choice.

Recommendation: You configure the use of the value k=3.

Will the requirements be satisfied?

Yes

No

Answer is No

k=3 is a valid setting for k-fold cross-validation, but it is not the usual choice. The values most commonly used are k=5 and k=10, with k=10 the typical default, because they tend to give error estimates with neither excessive bias nor excessive variance.

With k=3, each model is trained on only two thirds of the sample, so the performance estimate tends to be pessimistic, especially when the data sample is small. A larger value of k leaves more data for training in each fold, at the cost of more training runs.

The best value still depends on the size and characteristics of the data sample, but because the question asks for the usual value, k=3 does not satisfy the requirements.
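To make the role of k concrete, here is a minimal sketch of how k-fold cross-validation partitions sample indices, with k=10 as the default. The function name and the sample count of 25 are assumptions for illustration, and the folds are contiguous; real tooling usually shuffles the data first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample is used for testing exactly once and for training in the remaining k-1 folds, so a larger k means each model trains on a larger share of the data.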

Question 137

You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for precision to be used as the measurement for the assessment.

Which of the following actions should you take?

You should consider using Venn diagram visualization.

You should consider using Receiver Operating Characteristic (ROC) curve visualization.

You should consider using Box plot visualization.

You should consider using the Binary classification confusion matrix visualization.

Answer is You should consider using the Binary classification confusion matrix visualization.

Precision can be calculated directly from the counts in a binary classification confusion matrix: precision = TP / (TP + FP). An ROC curve cannot be used to visualize precision, because its y-axis is the true positive rate, which is recall, not precision. A precision-recall (PR) curve also visualizes precision, but it is not among the options.
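The relationship between the confusion matrix and precision can be sketched as follows. The labels are made-up toy data and the function names are assumptions for this example; 1 is treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found (the ROC y-axis)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Precision uses only the predicted-positive cells of the matrix (TP and FP), while recall uses the actual-positive cells (TP and FN).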

Reference:
https://builtin.com/data-science/precision-and-recall
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#confusion-matrix

Question 138

You have been tasked with ascertaining if two sets of data differ considerably. You will make use of Azure Machine Learning Studio to complete your task. You plan to perform a paired t-test.

Which of the following are conditions that must apply to use a paired t-test? (Choose all that apply.)

All scores are independent from each other.

You have matched pairs of scores.

The sampling distribution of d is normal.

The sampling distribution of x1- x2 is normal.

Answers are:
You have matched pairs of scores.
The sampling distribution of d is normal.

Choose a paired t-test when these conditions apply:
1. You have matched pairs of scores. For example, you might have two different measures per person, or matched pairs of individuals (such as a husband and wife).
2. Each pair of scores is independent of every other pair.
3. The sampling distribution of d, the difference between the paired scores, is normal.

"All scores are independent from each other" is a condition for the single-sample t-test, and "The sampling distribution of x1 - x2 is normal" is a condition for the unpaired t-test. The other two options are conditions for the paired t-test.
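For reference, the paired t statistic is the mean of the pairwise differences d divided by its standard error, with n - 1 degrees of freedom. The sketch below computes it with the standard library; the function name and the before/after scores are made-up assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return t = mean(d) / (stdev(d) / sqrt(n)) for matched pairs."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # d holds one difference per matched pair.
    d = [b - a for a, b in zip(first, second)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


# Hypothetical scores for the same five subjects, measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]
t = paired_t_statistic(before, after)
```

Because the test works on the differences alone, the normality condition applies to d rather than to the two original samples.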

Reference:
https://docs.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/test-hypothesis-using-t-test

Question 139

The finance team asks you to train a model using data in an Azure Storage blob container named finance-data.
You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that an error will be raised if the container does not exist.

How should you complete the code?


Box 1: register_azure_blob_container
Registers an Azure Blob container as a datastore in the workspace.

Box 2: create_if_not_exists = False
Indicates whether to create the blob container if it does not exist; defaults to False.

We are only registering an existing container, not creating one, and we need an error if it does not exist. If create_if_not_exists were set to True, the container would be created instead of an error being raised, which is not what we want.
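The original code for this question is not reproduced on this page, so the following is only a sketch of how the two boxes might fit together with the azureml-core (SDK v1) `Datastore` class. The wrapper function, the datastore name `finance_datastore`, and the credential parameters are assumptions for illustration.

```python
def register_finance_datastore(workspace, account_name, account_key):
    """Register the existing 'finance-data' container as a datastore."""
    # Imported here so the sketch can be loaded without the SDK installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace,
        datastore_name="finance_datastore",  # hypothetical datastore name
        container_name="finance-data",       # container named in the question
        account_name=account_name,           # storage account (assumed input)
        account_key=account_key,             # storage key (assumed input)
        create_if_not_exists=False,          # do not create a missing container
    )
```

The function would be called with a `Workspace` object and the storage account credentials; with `create_if_not_exists=False`, a missing container is not silently created.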

Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore

Question 140

An organization uses Azure Machine Learning service and wants to expand their use of machine learning.
You have the following compute environments. The organization does not want to create another compute environment.

You need to determine which compute environment to use for the following scenarios.

Which compute types should you use? To answer, drag the appropriate compute environments to the correct scenarios. Each compute environment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

nb_server

aks_cluster

mlc_cluster

Answer is 1. mlc_cluster, 2. aks_cluster

An Azure Machine Learning compute cluster can run Azure Machine Learning designer training pipelines, and an Azure Kubernetes Service (AKS) cluster can host models deployed from the designer.

Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance

142 questions in total
