DP-100: Designing and Implementing a Data Science Solution on Azure
142 QUESTIONS IN TOTAL
Question 131
You build a binary classification model using the Azure Machine Learning Studio Two-Class Neural Network module.
You are preparing to configure the Tune Model Hyperparameters module for the purpose of tuning accuracy for the model.
Which of the following are valid parameters for the Two-Class Neural Network module?
Depth of the tree
Random number seed
Optimization tolerance
The initial learning weights diameter
Lambda
Number of learning iterations
Project to the unit-sphere
Answers are:
Random number seed
The initial learning weights diameter
Number of learning iterations
You are in the process of constructing a deep convolutional neural network (CNN). The CNN will be used for image classification.
You notice that the CNN model you constructed displays hints of overfitting.
You want to make sure that overfitting is minimized, and that the model converges to an optimal fit.
Which of the following is TRUE with regard to achieving your goal?
You have to add an additional dense layer with 512 input units, and reduce the amount of training data.
You have to add L1/L2 regularization, and reduce the amount of training data.
You have to reduce the amount of training data and make use of training data augmentation.
You have to add L1/L2 regularization, and make use of training data augmentation.
You have to add an additional dense layer with 512 input units, and add L1/L2 regularization.
Answer is You have to add L1/L2 regularization, and make use of training data augmentation.
When a deep CNN model displays hints of overfitting, it means that the model is too complex and has learned to fit the training data too closely. One way to minimize overfitting is to add regularization to the model, which adds a penalty term to the loss function, encouraging the model to choose simpler solutions.
L1/L2 regularization adds a penalty term to the loss function that discourages the model from using large weights in the network. This has the effect of reducing the complexity of the model and can help prevent overfitting.
Data augmentation is another effective technique to minimize overfitting. It involves applying random transformations to the training data, such as random rotations or translations, to create new training examples that are similar to the original ones. This helps the model to generalize better to unseen data.
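The two techniques described above can be sketched in a few lines of plain Python. This is only an illustration, not how a deep learning framework implements them: the function names (l2_penalty, translate, augment) are hypothetical, images are represented as nested lists of pixel values, and a one-pixel horizontal shift stands in for the random translations mentioned above.

```python
import random


def l2_penalty(weights, lam):
    """Return lam * sum(w^2), the term L2 regularization adds to the loss."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Total loss = data loss + L2 penalty; large weights cost more."""
    return data_loss + l2_penalty(weights, lam)


def translate(image, dx):
    """Shift every row of the image by dx pixels, padding with zeros."""
    width = len(image[0])
    shifted = []
    for row in image:
        if dx >= 0:
            new_row = [0] * dx + row[:width - dx]
        else:
            new_row = row[-dx:] + [0] * (-dx)
        shifted.append(new_row)
    return shifted


def augment(images, rng):
    """Return the original images plus one randomly shifted copy of each."""
    shifted_copies = [translate(img, rng.choice([-1, 1])) for img in images]
    return images + shifted_copies


rng = random.Random(0)  # fixed seed so the sketch is repeatable
training_images = [[[1, 2, 3], [4, 5, 6]]]
augmented = augment(training_images, rng)
```

The augmented set is twice the size of the original, which is the opposite of the "reduce the amount of training data" options in the question.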
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM).
What should you recommend?
Rattle
TensorFlow
Weka
Scikit-learn
Answer is TensorFlow
TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to provide a convenient front-end API for building applications with the framework.
TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations.
Incorrect Answers:
A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.
C: Weka is visual data mining and machine learning software written in Java.
D: Scikit-learn is one of the most useful libraries for machine learning in Python. It is built on NumPy, SciPy and matplotlib, and contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction. It is not a deep learning framework.
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
Split Data
Load Trained Model
Assign Data to Clusters
Group Data into Bins
Answer is Split Data
The Split Data module in Azure Machine Learning Studio can be used to divide a dataset into two distinct datasets. The module allows you to specify the fraction of the dataset that you want to include in each of the two datasets.
The other modules do not divide data into two distinct datasets. The Load Trained Model module loads a trained machine learning model into Azure Machine Learning Studio. The Assign Data to Clusters module assigns data points to clusters. The Group Data into Bins module groups data points into bins.
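As a rough illustration of what splitting by fraction means, the sketch below divides a list of rows into two datasets. It is not the Split Data module itself: split_rows and its parameters are hypothetical names, and shuffling with a fixed seed is an assumption made for repeatability.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Shuffle the rows and return (first, second), where `first` holds
    roughly `fraction` of the rows and `second` holds the rest."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(8))
first, second = split_rows(rows, 0.75)
```

Every row ends up in exactly one of the two outputs, so the datasets are distinct and together cover the input.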
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.
Does the solution meet the goal?
Yes
No
Answer is Yes
"You need to analyze a full dataset" means that you cannot drop rows or columns. Replacing missing values with the column median may change the cardinality of a column, but dimensionality increases only when new feature columns are added. Median replacement is therefore a valid method in this case, and the answer is "Yes".
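A minimal sketch of median replacement, assuming missing values are represented as None and the dataset is a dictionary of columns (both are assumptions for illustration; impute_median is a hypothetical helper):

```python
import statistics


def impute_median(column):
    """Replace each None in a numeric column with the median of the
    observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


dataset = {
    "age": [1.0, None, 3.0, 10.0],
    "income": [5.0, 7.0, None, 9.0],
}
cleaned = {name: impute_median(values) for name, values in dataset.items()}
```

The cleaned dataset has the same columns and the same number of rows as the input, so neither rows nor features are lost.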
You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation.
You have already configured a k parameter as the number of splits. You now have to configure the k parameter for the cross-validation with the usual value choice.
Recommendation: You configure the use of the value k=3.
Will the requirements be satisfied?
Yes
No
Answer is No
k=3 is not the usual choice for k-fold cross-validation, so the recommendation does not satisfy the requirement. The values used most often in practice are k=5 and k=10, with k=10 the common default.
The choice of k trades bias against variance and cost. With a small k such as 3, each model is trained on only two thirds of the sample, so the performance estimate tends to be pessimistic. A larger k gives each model more training data and reduces that bias, at the price of more training runs and, for very large k, a potentially higher-variance estimate.
Therefore, k=10 is the usual starting point. Adjust it only when the size of the data sample or the cost of training calls for a different value.
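To make the role of k concrete, the sketch below generates the train/test index splits for k folds without shuffling. k_fold_indices is a hypothetical helper written for this illustration, not an Azure Machine Learning API.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold.
    Each sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds_10 = k_fold_indices(20, 10)  # each model trains on 18 of 20 samples
folds_3 = k_fold_indices(20, 3)    # each model trains on only 13 or 14
```

With k=10 each model sees 90% of the sample, while with k=3 it sees about 67%, which is why the larger value is the usual choice.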
Question 137
You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for precision to be used as the measurement for the assessment.
Which of the following actions should you take?
You should consider using Venn diagram visualization.
You should consider using Receiver Operating Characteristic (ROC) curve visualization.
You should consider using Box plot visualization.
You should consider using the Binary classification confusion matrix visualization.
Answer is You should consider using the Binary classification confusion matrix visualization.
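Precision follows directly from the counts in a confusion matrix: precision = TP / (TP + FP), the share of predicted positives that are actually positive. A ROC curve does not show precision. Its y-axis is the true positive rate, which is recall, and its x-axis is the false positive rate. A precision-recall (PR) curve would also visualize precision, but it is not among the options.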
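The relationship between the confusion matrix and precision can be sketched as follows. The labels and predictions are made-up illustration data, and confusion_counts is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # true positive rate, the y-axis of a ROC curve
```

Here precision and recall differ, which shows why a plot of the true positive rate cannot stand in for precision.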
You have been tasked with ascertaining if two sets of data differ considerably. You will make use of Azure Machine Learning Studio to complete your task.
You plan to perform a paired t-test.
Which of the following are conditions that must apply to use a paired t-test? (Choose all that apply.)
All scores are independent from each other.
You have matched pairs of scores.
The sampling distribution of d is normal.
The sampling distribution of x1- x2 is normal.
Answers are:
B. You have matched pairs of scores.
C. The sampling distribution of d is normal.
Choose a paired t-test when these conditions apply:
1. You have matched pairs of scores. For example, you might have two different measures per person, or matched pairs of individuals (such as a husband and wife).
2. Each pair of scores is independent of every other pair.
3. The sampling distribution of d, the mean of the paired differences, is normal.
Option A applies to a single-sample t-test and option D to an unpaired t-test; options B and C are the conditions for a paired t-test.
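The conditions above refer to d, the difference within each matched pair. A minimal sketch of the paired t statistic, using made-up before/after scores and a hypothetical helper name, looks like this:

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for matched pairs of scores.
    t = mean(d) / (stdev(d) / sqrt(n)), where d = after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of d
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


before = [10, 12, 9, 11]   # first measure for each subject (illustrative)
after = [12, 15, 10, 14]   # second measure for the same subjects
t_stat, dof = paired_t_statistic(before, after)
```

The test works only on the differences, which is why the normality condition is stated for d rather than for x1 - x2.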
The finance team asks you to train a model using data in an Azure Storage blob container named finance-data.
You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that an error will be raised if the container does not exist.
How should you complete the code?
Check the answer section
Box 1: register_azure_blob_container
Register an Azure Blob Container to the datastore.
Box 2: create_if_not_exists = False
create_if_not_exists: Create the blob container if it does not exist; defaults to False.
We are only registering the container, not creating it, and we need an error if it does not exist. If "create_if_not_exists" were set to True, the container would be created instead of an error being raised, which is not what we want.
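The original code for this question is not reproduced here, so the following is only a sketch of what the completed call might look like with the Azure Machine Learning SDK v1 (azureml-core). The datastore name finance_datastore, the helper names, and the account_name and account_key arguments are assumptions for illustration; only the container name finance-data and the two boxes come from the question.

```python
def build_register_kwargs(account_name, account_key):
    """Collect the arguments for Datastore.register_azure_blob_container."""
    return {
        "datastore_name": "finance_datastore",  # hypothetical datastore name
        "container_name": "finance-data",       # container named in the question
        "account_name": account_name,           # assumed storage account
        "account_key": account_key,             # assumed credential
        "create_if_not_exists": False,          # Box 2: do not create the container
    }


def register_finance_datastore(workspace, account_name, account_key):
    """Register the existing container as a datastore in the workspace."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Datastore

    # Box 1: register_azure_blob_container
    return Datastore.register_azure_blob_container(
        workspace=workspace,
        **build_register_kwargs(account_name, account_key),
    )


kwargs = build_register_kwargs("mystorageaccount", "<account-key>")
```

Calling register_finance_datastore requires an existing Workspace object and valid storage credentials.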
An organization uses Azure Machine Learning service and wants to expand their use of machine learning.
You have the following compute environments. The organization does not want to create another compute environment.
You need to determine which compute environment to use for the following scenarios.
Which compute types should you use? To answer, drag the appropriate compute environments to the correct scenarios. Each compute environment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
nb_server
aks_cluster
mlc_cluster
Answer is 1. mlc_cluster, 2. aks_cluster
Machine Learning Compute Cluster supports integration with AML designer training pipeline, and Azure Kubernetes Service supports integration with AML Designer.