DP-100: Designing and Implementing a Data Science Solution on Azure


Question 171

You are creating a binary classifier by using a two-class logistic regression model.
You need to evaluate the model results for imbalance.

Which evaluation metric should you use?
Relative Absolute Error
AUC Curve
Mean Absolute Error
Relative Squared Error
Accuracy
Root Mean Square Error




Answer is AUC Curve

One can inspect the true positive rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is to the upper left corner, the better the classifier's performance is (that is, maximizing the true positive rate while minimizing the false positive rate). Curves that are close to the diagonal of the plot result from classifiers that tend to make predictions close to random guessing.
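For illustration outside of Studio, the same metric can be computed with scikit-learn; this is a minimal sketch on synthetic imbalanced data, not part of the original question:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary dataset (roughly 90%/10% class split)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]      # probability of the positive class
print("AUC:", roc_auc_score(y_test, scores))  # 0.5 ~ random guessing, 1.0 = perfect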

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance#evaluating-a-binary-classification-model

Question 172

You are developing a linear regression model in Azure Machine Learning Studio. You run an experiment to compare different algorithms.
The following image displays the results dataset output:

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the image.
A - A
A - C
A - E
C - B
C - C
D - E
D - F
F - C




Answer is C - C

Box 1: Boosted Decision Tree Regression
Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better.
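As a quick numeric sketch (the values below are made up, not taken from the question's image):

from sklearn.metrics import mean_absolute_error

y_true = [30, 25, 42]   # hypothetical actual values
y_pred = [28, 27, 40]   # hypothetical predictions
print(mean_absolute_error(y_true, y_pred))   # 2.0 - lower is better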

Box 2: Online Gradient Descent
If you want the algorithm to find the best parameters for you, set the Create trainer mode option to Parameter Range. You can then specify multiple values for the algorithm to try.
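A rough scikit-learn analogue of this behavior, assuming synthetic data: SGDRegressor fits a linear model by stochastic (online) gradient descent, and wrapping it in GridSearchCV mirrors the Parameter Range trainer mode by trying multiple values:

from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
# Try several regularization strengths, analogous to specifying a parameter range
grid = GridSearchCV(SGDRegressor(max_iter=1000), param_grid={"alpha": [1e-4, 1e-3, 1e-2]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)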

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression

Question 173

You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal to 10.
You need to select the bias and variance properties of the model with varying tree depth values.

Which properties should you select for each tree depth?
A - B - G - J
A - E - H - K
B - F - G - L
B - D - I - J
C - D - H - K
C - E - I - L
A - B - B - A
D - F - B - K




Answer is A - B - B - A

In decision trees, the depth of the tree determines the variance: a complicated (for example, deep) decision tree has low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. Increasing the bias will decrease the variance; increasing the variance will decrease the bias.
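A minimal sketch of this effect, assuming a synthetic dataset: as max_depth grows, training accuracy approaches 1.0 while the gap to cross-validated accuracy (the variance) widens:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
for depth in (2, 10, 30):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)              # accuracy on the training data itself
    val_acc = cross_val_score(tree, X, y, cv=5).mean()  # held-out (cross-validated) accuracy
    print(depth, round(train_acc, 3), round(val_acc, 3))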

References:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/

Question 174

You have a model with a large difference between the training and validation error values.
You must create a new model and perform cross-validation.
You need to identify a parameter set for the new model using Azure Machine Learning Studio.

Which module should you use for each step?
A - B - C - D
A - D - C - B
B - A - C - D
B - C - A - D
C - A - B - D
C - B - D - A
D - B - A - C
D - C - B - A




Answer is D - B - A - C

Box 1: Split data

Box 2: Partition and Sample

Box 3: Two-Class Boosted Decision Tree

Box 4: Tune Model Hyperparameters
Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can choose which parameters should be changed during the training process, and which should remain fixed.
We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
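A scikit-learn analogue of these two modules, using synthetic data (the Studio pipeline itself is configured visually): cross_val_score plays the role of Cross-Validate Model for a fixed parameter set, while GridSearchCV plays the role of Tune Model Hyperparameters:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
clf = GradientBoostingClassifier(random_state=0)            # stand-in for a boosted decision tree
print(cross_val_score(clf, X, y, cv=5).mean())              # goodness of the given parameters
search = GridSearchCV(clf, {"max_depth": [2, 3, 4]}, cv=5).fit(X, y)
print(search.best_params_)                                  # the optimal parameters found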

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

Question 175

You are using C-Support Vector classification to do a multi-class classification with an unbalanced training dataset. The C-Support Vector classification is implemented using the Python code shown below:
from sklearn.svm import SVC
import numpy as np
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)

You need to evaluate the C-Support Vector classification code.

Which evaluation statement should you use?
A - A
A - B
A - C
B - A
B - B
B - C
C - A
C - B




Answer is C - A

Box 1: Automatically adjust weights inversely proportional to class frequencies in the input data
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Box 2: Penalty parameter
Parameter: C : float, optional (default=1.0)
Penalty parameter C of the error term.
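The weights that the 'balanced' mode computes can be spelled out directly with NumPy (the labels below are hypothetical):

import numpy as np

y = np.array([0, 0, 0, 0, 1, 1])   # hypothetical imbalanced labels
weights = len(y) / (len(np.unique(y)) * np.bincount(y))   # n_samples / (n_classes * np.bincount(y))
print(weights)   # [0.75 1.5] - the minority class is weighted up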

References:
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Question 176

You are evaluating a Python NumPy array that contains six data points defined as follows: data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python scikit-learn machine learning library:

train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.

How should you complete the code segment?
A - B - D
A - C - B
B - D - B
B - C - A
C - B - A
C - B - B
D - C - A
D - B - C




Answer is B - C - A

Box 1: k-fold

Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
The parameter n_splits (int, default=3) is the number of folds; it must be at least 2.

Box 3: data
Example:
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
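Note that the target output in this question has non-consecutive indices within each fold, which requires shuffling. A sketch follows; the random_state value is an assumption, so the exact grouping may differ from the question's listing:

import numpy as np
from sklearn.model_selection import KFold

data = np.array([10, 20, 30, 40, 50, 60])
kf = KFold(n_splits=3, shuffle=True, random_state=0)   # shuffle to get non-consecutive folds
for train_index, test_index in kf.split(data):
    print("train:", data[train_index], "test:", data[test_index])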

References:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

Question 177

You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.
A - A
A - C
B - A
B - D
C - A
C - C
D - C
D - D




Answer is B - A

Box 1: Positive skew
Positive skew values means the distribution is skewed to the right.

Box 2: Negative skew
Negative skewness values mean the distribution is skewed to the left.
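A quick numeric check with SciPy, using made-up samples: a long right tail gives a positive skew statistic, a long left tail a negative one:

from scipy.stats import skew

right_tailed = [1, 2, 2, 3, 3, 3, 10]    # long right tail
left_tailed = [-10, 3, 3, 3, 2, 2, 1]    # long left tail
print(skew(right_tailed))   # > 0: positive skew
print(skew(left_tailed))    # < 0: negative skew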

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics

Question 178

You are working with a time series dataset in Azure Machine Learning Studio.
You need to split your dataset into training and testing subsets by using the Split Data module.

Which splitting mode should you use?
Recommender Split
Regular Expression Split
Relative Expression Split
Split Rows with the Randomized split parameter set to true




Answer is Split Rows

Use this option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50.

Incorrect Answers:
Regular Expression Split: Choose this option when you want to divide your dataset by testing a single column for a value.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column.
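As a pandas sketch of the Split Rows idea on a hypothetical time-ordered frame df (for time series, rows are typically kept in chronological order when dividing them):

import pandas as pd

df = pd.DataFrame({"t": range(100), "value": range(100)})   # hypothetical time-ordered data
cut = int(len(df) * 0.75)
train, test = df.iloc[:cut], df.iloc[cut:]   # first part trains, the rest tests
print(len(train), len(test))   # 75 25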

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data

Question 179

You are performing a classification task in Azure Machine Learning Studio. You must prepare balanced testing and training samples based on a provided data set.
You need to split the data with a 0.75:0.25 ratio.
Which value should you use for each parameter?

A - A - B - B
A - C - A - B
A - A - A - B
B - A - A - B
B - C - A - A
C - D - A - B
D - C - A - B
D - D - B - B




Answer is A - A - A - B

Box 1: Split rows
Use the Split Rows option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50.
You can also randomize the selection of rows in each group, and use stratified sampling. In stratified sampling, you must select a single column of data for which you want values to be apportioned equally among the two result datasets.

Box 2: 0.75
If you specify a number as a percentage, or if you use a string that contains the "%" character, the value is interpreted as a percentage. All percentage values must be within the range (0, 100), not including the values 0 and 100.

Box 3: Yes
To ensure splits are balanced.

Box 4: No
If you use the option for a stratified split, the output datasets can be further divided by subgroups, by selecting a strata column.
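A scikit-learn equivalent of a stratified 0.75:0.25 split, on synthetic data: stratify=y apportions the class values equally between the two output sets:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, stratify=y, random_state=0)
print(len(X_train), len(X_test))   # 75 25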

Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data

Question 180

You are tuning a hyperparameter for an algorithm. The following table shows a dataset with different hyperparameter values and their corresponding training and validation errors.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic.
Question 1: Which H value should you select based on the data?
Question 2: Which H value displays the poorest training result?
A - A
B - A
C - D
D - E
E - A
A - B
B - C
C - E




Answer is D - E

Box 1: 4
Choose the H value that has low training and validation errors and the closest match between them; this minimizes the variance (the difference between the validation error and the training error).

Box 2: 5
The poorest training result is the H value with the highest training error.
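Since the question's table is an image that is not reproduced here, the selection logic can only be sketched with made-up numbers; the rule itself is: pick the H with low validation error and a small train/validation gap, and the poorest training result is the largest training error:

training_error = {1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10, 5: 0.45}     # hypothetical values
validation_error = {1: 0.42, 2: 0.30, 3: 0.22, 4: 0.12, 5: 0.48}   # hypothetical values
best_h = min(validation_error, key=lambda h: validation_error[h] + abs(validation_error[h] - training_error[h]))
worst_h = max(training_error, key=training_error.get)
print(best_h, worst_h)   # 4 5 with these made-up numbers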

Reference:
https://medium.com/comet-ml/organizing-machine-learning-projects-project-management-guidelines-2d2b85651bbd
