DP-100: Designing and Implementing a Data Science Solution on Azure
142 questions in total
Question 121
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Join Data Azure Machine Learning Studio module.
Will the requirements be satisfied?
Yes
No
Answer is No
The Join Data module in Azure Machine Learning Studio is used to combine two datasets by matching values in key columns. It is not a general-purpose tool for adding features or filling column values in a single dataset.
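To make the key-matching behaviour concrete, here is a minimal sketch of an inner join in plain Python. It is not the module's implementation; the function name, the sample records, and the "id" key column are all hypothetical.

```python
def inner_join(left, right, key):
    """Combine records from two datasets whose key column values match."""
    # Index the right-hand rows by key so each left row can find its matches.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in right_index.get(row[key], []):
            # Merge the two records; the shared key column appears once.
            joined.append({**row, **match})
    return joined


# Hypothetical sample data: two datasets that share an "id" key column.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 40.0}, {"id": 3, "total": 15.5}]
result = inner_join(customers, orders, "id")
```

Note that the join needs a second dataset to draw columns from, which is why it does not fit the single-dataset scenario in the question.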
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Edit Metadata Azure Machine Learning Studio module.
Will the requirements be satisfied?
Yes
No
Answer is No
As the name suggests, the Edit Metadata module edits column metadata. You can select the columns you want to change, change their data type (for example, numerical to categorical), mark them as features or labels, and rename them.
Edit Metadata cannot add a new column. A new column can be added through an SQL transformation or a Python script.
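The SQL route can be sketched with Python's built-in sqlite3 module. This is only an illustration run locally, not the Studio module itself. The table name t1 follows the naming I understand the Apply SQL Transformation module to use for its first input; the columns and values are hypothetical.

```python
import sqlite3

# Build a small in-memory table standing in for the input dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (height_m REAL, weight_kg REAL)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1.80, 81.0), (1.60, 64.0)])

# Keep every existing column and add a computed one named bmi.
cursor = conn.execute(
    "SELECT *, weight_kg / (height_m * height_m) AS bmi FROM t1"
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()
```

The query both adds the feature and fills its value for every row, which is what the scenario asks for.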
You want to train a classification model using data located in a comma-separated values (CSV) file.
The classification model will be trained via the Automated Machine Learning interface using the Classification task type.
You have been informed that Automated Machine Learning must assess only linear models.
Which of the following actions should you take?
You should disable deep learning.
You should enable automatic featurization.
You should disable automatic featurization.
You should set the task type to Forecasting.
Answer is You should enable automatic featurization.
Enabling automatic featurization will allow the Automated Machine Learning interface to automatically preprocess the CSV data and extract relevant features that are compatible with linear models. Automatic featurization can also handle missing values, categorical variables, and feature scaling, which can be time-consuming and error-prone if done manually.
Disabling deep learning may be necessary if the dataset is small or if deep learning is not feasible or desired, but it is not relevant to the given scenario. Setting the task type to Forecasting is also not relevant, since the task type has already been specified as Classification. Disabling automatic featurization may be appropriate if the CSV data has already been preprocessed and feature engineering has been performed manually, but it is not necessary if the CSV data is in its raw form.
Reference:
ChatGPT
Question 124
You are preparing to train a regression model via automated machine learning. The data available to you has features with missing values, as well as categorical features with few discrete values.
You want to make sure that automated machine learning is configured as follows:
- missing values must be automatically imputed.
- categorical features must be encoded as part of the training task.
Which of the following actions should you take?
You should make use of the featurization parameter with the 'auto' value pair.
You should make use of the featurization parameter with the 'off' value pair.
You should make use of the featurization parameter with the 'on' value pair.
You should make use of the featurization parameter with the 'FeaturizationConfig' value pair.
Answer is You should make use of the featurization parameter with the 'auto' value pair.
With featurization set to 'auto', automated machine learning handles the necessary preprocessing steps, such as imputing missing values and encoding categorical features, before training the regression model.
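The two preprocessing steps named in the question can be illustrated in plain Python. This is a rough sketch of mean imputation and one-hot encoding in general, not automated machine learning's actual implementation; the function names and sample values are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(values):
    """Encode each value as 0/1 indicators, one per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Hypothetical columns: a numeric feature with a gap and a low-cardinality category.
ages = impute_mean([30.0, None, 50.0])
encoded, categories = one_hot(["red", "blue", "red"])
```

One-hot encoding suits categorical features with few discrete values because it adds one column per category.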
You make use of Azure Machine Learning Studio to develop a linear regression model. You perform an experiment to assess various algorithms.
Which of the following is an algorithm that reduces the variances between actual and predicted values?
Fast Forest Quantile Regression
Poisson Regression
Boosted Decision Tree Regression
Linear Regression
Answer is Linear Regression
Linear regression is a type of regression analysis that finds the linear relationship between a dependent variable and one or more independent variables. Its goal is to minimize the sum of the squared residuals (the differences between the predicted and actual values), which is a measure of the variance between the predicted and actual values. Linear regression therefore reduces the variances between actual and predicted values.
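The idea of minimizing squared residuals can be shown with a small ordinary least squares fit for one independent variable. This is a minimal sketch using the closed-form solution, with hypothetical data; it is not how the Studio module is implemented.

```python
def fit_line(xs, ys):
    """Return the slope and intercept that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data that roughly follows a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
```

Any other slope or intercept gives a larger sum of squared residuals on the same data than the fitted pair.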
You have been tasked with constructing a machine learning model that translates language text into a different language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Generative Adversarial Networks (GANs).
Will the requirements be satisfied?
Yes
No
Answer is No
Generative adversarial networks (GANs) are algorithmic architectures that use two neural networks, pitting one against the other (thus the “adversarial”) in order to generate new, synthetic instances of data that can pass for real data. They are used widely in image generation, video generation and voice generation.
You have been tasked with constructing a machine learning model that translates language text into a different language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Recurrent Neural Networks (RNNs).
Will the requirements be satisfied?
Yes
No
Answer is Yes
Recurrent neural networks are a widely used artificial neural network. These networks save the output of a layer and feed it back to the input layer to help predict the layer's outcome. Recurrent neural networks have great learning abilities. They're widely used for complex tasks such as time series forecasting, learning handwriting, and recognizing language.
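The feedback loop described above can be sketched with a single recurrent unit in plain Python. This is a toy illustration with hypothetical, untrained weights; a translation model would use vectors, learned weights, and usually more elaborate cells.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias):
    """Run one recurrent unit over a sequence and return its state at each step."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The previous state is fed back in alongside the current input.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states


# The same values in a different order produce different states,
# which is how the network becomes sensitive to sequence.
forward = rnn_forward([1.0, 0.0, 0.0], w_in=0.5, w_rec=0.8, bias=0.0)
reverse = rnn_forward([0.0, 0.0, 1.0], w_in=0.5, w_rec=0.8, bias=0.0)
```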
You have been tasked with evaluating the performance of a binary classification model that you created.
You need to choose evaluation metrics to achieve your goal.
Which of the following are the metrics you would choose?
Precision
Accuracy
Relative Squared Error
Coefficient of determination
Relative Absolute Error
Answers are:
Precision
Accuracy
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall, F1 Score, and AUC.
Note: A very natural question is: out of the individuals whom the model predicted to be positive (TP + FP), how many were classified correctly (TP)?
This question can be answered by looking at the precision of the model, which is the proportion of predicted positives that are classified correctly.
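Both chosen metrics follow directly from the confusion matrix counts. The sketch below computes them in plain Python; the labels are hypothetical and 1 marks the positive class.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Proportion of predicted positives that are truly positive."""
    # Return 0.0 when the model predicted no positives at all.
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, fp, tn, fn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical true labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)
model_precision = precision(tp, fp)
model_accuracy = accuracy(tp, fp, tn, fn)
```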
You build a binary classification model using the Azure Machine Learning Studio Two-Class Neural Network module.
You are preparing to configure the Tune Model Hyperparameters module for the purpose of tuning accuracy for the model.
Which of the following are valid parameters for the Two-Class Neural Network module?
Depth of the tree
Random number seed
Optimization tolerance
The initial learning weights diameter
Lambda
Number of learning iterations
Project to the unit-sphere
Answers are:
Random number seed
The initial learning weights diameter
Number of learning iterations
You make use of Azure Machine Learning Studio to create a binary classification model.
You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You have to make sure that the sweep allows for every possible combination of hyperparameters to be iterated. Also, the computing resources needed to carry out the sweep must be reduced.
Which of the following actions should you take?
You should consider making use of the Selective grid sweep mode.
You should consider making use of the Measured grid sweep mode.
You should consider making use of the Entire grid sweep mode.
You should consider making use of the Random grid sweep mode.
Answer is You should consider making use of the Random grid sweep mode.
The Random grid sweep mode randomly selects combinations of hyperparameters to test, reducing the number of total combinations and the computing resources needed to carry out the sweep. This method can still provide a good understanding of the relationship between hyperparameters and model performance, but may require multiple runs to converge on the optimal hyperparameters.
Random grid also controls the number of iterations over a random sampling of parameter values, but the values are not generated randomly from the specified range. Instead, a matrix is created of all possible combinations of parameter values, and a random sampling is taken over the matrix. This method is more efficient and less prone to regional oversampling or undersampling.
If you are training a model that supports an integrated parameter sweep, you can also set a range of seed values to use and iterate over the random seeds as well. This is optional, but can be useful for avoiding bias introduced by seed selection.
Entire grid: When you select this option, the module loops over a grid predefined by the system to try different combinations and identify the best learner. This option is useful when you don't know what the best parameter settings might be and want to try all possible combinations of values.
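The difference between the two modes can be sketched in plain Python: build the matrix of all combinations, then either iterate all of it or sample from it. This is an illustration of the sampling idea only, not the module's code; the parameter names, values, and the max_runs argument are hypothetical.

```python
import itertools
import random


def entire_grid(param_grid):
    """Return every combination of the listed parameter values."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_grid(param_grid, max_runs, seed=0):
    """Sample up to max_runs distinct combinations from the full grid."""
    combos = entire_grid(param_grid)
    rng = random.Random(seed)
    return rng.sample(combos, min(max_runs, len(combos)))


# Hypothetical hyperparameter values for a neural network learner.
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "iterations": [50, 100],
    "hidden_nodes": [50, 100],
}
all_combos = entire_grid(param_grid)          # 3 * 2 * 2 combinations
sampled = random_grid(param_grid, max_runs=5)  # a subset drawn from that grid
```

Because the sample is drawn from the matrix of combinations, every combination is a candidate, but only max_runs of them are trained.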