DP-100: Designing and Implementing a Data Science Solution on Azure
142 QUESTIONS IN TOTAL
Question 111
You construct a machine learning experiment via Azure Machine Learning Studio.
You would like to split data into two separate datasets.
Which of the following actions should you take?
You should make use of the Split Data module.
You should make use of the Group Categorical Values module.
You should make use of the Clip Values module.
You should make use of the Group Data into Bins module.
Answer is You should make use of the Split Data module.
The Split Data module is specifically designed for dividing a dataset into multiple parts. It allows you to specify the ratio or proportion of data to allocate to each resulting dataset. By configuring the Split Data module, you can split your data into two separate datasets based on your desired split ratio, such as 70% for training and 30% for testing.
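Outside the designer, the same idea can be sketched in a few lines of plain Python. This is only an illustration of a random 70/30 split, not the module's implementation; the function name `split_rows` and its `fraction` and `seed` parameters are hypothetical.

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Randomly split rows into two lists; `fraction` of them go to the first."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]

# Ten rows split 70/30: seven for training, three held out for testing.
train, holdout = split_rows(range(10), fraction=0.7)
print(len(train), len(holdout))
```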
You have been tasked with creating a new Azure pipeline via the Machine Learning designer.
You have to make sure that the pipeline trains a model using data in a comma-separated values (CSV) file that is published on a website. A dataset for this file does not exist.
Data from the CSV file must be ingested into the designer pipeline with the least possible administrative effort.
Which of the following actions should you take?
You should make use of the Convert to TXT module.
You should add the Copy Data object to the pipeline.
You should add the Import Data object to the pipeline.
You should add the Dataset object to the pipeline.
Answer is You should add the Import Data object to the pipeline.
The Import Data module in Azure Machine Learning designer allows you to read data from various data sources, including web URLs, and import it directly into your pipeline. By configuring the Import Data object with the URL of the CSV file, you can easily bring the data into the pipeline for further processing.
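For comparison, reading a web-hosted CSV file by hand might look like the sketch below. It uses only the Python standard library and is not how Import Data works internally; the helper names are hypothetical, and the code assumes the file is UTF-8 encoded and has a header row.

```python
import csv
import io
import urllib.request

def parse_csv_text(text):
    """Parse CSV text with a header row into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def read_csv_from_url(url, timeout=30):
    """Download a CSV file over HTTP(S) and parse it. Assumes UTF-8 content."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)

# Offline demonstration of the parsing step with an in-memory sample.
sample = "id,price\n1,9.5\n2,12.0\n"
rows = parse_csv_text(sample)
print(rows[0]["price"])
```

Every value comes back as a string, so numeric columns still need converting before training.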
You should add the Import Data object to the pipeline. However, if you prefer to create the dataset manually, then you could use the Dataset object.
You are in the process of creating a machine learning model. Your dataset includes rows with null and missing values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to detect and fix the null and missing values in the dataset.
Recommendation: You make use of the Replace with median option.
Will the requirements be satisfied?
Yes
No
Answer is Yes
A dissenting comment: we simply don't have enough information about the dataset (for example, what type of data it contains) to know whether median substitution will work, so the answer could arguably be No.
Using the "Replace with median" option in the Clean Missing Data module in Azure Machine Learning Studio can help satisfy the requirements of dealing with null and missing values in your machine learning dataset. The median is a suitable option for replacing missing values in numerical features because it's less sensitive to outliers compared to the mean.
With this option selected, the module identifies columns with missing values and replaces those missing values with the median value of each respective column. This can help maintain the integrity of your dataset and ensure that your machine learning model receives meaningful input data.
However, keep in mind that the choice of replacement strategy can also depend on the nature of your data and the specific requirements of your machine learning problem. It's always a good practice to assess the impact of different imputation methods on your model's performance to find the best strategy for your particular case.
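A small worked example of median substitution on one numeric column, assuming missing values are represented as `None`. The function name `replace_with_median` is hypothetical and the sketch is not the module's own code.

```python
from statistics import median

def replace_with_median(values):
    """Return a copy of a numeric column with None replaced by the median."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to compute a median from; return the column unchanged.
        return list(values)
    fill = median(observed)
    return [fill if v is None else v for v in values]

# 120 is an outlier: the mean of the observed values is 51.5,
# while the median is 32.0, so the median fill is less distorted.
ages = [22, None, 35, 29, None, 120]
print(replace_with_median(ages))  # [22, 32.0, 35, 29, 32.0, 120]
```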
You are in the process of creating a machine learning model. Your dataset includes rows with null and missing values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to detect and fix the null and missing values in the dataset.
Recommendation: You make use of the Custom substitution value option.
Will the requirements be satisfied?
Yes
No
Answer is Yes
Custom substitution value: Use this option to specify a placeholder value (such as a 0 or NA) that applies to all missing values. The value that you specify as a replacement must be compatible with the data type of the column.
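The type-compatibility requirement can be illustrated with a short sketch. It assumes missing values are `None` and treats "compatible" as "same broad kind" (numeric or text); that rule, and the names `_kind` and `replace_with_custom`, are assumptions made for illustration.

```python
from numbers import Real

def _kind(value):
    """Classify a value as 'numeric', 'text', or its type name."""
    if isinstance(value, Real) and not isinstance(value, bool):
        return "numeric"
    if isinstance(value, str):
        return "text"
    return type(value).__name__

def replace_with_custom(values, substitute):
    """Replace None with `substitute`, rejecting a mismatched kind."""
    kinds = {_kind(v) for v in values if v is not None}
    if kinds and _kind(substitute) not in kinds:
        raise TypeError("substitute is not compatible with the column type")
    return [substitute if v is None else v for v in values]

print(replace_with_custom([1.5, None, 3.0], 0))      # [1.5, 0, 3.0]
print(replace_with_custom(["a", None, "c"], "NA"))   # ['a', 'NA', 'c']
```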
You are in the process of creating a machine learning model. Your dataset includes rows with null and missing values.
You plan to make use of the Clean Missing Data module in Azure Machine Learning Studio to detect and fix the null and missing values in the dataset.
Recommendation: You make use of the Remove entire row option.
Will the requirements be satisfied?
Yes
No
Answer is Yes
Remove entire row: Completely removes any row in the dataset that has one or more missing values. This is useful if the missing value can be considered randomly missing.
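Row removal is the simplest of the cleaning options to sketch. The example assumes each row is a dict and that missing values are `None`; the function name `remove_rows_with_missing` is hypothetical.

```python
def remove_rows_with_missing(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

records = [
    {"age": 22, "city": "Oslo"},
    {"age": None, "city": "Lima"},   # dropped: age is missing
    {"age": 35, "city": None},       # dropped: city is missing
    {"age": 29, "city": "Pune"},
]
print(remove_rows_with_missing(records))
```

Note that this discards the observed values in the dropped rows too, which is why it suits data that is missing at random.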
You need to consider the underlined segment to establish whether it is accurate.
To transform a categorical feature into a binary indicator, you should make use of the Clean Missing Data module.
Select `No adjustment required` if the underlined segment is accurate. If the underlined segment is inaccurate, select the accurate option.
No adjustment required.
Convert to Indicator Values
Apply SQL Transformation
Group Categorical Values
Answer is Convert to Indicator Values
Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
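To make the transformation concrete, here is a minimal sketch of turning one categorical column into binary indicator columns. The function name and the `prefix_value` column naming are assumptions for illustration and may differ from what the module produces.

```python
def to_indicator_columns(values, prefix):
    """Expand a categorical column into one 0/1 column per distinct value."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]

colors = ["red", "green", "red"]
for row in to_indicator_columns(colors, "color"):
    print(row)
# {'color_green': 0, 'color_red': 1}
# {'color_green': 1, 'color_red': 0}
# {'color_green': 0, 'color_red': 1}
```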
You need to consider the underlined segment to establish whether it is accurate.
To increase the number of low-incidence cases in a dataset, you should make use of the SMOTE module.
Select `No adjustment required` if the underlined segment is accurate. If the underlined segment is inaccurate, select the accurate option.
No adjustment required.
Remove Duplicate Rows
Join Data
Edit Metadata
Answer is No adjustment required.
The SMOTE component in Azure Machine Learning designer increases the number of underrepresented cases in a dataset that's used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
You connect the SMOTE component to a dataset that's imbalanced. There are many reasons why a dataset might be imbalanced. For example, the category you're targeting might be rare in the population, or the data might be difficult to collect. Typically, you use SMOTE when the class that you want to analyze is underrepresented.
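The core idea behind SMOTE, creating new minority samples by interpolating between an existing sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration and not the designer component's implementation; the function name `smote_sketch` and its parameters are hypothetical, and it handles numeric features only.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of numeric tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest other minority samples, by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=5)
print(len(new_points))
```

Because each synthetic point lies between two real minority samples, the new cases are plausible variations rather than exact duplicates.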
Complete the sentence by selecting the correct option in the answer area.
Probabilistic PCA
Median
SMOTE
Custom substitution value
Answer is Probabilistic PCA
Replace using Probabilistic PCA: Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed by Tipping and Bishop in the Journal of the Royal Statistical Society, Series B 61(3), 611–622.
Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns.
You are in the process of constructing a regression model.
You would like to make it a Poisson regression model. To achieve your goal, the data needs to meet certain conditions.
Which of the following are relevant conditions with regards to the label data? (Select two)
It must be whole numbers.
It must be a negative value.
It must be fractions.
It must be non-discrete.
It must be a positive value.
Answers are:
It must be whole numbers.
It must be a positive value.
Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts. Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions:
- The response variable has a Poisson distribution.
- Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.
- A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.
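The label conditions above can be checked before training with a short validation pass. This is a sketch under stated assumptions: the function name `check_poisson_labels` is hypothetical, and zero is treated as a valid count, following the "counts cannot be negative" condition.

```python
def check_poisson_labels(labels):
    """Return (index, value, reason) for each label unsuitable for Poisson regression."""
    problems = []
    for index, value in enumerate(labels):
        if value < 0:
            problems.append((index, value, "negative"))
        elif value != int(value):
            problems.append((index, value, "not a whole number"))
    return problems

print(check_poisson_labels([0, 3, 7]))       # []
print(check_poisson_labels([2, -1, 2.5]))
# [(1, -1, 'negative'), (2, 2.5, 'not a whole number')]
```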