You are using the Azure Machine Learning designer to construct an experiment.
After dividing a dataset into training and testing sets, you configure the algorithm to be Two-Class Boosted Decision Tree.
You are preparing to ascertain the Area Under the Curve (AUC).
Which of the following is the correct sequence of modules required to achieve your goal?
Train, Score, Evaluate.
Score, Evaluate, Train.
Evaluate, Export Data, Train.
Train, Score, Export Data.
Answer is Train, Score, Evaluate.
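The Evaluate Model step is what produces the AUC from the scored output. As a rough illustration outside the designer, AUC can be computed directly from scored probabilities via the Mann-Whitney rank statistic; the labels and scores below are made-up example values, not output from any real model.

```python
# Illustrative sketch (not the designer itself): after a trained model scores
# the test set, AUC can be computed from the scored probabilities.

def auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]              # hypothetical ground-truth classes
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical scored probabilities

print(auc(labels, scores))
```

This mirrors the Train → Score → Evaluate flow: the model must first be trained, its scores computed on held-out data, and only then can an evaluation metric such as AUC be derived.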
A coworker registers a datastore in a Machine Learning services workspace by using the following code:
You need to write code to access the datastore from a notebook.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
Box 1: Datastore
To get a specific datastore registered in the current workspace, use the static get() method on the Datastore class:
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
Box 2: ws
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:
At the end of each month, a new folder with that month's sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
- You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.
- You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.
- You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?
Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
Answer is Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
A: *replaces* the existing dataset -> can't directly filter data before a specific month
B: captures all the sales data from different folders in *one dataset* -> can't directly filter data before a specific month
C: requires registering multiple datasets
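The chosen approach, one dataset name with a new version registered each month and a month tag, can be sketched in plain Python. The registry dict below is a made-up stand-in for the Azure ML workspace, used only to illustrate the version-by-tag selection logic; the paths and tag values are hypothetical.

```python
# Hypothetical sketch: one dataset name, a new version each month, and
# experiments selecting a version by its 'month' tag.

registry = {}  # dataset name -> list of registered versions

def register_new_version(name, month_tag, paths):
    """Register the dataset again under the same name as a new version."""
    versions = registry.setdefault(name, [])
    versions.append({"version": len(versions) + 1,
                     "tags": {"month": month_tag},
                     "paths": paths})

def get_version_by_month(name, month_tag):
    """Pick the version whose month tag matches the experiment's cutoff."""
    for v in registry[name]:
        if v["tags"]["month"] == month_tag:
            return v
    raise KeyError(month_tag)

# Each month, re-register all files to date as a new version of the dataset.
register_new_version("sales_dataset", "01-2020",
                     ["sales/01-2020/sales.csv"])
register_new_version("sales_dataset", "02-2020",
                     ["sales/01-2020/sales.csv", "sales/02-2020/sales.csv"])

# An experiment that must ignore February data selects the January version.
v = get_version_by_month("sales_dataset", "01-2020")
print(v["version"], v["paths"])
```

This keeps the registered dataset count at one while still letting each experiment pin the data snapshot it needs, which is exactly why options A (replacement loses history) and C (many datasets) fall short.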
You create an Azure Machine Learning compute target named ComputeOne by using the STANDARD_D1 virtual machine image.
ComputeOne is currently idle and has zero active nodes.
You define a Python variable named ws that references the Azure Machine Learning workspace. You run the following Python code:
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
Answer is No - Yes - Yes
The code attempts to retrieve the compute target; if ComputeOne did not exist, the except block would create it. Because ComputeOne already exists in this case, the exception block is not executed.
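The behaviour described above is the standard get-or-create pattern: creation code lives in the except block and runs only when the lookup fails. A generic, self-contained sketch of that pattern (not the Azure ML SDK; the resource dict and exception type below are made up for illustration):

```python
# Generic sketch of the get-or-create pattern the question's code follows:
# the except block runs only when the lookup for the named resource fails.

class NotFound(Exception):
    pass

# ComputeOne already exists and is idle (zero active nodes).
resources = {"ComputeOne": {"vm_size": "STANDARD_D1", "nodes": 0}}
created = []  # records which resources the except block actually created

def get(name):
    if name not in resources:
        raise NotFound(name)
    return resources[name]

def get_or_create(name, vm_size):
    try:
        return get(name)            # succeeds -> except block never runs
    except NotFound:
        created.append(name)        # reached only when the target is missing
        resources[name] = {"vm_size": vm_size, "nodes": 0}
        return resources[name]

target = get_or_create("ComputeOne", "STANDARD_D1")
print(created)   # empty list: ComputeOne already existed, nothing was created
```

Running the lookup against an existing name leaves `created` empty, matching the question's conclusion that the exception branch never executes.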
Complete the sentence by selecting the correct option in the answer area.
CSV
DOCX
ARFF
TXT
Answer is ARFF
Weka, a popular machine learning and data mining software, supports the ARFF (Attribute-Relation File Format) as its standard data format. The ARFF format is a plain text file format that allows you to define the attributes and data instances in a structured manner.
Use the Convert to ARFF module in Azure Machine Learning Studio to convert datasets and results in Azure Machine Learning to the attribute-relation file format used by the Weka toolset.
The ARFF data specification for Weka supports multiple machine learning tasks, including data preprocessing, classification, and feature selection. In this format, data is organized by entities and their attributes, and is contained in a single text file.
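To make the format concrete, here is a minimal sketch of what an ARFF file looks like: attributes declared up front with @attribute, followed by a @data section of comma-separated rows. The relation name, column names, and rows below are invented examples, not output of the Convert to ARFF module.

```python
# Minimal illustration of the ARFF layout: header of attribute declarations,
# then a @data section. All names and values here are made-up examples.

def to_arff(relation, attributes, rows):
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:          # kind: numeric, string, or {a,b}
        lines.append(f"@attribute {name} {kind}")
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

arff = to_arff(
    "sales",
    [("month", "string"), ("amount", "numeric"), ("region", "{north,south}")],
    [("'01-2020'", 120.5, "north"), ("'02-2020'", 98.0, "south")],
)
print(arff)
```

The single-file, attribute-then-data structure is what lets Weka load the dataset with its schema already defined, with no separate metadata file needed.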
You have been tasked with designing a deep learning model, compatible with the most recent version of Python, to recognize language.
You have to include a suitable deep learning framework in the Data Science Virtual Machine (DSVM).
Which of the following actions should you take?
You should consider including Rattle.
You should consider including TensorFlow.
You should consider including Theano.
You should consider including Chainer.
Answer is You should consider including TensorFlow.
TensorFlow is an open-source software library for data flow and differentiable programming across a range of tasks. It was developed by the Google Brain team and is used for building and training machine learning models, particularly neural networks. TensorFlow provides a flexible and efficient platform for implementing machine learning algorithms, and supports a variety of programming languages including Python, C++, and Java. The library provides a comprehensive set of tools and functionality, including visualization and debugging tools, and it has a strong community of developers and users contributing to its ongoing development. TensorFlow is widely used in both academia and industry for various applications such as image classification, natural language processing, and reinforcement learning.
You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation.
You have already configured a k parameter as the number of splits. You now have to set the k parameter for the cross-validation to the value that is typically chosen.
Recommendation: You configure the use of the value k=10.
Will the requirements be satisfied?
Yes
No
Answer is Yes
k=10 is a common choice for the number of folds in k-fold cross-validation, and it can satisfy the requirement to evaluate the model on a partial data sample. However, the appropriate value for k may depend on the size of the dataset and the desired level of accuracy. In some cases, a value other than 10 might be more appropriate.
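A minimal sketch of what k=10 means in practice: the data is partitioned into ten folds, each sample lands in exactly one validation fold, and it appears in the training set of the other nine. The index-striding scheme below is one simple way to form folds, shown for illustration only (real splitters usually shuffle first).

```python
# Sketch of k-fold cross-validation with the usual choice k=10: every sample
# is validated exactly once and trained on in the remaining k-1 folds.

def k_fold_indices(n_samples, k=10):
    # Fold i takes indices i, i+k, i+2k, ... (a simple unshuffled scheme).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, valid in enumerate(folds):
        train = [j for f_idx, f in enumerate(folds) if f_idx != i for j in f]
        splits.append((train, valid))
    return splits

splits = k_fold_indices(100, k=10)
print(len(splits), len(splits[0][1]), len(splits[0][0]))
```

With 100 samples and k=10, each split trains on 90 samples and validates on 10, which is the trade-off that makes k=10 a common default: enough training data per fold, with every sample still contributing to the evaluation.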