Professional Data Engineer on Google Cloud Platform


Question 181

You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour.
You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing.

Which storage solution should you use?
BigQuery
Cloud Bigtable
Cloud Datastore
Cloud SQL for PostgreSQL




Answer is BigQuery

BigQuery has native geospatial analysis (GEOGRAPHY data type and GIS functions, with GeoJSON support) and built-in machine learning (BigQuery ML), so both prediction and geospatial processing can run directly in the storage layer.

Question 182

Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set.
You want to increase the AUC of the model.

What should you do?
Perform hyperparameter tuning
Train a classifier with deep neural networks, because neural networks would always beat SVMs
Deploy the model and measure the real-world AUC; it's always higher because of generalization
Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC




Answer is Perform hyperparameter tuning

AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.

Reference:
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc?hl=en
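The scale-invariance point can be demonstrated directly: AUC depends only on how predictions are ranked, so any monotonic rescaling leaves it unchanged. A minimal pure-Python sketch (illustrative only; a real project would use `sklearn.metrics.roc_auc_score`):

```python
# Minimal sketch: AUC depends only on the ranking of scores, so
# rescaling predictions by a monotonic function cannot change it.

def auc(labels, scores):
    """AUC as the probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.5, 0.1]

base = auc(labels, scores)
scaled = auc(labels, [10 * s + 3 for s in scores])  # monotonic rescaling

print(base == scaled)  # True: scaling the outputs cannot raise AUC
```

This is why option D in the question is wrong, and why tuning the model itself (hyperparameter tuning) is the way to improve AUC.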

Question 183

You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset.

Which database and data model should you choose?
Create a table in BigQuery, and append the new samples for CPU and memory to the table
Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second
Create a narrow table in Cloud Bigtable with a row key that combines the Computer Engine computer identifier with the sample time at each second
Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.




Answer is Create a narrow table in Cloud Bigtable with a row key that combines the Computer Engine computer identifier with the sample time at each second

A tall and narrow table has a small number of events per row (often just one), whereas a short and wide table has a large number of events per row. Tall and narrow tables are best suited for time-series data.

For time series, you should generally use tall and narrow tables. This is for two reasons: Storing one event per row makes it easier to run queries against your data. Storing many events per row makes it more likely that the total row size will exceed the recommended maximum (see Rows can be big but are not infinite).

Reference:
https://cloud.google.com/bigtable/docs/schema-design-time-series#patterns_for_row_key_design
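The tall-and-narrow pattern keyed by identifier plus timestamp can be sketched as follows. The identifier format and helper name are illustrative assumptions, not from the source:

```python
# Sketch of the tall-and-narrow row-key pattern for Bigtable time series:
# one sample per row, keyed by machine id + timestamp.
# The id scheme and timestamp format here are illustrative assumptions.
from datetime import datetime, timezone

def row_key(machine_id: str, ts: datetime) -> str:
    # A fixed-width timestamp makes lexicographic order match
    # chronological order within each machine's rows.
    return f"{machine_id}#{ts.strftime('%Y%m%d%H%M%S')}"

ts = datetime(2024, 1, 15, 12, 30, 5, tzinfo=timezone.utc)
print(row_key("machine-0042", ts))  # machine-0042#20240115123005
```

Putting the machine identifier first spreads writes across tablets and avoids the hotspotting that a timestamp-only key prefix would cause.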

Question 184

Why do you need to split a machine learning dataset into training data and test data?
So you can try two different sets of features
To make sure your model is generalized for more than just the training data
To allow you to create unit tests in your code
So you can use one dataset for a wide model and one for a deep model




Answer is To make sure your model is generalized for more than just the training data

Holding out a test set lets you measure how the model performs on data it has never seen during training, which shows whether it generalizes rather than simply memorizing (overfitting) the training examples.
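The split itself is simple: shuffle, then hold out a fraction. A minimal sketch (in practice a library helper such as scikit-learn's `train_test_split` would be used):

```python
# Minimal sketch of a train/test split: hold out a fraction of the
# data so model quality can be measured on unseen examples.
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    rng = random.Random(seed)   # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(100))
train, test = train_test_split(examples)
print(len(train), len(test))  # 80 20
```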

Question 185

The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types of cluster nodes?
Workers
Masters, workers, and parameter servers
Workers and parameter servers
Parameter servers




Answer is Workers and parameter servers

With the CUSTOM scale tier you specify the number of workers and parameter servers. There is always exactly one master, so its count cannot be changed.

Reference:
https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers
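A CUSTOM tier is declared in the training job's config file; a sketch with illustrative machine types and counts (note there are count fields for workers and parameter servers, but none for the master):

```yaml
# config.yaml -- illustrative CUSTOM scale-tier configuration.
trainingInput:
  scaleTier: CUSTOM
  masterType: n1-standard-8        # type only; the master count is fixed at 1
  workerType: n1-standard-8
  workerCount: 4                   # configurable
  parameterServerType: n1-standard-4
  parameterServerCount: 2          # configurable
```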

Question 186

Which software libraries are supported by Cloud Machine Learning Engine?
Theano and TensorFlow
Theano and Torch
TensorFlow
TensorFlow and Torch




Answer is TensorFlow

Cloud Machine Learning Engine (now AI Platform) supports TensorFlow, as well as scikit-learn and XGBoost. Of the options listed, only TensorFlow is supported.

Reference:
https://cloud.google.com/ai-platform/docs/ml-solutions-overview#code_your_model
https://cloud.google.com/ai-platform/training/docs/overview

Question 187

Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)
The wide model is used for memorization, while the deep model is used for generalization.
A good use for the wide and deep model is a recommender system.
The wide model is used for generalization, while the deep model is used for memorization.
A good use for the wide and deep model is a small-scale linear regression problem.




Answers are:
The wide model is used for memorization, while the deep model is used for generalization.
A good use for the wide and deep model is a recommender system.


The wide component memorizes feature co-occurrences seen in training, while the deep component generalizes to unseen feature combinations. Combining the two is well suited to recommender systems, such as app or search recommendations.

Reference:
https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html

Question 188

To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what would your command start with?
gcloud ml-engine local train
gcloud ml-engine jobs submit training
gcloud ml-engine jobs submit training local
You can't run a TensorFlow program on your own computer using Cloud ML Engine.




Answer is gcloud ml-engine local train

The `gcloud ml-engine local train` command runs a training job on your own machine, emulating the Cloud ML Engine environment so you can validate the job before submitting it to the cloud.

Reference:
https://cloud.google.com/sdk/gcloud/reference/ml-engine/local/train

Question 189

If you want to create a machine learning model that predicts the price of a particular stock based on its recent price history, what type of estimator should you use?
Unsupervised learning
Regressor
Classifier
Clustering estimator




Answer is Regressor

Regression is the supervised learning task of predicting a continuous numerical value, such as a price, so a regressor is the appropriate estimator here.
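Framing "predict the next price from recent prices" as regression can be sketched with a one-feature least-squares fit. The toy prices below are illustrative; a real stock model would need far more features and care:

```python
# Minimal regression sketch: predict the next price from the current
# one using ordinary least squares (y = a*x + b). Illustrative only.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

prices = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]  # toy price history
xs, ys = prices[:-1], prices[1:]               # next price vs. current price
a, b = fit_line(xs, ys)
print(a * prices[-1] + b)  # predicted next price: 13.0
```

The target is a number, not a class label, which is exactly what distinguishes a regressor from a classifier.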

Question 190

Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?
Use K-means Clustering to detect faces in the pixels.
Use feature engineering to add features for eyes, noses, and mouths to the input data.
Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.
Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.




Answer is Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.

A deep network with multiple hidden layers can learn a hierarchy of face features (edges, then shapes, then facial parts) directly from the labeled pixels, which is more effective than hand-engineering features or using a single hidden layer.
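The idea of stacking hidden layers can be sketched as a forward pass. Sizes and weights below are placeholders (a real face detector would be a trained convolutional network):

```python
# Sketch of a multi-hidden-layer forward pass: each layer transforms the
# previous layer's output, letting the network learn face features
# (edges -> parts -> faces) rather than having them hand-engineered.
# Weights are random placeholders; a real model learns them by training.
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def layer(inputs, weights, biases):
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
pixels = [random.random() for _ in range(16)]                    # tiny "image"
w1 = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
w2 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
w3 = [[random.uniform(-1, 1) for _ in range(4)]]

h1 = relu(layer(pixels, w1, [0.0] * 8))   # hidden layer 1: low-level features
h2 = relu(layer(h1, w2, [0.0] * 4))       # hidden layer 2: higher-level features
out = sigmoid(layer(h2, w3, [0.0])[0])    # output: P(image contains a face)
print(0.0 <= out <= 1.0)  # True
```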
