Professional Data Engineer on Google Cloud Platform
278 questions in total
Question 181
You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour. You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing.
Which storage solution should you use?
BigQuery
Cloud Bigtable
Cloud Datastore
Cloud SQL for PostgreSQL
Answer is BigQuery
BigQuery is the only listed option with native functionality for both prediction (BigQuery ML) and geospatial processing (GEOGRAPHY functions such as ST_GEOGFROMGEOJSON), and it scales to a 40 TB dataset.
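As a rough illustration, the hedged sketch below exercises both capabilities from the BigQuery Python client; the `shipping.telemetry` table, its columns, and the `delayed` label are hypothetical stand-ins for the real schema.

```python
# Sketch only: dataset, table, and column names below are assumptions.
from google.cloud import bigquery

client = bigquery.Client()

# Train a classifier inside BigQuery with BigQuery ML.
client.query("""
CREATE OR REPLACE MODEL `shipping.delay_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['delayed']) AS
SELECT speed_knots, cargo_tons, delayed
FROM `shipping.telemetry`
""").result()

# Score ships and aggregate per region; ST_GEOGFROMGEOJSON demonstrates
# native parsing of the GeoJSON telemetry into a GEOGRAPHY value.
rows = client.query("""
SELECT region, COUNT(*) AS at_risk_ships
FROM ML.PREDICT(MODEL `shipping.delay_model`,
  (SELECT speed_knots, cargo_tons, region,
          ST_GEOGFROMGEOJSON(location_geojson) AS position
   FROM `shipping.telemetry`))
WHERE predicted_delayed
GROUP BY region
""").result()
for row in rows:
    print(row.region, row.at_risk_ships)
```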
Question 182
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters and received an area under the curve (AUC) of 0.87 on the validation set.
You want to increase the AUC of the model.
What should you do?
Perform hyperparameter tuning
Train a classifier with deep neural networks, because neural networks would always beat SVMs
Deploy the model and measure the real-world AUC; it's always higher because of generalization
Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC
Answer is Perform hyperparameter tuning
AUC is scale-invariant: it measures how well predictions are ranked rather than their absolute values, so rescaling the model's outputs (option D) cannot change it. Deployment does not improve generalization, and neural networks do not always beat SVMs; tuning the default hyperparameters is the one listed option that can actually raise the AUC.
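A minimal sketch of that tuning with scikit-learn, using synthetic data in place of the real training set; the parameter grid is illustrative, not prescriptive.

```python
# Sketch: tune an SVM's C and RBF kernel width, selecting on cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, random_state=0)  # synthetic stand-in
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
    scoring="roc_auc",  # optimize ranking quality directly
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```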
You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset.
Which database and data model should you choose?
Create a table in BigQuery, and append the new samples for CPU and memory to the table
Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second
Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second
Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.
Answer is Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second
A tall and narrow table stores a small number of events per row (often just one), whereas a short and wide table stores a large number of events per row. For time-series data you should generally use tall and narrow tables, for two reasons: storing one event per row makes it easier to run queries against your data, and storing many events per row makes it more likely that the total row size will exceed Bigtable's recommended maximum.
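A hedged sketch of the tall-and-narrow write path with the google-cloud-bigtable client; the project, instance, table, and column-family names are all assumptions.

```python
# Sketch only: "metrics" instance, "machine-usage" table, and "usage"
# column family are hypothetical names.
import time
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("metrics").table("machine-usage")

def write_sample(machine_id: str, cpu: float, memory: float) -> None:
    # Tall and narrow: one sample per row, keyed by id + second-resolution time.
    row_key = f"{machine_id}#{int(time.time())}".encode()
    row = table.direct_row(row_key)
    row.set_cell("usage", "cpu", str(cpu).encode())
    row.set_cell("usage", "memory", str(memory).encode())
    row.commit()

write_sample("machine-0042", cpu=0.73, memory=0.51)
```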
Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)
The wide model is used for memorization, while the deep model is used for generalization.
A good use for the wide and deep model is a recommender system.
The wide model is used for generalization, while the deep model is used for memorization.
A good use for the wide and deep model is a small-scale linear regression problem.
Answers are: The wide model is used for memorization, while the deep model is used for generalization; and a good use for the wide and deep model is a recommender system.
The wide component memorizes sparse feature co-occurrences (for example, crossed categorical features), while the deep component generalizes to feature combinations it has never seen; combining the two is what makes the architecture effective for recommender systems such as search ranking.
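A minimal Keras sketch of the architecture, assuming illustrative feature widths (100 sparse crossed features on the wide path, 20 dense features on the deep path):

```python
# Sketch: wide (linear) and deep (MLP) paths joined before a sigmoid output.
import tensorflow as tf

wide_in = tf.keras.Input(shape=(100,), name="wide")  # crossed/sparse features
deep_in = tf.keras.Input(shape=(20,), name="deep")   # dense features

deep = tf.keras.layers.Dense(64, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(32, activation="relu")(deep)

# Concatenating lets the linear path memorize while the MLP generalizes.
merged = tf.keras.layers.concatenate([wide_in, deep])
out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```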
If you want to create a machine learning model that predicts the price of a particular stock based on its recent price history, what type of estimator should you use?
Unsupervised learning
Regressor
Classifier
Clustering estimator
Answer is Regressor
Regression is the supervised learning task of modeling and predicting a continuous numerical value, which is exactly what a price prediction requires.
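A minimal scikit-learn sketch: turn the recent price history into lagged features and fit a regressor. The random-walk prices are a synthetic stand-in for real data.

```python
# Sketch: predict the next price from the previous 5 prices.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.standard_normal(500))  # synthetic price series

window = 5  # number of lagged prices used as features
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]

model = LinearRegression().fit(X, y)
print(model.predict(prices[-window:].reshape(1, -1)))  # next-step estimate
```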
Question 190
Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?
Use K-means Clustering to detect faces in the pixels.
Use feature engineering to add features for eyes, noses, and mouths to the input data.
Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.
Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.
Answer is Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.
Multiple hidden layers let the network learn a hierarchy of features (edges, then facial parts, then whole faces) directly from the labeled pixels, which is more effective than hand-engineering features, clustering, or a single hidden layer.
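A minimal Keras sketch of such a network; the 64x64 grayscale input shape and the layer sizes are assumptions, not part of the question.

```python
# Sketch: small CNN for binary face / no-face classification.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),                 # assumed 64x64 grayscale
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # low-level edges
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # higher-level parts
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # face probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```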