Professional Data Engineer on Google Cloud Platform

Question 61

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?
Dataproc Worker
Dataproc Viewer
Dataproc Runner
Dataproc Editor




Answer is Dataproc Worker

Service accounts used with Cloud Dataproc must have the Dataproc Worker role (roles/dataproc.worker), or be granted all of the permissions that the Dataproc Worker role contains.
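
For illustration only, here is a minimal sketch (not part of the question) using the google-cloud-dataproc Python client; the project, region, cluster name, and service account email are placeholders. It shows where the service account that needs the Dataproc Worker role is attached to the cluster's VMs.

```python
# Sketch: create a Dataproc cluster whose VMs run as a custom service account.
# That account must hold roles/dataproc.worker (or equivalent permissions).
from google.cloud import dataproc_v1

project_id = "my-project"        # hypothetical
region = "us-central1"           # hypothetical
service_account = "dataproc-sa@my-project.iam.gserviceaccount.com"  # hypothetical

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "example-cluster",
    "config": {
        # The cluster's VMs authenticate as this service account.
        "gce_cluster_config": {"service_account": service_account},
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # block until the cluster is created
```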

Reference:
https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes
https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts#service_account_requirements_and_limitations

Question 62

What are the minimum permissions needed for a service account used with Google Dataproc?
Execute to Google Cloud Storage; write to Google Cloud Logging
Write to Google Cloud Storage; read to Google Cloud Logging
Execute to Google Cloud Storage; execute to Google Cloud Logging
Read and write to Google Cloud Storage; write to Google Cloud Logging




Answer is Read and write to Google Cloud Storage; write to Google Cloud Logging

Service accounts authenticate applications running on your virtual machine instances to other Google Cloud Platform services. For example, if you write an application that reads and writes files on Google Cloud Storage, it must first authenticate to the Google Cloud Storage API. At a minimum, service accounts used with Cloud Dataproc need permissions to read and write to Google Cloud Storage, and to write to Google Cloud Logging.
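
As a sketch only (assuming the same google-cloud-dataproc client as in the Question 61 example), the snippet below shows the default minimum scopes, the same URLs listed under Reference, set explicitly on the cluster's GCE config.

```python
# Sketch: the minimum scopes granted to the Dataproc service account.
gce_cluster_config = {
    "service_account_scopes": [
        "https://www.googleapis.com/auth/cloud.useraccounts.readonly",
        "https://www.googleapis.com/auth/devstorage.read_write",  # read/write Cloud Storage
        "https://www.googleapis.com/auth/logging.write",          # write Cloud Logging
    ]
}
# This dict would be passed as config["gce_cluster_config"] when creating the
# cluster, as in the sketch under Question 61.
```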

Reference:
https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes

Minimum scopes:
https://www.googleapis.com/auth/cloud.useraccounts.readonly
https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write

Question 63

Which of the following job types are supported by Cloud Dataproc (select 3 answers)?
Hive
Pig
YARN
Spark




Answers are: Hive, Pig, and Spark

Cloud Dataproc provides out-of-the box and end-to-end support for many of the most popular job types, including Spark, Spark SQL, PySpark, MapReduce, Hive, and Pig jobs.

YARN is a resource manager, not a job type.
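
A minimal sketch, assuming the google-cloud-dataproc Python client and placeholder project, cluster, and file names: it submits a PySpark job, and Hive, Pig, Spark, Spark SQL, and MapReduce jobs follow the same pattern with a different job field (e.g. "hive_job", "pig_job").

```python
# Sketch: submit a PySpark job to an existing Dataproc cluster.
from google.cloud import dataproc_v1

project_id, region, cluster_name = "my-project", "us-central1", "example-cluster"  # hypothetical

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/wordcount.py"},  # hypothetical path
}

result = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
).result()  # waits for the job to finish
print(result.driver_output_resource_uri)
```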

Reference:
https://cloud.google.com/dataproc/docs/resources/faq#what_type_of_jobs_can_i_run

Question 64

By default, which of the following windowing behaviors does Dataflow apply to unbounded data sets?
Windows at every 100 MB of data
Single, Global Window
Windows at every 1 minute
Windows at every 10 minutes




Answer is Single, Global Window

Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.
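
A minimal Apache Beam (Python) sketch with a hypothetical Pub/Sub topic: the unbounded PCollection stays in the single global window unless you explicitly override it, for example with fixed windows.

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")  # hypothetical
     # Without this step, every element stays in the single global window that
     # Dataflow assigns by default, even though the source is unbounded.
     | "FixedWindows" >> beam.WindowInto(window.FixedWindows(60))
     | "KeyByMessage" >> beam.Map(lambda msg: (msg, 1))
     | "CountPerWindow" >> beam.CombinePerKey(sum)
     | "Print" >> beam.Map(print))
```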

Reference:
https://cloud.google.com/dataflow/model/pcollection

Question 65

Which of the following is not true about Dataflow pipelines?
Pipelines are a set of operations
Pipelines represent a data processing job
Pipelines represent a directed graph of steps
Pipelines can share data between instances




Answer is Pipelines can share data between instances

In the Dataflow SDKs, a pipeline represents a data processing job. You build a pipeline by writing a program using a Dataflow SDK. A pipeline consists of a set of operations that can read a source of input data, transform that data, and write out the resulting output. The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms.
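
A minimal Beam (Python) sketch with placeholder file paths: the pipeline reads a source, transforms the data, and writes the output; the PCollections it creates belong to this pipeline alone and cannot be shared with another Pipeline object.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")   # hypothetical path
     | "ToUpper" >> beam.Map(str.upper)
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output"))     # hypothetical path
```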

Reference:
https://cloud.google.com/dataflow/model/pipelines

Question 66

Which of the following IAM roles does your Compute Engine service account require to be able to run pipeline jobs?
dataflow.worker
dataflow.compute
dataflow.developer
dataflow.viewer




Answer is dataflow.worker

The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.
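
A minimal sketch, assuming the Beam Python SDK and placeholder names: the service account set in the pipeline options is the account the Dataflow workers run as, and it is that account which must hold roles/dataflow.worker.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, StandardOptions)

options = PipelineOptions()
gcp = options.view_as(GoogleCloudOptions)
gcp.project = "my-project"                  # hypothetical
gcp.region = "us-central1"
gcp.temp_location = "gs://my-bucket/temp"   # hypothetical
# The workers run as this service account (by default, the Compute Engine
# default service account), so it must hold roles/dataflow.worker.
gcp.service_account_email = "dataflow-sa@my-project.iam.gserviceaccount.com"
options.view_as(StandardOptions).runner = "DataflowRunner"

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```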

Reference:
https://cloud.google.com/dataflow/access-control

Question 67

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?
PCollection
Transform
Pipeline
Sink API




Answer is Transform

In Google Cloud, the Dataflow SDK provides a transform component, which is responsible for the data processing operations. You can use conditionals, for loops, and other complex programming structures to create a branching pipeline.
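
A minimal Beam (Python) sketch with hypothetical categories and filter logic: ordinary Python conditionals and for loops decide which transforms are applied, producing a branching pipeline.

```python
import apache_beam as beam

categories = ["error", "warning", "info"]   # hypothetical branches
include_info = False                        # hypothetical condition

with beam.Pipeline() as p:
    lines = p | "Create" >> beam.Create(["error: disk", "warning: cpu", "info: ok"])

    # Ordinary Python control flow decides which transform branches are built.
    for category in categories:
        if category == "info" and not include_info:
            continue  # conditional: skip this branch entirely
        (lines
         | f"Filter_{category}" >> beam.Filter(lambda line, c=category: line.startswith(c))
         | f"Count_{category}" >> beam.combiners.Count.Globally()
         | f"Print_{category}" >> beam.Map(lambda n, c=category: print(c, n)))
```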

Reference:
https://cloud.google.com/dataflow/model/programming-model
https://cloud.google.com/dataflow/docs/concepts/beam-programming-model#concepts

Question 68

Which of the following is NOT true about Dataflow pipelines?
Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner
Dataflow pipelines can consume data from other Google Cloud services
Dataflow pipelines can be programmed in Java
Dataflow pipelines use a unified programming model, so can work both with streaming and batch data sources




Answer is Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner

Dataflow pipelines are built with the Apache Beam SDKs, so the same pipeline code can also run on alternative runners such as Apache Spark and Apache Flink.
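
A minimal sketch with placeholder project, region, and bucket names: the same Beam (Python) pipeline code targets different runners just by changing the runner option.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Local execution with the direct runner:
local_opts = PipelineOptions(["--runner=DirectRunner"])

# The identical pipeline code could instead target Dataflow (or the Spark and
# Flink runners) by swapping the options, e.g.:
# PipelineOptions(["--runner=DataflowRunner", "--project=my-project",
#                  "--region=us-central1", "--temp_location=gs://my-bucket/temp"])

with beam.Pipeline(options=local_opts) as p:
    p | beam.Create(["portable", "pipeline"]) | beam.Map(print)
```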

Reference:
https://cloud.google.com/dataflow/

Question 69

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.

Which Google database service should you use?
Cloud SQL
BigQuery
Cloud Bigtable
Cloud Datastore




Answer is Cloud Datastore

It is a fully managed, serverless solution that supports transactions and autoscales (both storage and compute) without any infrastructure for you to manage.
A is wrong: Cloud SQL is a fully managed transactional database, but only the storage grows automatically. As your user base increases you must scale up the instance's CPU and memory yourself, and the question specifically says you do not want to manage infrastructure scaling.
B is wrong: BigQuery is an OLAP service built for analytics. It is NoOps, fully managed, and autoscales, but it is not designed for transactional point-of-sale workloads.
C is wrong: Bigtable is a NoSQL database built for very high write throughput, and to scale it (storage and CPU) you must add nodes yourself, so it does not fit this use case.
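
A minimal sketch, assuming the google-cloud-datastore Python client and a hypothetical "Account" kind: a transaction updates two entities atomically, with no instances or nodes for you to scale.

```python
from google.cloud import datastore

client = datastore.Client()  # project is taken from the environment

def transfer(from_id: str, to_id: str, amount: int) -> None:
    # All reads and writes inside this block commit atomically.
    with client.transaction():
        src = client.get(client.key("Account", from_id))
        dst = client.get(client.key("Account", to_id))
        src["balance"] -= amount
        dst["balance"] += amount
        client.put_multi([src, dst])
```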

Reference:
https://cloud.google.com/datastore/docs/concepts/overview

Question 70

You work for a large bank that operates in locations throughout North America. You are setting up a data storage system that will handle bank account transactions. You require ACID compliance and the ability to access data with SQL.

Which solution is appropriate?
Store transaction data in Cloud Spanner. Enable stale reads to reduce latency.
Store transaction data in Cloud Spanner. Use locking read-write transactions.
Store transaction data in BigQuery. Disable the query cache to ensure consistency.
Store transaction data in Cloud SQL. Use federated queries from BigQuery for analysis.




Answer is Store transaction data in Cloud Spanner. Use locking read-write transactions.

Since the banking transaction system requires ACID compliance and SQL access to the data, Cloud Spanner is the most appropriate solution. Unlike Cloud SQL, Cloud Spanner natively provides ACID transactions and horizontal scalability.

Enabling stale reads in Spanner (option A) would reduce data consistency, violating the ACID compliance requirement of banking transactions.

BigQuery (option C) is built for analytics (OLAP), not for the high-volume, low-latency ACID transactions a banking system requires.

Cloud SQL (option D) provides ACID compliance but does not scale horizontally the way Cloud Spanner does to handle large transaction volumes.

By using Cloud Spanner and specifically locking read-write transactions, ACID compliance is ensured while providing fast, horizontally scalable SQL processing of banking transactions.
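
A minimal sketch, assuming the google-cloud-spanner Python client and a hypothetical Accounts table: run_in_transaction executes the function inside a locking read-write transaction, which is what provides the ACID guarantees.

```python
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("bank-db")  # hypothetical names

def transfer(transaction, from_id, to_id, amount):
    # Both updates are part of one locking read-write transaction.
    transaction.execute_update(
        "UPDATE Accounts SET Balance = Balance - @amt WHERE AccountId = @id",
        params={"amt": amount, "id": from_id},
        param_types={"amt": spanner.param_types.INT64, "id": spanner.param_types.STRING},
    )
    transaction.execute_update(
        "UPDATE Accounts SET Balance = Balance + @amt WHERE AccountId = @id",
        params={"amt": amount, "id": to_id},
        param_types={"amt": spanner.param_types.INT64, "id": spanner.param_types.STRING},
    )

database.run_in_transaction(transfer, "acct-001", "acct-002", 100)
```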

Reference:
https://cloud.google.com/blog/topics/developers-practitioners/your-google-cloud-database-options-explained
