What are the minimum permissions needed for a service account used with Google Dataproc?
Execute to Google Cloud Storage; write to Google Cloud Logging
Write to Google Cloud Storage; read to Google Cloud Logging
Execute to Google Cloud Storage; execute to Google Cloud Logging
Read and write to Google Cloud Storage; write to Google Cloud Logging
Answer is Read and write to Google Cloud Storage; write to Google Cloud Logging
Service accounts authenticate applications running on your virtual machine instances to other Google Cloud Platform services. For example, if you write an application that reads and writes files on Google Cloud Storage, it must first authenticate to the Google Cloud Storage API. At a minimum, service accounts used with Cloud Dataproc need permissions to read and write to Google Cloud Storage, and to write to Google Cloud Logging.
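As a rough sketch, the google-cloud-dataproc Python client can point a cluster at a dedicated service account that carries those minimum permissions. The project ID, region, cluster name, and service account email below are placeholder values.

```python
# Sketch: creating a Dataproc cluster that runs as a custom service account.
# All names here are placeholders, not real resources.
from google.cloud import dataproc_v1

region = "us-central1"
cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "cluster_name": "example-cluster",
    "config": {
        "gce_cluster_config": {
            # This account needs read/write on Cloud Storage and write on Cloud Logging.
            "service_account": "dataproc-worker@my-project.iam.gserviceaccount.com"
        }
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
)
operation.result()  # block until the cluster is ready
```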
Which of the following job types are supported by Cloud Dataproc (select 3 answers)?
Hive
Pig
YARN
Spark
Answers are Hive, Pig, and Spark
Cloud Dataproc provides out-of-the-box and end-to-end support for many of the most popular job types, including Spark, Spark SQL, PySpark, MapReduce, Hive, and Pig jobs.
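For illustration, the sketch below submits a Hive job through the google-cloud-dataproc Python client; the project ID, region, cluster name, and query are placeholders, and swapping the hive_job field for pig_job, spark_job, or pyspark_job follows the same pattern.

```python
# Sketch: submitting a Hive job to an existing Dataproc cluster.
# Project ID, region, cluster name, and query are placeholders.
from google.cloud import dataproc_v1

region = "us-central1"
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "example-cluster"},
    # Replace hive_job with pig_job, spark_job, or pyspark_job as needed.
    "hive_job": {"query_list": {"queries": ["SHOW TABLES;"]}},
}

finished_job = job_client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
).result()
print(finished_job.status.state)
```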
Which of the following is not true about Dataflow pipelines?
Pipelines are a set of operations
Pipelines represent a data processing job
Pipelines represent a directed graph of steps
Pipelines can share data between instances
Answer is Pipelines can share data between instances
In the Dataflow SDKs, a pipeline represents a data processing job. You build a pipeline by writing a program using a Dataflow SDK. A pipeline consists of a set of operations that can read a source of input data, transform that data, and write out the resulting output. The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms.
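The Dataflow SDKs are carried forward today as Apache Beam, so a minimal Python sketch of that read-transform-write graph looks like the following (the bucket paths are placeholders).

```python
# Sketch: a pipeline is a directed graph of read -> transform -> write steps.
# Input and output paths are placeholders.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
        | "ToUpper" >> beam.Map(str.upper)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output")
    )
# The data and transforms above belong to this pipeline only; a second
# Pipeline object cannot reuse its PCollections.
```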
You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?
PCollection
Transform
Pipeline
Sink API
Answer is Transform
In Google Cloud, the Dataflow SDK provides a transform component, which is responsible for the data processing operation. You can use conditionals, for loops, and other complex programming structures to create a branching pipeline.
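As a small sketch (using the Apache Beam Python SDK, the successor to the Dataflow SDK, with made-up element values), ordinary Python control flow can decide which transform branches are attached to the pipeline graph.

```python
# Sketch: plain Python control flow builds a branching pipeline of transforms.
# Element values and branch names are illustrative only.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    numbers = pipeline | "Create" >> beam.Create([1, 2, 3, 4, 5])

    # Each loop iteration attaches another transform branch that reads
    # from the same PCollection.
    for factor in (2, 3):
        (
            numbers
            | f"MultiplyBy{factor}" >> beam.Map(lambda x, f=factor: x * f)
            | f"Print{factor}" >> beam.Map(print)
        )
```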
You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.
Which Google database service should you use?
Cloud SQL
BigQuery
Cloud Bigtable
Cloud Datastore
Answer is Cloud Datastore
Cloud Datastore is a fully managed, serverless solution that supports transactions and autoscales (storage and compute) without the need to manage any infrastructure (a short sketch follows the option notes below).
A is wrong: Cloud SQL is a fully managed transactional DB, but only the storage grows automatically. As your user base increases, you will need to increase the CPU/memory of the instance, and to do that you must edit the instance manually (and the question specifically says "you do not want to manage infrastructure scaling").
B is wrong: BigQuery is OLAP (for analytics). It is NoOps, fully managed, and autoscales, but it is not designed for a transactional point-of-sale workload.
C is wrong: Bigtable is a NoSQL database for massive writes; to scale (storage and CPU) you must add nodes, so it does not fit this use case.
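As an illustrative sketch (the kind name and fields are invented), the google-cloud-datastore Python client writes a payment record inside a transaction with no infrastructure to size or scale.

```python
# Sketch: writing a payment record inside a Datastore transaction.
# The kind name and fields are invented for illustration.
from google.cloud import datastore

client = datastore.Client()

with client.transaction():
    entity = datastore.Entity(key=client.key("PaymentTransaction"))
    entity.update({"amount": 19.99, "currency": "USD", "status": "CAPTURED"})
    client.put(entity)  # committed atomically when the transaction exits
```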
You work for a large bank that operates in locations throughout North America. You are setting up a data storage system that will handle bank account transactions. You require ACID compliance and the ability to access data with SQL.
Which solution is appropriate?
Store transaction data in Cloud Spanner. Enable stale reads to reduce latency.
Store transaction data in Cloud Spanner. Use locking read-write transactions.
Store transaction data in BigQuery. Disable the query cache to ensure consistency.
Store transaction data in Cloud SQL. Use federated queries from BigQuery for analysis.
Answer is Store transaction data in Cloud Spanner. Use locking read-write transactions.
Since the banking transaction system requires ACID compliance and SQL access to the data, Cloud Spanner is the most appropriate solution. Unlike Cloud SQL, Cloud Spanner natively provides ACID transactions and horizontal scalability.
Enabling stale reads in Spanner (option A) would return potentially out-of-date data, which conflicts with the strong consistency that banking transactions require.
BigQuery (option C) is an analytics (OLAP) warehouse; it is not built for the high-throughput, low-latency ACID transactions a banking system requires.
Cloud SQL (option D) provides ACID compliance but does not scale horizontally like Cloud Spanner can to handle large transaction volumes.
Using Cloud Spanner with locking read-write transactions ensures ACID compliance while providing fast, horizontally scalable SQL processing of banking transactions.
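A minimal sketch of such a transaction with the google-cloud-spanner Python client, assuming a hypothetical instance, database, and Accounts table:

```python
# Sketch: a locking read-write transaction that moves money between accounts.
# Instance, database, table, and column names are hypothetical.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("bank-instance").database("ledger")

def transfer(transaction):
    # Both updates run in a single ACID read-write transaction;
    # Spanner retries the whole callback if the transaction aborts.
    transaction.execute_update(
        "UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 'A'"
    )
    transaction.execute_update(
        "UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 'B'"
    )

database.run_in_transaction(transfer)
```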