Professional Data Engineer on Google Cloud Platform

Question 241

Google Cloud Bigtable indexes a single value in each row. This value is called the _______.
primary key
unique key
row key
master key

Answer is row key

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data.
A single value in each row is indexed; this value is known as the row key.
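
As a rough illustration, the following minimal Python sketch performs a point lookup by row key with the google-cloud-bigtable client library. The project, instance, and table IDs are hypothetical placeholders.

    from google.cloud import bigtable

    # Placeholder IDs; substitute your own project, instance, and table.
    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    # Bigtable indexes exactly one value per row: the row key.
    # A point read addresses a row directly by that key.
    row = table.read_row(b"user#12345")
    if row is not None:
        for family, columns in row.cells.items():
            for qualifier, cells in columns.items():
                print(family, qualifier, cells[0].value)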

Reference:
https://cloud.google.com/bigtable/docs/overview

Question 242

Cloud Bigtable is a recommended option for storing very large amounts of ____________________________?
multi-keyed data with very high latency
multi-keyed data with very low latency
single-keyed data with very low latency
single-keyed data with very high latency

Answer is single-keyed data with very low latency

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data.
A single value in each row is indexed; this value is known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
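
To make the access pattern concrete, here is a hedged Python sketch of a row-range scan with the same client library; because rows are stored sorted by row key, a contiguous key range streams efficiently, which is the pattern batch and MapReduce-style jobs rely on. All IDs and key prefixes are placeholders.

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    # Rows are stored sorted by row key, so a contiguous range scan
    # is a sequential read rather than a series of random lookups.
    rows = table.read_rows(start_key=b"event#2021-01-01",
                           end_key=b"event#2021-02-01")
    for row in rows:
        print(row.row_key)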

Reference:
https://cloud.google.com/bigtable/docs/overview#storage-model

Question 243

When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?
500 TB
1 GB
1 TB
500 GB

Answer is 1 TB

Cloud Bigtable is not a relational database. It does not support SQL queries, joins, or multi-row transactions. It is not a good solution for less than 1 TB of data.

Reference:
https://cloud.google.com/bigtable/docs/overview#title_short_and_other_storage_options

Question 244

Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?
You expect to store at least 10 TB of data.
You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
You need to integrate with Google BigQuery.
You will not use the data to back a user-facing or latency-sensitive application.

Answer is You need to integrate with Google BigQuery.

HDD storage is suitable for use cases that meet the following criteria:

You expect to store at least 10 TB of data.
You will not use the data to back a user-facing or latency-sensitive application.
Your workload falls into one of the following categories:

Batch workloads with scans and writes, and no more than occasional random reads of a small number of rows or point reads.
Data archival, where you write very large amounts of data and rarely read that data.
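
Storage type is set per cluster at creation time and cannot be changed afterward. As a sketch, assuming the google-cloud-bigtable admin client and placeholder IDs, an HDD cluster for an archival workload might be created like this:

    from google.cloud import bigtable
    from google.cloud.bigtable import enums

    # admin=True is required for instance and cluster management calls.
    client = bigtable.Client(project="my-project", admin=True)
    instance = client.instance("archive-instance")

    # HDD vs. SSD is chosen per cluster when the cluster is created.
    cluster = instance.cluster(
        "archive-cluster",
        location_id="us-central1-b",
        serve_nodes=3,
        default_storage_type=enums.StorageType.HDD,
    )
    operation = instance.create(clusters=[cluster])
    operation.result(timeout=120)  # Wait for the long-running operation.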

Reference:
https://cloud.google.com/bigtable/docs/choosing-ssd-hdd#use-cases-hdd

Question 245

When you design a Google Cloud Bigtable schema it is recommended that you _________.
Avoid schema designs that are based on NoSQL concepts
Create schema designs that are based on a relational database design
Avoid schema designs that require atomicity across rows
Create schema designs that require atomicity across rows

Answer is Avoid schema designs that require atomicity across rows

All operations are atomic at the row level. For example, if you update two rows in a table, it's possible that one row will be updated successfully and the other update will fail. Avoid schema designs that require atomicity across rows.
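
A hedged Python sketch of what this means in practice: every mutation committed through a single row object is applied atomically, while writes to two different rows are two independent commits. All IDs are placeholders.

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("my-table")

    # All set_cell mutations on this one row commit atomically:
    # either every cell below is written, or none of them are.
    row = table.direct_row(b"user#12345")
    row.set_cell("profile", b"name", b"Ada")
    row.set_cell("profile", b"email", b"ada@example.com")
    row.commit()

    # A second row is a separate commit. If this call fails, the row
    # above stays written; there is no cross-row rollback.
    other = table.direct_row(b"user#67890")
    other.set_cell("profile", b"name", b"Grace")
    other.commit()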

Reference:
https://cloud.google.com/bigtable/docs/schema-design#row-keys

Question 246

Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?
Field promotion
Randomization
Salting
Hashing

Answer is Field promotion

In general, prefer field promotion. Field promotion avoids hotspotting in almost all cases, and it tends to make it easier to design a row key that facilitates queries.
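
A short sketch of field promotion for a hypothetical metrics table: a field that would otherwise live in a column (here, a device ID) is promoted into the row key ahead of the timestamp, so writes spread across the keyspace instead of piling onto the newest keys.

    device_id = "device-4721"
    timestamp = "2021-01-15T10:00:00"  # Use a sortable timestamp format.

    # Hotspot-prone: the timestamp leads, so all current writes land
    # on whichever node holds the newest part of the keyspace.
    bad_key = f"{timestamp}#{device_id}"

    # Field promotion: the device ID leads. Writes spread across
    # devices, and one device's history is a contiguous range scan.
    good_key = f"{device_id}#{timestamp}"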

Reference:
https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotting

Question 247

Which is not a valid reason for poor Cloud Bigtable performance?
The workload isn't appropriate for Cloud Bigtable.
The table's schema is not designed correctly.
The Cloud Bigtable cluster has too many nodes.
There are issues with the network connection.

Answer is The Cloud Bigtable cluster has too many nodes.

Having too many nodes does not cause poor performance. The valid reason is the opposite: the Cloud Bigtable cluster doesn't have enough nodes. If your Cloud Bigtable cluster is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.
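
If monitoring shows the cluster is overloaded, one remedy is to scale it up. A hedged sketch with the google-cloud-bigtable admin client, with placeholder IDs and node counts:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    cluster = client.instance("my-instance").cluster("my-cluster")

    cluster.reload()          # Fetch the current cluster configuration.
    cluster.serve_nodes += 2  # Add capacity to relieve the overload.
    cluster.update()          # Apply the new node count.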

Reference:
https://cloud.google.com/bigtable/docs/performance

Question 248

When a Cloud Bigtable node fails, ____ is lost.
all data
no data
the last transaction
the time dimension

Answer is no data

A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated with a specific Cloud Bigtable node.
Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:
Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud Bigtable simply updates the pointers for each node.
Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
When a Cloud Bigtable node fails, no data is lost.

Reference:
https://cloud.google.com/bigtable/docs/overview

Question 249

Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?
A sequential numeric ID
A timestamp followed by a stock symbol
A non-sequential numeric ID
A stock symbol followed by a timestamp

Answers are:
A sequential numeric ID
A timestamp followed by a stock symbol

Using a timestamp as the first element of a row key can cause a variety of problems.
In brief, when a row key for a time series includes a timestamp, all of your writes will target a single node, fill that node, and then move on to the next node in the cluster, resulting in hotspotting.
Suppose your system assigns a numeric ID to each of your application's users. You might be tempted to use the user's numeric ID as the row key for your table.
However, since new users are more likely to be active users, this approach is likely to push most of your traffic to a small number of nodes.
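
To tie the four options together, a small sketch of how each key shape might be built. Whether a key hotspots comes down to whether its leading component increases monotonically; the values below are illustrative only.

    import time
    import uuid

    stock = "GOOG"
    ts = int(time.time())

    # Hotspot: the leading component grows monotonically, so every new
    # write lands on the node holding the end of the keyspace.
    sequential_id_key = str(1000001)       # sequential numeric ID
    timestamp_first_key = f"{ts}#{stock}"  # timestamp, then symbol

    # Spread out: the leading component varies across the keyspace.
    random_id_key = uuid.uuid4().hex       # non-sequential ID
    symbol_first_key = f"{stock}#{ts}"     # symbol, then timestamp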

Reference:
https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotting

Question 250

For the best possible performance, what is the recommended zone for your Compute Engine instance and Cloud Bigtable instance?
Have the Compute Engine instance in the furthest zone from the Cloud Bigtable instance.
Have the Compute Engine instance and the Cloud Bigtable instance in different zones.
Have the Compute Engine instance and the Cloud Bigtable instance in the same zone.
Have the Cloud Bigtable instance in the same zone as all of the consumers of your data.

Answer is Have the Compute Engine instance and the Cloud Bigtable instance in the same zone.

For the best possible performance, create your Compute Engine instance in the same zone as your Cloud Bigtable instance.
If it's not possible to create an instance in the same zone, create your instance in another zone within the same region. For example, if your Cloud Bigtable instance is located in us-central1-b, you could create your instance in us-central1-f. This change may result in several milliseconds of additional latency for each Cloud Bigtable request.
Avoid creating your Compute Engine instance in a different region from your Cloud Bigtable instance, which can add hundreds of milliseconds of latency to each Cloud Bigtable request.
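
As a hedged helper sketch, you can look up the zone each Cloud Bigtable cluster lives in with the admin client and then create your Compute Engine VM in that same zone. IDs are placeholders.

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    instance = client.instance("my-instance")

    # Each cluster reports its zone; create the Compute Engine VM in
    # the same zone to keep per-request latency to a minimum.
    clusters, failed_locations = instance.list_clusters()
    for cluster in clusters:
        print(cluster.cluster_id, cluster.location_id)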

Reference:
https://cloud.google.com/bigtable/docs/creating-compute-instance
