You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in the following table.
You need to produce the following
table by using a Spark SQL query.
How should you complete the query?
Box 1: PIVOT
PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output.
Incorrect Answers: UNPIVOT carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.
Box 2: CAST
If you want to convert an integer value to a DECIMAL data type in SQL Server use the CAST() function. Example:
SELECT - CAST(12 AS DECIMAL(7,2) ) AS decimal_value; Here is the result: decimal_value 12.00
You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName and LastName.
You need to copy the data from the JSON file to an Azure Synapse Analytics
table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values.
You create the following components:
● A destination table in Azure Synapse
● An Azure Blob storage container
●
A service principal
Which five actions should you perform in sequence next in is Databricks notebook?
1) mount onto DBFS
2) read into data frame
3) transform data frame
4) specify temporary folder
5) write to table in SQL data warehouse About temporary folder
You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.
You need to modify the job to accept data generated
by the IoT devices in the Protobuf format.
Which three actions should you perform from Visual Studio on sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select
and Place:
1 Add an Azure Stream Analytics Customer Deserializer Project(.net) project to the Solution
2 Add .net deseriliaizer Code to ProtoBuf to customer deserializer project
3. Change the event Serialization format to protobuf in the input.json File of the job and reference the DLL.
You are responsible for providing access to an Azure Data Lake Storage Gen2 account.
Your user account has contributor access to the storage account, and you have the application ID and access key.
You plan to use PolyBase
to load data into an enterprise data warehouse in Azure Synapse Analytics.
You need to configure PolyBase to connect the data warehouse to storage account.
Which three components should you create in sequence?
1.- A database scoped credential
2.- an External data sorce
3.- a external file format
Question 395
AdventureWorks sells bicycles and bicycle parts directly to customers and distributors. The company currently has a single office in the Netherlands, and have been selling bicycles in the United States, Germany and Spain through a chain of distributors and through online sales on its website. The fulfillment of delivery is done by local distribution centers.
The company is planning to expand by establishing new offices because the sales growth in these countries has been increasing over the last 3 years. The location are:
Tokyo, Japan
Seattle, USA
Chicago, USA
Berlin, Germany
Barcelona, Spain
Paris, France
In a highly competitive market, in which AdventureWorks has been in business for the last 15 years, it wants to become the most innovative bicycle company, providing both current and future bicycle owners with best in class technology and service that provides unique experiences.
The Research and Development department of AdventureWorks has successfully conceived the next wave of innovative products, and they are relying on Data Engineers, AI Engineers and Data Scientists to assist with both the design and implementation of the solution.
Given the increased level of sales and expansion at global scale, the existing data infrastructure won't meet the overall business requirements or the future growth that AdventureWorks aspires to. The Chief Information and Technology Officers have expressed the desire to abandon existing on-premises systems and move to the cloud to meet the growth expected. This is supported by the CFO as there has been a request for replacement hardware as the existing infrastructure comes to its end of life. The CFO is aware that the cloud could offer alternatives that are more cost efficient.
As a Senior Data Engineer, you will assist AdventureWorks in the solution design and implementation to meet the business, functional and technical requirements that the company has set forth to be successful for growth, expansion, and innovation strategies. You will execute this in a way that minimizes operational costs and can be monitored for effectiveness.
In a discovery workshop you ascertained the following information:
AdventureWorks Website
The web developers at AdventureWorks are transferring the existing website from an on-premises instance of IIS, to an Azure Web App. They have requested that a data store is made available that will hold the images of the products that are sold on the website.
Current Sales / Ordering system
The current software on which bicycle purchases are tracked, is a web-based application which directly stores order information into an on-premises SQL Server database named AdventureWorks2012. The current application is deployed with high-availability provided by SQL Server 2012 Always-on Availability groups. Due to global expansion and data governance requirements, AdventureWorks will transition this system to better serve their customers and will be looking for global availability of its application and data sales and ordering purposes, particularly during the months of November and December when demand for bikes grow ahead of the holiday period.
Data Analysis
The business reporting is currently being provided by a single on-premises database that is configured as a data warehouse, it holds a database named AdventureWorksDW which is used to provide historical reporting and descriptive analytics. In recent times, that server has been struggling to process the reporting data in a timely manner, as a result the organization has evaluated the data warehouse capabilities of Azure Synapse Analytics and want to migrate their on-premises data to this platform. Your team should ensure that access to the data is restricted.
In addition, AdventureWorks would like to take their data analytics further and start to utilize predictive analytics capabilities. This is currently not an activity that is undertaken. The organization understands that a recommendation or a text analytics engine could be built and would like you to direct them on what would be the best technology and approach to take in implementing such a solution that is also resilient and performant.
You are also assessing the tooling that can help with the extraction, load and transforming of data into the data warehouse, and have asked a Data Engineer within your team to show a proof of concept of Azure Data Factory to explore the transformation capabilities of the product
Customer Service / Presales
Customer service and pre-sales departments are currently experiencing scale issues due to the high call volumes. The organization wants to support the customer services staff in handling the call volumes through the implementation of chat bots in which future bicycle owners can:
Find which bicycle is best for them:
Through a set of questions with the chat bot, custom recommendations are given to potential bike owners, who then can take the recommendation and place an order, or can be redirect to a sales specialist to help them with their needs
Check status on current orders:
Retrieve status on current orders, and estimated delivery times
Find bicycle parts suitable for their existing bicycle:
Existing bicycle owners can find recommended bicycle parts and accessories based on the serial number or model number of their bicycle
Existing bicycle owners, can upload a picture of their bicycle or take a picture of the serial number of their bicycle to assist with the identification of their bicycle and have recommended bicycle parts
Over the last few years the customer services departments have observed an increase in calls from fraudulent customer who are asking for support for bikes that are no longer in warranty, or bikes that have not even been purchased at AdventureWorks. The department are currently relying on the experience of customer services agents to identify this. As a result, they would like to implement a system that can help the agents track in real-time who could be making a fradulent claim.
Finally, given its global expansion, the customer service / presales chat bot needs to respond to requests for data in near real-time regardless of where the customer is located. The chatbot should also support multiple languages such as Dutch, German, French, English, Spanish, and Japanese. This work will be handled by the AI Engineers, but they have requested a platform is provided by the Data Engineer that enables them to store conversation history.
Social Media Analysis
In recent years, the marketing department at the organization have run a wide variety of twitter campaigns at various times of the year. They are keen to measure the impact of their work by tracking social media assets such as hashtags during those campaigns. They would like to have the capability of tracking any hashtag of any name.
Connected bicycle
AdventureWorks Bicycles can be equipped with an innovate built-in bicycle computer which consist of automatic locking features of the bicycle, as well as operational status. Information captured by this bicycle computer includes:
Bicycle model, serial number and registered owner
Bicycle location (latitude longitude)
Current status (stationary, in motion)
Current speed in kilometers per hours
Bicycle Locked / Unlocked
Bicycle parts and components information (on electrical bicycles)
First party and 3rd party applications can have access the information of the bicycle computer that must be secure and for the integration into mobile applications and real time display of location and bike ride sharing information.
Furthermore, daily summary data can be saved to flat files that include Bicycle model, serial number, registered owner and a summary of the total miles cycled per day and the average speed.
Bicycle Maintenance services
Existing bicycle owners can opt in to getting notifications on when their bicycle needs repair, based on:
Telemetry from electrical bicycle based on sensor data
Bicycle usage information coming from the built-in bicycle computers based on average mileage / wear and tear
This predictive maintenance scenario is a service in which bike owners can opt-in, offered as a paid service.
Finally, all services that are proposed should have a comprehensive business continuity that meets the corporate objective of minimizes restore times when recovering the data for a given service.
Which requirement is likely to be the easiest to implement from a data engineering perspective?
Social Media Analysis
Connected Bicycle
AdventureWorks Website
Answer is AdventureWorks Website
The AdventureWorks Website requirement is the need to create a data store to host images. This can be achieved using an Azure Blob store. The Connected Bicycle is a more complex requirement as it will be dealing with Streaming Data from a bicycle. The Social Media Analysis is another complex requirement as it requires the collection of streaming data of an application
Question 396
AdventureWorks sells bicycles and bicycle parts directly to customers and distributors. The company currently has a single office in the Netherlands, and have been selling bicycles in the United States, Germany and Spain through a chain of distributors and through online sales on its website. The fulfillment of delivery is done by local distribution centers.
The company is planning to expand by establishing new offices because the sales growth in these countries has been increasing over the last 3 years. The location are:
Tokyo, Japan
Seattle, USA
Chicago, USA
Berlin, Germany
Barcelona, Spain
Paris, France
In a highly competitive market, in which AdventureWorks has been in business for the last 15 years, it wants to become the most innovative bicycle company, providing both current and future bicycle owners with best in class technology and service that provides unique experiences.
The Research and Development department of AdventureWorks has successfully conceived the next wave of innovative products, and they are relying on Data Engineers, AI Engineers and Data Scientists to assist with both the design and implementation of the solution.
Given the increased level of sales and expansion at global scale, the existing data infrastructure won't meet the overall business requirements or the future growth that AdventureWorks aspires to. The Chief Information and Technology Officers have expressed the desire to abandon existing on-premises systems and move to the cloud to meet the growth expected. This is supported by the CFO as there has been a request for replacement hardware as the existing infrastructure comes to its end of life. The CFO is aware that the cloud could offer alternatives that are more cost efficient.
As a Senior Data Engineer, you will assist AdventureWorks in the solution design and implementation to meet the business, functional and technical requirements that the company has set forth to be successful for growth, expansion, and innovation strategies. You will execute this in a way that minimizes operational costs and can be monitored for effectiveness.
In a discovery workshop you ascertained the following information:
AdventureWorks Website
The web developers at AdventureWorks are transferring the existing website from an on-premises instance of IIS, to an Azure Web App. They have requested that a data store is made available that will hold the images of the products that are sold on the website.
Current Sales / Ordering system
The current software on which bicycle purchases are tracked, is a web-based application which directly stores order information into an on-premises SQL Server database named AdventureWorks2012. The current application is deployed with high-availability provided by SQL Server 2012 Always-on Availability groups. Due to global expansion and data governance requirements, AdventureWorks will transition this system to better serve their customers and will be looking for global availability of its application and data sales and ordering purposes, particularly during the months of November and December when demand for bikes grow ahead of the holiday period.
Data Analysis
The business reporting is currently being provided by a single on-premises database that is configured as a data warehouse, it holds a database named AdventureWorksDW which is used to provide historical reporting and descriptive analytics. In recent times, that server has been struggling to process the reporting data in a timely manner, as a result the organization has evaluated the data warehouse capabilities of Azure Synapse Analytics and want to migrate their on-premises data to this platform. Your team should ensure that access to the data is restricted.
In addition, AdventureWorks would like to take their data analytics further and start to utilize predictive analytics capabilities. This is currently not an activity that is undertaken. The organization understands that a recommendation or a text analytics engine could be built and would like you to direct them on what would be the best technology and approach to take in implementing such a solution that is also resilient and performant.
You are also assessing the tooling that can help with the extraction, load and transforming of data into the data warehouse, and have asked a Data Engineer within your team to show a proof of concept of Azure Data Factory to explore the transformation capabilities of the product
Customer Service / Presales
Customer service and pre-sales departments are currently experiencing scale issues due to the high call volumes. The organization wants to support the customer services staff in handling the call volumes through the implementation of chat bots in which future bicycle owners can:
Find which bicycle is best for them:
Through a set of questions with the chat bot, custom recommendations are given to potential bike owners, who then can take the recommendation and place an order, or can be redirect to a sales specialist to help them with their needs
Check status on current orders:
Retrieve status on current orders, and estimated delivery times
Find bicycle parts suitable for their existing bicycle:
Existing bicycle owners can find recommended bicycle parts and accessories based on the serial number or model number of their bicycle
Existing bicycle owners, can upload a picture of their bicycle or take a picture of the serial number of their bicycle to assist with the identification of their bicycle and have recommended bicycle parts
Over the last few years the customer services departments have observed an increase in calls from fraudulent customer who are asking for support for bikes that are no longer in warranty, or bikes that have not even been purchased at AdventureWorks. The department are currently relying on the experience of customer services agents to identify this. As a result, they would like to implement a system that can help the agents track in real-time who could be making a fradulent claim.
Finally, given its global expansion, the customer service / presales chat bot needs to respond to requests for data in near real-time regardless of where the customer is located. The chatbot should also support multiple languages such as Dutch, German, French, English, Spanish, and Japanese. This work will be handled by the AI Engineers, but they have requested a platform is provided by the Data Engineer that enables them to store conversation history.
Social Media Analysis
In recent years, the marketing department at the organization have run a wide variety of twitter campaigns at various times of the year. They are keen to measure the impact of their work by tracking social media assets such as hashtags during those campaigns. They would like to have the capability of tracking any hashtag of any name.
Connected bicycle
AdventureWorks Bicycles can be equipped with an innovate built-in bicycle computer which consist of automatic locking features of the bicycle, as well as operational status. Information captured by this bicycle computer includes:
Bicycle model, serial number and registered owner
Bicycle location (latitude longitude)
Current status (stationary, in motion)
Current speed in kilometers per hours
Bicycle Locked / Unlocked
Bicycle parts and components information (on electrical bicycles)
First party and 3rd party applications can have access the information of the bicycle computer that must be secure and for the integration into mobile applications and real time display of location and bike ride sharing information.
Furthermore, daily summary data can be saved to flat files that include Bicycle model, serial number, registered owner and a summary of the total miles cycled per day and the average speed.
Bicycle Maintenance services
Existing bicycle owners can opt in to getting notifications on when their bicycle needs repair, based on:
Telemetry from electrical bicycle based on sensor data
Bicycle usage information coming from the built-in bicycle computers based on average mileage / wear and tear
This predictive maintenance scenario is a service in which bike owners can opt-in, offered as a paid service.
Finally, all services that are proposed should have a comprehensive business continuity that meets the corporate objective of minimizes restore times when recovering the data for a given service.
Which data platform technology could be used to implement the predictive analytics capabilities that AdventureWorks desires?
Azure Stream Analytics
Azure Databricks
Azure Storage Account
Answer is Azure Databricks
Azure Databricks is a server-less platform optimized for Microsoft Azure which provides one-click setup, streamlined workflows and an interactive workspace for Spark-based Machine learning applications.
Azure Stream Analytics will ingest streaming data from applications or IoT devices and gateways into an event hub or an Internet of Things (IoT) hub in real-time. At which point the event or IoT hub will stream the data into Stream Analytics for real-time analysis.
Azure Storage offers a massively scalable object store for data objects and file system services for the cloud. Creating a storage account as a Blob means that you cannot query the data directly.
Question 397
AdventureWorks sells bicycles and bicycle parts directly to customers and distributors. The company currently has a single office in the Netherlands, and have been selling bicycles in the United States, Germany and Spain through a chain of distributors and through online sales on its website. The fulfillment of delivery is done by local distribution centers.
The company is planning to expand by establishing new offices because the sales growth in these countries has been increasing over the last 3 years. The location are:
Tokyo, Japan
Seattle, USA
Chicago, USA
Berlin, Germany
Barcelona, Spain
Paris, France
In a highly competitive market, in which AdventureWorks has been in business for the last 15 years, it wants to become the most innovative bicycle company, providing both current and future bicycle owners with best in class technology and service that provides unique experiences.
The Research and Development department of AdventureWorks has successfully conceived the next wave of innovative products, and they are relying on Data Engineers, AI Engineers and Data Scientists to assist with both the design and implementation of the solution.
Given the increased level of sales and expansion at global scale, the existing data infrastructure won't meet the overall business requirements or the future growth that AdventureWorks aspires to. The Chief Information and Technology Officers have expressed the desire to abandon existing on-premises systems and move to the cloud to meet the growth expected. This is supported by the CFO as there has been a request for replacement hardware as the existing infrastructure comes to its end of life. The CFO is aware that the cloud could offer alternatives that are more cost efficient.
As a Senior Data Engineer, you will assist AdventureWorks in the solution design and implementation to meet the business, functional and technical requirements that the company has set forth to be successful for growth, expansion, and innovation strategies. You will execute this in a way that minimizes operational costs and can be monitored for effectiveness.
In a discovery workshop you ascertained the following information:
AdventureWorks Website
The web developers at AdventureWorks are transferring the existing website from an on-premises instance of IIS, to an Azure Web App. They have requested that a data store is made available that will hold the images of the products that are sold on the website.
Current Sales / Ordering system
The current software on which bicycle purchases are tracked, is a web-based application which directly stores order information into an on-premises SQL Server database named AdventureWorks2012. The current application is deployed with high-availability provided by SQL Server 2012 Always-on Availability groups. Due to global expansion and data governance requirements, AdventureWorks will transition this system to better serve their customers and will be looking for global availability of its application and data sales and ordering purposes, particularly during the months of November and December when demand for bikes grow ahead of the holiday period.
Data Analysis
The business reporting is currently being provided by a single on-premises database that is configured as a data warehouse, it holds a database named AdventureWorksDW which is used to provide historical reporting and descriptive analytics. In recent times, that server has been struggling to process the reporting data in a timely manner, as a result the organization has evaluated the data warehouse capabilities of Azure Synapse Analytics and want to migrate their on-premises data to this platform. Your team should ensure that access to the data is restricted.
In addition, AdventureWorks would like to take their data analytics further and start to utilize predictive analytics capabilities. This is currently not an activity that is undertaken. The organization understands that a recommendation or a text analytics engine could be built and would like you to direct them on what would be the best technology and approach to take in implementing such a solution that is also resilient and performant.
You are also assessing the tooling that can help with the extraction, load and transforming of data into the data warehouse, and have asked a Data Engineer within your team to show a proof of concept of Azure Data Factory to explore the transformation capabilities of the product
Customer Service / Presales
Customer service and pre-sales departments are currently experiencing scale issues due to the high call volumes. The organization wants to support the customer services staff in handling the call volumes through the implementation of chat bots in which future bicycle owners can:
Find which bicycle is best for them:
Through a set of questions with the chat bot, custom recommendations are given to potential bike owners, who then can take the recommendation and place an order, or can be redirect to a sales specialist to help them with their needs
Check status on current orders:
Retrieve status on current orders, and estimated delivery times
Find bicycle parts suitable for their existing bicycle:
Existing bicycle owners can find recommended bicycle parts and accessories based on the serial number or model number of their bicycle
Existing bicycle owners, can upload a picture of their bicycle or take a picture of the serial number of their bicycle to assist with the identification of their bicycle and have recommended bicycle parts
Over the last few years the customer services departments have observed an increase in calls from fraudulent customer who are asking for support for bikes that are no longer in warranty, or bikes that have not even been purchased at AdventureWorks. The department are currently relying on the experience of customer services agents to identify this. As a result, they would like to implement a system that can help the agents track in real-time who could be making a fradulent claim.
Finally, given its global expansion, the customer service / presales chat bot needs to respond to requests for data in near real-time regardless of where the customer is located. The chatbot should also support multiple languages such as Dutch, German, French, English, Spanish, and Japanese. This work will be handled by the AI Engineers, but they have requested a platform is provided by the Data Engineer that enables them to store conversation history.
Social Media Analysis
In recent years, the marketing department at the organization have run a wide variety of twitter campaigns at various times of the year. They are keen to measure the impact of their work by tracking social media assets such as hashtags during those campaigns. They would like to have the capability of tracking any hashtag of any name.
Connected bicycle
AdventureWorks Bicycles can be equipped with an innovate built-in bicycle computer which consist of automatic locking features of the bicycle, as well as operational status. Information captured by this bicycle computer includes:
Bicycle model, serial number and registered owner
Bicycle location (latitude longitude)
Current status (stationary, in motion)
Current speed in kilometers per hours
Bicycle Locked / Unlocked
Bicycle parts and components information (on electrical bicycles)
First party and 3rd party applications can have access the information of the bicycle computer that must be secure and for the integration into mobile applications and real time display of location and bike ride sharing information.
Furthermore, daily summary data can be saved to flat files that include Bicycle model, serial number, registered owner and a summary of the total miles cycled per day and the average speed.
Bicycle Maintenance services
Existing bicycle owners can opt in to getting notifications on when their bicycle needs repair, based on:
Telemetry from electrical bicycle based on sensor data
Bicycle usage information coming from the built-in bicycle computers based on average mileage / wear and tear
This predictive maintenance scenario is a service in which bike owners can opt-in, offered as a paid service.
Finally, all services that are proposed should have a comprehensive business continuity that meets the corporate objective of minimizes restore times when recovering the data for a given service.
Which data platform technology could be used to help AdventureWorks scale globally?
Azure Data Factory
Azure Data Catalog
Azure Cosmos DB
Answer is Azure Cosmos DB
Azure Cosmos DB is a globally distributed, multi-model database that can offer sub second query performance. Azure Data Factory (ADF) is a cloud integration service that orchestrates that movement of data between various data stores. Azure Data Catalog is a single, central place for all of an organization's users to contribute their knowledge and build a community and culture of data sources that are owned by an organization.
Question 398
Mike is the data engineer for Contoso and has a Data Warehouse created with a database named Crystal. Within the database is a table named DimSuppliers. The suppliers' information is stored in a single text file named Suppliers.txt and is 1200MB in size. It is currently stored in a container with an Azure Blob store. Your Azure Synapse Analytics is configured as Gen 2 DW30000c. How can Mike maximize the performance of the data load?
Increase the Gen 2 DWU units.
Split the text file into 60 files of 20MB each.
Use Gen 1 DW6000.
Answer is Split the text file into 60 files of 20MB each.
Split the text file into 60 files of 20MB each. eparating the single text file of Suppliers.txt into 60 files can take advantage of the fact that Gen 2 DW30000c uses 60 compute nodes and the parallelism of the data load can be evenly spread for quicker performance. Increasing the Gen 2 DWU units will not work as Gen 2 DW30000c is the highest limit and cannot be scaled beyond this. Gen 1 compute nodes has less power than Gen 2 compute nodes and will not improve the performance.
Question 399
Background
Proseware, Inc, develops and manages a product named Poll Taker. The product is used for delivering public opinion polling and analysis.
Polling data comes from a variety of sources, including online surveys, house-to-house interviews, and booths at public events.
Polling data
Polling data is stored in one of the two locations:
An on-premises Microsoft SQL Server 2019 database named PollingData Azure Data Lake Gen 2
Data in Data Lake is queried by using PolyBase
Poll metadata
Each poll has associated metadata with information about the poll including the date and number of respondents. The data is stored as JSON.
Phone-based polling
Security
Phone-based poll data must only be uploaded by authorized users from authorized devices
Contractors must not have access to any polling data other than their own
Access to polling data must set on a per-active directory user basis
Data migration and loading
All data migration processes must use Azure Data Factory
All data migrations must run automatically during non-business hours
Data migrations must be reliable and retry when needed
Performance
After six months, raw polling data should be moved to a storage account. The storage must be available in the event of a regional disaster. The solution must minimize costs.
Deployments
All deployments must be performed by using Azure DevOps. Deployments must use templates used in multiple environments
No credentials or secrets should be used during deployments
Reliability
All services and processes must be resilient to a regional Azure outage.
Monitoring
All Azure services must be monitored by using Azure Monitor. On-premises SQL Server performance must be monitored.
You need to ensure that phone-based poling data can be analyzed in the PollingData database.
How should you configure Azure Data Factory?
Use a tumbling schedule trigger
Use an event-based trigger
Use a schedule trigger
Use manual execution
Answer is Use a schedule trigger
When creating a schedule trigger, you specify a schedule (start date, recurrence, end date etc.) for the trigger, and associate with a Data Factory pipeline.
Scenario:
All data migration processes must use Azure Data Factory
All data migrations must run automatically during non-business hours
Litware, Inc, is an international car racing and manufacturing company that has 1,000 employees. Most employees are located in Europe. The company supports racing teams that complete in a worldwide racing series.
Physical Locations
Litware has two main locations: a main office in London, England, and a manufacturing plant in Berlin, Germany.
During each race weekend, 100 engineers set up a remote portable office by using a VPN to connect the datacentre in the London office. The portable office is set up and torn down in approximately 20 different countries each year.
Existing environment
Race Central
During race weekends, Litware uses a primary application named Race Central. Each car has several sensors that send real-time telemetry data to the London datacentre. The data is used for real-time tracking of the cars.
Race Central also sends batch updates to an application named Mechanical Workflow by using Microsoft SQL Server Integration Services (SSIS).
The telemetry data is sent to a MongoDB database. A custom application then moves the data to databases in SQL Server 2017. The telemetry data in MongoDB has more than 500 attributes. The application changes the attribute names when the data is moved to SQL Server 2017.
The database structure contains both OLAP and OLTP databases.
Mechanical Workflow
Mechanical Workflow is used to track changes and improvements made to the cars during their lifetime.
Currently, Mechanical Workflow runs on SQL Server 2017 as an OLAP system.
Mechanical Workflow has a named Table1 that is 1 TB. Large aggregations are performed on a single column of Table 1.
Requirements
Planned Changes
Litware is the process of rearchitecting its data estate to be hosted in Azure. The company plans to decommission the London datacentre and move all its applications to an Azure datacentre.
Technical Requirements
Litware identifies the following technical requirements:
Data collection for Race Central must be moved to Azure Cosmos DB and Azure SQL Database. The data must be written to the Azure datacentre closest to each race and must converge in the least amount of time.
The query performance of Race Central must be stable, and the administrative time it takes to perform optimizations must be minimized.
The datacentre for Mechanical Workflow must be moved to Azure SQL data Warehouse.
Transparent data encryption (IDE) must be enabled on all data stores, whenever possible.
An Azure Data Factory pipeline must be used to move data from Cosmos DB to SQL Database for Race Central. If the data load takes longer than 20 minutes, configuration changes must be made to Data Factory.
The telemetry data must migrate toward a solution that is native to Azure.
The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB Request Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the Ru/s.
Data Masking Requirements
During rare weekends, visitors will be able to enter the remote portable offices. Litware is concerned that some proprietary information might be exposed. The company identifies the following data masking requirements for the Race Central data that will be stored in SQL Database:
Only show the last four digits of the values in a column named SuspensionSprings.
Only Show a zero value for the values in a column named ShockOilWeight.
You are monitoring the Data Factory pipeline that runs from Cosmos DB to SQL Database for Race Central.
You discover that the job takes 45 minutes to run.
What should you do to improve the performance of the job?
Decrease parallelism for the copy activities.
Increase that data integration units.
Configure the copy activities to use staged copy.
Configure the copy activities to perform compression.
Answer is Increase that data integration units.
Performance tuning tips and optimization features. In some cases, when you run a copy activity in Azure Data Factory, you see a "Performance tuning tips" message on top of the copy activity monitoring, as shown in the following example. The message tells you the bottleneck that was identified for the given copy run.
It also guides you on what to change to boost copy throughput. The performance tuning tips currently provide suggestions like:
Use PolyBase when you copy data into Azure SQL Data Warehouse.
Increase Azure Cosmos DB Request Units or Azure SQL Database DTUs (Database Throughput Units) when the resource on the data store side is the bottleneck.