DP-600: Implementing Analytics Solutions Using Microsoft Fabric
82 questions in total
Question 51
You are building a solution by using a Fabric notebook.
You have a Spark DataFrame assigned to a variable named df. The DataFrame returns four columns.
You need to change the data type of a string column named Age to integer. The solution must return a DataFrame that includes all the columns.
How should you complete the code?
Check the answer area
Answer is df.withColumn("Age", col("Age").cast("int")).show()
Note that col must be imported first (from pyspark.sql.functions import col). Because withColumn replaces the existing Age column in place, the returned DataFrame still includes all four columns.
Question 52
You have a Fabric warehouse that contains a table named Staging.Sales. Staging.Sales contains the following columns.
You need to write a T-SQL query that will return data for the year 2023 that displays ProductID and ProductName and has a summarized Amount that is higher than 10,000.
Which query should you use?
SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount FROM Staging.Sales WHERE DATEPART(YEAR,SaleDate) = '2023' GROUP BY ProductID, ProductName HAVING SUM(Amount) > 10000
SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount FROM Staging.Sales GROUP BY ProductID, ProductName HAVING DATEPART(YEAR,SaleDate) = '2023' AND SUM(Amount) > 10000
SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount FROM Staging.Sales WHERE DATEPART(YEAR,SaleDate) = '2023' AND SUM(Amount) > 10000
SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount FROM Staging.Sales WHERE DATEPART(YEAR,SaleDate) = '2023' GROUP BY ProductID, ProductName HAVING TotalAmount > 10000
Answer is
SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount
FROM Staging.Sales
WHERE DATEPART(YEAR,SaleDate) = '2023'
GROUP BY ProductID, ProductName
HAVING SUM(Amount) > 10000
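The query shape can be sanity-checked locally with sqlite3 (the sample rows below are invented, and strftime('%Y', SaleDate) stands in for DATEPART, which sqlite lacks): WHERE filters rows before grouping, while HAVING filters the aggregated groups.

```python
import sqlite3

# Minimal stand-in for Staging.Sales with invented sample data.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Sales (
        ProductID INT, ProductName TEXT, Amount REAL, SaleDate TEXT)
""")
con.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, "Bike",   8000, "2023-02-01"),
        (1, "Bike",   7000, "2023-09-10"),  # Bike totals 15,000 in 2023
        (2, "Helmet", 4000, "2023-05-05"),  # Helmet stays under 10,000
        (1, "Bike",  20000, "2022-12-31"),  # excluded by the year filter
    ],
)

rows = con.execute("""
    SELECT ProductID, ProductName, SUM(Amount) AS TotalAmount
    FROM Sales
    WHERE strftime('%Y', SaleDate) = '2023'   -- row filter before grouping
    GROUP BY ProductID, ProductName
    HAVING SUM(Amount) > 10000                -- group filter after aggregation
""").fetchall()

print(rows)
```

Only the Bike group survives: the 2022 row is removed by the WHERE clause before aggregation, and the Helmet group is removed by HAVING afterwards.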
Question 53
You have a data warehouse that contains a table named Stage.Customers. Stage.Customers contains all the customer record updates from a customer relationship management (CRM) system. There can be multiple updates per customer.
You need to write a T-SQL query that will return the customer ID, name, postal code, and the last updated time of the most recent row for each customer ID.
How should you complete the code?
Check the answer section
WITH CUSTOMERBASE AS (
SELECT CustomerID, CustomerName, PostalCode, LastUpdated,
ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY LastUpdated DESC) as X
FROM Stage.Customers
)
SELECT CustomerID, CustomerName, PostalCode, LastUpdated
FROM CUSTOMERBASE
WHERE X = 1
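The ROW_NUMBER() dedup pattern can be tried locally with sqlite3 (3.25+ supports window functions; the sample rows below are invented):

```python
import sqlite3

# Toy stand-in for Stage.Customers with invented sample data.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Customers (
        CustomerID INT, CustomerName TEXT, PostalCode TEXT, LastUpdated TEXT)
""")
con.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "1000", "2023-01-01"),
        (1, "Ana", "2000", "2023-06-01"),  # most recent row for customer 1
        (2, "Ben", "3000", "2023-03-15"),
    ],
)

rows = con.execute("""
    WITH CUSTOMERBASE AS (
        SELECT CustomerID, CustomerName, PostalCode, LastUpdated,
               ROW_NUMBER() OVER(
                   PARTITION BY CustomerID
                   ORDER BY LastUpdated DESC) AS X
        FROM Customers
    )
    SELECT CustomerID, CustomerName, PostalCode, LastUpdated
    FROM CUSTOMERBASE
    WHERE X = 1
    ORDER BY CustomerID
""").fetchall()

print(rows)  # one row per customer: the latest update for each
```

Partitioning by CustomerID and ordering by LastUpdated DESC makes X = 1 the most recent row within each customer's partition.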
Question 54
You have a Fabric tenant that contains a machine learning model registered in a Fabric workspace.
You need to use the model to generate predictions by using the PREDICT function in a Fabric notebook.
Which two languages can you use to perform model scoring?
T-SQL
DAX
Spark SQL
PySpark
Answers are:
C. Spark SQL
D. PySpark
Notebooks only accept the Spark languages (PySpark, Spark/Scala, Spark SQL, and SparkR), so of the listed options only Spark SQL and PySpark can run the PREDICT function; T-SQL and DAX are not available in a Fabric notebook.
You are analyzing the data in a Fabric notebook.
You have a Spark DataFrame assigned to a variable named df.
You need to use the Chart view in the notebook to explore the data manually.
Which function should you run to make the data available in the Chart view?
displayHTML
show
write
display
Answer is display
display renders the DataFrame interactively and includes the Chart view, where you can explore the data and inspect summary statistics.
displayHTML is another way to render output, but it displays custom HTML and is separate from the Chart view, so it does not meet the requirements of this question.
You are analyzing customer purchases in a Fabric notebook by using PySpark.
You have the following DataFrames:
transactions: Contains five columns named transaction_id, customer_id, product_id, amount, and date and has 10 million rows, with each row representing a transaction.
customers: Contains customer details in 1,000 rows and three columns named customer_id, name, and country.
You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling.
You write the following code:
from pyspark.sql import functions as F
results =
Which code should you run to populate the results DataFrame?
Answer is transactions.join(F.broadcast(customers), transactions.customer_id == customers.customer_id)
In Apache Spark, broadcasting refers to an optimization technique for join operations. When you join two DataFrames or RDDs and one of them is significantly smaller than the other, Spark can "broadcast" the smaller table to all nodes in the cluster. This approach avoids the need for network shuffles for each row of the larger table, significantly reducing the execution time of the join operation.
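In plain Python, the per-partition work of a broadcast hash join looks roughly like this (toy data, invented names): the small side becomes an in-memory dict shipped to every worker, and each row of the large side is joined by a local lookup, with no shuffle.

```python
# Invented sample data mirroring the two DataFrames in the question.
customers = [
    (1, "Ana", "PT"),
    (2, "Ben", "FR"),
]
transactions = [
    ("t1", 1, "p9", 25.0, "2024-01-02"),
    ("t2", 2, "p3", 10.0, "2024-01-03"),
    ("t3", 1, "p3", 5.0,  "2024-01-04"),
]

# "Broadcast" step: hash the small table by the join key once;
# Spark ships this structure to every executor.
by_customer = {cid: (name, country) for cid, name, country in customers}

# Probe step: each transaction row joins via a local dict lookup,
# so no rows of the large table move across the network.
results = [
    (tid, cid, pid, amount, date, *by_customer[cid])
    for tid, cid, pid, amount, date in transactions
    if cid in by_customer
]
print(len(results))
```

This is why broadcasting only pays off when one side is small (here, 1,000 customer rows versus 10 million transactions): the whole small table must fit in memory on every worker.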
Question 57
You have a Microsoft Power BI report and a semantic model that uses Direct Lake mode.
From Power BI Desktop, you open Performance analyzer as shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
Check the answer section
Answer is Automatic & DirectQuery
The picture comes from https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
In this article you can see the model contains Table1 and View1, and Performance analyzer shows:
- The first card is linked to Table1, so Direct Lake is used.
- The second card is linked to View1, so it falls back to DirectQuery (views cannot be served in Direct Lake mode).
Because the model can use both Direct Lake and DirectQuery, you can conclude that the fallback behavior is Automatic.
You have a Microsoft Power BI semantic model.
You plan to implement calculation groups.
You need to create a calculation item that will change the context from the selected date to month-to-date (MTD).
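A typical calculation item for this pattern (assuming a date table named 'Date' with a [Date] column; both names are placeholders) is:

```dax
CALCULATE ( SELECTEDMEASURE (), DATESMTD ( 'Date'[Date] ) )
```

SELECTEDMEASURE() re-evaluates whatever measure is currently in context, and DATESMTD shifts the filter context from the selected date to month-to-date.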
You have a Microsoft Power BI report named Report1 that uses a Fabric semantic model.
Users discover that Report1 renders slowly.
You open Performance analyzer and identify that a visual named Orders By Date is the slowest to render. The duration breakdown for Orders By Date is shown in the following table.
What will provide the greatest reduction in the rendering duration of Report1?
Enable automatic page refresh.
Optimize the DAX query of Orders By Date by using DAX Studio.
Change the visual type of Orders By Date.
Reduce the number of visuals in Report1.
Answer is Reduce the number of visuals in Report1.
While optimizing the DAX query could slightly improve performance, the DAX query duration (27 ms) is already very low. The "Other" duration (1,047 ms) dwarfs the DAX query (27 ms) and the visual display (39 ms) durations combined. In Performance analyzer, "Other" largely represents time a visual spends waiting for other visuals to finish, plus background processing. Reducing the number of visuals in Report1 shortens that queue and therefore gives the greatest reduction in rendering duration.
You have a Microsoft Fabric tenant that contains a dataflow.
You are exploring a new semantic model.
From Power Query, you need to view column information as shown in the following exhibit.
Which three Data view options should you select?
Show column value distribution
Enable details pane
Enable column profile
Show column quality details
Show column profile in details pane
Answers are:
A. Show column value distribution
C. Enable column profile
D. Show column quality details
Show column value distribution: This option provides a visual representation of the distribution of values in each column, which is visible in the exhibit.
Enable column profile: This option displays statistics and other detailed information about each column, including value distribution, which aligns with the data shown in the exhibit.
Show column quality details: This option shows the quality of the data in each column, indicating valid, error, and empty values, as displayed in the exhibit.