
Does Power BI fully support Azure SQL Data Warehouse?


Hello

 

Power BI seems to work in a way that prevents it from getting the best out of Azure SQL Data Warehouse. More specifically, Power BI seems likely to generate unnecessary data movement between the compute nodes. I want to outline my use case, check my understanding, and see if anyone else has any comments or thoughts.

 

Example:  I have the following (subset of) tables containing web analytics data from an e-Commerce solution:

Session

PageView

PageEvent

(One session has multiple page views, and each page view has multiple events).

 

Session joins to PageView on SessionID

PageView joins to PageEvent on PageViewID

 

In Azure SQL DW, all three tables contain the SessionID and all three tables are Hash distributed on this field.  

This means all the data for a particular session resides within one compute node in the Azure SQL DWH.
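To illustrate the co-location described above, here is a minimal Python sketch (not Azure SQL DW code). It assumes a made-up hash function and a handful of made-up sample rows; the real service uses its own internal hash and spreads each hash-distributed table over 60 distributions, which are in turn assigned to compute nodes. The names distribution_for, place and home_distributions are hypothetical and exist only for this illustration.

```python
NUM_DISTRIBUTIONS = 60  # number of distributions per hash-distributed table


def distribution_for(session_id):
    """Map a SessionID to a distribution (stand-in hash, an assumption)."""
    return (session_id * 2654435761) % NUM_DISTRIBUTIONS


def place(rows):
    """Group rows by the distribution their SessionID hashes to."""
    placement = {}
    for row in rows:
        placement.setdefault(distribution_for(row["SessionID"]), []).append(row)
    return placement


def home_distributions(session_id, tables):
    """Return every distribution that holds a row for this session."""
    found = set()
    for rows in tables:
        for dist, placed in place(rows).items():
            if any(r["SessionID"] == session_id for r in placed):
                found.add(dist)
    return found


# Made-up sample data: 5 sessions, 15 page views, 45 page events.
sessions = [{"SessionID": s} for s in range(1, 6)]
page_views = [
    {"PageViewID": 100 + i, "SessionID": 1 + i % 5} for i in range(15)
]
page_events = [
    {
        "PageEventID": 1000 + i,
        "PageViewID": 100 + i % 15,
        "SessionID": 1 + (i % 15) % 5,  # same session as the parent page view
    }
    for i in range(45)
]
TABLES = [sessions, page_views, page_events]

if __name__ == "__main__":
    for s in sessions:
        dists = home_distributions(s["SessionID"], TABLES)
        print(s["SessionID"], sorted(dists))  # one distribution per session
```

Because all three tables are placed by the same function of SessionID, every row belonging to a session ends up in a single distribution, whichever table it comes from.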

Because of this co-location, hand-written analytical queries typically involve no data movement between compute nodes (other than to return the final results), as they are written roughly as follows:

SELECT ...
FROM Session s
INNER JOIN PageView pv
    ON s.SessionID = pv.SessionID
INNER JOIN PageEvent pe
    ON pv.PageViewID = pe.PageViewID
    AND pv.SessionID = pe.SessionID

 

When Power BI is connected to Azure SQL DWH via DirectQuery, it presumably won't generate the last line in the query above (the additional join criteria on SessionID), since relationships in Power BI only support single-column joins.

This looks like it will lead to additional data movement between the compute nodes of an Azure SQL DWH instance.

I have investigated this by running a query (from Management Studio) similar to the above with and without the last line, and reviewing the steps in the query execution from sys.dm_pdw_request_steps.

 

With the “pv.SessionID = pe.SessionID” line:

 

step_index     operation_type           distribution_type      location_type
0              OnOperation              Unspecified            Control
1              PartitionMoveOperation   Unspecified            DMS
2              ReturnOperation          Unspecified            Control
3              OnOperation              Unspecified            Control

 

Without the “pv.SessionID = pe.SessionID” line:

 

step_index     operation_type           distribution_type      location_type
0              RandomIDOperation        Unspecified            Control
1              OnOperation              AllComputeNodes        Compute
2              BroadcastMoveOperation   Unspecified            DMS
3              OnOperation              Unspecified            Control
4              PartitionMoveOperation   Unspecified            DMS
5              OnOperation              AllComputeNodes        Compute
6              ReturnOperation          Unspecified            Control
7              OnOperation              Unspecified            Control

 

This additional join criterion is important because it keeps the scope of the query within each compute node, so no data movement between nodes is needed until the results are returned.
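The reasoning can be sketched as a simplified rule in Python: a join between two hash-distributed tables can run locally only when the join predicate equates both tables' distribution columns. This is an assumption-laden approximation, not the actual optimizer logic, which weighs other factors as well; DISTRIBUTION_COLUMN and join_is_local are hypothetical names used only here.

```python
# Distribution column of each table, as described in the post.
DISTRIBUTION_COLUMN = {
    "Session": "SessionID",
    "PageView": "SessionID",
    "PageEvent": "SessionID",
}


def join_is_local(left, right, equalities):
    """Return True if the join can be answered without moving rows.

    equalities is a list of (left_column, right_column) pairs taken from
    the ON clause. Simplified rule (an assumption): the join is local only
    if one pair equates the two tables' distribution columns.
    """
    needed = (DISTRIBUTION_COLUMN[left], DISTRIBUTION_COLUMN[right])
    return needed in equalities


# Hand-written query: joins on PageViewID AND SessionID.
with_session = [("PageViewID", "PageViewID"), ("SessionID", "SessionID")]
# Query with a single-column relationship: PageViewID only.
without_session = [("PageViewID", "PageViewID")]

if __name__ == "__main__":
    print(join_is_local("PageView", "PageEvent", with_session))
    print(join_is_local("PageView", "PageEvent", without_session))
```

With only PageViewID in the predicate, the engine has no guarantee that matching rows sit in the same distribution, even though in this data model they always do, which is consistent with the extra BroadcastMoveOperation seen in the second plan.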

 

This seems to make Power BI less useful as a client to generally browse an Azure SQL Data Warehouse.  Of course, it is still possible to write specific queries using the additional join criteria, but then that isn’t using Power BI as a general DWH client.

 

 

