Azure Blob Storage Gzip files result in double sized download?

I have some data stored in Azure blob storage, as gzip'd CSV files. 

 

I'm then pulling the data into Power BI desktop and using a function with Binary.Decompress to decompress the files. 

 

When I refresh the data, Power BI reports downloading far more than is actually in storage: a blob container that should hold around 500-600 MB of files results in a reported download of well over 1 GB. The queries are as follows:

 

Unzip Function (invoked as fnDecompress in the blob query):

 

(gZipFile) => 
let 
    #"Unzip" = Binary.Decompress(gZipFile, Compression.GZip),
    #"CSV" = Csv.Document(#"Unzip"),
    #"Headers" = Table.PromoteHeaders(#"CSV", [PromoteAllScalars=true])
in
    #"Headers"

Blob Retrieval: 

 

let
    Source = AzureStorage.Blobs("apdigitalproducts"),
    #"blobcontainer" = Source{[Name="googleanalyticsdata"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(#"blobcontainer",{"Content", "Name"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "Data", each fnDecompress([Content])),
    #"Removed Columns1" = Table.RemoveColumns(#"Invoked Custom Function",{"Content"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns1", "Data", {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"}, {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"})
in
    #"Expanded Data"
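
Size Check (a sketch, not something confirmed to explain the numbers):

One way to narrow this down might be to total the compressed and decompressed byte counts per container and see which one the reported download resembles. The step names Container, WithCompressed, WithDecompressed and Totals, and the column names CompressedBytes and DecompressedBytes, are made up for illustration. It assumes every blob in the container is a valid gzip file. Note that this query has to download the blobs itself, so it will add to the refresh traffic while it exists.

let
    // Same storage account and container as the blob retrieval query above
    Source = AzureStorage.Blobs("apdigitalproducts"),
    Container = Source{[Name="googleanalyticsdata"]}[Data],
    // Size of each blob as stored (still gzip-compressed)
    WithCompressed = Table.AddColumn(Container, "CompressedBytes", each Binary.Length([Content]), Int64.Type),
    // Size of each blob after decompression on the client
    WithDecompressed = Table.AddColumn(WithCompressed, "DecompressedBytes", each Binary.Length(Binary.Decompress([Content], Compression.GZip)), Int64.Type),
    // Totals for the whole container, returned as a record
    Totals = [
        Compressed = List.Sum(WithDecompressed[CompressedBytes]),
        Decompressed = List.Sum(WithDecompressed[DecompressedBytes])
    ]
in
    Totals

If the Compressed total is close to the 500-600 MB in storage and the reported download is roughly double that, it would suggest the blobs are being fetched more than once rather than decompressed before transfer; if the reported figure is closer to the Decompressed total, the counter may be reflecting decompressed bytes. Both readings are guesses until the numbers are compared.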

Any ideas? Are the blobs being decompressed on the server side, or something similar? I cannot work out what is going on here.