I am writing a custom connector to interface with an API, and I'm running into some incredibly frustrating issues with the way Web.Contents is behaving. The connector works as follows:
- supply a list of IDs (referencing server-stored objects)
- supply a username and password (the authentication kind is UsernamePassword)
- the username and password are combined into a short-term auth token, which is sent to a /login endpoint on the server; the server returns a longer-lived auth token that is used later
- using this new auth token returned by the /login endpoint, make a dynamic number of calls (dependent on other information returned in the /login response) to an endpoint that generates CSV files. Most of these CSVs are relatively small (~100 rows, 5 columns), with one large CSV at the end (~400,000 rows and ~10 columns).
- all of these CSVs are then processed (promote headers, modify some column types, etc.) and placed into a navigation table, which is shown to the user.
The connector looks something like this:
shared Connector.Contents = (IDs as list) =>
let
    longLivedAuthToken =
        let
            credential = Extension.CurrentCredential(),
            shortLivedAuthToken = credential[Username] & credential[Password],
            response = Binary.Buffer(Web.Contents("/Login", [Headers = [Authorization = shortLivedAuthToken]])) //This does not get cached
        in
            Json.Document(response)[longLivedAuthToken], // assuming a JSON login response
    BigCSV =
        let
            table = Csv.Document(Binary.Buffer(Web.Contents("/BigCSVEndpoint", [Headers = [Authorization = longLivedAuthToken]]))), //Neither does this
            // ^
            //It looks like this doesn't use the value longLivedAuthToken,
            //but rather it replaces this with the actual call (think a C macro)
            promoted = Table.PromoteHeaders(table),
            named = CustomRenameFunction(promoted, Table.First(promoted))
        in
            named,
    manySmallCSVs =
        let
            downloaded = List.Transform(IDs, each Csv.Document(Binary.Buffer(Web.Contents("/SmallCSVEndpoint", [Headers = [Authorization = longLivedAuthToken], Query = [id = _]])))), //Or this (query parameter name is illustrative)
            // ^
            //It looks like this doesn't use the value longLivedAuthToken,
            //but rather it replaces this with the actual call (think a C macro)
            promoted = List.Transform(downloaded, each Table.PromoteHeaders(_)),
            renamed = List.Transform(promoted, each CustomRenameFunction(_, Table.First(_)))
        in
            renamed,
    navtable =
        let
            allCSVs = List.Combine({manySmallCSVs, {BigCSV}}),
            table = Table.GenerateNavigationTableFromList(allCSVs)
        in
            table
in
    navtable;
My issue is twofold:
Firstly, I can see in the server logs that a login attempt is made 7-8 times for each CSV in the import (in general there are 7-10 CSVs). This eventually causes the /login endpoint to return a 409 Conflict error (too many concurrent users of the endpoint), which causes the whole import to fail. I have tried buffering the result of the login call by wrapping the Web.Contents call in Binary.Buffer(), but it does not seem to help.
Secondly, the same issue manifests with the CSV downloads. Hitting these endpoints from a browser takes ~2 seconds to download each file in a single request, but for some bizarre reason importing them through the connector takes minutes, and on the server side there are literally hundreds of requests being made. (It looks like Power BI is downloading them in small chunks and making subsequent requests to get the next chunk? This is not programmed behaviour on the API side, but most of the requests contain the header 'Expect: 100-continue', which seems to indicate this is what is happening.)
I have found that disabling parallel loads solves the issue of the /login attempts returning a 409 status code, but only because the calls happen in series rather than in parallel; the same total number of calls is still made, and the whole import takes a lot longer (it scales with the number of CSVs).
An interesting thing I've noted is that there seem to be three distinct phases to the import. The first phase occurs immediately after supplying the IDs and credentials, and results in ~3 calls to the /login endpoint and a similarly small number of calls to the CSV endpoints. This phase ends when the preview window opens with all the CSV files in the navigation table unchecked. The names of these tables are derived from specific cells in the tables, so they must have been loaded into memory for the names to have been pulled out (and the endpoints send the CSVs in full). At this point I would expect that all the necessary requests have been made and that no further API calls are needed (otherwise, how could the tables be dynamically named based on data inside them?).
When I start checking the boxes for the tables, phase 2 starts, and calls start being made again for what I assume is preview data. Why does this have to happen? Shouldn't the data already be stored/cached?
Phase 2 ends and phase 3 starts when I click the Load button. Doing this causes all the endpoints to start being hit again, accounting for a further 4-5 calls per CSV, which finally overwhelms the /login endpoint and the load errors out.
I managed to eke out an error message from phase 3 that looks something like this:
Formulas:
section Section1;
shared _csv1 = let
Source = Connector.Contents("471516961986314240"),
_csv11 = Source{[Key="_csv1"]}[Data]
in
_csv11;
shared _csv2 = let
Source = Connector.Contents("471516961986314240"),
_csv21 = Source{[Key="_csv2"]}[Data]
in
_csv21;
shared _csv3 = let
Source = Connector.Contents("471516961986314240"),
_csv31 = Source{[Key="_csv3"]}[Data]
in
_csv31;
...
This error message contains one shared _csvX = let... block for each checkbox that I checked. What it looks like it's doing internally is calling the connector (which fetches all the CSVs) once for every CSV, which again makes no sense, because the CSVs have already been imported. So if the import produces 7 output CSVs, this explicitly fetches 7*7 = 49 CSVs. This ~n^2 relationship coincides with the number of login attempts I am seeing. I'm assuming this is not intended behaviour, and that Source = Connector.Contents("...") should be cached from the initial import that generated the navtable I interacted with, or that at the very least only one more call to it should be made. It's impossible to find out what is and isn't being cached, and the lazy evaluation model means that even if I write Binary.Buffer(), which should keep an in-memory copy so that subsequent uses of the output come from that copy rather than from another Web.Contents call, the call to Binary.Buffer itself seems to be deferred until the contents are actually needed. That means the buffering can happen outside of the initial scope where it would have been useful to cache the information.
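A minimal sketch of what I mean by the buffering itself being deferred (the URL and parsing are placeholders, not my real connector). As far as I can tell, nothing is fetched or buffered where cached is defined; the work only happens once a downstream value is actually demanded:
let
    // Placeholder URL, illustrative only. Nothing appears to be fetched or
    // buffered at this point; the binding is just recorded.
    cached = Binary.Buffer(Web.Contents("https://example.com/BigCSVEndpoint")),
    parsed = Csv.Document(cached),
    // Only when firstRow is demanded does the request (and the buffering) actually run.
    firstRow = Table.First(parsed)
in
    firstRow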
I think that this is essentially what is going wrong:
let
    rand = List.Random(2),
    x = rand,
    y = rand,
    z = rand
in
    {x{0}, y{0}, z{0}}
In this example, you would expect rand to be evaluated once into a list containing two random numbers, with x, y, and z all referencing that list. The output would then be a list containing the same number three times.
This is not the case. The evaluation of List.Random(2) is put off until it is absolutely needed, and the evaluation model effectively expands the expression into something like {(List.Random(2)){0}, (List.Random(2)){0}, (List.Random(2)){0}}. For a non-deterministic call like this you end up with three different numbers.
If you add a buffer and change the code to
let
    rand = List.Buffer(List.Random(2)), //Buffer this now
    x = rand,
    y = rand,
    z = rand
in
    {x{0}, y{0}, z{0}}
Then the output generated is a list containing three identical numbers.
It seems as though buffering Web.Contents in my connector does not have the same effect, and I'm left with a waterfall of API calls that should not need to be made.
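For comparison, here is the Web.Contents analogue of the example above (the URL is a placeholder, not my real endpoint). My expectation was that a single request would be made and the buffered binary shared between x, y, and z, but in my connector the equivalent pattern still seems to produce one request per use:
let
    // Placeholder URL, illustrative only
    resp = Binary.Buffer(Web.Contents("https://example.com/BigCSVEndpoint")),
    x = resp,
    y = resp,
    z = resp
in
    // I would expect a single request here; in the connector I appear to get one per use
    {Binary.Length(x), Binary.Length(y), Binary.Length(z)}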
Any help would be greatly appreciated.