When downloading data from the web, it’s often better to grab the data from APIs that are designed for machine-to-machine communication rather than from the site that’s actually visible on the screen. Not only is the download usually faster, but you also often get additional parameters that can be very useful. In this article I’m going to show you how to retrieve the relevant URLs for downloading files from webpages (without resorting to external tools like Fiddler) and how to tweak them to your needs.
Retrieving the URL to download files from webpages
Say I want to download historical stock prices from this webpage:
https://finance.yahoo.com/quote/AAPL/history?p=AAPL
The screen will show a link to a download:
If I click on the button, a download dialogue will appear and some browsers will even show me the URL that’s behind it:
But when I close the dialogue, the URL will disappear as well. Fortunately, some browsers (Chrome-based ones, for example) let you grab that URL from the options of the downloaded file’s entry, like so:
But there are other ways as well. Sometimes, right-clicking the download button reveals an option that gives me the download link:
But that’s not guaranteed to work everywhere either. The last resort for me is to inspect the element: Right-click the download link and choose “Inspect” (or “Inspect element”) instead:
This opens up the full monty of your site. You should then be able to find the URL near the highlighted position (indicating the element that you’re inspecting):
Tweaking the URL
Now let’s examine the catch and see how we can exploit it:
The first part up until the “?” is the base URL, and after the question mark we see 4 query parameters:
Start date, end date, interval and events. In this case, they correspond to the options on the webpage itself. Playing around with the parameters and checking the resulting URLs reveals that “wk” can be used to retrieve weekly data and “mo” is the abbreviation for monthly data. That leaves the question of how to decipher the parameters for the start and end date.
As it turns out, they are noted as Unix timestamps. A Unix timestamp represents a point in time as the number of seconds elapsed since the start of 1 January 1970 (UTC). So to transform a date into one, one has to:
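To make the anatomy of such a URL concrete, here is a minimal sketch in M that assembles one from a base URL and a record of query parameters. The host, the path and the parameter names (`period1`, `period2`, `interval`, `events`) are placeholders chosen for illustration and are not taken from the page, so copy the real ones from the link you retrieved in your browser. The two date values are treated as opaque numbers for now.

```powerquery
let
    // Placeholder base URL (assumption): everything up to the "?" of the link you copied
    BaseUrl = "https://example.com/download/AAPL",

    // Placeholder parameter names and values (assumptions): one field per query parameter
    Parameters = [
        period1 = "1577836800",   // start date
        period2 = "1609459200",   // end date
        interval = "1wk",         // weekly data; use whatever value your own URL shows
        events = "history"
    ],

    // Uri.BuildQueryString turns the record into "name=value&name=value..." and escapes the values
    FullUrl = BaseUrl & "?" & Uri.BuildQueryString(Parameters)
in
    FullUrl
```

Keeping the parameters in a record makes it easy to swap single values later without string surgery on the URL.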
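- Determine its distance to 1/1/1970 in days: Subtract #duration(25569,0,0,0), because Power Query counts days from 30 December 1899 and 1 January 1970 is day 25569 on that scale, and then
- Convert the days to seconds: * 86400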
This is what the formula would look like:
Number.From ( DateTime.From ( DateTimeInput ) - #duration ( 25569,0,0,0 ) ) * 86400
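Wrapped into a reusable function, the formula above could look like the following sketch. The function name `ToUnixTimestamp` and the sample date are my own choices for illustration. No time zone shift is applied, so the input is treated as if it were already in UTC.

```powerquery
let
    // Converts a date or datetime to a Unix timestamp (seconds since 1970-01-01 00:00:00)
    ToUnixTimestamp = (DateTimeInput as any) as number =>
        Number.From(DateTime.From(DateTimeInput) - #duration(25569, 0, 0, 0)) * 86400,

    // 1 January 2020 lies 18262 days after 1 January 1970: 18262 * 86400 = 1577836800
    Example = ToUnixTimestamp(#date(2020, 1, 1))
in
    Example
```

For inputs that carry a time of day, the multiplication works on fractional days, so wrapping the result in Number.Round is a cheap safeguard against tiny floating-point residues.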
Now you’re able to set the date range for the download dynamically, based on whatever dates each refresh of your data requires. Hope you found this useful.
BTW: The formula to calculate it the other way around looks like so:
#datetime ( 1970,1,1,0,0,0 ) + Duration.From ( UnixTimestamp / 86400 )
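The reverse direction can be wrapped the same way. This is a sketch with a function name (`FromUnixTimestamp`) picked for illustration. The result is a datetime without time zone information, to be read as UTC.

```powerquery
let
    // Converts a Unix timestamp (seconds since 1970-01-01 00:00:00) back to a datetime
    FromUnixTimestamp = (UnixTimestamp as number) as datetime =>
        // Duration.From interprets a number as days, hence the division by 86400
        #datetime(1970, 1, 1, 0, 0, 0) + Duration.From(UnixTimestamp / 86400),

    // 1577836800 / 86400 = 18262 days after 1 January 1970, i.e. 1 January 2020 00:00:00
    Example = FromUnixTimestamp(1577836800)
in
    Example
```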
Closing with a general hint on how you have to adjust the queries if they need to be refreshable in the Power BI service: http://blog.datainspirations.com/2018/02/17/dynamic-web-contents-and-power-bi-refresh-errors/
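A commonly recommended pattern for refreshable web queries is to keep the first argument of Web.Contents static and pass the variable parts through its `RelativePath` and `Query` options. The sketch below combines that pattern with a dynamic date range covering the last year. The host, the path and the parameter names are placeholders (assumptions, not taken from the page), and the response is assumed to be a comma-separated file in UTF-8.

```powerquery
let
    // Same conversion as the formula above; dates are treated as UTC
    ToUnixTimestamp = (DateTimeInput as any) as number =>
        Number.From(DateTime.From(DateTimeInput) - #duration(25569, 0, 0, 0)) * 86400,

    // Dynamic interval: from one year ago until today, re-evaluated on every refresh
    EndDate = Date.From(DateTime.LocalNow()),
    StartDate = Date.AddYears(EndDate, -1),

    Source = Web.Contents(
        "https://example.com",                   // placeholder host, kept static
        [
            RelativePath = "download/AAPL",      // placeholder path
            Query = [                            // placeholder parameter names
                period1 = Text.From(ToUnixTimestamp(StartDate)),
                period2 = Text.From(ToUnixTimestamp(EndDate)),
                interval = "1wk",
                events = "history"
            ]
        ]
    ),

    // Assumption: the download is a comma-separated file encoded as UTF-8 (code page 65001)
    Csv = Csv.Document(Source, [Delimiter = ",", Encoding = 65001]),
    Result = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
in
    Result
```

Because only the option record changes between refreshes, the service can resolve the data source from the static first argument.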
Enjoy & stay healthy & queryious 😉
Great tip. Thanks for sharing.
Brilliant! I appreciate all of the extraneous information too…converting datetimes to UNIX, etc. Wonderful article.
Even in 2023, I found this very useful.