This is a step-by-step description of how to combine multiple tables from one webpage into one table with categories as output. You can also apply this technique to combine tables from other sources as well (like from folder method for example or multiple different webpages (see this article)).
Sometimes the page you want to scrape has multiple tables like here:
And you want to combine them into 1 with a Category-column like so:
Overview
I will present 2 methods here:
- Append-method: This is the obvious one and is fast for just a few tables.
- Add-Column-method: A bit more complicated but will be faster for a large number of tables and is also suitable for a dynamic number of tables.
You will also find 2 options at the end of this article:
- Use custom functions for multi-step table transformations
- Use dynamic filters to select the desired tables
Append method
Call your webpage:
Next select the tables you need in the navigation pane:
This will return 3 separate queries:
You then add a column “Category” on each of these queries and state which is which (“Top Gainers”, “Top Decliners” and “Top Actives”):
Append all these queries:
Add all queries to be appended to the right window:
Add column method
The first method can get a bit tedious if you have a large number of queries, then this might be better:
Select just one table from the navigation pane (no matter which) and delete all steps apart from the 1st (Source):
That will return a table with all content elements of the page:
Add an Index column:
and filter the desired tables (here: Index 2,4 and 6) (an option for a dynamic filter is described at the end of this article)
A sneak in to the table contents shows us that the headers are in the first rows of the table:
So we add another column where we promote the headers like this:
Then you can use a conditional column for the Categories:
Now remove all columns you don’t need and expand “WithHeaders”:
Option1: Use custom functions
If the transformations on the tables contain multiple steps instead of just one here (Table.PromoteHeaders), you can use a custom function and use that in the Add-Column-step instead like Matt has described here. Just be aware, that custom functions don’t work in the service (at the time of writing).
Option2: Dynamic filter
If you are not sure that your desired tables always sit on the same relative position in the webpage and therefore the use of an index will not work, you can use more dynamic approaches for filtering like this for example:
Add a column which tries to access the first value in “Column3” of the Data-table:
If that returns “Symbol”, it’s your table and you can filter on that instead. As you can see, this formula might return errors:
So you have to remove them first before your filter on that column will work:
If you don’t like the conditional column to create the categories, watch out for my next article where I show more options to add columns to a table.
Enjoy & stay queryious;-)
Excellent advice, particularly on use of custom functions.
I also appreciate you showing a ‘real world’ example to explain the theory. This helps me conceptualize the benefits.
Thank you.
Thank you so much Doug – that’s very motivating!
Cheers, Imke
Pingback: Web Scraping With Power BI – Curated SQL
Pingback: Web Scraping 2: Scrape multiple pages at once and refresh in Power BI service – The BIccountant