Power Query – Data Gathering
By Simon-Pierre Morin, eng.
(beginner level)
As part of the Agile tour Québec 2023, I presented solutions regarding Excel reports related problems that Mathieu Boisvert presented.
In that presentation, I mentioned that one of the main problems comes from the fact that information is generally decentralized. Indeed, to make some adequate follow-up, it’s often required to connect to an ERP, a work tracking service (Jira, Azure DevOps, or others), pipelines, version control systems, and possibly even more. Moreover, these multiple sources are not always compatible when we talk about formatting (dates, hours, numbers, text encoding, regional parameters, language) not even talking about the fact that sometimes the data is in plain free text (users can freely write anything) which add an extra error source. Even more, manual extraction management in Excel tend to use outdated data and requires redoing manual operations at each report refresh.
Data sets within Power Query
The solution would be to use a data model by acquiring data sets in a centralized an automated way using Power Query. Other blogs will follow to discuss about optimization in Power Query and Power BI, but take in consideration that Power Query is usable in both Excel and Power BI. It is then possible to do a progressive transition if you target to use Power BI instead of Excel at the end.
Power Query use the M programming language. That language allows to connect to virtually any data sources. Over a hundred sources are already natively supported, not counting generic connectors. After establishing the connexion, we can do the followings:
- perform data operations using Power Query
- format data, so a date will be a date for example
- manage what to do with potential errors
- put relationship between data sets (that can be complemented in the semantic model tools)
- add extra local operations using DAX (Power BI) or VBA (Excel)
In the case of known connectors, format transformation operations are generally managed by the server prior to receive the data, but it is always possible to modify even further what is received.
The data model allows to create relationship between data that would be possible otherwise with these unrelated data sources.
By putting your data in a data set model and by giving Power Query your data transformation logic (could also include a part of your business logic), you won’t have to repeat those manual steps ever again. Even better, with a single click on the “Refresh” button, your data will get up to date, saving you a lot of hours for the management of your reports.
In following blogs, I will explain in more technical details how you can optimize your connexions to Web content using REST API and OData Feeds, how to use data flows to boost performances of your report and some tricks using M to create connexion wrappers that can limit authentication request count. I will also describe pros and cons of these tricks so you can take enlighten decisions on when to use one trick or the other.
Still curious?
For those who want to learn even more, follow the blog series called “Series – An agile project dashboard with Power BI” by Simon-Pierre Morin and Mathieu Boisvert.