stepsAI 01

Power Query – Optimization by reducing steps

By Simon-Pierre Morin, eng.
(beginner-intermediate level)

In the wonderful world of data acquisition with Power Query, it may happen that some requests are quite long to process, even when the preview only load the first 1000 rows. There are several methods to optimize such queries in Power Query. This article will present some of them with their advantages and disadvantages.

The reason why we want to consider this blog as intermediate level comes from the fact that while using these techniques, you will lose the possibility to use the visual editor to modify you optimized queries. You will still be able to use the visual editor to create new steps that you will be able to modify using the advanced editor.

Visual Editor Code Base

For the whole purpose of this article, I will base my examples on this code which could have been created in Power Query editor without the use of the advanced editor. For readability, I still went to the advance editor to make some room. Normally, each step is on a single line.

let
    Source = OData.Feed(
        "https://analytics.dev.azure.com/HypershellTechno/zzz_Formation/_odata/v4.0-preview/WorkItems?$select=WorkItemId, State, Title, WorkItemType, Priority, Risk, Severity, StoryPoints, TagNames, StateCategory, ParentWorkItemId"&
        "&$expand=Area($select=AreaName,AreaPath), Project($select=ProjectId,ProjectName), Iteration($select=IterationName,IterationPath,IsEnded), AssignedTo($select=UserName)"&
        "&$orderby=WorkItemId asc", 
        null, 
        [Implementation="2.0"] 
    ),

    expandProject = Table.ExpandRecordColumn(
        Source,
        "Project",
        {"ProjectName", "ProjectId"},
        {"ProjectName", "ProjectId"}
    ),

    expandArea = Table.ExpandRecordColumn(
        expandProject,
        "Area",
        {"AreaName", "AreaPath"},
        {"AreaName", "AreaPath"}
    ),

    expandIteration = Table.ExpandRecordColumn(
        expandArea,
        "Iteration",
        {"IterationName", "IterationPath "},
        {"IterationName", "IterationPath "}
    ),

    expandAssign = Table.ExpandRecordColumn(
        expandIteration,
        "AssignedTo",
        {"UserName"},
        {"AssignedTo.UserName"}
        ),

    Sort_byID = Table.Sort(
        expandAssign,
        {{"WorkItemId", Order.Ascending}}
    ),

    url = Table.AddColumn(
        Sort_byID,
        "URL",
        each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId])
    ),

    icon = Table.AddColumn(url, "Icon",
        each
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
     ),

    addTitlePlus = Table.AddColumn(
        icon,
        "Title+",
        each [Icon] & " " & [Title]
    ),

    del_icon = Table.RemoveColumns(addTitlePlus,{"Icon"})

in del_icon

The steps for that query look like this in the editor:

image 8

It is important to remind ourselves that Power Query keep in memory each steps preview and needs to recalculate each of them every time there is a change or at data refresh. Many of these steps have absolutely no added value to be kept in memory. So, there is a possibility to save on both time execution and memory.

Fusion of Embedded Steps

One possibility is to embed steps one inside the other. We could then think about changing the four column expansions as follows:

expand = Table.ExpandRecordColumn(
        Table.ExpandRecordColumn(
            Table.ExpandRecordColumn(
                Table.ExpandRecordColumn(
                    Source
                    , "Project"
                    , {"ProjectName", "ProjectId"}
                )
                , "Area"
                , {"AreaName", "AreaPath"}
            )
            , "Iteration"
            , {"IterationName", "IterationPath"}
        )
        , "AssignedTo"
        , {"UserName"}
    ),

In the editor, as for Power Query, this is considered as one single step rather than four.

image 1

This is by far the most optimal performance-wise. Though, not only did we lose the gear in the visual editor to help us to edit that step, but it became really hard to maintain that step due to a massive lack of readability (here, I was kind enough to indent the code to help with the line feeds. But on one single line, it’s almost impossible to say which parameter goes with each function.).

I have to admit though that this example is quite extreme. A typical use case of embedding is generally limited to two steps, not four like it is the case here.

Sub steps – the let…in operator

Another way to reduce the steps in the editor without losing the readability is to use let…in.

    expand = let
        expandProject = Table.ExpandRecordColumn(
            Source,
            "Project",
            {"ProjectName", "ProjectId"}
        ),

        expandArea = Table.ExpandRecordColumn(
            expandProject,
            "Area",
            {"AreaName", "AreaPath"}
        ),

        expandIteration = Table.ExpandRecordColumn(
            expandArea,
            "Iteration",
            {"IterationName", "IterationPath", "IsEnded"}
        ),

        expandAssign = Table.ExpandRecordColumn(
            expandIteration,
            "AssignedTo",
            {"UserName"},
            {"AssignedTo.UserName"}
        )

    in expandAssign,

This technique allows to reduce the intermediate steps by considering them all as one. A single preview is also processed. On the performance side, this technique is comparable to the embedding. There will still have a slight difference between the two techniques caused by the local variables that are created, but the memory and performance differences are insignificant.

image 1

Adding to the gain similar to the embedding, we have a nice readability and maintainability bonus. Debugging becomes quite simple since we only need to undo the let..in, which will cause the steps to reappear in the editor. It is important to note though that the step (and thus the reference for the following steps) is the name of the variable preceding the let and not the visible identifier that could be present after the in. To avoid a mistake at debugging, you could give the same name to you last sub step as the variable of the step, but it’s optional. (Don’t forget to also adjust the last coma when it’s not your very last step.)

Delete Repeating and Useless Steps

A title like this one may seem straightforward, but did you notice that a sort step was added even if the OData call already do the sorting? Same goes for the type change step. In the code base example, the step wasn’t initially there, but to conform to the best practice, URL must be of type text rather than any. It’s also possible that you used ReplaceValue instead of ReplaceText when you modified your column. In most cases, we would have added a step to adjust the types:

changeType = Table.TransformColumnTypes(formatTitle,{{"Title", type text}, {"URL", type text}})

Notice that if you had done all your type change at the same time, the editor already does a list of lists to change all the types together. But it may happen that those steps are done at different moment.

changeType = Table.TransformColumnTypes(formatTitle,{ {"Title", type text} })
{…}
changeType2 = Table.TransformColumnTypes(otherOperation,{ {"URL", type text} })

This second version is to avoid as much as possible for the matter of both performance and memory. This system function is definitely more efficient to do it all at once. But there is another way that even prevent the addition of the extra step.

For some unknown reason, the visual editor doesn’t allow to add that parameter, but adding the type parameter manually don’t prevent the visual editor to work. While creating a column, you can specify directly the type. Here is the code created by the editor.

url = Table.AddColumn(
    expand,
    "URL",
    each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId])
),

And here is the modified version with the type.

url = Table.AddColumn(
    expand,
    "URL",
    each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId]),
    type text
),

It is also possible to do this when entering manual data. (There are several methods, but here is the one using Records.)

Table.FromRecords(
    {
        [ ID = 7665, StartDate = #date(2013,05,12), EndDate = null, GivenName = "John", FamilyName = "Smith" , isAnnual = true, efficiency = 2.54 ],
        [ ID = 7788, StartDate = #date(2015,02,28), EndDate = null, GivenName = "Jane", FamilyName = "Doe", isAnnual = true, efficiency = 1.54 ],
        [ ID = 7890, StartDate = #date(2020,03,27), EndDate = #date(2021,06,30), GivenName = "Juliette", FamilyName = "Capulet", isAnnual = false, efficiency = 0.68 ],
        [ ID = 8756, StartDate = #date(2020,11,17), EndDate = #date(2021,06,30), GivenName = "Romeo", FamilyName = "Montaigu", isAnnual = false, efficiency = 0.35 ],
        [ ID = 8857, StartDate = #date(2021,06,30), EndDate = null, GivenName = "Barbara", FamilyName = "Delacruz", isAnnual = false, efficiency = 3.4 ],
        [ ID = 9563, StartDate = #date(2023,12,01), EndDate = null, GivenName = "George", FamilyName = "Washington", isAnnual = true, efficiency = 1.48 ]
    }
    , let 
        _t = ((type nullable text) meta [Serialized.Text = true]), 
        _d = ((type nullable datetime) ),
        _i = ((type nullable Int64.Type) ),
        _n = ((type nullable number) ),
        _b = ((type logical) )
    in type table [ID = _i, StartDate = _d, EndDate = _d, GivenName = _t, FamilyName = _t, isAnnual = _b, efficiency = _n]
)

By adding the type at the creation, it is possible to get rid of the conversion step, which is a double performance gain. Take the time to read the official documentation of the system functions that you are using to learn more about them. A lot of them have extra useful parameters that you can take advantage of that are not available in the visual editor or that the editor use poorly for some reason.

To still keep removing useless steps, rather than adding an icon column to calculate the title for then remove the said column, it is possible to add directly the icon in the original column without adding extra steps.

formatTitle = Table.ReplaceValue(
        url,
        each [Title],
        each (
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
        ) & [Title],
        Replacer.ReplaceText,{"Title"})

By merging the if…elseif…else and the title by concatenation, we avoid two totally useless steps, which are the creation and the deletion of an intermediate column state. In a following article, I will present how we can improve even more that structure without using a conditional structure, but even in that state, it is a pretty good first enhancement. Always try to see if your steps really have a purpose or if it is possible to do more by doing less, as it is the case here.

Conclusion – Solution So Far

There are many methods to optimize Power Query steps. By putting together what was presented in this article, we get the solution bellow. Once again, the indented formatting is optional but helps for the readability without impacting performances. I recommend that, when using the advance editor, you use similar technique for readability. As a little bonus, I added a local variable to edit more easily the OData query in the editor. I will talk in more details of such variables in another article.

Solution Query

let 
    query = "$select=WorkItemId, State, Title, WorkItemType, Priority, Risk, Severity, StoryPoints, TagNames, StateCategory, ParentWorkItemId"&
"&$expand=Area($select=AreaName,AreaPath),Project($select=ProjectId,ProjectName),Iteration($select=IterationName,IterationPath,IsEnded),AssignedTo($select=UserName)"&
"&$orderby=WorkItemId asc",

    Source = OData.Feed("https://analytics.dev.azure.com/KuriosIT/_odata/v4.0-preview/WorkItems?" & query, null, [Implementation="2.0"] ),

    expand = let
        expandProject = Table.ExpandRecordColumn(
            Source,
            "Project",
            {"ProjectName", "ProjectId"}
        ),
        expandArea = Table.ExpandRecordColumn(
            expandProject,
            "Area",
            {"AreaName", "AreaPath"}
        ),
        expandIteration = Table.ExpandRecordColumn(
            expandArea
            "Iteration"
            {"IterationName", "IterationPath", "IsEnded"}
        ),
        expandAssign = Table.ExpandRecordColumn(
            expandIteration,
            "AssignedTo",
            {"UserName"},
            {"AssignedTo.UserName"}
        )
    in expandAssign,
    url = Table.AddColumn(
        expand,
        "URL",
        each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId]),
        type text
    ),
    formatTitle = Table.ReplaceValue(
        url,
        each [Title],
        each (
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
        ) & [Title],
        Replacer.ReplaceText,{"Title"})
in
    formatTitle

You can find a Power BI file that includes that Power Query example by following this link. There are also other examples in DAX and Power Query using custom functions that will be discussed in the next blogs that are included in the file.

Still curious?

Other blogs about optimization and advance techniques will come soon. For those who want to learn even more, follow the blog series called “Series – An agile project dashboard with Power BI” by Simon-Pierre Morin and Mathieu Boisvert.

Similar Posts