stepsAI 01

Power Query - Optimize with fewer steps

In the wonderful world of data acquisition with Power Query, it can happen that some queries take longer than others, even in the preview, which only loads the first 1000 lines. There are several ways to optimize queries in Power Query. This article presents some of them, with their advantages and disadvantages.

The reason why this article is considered intermediate is that, by using these techniques, you will lose the ability to use the visual editor to modify your optimized steps. It is still possible to use the visual editor to create new steps and then modify them using the advanced editor.

Base created with visual editor

For the rest of this article, I'm going to use this basic example, which would be done in the Power Query editor without using the advanced editor. For the sake of readability, I've taken the liberty of spacing out the code in the advanced editor, since normally each step is on a single line.

let
    Source = OData.Feed(
        "https://analytics.dev.azure.com/HypershellTechno/zzz_Formation/_odata/v4.0-preview/WorkItems?$select=WorkItemId, State, Title, WorkItemType, Priority, Risk, Severity, StoryPoints, TagNames, StateCategory, ParentWorkItemId"&
        "&$expand=Area($select=AreaName,AreaPath), Project($select=ProjectId,ProjectName), Iteration($select=IterationName,IterationPath,IsEnded), AssignedTo($select=UserName)"&
        "&$orderby=WorkItemId asc",
        null,
        [Implementation="2.0"]
    ),

    expandProject = Table.ExpandRecordColumn(
        Source,
        "Project",
        { "ProjectName", "ProjectId"},
        {"ProjectName", "ProjectId"}
    ),

    expandArea = Table.ExpandRecordColumn(
        expandProject,
        "Area",
        { "AreaName", "AreaPath"},
        {"AreaName", "AreaPath"}
    ),

    expandIteration = Table.ExpandRecordColumn(
        expandArea,
        "Iteration",
        {"IterationName", "IterationPath"},
        {"IterationName", "IterationPath"}
    ),

    expandAssign = Table.ExpandRecordColumn(
        expandIteration,
        "AssignedTo",
        {"UserName"},
        {"AssignedTo.UserName"}
        ),

    Sort_byID = Table.Sort(
        expandAssign,
        {{"WorkItemId", Order.Ascending}}
    ),

    url = Table.AddColumn(
        Sort_byID,
        "URL",
        each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId])
    ),

    icon = Table.AddColumn(url, "Icon",
        each
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
     ),

    addTitlePlus = Table.AddColumn(
        icon,
        "Title+",
        each [Icon] & " " & [Title]
    ),

    del_icon = Table.RemoveColumns(addTitlePlus,{"Icon"})

in del_icon

The query steps look like this in the editor:

image 8

It's important to remember that Power Query stores the previews of each of the intermediate steps, and must recalculate them every time there's a change or we update the data. Many of these steps have no added value in being stored in memory. We can therefore optimize both execution time and memory space used.

Merging nested steps

One possibility is to nest the steps within each other. We could therefore think of modifying the four column expansions as follows:

expand = Table.ExpandRecordColumn(
        Table.ExpandRecordColumn(
            Table.ExpandRecordColumn(
                Table.ExpandRecordColumn(
                    Source
                    Project
                    { "ProjectName", "ProjectId"}
                )
                Area
                { "AreaName", "AreaPath"}
            )
            Iteration
            { "IterationName", "IterationPath"}
        )
        AssignedTo
        { "UserName"}
    ),

In the editor and for Power Query, this is treated as a single step called "expand", rather than the four steps previously described.

image 1

This is the most optimal method in terms of performance. The downside is that, in addition to losing the gear for editing in the interface, it becomes more difficult to do the maintenance caused by poor readability (here, I've taken the trouble to indent the code with line feeds. However, when it's on a single line, it becomes virtually impossible to know which parameter goes with which function).

This example shows an extreme case. The typical use case for nesting is usually limited to two steps, not four as shown here.

Sub-steps - the operator let...in

Another way of reducing the number of intermediate steps in the editor without losing legibility is to use let...in.

    expand = let
        expandProject = Table.ExpandRecordColumn(
            Source,
            "Project",
            { "ProjectName", "ProjectId"}
        ),

        expandArea = Table.ExpandRecordColumn(
            expandProject,
            "Area",
            { "AreaName", "AreaPath"}
        ),

        expandIteration = Table.ExpandRecordColumn(
            expandArea,
            "Iteration",
            { "IterationName", "IterationPath", "IsEnded"}
        ),

        expandAssign = Table.ExpandRecordColumn(
            expandIteration,
            "AssignedTo",
            {"UserName"},
            {"AssignedTo.UserName"}
        )

    in expandAssign,

This technique reduces the number of intermediate steps by treating them as a single one. A single preview is also calculated. In terms of performance, this technique is comparable to nesting. There will still be a difference from nesting due to the creation of local variables, but the impact on memory is negligible.

image 1

In addition to the similar benefits of nesting, we get a bonus in terms of readability and maintainability. It's actually very easy to debug these steps, since by undoing the let...inthe steps will reappear in the editor. Note, however, that the step (and therefore the reference for subsequent steps) becomes the name of the variable preceding the let and not the variable name visible after the in. To avoid forgetting during debugging, you could always call your last step by the same name as your variable, but this is optional. (And don't forget to reset the comma if it's not the last step).

Eliminate redundant or unnecessary steps

A title like this seems obvious, but did you notice that a sorting step had been added even though the OData query already does this sorting? The same applies to type modifications. In the basic example, the step had not been added, but to comply with best practice, the type of the new URL column should have been text rather than any. It's also possible that you used ReplaceValue rather than ReplaceText when modifying your values. Typically, we would add the following step to adjust for erroneous types:

changeType = Table.TransformColumnTypes(formatTitle,{{"Title", type text}, {"URL", type text}})

Note that if you have made all your type changes at the same time, the editor will already make a list. On the other hand, it sometimes performs these steps at different times

changeType = Table.TransformColumnTypes(formatTitle,{ {"Title", type text} })
{...}
changeType2 = Table.TransformColumnTypes(otherOperation,{ {"URL", type text} })

This second version should be avoided, both in terms of performance and memory. It's much more efficient to do it all at once. However, there's an even simpler way to avoid this step altogether.

For some reason, the graphics editor doesn't allow you to add this parameter, but adding it manually doesn't prevent you from using the graphics editor. When the column is created, you specify the column type directly. So here's the step as written by the editor.

url = Table.AddColumn(
    expand,
    "URL",
    each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId])
),

And here's the version modified to indicate the type at creation.

url = Table.AddColumn(
    expand,
    "URL",
    each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId]),
    type text
),

This can also be done for manually created data. (There are several methods, but here's one that uses Record).

Table.FromRecords(
    {
        [ ID = 7665, StartDate = #date(2013,05,12), EndDate = null, GivenName = "John", FamilyName = "Smith" , isAnnual = true, efficiency = 2.54 ],
        [ ID = 7788, StartDate = #date(2015,02,28), EndDate = null, GivenName = "Jane", FamilyName = "Doe", isAnnual = true, efficiency = 1.54 ],
        [ ID = 7890, StartDate = #date(2020,03,27), EndDate = #date(2021,06,30), GivenName = "Juliette", FamilyName = "Capulet", isAnnual = false, efficiency = 0.68 ],
        [ ID = 8756, StartDate = #date(2020,11,17), EndDate = #date(2021,06,30), GivenName = "Romeo", FamilyName = "Montaigu", isAnnual = false, efficiency = 0.35 ],
        [ ID = 8857, StartDate = #date(2021,06,30), EndDate = null, GivenName = "Barbara", FamilyName = "Delacruz", isAnnual = false, efficiency = 3.4 ],
        [ ID = 9563, StartDate = #date(2023,12,01), EndDate = null, GivenName = "George", FamilyName = "Washington", isAnnual = true, efficiency = 1.48 ]
    }
    let
        _t = ((type nullable text) meta [Serialized.Text = true]),
        _d = ((type nullable datetime) ),
        _i = ((type nullable Int64.Type) ),
        _n = ((type nullable number) ),
        _b = ((type logical) )
    in type table [ID = _i, StartDate = _d, EndDate = _d, GivenName = _t, FamilyName = _t, isAnnual = _b, efficiency = _n]
)

By adding the type to the creation, you can eliminate the conversion step altogether, which is a double gain in performance. Take the trouble to read the official documentation for the functions you're using to learn more about the various parameters that the visual editor ignores or uses incorrectly.

Again to eliminate unnecessary steps, rather than adding a column for the icon, then calculating the title and then destroying the icon, you can create the change directly without any additional steps.

formatTitle = Table.ReplaceValue(
        url,
        each [Title],
        each (
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
        ) & [Title],
        Replacer.ReplaceText,{"Title"})

By directly concatenating our structure if...elseif...else and the title, we avoid two steps that were totally unnecessary, namely the creation and deletion of a column for an intermediate state. In a future article, I'll show you how to further improve this structure without using a conditional structure, but it's still a significant first improvement. Always try to see if your intermediate steps are really necessary, or if you can do more with less, as is the case here.

Conclusion - the current solution

There are many ways to optimize Power Query requests. Putting everything we've seen in this article together, we get the solution presented below. Note that indented formatting is optional, but helps readability and has no impact on the editor or performance. I therefore suggest that, when using the advanced editor, you use a similar technique to aid readability. As a bonus, I've added a local variable that makes it easier to edit the query in the editor. I'll talk more about these variables in another article.

Request

let
    query = "$select=WorkItemId, State, Title, WorkItemType, Priority, Risk, Severity, StoryPoints, TagNames, StateCategory, ParentWorkItemId"&
"&$expand=Area($select=AreaName,AreaPath),Project($select=ProjectId,ProjectName),Iteration($select=IterationName,IterationPath,IsEnded),AssignedTo($select=UserName)"&
"&$orderby=WorkItemId asc",

    Source = OData.Feed("https://analytics.dev.azure.com/KuriosIT/_odata/v4.0-preview/WorkItems?" & query, null, [Implementation="2.0"] ),

    expand = let
        expandProject = Table.ExpandRecordColumn(
            Source,
            "Project",
            { "ProjectName", "ProjectId"}
        ),
        expandArea = Table.ExpandRecordColumn(
            expandProject,
            "Area",
            { "AreaName", "AreaPath"}
        ),
        expandIteration = Table.ExpandRecordColumn(
            expandArea
            "Iteration"
            {"IterationName", "IterationPath", "IsEnded"}
        ),
        expandAssign = Table.ExpandRecordColumn(
            expandIteration,
            "AssignedTo",
            {"UserName"},
            {"AssignedTo.UserName"}
        )
    in expandAssign,
    url = Table.AddColumn(
        expand,
        "URL",
        each baseURL&"/"&[ProjectName]&"/_workitems/edit/"&Text.From([WorkItemId]),
        type text
    ),
    formatTitle = Table.ReplaceValue(
        url,
        each [Title],
        each (
            if [WorkItemType] = "Epic" then "👑"
            else if [WorkItemType] = "Feature" then "🏆"
            else if [WorkItemType] = "User Story" then "📘"
            else if [WorkItemType] = "Bug" then "🐞"
            else if [WorkItemType] = "Action Plan" then "📑"
            else if [WorkItemType] = "Issue" then "🚩"
            else if [WorkItemType] = "Task" then "📋"
            else if [WorkItemType] = "Test Case" then "📃"
            else if [WorkItemType] = "Test Plan" then "🗃"
            else if [WorkItemType] = "Test Suite" then "📂"
            else "🚧"
        ) & [Title],
        Replacer.ReplaceText,{"Title"})
in
    formatTitle

You can find the Power BI file that includes this Power Query example at following this link. This example also presents a few cases in DAX and another solution including functions, which will be discussed in subsequent articles.

Still curious?

More articles on optimization and advanced techniques are to come. For those who wish to continue and learn more, I invite them to follow the blog series entitled " Series - Creating an agile dashboard with Power BI "by Simon-Pierre Morin and Mathieu Boisvert.