Aggregation functions group multi-row data into a single row based on some grouping. The functions are calculated by looping through the table rows and working out each group’s result.
This article will cover the basic principles of working with aggregation functions, including:
- Types of aggregation functions
- The general function syntax
- Working with aggregation functions in the data load editor
- Working with aggregation functions in visualisations
- Nested aggregation
For a complete, up-to-date list of aggregation functions, please see the Qlik Sense help pages here.
Types of aggregation functions
There are a huge number of aggregation functions supported by Qlik Sense, but they can be broken down into just a few categories:
- Basic – these are functions you intuitively should know from working with code and Microsoft Excel. Basic functions include [Min], [Max], [Avg], [Sum] etc. You are unlikely to ever build an app without using at least one of these functions.
- Counter – these functions are for, well, counting. They do go beyond the expected [Count] though. Additional functions include [NullCount] (count of NULL values), [MissingCount] (count of missing values), [NumericCount] (count of numeric values) as well as others. These functions are also very useful and you are likely to use [Count] as a minimum in your apps.
- Financial – these functions are fairly niche but you may use them if you are working with financial data or project evaluation. They include functions such as [IRR], [NPV] and other variants of those.
- Statistical – these functions are probably less niche than the financial ones but you may still not use them often. They are useful if you plan on going beyond the usual [Avg] in the app and are essential for doing some basic regression modelling. Some examples include [Median], [Skew], [Fractile] and various [Linest] functions.
- Statistical test – these are essential if you’re doing any modelling in the app. Included are various [TTest], [ZTest] and [Chi2Test] functions.
- String – these functions are very useful if you work with any string data that requires modification or analysis. Some functions in this category are [Concat] and [Maxstring].
The aggregation function syntax
Not surprisingly, the function syntax can differ massively depending on the aggregation function. For up-to-date syntax, please refer to the Qlik help pages here.
Understanding the function syntax given by Qlik Sense
Unfortunately, it isn’t easy to decipher the syntax provided in the help pages linked above if you aren’t used to reading it. In this section, we will explain how to read this syntax so that you can apply this thinking to every function in the help pages.
To explain the syntax, we will use a couple of different functions that have a few different syntax elements and decipher them in turn.
Example 1: FirstSortedValue
This function returns the first value after the data are sorted from low to high by the column you specify.
FirstSortedValue ([ distinct ] value, sort-weight [, rank ])
The rules for reading this are as follows:
- The function name itself is first, and all the parameters you need to enter go between brackets: ()
- Commas separate the various elements you will need to enter. In this function, there are 3 elements: the [distinct] value, sort-weight and the [rank]
- Anything between square brackets [] is an optional parameter. In this function, [distinct] is an optional qualifier which you only need to include if you want distinct values. If you do include it, DO NOT write it in brackets – write the word distinct. Equally, the [rank] is also optional. You will get the first value if you don’t enter a rank. If you want the 5th value, enter 5. Again, do not enter the brackets, just the value
An example of this function being used with all the optional parameters included could be:
FirstSortedValue (distinct "Shirt Colour", "April 2023 Sales", 2)
The above function will return the second least popular shirt colour in April 2023 based on the number of sales.
An example of the function without using optional parameters may be:
FirstSortedValue ("Product Name", "April 2023 Sales")
This code would now return the name of the product that had the lowest sales in April 2023.
Example 2: LINEST_B
This function returns the ‘b’ value of a linear regression.
LINEST_B (y_value, x_value[, y0 [, x0 ]])
The rules for reading this are as follows:
- The function name itself is first and all the parameters you need to enter go between brackets: ()
- Commas separate the various elements you will need to enter. In this function, there are 4 elements: y_value, x_value, [y0] and [x0]
- Anything between square brackets [] is an optional parameter. Both [y0] and [x0] are optional parameters in this function. Do not be confused by double set of [], it’s just because the two optional values follow each other. If you are including these values, you DO NOT need to use []; separate the parameters with a comma.
An example of this function being used with all the optional parameters included could be:
LINEST_B ("Ice Cream Sales", "Temperature", 0, 30)
With this code, the function will calculate the ‘b’ value of the linear regression given the values in fields “Ice Cream Sales” and “Temperature” as well as a forced y-axis intercept of 0 and x-axis value of 30.
An example of the function without using optional parameters may be:
LINEST_B ("Ice Cream Sales", "Temperature")
Now, the ‘b’ value is calculated purely on the values in fields “Ice Cream Sales” and “Temperature”.
How to use aggregation functions in the data load editor
Now that you understand how to read the syntax of Qlik functions, you need to understand how to use aggregation functions in the load. The rules are:
- They MUST be used within a LOAD or SELECT statement
- The statement MUST have a GROUP BY clause if the load/ select statement includes other fields. If it doesn’t, you will get an error when trying to reload the app.
- The GROUP BY works exactly the same as with SQL, you will be aggregating over some fields.
- Note that Qlik Sense doesn’t support the use of ‘having’ in the ‘load’ statement, so if you’re used to using SQL, be aware that you cannot filter the data table in the same way as in SQL by the aggregation results. Instead, you have to use a preceding load or a resident table to achieve the same result (see below for examples of how to use both).
A very basic load may look something like this:
TABLE:
LOAD
"Customer Name",
Avg("Order Value") as "Average Order Value"
FROM [my_data.qvd] (qvd)
GROUP BY
"Customer Name";
The loaded table will include a list of customer names in the first field and their average order value in the second.
How to use aggregation functions in visualisations
Aggregation functions can be used in charts in the exact same way as in the data load editor. The rules are:
- The functions do not have to be used with a dimension. For example, you can create a table that contains a single measure with an aggregation function.
- If multiple dimensions are used, the aggregation function will aggregate over each of those. For example, for a table with two dimensions and an aggregation measure, it is equivalent to having both dimensions in the group by clause of the load script.
- Aggregation functions must be added as a measure, not a dimension, unless the Aggr() function is used. See ‘Nested aggregation in visualisations’ below for more information on the Aggr() function.
A basic table showing customers total orders and their first and last order dates would look like as follows (formulas are in the table headers):
Nested aggregation
Nested aggregation in the data load editor
Nested aggregation means you are wrapping one aggregating function inside of another, such as Sum(Count(“My Field”)). Nested aggregation in the load script is NOT allowed. The only way to do this sort of transformation is by using preceding or resident loads.
A simple example of using a preceding load to aggregate twice could be something like the following:
TABLE:
LOAD
"Customer Name",
Avg("Average Order Value") as "Average Monthly Order Value"
GROUP BY
"Customer Name";
LOAD
"Customer Name",
"Month Year",
Avg("Order Value") as "Average Order Value"
FROM [my_data.qvd] (qvd)
GROUP BY
"Customer Name",
"Month Year";
With this script, the final table loaded contains the customer name and what each customer’s average order values per month are.
Doing the same thing with a resident load would look like this:
TABLE:
LOAD
"Customer Name",
"Month Year",
Avg("Order Value") as "Average Order Value"
FROM [my_data.qvd] (qvd)
GROUP BY
"Customer Name",
"Month Year";
TABLE2:
LOAD
"Customer Name",
Avg("Average Order Value") as "Average Monthly Order Value"
RESIDENT TABLE
GROUP BY
"Customer Name";
DROP TABLE TABLE;
The result is the same but your final table is called ‘TABLE2’.
Nested aggregation in visualisations
In visualisations, you can create nested aggregations by using the chart function Aggr(). The syntax of the function is as follows:
Aggr({SetExpression}[DISTINCT] [NODISTINCT] expr, StructuredParameter{, StructuredParameter})
Lets understand what each part of the syntax means:
- Aggr – this is the function. All the parameters go in between the () brackets.
- {SetExpression} – this is a placeholder for a set analysis expression if required. It is optional, and if you’re not using a set expression, leave this out.
- [DISTINCT] [NODISTINCT] – these are optional function qualifiers. By default, [DISTINCT] is assumed. It means that for each unique set of combinations you are aggregating over, only one line of results is generated. If you use [NODISTINCT], multiple results lines will be generated.
- expr – this is the aggregating function you are aggregating over a set of fields.
- StructuredParameter{, StructuredParameter} – these are the fields you are aggregating over. Only one is required; any additional ones can be added as comma-separated values.
As an example, image that we want to aggregate the total customer spend so that it can be used as a dimension in the chart.
The underlying data is structured as such:
We want to sum the ‘Value’ by the ‘Customer ID’ and use this as an interval dimension in a bar chart. As a measure, we want to use the distinct count of customer IDs to see how many customers have spent this much. Here’s how we do it.
In the chart measure, use the basic Count() counting function:
Count(distinct [Customer ID])
In the chart dimension, use the following combination of Class(), Aggr() and Sum() functions:
Class(Aggr(Sum(Value),[Customer ID]),5000,'£')
- The Aggr() function aggregates the sum of “Value” over each “Customer ID”.
- The Class() function formats the result of the Aggr() into class ‘buckets’. See more on Class() function here.
Note: the Aggr() function, while incredibly useful, can be problematic. Firstly, it is resource-heavy so be wary of using it if you have a model with many tables and millions of records. Secondly, it can lead to misleading results for your users. For example, if a user filters by the newly created ‘Spend Bucket’, Qlik actually selects the customer IDs associated with that value. There are situations where this is an undesirable outcome.