Running Total Using SQL

Table or query "Instructor" like below, need to running total to sort instructor in order.

CNT	Instructor
1	Lastname1, Firstname1
1	Lastname2, Firstname2
1	Lastname3, Firstname3
1	Lastname4, Firstname4
1	Lastname5, Firstname5

SQL: SELECT (SELECT SUM(b.CNT)
FROM Instructor b
WHERE b.Instructor <= a.Instructor) AS Sort, a.Instructor
FROM Instructor AS a
ORDER BY a.Instructor;

Sort	Instructor
1	Lastname1, Firstname1
2	Lastname2, Firstname2
3	Lastname3, Firstname3
4	Lastname4, Firstname4
5	Lastname5, Firstname5

Table or query "ag" like below, need to running total for Credit Earn for each student.

stunumterm	stunum	term	CrAtt	CrErn
123456789200000QF	123456789	200000QF		12
123456789201410QF	123456789	201410QF	12	12
123456789201501QF	123456789	201501QF	6	6
123456789201504QF	123456789	201504QF	6	6
123456789201507QF	123456789	201507QF	8	8
123456789201510QF	123456789	201510QF	7	7
123456789201601QF	123456789	201601QF	12	12
123456789201604QF	123456789	201604QF	5	5
123456789201607QF	123456789	201607QF	6	6
123456789201610QF	123456789	201610QF	12	12
123456789201701QF	123456789	201701QF	12	12
123456789201704QF	123456789	201704QF	8	8
123456789201707QF	123456789	201707QF	10	10
123456789201710QF	123456789	201710QF	6	6
123456789201801QF	123456789	201801QF	8	8
123456789201804QF	123456789	201804QF	15	15
123456789201807QF	123456789	201807QF	11	11
123456789201810QF	123456789	201810QF	13	13
123456789201901QF	123456789	201901QF	11	11
234567890201410QF	234567890	201410QF	10	10
234567890201501QF	234567890	201501QF	12	12
234567890201504QF	234567890	201504QF	9	9
234567890201507QF	234567890	201507QF	6	6

SQL: SELECT a.stunumterm, a.stunum, a.term, a.CrAtt, a.CrErn,
(select sum(CrErn) from ag as b where b.stunum =a.stunum and b.stunumterm<=a.stunumterm) AS SumCrErn
FROM AG AS a
ORDER BY a.stunumterm

stunumterm	stunum	term	CrAtt	CrErn	SumCrErn
123456789200000QF	123456789	200000QF		12	12
123456789201410QF	123456789	201410QF	12	12	24
123456789201501QF	123456789	201501QF	6	6	30
123456789201504QF	123456789	201504QF	6	6	36
123456789201507QF	123456789	201507QF	8	8	44
123456789201510QF	123456789	201510QF	7	7	51
123456789201601QF	123456789	201601QF	12	12	63
123456789201604QF	123456789	201604QF	5	5	68
123456789201607QF	123456789	201607QF	6	6	74
123456789201610QF	123456789	201610QF	12	12	86
123456789201701QF	123456789	201701QF	12	12	98
123456789201704QF	123456789	201704QF	8	8	106
123456789201707QF	123456789	201707QF	10	10	116
123456789201710QF	123456789	201710QF	6	6	122
123456789201801QF	123456789	201801QF	8	8	130
123456789201804QF	123456789	201804QF	15	15	145
123456789201807QF	123456789	201807QF	11	11	156
123456789201810QF	123456789	201810QF	13	13	169
123456789201901QF	123456789	201901QF	11	11	180
234567890201410QF	234567890	201410QF	10	10	10
234567890201501QF	234567890	201501QF	12	12	22
234567890201504QF	234567890	201504QF	9	9	31
234567890201507QF	234567890	201507QF	6	6	37

Introduction

One typical question is, how to calculate running totals in SQL Server. There are several ways of doing it and this article tries to explain a few of them.

Test environment

First we need a table for the data. To keep things simple, let's create a table with just an auto incremented id and a value field.

--------------------------------------------------------------------
-- table for test 
--------------------------------------------------------------------
CREATE TABLE RunTotalTestData (
   id    int not null identity(1,1) primary key,
   value int not null
);

And populate it with some data:

--------------------------------------------------------------------
-- test data
--------------------------------------------------------------------
INSERT INTO RunTotalTestData (value) VALUES (1);
INSERT INTO RunTotalTestData (value) VALUES (2);
INSERT INTO RunTotalTestData (value) VALUES (4);
INSERT INTO RunTotalTestData (value) VALUES (7);
INSERT INTO RunTotalTestData (value) VALUES (9);
INSERT INTO RunTotalTestData (value) VALUES (12);
INSERT INTO RunTotalTestData (value) VALUES (13);
INSERT INTO RunTotalTestData (value) VALUES (16);
INSERT INTO RunTotalTestData (value) VALUES (22);
INSERT INTO RunTotalTestData (value) VALUES (42);
INSERT INTO RunTotalTestData (value) VALUES (57);
INSERT INTO RunTotalTestData (value) VALUES (58);
INSERT INTO RunTotalTestData (value) VALUES (59);
INSERT INTO RunTotalTestData (value) VALUES (60);

The scenario is to fetch a running total when the data is ordered ascending by the id field.

Correlated scalar query

One very traditional way is to use a correlated scalar query to fetch the running total so far. The query could look like:

--------------------------------------------------------------------
-- correlated scalar
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

When this is run, the results are:

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362

So there it was. Along with the actual row values, we have a running total. The scalar query simply fetches the sum of the value field from the rows where the ID is equal or less than the value of the current row. Let us look at the execution plan:

What happens is that the database fetches all the rows from the table and using a nested loop, it again fetches the rows from which the sum is calculated. This can also be seen in the statistics:

Table 'RunTotalTestData'. Scan count 15, logical reads 30, physical reads 0...

Using join

Another variation is to use join. Now the query could look like:

--------------------------------------------------------------------
-- using join 
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

The results are the same but the technique is a bit different. Instead of fetching the sum for each row, the sum is created by using a GROUP BY clause. The rows are cross joined restricting the join only to equal or smaller ID values in B. The plan:

The plan looks somewhat different and what actually happens is that the table is read only twice. This can be seen more clearly with the statistics.

Table 'RunTotalTestData'. Scan count 2, logical reads 31...

The correlated scalar query has a calculated cost of 0.0087873 while the cost for the join version is 0.0087618. The difference isn't much but then again it has to be remembered that we're playing with extremely small amounts of data.

Using conditions

In real-life scenarios, restricting conditions are often used, so how are conditions applied to these queries. The basic rule is that the condition must be defined twice in both of these variations. Once for the rows to fetch and the second time for the rows from which the sum is calculated.

If we want to calculate the running total for odd value numbers, the correlated scalar version could look like the following:

--------------------------------------------------------------------
-- correlated scalar, subset
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id
                       AND b.value % 2 = 1)
FROM  RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;

The results are:

id   value   runningtotal
--   -----   ------------
1    1       1
4    7       8
5    9       17
7    13      30
11   57      87
13   59      146

And with the join version, it could be like:

--------------------------------------------------------------------
-- with join, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id        <= a.id
AND   a.value % 2  = 1
AND   b.value % 2  = 1
GROUP BY a.id, a.value
ORDER BY a.id;

When actually having more conditions, it can be quite painful to maintain the conditions correctly. Especially if they are built dynamically.

Calculating running totals for partitions of data

If the running total needs to be calculated to different partitions of data, one way to do it is just to use more conditions in the joins. For example, if the running totals would be calculated for both odd and even numbers, the correlated scalar query could look like:

--------------------------------------------------------------------
-- correlated scalar, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, (SELECT SUM(b.value) 
                               FROM RunTotalTestData b
                               WHERE b.id <= a.id
                               AND b.value%2 = a.value%2)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;

The results:

even   id   value   running total
----   --   -----   -------------
0      2    2       2
0      3    4       6
0      6    12      18
0      8    16      34
0      9    22      56
0      10   42      98
0      12   58      156
0      14   60      216
1      1    1       1
1      4    7       8
1      5    9       17
1      7    13      30
1      11   57      87
1      13   59      146

So now the partitioning condition is added to the WHERE clause of the scalar query. When using the join version, it could be similar to:

--------------------------------------------------------------------
-- with join, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id      <= a.id
AND   b.value%2  = a.value%2
GROUP BY a.value%2, a.id, a.value
ORDER BY a.value%2, a.id;

With SQL Server 2012

SQL Server 2012 makes life much more simpler. With this version, it's possible to define an ORDER BY clause in the OVER clause.

So to get the running total for all rows, the query would look:

--------------------------------------------------------------------
-- Using OVER clause
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

The syntax allows to define the ordering of the partition (which in this example includes all rows) and the summary is calculated in that order.

To define a condition for the data, it doesn't have to be repeated anymore. The running total for odd numbers would look like:

--------------------------------------------------------------------
-- Using OVER clause, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;

And finally, partitioning would be:

--------------------------------------------------------------------
-- Using OVER clause, partition
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(a.value) OVER (PARTITION BY a.value%2 ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;

What about the plan? It's looking very different. For example, the simple running total for all rows looks like:

And the statistics:

Table 'Worktable'. Scan count 15, logical reads 85, physical reads 0...
Table 'RunTotalTestData'. Scan count 1, logical reads 2, physical reads 0...

Even though the scan count looks quite high at first glance, it isn't targeting the actual table but a worktable. The worktable is used to store intermediate results which are then read in order to create the calculated results.

The calculated cost for this query is now 0.0033428 while previously with the join version, it was 0.0087618. Quite an improvement.

Introduction

Test environment

Correlated scalar query

Using join

Using conditions

Calculating running totals for partitions of data

With SQL Server 2012

SQL Server: Calculating Running Totals, Subtotals and Grand Total Without a Cursor

Sample Data Used by Examples

Running Total On Each Record

Running Total for Each OrderDate

Running Totals With Subtotals and Grand totals

Conclusion