The GROUP BY Clause

The GROUP BY statement is often used with aggregate functions (such as COUNT, MAX, and SUM) so that you can group a set of rows by one or more columns. Here’s a high-level overview of how you can use it, as well as information on how it works and some tips to keep in mind.

As an example, here’s a table with some pets data.

 

 

We can count the number of each species present in our table using the GROUP BY clause.

 

How the GROUP BY Clause Works

 

Let’s say that you have the following SQL statement:

SELECT this, that, other
FROM tableName
GROUP BY tableName.this, tableName.that, tableName.other

 

As you already know, this part of our statement returns a set of results.

SELECT this, that, other
FROM tableName

 

The set of results is then piped into our GROUP BY clause:

GROUP BY tableName.this, tableName.that, tableName.other

 

The GROUP BY statement produces a new table reference with just three columns. So, we see that by applying a GROUP BY clause, you reduce the number of columns available in any local clauses that follow (including SELECT).

If you’ve ever seen an error thrown when you try to reference columns not in the GROUP BY clause in later SELECT statements, this is why.

 

Arguments to Aggregate Functions

You may still be able to use other columns as arguments of aggregate functions even if they’re not in the GROUP BY clause:

SELECT this, that, MAX(other)
FROM tableName
GROUP BY tableName.this, tableName.that

 

Caveat for Users of MySQL

Relating to the example immediately above, standard SQL requires that all non-aggregate fields in the Select statement with a GROUP BY clause also appear in the GROUP BY clause. This is not always the case with MySQL. So, this would be acceptable to a MySQL database:

SELECT this, that, MAX(other)
FROM tableName
GROUP BY tableName.this

 

Sometimes, this will be okay, especially in cases where values in the column don’t change. However, in situations where this is NOT the case, MySQL doesn’t warn you that you’ll end up returning indeterminate results. According to the MySQL manual, “The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.”

 

Takeaways

The GROUP BY clause is a very useful function for data analysis because it allows you to group your data based on one or more columns. However, there are some pitfalls to be aware of, especially when using MySQL, so take care to return exactly what you intended.