Correlated subqueries are a different type of subqueries. In some situations, you must use a correlated subquery to obtain the correct reply or result data set. In this article, we will cover how to identify or detect these situations in which you must use a correlated subquery. We also show some examples of correlated subqueries, and we will even talk about how to avoid overusing correlated subqueries.
IN WHAT SITUATIONS NEED CORRELATED SUBQUERIES?
There are some data questions where the SQL must include a correlated subquery to solve the problem which is particularly true in queries where we’re looking for what we might call negatives or negatives data questions.
Let’s review the sample tables we will use: We have two tables: the employee table and the payment history table. The second table has a column called payment_type that indicates if a payment is part of a regular salary, a bonus, or an award. Let’s see some sample data on the tables we will use:
The negative data question we will solve now is: “Obtain the employee names who never received a bonus” and the SQL query to answer this question is the following:
WHAT IS A CORRELATED SUBQUERY?
A correlated subquery is a query which is executed many times. One time per each record returned by the outer query which is the reason for the name correlated: a correlation between the number of times the subquery is executed with the number of records returned by the outer query (the outer query is the external query, not the subquery).
The main difference between a correlated subquery and a simple subquery is that correlated subqueries reference columns from the outer table. In the above example, ph.employee_id = e1.employee_id is a reference to the external subquery table. To identify a correlated query, look for these kinds of references. If you find at least one, you have a correlated subquery!
The negative part of the data question is often solved in the correlated subquery by using a NOT EXISTS operator in the WHERE clause. EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then the EXISTS evaluates to TRUE. If the subquery returns empty, then the EXISTS evaluates to FALSE. Note we use NOT EXISTS, which is the opposite to EXISTS.
SHOULD WE USE FOR POSITIVES DATA QUESTIONS?
Must we use a correlated subquery for a positive data question? No, you should not. However, if you want, you can do it. And in fact, many SQL developers do it. The reason is: For positive questions, we usually have a JOIN condition or a relationship between two tables to reply the data question. Let’s change to positive the data question of the previous example and solve it by using a JOIN instead of a correlated subquery.
The question to solve: Obtain the names of employees who received a bonus payment.
And the SQL query without using a correlated subquery to reply to this question is:
As a conclusion I would like to mention we don’t overuse correlated subqueries, if we can solve a data question with a JOIN instead of a correlated subquery, we should do it. You should only apply or use correlated subqueries for negative data questions or other cases where a correlated subquery is the only way to solve.
A COMPLEX EXAMPLE OF CORRELATED SUBQUERY
In this example, we will obtain the employee names, salary and month when they made a salary higher than their average monthly salary. Let’s see:
If you are reading, you can be able to identify in the previous example which is the reference to the outer subquery table. It is “ph2.employee_id = e1.employee_id”.
Correlated subqueries must be used carefully In some cases; they are a “must”. However, they tend to have poor performance, especially with huge tables. So if you can avoid, avoid them, even with small tables. You never know how many records a table will have next year!