Tweens: The Power Of Set Operations In SQL

In relational databases, it is frequent to have many to many relationships. We can have a customer-product relation, a student-course relation, physician-patient relation among many other forms. In this article, we are going to solve a complex problem, a query which is complicated to write for many developers.

In these days all of us know what a streaming video platform is. Moreover, perhaps many of us have a monthly bill from one of the biggest streaming platforms: Netflix. The problem we are going to solve is “Find all the pairs of persons being tweens in Netflix.” What means that?, A person who saw the exact set of movies that the other person saw. Let’s see an example to clarify what we are looking for.

For simplicity we are not going to use foreign keys, so we will have only one table with personal names and movie names like we can see in the following image:

 

TABLE: person_movie

 

Well, if we review the previous table we can find the pairs [John, Peter], [Helen, Mary]. Is there any other pair? No, if we look, Ana, saw the same movies as Helen and Mary. However, Ana saw some other movies (Armageddon), and this is the reason because she is not in any pair. Remember we are looking at pairs of people who saw the same sets of movies.

OK, we already describe our problem, is not complicated to understand. Perhaps the complicated part is to solve in SQL. By fortune we have set operators in SQL like UNION, EXCEPT and INTERSECT. Let’s try to understand our problem regarding sets.

Let’s first examine in the set of movies seen by Ana:

annas movies

Let’s add now the collection of movies seen by John

 

We know the movie in the intersection is “Superman”. This is good because “Superman” is a movie seen by both. The problem here is that Ana and John have both movies which are not in the intersection. And this is what we don’t want: movies outside the intersection. In other words, if the UNION of both sets of movies is the same as the INTERSECTION of both sets, we have a pair.

 

 

Rephrasing the problem: we are looking for persons A and B where:

MOVIES SEEN BY A   UNION   MOVIES SEEN BY B

Is the same as

MOVIES SEEN BY A   INTERSECT  MOVIES SEEN BY B

In other words, we are looking for pairs of persons A and B where:

(

MOVIES SEEN BY A   UNION   MOVIES SEEN BY B

)

MINUS

(

MOVIES SEEN BY A   INTERSECT  MOVIES SEEN BY B

)

…is the empty set. Let’s start to write this idea in SQL

 

 

If we run we obtain the following results:

 

 

In the previous result, we can see several redundant pairs. In the example, why is [Peter, Peter] in the result. Well, Peter saw the same movies as Peter does, this is the reason. Another redundant pair we have is [John, Peter] which is correct, but we have also [Peter, John]. We only want one of these. To solve this two issues, we will add a filter clause in the WHERE to indicate we want only pairs where the first person name is less than the second person name. This filter will eliminate the redundant pairs, let’s see:

 

 

And the result is better, not perfect yet, but better

 

 

We still have an issue: we have duplicated pairs!. Why? Because we use as the base table in the main FROM the many-to-many table (person_movie), which is wrong. If we use the “person” table instead of “person_movie”, we will have the correct results. However in this article to simplify we used only one table. Anyway, the solution is straightforward, add a DISTINCT clause, and we will have the correct result. Caution: In real life, we should use the “person” table, because by using the many-to-many table, we are overloading with extra work the database engine. Why? Because we are evaluating each pair of persons many times, and at the end, we discard the additional results. Let’s see in the next image the query and the correct results

 

 

CONCLUSION

We can continue working with this idea of tweens. First of all, you can apply the same approach to many other database tables. Everywhere a many-to-many table is, you can use this idea. Moreover, you can try to find sets of movies which are popular. If one person appears in many pairs, this fact means that many people saw precisely the same set of movies. Go deeper and do it yourself!