Upsert is an exciting and powerful SQL statement not supported on all database managers. In this article, we will cover a technique to emulate the upsert statement with other regular SQL statements in PostgreSQL. You can apply this technique to other database engines too.
The UPSERT statement
We use “UPSERT” statement when we have two tables. One is the destination table, where we will do updates or inserts (hence the name UPSERT), and the other table is the source table, where the information to be inserted or updated is stored. Let’s see an example.
Example tables: employee and employee_news
Let’s suppose we have a table called employee as we can see in the first image, which is the destination table and the employee_news table as the source table. Both tables have the same schema. To understand how both tables are related, we can think of employee_news as a table that we receive once a month with the changes to apply to the employee table.
employee_news table (source)
UPSERT resulting table
To understand what we expect from the upsert execution, we will show the resulting employee table expected. The upsert statement only modifies the destination table by inserting nonexisting records (based on implied value) and updating previously existing records. In the following image, the new records are shown in green and the fields with changes in blue.
The implementation of UPSERT we are covering here is based on the combination of the following two SQL statements:
- UPDATE with RETURNING *
- INSERT of a result set produced with a SELECT
In the next paragraphs, we will explain both SQL statements
UPSERT: UPDATE THE MATCHING RECORDS
Let’s see first the SQL update to modify the employee table with the records on “employee_news” table:
In the previous image, we can see how to update a table (employee) based on another table with similar schema (employee_news). As we can see in the last line of the update, we affected or changed only records matching on implied value.
The next point to explain is the RETURNING * sub-clause, which is a powerful sub-clause that can be used in the insert, delete and update SQL statement. It allows us to return as a result set all the records affected by an “insert”, “delete” or “update”. Then, in our upsert implementation, we will add to the previous update the RETURNING * to return all the update affected rows as they were a regular result set of a select. This result set (implemented as the upsert temporary table in our example) will be used in the insert as we will see in the next paragraph.
UPSERT: INSERTING THE NON MATCHING RECORDS
The next image shows the insert of the non-matching records(records existing in “employee_news” and not existing in “employee”). The key to exclude the non-matching records is the WHERE NOT EXISTS clause.
The idea behind the NOT EXISTS clause is: if the record in employee_news doesn’t exist in the upsert temporary table, then this record must be inserted into “employee”. It is simple!
LET’S PUT ALL THE PIECES TOGETHER
Now we can see the complete upsert implementation. It is easy to see each part: first, the update part, and second the insert part.
We covered a technique to emulate the upsert statement, by combining an update and an insert. You can apply this technique to any pair of tables with identical schemas. Let’s try to adapt to your database!