Upsert is an interesting and powerful SQL statement not supported on all database managers. In this article we will cover a technique to emulate the upsert statement with other regular SQL statements in PostgreSQL. You can apply this technique in other database engines too.
The UPSERT statement
The upsert statement is used when we have two tables. One is the destination table, where we will do updates or inserts (hence the name upsert) and the other table is the source table, where the information to be inserted or updated is stored. Let’s see an example.
Example tables : employee and employee_news
Let’s suppose we have a table called employee as we can see in the first image, which is the destination table and the employee_news table as the source table. Both tables have the same schema. To understand how both tables are related, we can think employee_news as a table that we receive once a month with the changes to apply to the employee table.
employee_news table (source)
UPSERT resulting table
To understand what we expect from the upsert execution, we will show the resulting employee table expected. The upsert statement only modify the destination table by: inserting non existing records (based on emplid value) and updating previously existing records. In the following image the new records are showed in green, and the fields with changes in blue.
The implementation of upsert we are covering here is based on the combination of the following two SQL statements:
- UPDATE with RETURNING *
- INSERT of a result set produced with a SELECT
In the next paragraphs we will explain both SQL statements
UPSERT: UPDATE THE MATCHING RECORDS
Lets see first the SQL update to modify the employee table with the records on employee_news table:
In the previous image we can see how to update a table (employee) based on other table with similar schema (employee_news). As we can see in the last line of the update, we affected or changed only records matching on emplid value.
The next point to explain is the RETURNING * sub-clause, which is a really powerful sub-clause that can be used in the insert, delete and update sql statement. It allows us to return as a result set all the records affected by an insert, delete or update. Then, in our upsert implementation, we will add to the previous update the RETURNING * to return all the update affected rows as they were a regular result set of a select. This result set (implemented as the upsert temporary table in our example) will be used in the insert as we will see in the next paragraph.
UPSERT: INSERTING THE NON MATCHING RECORDS
The next image shows the insert of the non matching records(records existing in employee_news and not existing in employee). The key to exclude the non matching records is the WHERE NOT EXISTS clause,.
The idea behind the NOT EXISTS clause is: if the record in employee_news doesn’t exist in the upsert temporary table, then this record must be inserted into employee. It is simple !
LET’S PUT ALL THE PIECES TOGETHER
Now we can see the complete upsert implementation. It is easy to see each part: first, the update part, and second the insert part.
We covered a technique to emulate the upsert statement, by combining an update and an insert. You can apply this technique to any pair of tables with identical schemas. Let’s try to adapt to your database!