Upsert in SQL

Upsert in SQL

Upsert is an operation that ➊ inserts new records into the database and ➋ updates existing ones. Let’s see how it works in different DBMS. The examples are interactive, so you can read and practice.

We will use the toy employees table:

┌────┬───────┬────────┬────────────┬────────┐
│ id │ name  │  city  │ department │ salary │
├────┼───────┼────────┼────────────┼────────┤
│ 11 │ Diane │ London │ hr         │ 70     │
│ 12 │ Bob   │ London │ hr         │ 78     │
│ 21 │ Emma  │ London │ it         │ 84     │
│ 22 │ Grace │ Berlin │ it         │ 90     │
│ 23 │ Henry │ London │ it         │ 104    │
│ 24 │ Irene │ Berlin │ it         │ 104    │
│ 31 │ Cindy │ Berlin │ sales      │ 96     │
│ 32 │ Dave  │ London │ sales      │ 96     │
└────┴───────┴────────┴────────────┴────────┘

Let’s say we are adding two new employees:

(25, 'Frank', 'Berlin', 'it', 120)
(33, 'Alice', 'Berlin', 'sales', 100)

Hello, Frank. Hello, Alice.

And updating the other two:

- (11, 'Diane', 'London', 'hr', 70)
+ (11, 'Diane', 'Berlin', 'hr', 70)

- (21, 'Emma', 'London', 'it', 84)
+ (21, 'Emma', 'London', 'it', 95)

Diane is moving to Berlin, while Emma’s salary is increasing to 95.

You can read this article from start to finish, or you can skip to a section about the specific database engine:

MySQL/MariaDBSQLitePostgreSQLSQL ServerOracleOthers

INSERT

Let’s try the easy way:

insert into employees
(id, name, city, department, salary)
values
(11, 'Diane', 'Berlin', 'hr', 70),
(21, 'Emma', 'London', 'it', 95),
(25, 'Frank', 'Berlin', 'it', 120),
(33, 'Alice', 'Berlin', 'sales', 100);

Edit

The first table is “before insert”, the second is “after insert”. We’ll use this notation for all examples.

As expected, it fails for Diane and Emma: the employee ID is a primary key, so we can’t have two employees with the same ID. We’ll have to try something else.

MySQL: REPLACE

MySQL’s replace into inserts records with new primary/unique keys (in our case, employee IDs) and replaces the ones with existing keys:

replace into employees
(id, name, city, department, salary)
values
(11, 'Diane', 'Berlin', 'hr', 70),
(21, 'Emma', 'London', 'it', 95),
(25, 'Frank', 'Berlin', 'it', 120),
(33, 'Alice', 'Berlin', 'sales', 100);

Edit

Added Frank and Alice, replaced Diane and Emma.

The bad thing about replace is that it does an actual delete followed by insert, which has some unexpected side effects like firing on delete triggers.

Also, you can’t reference the old values when replacing the record. For example, you can’t set city = Berlin (prev. London) for Diane.

Documentation

To ignore the duplicates and only perform inserts, you can use insert ignore:

insert ignore into employees
(id, name, city, department, salary)
values
(11, 'Diane', 'Berlin', 'hr', 70),
(21, 'Emma', 'London', 'it', 95),
(25, 'Frank', 'Berlin', 'it', 120),
(33, 'Alice', 'Berlin', 'sales', 100);

Edit

Added Frank and Alice, ignored Diane and Emma.

Documentation

replace is probably not the best way to do upserts in MySQL. There is a better alternative ↓

MySQL: INSERT…ON DUPLICATE KEY UPDATE

MySQL’s on duplicate key update inserts records with new primary/unique keys (in our case, employee IDs) and updates the ones with existing keys:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
on duplicate key update
  city = values(city),
  salary = values(salary);

Edit

Added Frank and Alice, updated Diane and Emma.

When updating a record, you can reference the old value by column name (city) and the new value with the values() function (values(city)). Now we can actually update a value, not just replace it:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70)
on duplicate key update
  city = concat(values(city), ' (prev. ', city, ')');

Edit

Updated the city for Diane and kept the old value for reference.

Starting with MySQL 8.0, you can alias the new values and not use the ugly values() function:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
  as new
on duplicate key update
  city = new.city,
  salary = new.salary;

Edit

Using the new alias for new records.

Documentation

SQLite: INSERT OR…

SQLite’s insert or replace inserts records with new primary/unique keys (in our case, employee IDs) and replaces the ones with existing keys:

insert or replace into employees
(id, name, city, department, salary)
values
(25, 'Frank', 'Berlin', 'it', 120),
(33, 'Alice', 'Berlin', 'sales', 100),
(11, 'Diane', 'Berlin', 'hr', 70),
(21, 'Emma', 'London', 'it', 95);

Edit

Added Frank and Alice, replaced Diane and Emma.

Like MySQL’s replace, SQLite’s or replace deletes existing duplicate records before inserting the new ones. It fires delete triggers in some cases and does not allow references to old values.

SQLite provides a number of alternative strategies instead of replace:

  • ignore: skips the duplicate records (similar to MySQL’s ignore).
  • rollback: aborts the current statement and rolls back the entire transaction.
  • abort: aborts the current statement and reverts any changes made by it, but does not roll back the transaction.
  • fail: aborts the current statement, but does not revert any changes already made by it, and does not roll back the transaction.
insert or abort into employees
(id, name, city, department, salary)
values
(11, 'Diane', 'Berlin', 'hr', 70),
(21, 'Emma', 'London', 'it', 95),
(25, 'Frank', 'Berlin', 'it', 120),
(33, 'Alice', 'Berlin', 'sales', 100);

Edit

Since there are conflicts, abort reverts all changes. The table remains as it was.

Documentation

insert or replace is probably not the best way to do upserts in SQLite. There is a better alternative ↓

SQLite: INSERT…ON CONFLICT

SQLite’s on conflict do update inserts records with new primary/unique keys (in our case, employee IDs) and updates the ones with existing keys:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
on conflict do update set
  city = excluded.city,
  salary = excluded.salary;

Edit

Added Frank and Alice, updated Diane and Emma.

When updating a record, you can reference the old value by column name (city) and the new value with the excluded prefix (excluded.city).

You can optionally disable some updates using the where clause:

insert into employees
(id, name, city, department, salary)
values
  (21, 'Emma', 'London', 'it', 95),
  (22, 'Grace', 'Berlin', 'it', 105)
on conflict do update set
  salary = excluded.salary where excluded.salary < 100;

Edit

We only allow salaries under 100, so Emma gets updated but Grace does not.

Besides do update, there is also do nothing, which ignores conflicts and only inserts new records:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
on conflict do nothing;

Edit

Added Frank and Alice, ignored Diane and Emma.

on conflict even allows you to specify different rules for different uniqueness constraints:

create unique index idx_employees_name on employees(name);

insert into employees
(id, name, city, department, salary)
values
  -- increase Emma's salary to 95
  (21, 'Emma', 'London', 'it', 95),
  -- change Grace's name to Diane
  (22, 'Diane', 'Berlin', 'it', 90)
on conflict (id) do update set
  city = excluded.city,
  salary = excluded.salary
on conflict (name) do nothing;

Edit

We only allow unique names (stupid, I know), and “Diane” is already taken. So Emma’s salary is updated, but Grace’s name is not.

Documentation

Code playgrounds

Interactive examples in this post are powered by codapi — the platform I’m building. Use it to embed code playgrounds into your online course, documentation, or blog.

PostgreSQL: INSERT…ON CONFLICT

PostgreSQL’s on conflict do update is very similar to SQLite’s (in fact, SQLite’s implementation was inspired by PostgreSQL’s). It inserts records with new primary/unique keys (in our case, employee IDs) and updates the ones with existing keys:

insert into employees
(id, name, city, department, salary)
values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
on conflict (id) do update set
  city = excluded.city,
  salary = excluded.salary;

Edit

Added Frank and Alice, updated Diane and Emma.

When updating a record, you can reference the old value by column name (city) and the new value with the excluded prefix (excluded.city).

Note that the constraint reference after on conflict (it is called conflict targetid in our case) is required. You can also specify a constraint name:

alter table employees add constraint uniq_employees_name unique (name);

insert into employees
(id, name, city, department, salary)
values
  -- change Grace's name to Diane
  (22, 'Diane', 'Berlin', 'it', 90)
on conflict on constraint uniq_employees_name do nothing;

Edit

We only allow unique names, and “Diane” is already taken. Grace’s name remains unchanged.

As you can see from the example above, PostgreSQL supports do nothing just like SQLite. It also supports the where clause:

insert into employees
(id, name, city, department, salary)
values
  (21, 'Emma', 'London', 'it', 95),
  (22, 'Grace', 'Berlin', 'it', 105)
on conflict (id) do update set
  salary = excluded.salary where excluded.salary < 100;

Edit

We only allow salaries under 100, so Emma gets updated but Grace does not.

Unlike SQLite, PostgreSQL does not allow multiple on conflict clauses:

create unique index idx_employees_name on employees(name);

insert into employees
(id, name, city, department, salary)
values
  (21, 'Emma', 'London', 'it', 95),
  (22, 'Diane', 'Berlin', 'it', 90)
on conflict (id) do update set
  city = excluded.city,
  salary = excluded.salary
on conflict (name) do nothing;

Edit

This is an error: only one on conflict is allowed.

Documentation

insert...on conflict is a fine way to do upserts on older PostgreSQL versions. For modern ones (15+) there is a better alternative ↓

SQL Standard: MERGE

Now that we’ve seen all the creative ways vendors have implemented upsert in their systems, let’s look at the standard way — the merge statement:

merge into employees
using (
  values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
) as new (id, name, city, department, salary)
on employees.id = new.id
when matched then update set
  city = new.city,
  salary = new.salary
when not matched then insert
  values (id, name, city, department, salary);

Edit

Added Frank and Alice, updated Diane and Emma.

Here is what happened:

  • We specified the merge source in the using clause (here we use the values as the source, but using also accepts tables, views and subqueries).
  • We gave the merge source an alias (new) for future reference.
  • We defined the match criteria between the merge source (our new values) and the merge target (the employees table) using the on clause (employees.id = new.id).
  • We set the action to take when the merge source and merge target match by the given criteria (update city and salary with new values).
  • We set the action to take when the merge source and merge target don’t match (i.e., there are new records in the source, so we should insert them).

Whew. That’s a lot to process. But the merge statement can do much more. For example, you can change the inserted values:

merge into employees
using (
  values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
) as new (id, name, city, department, salary)
on employees.id = new.id
when matched then update set
  city = new.city,
  salary = new.salary
when not matched then insert
  values (id, name || ' (new!)', city, department, salary);

Edit

We still update Diane and Emma, but we also change Frank and Alice’s names on insert.

Or even delete the matching records:

merge into employees
using (
  values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
) as new (id, name, city, department, salary)
on employees.id = new.id
when matched then delete
when not matched then insert
  values (id, name, city, department, salary);

Edit

Added Frank and Alice, deleted Diane and Emma.

Or ignore new records altogether by omitting the when not matched:

merge into employees
using (
  values
  (11, 'Diane', 'Berlin', 'hr', 70),
  (21, 'Emma', 'London', 'it', 95),
  (25, 'Frank', 'Berlin', 'it', 120),
  (33, 'Alice', 'Berlin', 'sales', 100)
) as new (id, name, city, department, salary)
on employees.id = new.id
when matched then update set
  city = new.city,
  salary = new.salary;

Edit

Updated Diane and Emma, ignored Frank and Alice.

You can perform conditional inserts, updates and deletes using when...and clause:

merge into employees
using (
  values
  (21, 'Emma', 'London', 'it', 95),
  (22, 'Grace', 'Berlin', 'it', 105)
) as new (id, name, city, department, salary)
on employees.id = new.id
when matched and new.salary < 100 then update set
  city = new.city,
  salary = new.salary;

Edit

We only allow salaries under 100, so Emma gets updated but Grace does not.

merge is supported in PostgreSQL (15+), SQL Server (2008+) and Oracle (11+). Of course, each of them has its own quirks. For example, PostgreSQL adds the non-standard do nothing action, while SQL Server supports different when not matched actions depending on where the value is missing — in a source or target dataset.

→ Documentation: StandardPostgreSQLSQL ServerOracle

Other database engines

BigQuery, H2, Snowflake, DB2 — all support merge (each to some extent).

MariaDB supports replace and on duplicate key update, just like MySQL.

Clickhouse does not have upsert SQL statements, but it does provide a ReplacingMergeTree table engine that automatically performs a replace-style upsert on insert.

Cockroachdb supports insert...on conflict (like PostgreSQL), but with an optional conflict target. It also provides an upsert statement (ta-da!) that works like an automatic on conflict do update.

DuckDB supports insert or replace and insert or ignore (like SQLite) as well as insert...on conflict (also like SQLite), but without multiple clauses.

Summary

We’ve reviewed ways to do upserts in each of the major databases:

  • replace or on duplicate key update in MySQL (MariaDB).
  • insert or replace and on conflict do in SQLite.
  • on conflict do in older PostgreSQL versions.
  • merge in modern PostgreSQL, SQL Server and Oracle.

It’s funny that (almost) no DBMS has actually named the statement upsert, despite the popularity of the name.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.