Deleting Duplicate Data

In most cases, you don’t want duplicate rows in your database. The best practice is to set up your insert process so that data that is already in your tables aren’t added again, but what if that horse has left that barn and you already have duplicates?

 

 

In this post, we’ll show you how you remove duplicate data in your MySQL databases.

 

At a Glance

There are just a few necessary steps when removing duplicate data from your table:

  1. Back up your table!
  2. Find the duplicate rows
  3. Remove the duplicate rows

 

Back Up Your Table

As is the case anytime you make permanent changes to your table; we strongly recommend backing up your data before you begin!

 

Find the Duplicate Rows

The first thing you’ll want to do is identify the duplicate rows in your table. Generally speaking, you can do so using either of these techniques:

You can use DISTINCT in your SELECT statement to find unique data combinations in your table.

SELECT DISTINCT name, owner, species, sex FROM pet

 

Another way is to add GROUP BY, naming the columns that you’re selecting in your query. This allows you to choose unique combinations of data.

SELECT name, owner, species, sex FROM pet GROUP BY (name);

 

Remove the Duplicate Rows

Now that you know what rows you need to keep (and by extension, the rows you need to delete), you can go ahead and write the statement that will remove your duplicate rows.

In this specific instance, our script will

  1. Create a temporary table
  2. Move the unique data we identified in the previous step to the temporary table
  3. Drop the original table
  4. Rename the temporary table with the name of the original table

 

CREATE TABLE temporaryTable
SELECT name, owner, species, sex FROM pet GROUP BY (name);
DROP TABLE pet;
ALTER TABLE temporaryTable RENAME TO pet;

 

 Takeaways

It’s best to prevent duplicate data from getting written to your database, but sometimes you aren’t able to do that. If you need to remove duplicate data, however, it isn’t a tricky task.