Problem
Given a table in MySQL, how do you remove duplicate rows?
Input
first_name | last_name | |
---|---|---|
John | Doe | john.doe@email.com |
Jane | Smith | jane.smith@email.com |
John | Doe | john.doe@email.com |
Bob | Johnson | bob.johnson@email.com |
Alice | Williams | alice.williams@email.com |
Jane | Smith | jane.smith@email.com |
Try Hands-On: Fiddle
Create Input Table: Gist
Desired Output
first_name | last_name | |
---|---|---|
John | Doe | john.doe@email.com |
Jane | Smith | jane.smith@email.com |
Bob | Johnson | bob.johnson@email.com |
Alice | Williams | alice.williams@email.com |
Solution 1:
Using SELECT DISTINCT
CREATE TABLE employees_no_duplicates AS
SELECT DISTINCT *
FROM employees;
Explanation:
To remove duplicate rows from the employees table, you can use the DISTINCT keyword in a SELECT statement to identify unique rows and then insert the results into a new table.
Alternately you can use the DELETE statement to remove duplicates from the original table. Let’s see how to do that.
Solution 2:
Using a SELF JOIN
This approach works if there is an id
field which does not have the values duplicated.
DELETE e1
FROM employees e1
JOIN employees e2
ON e1.id < e2.id
AND e1.first_name = e2.first_name
AND e1.last_name = e2.last_name
AND e1.email = e2.email;
Output:
id | first_name | last_name | |
---|---|---|---|
3 | John | Doe | john.doe@email.com |
4 | Bob | Johnson | bob.johnson@email.com |
5 | Alice | Williams | alice.williams@email.com |
6 | Jane | Smith | jane.smith@email.com |
Try Hands-On: Fiddle
Explanation:
This approach uses a DELETE statement with a self-join to remove duplicate rows from the original table.
It directly modifies the existing table, so be cautious when using this approach, as it permanently removes data.