If you work with IT systems, sooner or later you’ll have the need to write SQL to “fix” data in a database table (or tables). One key piece of advice is to make sure that the SQL script you come up with is reversible, just in case you need to change things back or make additional changes on the data set that was changed.
Here’s an example of what I mean:
If this was your data conversion script (to flag users as deleted):
set deleted = 1 where id = (
select id from users
where to_be_deleted = 1
and deleted = 0);
Assuming that all deleted users have the flag “to_be_deleted” set to 1, this script is not reversible. Once run, how would you distinguish the data set you changed (set to deleted = 1) from all the other users that already have a flag of deleted = 1?
A few options here (there are many ways to solve this):
- Set to_be_deleted to a value unique to your data change (e.g. to 99)
- Generate individual update statements for each user id you are updating (use SQL to do this).
- If there is a date/time stamp field use that to identify your change
- If there is an “updated by user” type field use that to identify your change
The key is to think about the scenario, “after running this script, can I identify the data set that was changed easily?”