I’ve worked with database schemas whose tables often had a soft-delete indicator field (such as deleted_at, invalidated, and so on) to keep track of the deletion status of records.
They clearly offer some benefits, for example the ability to restore a soft-deleted record by just flipping or nulling a single field.
Are soft deletes worth it?
Some of the problems I’ve encountered when introducing soft deletions in my own database architecture:
Composite Keys
Suppose a table has a composite primary key of (field1, field2).
If a record is soft deleted and another one is then created with the same composite key, the insert conflicts with the soft-deleted row, which still occupies that key.
Solutions:
Create a surrogate key. But more often than not, surrogate keys just make queries and joins unnecessarily difficult, especially in applications where data is modified often.
Hard-delete the previous record when the new one is created. But that would be a hidden side effect rather than good practice, and it loses the audit trail benefit.
Drop the primary key and put a unique constraint on the key columns that applies only where the indicator IS NULL. This seems good, but I don't see what I am losing by replacing a PRIMARY KEY constraint with a UNIQUE one.
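To make the third option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name membership and the index name membership_live_key are made up for illustration, and I'm assuming an engine that supports partial (filtered) unique indexes, which SQLite and PostgreSQL do.

```python
import sqlite3


def demo_partial_unique_index() -> tuple[bool, list[tuple]]:
    """Reuse a composite key after a soft delete, but reject live duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE membership (      -- hypothetical table
            field1     INTEGER NOT NULL,
            field2     INTEGER NOT NULL,
            deleted_at TEXT            -- NULL means the row is live
        );
        -- Uniqueness is enforced only among live rows.
        CREATE UNIQUE INDEX membership_live_key
            ON membership (field1, field2)
            WHERE deleted_at IS NULL;
    """)
    conn.execute("INSERT INTO membership (field1, field2) VALUES (1, 2)")
    # Soft delete the live row.
    conn.execute(
        "UPDATE membership SET deleted_at = '2024-01-01' "
        "WHERE field1 = 1 AND field2 = 2 AND deleted_at IS NULL"
    )
    # The same key can now be inserted again without a conflict.
    conn.execute("INSERT INTO membership (field1, field2) VALUES (1, 2)")
    # A second live row with the same key is still rejected.
    try:
        conn.execute("INSERT INTO membership (field1, field2) VALUES (1, 2)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    rows = conn.execute(
        "SELECT field1, field2, deleted_at FROM membership ORDER BY rowid"
    ).fetchall()
    conn.close()
    return duplicate_rejected, rows
```

The soft-deleted row stays in the table for auditing, while the index only ever sees live rows.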
Workarounds and Impact on Code
Since more often than not databases are used alongside some sort of application, there could be implications on the code.
Some ORMs support soft deletions out of the box by rewriting queries, but that is not guaranteed (e.g., Hibernate only recently added out-of-the-box support for soft deletions). Without such support, developers have to write their own queries and make sure the indicator is checked every time it is needed.
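One way I can imagine reducing the risk of a forgotten check, when the ORM does not help, is to read through a view that already filters on the indicator. This is only a sketch with sqlite3; the account table and the live_account view are hypothetical names.

```python
import sqlite3


def open_db_with_live_view() -> sqlite3.Connection:
    """Create a hypothetical account table plus a view that hides deleted rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            deleted_at TEXT              -- NULL means the row is live
        );
        -- Application reads go through the view, so the filter lives in one place.
        CREATE VIEW live_account AS
            SELECT id, name FROM account WHERE deleted_at IS NULL;
    """)
    conn.executemany(
        "INSERT INTO account (id, name, deleted_at) VALUES (?, ?, ?)",
        [(1, "alice", None), (2, "bob", "2024-01-01"), (3, "carol", None)],
    )
    conn.commit()
    return conn


def live_account_names(conn: sqlite3.Connection) -> list[str]:
    """Return the names of live accounts without repeating the indicator check."""
    return [name for (name,) in
            conn.execute("SELECT name FROM live_account ORDER BY id")]
```

Queries that join against the base table directly would still need the explicit check, so this narrows the problem rather than removing it.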
Performance Implications
As the database grows, the number of soft-deleted records grows with it. Having 20% to 40% deleted records in a table would be unnecessary weight for the database, and indexes would need to exclude deleted records to stay lean.
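Excluding deleted records from an index can be sketched with a partial index. The example below uses sqlite3 with a hypothetical orders table and index name, and asks the query planner which access path it picks for a lookup of live rows.

```python
import sqlite3


def plan_for_live_lookup() -> list[str]:
    """Return SQLite's query plan steps for a lookup restricted to live rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (            -- hypothetical table
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            deleted_at  TEXT             -- NULL means the row is live
        );
        -- Only live rows are stored in this index.
        CREATE INDEX orders_live_customer
            ON orders (customer_id)
            WHERE deleted_at IS NULL;
    """)
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM orders WHERE customer_id = ? AND deleted_at IS NULL",
        (42,),
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[3] for row in plan]
```

The planner can only use such an index when the query repeats the same deleted_at IS NULL condition, which ties back to the point about always checking the indicator in code.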
A solution could be to delete extremely old records with a scheduled job.
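Such a job could look roughly like the function below, which a scheduler (cron or similar) would call periodically. The note table, the 90-day retention period, and the choice to store deleted_at as ISO-8601 UTC text are all assumptions of this sketch.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_soft_deleted(conn: sqlite3.Connection,
                           now: datetime,
                           retention_days: int = 90) -> int:
    """Hard-delete rows that were soft deleted more than retention_days ago.

    Assumes deleted_at holds ISO-8601 UTC strings in one consistent format,
    so comparing them as text matches chronological order.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM note WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of rows physically removed
```

Passing now in as a parameter, for example datetime.now(timezone.utc), keeps the function easy to test with a fixed clock.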
The benefits of soft deletions might not be worth it and better solutions for deletion recovery and auditing may exist (e.g., just having a deleted records table).
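For comparison, a deleted records table could be sketched as follows: the row is copied into an archive table and then hard-deleted in the same transaction. The customer and customer_deleted tables are hypothetical, and a real archive table would probably carry more audit columns.

```python
import sqlite3

# Hypothetical schema: a live table and a separate table for deleted rows.
ARCHIVE_SCHEMA = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE customer_deleted (
    id         INTEGER NOT NULL,
    name       TEXT NOT NULL,
    deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def archive_and_delete(conn: sqlite3.Connection, customer_id: int) -> bool:
    """Copy the row to the archive table, then hard-delete it, atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO customer_deleted (id, name) "
            "SELECT id, name FROM customer WHERE id = ?",
            (customer_id,),
        )
        cur = conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
    return cur.rowcount == 1  # False when no such customer existed
```

With this layout the live table keeps its ordinary primary key and its queries need no indicator check, at the cost of a second table to maintain.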
Another edge case is cascading soft deletion. While standard deletions have ON DELETE CASCADE, soft deletions might need more sophisticated approaches, such as triggers and functions that mark the children as soft deleted.
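A trigger-based cascade might look like the sketch below, again with sqlite3. The parent and child tables and the trigger name are hypothetical, and the trigger syntax is SQLite's; other engines would need their own dialect.

```python
import sqlite3

# Hypothetical parent/child schema with a trigger that cascades soft deletes.
CASCADE_SCHEMA = """
CREATE TABLE parent (
    id         INTEGER PRIMARY KEY,
    deleted_at TEXT
);
CREATE TABLE child (
    id         INTEGER PRIMARY KEY,
    parent_id  INTEGER NOT NULL REFERENCES parent(id),
    deleted_at TEXT
);
-- Fires only when a parent goes from live to soft deleted.
CREATE TRIGGER parent_soft_delete_cascade
AFTER UPDATE OF deleted_at ON parent
WHEN OLD.deleted_at IS NULL AND NEW.deleted_at IS NOT NULL
BEGIN
    UPDATE child
       SET deleted_at = NEW.deleted_at
     WHERE parent_id = NEW.id AND deleted_at IS NULL;
END;
"""


def soft_delete_parent(conn: sqlite3.Connection, parent_id: int, when: str) -> None:
    """Mark the parent as deleted; the trigger marks its live children."""
    with conn:
        conn.execute(
            "UPDATE parent SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (when, parent_id),
        )
```

Children that were already soft deleted keep their own timestamp, so they can be told apart from children removed by the cascade.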