A general description of the proposed change and reasoning behind it is on GitHub: https://github.com/LemmyNet/lemmy/issues/3697
The idea is linear execution of these massive changes to votes/comments/posts, with concurrency awareness. It also adds a layer of social awareness: the impact on a community when a bunch of content is black-holed.
An entire-site federation delete or a dead server would also fall under this umbrella of mass data change, with a potential for new content ownership, etc.
This is weird. Account deletion should be handled by a JOIN at lookup time, so comments/posts only display if the account is active. No mass updates, pipelines or otherwise.
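A minimal sketch of that JOIN-at-lookup idea, using Python's built-in sqlite3 so it runs anywhere. The `account` and `comment` tables and their columns are made up for illustration and are not Lemmy's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, not Lemmy's real schema.
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        creator_id INTEGER NOT NULL REFERENCES account(id),
        content TEXT NOT NULL
    );
    INSERT INTO account (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comment (creator_id, content)
        VALUES (1, 'hello'), (2, 'hi there'), (1, 'bye');
""")

def visible_comments(conn):
    """Return comment text only for accounts that are still active."""
    rows = conn.execute("""
        SELECT c.content
        FROM comment c
        JOIN account a ON a.id = c.creator_id
        WHERE a.deleted = 0
        ORDER BY c.id
    """).fetchall()
    return [row[0] for row in rows]

before = visible_comments(conn)

# "Deleting" the account is a single-row update; no comment rows are touched.
conn.execute("UPDATE account SET deleted = 1 WHERE id = 1")
after = visible_comments(conn)
```

The trade-off is that every read pays for the join, and the comment text itself is still sitting in the table.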
AccountDelete has a marketed feature to overwrite all post and comment content.
Hmm ok, false sense of security there since another advertised feature is the open API (meaning no restrictions on scraping bots so there will definitely be archives of deleted posts), but whatever.
How does this sound: encrypt the comments in the db using a random key stored in the account row. Then at account deletion, overwrite that key, so the comments can no longer be decrypted. Maybe there is a way to purge those comments altogether during the next VACUUM. No idea how often that happens though.
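Roughly what I have in mind, as a toy sketch. The cipher here is a SHA-256 keystream XOR used only as a stdlib stand-in so the example runs; it is NOT secure, and a real version would use a proper authenticated cipher. The `accounts` and `comments` structures and all function names are hypothetical:

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream for illustration only; not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: str) -> bytes:
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

def decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")

# Stand-ins for the account row (holding the key) and the comment table.
accounts = {1: {"key": secrets.token_bytes(32)}}
comments = [{"creator_id": 1, "blob": encrypt(accounts[1]["key"], "my comment")}]

def read_comment(comment):
    key = accounts[comment["creator_id"]]["key"]
    if key is None:
        return None  # key destroyed: the stored blob can no longer be decrypted
    return decrypt(key, comment["blob"])

def delete_account(account_id):
    # One small write instead of rewriting every comment row.
    accounts[account_id]["key"] = None

readable_before = read_comment(comments[0])
delete_account(1)
readable_after = read_comment(comments[0])
```

One caveat with PostgreSQL: an UPDATE leaves the old row version in place as a dead tuple until it is vacuumed, so the old key could linger on disk (and in backups) for a while after the overwrite.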
I’m inclined to encourage that we bite the bullet while the data is still relatively small and change the delete/removed fields into a unified status field. Enum or integer, or is it a character? I think I’ve seen code that says = true and code that says = ‘t’.
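To make the unified status idea concrete, here is a rough sketch of how two boolean flags could collapse into one integer-backed enum. The enum names, the numeric values, and the rule that a moderator removal wins when both flags are set are all my assumptions, not anything decided:

```python
from enum import IntEnum

class ContentStatus(IntEnum):
    # Hypothetical values; an integer column could store these directly.
    ACTIVE = 0
    DELETED_BY_CREATOR = 1
    REMOVED_BY_MODERATOR = 2

def status_from_flags(deleted: bool, removed: bool) -> ContentStatus:
    """Map the two existing boolean flags onto a single status.

    Assumption: if both flags are set, the moderator removal takes precedence.
    """
    if removed:
        return ContentStatus.REMOVED_BY_MODERATOR
    if deleted:
        return ContentStatus.DELETED_BY_CREATOR
    return ContentStatus.ACTIVE

def is_visible(status: ContentStatus) -> bool:
    return status is ContentStatus.ACTIVE
```

Note that a single status loses the "both deleted and removed" combination, so a migration would need to decide which one wins.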
EDIT: I created a new posting regarding consolidating some of these fields that yield the same results.
And have some timestamps for deletion, even if that’s off in another table. Need to think this through some more.
I think the true / ‘t’ thing is just postgres and how it handles boolean fields.
If they’re not all boolean, then yes we should fix.
Question: on a user ban, are moderators using the remove-data option? The API seems to allow it on a testing server install, but I don’t know what moderators are actually doing in the field.
That ban removal is another potential high-I/O operation. There are accounts that copy postings en masse from Hacker News, Reddit, etc., and I could see banning one of those accounts triggering a lot of PostgreSQL activity.
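One way to keep that kind of removal from landing as a single giant write is to do it in small committed batches. A rough sketch with sqlite3 standing in for PostgreSQL; the `post` table, its columns, and the function name are hypothetical:

```python
import sqlite3

def remove_posts_in_batches(conn, creator_id, batch_size=500):
    """Mark a creator's posts as removed, one small batch per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE post SET removed = 1
            WHERE id IN (
                SELECT id FROM post
                WHERE creator_id = ? AND removed = 0
                LIMIT ?
            )
            """,
            (creator_id, batch_size),
        )
        conn.commit()  # end the transaction so locks are held only briefly
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    "id INTEGER PRIMARY KEY, "
    "creator_id INTEGER NOT NULL, "
    "removed INTEGER NOT NULL DEFAULT 0)"
)
# A bulk-copying account with 1200 posts, plus an unrelated account with 3.
conn.executemany(
    "INSERT INTO post (creator_id) VALUES (?)",
    [(7,)] * 1200 + [(8,)] * 3,
)
conn.commit()

removed_count = remove_posts_in_batches(conn, creator_id=7, batch_size=500)
```

A real version would probably also pause between batches or run from a background job, so other queries get a turn.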
I did perform some tests yesterday. DeleteAccount does NOT delete the communities created by the account. If the deleted account was the only moderator, the community will be left with zero moderators. That person’s posts and comments are overwritten, but a post created in the community by an account other than the deleted one should still function (not sure how deep the testing went on that, still enhancing testing).