
Except part of their 'value prop' is "We have this giant trove of human created content, and AI companies need to start paying us to utilize it when training their models".



Well, if that's the case, they can easily switch to "show 'deleted' content" - removing your comments doesn't delete them from Reddit's DB.


To add to the other comments, they have confirmed in the past they don’t keep track of history.

That could have changed, sure, but nothing indicates that is the case.

You could also go the GDPR route and request all your data be deleted, if you are subject to that. They would be forced to comply with that request.


I did a GDPR request and it showed the content of my deleted comments

but if you overwrite first with garbage then delete that's what shows up in the dump


Which is how shreddit works, it changes it to garbage then deletes it.
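A minimal sketch of why the order matters, assuming (as this thread speculates, and not as anything Reddit has confirmed) that a delete only sets a flag and an edit overwrites the stored text in place. `CommentStore`, `soft_delete`, `export_dump`, and `shred` are hypothetical names for illustration, not Reddit's or shreddit's actual code.

```python
import secrets


class CommentStore:
    """Toy in-memory store; not Reddit's actual schema."""

    def __init__(self):
        self.rows = {}

    def post(self, comment_id, body):
        self.rows[comment_id] = {"body": body, "deleted": False}

    def edit(self, comment_id, body):
        # Assumption: edits overwrite in place and no history is kept.
        self.rows[comment_id]["body"] = body

    def soft_delete(self, comment_id):
        # Assumption: delete only hides the row; the body stays stored.
        self.rows[comment_id]["deleted"] = True

    def render(self, comment_id):
        row = self.rows[comment_id]
        return "[deleted]" if row["deleted"] else row["body"]

    def export_dump(self, comment_id):
        # What a data export would contain under these assumptions.
        return self.rows[comment_id]["body"]


def shred(store, comment_id):
    """Overwrite with random text first, then delete."""
    store.edit(comment_id, secrets.token_hex(16))
    store.soft_delete(comment_id)


store = CommentStore()
store.post("a", "my original comment")
store.post("b", "another original comment")

store.soft_delete("a")  # plain delete
shred(store, "b")       # overwrite, then delete
```

Both comments render as `[deleted]`, but under these assumptions only the plain delete leaves the original text in the export. If edit history were retained, the overwrite would not help.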


Nice. I wouldn’t have expected them to have changed this but it’s good to see it at least partially confirmed.


GDPR has exceptions, and this would be one of them: you can't simply invoke GDPR and get everything deleted. They just have to anonymize the poster by deleting their signature, avatar, profile, etc., while the posts can stay intact (unless they contain personal info).


> (unless they contain personal info)

Do you think they are going to go through each of your comments individually to determine if it contains personal info or not? That requires actual time and effort, and as an individual user you or your comments are nowhere even remotely in the vicinity of important enough for that.

It’s far easier for them to take the loss and delete wholesale.

Do they do this? I don’t know, I haven’t done it. But it certainly could be argued that by them not doing this, they are not fully complying with the requirements of a GDPR deletion request.


That would destroy their whole value proposition, your user generated content is their goldmine. Of course they don't have to sift through your thousands of posts to find the one that has GDPR info, you'll have to show them the post.


You cannot impose requirements like that as part of the deletion process.

It is 100% not the user’s responsibility to keep track of this information. That is explicitly a requirement of the provider and it is fully on them to ensure that a deletion request deletes all the personal data. If they didn’t tag it properly and can’t ensure that, then that is their problem to solve not yours.

Their value proposition is also not a single user’s data. It’s the entirety of the data set. One user’s data is nearly worthless, certainly not worth enough to have a human review it. Which was my point.


I take it you haven't seen how forums deal with GDPR notices? It's exactly how I described. The profile is anonymized/emptied and the posts stay.


Just because a bunch of forums do it that way doesn't mean it's correct. When I was at Twitter the Compliance team determined that all user-generated content was Personal Data under the scope of GDPR. Those forums may be getting away with it right now, but they're playing with fire. If someone wanted to raise a stink with regulators they could be in trouble. I guarantee you if Reddit or a site of similar scope tried something like that someone would.


I know of two forums in two different EU countries that have also consulted with lawyers and determined that it was sufficient. I am inclined to give higher weight to your opinion simply because Twitter Compliance will have access to a larger team of lawyers. Do you know if they were American lawyers or EU lawyers?


"When it became known that post edits were not saved but post deletions were saved, code was added to edit your post prior to deletion."


Most deletion apps I've seen also offer the ability to redact your comments with nonsense edits before deletion.


That does assume that reddit actually updates the comment entry in their backend instead of keeping histories of the edits which I'm not sure we've had any insight into.


Keeping a history is far more work (and cost!) than maintaining a deleted flag. I'm inclined to believe they don't keep a history.


That’s my assumption too, keeping a history for every user increases complexity tenfold, while flagging deleted comments seems to be a common practice even in smaller companies because it’s so simple.

Another user said they tried the data dump GDPR request and the comments included deleted comments but only the edited version, so I guess this can be verified at least.


Actually being able to show history of edits would be a nice feature tbh.


There are public data dumps of Reddit comments available all the way up to December 2022. And they're only about 2TB all together.

There's nothing stopping AI companies from just using those instead of paying Reddit $50 million to scrape all of them using the API. It would also be 10x-100x quicker to do that rather than hammer their API for the comments (the API sucks for mass data retrieval)


Sure, but companies doing that also wouldn’t be paying Reddit for that data.

The point of shredding comments isn’t to hurt the companies scraping the data (although that might be a nice side effect). Ultimately it’s to hurt Reddit.


Where would someone find these?



Thanks, good fellow :)


Glad to help, friend.


Ah, good point. If this is the case, yeah, shred away. Still it's too bad that this greed will make it harder for humans to see useful old discussions.


It's not greed, it's capitalism. This is the system working as it's intended.


That's worth remembering, yes - without that we wouldn't have had Reddit in the first place.


Maybe we wouldn't have had Reddit, but I was perfectly content going to multiple phpBB boards when I was younger to discuss my hobbies.

Reddit always felt like a place to take from since it got lots of people by being centralized, but not really a place to contribute to unless you were contributing to the group think aspect of the community.

Communities specifically set up for answering questions or R&D were in my opinion the only valuable communities. Figuring out things or learning from the huge number of users was helpful, but it was never a fun place to just talk about any of my interests.


I'm really not a fan of comments like these. There is nothing inherently wrong with capitalism, and 'problems' like this could be solved via regulation.

So no, the problem really is greed, and the extreme resistance to regulation in the US.


Which in this case is both greed and also bad


Well, capitalism is about choices. There are multiple choices reddit could make, and there are multiple choices reddit users can make.

This is a fairly classic case of "you aren't the customer, you're the product", but that isn't the only way capitalism works.

This is capitalism and greed and disdain for the user.

I don't know if that will kill or materially damage reddit, but that combo kills plenty of regular businesses. (Salient difference is probably that user != customer, as with many internet businesses.)


Do you think that reddit actually deletes comments when the user presses delete? My assumption would be that it just sticks up a "do not display" flag in the database. I'm sure that there's some influence that GDPR has though.


The plug in I use (I think nuke Reddit) overwrites comments with random blarg that’s realistic sounding text, then deletes them.

I’m sure Reddit keeps all versions as well. But I think it would be impractical to restore to the correct version at scale unless they want to manually review to find the “right” version to restore.

I think if they got a specific subpoena for me, they could find my comments with a manual investigation, but I expect that will never happen as there’s no reason for anyone to do that.

I just want to remove my content from Reddit.com and make it harder if they decide to undelete or otherwise not respect my decision.

I’m surprised Reddit still allows edit and undelete and expect them to remove the functionality soon.


> I’m sure Reddit keeps all versions as well. But I think it would be impractical to restore to the correct version at scale unless they want to manually review to find the “right” version to restore.

If they retain versioning history I'm sure it would be easy to identify a mass edit and revert all of those edits from the user. If it wasn't easy, for some reason, it would probably be easy to revert all edits after, say, 2 days of posting.
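A rough sketch of the "revert all edits after 2 days of posting" idea, assuming a per-comment revision list exists at all (which nobody in this thread has confirmed). `Revision` and `version_to_restore` are hypothetical names, and the 2-day window is just the figure from the comment above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    body: str
    saved_at: datetime


def version_to_restore(revisions, window=timedelta(days=2)):
    """Return the last revision saved within `window` of the original post.

    `revisions` must be ordered oldest first; revisions[0] is the original.
    """
    posted_at = revisions[0].saved_at
    cutoff = posted_at + window
    # The original always qualifies, so `kept` is never empty.
    kept = [r for r in revisions if r.saved_at <= cutoff]
    return kept[-1]


posted = datetime(2020, 1, 1, 12, 0)
history = [
    Revision("orignal text with a typo", posted),
    Revision("original text, typo fixed", posted + timedelta(minutes=5)),
    Revision("xk2 lorem garbage", posted + timedelta(days=900)),
]
restored = version_to_restore(history)
```

Here the quick typo fix survives and the overwrite made years later is ignored. A shredding edit made within the window would slip through, so this is a heuristic at best.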

Given that everything posted to Reddit becomes the property of Reddit (okay, perpetually licensed to Reddit), I don't know that much legally could be done about this. Unless they restored stuff posted while under-age, or PII, maybe.


Just need to update the script to also create new comments with random garbage, edit those to other random garbage, then delete. Add in some random delays between actions, randomize the order of all individual actions, and this would make it very difficult for admins to separate legitimate activity from script activity.
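Something like this could plan the shuffled schedule, as a sketch only. It makes no network calls; `build_plan` and the step names are made up for illustration, and actually carrying out the steps would need an API client that is not shown here. Each comment's own steps stay in order (you can't delete before you edit), while steps from different comments are interleaved at random.

```python
import random


def build_plan(comment_ids, decoy_ids, rng, min_delay=5.0, max_delay=600.0):
    """Interleave per-comment steps in random order with random delays.

    Real comments get edit -> delete; decoys get create -> edit -> delete.
    Returns a list of (delay_seconds, action, comment_id) tuples.
    """
    # One queue of remaining steps per comment.
    pending = {cid: ["edit", "delete"] for cid in comment_ids}
    pending.update({did: ["create", "edit", "delete"] for did in decoy_ids})

    plan = []
    while pending:
        cid = rng.choice(sorted(pending))  # pick any comment with steps left
        action = pending[cid].pop(0)       # take its next step in order
        if not pending[cid]:
            del pending[cid]
        delay = rng.uniform(min_delay, max_delay)
        plan.append((delay, action, cid))
    return plan


rng = random.Random(42)  # seeded only so the example is repeatable
plan = build_plan(["c1", "c2"], ["decoy1"], rng)
```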


If on a new page those new comments would get downvoted to oblivion. If on an older page they'd be partially identifiable by dint of being on an older page.

But sure, things could be done to make this more difficult. It's probably not worthwhile on Reddit's part to do anything to stop this, just as it hasn't been too worthwhile for websites to evade ad blockers. The number of people who mass delete is just too small to matter.

If I worked at Reddit and wanted to do something about it though (and was a programmer), I'd add an option under individual deleted comments for viewers to click to view the comment (and any versions). And possibly add an option for viewers to restore a version entirely. This would save helpful comments, at least until some jerk decided to automate the process and restore everything. So maybe the complete restoration is a bad idea.


> If I worked at Reddit and wanted to do something about it though (and was a programmer), I'd add an option under individual deleted comments for viewers to click to view the comment (and any versions).

That could still backfire. Users may be very unhappy that their unedited comments are accessible forever. This may drive them away from commenting and participating in general.

The goal of mass-deleters is to drive down engagement. If Reddit makes the entire edit history of each comment accessible, then mass deleters could flood that history with bogus, AI-generated crap. Although it may still be possible to determine which edit was the last real one, the effort to do so goes way up, and engagement goes down as a result.


> But I think it would be impractical to restore to the correct version at scale unless they want to manually review to find the “right” version to restore.

They could restore all comments a month after controversy/blackout events from about a month before such events.

That would probably restore the majority as most people are deleting/overwriting their comments as a reaction to or as a part of these events.


Of course they could, but it’s very unlikely. Even if 10% of the users did this, there would be an uproar.

It’s much more likely they just disallow editing and deleting.


I was thinking more restore for their own dataset they might want to use to sell or whatever, I agree they wouldn't restore them publicly.


This is probably true, but one thing the program in the title does is overwrite your existing comments with something else before marking them as deleted, because at some point (this is probably no longer true) Reddit did not store your entire comment history.

There are obviously ways to defeat this in analysis, but it does make Reddit's job slightly harder if they want to leverage that data. It would probably also be interesting to just edit some comments, chosen at random, without deleting them, which would make it even harder to reliably tease out good comments from noise.


if(comment.IsSoftDeleted) { write("[deleted]") } else { write(comment.Content) }


Some time last year I attempted to make a similar tool. I was able to retrieve comments that had been deleted in the requests so I suspect that there is a "display flag" of sorts that is checked against.


Most of these tools first edit+save the comment with a word or single letter overwriting the original text in the db, then delete it.


CCPA also has a right to delete clause


That's applicable to personal information only. Everything a user posts to Reddit that isn't personal information, Reddit can use however they want.

> You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.


GDPR only applies to EU citizens though. If the data is truly valuable, I could imagine some work-arounds as well. E.g. maybe each reddit post is automatically a copyrighted work for which you immediately give a perpetual license to reddit inc. You also automatically transfer copyright ownership to reddit inc and they license back your ability to share your comment.


GDPR only protects personal data. If someone requests deletion, you could probably keep the comments as long as you anonymize them (which Reddit does).

Your comments, including here in HN, are probably already covered by a scheme like that where you give the site operator an unrestricted license to use them. You can remove the association to your identity via GDPR, but to take down the content itself you’d need to go through the justice system.


GDPR applies to any company operating in the EU and storing the data of users, regardless of the citizenship of those users.


"pay us enough and we'll provide the real upvote numbers and not the fake ones" as well


perhaps it would be "fun" to replace the comment with a GPT summary .. in 10 words or less


Where the adversarial AI people at? Write comments that if fed into a model generate nonsense or falsehoods.


I might change most of my comments to various suggestions on how AI could enslave and/or torture humanity just to see how Eliezer Yudkowsky reacts.


Now, I'm not necessarily advocating for it, but replacing all of your content with varying degrees of politically incorrect misinformation would be significantly more harmful to both Reddit and the GPT bots scraping its dataset than merely deleting the information.

Garbage in, garbage out after all.



