Last year I made a simple IRC question-answering bot that just searched reddit for your question and returned the top comment of the top result. It worked surprisingly well. It's amazing how many questions someone else has already asked on reddit, and the comment quality is usually pretty good. Sample conversation with it: https://i.imgur.com/LDD9isL.jpg
I improved on it a lot with a whitelist of subreddits and some machine learning to select the best thread. But I was only touching on what is possible with that data.
You can play with it here: https://kiwiirc.com/client/irc.snoonet.org/mybots
EDIT: It only works well with certain kinds of questions: things that would have been asked before and that don't have too many unique keywords. I see some people ask it really unique questions, or talk to it like a regular chatbot without even asking questions, and then get frustrated when it returns nonsense. There are a lot of improvements that could be made to it with natural language processing and so on, but right now it's pretty simple.
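A minimal sketch of the approach described above, not the bot's actual code. It works on in-memory data instead of calling reddit, and it substitutes plain keyword overlap for the machine-learning thread selection the comment mentions. The `Thread` class, the `WHITELIST` contents, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Thread:
    subreddit: str
    title: str
    score: int
    comments: list = field(default_factory=list)  # (score, body) pairs


# Hypothetical whitelist; the bot's real list is not given in the thread.
WHITELIST = {"askscience", "explainlikeimfive"}


def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_overlap(question, title):
    """Count the words shared by the question and a thread title."""
    return len(tokens(question) & tokens(title))


def answer(question, threads):
    """Return the top comment of the best-matching whitelisted thread, or None."""
    candidates = [
        t for t in threads
        if t.subreddit in WHITELIST
        and t.comments
        and keyword_overlap(question, t.title) > 0
    ]
    if not candidates:
        return None
    # Rank by keyword overlap first, then by thread score as a tie-breaker.
    best = max(
        candidates,
        key=lambda t: (keyword_overlap(question, t.title), t.score),
    )
    # The "top comment" here is simply the highest-scored one.
    return max(best.comments, key=lambda c: c[0])[1]
```

Returning `None` when nothing overlaps is one way to avoid the nonsense answers described above for questions nobody has asked before.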
AMAbot: Too much explanation... Get out of the shower, you're a raisin. Also, makes me think of like babies, wrestling. Kind of like the opposite of atlas. why are you gay?
>>"This dataset will go nicely with the full Reddit Comment Corpus that I released a couple months ago. The link_id from each comment corresponds to the id key in each of the submission objects in this dataset."
My favorite was: "Who is your daddy and what does he do?" "It's not a tumor!"
While I agree Reddit 2015 does reflect more of the US mainstream, it is still far from being an actual reflection of popular US opinion. Reddit is far more atheist/libertarian than the US mainstream. And I would argue it is right now far more reactionary/right wing as well. It has become very popular among young white males who have unhealthy attitudes towards women due to the abundance of porn and generally retrograde attitudes towards women of Reddit in general (both the users and the site itself.)
Same goes for its attitudes towards race.
That's a pretty awful accusation to level at ~20 million people. I wonder if you can substantiate that?
When I'm on women-centric subs like /r/askwomen, I frequently see comments that would absolutely get torn apart in most other parts of the site. Not offensive or provocative ones, either. Just normal conversation that would get a slew of "shut up, SJW" anywhere else.
It's off-topic meta noise at best and harmful to the community at worst.
Anyway, I've found that the people who use shorthand insults like "SJW" are typically part of the entrenched majority group of their subculture, and want nothing more than to keep those pesky "outsiders" away from their clubhouse. (Despite the fact that those outsiders had, in fact, been there all along.)
"I don't like how this character was portrayed" is a fine criticism on its own, but behind that is the shadow of "...so what should be done about it?". Absolutely nothing of value lies down that path.
Put another way: Talk of frame rates leads to discussions about optimization. Talk of nonsensical writing leads to discussions of what else the writers could have done. Talk about the portrayal of a "minority" character goes straight to "the developers are *ist".
Why does "being part of the conversation" always seem to wind up in "I don't like this, the devs are assholes, it should be changed?"
I view this to be more of a reflection of society as a whole rather than the Reddit community specifically. Subreddits show a marked decrease in quality once they become large. Becoming a default subreddit is, in some ways, a death knell for the community.
Anecdotes are not data.
You sound just like the fundamentalist Evangelicals I grew up with.
That reads as if you intended it as a personal attack. Such comments are not welcome on Hacker News. Please don't post them here.
On reddit circa 2006, everybody had the same views on everything from politics to movies to cats to spiders.
Eventually it acquired a more diverse cross-section of the population. Of course, they've been doing their best over the past few months to crack down on the expression of views they don't like, so I'm not sure what the demographic looks like nowadays and how many people have fled to less censorship-happy pastures.
It's also possible that young people are becoming more conservative overall. Conservatism is looking a lot better ever since the idiocy of the Bush administration got replaced by the idiocy of the Obama administration.
Not that it makes up for the tragic deaths of many big engined classics....
• The Arabs will outbreed us! (Before or after the Turks will outbreed us? Or was it the Somalis? I lost track.)
• The refugees will cripple our economy! (153 million EU citizens are on tax-funded pensions or unemployment benefits. Please tell me how a million refugees is going to make a dent in that.)
Europe is doing the wrong thing.
The Comment data set is already on BigQuery, allowing for quick analysis without having to download that corpus (example: https://www.reddit.com/r/bigquery/comments/3kfnmq/reddit_sub... )
Once the Submission dataset is also on BigQuery, I'll write a blog post with more info on how to use it.
There's also the web archive and search engine indexing.
> This dataset will go nicely with the full Reddit Comment Corpus that I released a couple months ago. The link_id from each comment corresponds to the id key in each of the submission objects in this dataset.
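A rough sketch of the join described in that quote. The only field names taken from the quote are `link_id` and `id`; the sample records, the function names, and the assumption that `link_id` carries reddit's `t3_` type prefix while the submission `id` does not are all illustrative and should be checked against the actual dumps.

```python
from collections import defaultdict


def strip_prefix(link_id):
    """Drop a leading 't3_' type prefix if present (an assumption about the data)."""
    return link_id[3:] if link_id.startswith("t3_") else link_id


def group_comments(submissions, comments):
    """Map each submission id to the comments whose link_id points at it."""
    known = {s["id"] for s in submissions}
    by_submission = defaultdict(list)
    for c in comments:
        sid = strip_prefix(c["link_id"])
        if sid in known:  # skip comments whose submission is not in this batch
            by_submission[sid].append(c)
    return dict(by_submission)


# Made-up records for illustration only.
submissions = [{"id": "3abc"}, {"id": "3xyz"}]
comments = [
    {"link_id": "t3_3abc", "body": "first"},
    {"link_id": "t3_3abc", "body": "second"},
    {"link_id": "t3_zzzz", "body": "orphan"},
]
grouped = group_comments(submissions, comments)
```

On the full corpus the same join would more likely be done in SQL on BigQuery; the dictionary version only shows which keys line up.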
Isn't that how Hacker News is? You can't delete comments there either.
When the next snapshot is released, it will be possible to just perform a diff to discover all deletions and edits since.
So comment deletion was never going to save you.
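To make the snapshot diff concrete, here is a minimal sketch. It assumes each snapshot has been loaded into a dictionary from comment id to body text, and that a removed comment either disappears from the later snapshot or has its body replaced by the marker `[deleted]`. Both the layout and the marker are assumptions, not something stated about the dumps in this thread.

```python
DELETED_MARKER = "[deleted]"  # assumed placeholder for removed comment text


def diff_snapshots(old, new):
    """Compare two {comment_id: body} snapshots.

    Returns (deleted, edited): sorted ids whose text was removed between the
    snapshots, and sorted ids whose body changed.
    """
    deleted, edited = [], []
    for cid, old_body in old.items():
        if old_body == DELETED_MARKER:
            continue  # already gone in the older snapshot, nothing to recover
        new_body = new.get(cid)
        if new_body is None or new_body == DELETED_MARKER:
            deleted.append(cid)
        elif new_body != old_body:
            edited.append(cid)
    return sorted(deleted), sorted(edited)


# Made-up snapshots for illustration only.
old = {"c1": "hello", "c2": "secret", "c3": "typo", "c4": "gone"}
new = {"c1": "hello", "c2": "[deleted]", "c3": "fixed", "c5": "new"}
deleted, edited = diff_snapshots(old, new)
```

The original text of every id in `deleted` is still sitting in `old`, which is the point being made above.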
This is actually what has been bothering me for a while now. In the past, when search engines (well, at least the big one) weren't that thorough, you didn't mind if something stayed on the Internet, since it wasn't easily accessible.
But it's a major difference if it's indexed/searchable or downloadable.
Teach kids about the internet. Many people still assume that just deleting something makes it disappear forever.
Maybe a lot of people from our generation are doomed to sharing too many things online, but we can at least save the next generation from themselves.
If you made a mistake 10 years ago, maybe a few close friends in your town knew about it. Now if you make a mistake, the whole world has access to that information.
Government regulations and bans are not going to do anything to stop the spread of information; we need to educate people to protect themselves from themselves.
Not yet :)
Given their lax security and the general cluelessness of the people in charge, I'm quite sure it will leak at some point, at least partially, perhaps to corporations, perhaps to the public, just as internal NSA docs have leaked. It's very hard to keep things like that airtight forever: all it takes is one slip-up and all the stored info could become accessible at some point in the future. It's already indexed and searchable, just not by you.
The important point here is that looking at present day tech (as in your comment on search engine prowess) is not the way to look at it - one day all this information will be accessible to much more intelligent future algorithms, able to link it together in myriad ways and form an almost perfect picture of your life in retrospect. The data is there, and will be stored forever.
HN thread here: https://news.ycombinator.com/item?id=10303295