May be relevant: https://www.reddit.com/r/no_sob_story/
This is a subreddit of images with the possibly made-up story removed. No story about the girlfriend, etc.
(1) You might be better off writing a small Python script for this. That is one heck of a query you wrote. I used to have a job where I regularly wrote queries like this, and they sometimes took up to an hour to run. Once I discovered Python, I never looked back.
(2) This type of analysis has a flaw, so be careful what conclusions you draw from it. What you are doing describes the data, but it (a) does not identify a causal relationship between keywords and submission scores and (b) would likely have very little predictive power. If you generated a new dataset by simulating a random walk process to produce titles and upvotes/downvotes on submissions, this analysis would also yield apparent "hive mind keywords", but obviously there is no underlying causal relationship in that case.
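To make that concrete, here is a rough sketch of the idea, not your actual pipeline. It swaps the random walk for plain independent random draws, which is simpler but makes the same point: titles and scores are generated with no link between them, and ranking keywords by mean score still produces "winners". The vocabulary, the Pareto score distribution, and all the names here are made up for illustration.

```python
import random
from collections import defaultdict

def simulate_submissions(n_posts, vocab, words_per_title, rng):
    """Draw titles and scores independently, so no keyword affects the score."""
    posts = []
    for _ in range(n_posts):
        title = rng.sample(vocab, words_per_title)
        # Heavy-tailed score (assumed shape), unrelated to the title by construction.
        score = int(rng.paretovariate(1.5))
        posts.append((title, score))
    return posts

def mean_score_by_keyword(posts):
    """Average score of the posts whose title contains each keyword."""
    totals = defaultdict(lambda: [0, 0])  # word -> [score sum, post count]
    for title, score in posts:
        for word in title:
            totals[word][0] += score
            totals[word][1] += 1
    return {w: s / n for w, (s, n) in totals.items()}

rng = random.Random(0)
vocab = [f"word{i}" for i in range(200)]  # hypothetical keyword list
posts = simulate_submissions(5000, vocab, 5, rng)

means = mean_score_by_keyword(posts)
overall = sum(score for _, score in posts) / len(posts)
top = sorted(means, key=means.get, reverse=True)[:5]

print(f"overall mean score: {overall:.2f}")
for word in top:
    print(f"{word}: {means[word]:.2f}")
```

The top few keywords will sit above the overall mean purely by chance, which is exactly what a real "hive mind keyword" would look like in a descriptive table.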
You should look up the topic of "cross-validation". The easiest thing you could do, and the best first step, would be to take the reddit data and split it in half (while maintaining consistency, obviously) so that you have two groups of data. Then perform your analysis on each group and compare the results.
Another method would be to take repeated random subsamples, perform your analysis on each one, and see whether you get consistent results.
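Both checks can be sketched in a few lines. This is only an illustration under assumptions: posts are (list of words, score) pairs, "your analysis" is stood in for by a simple ranking of keywords by mean score, and the toy data is built so that the word "cat" really does raise the score. The function names are mine.

```python
import random
from collections import defaultdict

def top_keywords(posts, k):
    """Stand-in analysis: the k keywords with the highest mean score."""
    totals = defaultdict(lambda: [0, 0])  # word -> [score sum, post count]
    for words, score in posts:
        for w in set(words):
            totals[w][0] += score
            totals[w][1] += 1
    means = {w: s / n for w, (s, n) in totals.items()}
    return set(sorted(means, key=means.get, reverse=True)[:k])

def split_half_overlap(posts, k, rng):
    """Split the data in half at random and compare the two top-k sets."""
    shuffled = posts[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    a = top_keywords(shuffled[:mid], k)
    b = top_keywords(shuffled[mid:], k)
    return len(a & b) / k

def subsample_stability(posts, k, n_rounds, fraction, rng):
    """Fraction of random subsamples in which each keyword reaches the top k."""
    counts = defaultdict(int)
    size = int(len(posts) * fraction)
    for _ in range(n_rounds):
        for w in top_keywords(rng.sample(posts, size), k):
            counts[w] += 1
    return {w: c / n_rounds for w, c in counts.items()}

# Toy data: 50 filler words with no effect, plus "cat", which adds 100 points.
rng = random.Random(1)
filler = [f"w{i}" for i in range(50)]
posts = []
for _ in range(2000):
    words = rng.sample(filler, 4)
    score = rng.randint(0, 10)
    if rng.random() < 0.2:
        words.append("cat")
        score += 100
    posts.append((words, score))

overlap = split_half_overlap(posts, 5, rng)
stability = subsample_stability(posts, 5, 20, 0.5, rng)
print(f"split-half overlap of top 5: {overlap:.2f}")
print(f"'cat' in top 5 in {stability['cat']:.0%} of subsamples")
```

A keyword with a real effect should show up in nearly every subsample, while the filler words that happen to rank highly in one half will mostly drop out in the other.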
> All in all, this is still just a first step for analyzing the importance of keywords in Reddit submission [...] but [the next steps] require very significant and very expensive computing power.
Of course I want to use cross-validation and other things like that, but it's slightly harder to do on a 200GB dataset.