Random Generation of English Sentences (1961) [pdf] (mt-archive.info)
58 points by polm23 on June 12, 2017 | hide | past | favorite | 12 comments



Can you imagine waking up one day in the near future and realizing that 96% of the tweets you have been reading for the past year have been generated by a computer?

The sheer thought of that just blew my mind a little.

It could drive the whole world mad.


> Can you imagine waking up one day in the near future

What do you mean "near future"? ;-)

> It could drive the whole world mad.

Mmmh, I think that has already happened a long time ago.

But think of the day when nobody reads Twitter any more, because it is all machine-generated, and all you are left with are machines talking to machines about machines talking to machines. Think of the day when Facebook is nothing but computers uploading core dumps to their photo streams.

(Emacs, of course, has a function to let ELIZA talk to Zippy the pinhead. Go figure. ;-P)


Twitter is probably up to 30% already.


Although, at this point it's mostly just spam, porn, and political bots of really low quality. Imagine the day when worthwhile content with journalistic, artistic, or intellectual value is posted by machines. There are already some great bots pumping out tweets with color palettes, nonsensical patent ideas, and historical events.


> Imagine the day when worthwhile content with journalistic, artistic, or intellectual value is posted by machines

One way to cheat at this is to use existing intellectual discourse as a template, and simply swap out nouns/verbs with other ones, and change the structure slightly perhaps using Markov chains.
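The swap-and-reshuffle idea above can be sketched as a toy bigram Markov chain: learn which words follow which in an existing text, then walk the chain to produce new sentences with the same local structure. The corpus below is an invented sample, not real discourse:

```python
import random

# Tiny stand-in corpus (invented sample text, purely for illustration).
corpus = (
    "the model generates text from a template and "
    "the template swaps nouns with other nouns from the corpus"
).split()

# Build a bigram Markov chain: each word maps to the words seen after it.
chain = {}
for prev, nxt in zip(corpus, corpus[1:]):
    chain.setdefault(prev, []).append(nxt)

def generate(start, length, rng=random.Random(0)):
    """Walk the chain from `start`, picking a random successor at each step."""
    words = [start]
    for _ in range(length - 1):
        successors = chain.get(words[-1])
        if not successors:
            break
        words.append(rng.choice(successors))
    return " ".join(words)

print(generate("the", 8))
```

With a large enough corpus and a higher-order chain, the output starts to look locally fluent while saying nothing, which is exactly the cheat being described.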

Another way to appear more credible is to generate sources to back up your statements. Wikipedia is full of links and footnotes that prop up its entries. All that's needed is to scrape these and feed them into your natural language generator.
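Harvesting such links could be sketched with nothing but the standard library. The HTML snippet below is invented; real Wikipedia reference markup is more involved than this:

```python
from html.parser import HTMLParser

# Pull external href targets out of a references-style HTML fragment.
class RefExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("http"):
                self.links.append(href)

# Made-up example fragment standing in for a scraped references section.
snippet = (
    '<ol class="references">'
    '<li><a href="https://example.org/paper1">Source 1</a></li>'
    '<li><a href="https://example.org/paper2">Source 2</a></li>'
    '</ol>'
)

extractor = RefExtractor()
extractor.feed(snippet)
print(extractor.links)
```

A generator could then sprinkle these harvested links after its sentences to mimic a cited, credible-looking text.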


They can increase the output of text 1000x, but I can still only read about as much as before.


Just wait until you have a tweet-reading bot that picks out the items most interesting to you. Arguably, we already have that in the form of Facebook's like-driven echo chamber algorithms. But it would be nice to have a version whose parameters are under your control rather than some large organization with questionable motives.
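A reader-controlled version of such a filter could be as simple as a keyword-weight table that the user edits directly, instead of an opaque platform algorithm. Everything below (tweets, keywords, weights) is invented for illustration:

```python
# User-owned ranking parameters: edit the weights, change the feed.
weights = {"linguistics": 2.0, "compiler": 1.5, "giveaway": -3.0}

def score(tweet, weights):
    """Sum the user-chosen weight of every keyword appearing in the tweet."""
    text = tweet.lower()
    return sum(w for kw, w in weights.items() if kw in text)

tweets = [
    "Huge crypto giveaway, click now!",
    "New paper on computational linguistics and early compiler design",
    "Lunch was fine I guess",
]

# Rank the feed by the user's own scoring function, highest first.
ranked = sorted(tweets, key=lambda t: score(t, weights), reverse=True)
print(ranked[0])
```

The point is not the scoring scheme, which is deliberately crude, but who holds the parameters.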


Yeah, but I usually don't get to see it because I only see the stuff that the people I follow tweet or retweet.

It's only when I click through into a tweet to look at the replies that I see the huge amount of junk that gets posted as replies to get more eyeballs.


Some details you might miss just reading the paper:

Victor Yngve, the author, became the first president of what would become the ACL not long after publishing this.

This paper shows off COMIT, the language he developed. While it may not look like much, it directly led to the development of SNOBOL and other string-processing languages.

This paper was a warm-up for working on machine translation; it turns out that progress in the field was slow, leading to what might be called the first AI winter, back in 1966:

https://en.wikipedia.org/wiki/ALPAC

It seems that he moved away from Computational Linguistics to more theoretical work after that, but he was publishing books even after 2000. He passed away in 2012, and you can read his obituary from the ACL here: http://www.aclweb.org/anthology/J12-3001


This is the state-of-the-art text-generation result from Google: https://arxiv.org/abs/1602.02410

* < S > With even more new technologies coming onto the market quickly during the past three years , an increasing number of companies now must tackle the ever-changing and ever-changing environmental challenges online . < S > Check back for updates on this breaking news story . < S > About 800 people gathered at Hever Castle on Long Beach from noon to 2pm , three to four times that of the funeral cortege . ` < S > We are aware of written instructions from the copyright holder not to , in any way , mention Rosenberg ’s negative comments if they are relevant as indicated in the documents , ” eBay said in a statement . < S > It is now known that coffee and cacao products can do no harm on the body . < S > Yuri Zhirkov was in attendance at the Stamford Bridge at the start of the second half but neither Drogba nor Malouda was able to push on through the Barcelona defence .


It is funny how Google explores a mostly statistical approach. On the other hand, we have Grammatical Framework (https://www.grammaticalframework.org/), which is much more along the lines of the article posted.


What is shocking about this particular branch of literature is that (1) there is no tie to any possible use, commercial or not, and (2) the generated text is rarely better than what "dissociated-press" and many other toy programs have done for years.



