Hacker News
AI Generated News Articles (sdan.io)
93 points by bigbird-media 5 days ago | 57 comments

We're going through the HN hug of death. We're focusing on scaling it (but I'm a HS student running this on an old desktop my dad gave me, on my family's slow internet, so I'm not sure how far scalability will go for me... going to see if I can get a cached page link here).

Yeah, would it be possible to put it on a pastebin and post it here? Impressive for a high schooler to be doing this, though!

Title: Will Apple, Facebook or Google Control The AR Momentum?

Article: “I wouldn’t be surprised if one day or another somebody says, ‘This is the future of gaming,'” reports Facebook’s head of virtual reality Omid Kordestani. And Kordestani admitted his past job of building from within the company is atypical for a big CEO (typical of Facebook) who may need to shape the future of a business outside the core team. He may find out how much of the potential of AR is already in its top app Facebook, where developers can shoot 3D models of stuffed animals or preschool objects in front of anyone’s faces, and how deeply Google was disrupted when its failed to move fast enough on VR before Facebook entered it.

But what it’s possible for Facebook and Apple to do is to lead the charge to develop a hardware- and/or software-independent AR ecosystem of platforms and apps. An Android-powered AR headset, which seems like it’s on its way, could eventually compete with Google Cardboard, and maybe provide its own open-source AR platform. Why not have Windows Embedded system OEMs build AR goggles and similar headsets, as they already sell hardware that can run augmented reality apps?

That could attract hardware companies like Dell, Asus, HP and Lenovo, as well as game consoles like Microsoft’s Xbox and Sony’s PlayStation. If Microsoft can muscle AR technology directly into the living room, it could make Oculus Rift obsolete. It could also entice developers away from Google’s $99 Daydream View AR headset which depends on smartphone technology, by throwing its weight behind hardware based on Windows 10. Meanwhile, Apple could cut out the middleman by making a standalone AR headset, although it already has a separate ARKit platform to create AR-themed apps for the iPhone and iPad. That way it could create a high-end AR headset for hardcore users like the kind of top manufacturers like LG, Sony, Samsung and Asus produce for their televisions and other gear.

At the same time, we’ll probably be stuck with Google’s Cardboard and Facebook’s Cardboard for a while. Google Cardboard’s track record is decent, and the ability to use Cardboard makes it easy to download and use a few AR apps that show other content than just pictures or blocks of text. Cardboard also does an excellent job of getting the platform out to more people than any other headset, with 45.8 million devices activated. But that’s less than half the number of Android-powered phones in the world as Android was running in early 2017. And from a hardware perspective, Google has a big advantage over Facebook, with plenty of current smartphones on its Cardboard list, including phones from Chinese manufacturers Huawei, Xiaomi, Oppo, Vivo, and Coolpad.

Then again, Google’s Cardboard headsets have started rolling out to more devices, and Google has been open about the likelihood of an Android-powered Cardboard headset, touting the dual displays and better cameras that now ship with the Pixel. While the hardware definitely isn’t there yet for it to take the lead in AR, Google seems to be confident it can make it happen and give users the compelling AR app experiences they want.

Meanwhile, neither Apple nor Facebook has the fragmentation problems with apps or hardware needed to dominate AR, which could be killer differentiators. Apple, by the way, has super serious ambitions to produce its own AR technology. The growing possibility that Apple is making such a headset may finally make it uncomfortable for Apple to cede this killer platform to Google.

When a platform war erupts like iPhone and Android, it’s going to take everyone’s best devices to achieve victory. There may be companies that switch allegiances, but that would only make it more possible for Apple, Facebook, and Google to enter it just as one platform would. But how long will that last, and will Apple, Facebook or Microsoft be the ones to seize the battlefield?

I haven't touched this article at all. In fact, I haven't touched any articles to date.


At some point, someone is going to do this with scientific literature. But tune it for hypothesis generation. Then we just need robots to run the experiments until we find the fountain of youth.

Alternatively, for optimizing publications rather than optimizing research, there's SCIgen (https://pdos.csail.mit.edu/archive/scigen/), which keeps getting accepted to questionable journals.

It's actually been done with protein folding, not text. AI suggests designs, which are then tested in the lab.

It's also been done with chemical synthesis by turning synthesis into a game and then using Alpha-like techniques on it: http://www.compchemhighlights.org/2018/03/planning-chemical-...

Actually, I remember reading that MIT did it in the early 2000s with some basic algorithms, and I think it may have worked.

But looking at what I made, I'm pretty sure what you're describing is possible.


Hmm. If you stick it behind Cloudflare, you might be able to leverage their caching. But I’m not sure how that works when the pages are dynamically generated.

You could also spit out the HTML to disk and just serve that behind nginx. Even on a slow connection, that might help.

Trying to think what else would scale...
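A minimal sketch of the write-to-disk suggestion, assuming each generated article is available as a slug, title, and body (the `Article` type and the `render_page`/`export_static` helpers are hypothetical, not part of the site's actual setup). Each page is written once as `<slug>/index.html`, so a static server such as nginx, or a CDN in front of it, can serve `/<slug>/` without running any application code per request.

```python
import html
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Article:
    """Hypothetical article record: URL slug, title, and body text."""
    slug: str
    title: str
    body: str


def render_page(article: Article) -> str:
    """Render one article as a standalone HTML page, escaping the text."""
    title = html.escape(article.title)
    return (
        "<!doctype html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><p>{html.escape(article.body)}</p></body></html>\n"
    )


def export_static(articles: list[Article], out_dir: Path) -> list[Path]:
    """Write each article to <out_dir>/<slug>/index.html and return the paths."""
    written = []
    for article in articles:
        page_dir = out_dir / article.slug
        page_dir.mkdir(parents=True, exist_ok=True)
        path = page_dir / "index.html"
        path.write_text(render_page(article), encoding="utf-8")
        written.append(path)
    return written
```

Regenerating or editing an article then only means rewriting its one file; everything else stays cacheable.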


It's behind Cloudflare :). Images are also behind a CDN I made... which historically has reduced load times for me (but with this much traffic not so much).

Oh, that might be why. If you configure Cloudflare to serve the images, that seems like it could help. It’d be quick to test it out, too. https://support.cloudflare.com/hc/en-us/articles/202775670

What I did: notrealnews.net is behind Cloudflare.

All images on notrealnews.net are served from a CDN I made at fullsend.xyz (all the images are hosted there, and then I use Cloudflare to cache them [0]).

[0]: https://developers.cloudflare.com/workers/tutorials/configur...
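One hedged way to push more of the load onto the CDN: have the origin send explicit `Cache-Control` headers, long-lived for generated images (which never change once published) and short-lived for article pages. The TTLs and extension list below are illustrative assumptions, not Cloudflare defaults; also note that, depending on its configuration, Cloudflare may not cache HTML pages at all unless a cache rule tells it to.

```python
from pathlib import PurePosixPath

# Illustrative TTLs in seconds (assumptions, not recommendations from Cloudflare).
IMAGE_TTL = 30 * 24 * 3600  # generated images never change once published
HTML_TTL = 5 * 60           # article pages may be edited or regenerated
DEFAULT_TTL = 3600          # CSS, JS, and anything else

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def cache_control_for(path: str) -> str:
    """Choose a Cache-Control header value from a request path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        # "immutable" tells browsers not to revalidate while the entry is fresh.
        return f"public, max-age={IMAGE_TTL}, immutable"
    if suffix in ("", ".html"):
        # Browsers always revalidate (max-age=0); shared caches such as a CDN
        # may reuse the page for HTML_TTL seconds (s-maxage).
        return f"public, max-age=0, s-maxage={HTML_TTL}"
    return f"public, max-age={DEFAULT_TTL}"
```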


Question about your product: besides "gaming the system by creating fake or duplicated + slightly modified content," what is the value of this?

Question about you: did you ever consider using your skills to do something that will help humanity solve actual problems we are facing?

Proposal: let this grow as a honeypot and publish a list of your customers after a while, so we can spot the "quality journalists" - that will be fun!

TODO for everybody: integrate a fake news detector for sites like this into uBlock Origin. Make them invisible.


In a world where disinformation and clickbait journalism are prevalent, we want to allow content creators to keep that same rapid pace (how quickly they make content) but make sure it’s factual/credible and demonstrates value.

Here's how we do this: suppose a journalist has X amount of time to create an article (we're talking about lower-tier/repetitive journalism, which makes up the majority of journalism). They have two options:

1. Write some relatively bogus article to drive clicks

2. We write the majority of their article within seconds, and they spend the X amount of time editing or regenerating our article. We're planning on implementing fact-checking algos in the end, so after they're done editing, we ensure the content is legit.

We don't want to replace journalism. We just want to automate lower-tier journalism (clickbait, repetitive sports articles) and hope journalists use their time ensuring the content is substantive for their audience.


You help people pollute the web. Got it.

> (we're talking about lower-tier/repetitive journalism, which consists of a majority of journalism)

The majority of content farms, maybe. The majority of journalism is interviews, local events, editorial pieces, analysis, obituaries, event listings, etc. I work with dozens of news organizations, and you do not seem to fully understand how a legitimate news business is run. Your tech may be interesting, but in my opinion your tool is more harmful to journalism than it is helpful.


Here's how I'm going to use your technology:

I am going to resell it to small businesses so that they can publish a neverending stream of nonsense keyword-laden articles to improve their SEO. Hotels, restaurants, medical practices. Anyone with cash really.

That's what this is going to be used for.


That only holds up if X remains constant. Option #1 is already considered acceptable by these low-quality sites, so why wouldn't they instead choose:

3. Automatically write a relatively bogus article to drive clicks.

For a business model that already works for Option 1, Option 3 is the same thing with less overhead.

I do see how this can greatly enable propaganda/fake-news creation, however. Time spent editing an article doesn't inherently mean time spent making it more accurate or reality-based; it can just as easily mean time spent twisting words to imply a propaganda goal. This may not be your goal, but it's an inevitable consequence of the technology.


Makes a lot of sense, I do hope you succeed.

Also, I work in academia; something like this would also be very helpful in that area.


Thanks for the support! You can always email us at bigbird@bigbird.dev if you'd like to know more.

I am new to this space, but after reading about your goals for this AI, does this mean that most news sites are just copying facts off one another?

Secondly, how does fact-checking work with AI in cases like this? I would imagine fact-checking would get progressively more difficult as more articles become AI-generated.

My biggest fear is that we will have one AI generating a fake story and acting as the root node, then other AIs creating news off this node, and fake stories will spread in various forms like branches, creating misinformation.


> does this mean that most news sites are just copying facts off one other?

This is how AP/Reuters/"newswire" content works. Since very few organisations can afford to have people everywhere, there's a sort of syndication model going on where the people that are in the right place for breaking news can sell it to a broker, who writes it up and then it appears across the rest of the media.

Press releases are even more of a thing. An organisation which wants to get into the news will put out a press release, which is basically a kit of quotes and "facts" that can be loosely rewritten into an article.

It's also important to understand how limited journalistic "fact checking" really is. They're not detectives, nor scientists, and don't have the time to be either. Most of the time it consists simply of ringing someone being talked about, asking them "is this true", and printing the result.


1. I'm no journalist, but I'm fairly sure there are some news sites that break the news first, and then others follow (either through primary or secondary sources).

2. I haven't implemented fact-checking algos. At the moment I'm not planning to use AI for it, just simple cross-referencing (not finalized).

3. When we do gain momentum, we'll bring on board a highly specialized ethics team and safety team. We're not a state actor or influenced by anyone, so we're just creating gimmicky articles on the internet for now. We'll eventually combat those issues if and when similar services pop up.

One reason we want to be first is that we know state actors are going to pop up soon. Singapore, Russia, the Philippines: the list goes on of countries that want something like us to control propaganda. We're hoping we'll be able to gain enough momentum before they start (or get further along than they already are) to set a good, solid standard. But as with all things, this will take time and deliberation.
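Item 2 above only says "simple cross referencing (not finalized)". As one hypothetical reading of that idea, this sketch extracts numeric claims from a generated article and flags any number that appears in none of the reference texts (for example, the source feed items). The function names are made up for illustration; a real checker would also need to handle names, dates, and paraphrase.

```python
import re

# Numbers such as 45.8, 2017, or 10,000 (thousands separators allowed).
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Numeric tokens in text, normalised by dropping thousands separators."""
    return {m.group().replace(",", "") for m in NUMBER_RE.finditer(text)}


def unsupported_numbers(generated: str, references: list[str]) -> set[str]:
    """Numbers in the generated article that no reference text contains.

    Anything returned here is a candidate for human review, not proof of error.
    """
    supported: set[str] = set()
    for ref in references:
        supported |= extract_numbers(ref)
    return extract_numbers(generated) - supported
```

Checking numbers first is a deliberate narrow choice: figures like "45.8 million devices activated" are easy to extract and easy for a language model to invent.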


At risk of sounding rude, “we’ll get to the ethics” later doesn’t sound reassuring.

Writing vs Curating.

Do we need to rewrite what's already written? How does this prevent bad actors? Oversaturation with "good" content?

What if we curated the most valuable human-written content with AI, so that humans can build trust with the [AI] entity? I think this type of curation is how we prevent bad actors.

Writing by AI should be used similarly to how AI improves the quality of images and video by filling in the gaps. We already have too much "not duplicate" content, if you ask me.


If you give us a title: "Woman breaks world record in March of 2020". We'll give you a pretty decent article about it. So we're not rewriting what's been written; we're generating content itself.

We're not working with anyone at the moment. When we do, we'll announce it publicly and work with an ethics team to ensure we're only supporting good actors (we know bad actors are on the lookout for services like us).


> we're not rewriting what's been written; we're generating content itself.

I'm not sure I see the distinction.

Where are you getting the facts from? Unless you're creating completely fake content, it seems like it's going to be a rewrite of some sort.


Seems like they're using a version of GPT-2: https://notrealnews.net/about/

I wonder how they're choosing the photo to go with the article.

Most of the photos match really well. I wonder if the results are cherry-picked and a human is matching photos with the articles, or if it's totally automated.


We use Google's Custom Search API.

We're basically taking headlines from RSS feeds (such as Wired, TC, NYT), attaching images through the Google API, and posting them.

You can learn more: https://notrealnews.net/about or https://bigbird.dev or email bigbird@bigbird.dev!
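A rough sketch of the first two steps of that pipeline, assuming an RSS 2.0 feed (Atom feeds use namespaced `entry` elements instead): pull `<item><title>` headlines with the standard library, then build a request URL for Google's Custom Search JSON API image search (`key`, `cx`, `q`, `searchType=image`, `num`). The API key and search engine ID are placeholders, the helper names are hypothetical, and fetching, generation, and posting are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def headlines_from_rss(xml_text: str) -> list[str]:
    """Extract non-empty item titles from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    titles = (item.findtext("title") for item in root.iter("item"))
    return [t.strip() for t in titles if t and t.strip()]


def image_search_url(headline: str, api_key: str, engine_id: str) -> str:
    """Build a Custom Search JSON API request for one image matching a headline."""
    params = {
        "key": api_key,      # placeholder credential
        "cx": engine_id,     # placeholder search engine ID
        "q": headline,
        "searchType": "image",
        "num": 1,            # only the top image is needed per article
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"
```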


Hi, the articles are very readable, so if it's AI writing this, count me impressed. But the explanations on bigbird.dev don't give me a lot of confidence, since there is a lot of focus on Big Bird templating the news and letting a human edit it... I would really like to see transparency about whether something I'm reading is machine-generated (from what source material?) or human-edited (by what human?).

And just because I'm being critical, this line at the bottom of your site claiming 20,000% growth 1 day after launch is... just making me think of how 1 is infinitely more views than 0.


When we do get customers, we'll definitely ensure it's stamped as generated using our engine :). If we actually get somewhere with this (in terms of funding/customers), our first priority is getting in a specialized ethics team on board.

Right now we've heard that for time-sensitive info (like sports), it's easier to generate an article with us and edit from there to get the point across. We're still implementing fact-checking algorithms to ensure quality and credibility (check out WaPo's Heliograf [0]).

And yes, you've caught on! We went from about 50 page views to 10,000 in one day. I'll remove it soon because, as you've explained, it doesn't make sense.

[0]: https://www.washingtonpost.com/pr/wp/2017/09/01/the-washingt...
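The arithmetic behind the growth figure: going from about 50 to 10,000 page views is a 200x multiple, which is a 19,900% increase (close to the "20,000%" on the site), and a percentage increase from a baseline of zero is undefined, which is the "1 is infinitely more than 0" point made above.

```python
def percent_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    if before == 0:
        # Any growth from zero is an infinite percentage: the metric breaks down.
        raise ValueError("percentage growth from a zero baseline is undefined")
    return (after - before) / before * 100


def growth_multiple(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    if before == 0:
        raise ValueError("growth multiple from a zero baseline is undefined")
    return after / before


print(percent_increase(50, 10_000))  # 19900.0
print(growth_multiple(50, 10_000))   # 200.0
```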


>first priority is getting in a specialized ethics team on board

Sean Spicer is available!


Yeah, same with "our product is used by many media outlets" or describing this as your first "customer".

But hey, fake it till you make it


We technically do have a customer although they're not paying for it. Just want to see how they're using it to iterate on the product.

But yeah I do agree I have to take that off... sounds a bit outlandish in retrospect.


This is really impressive for a HS student. Great job!

Another way to augment how the news is written: instead of focusing on templates, you could do something like autocomplete.

Recall that GPT-2 is designed to predict the next word in a sequence. If you did conditional generation on the first part of the sentence, you could autocomplete the rest of the target sentence (offering up to five autocompleted versions with the best scores).
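GPT-2 itself is too large to inline here, so this sketch swaps in a tiny hypothetical bigram table to show the mechanism the comment describes: condition on the typed prefix, extend it word by word, and keep the five highest-scoring completions (by total log-probability) using beam search. With a real model, the table lookup would be replaced by the model's next-token probabilities.

```python
import heapq
import math

# Hypothetical toy "language model": P(next word | previous word).
# "<eos>" marks the end of a completion.
BIGRAMS: dict[str, dict[str, float]] = {
    "the": {"team": 0.5, "game": 0.3, "record": 0.2},
    "team": {"won": 0.6, "lost": 0.4},
    "game": {"ended": 0.7, "began": 0.3},
    "record": {"stands": 1.0},
    "won": {"<eos>": 1.0},
    "lost": {"<eos>": 1.0},
    "ended": {"<eos>": 1.0},
    "began": {"<eos>": 1.0},
    "stands": {"<eos>": 1.0},
}


def beam_autocomplete(
    prefix: list[str], beam_width: int = 5, max_new: int = 5
) -> list[tuple[str, float]]:
    """Extend a non-empty prefix with beam search; return up to beam_width
    (completion text, total log-probability) pairs, best first."""
    beams: list[tuple[float, list[str]]] = [(0.0, prefix)]
    finished: list[tuple[float, list[str]]] = []
    for _ in range(max_new):
        candidates = []
        for score, tokens in beams:
            # Words missing from the table end the completion in this toy model.
            next_probs = BIGRAMS.get(tokens[-1], {"<eos>": 1.0})
            for word, prob in next_probs.items():
                extended = (score + math.log(prob), tokens + [word])
                if word == "<eos>":
                    finished.append(extended)
                else:
                    candidates.append(extended)
        # Keep only the beam_width most probable partial completions.
        beams = heapq.nlargest(beam_width, candidates, key=lambda b: b[0])
        if not beams:
            break
    finished.extend(beams)  # beams still open after max_new steps also count
    best = heapq.nlargest(beam_width, finished, key=lambda b: b[0])
    return [
        (" ".join(w for w in tokens[len(prefix):] if w != "<eos>"), score)
        for score, tokens in best
    ]
```

Scores here are summed log-probabilities with no length normalization, so shorter completions are favored; practical autocomplete usually adds a length penalty.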


Yes, I was thinking of implementing that and probably will soon. Usability-wise, it makes sense (and supports our mission statement), but in practice it means we need a GPU always available (currently we use Colab, so GPUs are scarce and I only have time to generate full articles rather than attend to users' needs).

If this project does go somewhere, that'll be my next feature.


Since you are a student, you should look into Google Cloud credits and use preemptible instances with GPUs.

I did :). And Google didn't give me any (mostly because I'm not an official "researcher" under a professor).

I do use GCP all the time though (make new accounts and utilize the sweet $300 in compute).


Try the GitHub Student Developer Pack's free DigitalOcean hosting credit; it needs a credit card, though.

Inference might be doable on a CPU. Try that first.

Very cool. I was thinking about this space earlier today.

How long does it take for a submission to post to the site? I understand the dashboard is probably an internal tool, but it is quite... opaque. Also, what does "Domain to Publish To" mean? I get why Not Real News is on there, but I don't understand TechCrunch et al.


1. We have a job queue: publish whatever you want, and one of our generator workers will take care of it eventually (we don't have any GPUs, so we're relying on Colab... meaning turnaround time varies a lot... we have to start them up manually).

2. Yeah that's not right. Should read "which author do you want to publish as" (I'll fix this after traffic dies down, don't want downtime :) )

Basically, if you want to read rewritten articles of TC news, you'd go here: https://notrealnews.net/author/techcrunch/ or Wired news: https://notrealnews.net/author/wired.

We want to make that distinction for organizational purposes.


Most fake news is generated by humans and then spread on social media. But the rise of robust systems such as OpenAI’s controversial GPT-2 point toward a future where AI-generated articles are close enough to the real thing to obfuscate nearly any issue.

@OP this looks like an interesting project.

- Can you share what kind of tech stack you used?

- What is the source/basis of the generated news?

- The site looks slick and beautiful and doesn't look like WP; what did you use to make it? (Not related to the project directly, but I liked the UI, so I asked.)


Thanks! I promised some people that I'd write a blog post (and I will soon), but here's the basics:

The article lifecycle goes through job queues. This was because I wanted to learn queues :) and because I'm running this operation on Colab to create the content (so if Colab fails or goes down, I can requeue jobs from my dashboard).
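A minimal in-process sketch of that lifecycle, assuming a job is just a headline and that the generator (standing in for a Colab worker) can fail: failed jobs go back on the queue until a retry limit, then land in a dead-letter list for manual requeueing from a dashboard. The names (`Job`, `run_queue`, `max_attempts`) are hypothetical, it uses a single consumer for clarity, and a real deployment would use a persistent queue rather than `queue.Queue`.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    headline: str
    attempts: int = 0


def run_queue(
    headlines: list[str],
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> tuple[dict[str, str], list[Job]]:
    """Drain the queue; return (published articles, jobs that used up their retries)."""
    jobs: "queue.Queue[Job]" = queue.Queue()
    for headline in headlines:
        jobs.put(Job(headline))
    published: dict[str, str] = {}
    dead_letter: list[Job] = []
    while not jobs.empty():
        job = jobs.get()
        job.attempts += 1
        try:
            published[job.headline] = generate(job.headline)
        except RuntimeError:
            # Worker failed (e.g. a notebook session dropped): retry or park it.
            if job.attempts < max_attempts:
                jobs.put(job)
            else:
                dead_letter.append(job)
    return published, dead_letter
```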

The source is just headlines. I take headlines from RSS feeds and generate both articles and headlines based on those.

I used Ghost. I could've used WP, but I haven't used WP before.


From the New Hampshire article:

> “The money is going to come in,” Mr. Sanders told ABC News. “Our voters are willing to spend two dollars to spend 60 or 70 bucks to go vote. That’s nothing to be afraid of.”


Considering that it is not real news, I suggest you block the articles from appearing in search engines using robots.txt or meta noindex.

robots.txt won't stop links being indexed.
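A small illustration using Python's standard `urllib.robotparser`: a `Disallow: /` rule stops compliant crawlers from fetching pages, but it does not remove URLs that are linked from elsewhere, and a crawler that cannot fetch a page never sees a `noindex` tag on it. To keep pages out of the index, the usual approach is to allow crawling and serve `<meta name="robots" content="noindex">` (or an `X-Robots-Tag: noindex` response header). The `add_noindex` helper is a hypothetical illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that asks every crawler to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

NOINDEX_META = '<meta name="robots" content="noindex">'


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Would a compliant crawler with this user agent fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def add_noindex(page_html: str) -> str:
    """Hypothetical helper: insert the robots noindex tag right after <head>."""
    return page_html.replace("<head>", "<head>" + NOINDEX_META, 1)
```

With `BLOCK_ALL` in place, `crawl_allowed` returns `False` for every page, which is exactly why a `noindex` tag on those pages would go unseen.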

I previously made the website private and plan to do so later this week (there's going to be a password lock).

Otherwise, I'll go into Google Search Console and ensure nothing is indexed.


Reposting this comment just in case it goes down again:

This is an article taken directly from my site:

Title: Will Apple, Facebook or Google Control The AR Momentum?

Article: “I wouldn’t be surprised if one day or another somebody says, ‘This is the future of gaming,'” reports Facebook’s head of virtual reality Omid Kordestani. And Kordestani admitted his past job of building from within the company is atypical for a big CEO (typical of Facebook) who may need to shape the future of a business outside the core team. He may find out how much of the potential of AR is already in its top app Facebook, where developers can shoot 3D models of stuffed animals or preschool objects in front of anyone’s faces, and how deeply Google was disrupted when its failed to move fast enough on VR before Facebook entered it.

But what it’s possible for Facebook and Apple to do is to lead the charge to develop a hardware- and/or software-independent AR ecosystem of platforms and apps. An Android-powered AR headset, which seems like it’s on its way, could eventually compete with Google Cardboard, and maybe provide its own open-source AR platform. Why not have Windows Embedded system OEMs build AR goggles and similar headsets, as they already sell hardware that can run augmented reality apps?

That could attract hardware companies like Dell, Asus, HP and Lenovo, as well as game consoles like Microsoft’s Xbox and Sony’s PlayStation. If Microsoft can muscle AR technology directly into the living room, it could make Oculus Rift obsolete. It could also entice developers away from Google’s $99 Daydream View AR headset which depends on smartphone technology, by throwing its weight behind hardware based on Windows 10. Meanwhile, Apple could cut out the middleman by making a standalone AR headset, although it already has a separate ARKit platform to create AR-themed apps for the iPhone and iPad. That way it could create a high-end AR headset for hardcore users like the kind of top manufacturers like LG, Sony, Samsung and Asus produce for their televisions and other gear.

At the same time, we’ll probably be stuck with Google’s Cardboard and Facebook’s Cardboard for a while. Google Cardboard’s track record is decent, and the ability to use Cardboard makes it easy to download and use a few AR apps that show other content than just pictures or blocks of text. Cardboard also does an excellent job of getting the platform out to more people than any other headset, with 45.8 million devices activated. But that’s less than half the number of Android-powered phones in the world as Android was running in early 2017. And from a hardware perspective, Google has a big advantage over Facebook, with plenty of current smartphones on its Cardboard list, including phones from Chinese manufacturers Huawei, Xiaomi, Oppo, Vivo, and Coolpad.

Then again, Google’s Cardboard headsets have started rolling out to more devices, and Google has been open about the likelihood of an Android-powered Cardboard headset, touting the dual displays and better cameras that now ship with the Pixel. While the hardware definitely isn’t there yet for it to take the lead in AR, Google seems to be confident it can make it happen and give users the compelling AR app experiences they want.

Meanwhile, neither Apple nor Facebook has the fragmentation problems with apps or hardware needed to dominate AR, which could be killer differentiators. Apple, by the way, has super serious ambitions to produce its own AR technology. The growing possibility that Apple is making such a headset may finally make it uncomfortable for Apple to cede this killer platform to Google.

When a platform war erupts like iPhone and Android, it’s going to take everyone’s best devices to achieve victory. There may be companies that switch allegiances, but that would only make it more possible for Apple, Facebook, and Google to enter it just as one platform would. But how long will that last, and will Apple, Facebook or Microsoft be the ones to seize the battlefield?

I don't touch/favor "good" articles. I just posted the latest one at the time.


This is really unethical.

Being up front that these are AI generated and keeping people posted on how far along this technology is, is probably the more ethical thing to do.

Just wait until someone clones the website and removes the domain, then it gets quoted in some tabloid and then that is posted on a forum, then tweeted and finally the tweet is posted as news somewhere.

Additional point: the problem is not even the false stories themselves, it's the undermining of legitimacy in real news that it provokes.


What is "real news"? Anyone with some Wordpress experience and a domain can make their own "news site" and publish "real news".

I don't want to confirm or deny this: Singapore, Russia, China, and other countries are already doing this at a far more malicious and massive scale.

We're at least hoping people will understand what "AI generated news articles" look like, so one day they can spot them and know that what they're reading is not real news.


The "what is real news?", "truth isn't truth!", "it's impossible to know anything for sure, so nothing can be said to be true or false!" arguments are kind of played out these days.

https://www.washingtonpost.com/politics/2018/12/11/truth-isn...

Giuliani: And when you tell me that, you know, he should testify because he’s going to tell the truth and he shouldn’t worry, well that’s so silly because it’s somebody’s version of the truth. Not the truth ...

Todd: Truth is truth. I don’t mean to go like ―

Giuliani: No, it isn’t truth. Truth isn’t truth. The president of the United States says, “I didn’t —”

Todd: Truth isn’t truth? Mr. Mayor, do you realize, what, I, I, I —

Giuliani: No, no, no —

Todd: This is going to be a bad meme.

Giuliani: Don’t do, don’t do this to me.

Todd: Don’t do ‘truth isn’t truth’ to me.


It's one thing to bring up the topic for discussion with some examples, it's another to create an entire site constantly generating bogus stories just to prove a point. I don't find the topic distasteful, I find the exposition irresponsible.
