
AI Generated News Articles - bigbird-media
https://sdan.io/nrn
======
DonHopkins
SimCity 2000 had a random newspaper article generator that would tell you
stories based on stuff happening in your city.

[https://www.abandonwaredos.com/abandonware-
screenshot.php?gi...](https://www.abandonwaredos.com/abandonware-
screenshot.php?gid=630&idi=YWJhbl9pbWdfc2NyZWVucy9zaW1jaXR5MjAwMC0zLmpwZw==&tit=simcity-2000)

[https://procedural-
generation.tumblr.com/post/134086657418/s...](https://procedural-
generation.tumblr.com/post/134086657418/simcity-2000-newspapers-
simcity-2000-doesnt)

[https://lparchive.org/SimCity-2000/Update%205/](https://lparchive.org/SimCity-2000/Update%205/)

[http://blog.cocoia.com/2007/creative-
interaction/](http://blog.cocoia.com/2007/creative-interaction/)

------
bigbird-media
We're going through the HN hug of death. We're focusing on scaling it (but I'm
a HS student running this on an old desktop my dad gave me on my family's slow
internet, so not sure how far scalability will go for me... going to see if I
can get some cache'd page link here).

~~~
sillysaurusx
Hmm. If you stick it behind cloudflare, you might be able to leverage their
caching. But I’m not sure how that works when the pages are dynamically
generated.

You could also spit out the html to disk and just serve that behind nginx.
Even on slow net, that might help.
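
A minimal sketch of the write-to-disk idea, assuming each article can be
rendered to an HTML string ahead of time. The function names, the
`<out_dir>/<slug>/index.html` layout, and the page template are all
hypothetical, not how notrealnews.net actually works; nginx (or any static
file server) would then serve the output directory directly.

```python
import html
from pathlib import Path


def render_article(title, body):
    """Render one article to a complete HTML page (hypothetical template)."""
    safe_title = html.escape(title)
    safe_body = html.escape(body)
    return (
        "<!doctype html>\n"
        f"<html><head><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1><p>{safe_body}</p></body></html>\n"
    )


def write_static_page(out_dir, slug, page_html):
    """Write a page to <out_dir>/<slug>/index.html and return the final path."""
    page_dir = Path(out_dir) / slug
    page_dir.mkdir(parents=True, exist_ok=True)
    target = page_dir / "index.html"
    # Write to a temporary file first, then rename, so the web server never
    # serves a half-written page.
    tmp = page_dir / "index.tmp"
    tmp.write_text(page_html, encoding="utf-8")
    tmp.replace(target)
    return target
```

The generator would call `write_static_page` once per finished article, so a
traffic spike only costs static file reads.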

Trying to think what else would scale...

~~~
bigbird-media
It's behind Cloudflare :). Images are also behind a CDN I made... which
historically has reduced load times for me (but with this much traffic not so
much).

~~~
sillysaurusx
Oh, that might be why. If you configure cloudflare to serve the images, that
seems like it could help. It’d be quick to test it out, too.
[https://support.cloudflare.com/hc/en-
us/articles/202775670](https://support.cloudflare.com/hc/en-
us/articles/202775670)

~~~
bigbird-media
What I did is: notrealnews.net is behind cloudflare

All images on notrealnews.net are behind a CDN I made at: fullsend.xyz (like
all the images are hosted there and then I utilize Cloudflare to cache them
[0])

[0]:
[https://developers.cloudflare.com/workers/tutorials/configur...](https://developers.cloudflare.com/workers/tutorials/configure-
your-cdn/)

------
FAKEDETECTOR
Question about your product: besides "gaming the system with creating fake or
duplicated + slightly modified content" what is the value of this?

Question about you: did you ever consider using your skills to do something
that will help humanity to solve actual problems we are facing?

Proposal: let this grow as a honeypot and publish a list of your customers
after some while, so we can spot the "quality journalists" \- that will be
fun!

TODO for everybody: integrate fake news detector from sites like this into
ublock-origin. Make them invisible.

~~~
bigbird-media
In a world where disinformation and clickbait journalism is prevalent, we want
to allow content creators to have that same rapid pace (pace of how quickly
they make content) but make sure it’s factual/credible and demonstrates value.

Here's how we do this: Suppose a journalist has X amount of time to create an
article (we're talking about lower-tier/repetitive journalism, which makes up
a majority of journalism). They have two options:

1\. Write some relatively bogus article to drive clicks

2\. We write a majority of their article within seconds and they spend the X
amount of time editing our article/regenerating it. We're planning on
implementing fact-checking algos at the end, so after they're done editing, we
ensure the content is legit.

We don't want to replace journalism. We just want to automate lower tier
journalism (clickbait, repetitive sports articles) and hope they utilize their
time ensuring the content is substantive to their audience.

~~~
ghego1
Makes a lot of sense, I do hope you succeed.

Also, I work in academia; something like this would be very helpful in that
area too.

~~~
bigbird-media
Thanks for the support! You can always email us at bigbird@bigbird.dev if
you'd like to know more.

------
bigmit37
I am new to this space but after reading about your goals for this AI, does
this mean that most news sites are just copying facts off one another?

Secondly how does fact checking work in cases like this with AI? I would
imagine fact checking would get progressively more difficult as more articles
become AI generated.

My biggest fear is, we will have one AI generating a fake story and acting as
the root node, then other AIs creating news off this node, and we will have
fake stories spreading in various forms like branches, creating
misinformation.

~~~
bigbird-media
1\. I'm no journalist, but I'm fairly sure there are some news sites that
break the news first and then others follow (either through primary or
secondary sources)

2\. I haven't implemented fact-checking algos. At the moment I'm not planning
to use AI for it, just simple cross referencing (not finalized).

3\. When we do start momentum, we'll get on board a highly specialized ethics
team and safety team. We're not a state actor or influenced by anyone so we're
just creating gimmick-y articles on the internet for now. We'll eventually
combat those issues if and when similar services pop up.

One reason we want to be first is that we know state actors are going to pop
up soon. Singapore, Russia, Philippines, the list goes on for which they want
something like us to control propaganda. We're hoping we'll be able to gain
enough momentum before they start (or get further along than they already are
at) to set a good solid standard. But with all things, this will take time and
deliberation.

~~~
gdubs
At risk of sounding rude, “we’ll get to the ethics” later doesn’t sound
reassuring.

------
sdan
Seems like they're utilizing a version of GPT-2 :
[https://notrealnews.net/about/](https://notrealnews.net/about/)

~~~
sillysaurusx
I wonder how they're choosing the photo to go with the article.

Most of the photos match really well. I wonder if the results are cherrypicked
and a human is matching photos with the articles, or if it's totally
automated.

~~~
bigbird-media
We use Google's Custom Search API.

We're basically taking headlines off RSS feeds (such as Wired, TC, NYT)
attaching images through the Google API, and posting them.
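
A rough sketch of the image lookup step, assuming the Custom Search JSON API
endpoint with `searchType=image`. The helper names are hypothetical, the API
key and search engine ID are placeholders, and the code only builds the
request URL and reads a response dict; it does not show the site's actual
pipeline.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def image_search_url(headline, api_key, engine_id, num=1):
    """Build a Custom Search request URL that asks for images for a headline."""
    params = {
        "key": api_key,         # placeholder: your API key
        "cx": engine_id,        # placeholder: your search engine ID
        "q": headline,
        "searchType": "image",  # restrict results to images
        "num": num,
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def first_image_link(response):
    """Return the first result's link from a parsed JSON response, or None."""
    items = response.get("items") or []
    return items[0].get("link") if items else None
```

Fetching the URL (with `urllib.request` or similar) and handling quota errors
are left out here.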

You can learn more:
[https://notrealnews.net/about](https://notrealnews.net/about) or
[https://bigbird.dev](https://bigbird.dev) or email bigbird@bigbird.dev!

~~~
jazzyjackson
Hi, the articles are very readable so if it's AI writing this count me
impressed, but the explanations on bigbird.dev don't give me a lot of
confidence, since there is a lot of focus on big bird templating the news and
letting a human edit it... I would really like to see transparency in whether
something I'm reading is machine generated (from what source material?) or
human edited (by what human?)

And just because I'm being critical, this line at the bottom of your site
claiming 20,000% growth 1 day after launch is... just making me think of how 1
is infinitely more views than 0.

~~~
bigbird-media
When we do get customers, we'll definitely ensure it's stamped that it was
generated using our engine :). If we actually get somewhere with this (in
terms of funding/customers), our _first priority is getting in a specialized
ethics team on board_

Right now we've heard that for time-sensitive info (like sports) it's easier
to generate an article from us and edit off of that to get the point across.
We're still implementing fact-checking algorithms to ensure quality and
credibility. (check out Wapo's Heliograf [0])

And yes, you've caught on! We went from like 50 page views to 10,000 in one
day. I'll remove it soon because as you've explained, it doesn't make sense.

[0]: [https://www.washingtonpost.com/pr/wp/2017/09/01/the-
washingt...](https://www.washingtonpost.com/pr/wp/2017/09/01/the-washington-
post-leverages-heliograf-to-cover-high-school-football/)

~~~
DonHopkins
>first priority is getting in a specialized ethics team on board

Sean Spicer is available!

------
DailyHN
Writing vs Curating.

Do we need to rewrite what's already written? How does this prevent bad
actors? Over saturation with "good" content?

What if we curated the most valuable human-written content with AI? So that
humans can build trust with the [AI] entity. I think this type of curation is
how we prevent bad actors.

Writing by AI should be used similarly to improving the quality of images and
video by filling-in-the-gaps. We already have too much "not duplicate" content
if you ask me.

~~~
bigbird-media
If you give us a title: "Woman breaks world record in March of 2020". We'll
give you a pretty decent article about it. So we're not rewriting what's been
written; we're generating content itself.

We're not working with anyone at the moment. When we do we'll announce it
publicly and work with an ethics team to ensure we're only supporting good
actors (we know bad actors are on the look out for services like us).

~~~
DailyHN
> we're not rewriting what's been written; we're generating content itself.

I'm not sure I see the distinction.

Where are you getting the facts from? Unless you're creating completely fake
content, then it seems like it's going to be a re-write of some sort.

------
zitterbewegung
This is really impressive for a HS student, great job!

Another way to augment how the news is written: instead of focusing on
templates, you could also do something like autocomplete.

Recall that GPT-2 is designed to predict the next word in a sequence. If you
did conditional generation on the front part of the sentence then you could
autocomplete the rest of the target sentence. (have up to five autocompleted
versions that have the best score).
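
To make the "best five completions" idea concrete, here is a toy sketch. It
does not use GPT-2: a tiny bigram model trained on a hypothetical corpus
stands in for the language model, and a small beam search keeps the
highest-scoring continuations of a prefix. All names are made up for
illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count which word follows which (toy stand-in for a real language model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def autocomplete(counts, prefix, max_new_words=3, beam_width=5):
    """Return up to beam_width (completion, log_prob) pairs, best first."""
    start = prefix.lower().split()
    if not start:
        return []
    beams = [(start, 0.0)]
    for _ in range(max_new_words):
        candidates = []
        for words, score in beams:
            followers = counts.get(words[-1])
            if not followers:
                # No known continuation: carry the beam forward unchanged.
                candidates.append((words, score))
                continue
            total = sum(followers.values())
            for nxt, n in followers.items():
                candidates.append((words + [nxt], score + math.log(n / total)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(" ".join(words), score) for words, score in beams]
```

With a real model the scoring function would be the model's own token
log-probabilities, and you would probably normalize by length so that short
completions are not unfairly favored.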

~~~
bigbird-media
Yes, I was thinking of implementing that and probably will soon. Usability
wise, it makes sense (and supports our mission statement), but in practice that
means we need a GPU always available to do that (currently using Colab,
meaning GPUs are scarce and I only have time to generate full articles rather
than be able to attend to users' needs).

If this project does go somewhere, that'll be my next feature.

~~~
spicyramen
Since you are a student you should look into Google cloud credits and use
preemptible instances with GPU

~~~
bigbird-media
I did :). And Google didn't give me any (mostly because I'm not an official
"researcher" under a professor).

I do use GCP all the time though (make new accounts and utilize the sweet $300
in compute).

~~~
kalium_xyz
Try github student pack free DO hosting credit, needs a credit card tho.

------
bkanber
Very cool. I was thinking about this space earlier today.

How long does it take for a submission to post to the site? I understand the
dashboard is probably an internal tool, but it is quite.... opaque. Also, what
does "Domain to Publish To" mean? I get why Not Real News is on there but I
don't understand Techcrunch et al.

~~~
bigbird-media
1\. We have a job queue: publish whatever you want and one of our generator
workers will take care of it eventually (we don't have any GPUs so we're
relying on Colab... meaning turnaround time varies a lot... we have to
manually start them up)

2\. Yeah that's not right. Should read "which author do you want to publish
as" (I'll fix this after traffic dies down, don't want downtime :) )

Basically, if you want to read rewritten articles of TC news, you'd go here:
[https://notrealnews.net/author/techcrunch/](https://notrealnews.net/author/techcrunch/)
or Wired news:
[https://notrealnews.net/author/wired](https://notrealnews.net/author/wired).

We want to make that distinction for organizational purposes.

------
techaddict009
@OP this looks like an interesting project.

\- Can you share what kind of tech stack you used?

\- What is the source/basis of the generated news?

\- The site looks slick and beautiful, doesn't look like WP. What have you
used to make it? (Not related to project directly but liked the UI so asked.)

~~~
bigbird-media
Thanks! I promised some people that I'd write a blog post (and I will soon),
but here's the basics:

Article lifecycle goes through job queues. This was because I wanted to learn
queues :) and because I'm running this operation with Colab to create the
content (so in the event Colab fails or goes down, I'll be able to requeue in
my dashboard).
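
A minimal in-memory sketch of that kind of queue, assuming a worker that can
fail mid-job (for example when a Colab session dies). The class, its fields,
and the retry limit are hypothetical; the real dashboard and workers are not
shown anywhere in this thread.

```python
from collections import deque


class ArticleQueue:
    """Toy job queue: failed jobs are requeued until max_attempts is reached."""

    def __init__(self, max_attempts=3):
        self.pending = deque()
        self.done = []
        self.failed = []
        self.max_attempts = max_attempts

    def submit(self, headline):
        self.pending.append({"headline": headline, "attempts": 0})

    def run_once(self, generate):
        """Process one job with generate(headline). Returns False if idle."""
        if not self.pending:
            return False
        job = self.pending.popleft()
        job["attempts"] += 1
        try:
            job["article"] = generate(job["headline"])
        except Exception as exc:
            job["error"] = str(exc)
            if job["attempts"] < self.max_attempts:
                self.pending.append(job)  # requeue at the back
            else:
                self.failed.append(job)   # give up; needs manual attention
        else:
            job.pop("error", None)
            self.done.append(job)
        return True
```

A persistent version would keep the same states (pending, done, failed) in a
database or a broker instead of Python lists, so jobs survive a restart.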

Source is just headlines. I just take headlines from RSS feeds and generate
both articles and headlines based off that.
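
For illustration, pulling headlines out of an RSS 2.0 document can be sketched
with the standard library alone. This assumes a plain RSS 2.0 feed with
`<item><title>` elements (Atom feeds use a different layout and are not
handled), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def headlines_from_rss(xml_text):
    """Return the non-empty <title> text of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title")
        if title and title.strip():
            titles.append(title.strip())
    return titles
```

Each returned headline would then become the prompt for one generation job.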

I used Ghost. I could've used WP, but haven't used WP before.

------
samename
From the New Hampshire article:

> “The money is going to come in,” Mr. Sanders told ABC News. “Our voters are
> willing to spend two dollars to spend 60 or 70 bucks to go vote. That’s
> nothing to be afraid of.”

------
JimWestergren
Considering that it is not real news, I suggest you block the articles from
being published in search engines using robots.txt or meta noindex.

~~~
dazc
robots.txt won't stop links being indexed.
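
The distinction, as I understand it: a robots.txt Disallow only stops
crawling, so a URL that is linked from elsewhere can still show up in an
index, while a noindex directive has to be readable by the crawler to take
effect. A small sketch of the two usual noindex forms, with hypothetical
helper names and assuming the page contains a literal `<head>` tag:

```python
NOINDEX_META = '<meta name="robots" content="noindex">'


def add_noindex_header(headers):
    """Return a copy of the response headers with X-Robots-Tag: noindex set."""
    out = dict(headers)
    out["X-Robots-Tag"] = "noindex"
    return out


def add_noindex_meta(page_html):
    """Insert the robots meta tag right after the first <head>, once."""
    if NOINDEX_META in page_html:
        return page_html
    return page_html.replace("<head>", "<head>" + NOINDEX_META, 1)
```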

~~~
bigbird-media
I previously made the website private and plan to do so later this week
(there's going to be a password lock).

Otherwise, I'll go ahead and go into Google Search Console and ensure nothing
is indexed.

------
bigbird-media
Reposting this comment just in case it goes down again:

This is an article taken directly from my site:

Title: Will Apple, Facebook or Google Control The AR Momentum?

Article: “I wouldn’t be surprised if one day or another somebody says, ‘This
is the future of gaming,'” reports Facebook’s head of virtual reality Omid
Kordestani. And Kordestani admitted his past job of building from within the
company is atypical for a big CEO (typical of Facebook) who may need to shape
the future of a business outside the core team. He may find out how much of
the potential of AR is already in its top app Facebook, where developers can
shoot 3D models of stuffed animals or preschool objects in front of anyone’s
faces, and how deeply Google was disrupted when its failed to move fast enough
on VR before Facebook entered it.

But what it’s possible for Facebook and Apple to do is to lead the charge to
develop a hardware- and/or software-independent AR ecosystem of platforms and
apps. An Android-powered AR headset, which seems like it’s on its way, could
eventually compete with Google Cardboard, and maybe provide its own open-
source AR platform. Why not have Windows Embedded system OEMs build AR goggles
and similar headsets, as they already sell hardware that can run augmented
reality apps?

That could attract hardware companies like Dell, Asus, HP and Lenovo, as well
as game consoles like Microsoft’s Xbox and Sony’s PlayStation. If Microsoft
can muscle AR technology directly into the living room, it could make Oculus
Rift obsolete. It could also entice developers away from Google’s $99 Daydream
View AR headset which depends on smartphone technology, by throwing its weight
behind hardware based on Windows 10. Meanwhile, Apple could cut out the
middleman by making a standalone AR headset, although it already has a
separate ARKit platform to create AR-themed apps for the iPhone and iPad. That
way it could create a high-end AR headset for hardcore users like the kind of
top manufacturers like LG, Sony, Samsung and Asus produce for their
televisions and other gear.

At the same time, we’ll probably be stuck with Google’s Cardboard and
Facebook’s Cardboard for a while. Google Cardboard’s track record is decent,
and the ability to use Cardboard makes it easy to download and use a few AR
apps that show other content than just pictures or blocks of text. Cardboard
also does an excellent job of getting the platform out to more people than any
other headset, with 45.8 million devices activated. But that’s less than half
the number of Android-powered phones in the world as Android was running in
early 2017. And from a hardware perspective, Google has a big advantage over
Facebook, with plenty of current smartphones on its Cardboard list, including
phones from Chinese manufacturers Huawei, Xiaomi, Oppo, Vivo, and Coolpad.

Then again, Google’s Cardboard headsets have started rolling out to more
devices, and Google has been open about the likelihood of an Android-powered
Cardboard headset, touting the dual displays and better cameras that now ship
with the Pixel. While the hardware definitely isn’t there yet for it to take
the lead in AR, Google seems to be confident it can make it happen and give
users the compelling AR app experiences they want.

Meanwhile, neither Apple nor Facebook has the fragmentation problems with apps
or hardware needed to dominate AR, which could be killer differentiators.
Apple, by the way, has super serious ambitions to produce its own AR
technology. The growing possibility that Apple is making such a headset may
finally make it uncomfortable for Apple to cede this killer platform to
Google.

When a platform war erupts like iPhone and Android, it’s going to take
everyone’s best devices to achieve victory. There may be companies that switch
allegiances, but that would only make it more possible for Apple, Facebook,
and Google to enter it just as one platform would. But how long will that
last, and will Apple, Facebook or Microsoft be the ones to seize the
battlefield?

I don't touch/favor "good" articles. I just posted the latest one at the time.

------
radarsat1
This is really unethical.

~~~
ramblerman
Being up front that these are AI generated and keeping people posted on how
far along this technology is, is probably the more ethical thing to do.

~~~
radarsat1
Just wait until someone clones the website and removes the domain, then it
gets quoted in some tabloid and then that is posted on a forum, then tweeted
and finally the tweet is posted as news somewhere.

Additional point: the problem is not even the false stories themselves, it's
the undermining of legitimacy in real news that it provokes.

~~~
bigbird-media
What is "real news"? Anyone with some Wordpress experience and a domain can
make their own "news site" and publish "real news".

I don't want to confirm or deny this: Singapore, Russia, China, and other
countries are already doing this at a far more malicious and massive scale.

We're at least hoping people would understand what "AI generated news
articles" look like, so one day they can spot them and know that the news
they're reading is not real news.

~~~
DonHopkins
The "what is real news?", "truth isn't truth!", "it's impossible to know
anything for sure, so nothing can be said to be true or false!" arguments are
kind of played out these days.

[https://www.washingtonpost.com/politics/2018/12/11/truth-
isn...](https://www.washingtonpost.com/politics/2018/12/11/truth-isnt-truth-
rudy-giulianis-flub-tops-s-quotes-year/)

Giuliani: And when you tell me that, you know, he should testify because he’s
going to tell the truth and he shouldn’t worry, well that’s so silly because
it’s somebody’s version of the truth. Not the truth ...

Todd: Truth is truth. I don’t mean to go like ―

Giuliani: No, it isn’t truth. Truth isn’t truth. The president of the United
States says, “I didn’t —”

Todd: Truth isn’t truth? Mr. Mayor, do you realize, what, I, I, I —

Giuliani: No, no, no —

Todd: This is going to be a bad meme.

Giuliani: Don’t do, don’t do this to me.

Todd: Don’t do ‘truth isn’t truth’ to me.

------
Cartonju
Most fake news is generated by humans and then spread on social media. But the
rise of robust systems such as OpenAI’s controversial GPT-2 points toward a
future where AI-generated articles are close enough to the real thing to
obfuscate nearly any issue.

