
Introducing the 4chan API - nthitz
http://www.4chan.org/news#108
======
afhof
I feel kinda silly just having spent the long weekend writing a threaded 4chan
scraper. This is a super welcome change though. Even if you don't visit 4chan
regularly you can't ignore the VAST amount of content people upload there. I
imagine some interesting statistics will come of this (I know I plan to).

~~~
phogster
I don't know. Getting an API to 4chan is like getting a parking pass to the
local garbage dump. What statistics are you actually mining here?

~~~
afhof
4chan represents a social cauldron unlike others. The first and foremost
question I had hoped to find out was "What do people talk about when they
don't have an identity to think of?" There is almost zero consequence in the
case of failure. If you say something and no one responds, or everyone insults
you, it is completely forgotten within a few minutes. You can say almost
anything you want without fear of retribution. (Where else in life does this
exist?) So, with no rules, what do people want to talk about?

~~~
daulex
I understand what you're saying.

BUT, be careful with the concept of 4chan and "no retribution". There have
been countless examples of 4chan "reacting" to threads, I'm sure I don't need
to go into detail here.

------
calinet6
The first thing that jumps to mind with this is... "Oh $#!*." I don't know
why, but it scares me what could come out of this.

~~~
54mf
Came to post this. The only redeemable value in 4chan, in my opinion, is that
the fact that posts _aren't_ archived makes for a very interesting social
experiment. An API firehose pretty much puts an end to that.

~~~
ANTSANTS
Selective archives (users vote to archive a thread) have been around since at
least 2007:

<http://4chanarchive.org/>

<http://suptg.thisisnotatrueending.com/archive.html>

Full board archives have been around for quite some time as well:

<http://easymodo.net/> (the original complete archiver, now dead)

<http://archive.foolz.us/>

<https://archive.installgentoo.net/>

etc.

I'd say the fact that your posts are most likely to be forgotten, even if it
is archived, is much more of a negative aspect of the site than a positive.
How many times have I spent 30 minutes on a post, only for no one to respond
to it, or worse, realize that the thread 404'd? It makes you look at yourself
and wonder why you bothered.

Forced anonymity is the interesting part of imageboards -- the text BBS
equivalents to anonymous imageboards, based off the original 2chan, manage to
maintain a very similar flavor while featuring permanent archival of all
posts, and enjoy longer-form discussion as a result.

~~~
moot
> I'd say the fact that your posts are most likely to be forgotten, even if it
> is archived, is much more of a negative aspect of the site than a positive.

This is the most magical aspect of 4chan, which is why I don't care for
archives.

~~~
chimeracoder
> This is the most magical aspect of 4chan, which is why I don't care for
> archives.

The written word allows us to lend ideas (memes, concepts, what have you) a
sense of permanence that they never would have had otherwise. But at the same
time, it prevents them from evolving in a way they otherwise might have, if
their exact origins were not so easily recorded & referenced.

I don't think it's a coincidence that 4chan, which lacks this permanence, is
the origin of so many of the top memes of the past decade (and by 'meme', I
don't just mean things like LOLcats).

(Gleick argues this same point in the first few chapters of The Information,
for those who are interested).

EDIT: Just realized who I was replying to - if I may ask, are you concerned at
all that an official API might detract from 4chan (by making said content more
traceable)?

~~~
pessimizer
> The written word allows us to lend ideas (memes, concepts, what have you) a
> sense of permanence that they never would have had otherwise.

I agree, and that's why it was horrible for 4chan. In the beginning, there
were new memes, concepts, and what have you every other day, and old stuff was
forgotten (or rather used to show you'd been there for a while.) Now, it's
just a constant recycling of the first few years of the site.

IMHO it was caused by archives and meme dictionaries. No need to lurk moar
anymoar. Also, very little reason to laugh.

------
Jagat
I'm a new graduate student in an American university. As part of my Data
Mining/NLP project, I'm wondering if I can do something cool with this fresh
API. Any ideas?

~~~
nyan_sandwich
create a markov chain 4chan slang generator.

track usages of phrases over time. (thinking of the recent evolution of
"rustled my jimmies" derivatives)

See what topics are trending

Fuck maybe I should build some of this...
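
A minimal sketch of the first idea (the Markov chain generator), assuming you already have post texts as plain strings. The function names, the `order` parameter, and the sample posts are all hypothetical, and nothing here touches the API itself:

```python
import random
from collections import defaultdict


def build_chain(posts, order=1):
    """Map each `order`-word prefix to the list of words seen after it."""
    chain = defaultdict(list)
    for text in posts:
        words = text.split()
        for i in range(len(words) - order):
            prefix = tuple(words[i:i + order])
            chain[prefix].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Walk the chain from a random prefix, emitting up to `length` words."""
    rng = random.Random(seed)
    if not chain:
        return ""
    prefix = rng.choice(sorted(chain))
    out = list(prefix)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(prefix):]))
        if not followers:
            break  # dead end: no word ever followed this prefix
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    # Placeholder posts, not real data.
    sample_posts = [
        "that really rustled my jimmies",
        "my jimmies remain unrustled",
    ]
    print(generate(build_chain(sample_posts), length=8))
```

Repeated followers are kept in the list on purpose, so more common continuations are picked more often. A larger `order` should give more coherent but less varied output.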

~~~
kami8845
do you have some more ideas? these are really good

~~~
nyan_sandwich
um... make a 4chan app that lets users up/downvote threads, then builds a
naive bayesian model of what keywords (eg. "toasting epic bread") are
correlated with the kind of threads you like. Netflix-like. A sort of
automatic cream-extractor.

de-anonymizer based on posting times, writing style, what baits them to
respond, etc. The thread-local unique IDs would help, giving you more stuff
that you knew came from the same user. Don't know if this one is practical. Kind of
scraping the bottom here...

That's it for now.
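
For the up/downvote idea, a rough sketch of a naive Bayes scorer over thread text, assuming votes arrive as `(text, liked)` pairs. The function names and the whitespace tokenization are assumptions made for illustration, not anything the API provides:

```python
import math
from collections import Counter


def train(threads):
    """threads: iterable of (text, liked) pairs; returns per-class counts."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, liked in threads:
        class_counts[liked] += 1
        word_counts[liked].update(text.lower().split())
    return word_counts, class_counts


def log_odds_liked(text, word_counts, class_counts):
    """Log-odds that a thread is liked, with add-one (Laplace) smoothing."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    totals = {c: sum(word_counts[c].values()) for c in (True, False)}
    # Prior: smoothed ratio of liked to disliked threads.
    score = math.log((class_counts[True] + 1) / (class_counts[False] + 1))
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen in training
        p_like = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_dislike = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_like / p_dislike)
    return score
```

A positive score leans toward "liked" and a negative one toward "disliked". Sorting threads by this score would be one way to get the cream-extractor behaviour.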

------
bootload
_"... The decision to release an API was partially out of necessity, but also
because I'm curious to see how people will use it. ..."_

And _who_. The API just made a group of intelligence hackers very happy
indeed.

------
xefer
It still requires scraping to discover the thread ids though, does it not?

~~~
moot
We'll have indexes and a catalog view soon.

~~~
aneisf
That's great news. No hope for a posting API I presume?

~~~
moot
Probably not. Since we don't have user accounts, and already have a bit of a
spam problem, I'd be pretty worried about what a real POST API might bring.

~~~
aneisf
I figured as much. Just wishful thinking on my part--it'd be nice to be able
to build a client wholly atop the API.

~~~
nyan_sandwich
It's not like the html form does anything special. Forms don't have funny
markup you have to scrape or produce. It's just a POST.

I guess there's mime/url encoding or whatever, but that's hardly an issue.

~~~
toomanymike
There's a captcha

------
tarice
What I'm taking away from the comments below is:

"Everything that could be done with this API has already been done using HTML
parsing. This development will simply make those applications faster."

Truth?

~~~
lnanek2
Yeah, and there have been Python scripts anyone interested passes around and
shares too, so you haven't even had to write it yourself...

------
ddod
Could someone explain to me how this could be leveraged (or if it could be) to
gather a sort of stream of messages, a la the Twitter streaming API or
reddit.com/r/all/comments.json?

I'd be interested in doing some language statistics and comparing them to the
aforementioned networks.

~~~
zevyoura
Elsewhere in the comments here, moot said, "We'll have indexes and a catalog
view soon." So for now, you need the thread id.

------
terhechte
Sadly read-only, though it's not much work parsing the HTML and faking a
submit through a Post request. Good luck submitting a 4chan app to Apple's app
store though :)

~~~
Lockyy
They have already existed. They all got pulled recently.

------
volaski
Forgive me if this is a noob question, but does 4chan restrict embedding of
images.4chan.org images from external urls? I was just playing around with the
API and it seems all the images are rendered as the placeholder image that
says "4chan.org".

If this is true, I don't know how to utilize this API to make something
valuable since all I can do is get the url or text. Somebody please enlighten
me. Thanks!

~~~
lnanek2
This sort of protection is usually done by checking the referrer header, which
is trivial to set when retrieving something programmatically or when using
standard tools like wget. The API seems focused on reducing the processing
costs of browser extensions that let the user view the page, but add extra
features to the page, anyway. Those would probably still seem like a normal
browser view of the image to the site by default even if browser plugins can't
perform the trivial client sent header change (not sure if the browser plugin
API exposes it).
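
A small sketch of setting that header programmatically with Python's standard library. Whether 4chan actually checks it is a guess on my part, and the image path and referring page URL passed in are placeholders the caller has to supply:

```python
import urllib.request


def image_request(image_url, referer):
    """Build a GET request that sends a caller-chosen Referer header."""
    return urllib.request.Request(image_url, headers={"Referer": referer})


def fetch_image(image_url, referer):
    """Download the image bytes; needs network access."""
    with urllib.request.urlopen(image_request(image_url, referer)) as resp:
        return resp.read()
```

Note that the HTTP header name is spelled `Referer`, with a single "r" in the middle.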

------
astrojams
This could transform 4chan as mobile and desktop clients are created. God I
hate the web interface.

~~~
unkoman
The mobile adapted web interface is pretty good now.

------
3143
Can anyone repost the info for those of us who can't (or prefer not to) visit
4chan at work?

~~~
jonny_eh
Every thread will be available as JSON.

------
evolve2k
Regarding financial sustainability, have you thought about charging for the new
API?

~~~
joethompson
I'm sure this is about trying to improve site performance, and charging for it
would inevitably cause everyone to continue scraping the HTML, thus defeating
the point.

------
jasimq
This is interesting. Would surely give it a try and integrate with our app

------
angersock
Something I just cobbled together:

    
    
    curl http://api.4chan.org/b/res/423418552.json | python -mjson.tool
    curl http://api.4chan.org/b/res/423418552.json | json_pp

Example for grabbing a thread and prettyprinting the JSON of it.

Because, you know, we need more 4chan in the house.

(EDIT: brief skimming of the comments indicates it may be semi-offensive, so
be warned. We're skimming /b/, after all.)
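
Roughly the same thing in pure Python, as a sketch. The URL shape is copied from the curl lines above; the helper names are made up, and the thread id will 404 once the thread is gone:

```python
import json
import urllib.request

API_ROOT = "http://api.4chan.org"  # host taken from the curl example above


def thread_url(board, thread_id):
    """Build a thread URL in the same shape as the curl example."""
    return f"{API_ROOT}/{board}/res/{thread_id}.json"


def pretty(raw_json):
    """Re-indent a JSON document, similar to piping through json.tool."""
    return json.dumps(json.loads(raw_json), indent=4)


def fetch_thread(board, thread_id):
    """Download one thread as text; needs network access."""
    with urllib.request.urlopen(thread_url(board, thread_id)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(pretty(fetch_thread("b", 423418552)))
```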

