

Ask PG: Karma Data Dump - chasingsparks

This might be outside the scope of the Ask HN tag (although it is not really a feature suggestion either), but can you provide an SQL dump of all users' karma (i.e. URL, vote sums, and dates)? I think many people might be curious about its statistical properties.
======
jacquesm
For the life of me I can't find the sql connection in arc, would you help me
find it ;) ?

<http://ycombinator.com/arc/arc3.tar>

Personally I think releasing that data is a bad idea. It might help people to
game the system and it might inadvertently release more than intended, see:

<http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars>

~~~
kogir
Why assume that SQL is involved at all? Hint: it's not.

~~~
chasingsparks
TSV, JSON, or whatever then -- it's the less relevant detail. I just want a
dump and I consider it uncouth to scrape if someone is willing to freely
provide a dump -- or at least, I would prefer permission.

~~~
mnemonik
It is all flat files. Read the arc code, it is very interesting.

~~~
joubert
Is it transactional?

~~~
wlievens
I think it's the write-to-a-new-file-every-time kind of transactional. That
probably works just fine for this kind of application.

~~~
joubert
OK, so if your data model depends on multiple files for persistence, there's
no transactional behavior?
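A minimal Python sketch of the write-to-a-new-file pattern makes the single-file versus multi-file distinction concrete. It is an assumption, not something confirmed in this thread, that Arc's file writing works the same way (write a temporary file, then move it into place). Replacing one file this way is atomic on POSIX filesystems, but two such writes are two separate renames, so a crash between them leaves the files out of sync. The function names and the JSON format are hypothetical.

```python
import json
import os
import tempfile


def atomic_write_json(path, obj):
    """Replace `path` with JSON for `obj`; readers see the old file or the new one, never half of it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def save_vote_pair(user_file, item_file, user_votes, item_votes):
    """Hypothetical two-file update: each write is atomic, the pair is not."""
    atomic_write_json(user_file, user_votes)
    # A crash here leaves user_file updated and item_file stale.
    atomic_write_json(item_file, item_votes)
```

So the answer to the question is roughly "per file, yes; across files, no", unless something like a journal or a lock plus recovery logic is layered on top.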

------
ShabbyDoo
I am interested in the Pareto-ization of up/downvotes. If one recorded all
users who visited each news item and whether or not that user voted at least
once, what would the distribution of voting percentages look like?

In the US, old people are a powerful voting bloc because they vote a lot. Do
a small fraction of HN readers have a highly disproportionate effect on karma?
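Both questions could be put into numbers with a couple of small functions, assuming a per-item log of visitors exists (nothing in this thread says HN keeps one) alongside a list of (user, item) vote events. The helper names and data shapes below are hypothetical.

```python
from collections import Counter


def participation_rates(visitors, voters):
    """Per item, the fraction of visitors who voted at least once.

    visitors and voters both map item id -> set of user names.
    """
    return {
        item: len(users & voters.get(item, set())) / len(users)
        for item, users in visitors.items()
        if users  # skip items with no recorded visitors
    }


def top_share(vote_events, top_fraction=0.2):
    """Share of all votes cast by the most active `top_fraction` of voters."""
    per_user = Counter(user for user, _item in vote_events)
    counts = sorted(per_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total
```

A `top_share` far above `top_fraction` (say, 0.8 of the votes coming from the top 0.2 of voters) would be the disproportionate effect described above; a histogram of the values from `participation_rates` answers the distribution question.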

------
zck
There's no database for HN. Everything is saved as text files. The save-votes
function (line 133 of news.arc), which writes out the hash table of a user's
votes to a file, is as follows:

        (def save-votes (u)
          (save-table (votes* u) (+ votedir* u)))

So the data could be provided as a zip of those files, but not as an SQL dump.
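If the votes directory were handed over as a zip, a short script could flatten it into the single TSV asked for above. This Python sketch assumes each file holds one s-expression that is a list of (key value) pairs, which is how save-table is assumed to lay out a table here; the thread does not say what each value contains, so the value is written as a flat string of its atoms. The parser is deliberately tiny (lists, double-quoted strings, bare atoms) and every function name is hypothetical.

```python
import csv
import os
import re

# One token: "(", ")", a double-quoted string (backslash escapes kept raw), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|("(?:\\.|[^"\\])*")|([^\s()"]+))')


def parse_sexpr(text):
    """Parse exactly one s-expression into nested lists of atom strings."""
    text = text.strip()
    stack = [[]]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"cannot parse at offset {pos}")
        pos = m.end()
        lparen, rparen, string, atom = m.groups()
        if lparen:
            stack.append([])
        elif rparen:
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string:
            stack[-1].append(string[1:-1])
        else:
            stack[-1].append(atom)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]


def flatten(x):
    """Yield the atoms of a nested list in order."""
    if isinstance(x, list):
        for y in x:
            yield from flatten(y)
    else:
        yield x


def dump_votes(votedir, out):
    """Write one TSV row per (user, key, value) found under votedir."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["user", "key", "value"])
    for user in sorted(os.listdir(votedir)):
        path = os.path.join(votedir, user)
        # Skip subdirectories and leftover temporary files from interrupted writes.
        if not os.path.isfile(path) or user.endswith(".tmp"):
            continue
        with open(path) as f:
            pairs = parse_sexpr(f.read())
        if not isinstance(pairs, list):  # e.g. the atom nil for an empty table
            continue
        for key, value in pairs:
            writer.writerow([user, key, " ".join(flatten(value))])
```

Calling `dump_votes(path_to_votes_dir, sys.stdout)` would print the whole dump; one file per user keeps the loop simple and means a malformed file fails loudly rather than silently.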

~~~
kqr2
Interesting. If HN ever decides to go the database route, SQLite might be a
good fit since it's relatively simple and everything is stored in a single
file.
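A sketch of what that could look like with Python's built-in sqlite3 module; the schema is made up for illustration. It also speaks to joubert's transactional question above: the vote row and the item's score change inside one transaction, so a failure rolls back both, unlike two separately renamed files.

```python
import sqlite3


def open_db(path):
    """Open (or create) a single-file SQLite database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS items (
            id    INTEGER PRIMARY KEY,
            score INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS votes (
            username TEXT NOT NULL,
            item     INTEGER NOT NULL REFERENCES items(id),
            dir      TEXT NOT NULL CHECK (dir IN ('up', 'down')),
            ts       INTEGER NOT NULL,
            PRIMARY KEY (username, item)
        );
    """)
    return conn


def record_vote(conn, username, item, direction, ts):
    """Store a vote and adjust the item's score as one transaction."""
    delta = 1 if direction == "up" else -1
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        conn.execute("INSERT OR IGNORE INTO items (id) VALUES (?)", (item,))
        # The (username, item) primary key makes a repeat vote raise sqlite3.IntegrityError.
        conn.execute(
            "INSERT INTO votes (username, item, dir, ts) VALUES (?, ?, ?, ?)",
            (username, item, direction, ts),
        )
        conn.execute("UPDATE items SET score = score + ? WHERE id = ?", (delta, item))


def dump_tsv(conn):
    """Return every vote as TSV lines, the kind of dump the thread asks for."""
    rows = conn.execute("SELECT username, item, dir, ts FROM votes ORDER BY ts")
    return "\n".join("\t".join(str(v) for v in row) for row in rows)
```

With this layout the requested dump becomes one query, and the backup is a copy of one file.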

~~~
benhoyt
PostgreSQL stores everything in flat files too (on Windows, in a directory
like "C:\Program Files\PostgreSQL\8.1\data\base\100929"), and I'm guessing
MySQL would too.

------
deutronium
I think this information would be really cool. Maybe you could cluster people
together by the content of their submissions, to find people whose
submissions you could monitor.
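A simple starting point is a nearest-neighbour lookup rather than full clustering: turn each user's submission titles into a bag-of-words vector and rank other users by cosine similarity. This Python sketch uses titles only, and the input format (user name mapped to a list of titles) is a hypothetical stand-in for whatever a dump would actually contain.

```python
import math
from collections import Counter


def profile(titles):
    """Bag-of-words counts over a user's submission titles (lowercased, split on whitespace)."""
    return Counter(word.lower() for title in titles for word in title.split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(users, name, n=3):
    """Names of the n users whose titles look most like `name`'s, best first."""
    target = profile(users[name])
    scored = [
        (cosine(target, profile(titles)), other)
        for other, titles in users.items()
        if other != name
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by name
    return [other for _score, other in scored[:n]]
```

Real clustering (k-means over TF-IDF vectors, for example) would be the next step, but ranking neighbours is already enough to find people worth monitoring.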

~~~
chasingsparks
That was part of my reasoning. There are many karma algorithms that could be
deployed in different contexts. If the HN readers found some good
alternatives, maybe PG -- or whoever maintains the site's code -- could
integrate it as a user preference. For example, I would prefer to read the
comments of people who have a few very strong comments rather than people who
have many average comments.

(Note: I am completely ignorant about HN's code -- I don't know how they cache
or calculate anything).
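One hypothetical alternative of the kind described above, assuming a user's karma can be broken down into per-comment scores: a plain sum rewards volume, while a Bayesian average (the user's mean score, pulled toward a prior mean when they have few comments) rewards a few strong comments without letting a single lucky one dominate. The prior values are arbitrary placeholders, not tuned.

```python
def total_karma(scores):
    """Plain sum of per-comment scores: rewards volume."""
    return sum(scores)


def bayesian_average(scores, prior_mean=2.0, prior_weight=5):
    """Mean score shrunk toward prior_mean; prior_weight acts like that many phantom comments."""
    return (sum(scores) + prior_mean * prior_weight) / (len(scores) + prior_weight)


few_strong = [40, 35, 30]
many_average = [3] * 50
print(total_karma(few_strong), total_karma(many_average))  # 105 150
print(bayesian_average(few_strong), round(bayesian_average(many_average), 2))  # 14.375 2.91
```

The sum ranks the many-average commenter first; the Bayesian average flips the order, which is the preference stated above.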

------
zkz
By the way, what would happen if all karma was reset?

~~~
blasdel
The front page would be empty until people started upvoting from the _new_
page.

Comments would be ranked purely reverse-chronological until people started
voting.

It'd be less disruptive to the front of the site than another revisiting of
'Erlang sure is interesting'. Past threads would slowly be mined for re-karma
as they were re-submitted and found in Google searches.

------
adrinavarro

        I think there are many people who might be curious about the statistical properties

I am also :)

------
noodle
This would probably be better titled "Ask PG". He's probably the only one who
can [reasonably] help you.

~~~
chasingsparks
edited.

------
yan
I would be one of those people.

