
How Long Does a YC Hiring Link Stay On the Front Page? - kiyoto
http://kiyototamura.tumblr.com/post/86982326972/how-long-does-a-yc-hiring-link-stay-on-the-front-page
======
gojomo
Looks like the original ranking formula as announced at...

[http://ycombinator.com/newsnews.html#12may11](http://ycombinator.com/newsnews.html#12may11)

...was tweaked a bit: now starting at #6 rather than #4; now descending one
position every 8 minutes rather than every 15 minutes.

(I'm surprised they don't stay for a full day; that would seem a reasonable
way to reach anyone who visits at least once daily.)
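A toy back-of-the-envelope helper (my own sketch, not from the post or the formula announcement) for how long a link stays on the front page under those numbers, assuming a 30-slot front page and that the link only moves via the timed descent (real links also shift as other stories rank around them):

```python
# Hypothetical helper: minutes until a link that starts at `start` and drops
# one slot every `step_minutes` falls off a `page_size`-slot front page.
def minutes_on_front_page(start, step_minutes, page_size=30):
    # It leaves on the descent that carries it past the last slot.
    return (page_size - start + 1) * step_minutes

# Tweaked formula (start #6, 8-minute steps):
print(minutes_on_front_page(6, 8))    # 200
# Original formula (start #4, 15-minute steps):
print(minutes_on_front_page(4, 15))   # 405
```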

~~~
NKCSS
I'm pretty sure companies would pay handsomely to be featured for 150 minutes
on the HN front page :) It's huge exposure, so I wouldn't complain :)

~~~
kiyoto
I agree. We hired our frontend engineer through HN. Admittedly it was not a
dedicated link like the ones YC companies get to post, but one of the monthly
job board threads =)

------
zopf
Cool analysis. I wonder if you could show something like a LOESS curve fitted
across all the articles' time series? Or, if they're all roughly linear
descents, I wonder if you could show the distribution of slopes - do some
descend faster than others? Why?

And then, a bone to pick:

Need a beefy RDBMS for 15mm rows? Maybe if you want to store the whole
denormalized table in memory, but if you're just indexing a small field (or
even partial-indexing a larger field) you should have no problem. The table
will simply spill to disk and page in as necessary, and since you're mostly
appending, the write load stays light too. Plus, you could normalize the data:
store the (large) article title once in an Articles table with an id (hash of
the title?) and then store just the ranks in a Ranks table, for _less_ overall
storage than the NoSQL database (thus needing a less-beefy machine).
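A minimal sketch of that normalization, using SQLite purely for illustration (any RDBMS works the same way); the table and column names are my own, following the Articles/Ranks split described above:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id    TEXT PRIMARY KEY,       -- hash of the title, as suggested
    title TEXT NOT NULL
);
CREATE TABLE ranks (
    article_id TEXT NOT NULL REFERENCES articles(id),
    ts         INTEGER NOT NULL,  -- unix timestamp of the sample
    rank       INTEGER NOT NULL
);
-- a small index on the join key keeps lookups fast as rows spill to disk
CREATE INDEX idx_ranks_article ON ranks(article_id);
""")

def article_id(title):
    return hashlib.sha1(title.encode()).hexdigest()

title = "How Long Does a YC Hiring Link Stay On the Front Page?"
aid = article_id(title)
conn.execute("INSERT OR IGNORE INTO articles VALUES (?, ?)", (aid, title))
# Five samples, 8 minutes apart, descending one slot each time.
conn.executemany("INSERT INTO ranks VALUES (?, ?, ?)",
                 [(aid, 1400000000 + 480 * i, 6 + i) for i in range(5)])

rows = conn.execute("""
    SELECT a.title, r.ts, r.rank
    FROM ranks r JOIN articles a ON a.id = r.article_id
    ORDER BY r.ts
""").fetchall()
print(len(rows))  # 5
```

The point is that the large title string is stored once, while each rank observation is just a hash, a timestamp, and a small integer.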

Nothing against modern Not-only-SQL solutions or document stores, but don't
discount RDBMS. Schemas aren't so scary or unwieldy that you should never use
them.

Anyway, thanks for an informative post!

~~~
kiyoto
>Need a beefy RDBMS for 15mm rows? Maybe if you want to store the whole
denormalized table in memory, but if you're just indexing a small field (or
even partial-indexing a larger field) you should have no problem.

Good point. Honestly, I don't have much experience using row-based RDBMSes
for analytics (my background is mostly in finance, where folks use expensive
proprietary columnar databases, and in Hadoop). Any good resources on testing
the limits of MySQL/PostgreSQL for analytics?

~~~
zopf
I've spoken to friends who've played with billion+ row Oracle RDBMS installs,
and we (at Next Big Sound) have an offline snapshot MySQL instance with tables
of up to about a hundred million rows (with over a hundred columns).

That said, I agree that distributed columnar stores end up being much more
useful for large-scale analytics, and the power of high computation
parallelism seals the deal. We've mostly moved on from those snapshot MySQL
databases to Impala running on top of our Hadoop cluster, so you're preaching
to the choir :)

Still, a hell of a lot of analytics can be done in a properly-structured
SQL database, and schema changes aren't a big deal as long as you don't need
to do them online in a production system.

More info: [http://stackoverflow.com/questions/14733462/can-mysql-handle-tables-which-will-hold-about-300-million-records](http://stackoverflow.com/questions/14733462/can-mysql-handle-tables-which-will-hold-about-300-million-records)

~~~
kiyoto
Thanks a lot!

Yea, I felt like a total n00b when I came to the web startup world a few years
ago. It sounds ridiculous, but the one and only database I had used until that
point was kdb+ (kx.com). I had no idea about the performance tradeoffs of any
other databases.

I agree with you that properly-structured SQL databases can scale
horizontally/vertically. That said, I've noticed that the set of people who
know SQL performance well and the set of data analysts/statistically inclined
folks do not overlap much (myself included), and frankly, data analysts should
be able to focus on analysis, not SQL optimization.

In a way, this is the problem Impala (and other MPP databases) solves at many
companies: it's not that their data analysis cannot be handled with
MySQL/Oracle, but it's cheaper and quicker to throw all the data in HDFS and
query via Impala (sans some cost associated with setting up/maintaining
Impala).

------
jacquesm
This reads like an extremely well disguised ad.

~~~
pearjuice
[https://en.wikipedia.org/wiki/Advertorial](https://en.wikipedia.org/wiki/Advertorial)

------
brickmort
This was a very cool read. Nice job!

