
Top 100 words in News.YC titles - pg
http://ycombinator.com/newsnews.html?9jan08
======
ivankirigin
I presume you're implementing search?

~~~
pg
Good guess.

~~~
chris_l
Take a look at SOLR: <http://lucene.apache.org/solr/>. It makes indexing very
easy.

------
gojomo
Dr. Seuss wrote 'Green Eggs and Ham' with only 50 words. Can these 100 be
strung together (allowing repetition) into something remotely meaningful and
grammatical?

~~~
imp
It's hard without connecting words. I was able to come up with a few headlines
though:

\- Google launches first big company platform using python where people make
money over hacker life.

\- Lisp application time better vs. javascript, python, ruby.

\- Startup entrepreneur launches better open source ruby tech blog.

*Edit: I only just saw the other readers' headlines, but I don't feel like submitting these as actual stories.

------
arasakik
Interesting. I wonder what the results would look like if the popularity (in
points) of the submissions were incorporated.
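A minimal sketch of that idea, assuming each title comes paired with its point score. The `points` list and the `top_words` function are hypothetical; this is not how the actual list was generated. Each word contributes its title's points instead of 1:

```python
from collections import Counter

def top_words(titles, k=100, points=None):
    """Return the k highest-scoring words across titles.

    Without points, every occurrence counts as 1 (plain frequency).
    With points (hypothetical: one score per title), every word in a
    title contributes that title's score instead.
    """
    totals = Counter()
    for i, title in enumerate(titles):
        weight = 1 if points is None else points[i]
        for word in title.lower().split():
            totals[word] += weight
    return [word for word, _ in totals.most_common(k)]
```

With weighting, a word that appears in a few very popular stories can outrank a word that appears in many stories nobody voted for.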

------
DanielBMarkham
So now that we have the most popular keywords, who can make a title with the
most keywords in it?

<http://news.ycombinator.com/item?id=96660>

~~~
brk
Mine: <http://news.ycombinator.com/item?id=96753>

------
jakewolf
I like the end: "problem platform. next website need computer better."

------
ivankirigin
Note to self: like best hacker make way good (10) ruby application time future

------
danielha
Is that listed by frequency? If news.yc existed in 2000 I wonder what the list
would look like.

~~~
pg
Yes; I changed the page to clarify. Though in fact it would be hard to
generate such a list without it being in order of frequency.

~~~
emfle
Heh, this is algorithm bait.

The obvious algorithm is to sort all the words in order of decreasing
frequency, then print the first 100. This algorithm is O(n log n) because of
the sorting. It generates the list in order of frequency.
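A sketch of the sorting approach in Python, using the standard library's `Counter` (the `top_k_by_sorting` name is just for illustration). Counting is linear in the total number of words; the O(n log n) term comes from sorting the distinct words by decreasing frequency:

```python
from collections import Counter

def top_k_by_sorting(words, k=100):
    """Return the k most frequent words, most frequent first."""
    counts = Counter(words)  # word -> number of occurrences
    # Sort (word, count) pairs by decreasing count: O(n log n) in distinct words.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]
```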

There is an expected O(n) algorithm that first picks some random item as a
pivot, then does a partitioning like Quicksort, where items with higher
frequency than the pivot are moved to the front of the array and those with
lower frequency to the back.

If the first partition has more than 100 items, then the algorithm only has
to recurse into that part. If it has _k_ ≤ 100 items, then it prints everything
in the first partition and recurses into the second to generate the 100 - _k_
best remaining items.

This is expected O( _n_ ) and will not generate the list in order of
frequency.
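A sketch of that selection approach in Python, under a few assumptions of my own: it builds new lists instead of partitioning in place, loops instead of recursing, and splits three ways (higher, equal, lower) so that ties with the pivot are handled cleanly. The `top_k_by_selection` name is hypothetical:

```python
import random
from collections import Counter

def top_k_by_selection(words, k=100):
    """Return the k most frequent words in expected O(n) time, in no particular order."""
    items = list(Counter(words).items())  # (word, count) pairs
    result = []
    while k > 0 and items:
        pivot = random.choice(items)[1]  # random pivot frequency
        higher = [it for it in items if it[1] > pivot]
        equal = [it for it in items if it[1] == pivot]
        lower = [it for it in items if it[1] < pivot]
        if len(higher) > k:
            # Every answer is in the front partition: continue there only.
            items = higher
        else:
            # Keep the whole front partition, fill up with pivot-frequency
            # items, then continue into the back for whatever is still missing.
            result.extend(higher)
            k -= len(higher)
            take = equal[:k]
            result.extend(take)
            k -= len(take)
            items = lower
    return [word for word, _ in result]
```

Each pass discards at least the pivot's partition, and a random pivot shrinks the remaining list geometrically on average, which is where the expected linear bound comes from. The output order depends on the random pivots, so sort it afterwards if ranked output is needed.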

------
rokhayakebe
This is the only crowd where "money" comes almost last.

~~~
brent
Granted, it's ambiguous, but "million" is closer to the top than "money". It's
also easier to say :-).

------
german
YC is on there; I thought I'd find PG too.

------
lst
(Apparently someone forgot to add 'remove #\|').

