Hacker News new | comments | show | ask | jobs | submit login

really? for what value of incredibly? i bet it's way faster than writing all that code.

[edit: to save people reading below i missed the sum requirement, which this wouldn't do, and a rough back of the envelope calculations suggests it would take about the same length of time as the code took. so i was wrong, sorry...]




I didn't have 70 GB of disk space free (two small SSDs in my laptop), but I just started running it on my old machine and will let you know the results. The grammar, parser, and traversal functions took me somewhere between two and three hours in total to think up and write; my gut feeling is that writing and sorting 70 GB of data on a spinning disk would be slower, though I may be wrong.

Another issue with using sort that you still need the sum of the numbers. You could store this as "wordnum;realnum", but now you've got over 100 GB of data to sort, and you still need to parse this out when you do your single pass after to get to the correct byte count. You could spend some time optimizing the sorting format (again using tokens), but now you're spending time doing work that's getting you even closer to an actual solution in code.

The coded solution is also scalable, although you could argue the necessity of scaling in this context. If a company were to introduce this puzzle looking for some byte in the first trillion numbers sorted (instead of billion), you'd be looking to sort several terabytes of data on disk; the constant memory solution, once tweaked to include the billions case in the same way the millions case is handled, would take roughly 15 minutes to run.


ah, sorry - i had missed the sum of numbers. also, after reading the sort man page, for fast sorting you may need to write out the data in a set of files (each of which fits in memory), sort each separately (well, sort before writing), and then use sort --merge to join them.

i'm impressed at how quickly you wrote the code. i guess it would have been better to say "i bet it would be quicker for me...".

[edit] actually, you can probably work it out. say you get 10MB/s to a disc. you need to write 10 files (each 7GB) which takes 700s or about 10 minutes each. so it's about 100 minutes for writing them (sorted in memory first) and then about 100 minutes reading them in the merge (i'm thinking you can filter the output to find the answer without writing again). so you'd expect around 3 hours, or about the time the code took to write.

[edit 2 to avoid yet more posts] thanks + good luck with the job application (i did one of their questions a year or two back and, while it was really interesting, all i got as a reply was "we've finished hiring this year"... although in their defense - and perhaps like you - i was doing it more for fun anyway)]


Upvoted you for the interesting discussion.

After I saw your first comment, I started writing out all billion numbers. It appears to be getting exponentially slower, which I suppose is to be expected as the file gets too big to fit in available contiguous blocks (there was only 100 GB free before it started).

It's been running for about an hour and is only half done, although writing 7 GB files would definitely be much faster; I don't know if sort --merge creates one big file at the end or not, but there are ways around that since it would be the bottleneck. It does sound like it could be done with sort in a semi-reasonable amount of time, although there'd still be more work to do once it's sorted.

In terms of writing the code itself, the parser is pretty similar to a toy computer algebra system I had written in the past (it's much simpler), so I had some pre-disposition to it in that sense, and the traversal is fairly straight forward given the grammar. I chose a problem that I thought I could solve in an afternoon because I knew I had real work to do the following Monday and still had to write the blog post, and Parkinson's Law may have helped me push through it a bit too. :)

Edit to reply to edit: This puzzle is actually retired from ITA's fleet, and I'm not interested in applying to ITA/Google anyway. I'm happily employed, and actually this is sort of the reverse: I'm looking to hire remote software devs at BuySellAds.com and was hoping this would pique the interest of the qualified segment of devs looking for a job.


My thoughts are this problem is complex regardless. It's pretty good that you could solve this all in one afternoon.

Are there any blogs, programming books, that you recommend that would help, someone improve their programming skills?

Thanks


I subscribe to the school of thought that you learn by doing, so my best advice is to put yourself out there and write some code. I've read lots of blogs, and a small handful of programming books (K&R, parts of SICP, parts of CLRS, Programming Pearls), but at each step I took it upon myself to actually do what the author was doing. Reading code or watching lectures isn't going to make you a better programmer any more than watching tennis will make you a better tennis player. You presumably already know the rules, you just need to practice to get to the next level.

Try writing anything that sounds fun. Write your own JSON parser, or a trie, or a B+ tree, or an implementation of the travelling salesman or knapsack problems. Wikipedia will get you started on all of these, and from there you can write the code. Too often people promote the idea that "Well, JSON libraries exist, why write your own?", but that misses the point. For one, it's fun. Professional tennis players exist too, and I am definitely never going to be as good as they are, but that doesn't mean it's not worth playing tennis if I enjoy doing so. More importantly, though, if you've written your own JSON parser, then the next time a new protocol or file format comes out and your language of choice isn't officially supported, you'll be able to write your own and be ahead of the game. All of these skills are transferable.

An added bonus is that when you go to apply for programming jobs, your portfolio of fun side projects will speak volumes about your ability to code. Don't worry that you don't have the best, most efficient, most popular JSON parser. The important part is that you spent your personal time bettering yourself, and that you are interested in being great at what you do.


Print "e"

is even faster, but also missed the point.


well, before i realised i had missed the sum part (which i suspect was added to foil lazy people like me) i would have argued that perhaps they are looking for someone who knows enough about the tools already available to find the fastest solution.

not that i have anything against the approaches here - the dynamic programming approach over trees is really cool and i probably wouldn't have thought of it myself. i just thought it amusing that there seemed to be a simpler way.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: