Show HN: 1M Song Dataset dev in 10 mins

deepkut · on Nov 7, 2011

I really think your website should be more indicative of what MD does. I had to look it up elsewhere. Regardless, very cool video.

kky · on Nov 7, 2011

Thank you, I appreciate the feedback.

angryasian · on Nov 7, 2011

I wish the site would write up an explanation, rather than providing a video. Could someone who watched summarize?

simon_weber · on Nov 7, 2011

"Hawk is Heroku for Hadoop: an on-demand, easy-to-use cloud service for big data. With Hawk any company will be able to extract the value from their big data without the large amount of effort and cost that Hadoop otherwise requires."

In the 6-minute example, they load a dataset from S3, then use Pig and Python to process it. You can "illustrate" each step of your code, which pulls out small, relevant samples from the dataset and shows the results.
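
To make the "illustrate" idea concrete, here is a rough Python sketch of running a small sample through each step and keeping the intermediate results. It is only an illustration of the concept, not how Hawk or Pig actually implement it: the `illustrate` function, the step names, and the song records are all made up for this example.

```python
import random


def illustrate(records, steps, sample_size=3, seed=0):
    """Run a small random sample through each step and keep every intermediate result.

    Hypothetical helper: `steps` is a list of (name, function) pairs, where each
    function takes a list of rows and returns a new list of rows.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    trace = [("load", sample)]
    for name, fn in steps:
        sample = fn(sample)
        trace.append((name, sample))
    return trace


# Made-up records standing in for rows of a song dataset.
songs = [
    {"title": "Song A", "artist": "Artist X", "duration": 210.0, "tempo": 120.0},
    {"title": "Song B", "artist": "Artist Y", "duration": 95.0, "tempo": 98.5},
    {"title": "Song C", "artist": "Artist X", "duration": 305.5, "tempo": 140.2},
    {"title": "Song D", "artist": "Artist Z", "duration": 182.3, "tempo": 76.0},
]

# Two example processing steps: drop short tracks, then keep two fields.
steps = [
    ("filter_long", lambda rows: [r for r in rows if r["duration"] > 180]),
    ("project", lambda rows: [(r["artist"], r["tempo"]) for r in rows]),
]

for name, rows in illustrate(songs, steps):
    print(name, rows)
```

Note that this sketch samples naively at random, so a filter can leave a step with no rows to show; picking samples that stay "relevant" at every step is the harder part.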

nashequilibrium · on Nov 7, 2011

How does your offering compare to AWS Elastic MapReduce using Pig?

jacabado · on Nov 7, 2011

Looks amazing, can't wait to get access to it.

I'm starting my thesis on music information retrieval, just studying the related work for now. If anybody has suggestions on directions I could follow, they would be really welcome.

My initial idea would be to focus on playlist generation, taking into account the user's history and usage. So far I've seen a lot of related work exploiting song similarity, some cool work on music mood, and some on assisted playlist building. I'm also not ruling out recommendation or discovery.

kky · on Nov 7, 2011

Thank you! There is a link to request an invite at mortardata.com -- if you'd like access, let us know. We just got a lot of invitation requests from this post, but we'll invite you as soon as we can.

kky · on Nov 7, 2011

And also, if you haven't seen musicmachinery.com, do check it out.

bbq · on Nov 7, 2011

Wow, this is a great resource. Thanks for sharing.

jacabado · on Nov 7, 2011

I was just going through their archives trying to answer my question and found a great discussion on this post:

http://musicmachinery.com/2011/05/14/how-good-is-googles-ins...

There are some insightful comments from names I recognize from my research.

ajdavis · on Nov 7, 2011

Very cool demo, this looks like an amazing tool (and I don't even know much about Hadoop!). One question -- it looks like you're skipping over the time it takes for the "illustrate" function to calculate your results. How long does it take for this million-song dataset?

kky · on Nov 7, 2011

Oh thanks for asking that -- it takes about 30s to illustrate the million song set; we're working to make it faster!

fasouto · on Nov 7, 2011

+10 for the product, seems really nice. But I don't like the main site; it's not very informative.

I'm doing something similar for my master's thesis: a Pig console embedded in JS, with Cassandra support. I expect to release it in mid-January.

res0nat0r · on Nov 7, 2011

This looks pretty awesome. So is this just using Elastic MapReduce on the backend? Can you use your existing AWS credentials for this with a Hawk surcharge on top? This looks like lots of fun to use. Can't wait.

kky · on Nov 7, 2011

We actually built on EC2, not Elastic MapReduce. Invoices come from Mortar only -- that way, when we achieve bulk AWS savings, we can keep our costs lower.

I'm glad it looks awesome, thanks!

latch · on Nov 7, 2011

FWIW, looks great. Took me a while to hit the maximize button on the videos...for a while I thought "man, they really didn't do a good job, I can't read any of the text"

kky · on Nov 7, 2011

Oh, great to know that, thanks!

ajdavis · on Nov 7, 2011

I had the same thought at first -- next time, make sure the text is bold enough to be legible at 320x240. Once I maximized the vid it looked great, of course.

vitalyg · on Nov 7, 2011

Very cool tool. Can the tool be integrated with my existing Hadoop cluster or do I need to transfer all the data to MH?

revertts · on Nov 7, 2011

Do I get any control over node types, or is "number of nodes" the only knob I can turn?

kky · on Nov 7, 2011

Right now, just "number of nodes". But that's definitely something we'll be adding soon.

dennisgorelik · on Nov 7, 2011

How is it different from SQL?