Show HN: 1M Song Dataset dev in 10 mins (mortardata.com)
77 points by kky on Nov 6, 2011 | 22 comments



I really think your website should be more indicative of what MD does. I had to look it up elsewhere. Regardless, very cool video.


Thank you, I appreciate the feedback.


I wish the site would write up an explanation rather than only providing a video. Could someone who watched it summarize?


"Hawk is Heroku for Hadoop: an on-demand, easy-to-use cloud service for big data. With Hawk any company will be able to extract the value from their big data without the large amount of effort and cost that Hadoop otherwise requires."

In the 6 minute example, they load a dataset from S3, then use Pig and Python to process it. You can "illustrate" each step of your code, which pulls out small, relevant samples from the dataset and shows the results.
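For anyone who hasn't seen this style of workflow, here is a rough Python sketch of the "illustrate" idea: run each step of a pipeline on a small sample and keep the intermediate results so you can inspect them. This is not Hawk's or Pig's implementation. The records, field names, and function names are all made up for illustration, and the sampling here is plain random sampling, which is simpler than picking samples that are relevant to each step.

```python
import random

# Hypothetical records; these field names are assumptions for illustration,
# not the actual Million Song Dataset schema.
SONGS = [
    {"title": "Song A", "duration": 215.0, "tempo": 120.0},
    {"title": "Song B", "duration": 95.0, "tempo": 90.0},
    {"title": "Song C", "duration": 300.0, "tempo": 60.0},
    {"title": "Song D", "duration": 180.0, "tempo": 100.0},
]


def load(records):
    """Stand-in for loading data; a real job would read from storage such as S3."""
    return list(records)


def keep_long_songs(records):
    """Filter step: keep songs that run at least three minutes."""
    return [r for r in records if r["duration"] >= 180.0]


def add_beats(records):
    """Transform step: add an estimated beat count (minutes * beats per minute)."""
    return [dict(r, beats=round(r["duration"] / 60.0 * r["tempo"])) for r in records]


def illustrate(records, steps, sample_size=3, seed=0):
    """Run each step on a small random sample and record every intermediate result."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    trace = []
    for step in steps:
        sample = step(sample)
        trace.append((step.__name__, sample))
    return trace


# Show what each step produces for the sampled records.
for name, rows in illustrate(SONGS, [load, keep_long_songs, add_beats]):
    print(name)
    for row in rows:
        print("   ", row)
```

One weakness of random sampling is that a filter step can end up with no rows at all in the sample, which is presumably why a real tool would choose its examples more carefully.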


How does your offering compare to AWS Elastic MapReduce using Pig?


Looks amazing, can't wait to get access to it.

I'm starting my thesis on music information retrieval, just studying the related work for now. If anybody has suggestions on directions I could follow, they would be really welcome.

My initial idea is to focus on playlist generation that takes the user's history and usage into account. So far I've seen a lot of related work exploiting song similarity, some cool work on music mood, and some on assisted playlist building. I'm also not ruling out recommendation or discovery.


Thank you! There is a link to request an invite at mortardata.com -- if you'd like access, let us know. We just got a lot of invitation requests from this post, but we'll invite you as soon as we can.


And also, if you haven't seen musicmachinery.com, do check it out.


Wow, this is a great resource. Thanks for sharing.


I was just going through their archives trying to answer my question and found a great discussion on this post:

http://musicmachinery.com/2011/05/14/how-good-is-googles-ins...

There are some insightful comments from names I recognize from my research.


Very cool demo, this looks like an amazing tool (and I don't even know much about Hadoop!). One question -- it looks like you're skipping over the time it takes for the "illustrate" function to calculate your results. How long does it take for this million-song dataset?


Oh thanks for asking that -- it takes about 30s to illustrate the million song set; we're working to make it faster!


+10 for the product, it seems really nice. But I don't like the main site; it's not very informative.

I'm doing something similar for my master's thesis: a Pig console embedded in JS, with Cassandra support as well. I expect to release it in mid-January.


This looks pretty awesome. So is this just using Elastic MapReduce on the backend? Can you use your existing AWS credentials for this with a Hawk surcharge on top? This looks like lots of fun to use. Can't wait.


We actually built on EC2, not Elastic MapReduce. Invoices come from Mortar only -- that way when we can achieve bulk AWS savings, we can keep our cost lower.

I'm glad it looks awesome, thanks!


FWIW, looks great. Took me a while to hit the maximize button on the videos...for a while I thought "man, they really didn't do a good job, I can't read any of the text"


Oh, great to know that, thanks!


I had the same thought at first -- next time, make sure the text is bold enough to be legible at 320x240. Once I maximized the vid it looked great, of course.


Very cool tool. Can the tool be integrated with my existing Hadoop cluster or do I need to transfer all the data to MH?


Do I get any control over node types, or is "number of nodes" the only knob I can turn?


Right now, just "number of nodes". But that's definitely something we'll be adding soon.


How is it different from SQL?



