

Show HN: 1M Song Dataset dev in 10 mins - kky
http://mortardata.com/million-song-dataset-in-10-minutes

======
deepkut
I really think your website should be more indicative of what MD does. I had
to look it up elsewhere. Regardless, very cool video.

~~~
kky
Thank you, I appreciate the feedback.

------
angryasian
I wish the site would write up an explanation, rather than providing a video.
Could someone who watched it summarize?

~~~
simon_weber
"Hawk is Heroku for Hadoop: an on-demand, easy-to-use cloud service for big
data. With Hawk any company will be able to extract the value from their big
data without the large amount of effort and cost that Hadoop otherwise
requires."

In the 6-minute example, they load a dataset from S3, then use Pig and Python
to process it. You can "illustrate" each step of your code, which pulls out
small, relevant samples from the dataset and shows the results.

------
jacabado
Looks amazing, can't wait to get access to it.

I'm starting my thesis on music information retrieval, just studying the
related work for now. If anybody has suggestions on directions I could
follow, they would be really welcome.

My initial idea would be to focus on playlist generation that takes into
account the user's history and usage. So far I've seen a lot of related work
exploiting song similarity, some cool work on music mood, and some on assisted
playlist building. I'm also not ruling out recommendation or discovery.

~~~
kky
And also, if you haven't seen musicmachinery.com, do check it out.

~~~
bbq
Wow, this is a great resource. Thanks for sharing.

~~~
jacabado
I was just going through their archives trying to answer my question and found
a great discussion on this post:

[http://musicmachinery.com/2011/05/14/how-good-is-googles-
ins...](http://musicmachinery.com/2011/05/14/how-good-is-googles-instant-mix/)

There are some insightful comments from names I recognize from my research.

------
nashequilibrium
How does your offering compare to AWS Elastic MapReduce using Pig?

------
ajdavis
Very cool demo, this looks like an amazing tool (and I don't even know much
about Hadoop!). One question -- it looks like you're skipping over the time it
takes for the "illustrate" function to calculate your results. How long does
it take for this million-song dataset?

~~~
kky
Oh thanks for asking that -- it takes about 30s to illustrate the million song
set; we're working to make it faster!

------
fasouto
+10 for the product, seems really nice. But I don't like the main site; it's
not very informative.

I'm doing something similar for my master's thesis: a Pig console embedded in
JS, with Cassandra support. I expect to release it in mid-January.

------
res0nat0r
This looks pretty awesome. So is this just using Elastic MapReduce on the
backend? Can you use your existing AWS credentials for this with a Hawk
surcharge on top? This looks like lots of fun to use. Can't wait.

~~~
kky
We actually built on EC2, not Elastic MapReduce. Invoices come from Mortar
only -- that way, when we achieve bulk AWS savings, we can keep our costs
lower.

I'm glad it looks awesome, thanks!

------
latch
FWIW, looks great. Took me a while to hit the maximize button on the
videos...for a while I thought "man, they really didn't do a good job, I can't
read any of the text"

~~~
kky
Oh, great to know that, thanks!

~~~
ajdavis
I had the same thought at first -- next time, make sure the text is bold
enough to be legible at 320x240. Once I maximized the vid it looked great, of
course.

------
vitalyg
Very cool tool. Can the tool be integrated with my existing Hadoop cluster or
do I need to transfer all the data to MH?

------
revertts
Do I get any control over node types, or is "number of nodes" the only knob I
can turn?

~~~
kky
Right now, just "number of nodes". But that's definitely something we'll be
adding soon.

------
dennisgorelik
How is it different from SQL?

