

Scala Data Pipelines for Music Recommendations at Spotify - flying_whale
http://www.slideshare.net/MrChrisJohnson/scala-data-pipelines-for-music-recommendations

======
ajones
Maybe it's just me, but it seems that several data engineering organizations
are picking up Scala. Is there a reason for this outside of Slide 13 in the
presentation that I am missing?

My team plans to move away from Scala and towards Python, primarily because
of the job market. There is a feeling that it will be significantly easier to
find a good data engineer who uses Python than to find an equally good one
who uses Scala.

~~~
lmm
There are certainly some cool big-data tools in Scala (personally I find Spark
much more compelling than Scalding, but the fact that I'm even comparing two
reasonable choices is more than you'd get with many languages), but I don't
think that's what's driving adoption. My case for Scala would be that it's a
great language to write and an even better one to read: even more expressive
than Python but much safer to refactor, a type system powerful enough to
encode most of the business-level constraints and easy enough that you'll
actually use it, and it's easy to understand which parts of the code do what
because the type system makes it possible to isolate and track effects in a
reliable way. In all the companies I've seen adopting Scala it's been a
bottom-up thing, less because it offers some particular killer feature than
because developers want it and it makes them more productive.
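
A minimal sketch of what "encoding business-level constraints" in types can look like. The domain types here (`UserId`, `TrackId`, `Rating`) are hypothetical and not taken from the presentation, and the sketch does not cover effect tracking:

```scala
object TypedIdsSketch {
  // Hypothetical domain types: wrapping the raw Long means a user id
  // cannot be passed where a track id is expected.
  final case class UserId(value: Long)
  final case class TrackId(value: Long)

  // A rating that can only be constructed with 1 to 5 stars.
  final class Rating private (val stars: Int)
  object Rating {
    def from(stars: Int): Option[Rating] =
      if (stars >= 1 && stars <= 5) Some(new Rating(stars)) else None
  }

  // Swapping the first two arguments is a compile error, not a runtime bug.
  def record(user: UserId, track: TrackId, rating: Rating): String =
    s"user ${user.value} rated track ${track.value}: ${rating.stars}"
}
```

Any function that receives a `Rating` can rely on the range check having already happened, which is the refactoring safety being described.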

Hiring for experience with a specific language is of course easier with more
popular languages. But unless you're hiring emergency consultants or
something, it's more important to hire smart people who'll be able to learn
whatever language comes up. And I suspect that while it'll be very easy to
find someone who "knows Python", the talent level is... variable. Whereas
people who know Scala, though harder to find, will be the kind of people who
learn multiple languages, who enjoy learning new skills even when there's no
obvious market for them - i.e. the kind of people who you want to hire.

~~~
stoplight
> and an even better one to read...and it's easy to understand which parts of
> the code do what

It's not always easy to understand (from Odersky himself,
https://gist.github.com/odersky/6b7c0eb4731058803dfd#file-foldingviews-scala-L333):

    def toVector: Vector[B] = fold(Vector[B]())(_ :+ _)(_ ++ _)

To a Scala veteran, I'm sure that's easy to understand; to someone who's been
learning the language (like me) it looks like gibberish.

I've also had to write code like this:

    client.post(args).mapTo[Response].map(r =>
      (r.success, r.serverException, r.unhandledException) match {
        case (Some(response), None, None) => response
        case (None, Some(serverEx), None) => throw serverEx
        case (None, None, Some(unhandledEx)) => throw unhandledEx
      })

simply because of the underlying api that other co-workers have built. When it
takes you longer than 30 seconds to explain how a piece of code works to
others, something is wrong.

All that being said, I do like the language; just not the compile times. It's
also pretty close to Ruby (with which I'm most familiar):

Ruby:

    numbers = [1, 2, 3, 4, 5]
    numbers.select { |n| n >= 4 }
    # [4, 5]

Scala:

    val numbers = List(1, 2, 3, 4, 5)
    numbers.filter(n => n >= 4)
    // List(4, 5)

~~~
lmm
> def toVector: Vector[B] = fold(Vector[B]())(_ :+ _)(_ ++ _)

> To a Scala veteran, I'm sure that's easy to understand; to someone who's
> been learning the language (like me) it looks like gibberish.

Compare what the same code would look like in Ruby; something like (hope I get
the syntax right):

    
    
        def toVector = fold(new Vector()) {|x, y| x :+ y} {|x, y| x ++ y}
    

Are the extra |x, y|s actually clarifying anything? Or are they just syntactic
ceremony? Maybe it's just my Scala experience talking, but I think the Scala
example is clearer; there's very little boilerplate to get in the way, just
the meat of what the function's actually doing.

(You can think that's a good or bad function to have, but that's a library
question, not a language question).
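
For readers new to the placeholder syntax, here is a sketch of the quoted line with every `_` spelled out. The `fold` helper below is a hypothetical stand-in written for this example; the signature of the `fold` in the gist is an assumption and may differ:

```scala
object FoldSketch {
  // Hypothetical stand-in for the gist's `fold`: a start value, a step that
  // adds one element to a partial result, and a merge of two partial results.
  def fold[A, R](xs: Seq[A])(zero: R)(step: (R, A) => R)(merge: (R, R) => R): R = {
    // Fold each half separately, then merge, to show where `merge` comes in.
    val (left, right) = xs.splitAt(xs.length / 2)
    merge(left.foldLeft(zero)(step), right.foldLeft(zero)(step))
  }

  // Placeholder form, as in the quoted line.
  def toVectorTerse[A](xs: Seq[A]): Vector[A] =
    fold(xs)(Vector[A]())(_ :+ _)(_ ++ _)

  // The same function with named, typed parameters.
  def toVectorExplicit[A](xs: Seq[A]): Vector[A] =
    fold(xs)(Vector[A]())(
      (acc: Vector[A], elem: A) => acc :+ elem // append one element
    )(
      (l: Vector[A], r: Vector[A]) => l ++ r   // concatenate two partial vectors
    )
}
```

Each `_` stands for the next parameter in order, so `_ :+ _` is `(acc, elem) => acc :+ elem` and `_ ++ _` is `(l, r) => l ++ r`.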

> simply because of the underlying api that other co-workers have built.

You can write bad APIs in any language; with a more sensible one that would
look like:

    
    
        client.post(args).map{
          case Success(response) => response
          case ServerError(ex) => throw ex
          case UnhandledError(ex) => throw ex
        }
    

Explicitly handling the different cases is exactly the kind of debugging
advantage I was talking about; it takes up-front effort to distinguish between
ServerError and UnhandledError, but the result is code where you can see all
the possibilities and know exactly where any given failure might be happening.
(And again, the language doesn't force you to do this; you _can_ just write
the happy path and allow any kind of exception to happen at any point. But
you'll pay the price in debugging, as I have in Python, and as I presume you
have in Ruby).
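
A sketch of how such a result type could be declared so the compiler checks the cases. The names `PostResult`, `Success`, `ServerError`, and `UnhandledError` are illustrative assumptions, not part of any real client library:

```scala
object ResultSketch {
  // Hypothetical result type. `sealed` means all subtypes live in this file,
  // so the compiler can warn when a match leaves one out.
  sealed trait PostResult
  final case class Success(response: String) extends PostResult
  final case class ServerError(ex: Exception) extends PostResult
  final case class UnhandledError(ex: Exception) extends PostResult

  // Every possible outcome is visible in one place.
  def describe(result: PostResult): String = result match {
    case Success(response)  => s"ok: $response"
    case ServerError(ex)    => s"server error: ${ex.getMessage}"
    case UnhandledError(ex) => s"unhandled error: ${ex.getMessage}"
  }
}
```

Deleting one of the `case` lines in `describe` produces a non-exhaustive match warning at compile time, which is the up-front effort paying off.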

------
mortenjorck
All this work with matrix factorization and alternating least squares to find
undiscovered gems a user might actually enjoy, and then they torpedo it with a
simplistic and utterly useless "popular in your area" algorithm.

------
BhavdeepSethi
> Things to worry about? - Be patient with the compiler.

This should be number one!

