Scala Best Practices (github.com)
83 points by century19 on Oct 31, 2014 | 82 comments



Scala affords imperative, place-based, von Neumann-style programming by design, because it is intended to ease the pain of working Java programmers, not create more. See Odersky's Scala by Example [1].

He follows "show, don't tell" [2] to meet people where they are. Odersky recognises that marginally better is still better; Scala was designed pragmatically.

To an outsider, lists like this make it appear that entering Scala's language community is hard. There are just so many mores. [3] This is unfortunate because via Odersky's Coursera class, Scala has one of the best onboarding processes among contemporary languages.

[1] http://www.scala-lang.org/docu/files/ScalaByExample.pdf

[2] https://en.m.wikipedia.org/wiki/Show,_don%27t_tell

[3] http://learncodethehardway.org/blog/AUG_19_2012.html


Hello @brudgers,

The intent of this list isn't to scare people or to make them follow my conventions.

When working within larger teams, regardless of language, you need a list of do's and don'ts. And powerful or unfamiliar languages are especially problematic for rookies - because they need to make choices (e.g. what abstraction should I use to model this or that and before anybody mentions Python's philosophy - as an ex-Python developer I must say that it has led to horrible non-orthogonal choices).

I do not believe in imposed choices - mimicking what other people are doing without understanding the reasoning behind it can lead to really awful hairballs, as anybody who has worked within an enterprise-ish team can attest. For example, in this rule I'm trying to undo the damage done by another "best practice" that people are mindlessly following - https://github.com/alexandru/scala-best-practices/blob/maste...

Also, many rules in that list are not necessarily related to Scala - does a language protect you from horrible, convoluted, oversized functions that do too many things? Of course not. Does a language keep you from storing dates without timezone info? Of course not.

It's a work in progress though, feedback appreciated.


I know you are, as Hanselman says, "helping people fall into the pit of success". I guess the question is, can this list really be highly opinionated and canonical and remain practically useful?

The criticism of "best practices" lists is that everything becomes a "core principle". By which I am suggesting that "best practices for Scala" encompasses both Scala and everything in programming. Everything except, unfortunately, the prime directive: get something working first. Which in the moment often means ignoring best practices and just writing code.

There are places where Java date/time is appropriate because it is both sufficient and the convention...and because the ways in which Joda-Time can accommodate obscurities derived from esoteric abstractions are manifest. Joda-Time is a best practice for people who know it well and who can reasonably predict that future maintenance will be done by qualified persons, i.e. people who know Java well, and not necessarily Scala.

Being a library on top of Java, Scala inherits best practices from it. Acknowledging these as such and handling them with a pointer will keep the list DRY...or rather DRO [don't repeat others].

I am thinking that in programming, the form in which best practices are embodied is not a list, because lists evolve into wikis as a corollary to Greenspun's 10th rule [1]. I think best practices are better implemented as frameworks, because frameworks can be highly opinionated without being open for debate.


I agree, this is why I placed: Rule 0.1 - MUST NOT follow advice blindly (and I meant it)

> get something working first

Yes and no. Getting something working first introduces technical debt. If the developers are responsible enough to go back later and fix it, it's OK, but in larger teams (over 3 people) this tends not to happen; then you're left with shitty components that become a liability, for which the reasonable thing to do would be to throw them away and restart from scratch - which also never happens, because management doesn't agree.

In the context of a startup or personal projects, of course, things get shipped fast, mistakes are made, components are fixed later or rewritten, etc. But some things are easy to do once you get used to them (e.g. immutable data structures) and some choices are more natural to make once you know what you're doing (e.g. Actors vs Future vs Rx).

This is what I'm trying to pass on to colleagues - do stuff, but don't be superficial about it, because there's nothing worse than slapping an "if" in there that fixes one bug and causes 3 others. As in, if you need to make compromises, make them, but at least know why you did it and have a plan for fixing it.


None of those things are Scala specific. Nor are they solved by reading a list...even reading Code Complete is just a starting point. There's still a big gap to synthesis and execution.

Philosophically, isn't a list of best practices a form of technical debt? I mean all the burden of tooling it up has been kicked down the road. Then again, building an entire framework might be premature optimization.


Effective refactoring should actually be easier with a bigger team; one reason, amongst others, is that you have more hands on deck for code reviews.


Um...It's "Odersky"


I'm a big fan of the guide that Twitter put together titled Effective Scala: http://twitter.github.io/effectivescala/

For those who want to learn Scala, Twitter has also put together a set of lessons called Scala School: https://twitter.github.io/scala_school/


Useful resource, but as often happens with lists of best practices, I do not agree with some things Twitter is giving advice on.

For example - https://twitter.github.io/effectivescala/#Control structures-Returns

They are saying that "return" can be used for "reducing nesting". First of all, that's a lie - the presence of a return doesn't mean that the branch no longer exists, only that it is no longer explicit. Far more effective for reducing nesting would be to break those functions into smaller ones. I happen to believe that usage of "return" in Scala is never appropriate and that it was a design mistake; at the very least they abstained from adding "break" or "continue". Scala is an expression-oriented language and really, if you're going to write Scala as if you were writing Java, then just pick Java, because Scala, contrary to other people's opinions, is not a better Java, in the sense that it actively works against you if you're doing things like that.
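To illustrate with a made-up example (hypothetical names, not taken from their guide):

  case class User(isLoggedIn: Boolean, isPremium: Boolean)
  val basePrice = 100; val loyaltyDiscount = 10; val premiumDiscount = 20

  // "reduced nesting" via return - the branches are merely hidden:
  def priceWithReturn(user: User): Int = {
    if (!user.isLoggedIn) return basePrice
    if (!user.isPremium) return basePrice - loyaltyDiscount
    basePrice - premiumDiscount
  }

  // expression-oriented version - same branches, now explicit:
  def price(user: User): Int =
    if (!user.isLoggedIn) basePrice
    else if (!user.isPremium) basePrice - loyaltyDiscount
    else basePrice - premiumDiscount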

I tried outlining the rationale in this rule, which I consider to be non-optional (hence the usage of MUST, instead of SHOULD :)) - https://github.com/alexandru/scala-best-practices/blob/maste...


Hello, I'm the author of this list - I compiled it primarily for my colleagues, but hopefully it will help others.

It isn't complete yet and I have more stuff to add and some stuff still needs clarification, like Rule 4.3 (just created an issue for it).

If you have suggestions, feedback, things you disagree with, I'm happy to make edits or to accept issues and pull requests :-)


1.2 I disagree with your stance (and your misuse of "NP-complete" in justifying it is a pet peeve; it is a technical term that has a specific meaning). IMO manually maintained formatting is not worth the cost; being able to run an autoformatter whenever code is unclear is a huge benefit.

2.4 I would caveat that useless traits are useless in application code, but make sense in libraries published for use by other teams.

2.5 Disagree. The general contract of equals and hashCode does not demand that they be immutable; there are a wide variety of mutable Java classes that offer similar equals and hashCode. Other aspects of the case class sugar can be very useful even for a mutable type. The correct solution is to avoid equals and hashCode in general (they're broken by default anyway); use something like scalaz's Equal, and don't define Equal instances for mutable types.
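Roughly the shape I mean (a simplified sketch of the type-class approach, not Scalaz's exact API):

  trait Equal[A] {
    def equal(x: A, y: A): Boolean
  }

  def areEqual[A](x: A, y: A)(implicit E: Equal[A]): Boolean =
    E.equal(x, y)

  final case class Point(x: Int, y: Int)

  // an instance only for the immutable type; mutable types simply
  // never get an instance, so comparing them fails to compile
  implicit val pointEqual: Equal[Point] = new Equal[Point] {
    def equal(a: Point, b: Point) = a.x == b.x && a.y == b.y
  }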

2.7 You should explain a bit about how to achieve the same results using e.g. scalaz.\/ and for/yield.
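Something like this, say (a rough sketch from memory against Scalaz 7's right-biased disjunction; the validation rules are made up):

  import scalaz.{\/, -\/, \/-}

  def parseAge(s: String): String \/ Int =
    try \/-(s.toInt)
    catch { case _: NumberFormatException => -\/("not a number: " + s) }

  def checkAdult(age: Int): String \/ Int =
    if (age >= 18) \/-(age) else -\/("too young")

  // the first -\/ short-circuits the comprehension - no nesting needed
  val result: String \/ Int = for {
    age <- parseAge("42")
    ok  <- checkAdult(age)
  } yield ok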

3.1 I find the arguments here very weak.

3.4 Stream has many problems and Iterable is hard to reason about. I think the suggestion to use lazy collections is bad advice. Temporary intermediate values ought to be caught by escape analysis and not cause GC pressure at all. I have never seen hard evidence of such values causing a problem in a real-world program. 3.3 applies.

4.2 It's worth mentioning the advantages of scalaz Future (more sensible handling of ExecutionContexts, also theoretically higher performance because of having fewer task boundaries).

Also I would emphasise that an actor with no mutable state is almost always a mistake (it should usually be replaced with use of Future or similar), as this is a common anti-pattern I see in newcomers' code.
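The degenerate case and its replacement look roughly like this (a sketch, names made up):

  import scala.concurrent.Future
  import scala.concurrent.ExecutionContext.Implicits.global
  import java.security.MessageDigest

  // an actor whose receive is just
  //   case Hash(data) => sender ! hexDigest(data)
  // holds no state and should usually be a plain function instead:
  def hexDigest(data: Array[Byte]): Future[String] =
    Future {
      MessageDigest.getInstance("SHA-256")
        .digest(data)
        .map("%02x".format(_))
        .mkString
    }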

4.3 It's reasonable to put actually heavy computation on dedicated threads (so that I/O threads can pre-empt compute threads). So there are times when this does make sense. You would certainly expect an explicit ExecutionContext in that case though.
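i.e. something like this (a sketch):

  import java.util.concurrent.Executors
  import scala.concurrent.{ExecutionContext, Future}

  // a dedicated pool for CPU-bound work, kept separate from the
  // I/O pool, and passed explicitly rather than imported implicitly
  val computeEC: ExecutionContext = ExecutionContext.fromExecutor(
    Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors))

  def heavySum(xs: Vector[Double]): Future[Double] =
    Future(xs.map(math.sqrt).sum)(computeEC)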

5.2 Disagree. become is harder to reason about than ordinary mutable state (var and friends), as your example demonstrates. Actors encapsulate mutable state, that's their raison d'etre, if you're using them at all you've already given up on purity or referential transparency.


1.2. I do not agree, and I tried justifying that with samples. If you want to use auto-formatters, that's fine, but I find it disrespectful to reformat the code of other people who know what they are doing.

2.4. You missed my point. I didn't say traits are useless, but that people shouldn't define traits that are useless.

2.5. That rule includes a reference to a paper from 1993. And yes, the contract of equals and hashCode is broken. People should strive for sane equality, not a sane Object.equals, because there is no such thing. If that's not enough to convince you, syntactic sugar notwithstanding, we'll have to agree to disagree - but to be honest, I refuse to work with code violating such a fundamental rule.

2.7. Thanks for the feedback, will do.

3.1. Point taken, will add more details.

3.4. What problems does Stream have? Also, for me, those operators make much more sense in the context of lazy collections - that Scala's library does such a shitty job on lazy collections is a pet peeve of mine. On "escape analysis", I hope you're not relying on it, because it doesn't work - the algorithm implemented by the JVM is very limited.

On evidence: I have seen it, countless times. But you know, search for the "mechanical sympathy" Google group and ask there. Rule 3.3 applies to doing optimizations, but that doesn't mean one has a free pass to do stupid shit, ignoring CS 101. I personally don't see a conflict there.

4.2. I'm not familiar with Scalaz' Future. Will take a look.

On actors with no mutable state, I agree, but I also see another problem: simple actors with mutable state could often be modeled without state as well, as Futures (or similar). I find it tough to explain to people when Futures should be used instead of Actors; I find it tough to make that choice myself. That's why this rule doesn't try to impose something specific - its purpose is to make people think about the tradeoffs and not use something blindly.

4.3. This rule already has an issue started by me, as a reminder to myself that I did a shitty job - https://github.com/alexandru/scala-best-practices/issues/8

5.2. I think you misunderstood that rule. I'm not saying that actors should not mutate their state; I'm saying that it is better to control the scope of those variables by means of context.become in combination with pure functions. You know, the Erlang way.

Also, actors are really NOT about mutating state willy-nilly; sorry, but that misses the forest for the trees. Actors are about bidirectional asynchronous communication, non-determinism and state machines. And as always, just because mutation happens doesn't mean you can't limit its effects. Doing so is really, really advisable on the JVM, in the presence of Futures that can capture the actor's internal state, and so on and so forth.


1.2 That's just, like, your opinion man. It's a reasonable position but it's not one where there's a community consensus and does not belong in a best practices guide.

2.4 No, you missed my point. Useless traits like the one in your example can be reasonable in the context of a library that needs to be able to evolve while maintaining backwards compatibility. And I think this distinction is worth drawing explicitly, because otherwise people will default to doing the "safe" thing everywhere, like they did with JavaBean getters/setters (which are a sensible if tedious thing for Java libraries to use, but pure wastefulness in application code).

2.5 Honestly if you're working with JVM libraries, you're already working with a lot of code that violates that "fundamental rule". And if you're not working with JVM libraries, why would you bother using Scala rather than Haskell? :P.

4.2/actors with no mutable state: It's a guideline that has plenty of false negatives - it doesn't capture all the cases where someone writes an Actor that ought to just be a Future or similar - but I don't think it has many false positives, and it's just such a common mistake that for me it's worth sharing explicitly.

5.2 I'm prepared to be convinced, but what you've written there doesn't convince me. I don't see any difference between the two examples in terms of being "pure, immutable or referentially transparent", and to my eyes the accidental complexity of "become" (a new concept) is higher than that of ordinary mutable variables. Telling me that that way is "the Erlang way" or to "just wait until you'll have to model a state machine with 10 states in it and dozens of possible transitions and effects to go along with it" sounds like bluster, tbh. Maybe it just needs better examples.


Just as a matter of style, I would put a comment like // don't do this or // bad practice in the negative examples, so that they stand out better.



Hi bad_user,

I saw this on Twitter, and just by scanning it quickly I learned some things, so that's why I posted it here. Thanks for sharing.

Can I ask about rule 5.2. SHOULD mutate state in actors only with "context.become"?

How would you use become to avoid using vars to keep state in an Actor like this?

  class Account(var account: AccountDetails) extends Actor {
    def receive = {
      case msg: UpdateAccount =>
        account = msg.account
    }
  }


Actors are good at encapsulating state, therefore using mutable variables inside an actor is very natural, and one might even question whether an actor is needed if it does not have any mutable state inside it.

However, 5.2 does not concern the elimination of vars, but rather the better management of multiple states (finite state machines).

Your example, just like the example in 5.2, only has a single state, so context.become isn't strictly necessary. The author makes clear that when you have multiple states, the collection of top-level mutable vars becomes hard to maintain. I have had this problem a few times and I am glad it gets mentioned in the guide.

I agree though, that a better example would include at least two states.


I'll add a better example, thanks for the feedback.


As @wernerb already pointed out, the idea is to use context.become such that, inside your actor, the only side effects happening are due to context.become; otherwise the data structures are left immutable, the helper functions are left pure, etc. It becomes easier (less error-prone) to manage complex actors that way.

Another side effect of this, for example, is that you no longer have to worry about vars leaking into Future computations - a mistake that even experienced developers make, because it's an easy one to make.

Here's your sample translated into a state-machine with 2 states: https://gist.github.com/alexandru/63eebe8e49e796f31e73
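For the thread, it boils down to roughly this (a condensed sketch; the AccountDetails payload is hypothetical):

  import akka.actor.Actor

  case class AccountDetails(balance: BigDecimal) // hypothetical payload
  case class UpdateAccount(account: AccountDetails)

  class Account extends Actor {
    def receive = inactive // state 1: nothing received yet

    def inactive: Receive = {
      case UpdateAccount(details) =>
        context.become(active(details))
    }

    // state 2: details live in an immutable parameter, not a var
    def active(details: AccountDetails): Receive = {
      case UpdateAccount(newDetails) =>
        context.become(active(newDetails))
    }
  }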


OK, I think I get it. Will have to try out the example.

Thanks for the explanation and the code.


Hi,

Quick comment on item 2.3: SHOULD NOT update a var using loops or conditions.

You conclude that this expression is the best option:

  val sum = elements.map(_.value).sum

This expression is not equivalent to the previous options in performance and memory, as the map call will create an entire new sequence.

For large sequences, a better option is to write:

  val sum = elements.view.map(_.value).sum


Just as C++11 looks worlds different (and better) than C++ circa 1990, I imagine that the same dynamic will play out with Scala, which I tend to analogize as Java++. Certain idioms will develop; certain language features will be deprecated but, by the nature of the platform, will not be able to be removed; &c.

I find some idioms used by the more experienced Scala devs on my team to be distasteful: I hate hate hate the semantic confusion between the Option type and sequences that leads people to map over Options (or the dual use of isEmpty/nonEmpty). I would always prefer pattern matching, or getOrElse, for instance.

I think I'm just a fan of smaller languages; I certainly understand -- and commend -- Dr Odersky's approach to a pragmatic extension of a (bad) industry language, just as I appreciated Stroustrup's; but given my druthers, I'd just as soon be working in something perhaps less pragmatic but certainly smaller and less warty.

If anybody knows what language that is, please tell me. Or invent it.


> I hate hate hate the semantic confusion between the Option type and sequences

Option is a Monad in Scala and could also be modeled as an Applicative Functor - very generic abstractions that transcend sequences. Don't let the naming scare you; you can view them as design patterns, it just so happens that people aren't familiar with them.

In Scala, mapping, filtering and flatMap-ing are generic; for example, Scala's for-comprehensions work on all things implementing map, filter and flatMap - including things that are not sequences - and for good reason.

And most importantly - `Option` would be completely useless without mapping, filtering or flatMap-ing on it, because often you have to combine multiple options. In practice, without an easy-to-use API, people would just call "Option.get" and trade a NullPointerException for a NoSuchElementException, which would defeat the purpose of using Option.
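For example, combining two optional values without ever touching "get" (made-up values):

  val width: Option[Int]  = Some(3)
  val height: Option[Int] = Some(4)

  val area: Option[Int] = for {
    w <- width
    h <- height
  } yield w * h
  // Some(12) here; None if either value were missing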

The semantic confusion between an Option and a sequence is the same as the confusion between a sequence and a Set - there is none, and if you see one, then you need to go back to the basics. First, because Option really is a collection of zero or one elements, and second, because all of them are Monads.

> I imagine that the same dynamic will play out with Scala, which I tend to analogize as Java++

I don't know what makes people say things like this. Scala is nothing like Java - and I have worked with C++ and with Perl too, in production. Scala has some warts in it, but such comparisons are unwarranted.


Thank you, this is very helpful. I tend to think of the Option/Maybe type as signaling to the reader a SQL-like 3VL value, where the distinction between having a thing and not having a thing is not contingent on the thing-in-itself. Clearly, I need to lift my level of thinking up at least one level of abstraction.

> I don't know what makes people say things like this.

It's a social and not a technical analogy (although I think the technical analogy is stronger than you apparently do). C++ took an established industry language and opened the door to a whole new level of abstraction; just as you can write C in C++ (missing the whole point), you can write Java in Scala (again, missing the point). C++'s affordances to prevent this were weaker than Scala's, which means that the acceptance of Scala's idiomatic way will probably happen much faster than that of C++'s did.

It wasn't intended to be a derogatory analogy; I admire modern C++, even as I am happy not to be writing it.


> `Option` would be completely useless without mapping, filtering or flatMap-ing on it.

This position is a bit extreme. For pedestrian use, it's fine to use pattern matching or isDefined.


> Just as C++11 looks worlds different (and better) than C++ circa 1990, I imagine that the same dynamic will play out with Scala, which I tend to analogize as Java++. Certain idiom will develop, certain language features will be deprecated but by nature of the platform will not be able to be removed; &c.

I think popularity (particularly with the enterprise) ossifies a language; it's too late to fix certain mistakes. Certainly this is the case for Scala already. But don't underestimate the difference between 5 years of cruft and 25 years of cruft.

> I find some idiom used by the more experienced Scala devs on my team to be distasteful: I hate hate hate the semantic confusion between the Option type and sequences that leads people to map over Options

That's not confusion but enlightenment. Options aren't Sequences, but both are Monads.

> I think I'm just a fan of smaller languages; I certainly understand -- and commend -- Dr Odersky's approach to a pragmatic extension of a (bad) industry language, just as I appreciated Stoustroup's; but given my druthers, I'd just as soon be working in something perhaps less pragmatic but certainly smaller and less warty.

You might like Idris - it looks a lot like a slimmed-down, more principled Scala with the benefits of lessons learned. Of course the price for that is that it's less mature and has a much smaller community.


You might prefer Kotlin?[0] It's going in the same direction as Scala, but not quite so far.

By the way, I'm curious what you mean by "the semantic confusion between the Option type and sequences that leads people to map over Options". Mapping and 'flatMapping' are very general things in Scala, and apply to many types that are not collections, like Futures / Observables / etc.

[0] http://kotlinlang.org/


It's unfamiliarity with the idiom that makes me uncomfortable, to be perfectly frank. I'll get used to it.

I won't get used to how alien and ultimately weak Scala's support for algebraic data types is, and that's not changing. This comes down to the affordances that Odersky and the Scala team have decided to favor, which is OO flavored in a way that I don't like. I vastly prefer the ML style:

  data Foo = Bar s | Baz
to the sealed class/case nightmare of Scala. Again, this is not a technical nit -- this is an idiomatic and personal one.
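For reference, the Scala spelling of that same type (assuming s is a String):

  sealed trait Foo
  case class Bar(s: String) extends Foo
  case object Baz extends Foo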


Yeah, but the cool thing about Scala is that you get to pick the kind of polymorphism you want, depending on whether it's the list of Nouns or the list of Verbs that will evolve.

Like, those case classes are also classes that can have polymorphic methods on them. So for example, you can do this: https://gist.github.com/alexandru/2604948978c497adc8e2 - instead of doing this: https://gist.github.com/alexandru/41acda9bdd694bec38b2

Which variant is better depends on the direction of change; both versions have tradeoffs. Also, Scala's OOP blend with FP is best in breed. In OCaml, for example, it is like having 2 type systems in the same language, whereas in Scala it's more of a turtles-all-the-way-down kind of thing, which I like.
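To make the tradeoff concrete, here's a tiny made-up Shape example in the same spirit as those gists:

  sealed trait Shape {
    def area: Double // OO-style verb: adding a new Shape is easy,
  }                  // adding a new verb touches every class

  case class Circle(r: Double) extends Shape {
    def area = math.Pi * r * r
  }
  case class Square(side: Double) extends Shape {
    def area = side * side
  }

  // FP-style verb: adding a new verb is easy,
  // adding a new Shape touches every match
  def perimeter(s: Shape): Double = s match {
    case Circle(r)    => 2 * math.Pi * r
    case Square(side) => 4 * side
  }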


> Also, Scala's OOP blend with FP is best in breed. In OCaml, for example, it is like having 2 type systems in the same language, whereas in Scala it's more of a turtles-all-the-way-down kind of thing, which I like.

This is a perfectly reasonable stance to take; I think my Objective-C background would have me argue that it makes more sense to keep two distinct paradigms syntactically distinct, but that's a matter of taste. I haven't done anything serious with OCaml -- perhaps that's a direction I'll explore after my Haskell toy project is finished.

Thanks for the replies, you've shed much light. And the OP is great, too; I'll definitely be keeping it close at hand.


TBH, I think that is a legitimate technical nit. Case classes end up being very useful, but they're not as principled as the ADTs that they're generally used to replace, and that shows up in a bunch of weird corner cases. It's one of the prices you pay for trying to cram this stuff into the JVM / OOP universe.

Speaking to your sibling comment on Haskell and Kotlin: I also happen to think that Haskell is by far the better-designed language. I think Kotlin does a nice job of cleaning up some of the warts of Java, but I don't really want a better Java -- I just want a better language. Scala is definitely less elegant than Haskell (Haskell/Scalaz-style FP is particularly gross) but there's a large class of problems for which I really do think it's the best language / ecosystem that happens to actually exist.


I complain, but at the end of the day, I really do prefer working in the JVM with a language that can e.g. curry functions and implement at least partial type inference, given the constraints that we work under. I think it was a good choice; but I'm not likely to grab for Scala for fun.


Honestly Kotlin has 80% of the complexity of Scala, 20% of the functionality, almost no library ecosystem and 0 stable releases. I don't know why so many people consider it a serious choice.

If you want Java with type inference, you can use Xtend - it has had many stable releases and it has excellent tool support.


I think I'd be more interested in Haskell-- than Kotlin; I am not a fan of statically dispatched OO language implementations. I am (perhaps unsurprisingly, given my employer) much more a fan of Smalltalk derived languages when writing in an OO style.


Definitely try Haskell, you'll like it. But do note it's completely idiomatic in Haskell to map over Maybe (aka Option in Scala), which was your initial objection in this subthread :)


This is a case of PEBCAK, not the language design. I'll get used to it.


Seems like an awful lot of rules to hold in one's head. Scala just looks like a giant tumor of complexity to me.


What other people find complex, I personally find refreshing. Could you please describe why it looks like a giant tumor of complexity?

There's a difference between true complexity (i.e. things that complect, or that lead to accidental bugs) and things just being unfamiliar. I try to stay away from the former, while not shying away from the latter - since I picked up Scala about 3 years ago, my knowledge has expanded with a lot of useful abstractions that help me do my job better, abstractions that weren't natural or popular or properly implemented in the other languages I worked with. This isn't to say that Scala doesn't have true complexity in it, but then again, what language doesn't?


a) Martin Odersky has admitted as much[1].

b) The compiler internals are reportedly a mess[2]--likely due at least in part to the extreme number of different syntax variants.

c) I find rule 4.1 in OP interesting ("SHOULD avoid concurrency like the plague it is"). According to Odersky's Coursera course[3], one of the main advantages of functional programming as enabled by Scala is easy, safer concurrency (reduced/absent mutable shared state). Yet here we see a presumably seasoned Scala veteran basically saying that this is a failure and we should avoid concurrency whenever possible because it results in too many bugs.

1. http://www.infoworld.com/article/2609013/java/scala-founder-...

2. https://www.youtube.com/watch?v=TS1lpKBMkgg

3. https://www.coursera.org/course/progfun


I really can't reply to that, sorry.


It's always funny to see how these things get interpreted. :-)

a) There is a lot of work going on at the foundations of the language – most of the stuff is unlikely to be felt by users.

b) Compilers are hard, there is very likely no "nice" compiler out there (except for toy languages). This has almost nothing to do with syntax variants. Nevertheless, there are multiple groups which are working on improving the codebase.

c) Concurrency is hard to get right, especially "traditional" approaches like synchronizing, locks, etc. If it's not necessary, don't use it. Otherwise use the right abstractions for your problem, as mentioned in the next paragraphs.


To be fair, these are not rules, just best practices, like naming conventions.

i.e.

   > 2.1. MUST NOT use "return"
   > 1.1. SHOULD enforce a reasonable line length
   > 1.5. Names MUST be meaningful 
   *(x, xs, xss, ns, are meaningful names by the way)*
   > 3.3. SHOULD NOT apply optimizations without profiling
...


When you think about them, a lot of those rules aren't Scala-specific (though some are). So you could just as well say "most programming languages are giant tumors of complexity".

For example, if you write Java, most (or similar, or even more complex) rules apply.


"Java is just is bad" is not a ringing endorsement.


Heh. I'm not one of them, but a lot of people think Java would be mostly fine with just minor updates.

In any case, I chose Java as an example because (sadly) it's what I'm forced to use in my day job, with some Scala side-projects (thankfully!). But I didn't mean to single Java out: a lot of the advice applies in general to software design, regardless of language. You know, "choose the right abstraction", "beware concurrency", "name things wisely", etc.


As you get more experienced with Scala, you begin to develop better habits and notice more patterns and you will throw away a lot of the unneeded complexity.

Scala can be complex (on paper), but in practice it doesn't have to be.


I'm not so sure what to think of Scala as a language, but I think the naming and formatting recommendations in this document have merit beyond a single language.

There are only two hard things in Computer Science: cache invalidation and naming things.

Truer words were seldom written. Just compare the naming culture in Common Lisp with that of OCaml, for example. Worlds of difference.

And don't even get me started on the “standard” way of formatting in C family languages...


There are only two hard things in Computer Science: cache invalidation and naming things and off-by-one errors.


I've heard this a lot, but is cache invalidation as hard a problem as, say, distributed systems like Bitcoin?


I thought of a single process as the frame of reference. Once you enter distributed systems, complexity (and consequently hardness) can achieve arbitrary levels.


Hi again bad_user,

Glad your collection of best practices got the attention and discussion it deserved today. I notice you posted it to HN 8 days ago and got nothing. Unpredictable, eh?

One more request, with your best practice "3.2. MUST NOT put things in Play's Global" - could you add an example of the correct way to "come up with your own freaking namespace"?

I assume putting authentication in there like SecureSocial does is ok?


"And if a public API is not thread-safe for some reason (like the usual compromises made in software development), then state this fact in BOLD CAPITAL LETTERS."

I prefer to use the `Unsafe` prefix for this, to make it abundantly clear without looking at the call. E.g., `updateUnsafe`.

I also use the `Unsafe` suffix when defining an HTML template rendering method that does not do escaping.


This is so awesome! I've been slowly learning Scala and was looking for an "Effective Scala"-type book.

Thanks for making this!


Just keep in mind that it's an opinionated list. I routinely and intentionally violate several of these rules and have no intention of stopping that.

Much of it will depend on your application domain and needs. For example, if you're dealing with matrices, avoiding destructive updates will only give you a performance hit without an increase in maintainability.


Which section are you referring to? 2.1, 2.2 maybe?

Sorry for the simple question, but I think I'm missing something. Wouldn't avoiding a destructive update give better performance? And the trade-off would be higher maintainability? With a mutable structure, you're 'destroying' - reusing the same locations in memory and replacing their contents - versus, say, an immutable structure returning a completely new structure (new memory allocations).

But to your main point, yes, I won't take this as the canonical source of coding standards. I just like to read through stuff like this because it gives me a lot of areas to explore and research - like why do this vs that, what the tradeoffs are, etc.

<<edit>> Whoops, looks like I had my logic/terminology flipped! Saw this on SO, which cleared things up a bit.

http://stackoverflow.com/questions/6964233/what-is-a-destruc...

Also just noticed the + vs += in some of the mutable collections, which helps to explain.


> Which section are you referring to? 2.1, 2.2 maybe?

Much of section 2 and 4. Also, I'm not using Play, Akka, or any other web frameworks (since I don't do web stuff), so those don't apply to me and I don't have a basis for evaluating them.

> Wouldn't avoiding a destructive update give better performance?

No. In the absence of destructive updates, you either need to copy matrices or use a data structure that typically incurs a log(n) overhead with a fairly large constant.


> In the absence of destructive updates, you either need to copy matrices or use a data structure that typically incurs a log(n) overhead with a fairly large constant.

That is not true - first of all, log(n) in our industry means log2(n). Scala's Vector, for example, is log32(n), which, given that the max size is Int.MaxValue, can be treated as constant access as far as algorithmic complexity is concerned. Persistent HashMaps are implemented as wide hash tries (instead of self-balancing binary trees), which again means less than log2(n). And Scala's Queue, for example, is O(1) for both enqueue() and dequeue() - it's just that every "n" operations it has to reverse a list of pending items, which takes O(n), but since that happens once every "n" operations, it is amortized.
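The trick behind that Queue is the classic two-list functional queue (a minimal sketch, not the actual stdlib code):

  final case class Fifo[A](in: List[A], out: List[A]) {
    def enqueue(a: A): Fifo[A] = Fifo(a :: in, out) // O(1)

    def dequeue: (A, Fifo[A]) = out match {
      case head :: tail =>
        (head, Fifo(in, tail)) // O(1)
      case Nil =>
        val rev = in.reverse // the O(n) step, amortized over n dequeues
        (rev.head, Fifo(Nil, rev.tail)) // throws on empty, like Queue
    }
  }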

It's not algorithmic complexity that bites you; algorithmic complexity is totally fine. The real problem is more low-level: the indirection that comes from chasing pointers. Depending on memory usage patterns, persistent data structures can stress the GC enough to increase the number of stop-the-world pauses. Also, in a multi-threading context, when you keep pounding on the same reference holding an immutable data structure, you drive the contention to the root of that tree, and that is less efficient than a specialized mutable data structure that can distribute those locks across multiple buckets ... but still far better than synchronizing on a standard mutable data structure.

Actually, which is better in terms of performance depends on context. For example if you have shared reads, then an immutable data-structure is better because you do not need to synchronize those reads.

Also - I did work on a web service that was hit with tens of thousands of requests per second, and because of the architecture we built (asynchronous single producers, multiple consumers), usage of immutable data structures wasn't an issue and it was much saner. AND most people working on most projects never end up with such hard optimization issues.

Therefore, fear-mongering about something that in practice really is not an issue is not helpful, especially given the sanity that immutable data structures bring. See this other rule I added - https://github.com/alexandru/scala-best-practices/blob/maste...


> That is not true - first of all, log(n) in our industry means log2(n)

Which, for a 1000x1000 matrix, means a factor of about 20, not to mention the significant constant overhead from not using a contiguous memory layout.

> It's not algorithmic complexity that bites you, algorithmic complexity is totally fine.

I didn't say anything about algorithmic complexity. Note that I was writing log(n), not O(log(n)), in particular, and pointing at the constant factor, too. A computation that takes two days instead of half an hour makes a very real difference to me.

> I did work on a web service that [...]

That's why I was careful to point out how different application domains matter. Web services are not numerical computations are not computer algebra are not big data.


> A computation that takes two days instead of half an hour does make a very real difference to me

IMHO, such a big difference only arises when big differences in algorithmic complexity are involved.

Again, I must mention that I'm of the opinion that if you know what you're doing, then it's fine - but you must be able to defend your choice, as an argument like "gosh, I heard on the Internet that immutable data structures are slower" is not acceptable, given the extra sanity they bring. Plus, depending on context (e.g. web services dealing with parallelism combined with I/O), immutable data structures might actually improve throughput.

Your use case is entirely legitimate; that's why the rule on usage of immutable data structures is a SHOULD (optional), rather than a MUST (required). If you wanna do it, then do it - after all, the ability to get down and dirty is one thing I like about Scala too. Too bad it doesn't have a GC-less version :-)


> IMHO, such a big difference only arrises when big differences in algorithmic complexity is involved.

No, this arises as a result of large constant factors, too. As I pointed out, log2(1e6) alone is approximately a factor of 20, and you pay additional constant overhead for suboptimal memory layout and allocations. Remember, while functional programming languages love trees and linked lists, the hardware loves contiguous arrays. That is what the L1 and L2 caches and the prefetching logic of your typical CPU are optimized for.


Thanks, I think I get it now. Helps when I actually read the APIs :)


Hello @rbehrends, I marked rules that can be broken - in case you know what you're doing - with SHOULD, versus rules that shouldn't be broken, which are marked with MUST. So optional versus non-optional.

I'm not against destructive updates, but I'm against doing destructive updates without knowing what you're doing. I'm also against usage of shared state without proper encapsulation and synchronization.

And when a developer makes such a problematic choice, they have to be able to defend it with arguments other than "couldn't think of anything better and we've got deadlines". I hate that argument with a passion: time is somehow always used as an argument for doing stupid shit, technical debt is increased, and then sprint after sprint you're left cleaning up stupid shit from previous sprints.

Yes, I'm very opinionated :-)


I used "opinionated" instead "wrong" intentionally. There's nothing wrong with being opinionated (I'm opinionated, too!), and it's pretty much unavoidable when it comes to coding style; I'd only quibble with calling the document "Best Practices", which seems to indicate an absolute rule. And yes, I also mean that I'm ignoring some of the "MUST" and "MUST NOT" rules (for a very simple example, using "return" at the end of a function in spite of 2.1 as a visual indicator; similarly, 2.9 or 2.14 are sometimes not an option – pun intended – because Scala's abstractions often come with a measurable performance hit; note the inherent conflict between 2.9/2.14 and 3.4 in particular, where often you have to break one or the other rule).

With respect to your points:

> I'm not against destructive updates, but I'm against doing destructive updates without knowing what you're doing.

With all due respect, the document doesn't make this clear. See, e.g.: "a public API exposing a mutable data-structure is an abomination of nature". That's a pretty absolute statement.

> I'm also against usage of shared state without proper encapsulation and synchronization.

There are really two parts to this. First of all, in principle I agree with the above, but that's not what your document says. For example, section 4.1 goes much farther than that; it may be exaggerated overstatement for effect, but it's really not backed up by anything.

Admittedly, part of the problem is that Scala has never fixed Java's broken [1] concurrency model (and may not be able to do so due to interoperability issues), but even so, this is a statement that's difficult to support in the general case.

There are also sections that don't really make sense, to be honest. For example, 4.7: not only is there no real formal definition of "thread-safe", but any attempt to define it (such as using non-interference per Owicki/Gries) makes it essentially a global property of the program, not a local property of a module (let alone its API). There are software design and implementation techniques that allow you to have composable module specifications in the face of concurrency, but that's something different.

In a way, yes, avoiding concurrency is not unreasonable advice in the face of the limited tools you have available and you can say that I'm lamenting more the (lack of) actual support for concurrency in modern programming languages rather than criticizing your statement; at the same time, even given the circumstances, section 4 strikes me as overall too limiting. There ARE other techniques to handle concurrency cleanly, after all.

[1] I use the word "broken" advisedly. See http://brinch-hansen.net/papers/1999b.pdf


On 2.1, I cannot agree with you: if you're using return for clarity, you're circumventing the compiler, which is then unable to do proper inference - not to mention the gotchas, because, as I was saying, most people do not know how return works. Besides, if you need return for clarity, you're doing it wrong, and I challenge you to show me a piece of code that is clearer with it.

On 2.9/2.14, as somebody who has suffered from servers crashing under enormous load and who has had to profile the shit out of everything, I usually find such arguments against persistent data structures to be bullshit; if you think it matters that much for your problems, you probably picked the wrong language and possibly the wrong platform. And NO, there is no conflict with 3.4.

On destructive updates again: there is a fine line between interesting and boring, plus I don't really have time to write a book. But thanks for the feedback, I'll add more details.

On 4.1, I really hope you're making a distinction between concurrency and parallelism, right?

I don't need to back that claim up, just as I don't need to prove that the sky is blue. And as I was telling somebody else, concurrency is not something to take advantage of, but rather something one has to suffer from.

On Java's "broken concurrency model", actually I find Java to be quite sane, because at least it has a memory model that works well cross platform and that is generic enough to support all kinds of abstractions built on top, like a sane Future with C#-like async, or light weight threads (Quasar), or extremely efficient queues (LMAX Disruptor), or Erlang-style Actors (Akka), or reactive streams (Rx), or parallel collections (Scala), or STM that works (Scala-STM) or CSP or Agents or what have you. And because there is no silver bullet, all of these abstractions are useful to have in your toolbox. And surely a platform built for a certain abstraction (eg Erlang) will be better than a library, but also find that to be very limiting.

Again I challenge you to give me an example of a platform that handles concurrency better than Java.

Related to my list: I wasn't the one who posted it, BTW. I'm not trying to impose my will on anybody - take from it what you will.


I'm not very current on the Scala tooling - is there a reasonable lint tool which enforces these types of suggestions?


We've been working on https://github.com/scala/scala-abide. Our aim is to provide a nice platform for others to develop and distribute style checking rules. We (the core scalac devs) intend to focus on keeping the abide platform up to date with the compiler, letting the community evolve the set of rules.

Due to resource constraints, we haven't made a big push in advertising it yet, but we're happy to work with you if you'd like to start implementing rules. Have a look at recent PRs for some examples.


Thanks for the info, and kudos on the good work in general!

Abide looks quite nice, although I'm a very occasional user of Scala so far, so I'm not sure I'd be very good at coming up with and implementing rules. I'll definitely spread the word to some friends who use Scala more frequently, and may be able to contribute.


Thanks! Please don't hesitate to ask (ideally on scala-internals) if you'd like some pointers to get started with abide.


Brian McKenna's Wart Remover[1] is an excellent tool for capturing silent (but deadly) errors at compile time.

You can also turn on various scalac -X flags to, for example, turn warnings into compile-time errors. Invoke the Scala REPL with -X to see the available flags.
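For example, in sbt (these particular flags exist in the 2.10/2.11 compilers, at least):

  // build.sbt
  scalacOptions ++= Seq(
    "-Xlint",           // recommended additional warnings
    "-Xfatal-warnings", // promote warnings to compile-time errors
    "-deprecation",     // warn on use of deprecated APIs
    "-unchecked"        // warn on unchecked type patterns
  )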

[1] https://github.com/typelevel/wartremover


Thanks, very useful.

It's nice to see that both Wart Remover and OP's post had reasonable explanations and examples to justify each rule. This makes the rule all that much better as a learning experience (contrast with, e.g. "I have never seen a piece of code that was not improved by refactoring it to remove the continue statement" from JavaScript: The Good Parts, which leaves me baffled and no wiser).


IntelliJ's Scala plug-in has some support for things like these.


I think it's commonly called your brain.


This is a really helpful guide, which I can see myself coming back to a lot.

I can't express how relieved I am to see someone else directing Scala developers away from the Cake Pattern. I've only ever seen it result in messy, confusing code.


Scala Best Practices:

1. Don't use Scala
2. Rule #1 is blatant opinion.


Hello, I'm the author of this list, which I hope to grow to be something more than a single-author thing.

Until now I've done non-trivial amounts of work on the job in Java, C#, C++, PHP, Python, Perl, Ruby, JavaScript and Scala, and I've played in my free time with half a dozen others, like Scheme, Clojure, Rust and Haskell.

Take what you will from this, but I have found Scala to be the sanest thus far. My opinion may change in the future, since I don't really have loyalty to programming languages, but regardless - I find it more constructive and better for your knowledge to verbalize your dislikes with clear arguments; maybe there's something you're missing.


I don't have any particular loyalty either. But, in my opinion, and after taking considerable time to look into Scala, it's not my preferred language. The syntax can quickly devolve into soup. It's entangled too closely with Java (unlike, say, Clojure). And the last time I tried it, compile times were... awful. All of this is my opinion, and if Scala works for you, that's great. This guide is a great start for Scala devs who want to avoid shooting themselves in the foot.


> It's entangled with Java too closely (unlike say clojure)

In some ways it is saner than Clojure btw ...

For example, Clojure devs pride themselves that their language doesn't have variables - but oh wait, it has Vars: function definitions must be dereferenced at call sites, and those Vars can be modified wherever, whenever, as Clojure does not have the notion of "val" or "final". And of course you can do thread-local bindings for those Vars, which is why people have used them for things such as dependency injection - a poor man's implicit arguments that do not appear in the function's signature. I don't even know if those persistent data structures could be implemented in Clojure itself, as you need final's semantics for that, and Clojure's data structures are actually implemented in Java.

I also like Clojure's protocols a lot, but they've got limitations compared to type classes from Haskell or Scala. For example, in LISP, lazy by-name parameters are modeled by means of macros, which is OK, but if you want a macro as part of an interface, tough luck (this is a LISP vs ML thing). Protocols don't do interface inheritance either. For example, in Scalaz (a Scala library bringing in many goodies from Haskell), the Monoid type class inherits the interface of Semigroup, because a Monoid really is a Semigroup - I don't think anybody can argue against that, and it isn't "complecting" anything.
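A sketch of that inheritance (simplified; Scalaz's actual definitions carry more machinery):

  trait Semigroup[A] {
    def append(x: A, y: A): A
  }

  // a Monoid really is a Semigroup, plus an identity element
  trait Monoid[A] extends Semigroup[A] {
    def zero: A
  }

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def zero = 0
    def append(x: Int, y: Int) = x + y
  }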

The process of learning Clojure has been very frustrating for me. I get that the language doesn't have loops - I don't do loops in Scala either - but the absence of loops is the least interesting part of going functional; the absence of local mutable state is not solving a big problem. The more interesting part, for which I found little guidance, is modeling changes (in user input, or the various data sources that we use). In Scala we use more and more concepts and design patterns coming from Haskell, whereas I found Clojure to be maybe a little too pragmatic for my taste - for example, Clojure devs aren't modeling Monads, a fact that is apparent starting with the standard "map", "filter" and "mapcat", builtin functions that work only on the builtin collections; Clojure doesn't even have some kind of Numeric protocol for building your own things that work with the numeric operators; and I hate things like sorted-set using java.util.Comparable (what's up with the lack of abstractions in the standard library, btw?).

Of course I could go on - and I'm sure Clojure developers will jump in to correct me; Clojure has a lot of cool things in it, and that's why I keep on learning it, because some day I might have an aha moment. Plus, everybody should learn a LISP, just like everybody should learn a functional, statically typed language (be it Haskell, OCaml or Scala).


I'm somebody who enjoys tinkering around with Clojure, I really wish that I could use it more at my day job, and I'm not particularly interested in Scala. Still, I have to thank you for this well thought out response. So often we have language wars that consist of two sides each with only a cursory knowledge of the other side's technology.

I like how pragmatic Clojure is, but I think I can understand when you said that it's "maybe a little too pragmatic". The more I learn about Haskell (which is still very little, to be honest), the more I wish that Clojure had more in common with Haskell. On the other hand, the new work on transducers to me seems to solve some of the problems that you bring up about map, filter, etc., in that transducers work with any manner of data sources and sinks.


Yeap, I also like the idea behind transducers and I hope to see it in other languages as well. As I said, Clojure has some really sweet things in it, which is why I keep tinkering with it - I want this exposure.

For example, I like how Clojure people do what they call "lightweight data modeling". In Scala and Haskell we get too focused on types; sometimes we go overboard, forgetting that the data should stay reasonably decoupled from short-term business needs.

So there's something really refreshing about Clojure's approach (which is in general LISP's approach to doing stuff). On the other hand I really like having a potent compiler that can help me deal with accidental complexity, which is why people will never agree on which is better, because it depends a lot on context (i.e. the kind of problems you're working on).


My take has always been that Clojure encourages one to model state when doing so is pragmatic. The tools provided (refs, agents, atoms and vars) are in my opinion excellent. For example, using dynamic variables for error handling can be very nice. A namespace could define something like *on-network-error*, which can be dynamically bound at the call site. The advantage over exceptions? *on-network-error* can be a function that tries to recover from the error.

You could implement the persistent data structures even in C. Being persistent is a matter of API, not implementation. But actually, Clojure's deftype creates immutable fields by default, so you do get Java-like final semantics (that is, you can't set the fields, even though they are public).

The main reason why protocols differ from interfaces in Java or type classes in Haskell is that Clojure is a dynamically typed language. Protocol inheritance would have very little value, as knowing that a monad is a functor is quite useless if you don't even know whether something is a monad.

Macros don't really need polymorphism. The macro can always expand to a polymorphic function call or it can pass the s-expression to a polymorphic function.

You can actually implement the sequence interface for your own types too. But it is unfortunate that it is an interface, not a protocol, and the documentation for implementing it is nowhere to be found. What is available are functor and monad abstractions, in the contrib library.


I definitely agree that it's really a great, mind-opening experience for developers to learn a LISP and a functional statically typed language. I feel like you might be missing some things about Clojure here, but I'm too inexpert at it to really fill you in. Anecdotally, I find that when I'm faced with a problem in Clojure, I spend a lot of time thinking about what to do, but rarely spend much time actually writing or modifying code. In Scala, I feel like the type system and more familiar syntax (just because of my own experience) lets me just dig in, but I probably do more actual editing/typing/work. I basically really like both languages, even though both have their annoyances and rough edges, and I wish I could use them both at work - Clojure for when it made sense, and Scala the rest of the time.


First best practice? Don't use Scala.



