
Applying the Unix Philosophy to Object-Oriented Design - sudonim
http://blog.codeclimate.com/blog/2012/11/28/your-objects-the-unix-way/
======
h2s
The post is essentially another take on the Single Responsibility Principle.
It's a good principle, and Rails model classes are a good example of where
it's frequently violated.

Rails practically _begs_ you to disregard SRP in your models. It entices you
to put all your validation logic, relationship specifications, retrieval logic
and business logic into one gigantic basket. Under these circumstances "fat
models" become "morbidly obese models".

Rails 3 introduced "concerns" as a means of organizing the bloat into mixins.
These are a testability nightmare. There doesn't seem to be any clean way of
testing concerns without instantiating one of the model classes that include
them.

The only way to use ActiveRecord without sacrificing testability, SRP, and the
OP's application of "the Unix philosophy to Object Design" seems to be by
limiting its usage to a pure persistence layer. But doing so seems to
preclude using most of the ActiveRecord features that make it so convenient in
the first place. Can anybody set me straight on this somehow? I would love to
be completely wrong about everything I've just written here.
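
For concreteness, here's the kind of thing I mean by limiting ActiveRecord to
a pure persistence layer (a hypothetical sketch; UserRecord, User and
UserRepository are names I've just made up):

    
    
        # AR class used only for persistence: no validations, callbacks,
        # or business logic.
        class UserRecord < ActiveRecord::Base
          self.table_name = "users"
        end
    
        # Plain-Ruby domain object, trivially testable without a database.
        User = Struct.new(:id, :name, :email)
    
        class UserRepository
          def find(id)
            record = UserRecord.find(id)
            User.new(record.id, record.name, record.email)
          end
    
          def save(user)
            UserRecord.create!(:name => user.name, :email => user.email)
          end
        end
    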

~~~
swanson
You may want to check out this short book by Giles Bowkett:
[http://gilesbowkett.blogspot.com/2012/11/i-wrote-ebook-in-
we...](http://gilesbowkett.blogspot.com/2012/11/i-wrote-ebook-in-week.html)

It goes into the areas (like AR models) where Rails breaks the classic "OOP
Golden Rules" and why it is able to get away with it in some cases.

~~~
h2s
I've just had a look at the free excerpt and I think you might have made the
most perfect book recommendation I've ever seen. A bit on the expensive side,
but it'll be good to get a fresh perspective on this after perhaps a little
too much Uncle Bob lately.

Thanks!

------
zimbatm
> UserContentSpamChecker.new(content).spam?

Noooooooooo!

Sorry. There's no purpose to these layers of indirection, since you're never
passing the instance around to share state. If UserContentSpamChecker were a
value object you could have argued that you're doing a form of casting, but
it's not. It's best to stick with a simple function:

    
    
        # Note: Array#to_set needs require "set"; Object#present? comes from
        # ActiveSupport (both are available in a Rails app).
        module UserContent
          extend self
    
          TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set
    
          def is_spam?(content)
            flagged_words(content).present?
          end
    
          protected
    
          def flagged_words(content)
            TRIGGER_KEYWORDS & content.split
          end
        end
    

Now you can just call the function directly:

> UserContent.is_spam?(content)

~~~
urbanautomaton
I see this assertion increasingly frequently, and it puzzles me: that if we're
not currently using an object to encapsulate state, we should prefer class
methods. What's the justification? Other than four characters saved (".new"),
what's the benefit in this approach?

State encapsulation is just one feature of objects. We may not be using it
right now, but the only thing you achieve with the above code is removing
future flexibility. It costs us nothing to allow for the future possibility of
encapsulated state, so why rule it out? Adding complexity when you don't yet
need it, fine, I quite understand objecting to that; but here you're actually
putting in effort to make a future modification more difficult.

Funnily enough, Code Climate's previous blog entry was on precisely this
topic, and is worth a read:

[http://blog.codeclimate.com/blog/2012/11/14/why-ruby-
class-m...](http://blog.codeclimate.com/blog/2012/11/14/why-ruby-class-
methods-resist-refactoring/)

For what it's worth, I don't see the point in instantiating a whole new spam
checker for every piece of content, so I'd probably change the OP's example to
read:

    
    
        class UserContentSpamChecker
    
          TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set
    
          def is_spam?(content)
            flagged_words(content).present?
          end
    
          protected
    
          def flagged_words(content)
            TRIGGER_KEYWORDS & content.split
          end
        end
    

If I really really wanted access to a default spam checker via a global
constant I can always add the following:

    
    
        class UserContentSpamChecker
          def self.is_spam?(content)
            new.is_spam?(content)
          end
        end
    

At least then if my UserContentSpamChecker class ever has to change (perhaps
it starts to use an external spam-checking service that's injected through the
constructor), then I only need to change code in one place. And other clients
that might want to inject a different spam-checking service (or a test double)
are perfectly able to.
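
For illustration, the injected version might look like this (a sketch;
RemoteSpamService and the fake used below are made-up stand-ins for whatever
external service you'd actually use):

    
    
        class UserContentSpamChecker
          def initialize(backend = RemoteSpamService.new)
            @backend = backend
          end
    
          def is_spam?(content)
            @backend.check(content)
          end
        end
    
        # Production code takes the default; a test injects a double:
        checker = UserContentSpamChecker.new(FakeSpamService.new)
        checker.is_spam?("some content")
    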

~~~
zimbatm
I think we agree on the bottom line: UserContentSpamChecker has no need for
state and is just some kind of namespace. That's the point I wanted to get
through.

Then we can argue about the best way to handle that namespace. I'm not
particularly fond of the class method approach, but I don't think your
approach is appropriate either. A namespace should be instantiated only once,
in my opinion; that's why I'm going for the singleton object. Maybe the
problem is with Ruby, and it should provide another mechanism for managing
namespaces?

    
    
        UserContentSpamChecker = namespace do
          TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set
    
          def is_spam?(content)
            flagged_words(content).present?
          end
    
          protected
    
          def flagged_words(content)
            TRIGGER_KEYWORDS & content.split
          end
        end
    
    
        class MyOtherClass
          import :spam_checker, UserContentSpamChecker
        
          def foo
            spam_checker.is_spam?(content)
          end
        end

~~~
vidarh
Keep in mind that in Ruby, classes and modules are just ordinary objects that
happen to be instances of the classes Class and Module respectively.

So for all practical purposes, if you define a module it is not much different
than if you define a class and then instantiate a single object (it is
_slightly_ different in that your object will be an instance of your class
rather than of the class Class).

Modules _are_ namespaces for Ruby (and pretty much only differ from classes
in that you can't create instances of them).

What you describe above is done with modules:

    
    
        module UserContentSpamChecker
          def is_spam?(content)
            # ...
          end
        end
    
        class MyOtherClass
          include UserContentSpamChecker
    
          def foo
            is_spam?(content)
          end
        end
    

If you want to be able to alias it, you'd do it with a method:

    
    
        module UserContentSpamChecker
          # note the "self." to define a method callable on the
          # UserContentSpamChecker object itself (an instance of Module)
          def self.is_spam?(content)
            # ...
          end
        end
    
        class MyOtherClass
          def spam_checker; UserContentSpamChecker; end
    
          def foo
            spam_checker.is_spam?(content)
          end
        end
    

(or you could do it with a class variable or class instance variable - example
class variable:)

    
    
        class MyOtherClass
          @@spam_checker = UserContentSpamChecker
    
          def foo
            @@spam_checker.is_spam?(content)
          end
        end

~~~
zimbatm
I find that modules are good for adding behaviour to an object, like
Enumerable, but I don't find them practical as namespace holders.

Including is an all-or-nothing operation. In your first example #is_spam? is
now also a public method of MyOtherClass. The other issue with module includes
is that method name collisions are much harder to debug.
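
To make the first point concrete (a minimal, untested sketch):

    
    
        module UserContentSpamChecker
          def is_spam?(content)
            false # stubbed for illustration
          end
        end
    
        class MyOtherClass
          include UserContentSpamChecker
        end
    
        # is_spam? is now part of MyOtherClass's public interface:
        MyOtherClass.new.is_spam?("hello")  # => false
    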

I also like your second example, but I think it would be clearer if
:spam_checker had dedicated semantics. Something like:

    
    
        class Module
          # Define a protected reader method that returns the given object.
          def import(name, obj)
            define_method(name) { obj }
            protected(name)
          end
        end
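
With that in place, the earlier usage example works as written, and because
the reader is protected it stays an implementation detail (a quick untested
sketch):

    
    
        class MyOtherClass
          import :spam_checker, UserContentSpamChecker
    
          def foo(content)
            spam_checker.is_spam?(content)  # fine: called from inside the class
          end
        end
    
        MyOtherClass.new.spam_checker  # => NoMethodError: protected method
    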

------
rm999
The tldr of the article is to write modular code. This is great advice; I hope
most intermediate software engineers already live by this.

If I recall my CS history correctly, Multics/Unix was one of the first major
software projects that embraced the modular philosophy. But modularity is a
natural approach when designing large systems, and has been embraced by
electrical and mechanical engineers for way longer than software systems have
existed. Modularity even extends to processes; Henry Ford made his millions by
modularizing the assembly process of his cars.

~~~
tcgv
... and hardware engineers as well. It's the use of hardware pipelines that
makes GHz processors possible nowadays. It's a really broad concept that can
be applied to many kinds of processes.

------
beaumartinez
It's a nice idea, but look at how much code you have to write to obtain this
decoupling.

The "mental" abstraction is nice to have, but when you end up typing _more_
characters, it's just counterproductive.

Once you start repeating yourself, then it makes sense. To do it from the
start is overengineering—"you ain't gonna need it".

~~~
wting
Your argument taken to the extreme leads to code golf.

Sure, abstraction is worth it. You get code reusability, better unit testing,
and faster understanding of the code.

Maybe a small script doesn't need it, but once something starts growing past a
few hundred lines you're better off.

~~~
comex
Nobody suggested taking his argument to the extreme.

A larger script might indeed need it, but knowing when you need abstraction
and when you don't is key. In this case, YAGNI.

------
swanson
Maybe it's because I've been reading about DCI lately, but I think this
example could go even further in moving business logic out of the AR model.

    
    
        # (Assumes PostingTooSpammy and PostingTooFast are error classes
        # defined elsewhere.)
        class GuestbookLibrarian
            def initialize(rate_limiter, tweeter, spam_checker)
                @rate_limiter = rate_limiter
                @tweeter = tweeter
                @spam_checker = spam_checker
            end
    
            def add_entry(name, msg, ip_addr)
               raise PostingTooSpammy if @spam_checker.is_spammy? msg
               raise PostingTooFast if @rate_limiter.exceeded? ip_addr 
    
               entry = GuestbookEntry.create(:name => name, 
                                             :msg => msg, 
                                             :ip_addr => ip_addr)
               @rate_limiter.record(ip_addr)
               @tweeter.tweet_new_guestbook_post(name, msg)
    
               entry 
            end
        end
    
        class GuestbookController < ApplicationController
            ... SNIP ...
            
            rescue_from PostingTooSpammy, :with => :some_spam_handler
            rescue_from PostingTooFast, :with => :some_other_spam_handler
        
            def create
              #maybe these should be globals in an initializer somewhere if we
              #use them elsewhere? or in a :before_filter at least :)
              rate_limiter = UserContentRateLimiter.new
              tweeter = Tweeter.new
              spam_checker = UserContentSpamChecker.new
    
              librarian = GuestbookLibrarian.new(rate_limiter, tweeter, spam_checker) 
              entry = librarian.add_entry(params[:name], params[:msg], params[:ip_addr])
          
              redirect_to entry, :notice => "Thanks for posting"
            end
        end
    

Something like that? Too Java-y? Feedback appreciated :)

~~~
r00k
Interesting approach. It'd be a bit surprising in a Rails app, however, since
models usually handle their own validation.

That said, the fact that GuestbookEntry doesn't have validation as a concern
does make it simpler.

I _might_ refactor to something like this if a particular model had extremely
complicated validation logic, but wouldn't ever do this as a first pass.

~~~
swanson
I think in general models can handle their own validation (especially if you
use AR), but I guess the difference is that these validations hit external
"services", instead of something like a :max_length validation.

I agree on the first pass comment - I imagine that the user stories went
something like:

    
    
        * v1 - An anon user can post to the guestbook
        * v2 - Guestbook comments are checked for spam
        * v3 - Post guestbook comments to twitter for Web 2.0-ness
    

And this is the refactored version after v3.

------
akmiller
Yet the refactored example is still showing application domain logic being
built within the web framework. For simple applications that are only ever
going to be delivered via the web and only ever rely on one particular web
framework this might be ok. That being said, I'd like to see movement away
from this back to using web frameworks or anything else simply as the
communication platform between your application and the user.

~~~
pignata
Good point. There's some good discussion in the Rails community about ports-
and-adapters architecture around that theme. Matt Wynne gave a talk at goruco
this past year on the subject that's worth checking out:
[http://www.confreaks.com/videos/977-goruco2012-hexagonal-
rai...](http://www.confreaks.com/videos/977-goruco2012-hexagonal-rails)
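
As I understand it, the gist in Ruby terms is roughly this (my own rough
sketch, not taken from the talk; all names made up):

    
    
        # Application core: framework-free, depends only on the "port",
        # i.e. any object that responds to #save_entry.
        class SignGuestbook
          def initialize(store)
            @store = store
          end
    
          def call(name, msg)
            @store.save_entry(name, msg)
          end
        end
    
        # Rails-side adapter, wired in at the boundary (e.g. the controller).
        class ActiveRecordGuestbookStore
          def save_entry(name, msg)
            GuestbookEntry.create!(:name => name, :msg => msg)
          end
        end
    
        # Test adapter: no Rails, no database.
        class InMemoryGuestbookStore
          attr_reader :entries
    
          def initialize
            @entries = []
          end
    
          def save_entry(name, msg)
            @entries << [name, msg]
          end
        end
    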

------
arocks
The Microsoft Word example is not entirely accurate. Most office applications
_do_ in fact have well-defined interfaces through OLE [1]. This works from
several languages, like Perl, VB, Delphi or Python.

[1]: <http://www.adp-gmbh.ch/perl/word.html>

~~~
stinos
Good point... it's not because the application itself seems monolithic that
the underlying structure is.

Apart from that, how on earth would the author go about teaching an average
Joe-that-wants-to-type-a-letter that instead of having to open a single
program and start doing everything he can imagine, he now has to use 100
separate programs, each dedicated to a certain task?

------
akkartik
Wow, I didn't realize the McIlroy who critiqued Knuth's famous literate
program also conceived of shell pipes in the first place:
<http://www.leancrew.com/all-this/2011/12/more-shell-less-egg>

------
sbov
I don't particularly see the unix philosophy ever shining through in object
oriented design. The problem is that objects are too inflexible - you can't
just glue methods together, piping arbitrary objects into methods like you can
do with data at a command line prompt.
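
The closest you get in Ruby is chaining Enumerable methods, and even that
only works because every stage agrees on a single interface up front (a quick
sketch):

    
    
        # A pipe-like chain: each stage consumes and produces an enumerable,
        # much as Unix filters consume and produce lines of text.
        File.open("access.log") do |f|
          f.each_line
           .map(&:chomp)
           .select { |line| line.include?("ERROR") }
           .first(10)
        end
    

But that's still method calls on one object graph, not arbitrary programs
glued together by a shell.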

------
rbranson
While this is certainly useful, it's pretty basic. Something that I find is a
much closer fit to UNIX pipes is iterators. UNIX pipes work similarly in that
all of the commands in the pipeline are executed in "parallel" and the OS
passes data incrementally between each process.

I primarily work on a Python codebase, and I've found that using iterators for
complex, fault-tolerant data pipelines allows decoupled design without many of
the performance & additional complexity drawbacks often encountered with
cleanly abstracted, decoupled code. For example, when executing a multi-get
for objects by primary key, the pipeline looks roughly like this:

1\. Fetch from heap cache

2\. Fetch from remote cache

3\. Fetch from backing store

4\. Backfill the heap cache

5\. Backfill the remote cache

6\. Apply basic filters (e.g. deleted == False, etc)

At each step there are usually two or three layers of abstraction underneath.
Much of the space requirement, and some of the overhead time at each step, can
be collapsed to O(1) instead of O(N).

For example, a cache multiget abstraction on-top of memcache might look
something like this:

    
    
        # (Python 2. Assumes memcache_mget returns (key, value) pairs and
        # chunks(seq, n) is a helper that splits seq into lists of n items.)
        def deserialize_user(serialized_user):
            return json.loads(serialized_user)
    
        def build_prefixed_memcache_key(prefix, key):
            return "%s:%s" % (prefix, key)
    
        def get_users_from_remote_cache(user_keys):
            cached_users = get_from_memcache(user_keys, "user")
            # list comprehension, not a set: deserialized users are dicts,
            # which aren't hashable
            deserialized_users = [deserialize_user(value) for key, value in cached_users.iteritems()]
            return deserialized_users
    
        def get_from_memcache(keys, prefix):
            rekeyed = {build_prefixed_memcache_key(prefix, key): key for key in keys}
            from_memcache = []
            for chunk in chunks(rekeyed.keys(), 20):
                results = memcache_mget(chunk)
                from_memcache.extend(results)
            unkeyed = {rekeyed[key]: value for key, value in from_memcache}
            return unkeyed
    

Notice how at each step there is a large amount of "buffering" that causes
allocation, copying, and quite a bit of additional work. Each layer of
abstraction adds a pretty large cost to the step. Using an iterator
implementation, we can clean up this code and make it more performant:

    
    
        def get_users_from_remote_cache(user_keys):
            for key, user in get_from_memcache(user_keys, "user"):
                yield deserialize_user(user)
    
        # (ichunks is assumed to be a lazy, generator-based version of chunks.)
        def get_from_memcache(keys, prefix):
            for chunk in ichunks(keys, 20):
                rekeyed = {build_prefixed_memcache_key(prefix, key): key for key in chunk}
                for key, value in memcache_mget(rekeyed.keys()):
                    yield (rekeyed[key], value)
    

It's clear how much cleaner this code is. Notice how this snippet avoids the
large amount of "buffering" of data between steps and short-circuits quite a
bit of code when possible (for instance, if all of the fetches miss). In real
code that's heavily abstracted & layered, avoiding all of this work translates
into significant performance & cost advantages.

Iterators also allow building portions of the pipeline with built-in functions
that avoid the interpreter.

    
    
        # (imap comes from itertools, itemgetter from operator.)
        def deserialize_users(users):
            return imap(pickle.loads, users)
    
        def get_users_from_remote_cache(user_keys):
            cached_users = get_from_memcache(user_keys, "user")
            only_values = imap(itemgetter(1), cached_users)
            return deserialize_users(only_values)
    

This block of code executes in only one pass through the interpreter (imap,
itemgetter, and pickle.loads are all implemented as native functions). This is
incredibly powerful because it means that iterator-based abstractions can be
built by combining these native building blocks without the overhead of
recursion within the interpreter at each step.

Pushing all the recursion into native code:

    
    
        # (partial comes from functools; add and itemgetter from operator;
        # chain from itertools.)
        def gprefix(prefix, delimiter):
            prefix_with_delim = prefix + delimiter
            prefixfn = partial(add, prefix_with_delim)
            unprefixfn = itemgetter(slice(len(prefix_with_delim), None))
            return prefixfn, unprefixfn
    
        def memcache_mget_chunked(keys):
            chunks = ichunks(keys, 20)
            result_blocks = imap(memcache_mget, chunks)
            flattened_results = chain.from_iterable(result_blocks)
            return flattened_results
    
        def get_values_from_memcache(keys, prefix):
            prefixfn = partial(add, "%s:" % prefix)
            prefixed_keys = imap(prefixfn, keys)
            key_pairs_from_cache = memcache_mget_chunked(prefixed_keys)
            values_from_cache = imap(itemgetter(1), key_pairs_from_cache)
            return values_from_cache
    

* I apologize for any code errors; this code wasn't tested in its entirety.

~~~
nuxi
(Minor nit: the yield-based functions are generator functions, not iterators.)

This approach was also described by David Beazley at PyCon; the slides are
available at <http://www.dabeaz.com/generators/Generators.pdf>. He extends
this approach to generator multiplexing, coroutines etc., along with useful
examples. An excellent read.

------
pjmlp
This is the principle of always creating loosely coupled components that can
be glued together.

I think the OO community has already learned that interfaces, categories,
traits, mixins, or whatever they are called in your favorite language, offer
more re-usability than plain inheritance.

The trick is to learn when inheritance, delegation or composition make sense.

As with everything in life, it takes time to learn them properly, with lots of
failed experiences along the way.
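
For concreteness, a toy Ruby sketch of the three options (class names made
up):

    
    
        require "forwardable"
    
        class Engine
          def start; "vroom"; end
        end
    
        # Inheritance: a car *is an* engine (usually the wrong relationship).
        class InheritedCar < Engine; end
    
        # Composition: a car *has an* engine and wraps it by hand.
        class ComposedCar
          def initialize; @engine = Engine.new; end
          def start; @engine.start; end
        end
    
        # Delegation: composition plus forwarding sugar from the stdlib.
        class DelegatingCar
          extend Forwardable
          def_delegator :@engine, :start
          def initialize; @engine = Engine.new; end
        end
    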

------
louischatriot
Quite a long (although interesting) article. tldr version:
<http://tldr.io/tldrs/50b62f8abb22039977000471>

------
6ren
> write programs that do one thing and do it well. Write programs to work
> together.

ASIDE: Just been reading _The Wealth of Nations_ , and Smith talks about the
"division of labour", which is similar to this specialisation. By
concentrating on one task, a workman can increase his dexterity at it, not
waste time switching between tasks, and find ways to do it better. "Do one
thing and do it well" is a pithy exaggeration of this specialisation.

This division of labour is only possible because of trade: other workmen do
the tasks that you aren't doing, and you barter with each other to get what
you each need. Because you specialised, you each do your own task more
efficiently, so you are both better off. Programs that "work together" is
similar, because if you couldn't use another program, you'd _have_ to include
it in the first program. "Programs" would also include 3rd party libraries and
frameworks I think.

He goes on to say that one limitation on how much labour can be divided is the
market size: you need to be able to trade what you make. If people don't need much
of what you produce, you've over-specialised. However, if you have access to a
large enough market, then in aggregate, with many people using a little bit,
you can survive. Larger markets therefore allow more specialisation, and
therefore more wealth. Rich civilisations grew up around navigable rivers
(especially with canals), and inland seas (calmer and safer to navigate than
the open ocean), because water-carriage facilitates trade with more people
(larger market), over larger distances, at lower transport cost and less risk.

Does the analogy fall down with free programs, since you won't go out of
business just because few people use it? Open source projects _do_ seem to
need users, for encouragement, bug reports, spreading the word, contributing
bug fixes etc - users pay in attention, not money. Without attention, projects
die.

Does the analogy fall down when it's the same person writing all the separate
programs? You can certainly specialise, and therefore do a better job; and
also "exchange" data between those parts. (Provided of course that
specialisation is actually more efficient, and outweighs the costs of
exchange). But the motivation doesn't apply in quite the same way, since the
parts of the code don't get paid - not in money, not in attention. You're more
like a communistic planned economy (or, within a firm).

But the interesting point I've been leading up to is: does a larger market for
programs lead to greater specialisation? For example, Java has a large number
of users, and has a ton of 3rd party programs - and that does seem to include
very specialised libraries.

Or, are other factors at work, leading to more one-package programs, such as
Word; or dividing the market into smaller ecosystems, such as Java itself, and
also other languages python/ruby/PHP, and even frameworks, like RoR. There are
barriers between these "markets", in the extra difficulty of using a library
from one platform in another. For example, for JS, I get the impression there
are many libraries for doing the same web/forms/template etc. While
competition is healthy, and these frameworks are definitely improving on each
other, they aren't the "division of labour" of interest to this particular
discussion.

In programs, does a larger "market" of fellow coders result in more
specialisation in the "one thing" done well?

~~~
camus
A program doesn't get better by doing the same task over and over again (and
I disagree with A. Smith anyway: a factory worker doesn't choose what process
he is going to use to achieve his task; no matter how many times he executes
his task, the way he has to work is decided by his boss. A. Smith is and has
been proven wrong on many things, like the invisible hand).

~~~
mbreese
I've never read _The Wealth of Nations_ , but Smith wrote well before the
industrial revolution. So when he spoke of workers, he was referring to
tradesmen and craftsmen. For them, specialization does allow them to get
better by performing the same tasks over and over again.

In a way, the industrial revolution just changed the scope of his arguments,
but I think the basic idea still holds. Now, instead of thinking in terms of a
single worker, we could instead think in terms of a single factory. Each
factory, comprised of many workers, would then be able to perform one
specialized task well, and trade for the rest. It would be up to each factory
to organize itself in an optimal manner.

In thinking in terms of a program, it wouldn't be the program that gets
better, but the programmer. If they write a program to perform task X multiple
times, that program will inevitably get better with each iteration. So,
ideally, what you end up with is a series of optimized programs that can work
together in different configurations in order to perform different tasks in
the most efficient way.

At least, that would be the theory. I'll leave discussions of the invisible
hand of the market for later :)

------
dschiptsov
Unix philosophy has nothing to do with objects. It is about _interfaces_,
_streams_ and a common intermediate data representation: _plain text_.

Scheme or Erlang are _real_ examples of following a similar philosophy.

Modularity is a much more basic concept, underlying any software. It is about
splitting code into blocks for later reuse: named, in the case of procedures,
or anonymous, in the case of blocks or lambdas.

The notion that smaller, share-nothing procedures doing only one well-defined
task are better than bloated, tightly coupled ones is a general one.

So, do not try to mislead others. There is nothing from the UNIX philosophy
here, just the basic concept of modularity.

~~~
mbreese
Nobody is claiming that the Unix philosophy depends on objects. The author was
trying to claim that applying some Unix principles to Object Oriented
Programming would be beneficial.

 _Do one thing, and do it extremely well_ is a simplified version of the Unix
design philosophy. And this can be applied at various levels of abstraction,
including within OO programs.

I fail to see the controversy in this.

~~~
knieveltech
It doesn't become really controversial until someone points out that unclassed
functions are the logical extreme of this philosophy, at which point the
predictable shit flinging starts.

------
camus
Well, maybe the author should have figured out that the way Unix works is
closer to functional programming than to object-oriented programming before
writing this blog.

