Applying the Unix Philosophy to Object-Oriented Design (codeclimate.com)
135 points by sudonim 1839 days ago | 58 comments

Unix philosophy has nothing to do with objects. It is about interfaces, streams and common intermediate data representation - plain text.

Scheme or Erlang are real examples of following a similar philosophy.

Modularity is a much more basic concept, underlying all software. It is about splitting code into blocks for later reuse: named, in the case of procedures, or anonymous, in the case of blocks or lambdas.

The notion that smaller, share-nothing procedures doing only one well-defined task are better than bloated, tightly coupled ones is a general one.

So, do not try to mislead others. There is nothing from UNIX Philosophy there, just a basic concept of modularity.

Nobody is claiming that the Unix philosophy depends on objects. The author was trying to claim that applying some Unix principles to Object Oriented Programming would be beneficial.

"Do one thing, and do it extremely well" is a simplified version of the Unix design philosophy. And this can be applied at various levels of abstraction, including within OO programs.
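A minimal sketch of what that can look like inside an OO program (hypothetical classes, not from the article): each object does exactly one job, and the caller supplies the glue, much like a shell pipeline such as `echo "$text" | wc -w`.

```ruby
# Each class has a single, well-defined responsibility.
class WordCounter
  def count(text)
    text.split.length
  end
end

class ReportFormatter
  def format(label, value)
    "#{label}: #{value}"
  end
end

# The "glue" lives at the call site, like a pipe between two small tools:
text = "do one thing and do it well"
puts ReportFormatter.new.format("words", WordCounter.new.count(text))
```

Because neither class knows about the other, each can be swapped or tested in isolation, which is the point of the analogy.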

I fail to see the controversy in this.

It doesn't become really controversial until someone points out that unclassed functions are the logical extreme of this philosophy, at which point the predictable shit flinging starts.

>Unix philosophy has nothing to do with objects.

I'm pretty sure it's an analogy.

> It is about interfaces, streams and common intermediate data representation

And this is what modularity is all about too.

> There is nothing from UNIX Philosophy there, just a modularity

Read the second paragraph: http://www.linfo.org/unix_philosophy.html

a system that is composed of components (i.e., modules) that can be fitted together or arranged in a variety of ways. - So, it is not just about modules, it is mostly about the glue.

In the same way, Lisp without an underlying list structure is just a prefix notation.

I was using the analogy of the composability of Unix tools as a role model for how to think about the segregation of roles in an object-oriented system. I agree with you that the ways in which we can couple these objects together through their interfaces is the core of what we're designing. I think this philosophy as an analogy (especially as outlined by ESR) is applicable to a wide variety of things beyond even programming. Interesting to think about, anyway. Thanks for your thoughts.

I disagree. I once saw a talk by Bob Martin that did a great job of explaining good OO design principles by looking at examples from Unix.

It's true that Unix isn't object-oriented, but the principles that made it great are ones that object-oriented developers can benefit from studying. And given the long lifetime and wide use of the Unix model, it's a great object of study.

>It's true that Unix isn't object-oriented, but the principles that made it great are ones that object-oriented developers can benefit from studying.

Of course, because these principles are basic programming principles. Yet, this has nothing to do directly with object orientation. There is no "object orientation" in the way Unix works.

No, but Unix is a very good and, more importantly, concrete implementation of these principles. What's wrong with claiming that applying them to your OO programs leads to better code?

> There is no "object orientation" in the way unix works.

Who said there was?

Good OO programming revolves around separation of concerns and coherent message passing. That sounds a lot like small tools and pipes to me. Of course, no analogy is perfect, but I think it is illustrative.

On the other hand, I think the article didn't do the topic justice because it only really talked about separation of concerns. It never really talked about the "pipes". And, as you mention, the consistency of "communication via text" is what makes those pieces so able to be composed.

And, in software, some of the "pipes" we've come up with have often been truly dreadful (COM, anyone?).

Good programming revolves around separation of concerns and coherent message passing, so there is nothing specific to OOP there. You don't need objects, inheritance, or state to do that: just functions that take parameters and output a result. Seeing OOP everywhere, even where there is none, is misleading.

The post is essentially another take on the Single Responsibility Principle. It's a good principle, and Rails model classes are also a good choice of an example where it's frequently violated.

Rails practically begs you to disregard SRP in your models. It entices you to put all your validation logic, relationship specifications, retrieval logic and business logic into one gigantic basket. Under these circumstances "fat models" become "morbidly obese models".

Rails 3 introduced "concerns" as a means of organizing the bloat into mixins. These are a testability nightmare. There doesn't seem to be any clean way of testing concerns without instantiating one of the model classes that include them.

The only way to use ActiveRecord without sacrificing testability, SRP, and the OP's application of "the Unix philosophy to Object Design" seems to be by limiting usage of it to a pure persistence layer. But doing so seems to preclude using most of the ActiveRecord features that make it so convenient in the first place. Can anybody set me straight on this somehow? I would love to be completely wrong about everything I've just written here.
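One hedged sketch of the "pure persistence layer" idea (all names hypothetical): business rules live in a plain Ruby object that can be unit-tested without a database, while the AR model carries only persistence.

```ruby
# A plain Ruby policy object: no Rails, no database, trivially testable.
class EntryPolicy
  MAX_LENGTH = 500

  def acceptable?(msg)
    !msg.nil? && msg.length <= MAX_LENGTH
  end
end

# In the app, the AR model would stay a thin persistence shell, e.g.:
#
#   class GuestbookEntry < ActiveRecord::Base
#     # columns and associations only; callers consult EntryPolicy
#   end
```

The trade-off the comment describes remains: you give up AR validations and callbacks on the model in exchange for testable, single-responsibility objects.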

> Rails practically begs you to disregard SRP in your models.

That's because the Active Record pattern is explicitly double responsibility: http://martinfowler.com/eaaCatalog/activeRecord.html That's why it's called "active record" -- there's domain logic and persistence in one object.

You may want to checkout this short book by Giles Bowkett: http://gilesbowkett.blogspot.com/2012/11/i-wrote-ebook-in-we...

It goes into the areas (like AR models) where Rails breaks the classic "OOP Golden Rules" and why it is able to get away with it in some cases.

I've just had a look at the free excerpt and I think you might have made the most perfect book recommendation I've ever seen. A bit on the expensive side, but it'll be good to get a fresh perspective on this after perhaps a little too much Uncle Bob lately.


> UserContentSpamChecker.new(content).spam?

Noooooooooo !

Sorry. There is no purpose for these layers of indirection, since you're never passing the instance around to share state. If UserContentSpamChecker were a value object you could have argued that you're doing a form of casting, but it's not. Best is to stay with a simple function:

    require 'set'

    module UserContent
      extend self

      TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

      def is_spam?(content)
        flagged_words(content).any?
      end

      def flagged_words(content)
        TRIGGER_KEYWORDS & content.split
      end
    end
Now you can just call the function directly:

> UserContent.is_spam?(content)

I see this assertion increasingly frequently, and it puzzles me: that if we're not currently using an object to encapsulate state, we should prefer class methods. What's the justification? Other than four characters saved (".new"), what's the benefit in this approach?

State encapsulation is just one feature of objects. We may not be using it right now, but the only thing you achieve with the above code is removing future flexibility. It costs us nothing to allow for the future possibility of encapsulated state, so why rule it out? Adding complexity when you don't yet need it, fine, I quite understand objecting to that; but here you're actually putting in effort to make a future modification more difficult.

Funnily enough, Code Climate's previous blog entry was on precisely this topic, and is worth a read:


For what it's worth, I don't see the point in instantiating a whole new spam checker for every piece of content, so I'd probably change the OP's example to read:

    require 'set'

    class UserContentSpamChecker
      TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

      def is_spam?(content)
        flagged_words(content).any?
      end

      def flagged_words(content)
        TRIGGER_KEYWORDS & content.split
      end
    end
If I really really wanted access to a default spam checker via a global constant I can always add the following:

    class UserContentSpamChecker
      def self.is_spam?(content)
        new.is_spam?(content)
      end
    end
At least then if my UserContentSpamChecker class ever has to change (perhaps it starts to use an external spam-checking service that's injected through the constructor), then I only need to change code in one place. And other clients that might want to inject a different spam-checking service (or a test double) are perfectly able to.

I think that we agree on the bottom line: UserContentSpamChecker has no need for state and is just some kind of namespace. That's the point that I wanted to get through.

Then we can argue about the best way to handle that namespace. I'm not particularly fond of the class method approach either, but I don't think that your approach is appropriate either. A namespace should be instantiated only once in my opinion; that's why I'm going for the singleton object. Maybe the problem is with Ruby, and it should provide another mechanism for managing namespaces?

    UserContentSpamChecker = namespace do
      TRIGGER_KEYWORDS = %w(viagra acne adult loans xrated).to_set

      def is_spam?(content)
        flagged_words(content).any?
      end

      def flagged_words(content)
        TRIGGER_KEYWORDS & content.split
      end
    end

    class MyOtherClass
      import :spam_checker, UserContentSpamChecker

      def foo
        spam_checker.is_spam?(some_content)
      end
    end
Keep in mind that in Ruby, classes and modules are just ordinary objects that happen to be instances of the classes Class and Module respectively.

So for all practical purposes, if you define a module it is not much different than if you define a class, and then instantiate a single object (it is slightly different in that your object will be an instance of your class rather than of the class Class).

Modules are namespaces for Ruby (and pretty much only differ from classes in that you can't create instances of modules).

What you describe above is done with modules:

    module UserContentSpamChecker
      def is_spam?(content)
        # ...
      end
    end

    class MyOtherClass
      include UserContentSpamChecker

      def foo
        is_spam?(some_content)
      end
    end
If you want to be able to alias it, you'd do it with a method:

    module UserContentSpamChecker
      # note the "self." to define a method callable on the
      # UserContentSpamChecker object itself (of class Module)
      def self.is_spam?(content)
        # ...
      end
    end

    class MyOtherClass
      def spam_checker; UserContentSpamChecker; end

      def foo
        spam_checker.is_spam?(some_content)
      end
    end
(or you could do it with a class variable or class instance variable - example class variable:)

    class MyOtherClass
      @@spam_checker = UserContentSpamChecker

      def foo
        @@spam_checker.is_spam?(some_content)
      end
    end

I find that modules are good to add behaviour to an object like Enumerable but I don't find them practical as a namespace holder.

Including is an all-or-nothing operation. In your first example, #is_spam? is now also a public method of MyOtherClass. The other issue with module includes is that method name collisions are much harder to debug.
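A toy demonstration of that all-or-nothing point (hypothetical classes): the included module's methods become part of the including class's own public interface.

```ruby
module SpamCheck
  def is_spam?(content)
    content.include?("viagra")
  end
end

class Post
  include SpamCheck
end

# is_spam? is now a public method of Post itself,
# whether or not Post meant to expose it to callers:
Post.new.is_spam?("buy viagra")  # => true
```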

I also like your second example, but I think it would be clearer if :spam_checker had dedicated semantics. Something like:

    class Module
      def import(name, obj); define_method(name) { obj }; protected(name); end
    end

> I think that we agree on the bottom-line: UserContentSpamChecker has no need for a state and is just some kind of namespace.

I'm not sure that we do. I agree that the current implementation of `UserContentSpamChecker#is_spam?` doesn't need state, but I don't agree with your conclusion that this distinction should be made obvious to clients, who surely just care that they have a thing that will check for spam, to which they can pass content. They don't care if it's stateful or not, as long as it accepts strings and returns booleans. Why are we trying to tell them that this particular method is stateless?

After all, even that offers a false guarantee. In your implementation, `#is_spam?` is just another method on an object instance - in this case an instance of class Module, referred to by the global constant UserContentSpamChecker. I can happily use instance variables in such a method:

    module Stateless
      extend self
      def no_state_here!
        @thing ||= 0
        @thing += 1
      end
    end

    > Stateless.no_state_here!
    => 1
    > Stateless.no_state_here!
    => 2
    > Stateless.no_state_here!
    => 3
Your implementation resists future refactoring, forces clients to create a hard dependency on a global constant, and doesn't make the guarantee you intend it to convey. We agree that the current implementation doesn't need an object instance, but can you explain what is actually better about avoiding one? Ruby is an object-oriented language, after all; objects are its common currency. It's not like we're introducing the Strategy pattern for a four-line method, we're just using Ruby as she is wrote. :-)

p.s. sorry vidarh, I see you've covered some of this already - serves me right for half-composing a reply then wandering off...

It's all about semantics. As a client, I like to know if a method is purely functional or if it's going to have side effects, because I care to know whether the operation is going to be re-entrant or not. Also, methods with side effects have performance implications if they use I/O. The semantics should be used to convey that message.

Obviously, if my future spam checker is going to introduce a side effect (e.g. Akismet), I am going to refactor my code and no longer use a class method. In any case, I don't see how it would "resist future refactoring". Grepping for UserContentSpamChecker is not exactly rocket science.

> As a client I like to know if a method is purely functional or if it's going to have side-effects because I care to know if the operation is going to be re-entrant or not.

But your implementation doesn't convey this information. This simply isn't a guarantee you're going to obtain with Ruby; you have to inspect the code. A class method can do pretty much anything it likes; it could rewrite Object#method_missing if it wanted to. It can certainly store state, as I demonstrated. What's worse, it's global state, because every client is accessing a shared instance via a global constant.

As for refactoring, search and replace isn't difficult (although it is error-prone); the important point is that it's introducing unnecessary change. All that's changing is the internal implementation of your spam checker, and yet that change propagates to every single place your spam checker is used, all because of the way you implemented it originally. This doesn't seem like a good trade-off to me, given that we're not even obtaining a functional guarantee in return.

It feels like we're on totally separate universes.

I don't understand why you insist that shooting yourself in the foot is any argument. My goal is to convey a semantic to NOT have to read the code; obviously if we don't follow the same conventions it won't work.

I also don't understand how something being global is bad. A namespace is exactly that: global. It's as if you argued against associating classes with constants because it makes them global. The issue comes when your class methods have side effects, like Time.now or User.find, because they make your tests harder to build. But that's not what I'm proposing.

Finally, I understand the value proposition that you have regarding refactoring: if you change the implementation and keep the same interface, then you don't need to touch the lines where the interface is used. But introducing a side effect rarely comes without parameters, and because your lines look like SpamChecker.new(foo).spam? there is no way to introduce the new parameters without either changing the lines in question or using a global. As an example: using Akismet would require an API key. How do you introduce it without touching your interface?

> It feels like we're on totally separate universes.

It does indeed. Never mind, it's happened before and will no doubt happen again. :-)

If all you're proposing is a coding convention, then I don't see why you're surprised/annoyed that the OP doesn't obey it; it's vanishingly rare. You're placing a significant structural constraint on your code that conveys only optional semantics, and ones that will apply to virtually no third-party code. Why not adopt a more rubyish convention and use bangs to denote methods that do have side effects (cf. String#gsub vs. String#gsub!)? This way you're not imposing any structural constraints on your code, only a minor naming restriction.
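Concretely, the core-library convention being referenced:

```ruby
s = "hello world"

# Non-bang method: returns a new string, leaving s untouched.
t = s.gsub("world", "ruby")

# Bang method: signals the side effect by mutating s in place.
s.gsub!("world", "ruby")
```

The bang is only a naming signal, not an enforced guarantee, but it is one most Ruby programmers already know how to read.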

Re: refactoring, constructor params are only one possible change we might make. Our checker might store previously-seen content in an ivar, so that the first time it sees the content "hey sexy, why not call me?" it allows it to pass, but if it sees 1000 such messages in quick succession, it flags them as spam.

However: yes, changes to object instantiation will require changes to client code (although, as I pointed out earlier, it's easy to provide access to a default instance via a class method, providing both convenience and flexibility). In something other than a Rails application this would be mitigated using dependency injection, so that merely being a client of a particular object does not mean you have to know how to instantiate it. Unfortunately as Rails monopolises object construction for its own infernal purposes this isn't really possible, so you end up with unpalatable things like this knocking around all over the place:

    class GuestBookEntry < ActiveRecord::Base
      def spam_checker
        @spam_checker ||= UserContentSpamChecker.new(...)
      end
    end
While not ideal, this at least localises changes to one site in each client class, and one whose concern is directly related to the change you're making.

I don't think we'll have an agreement so I'll stop there :)

Every time I see something like SomeClass.new(val).some_method it makes me cringe. For me it's as if you would refactor Math.sqrt(2) to SquareRoot.new(2).value in case you want to change the implementation in the future.

Yeah, my immediate reaction was, "So, this is a post about mixins." And then it was, "But... where are the mixins?"

The tldr of the article is to write modular code. This is great advice, I hope most intermediate software engineers already live by this.

If I recall my CS history correctly, Multics/Unix was one of the first major software projects to embrace the modular philosophy. But modularity is a natural approach when designing large systems, and has been embraced by electrical and mechanical engineers for far longer than software systems have existed. Modularity even extends to processes; Henry Ford made his millions by modularizing the assembly process of his cars.

... and hardware engineers as well. It is the use of hardware pipelines that makes GHz processors possible nowadays. It is a really wide concept that can be applied to several kinds of processes.

And again on hardware, piping with message boxes in RTOS's is a normal facility. The language in the pipe is not defined, unlike un*x, though serialisation protocols may be. Examples over net layers might be Thrift and Protocol Buffers. A shift to serialisation protocols in hardware such as JESD204B is an example at the hardware level.

The post is useful as a learning device for thinking about design approaches. I wouldn't call it fundamentally object orientation though, any more than I would RS232.

The main issue is that we are still in a learning process, especially because many people come to our industry without the required experience with regard to the aforementioned industries.

It's a nice idea, but look at how much code you have to write to obtain this decoupling.

The "mental" abstraction is nice to have, but when you end up typing more characters, it's just counterproductive.

Once you start repeating yourself, then it makes sense. To do it from the start is overengineering—"you ain't gonna need it".

Your argument taken to the extreme leads to code golf.

Sure abstraction is worth it. You get code reusability, better unit testing, and faster understanding of the code.

Maybe a small script doesn't need it, but once something starts growing past a few hundred lines you're better off.

Nobody suggested taking his argument to the extreme.

A larger script might indeed need it, but knowing when you need abstraction and when you don't is key. In this case, YAGNI.

It's worth it for most things you'll end up maintaining. How many of these "one-shot" scripts have stayed and grown to full-size systems, preferably by keeping the same hastily thrown together structure over the years?

> once something starts growing

Is a good time to refactor.

Maybe it's because I've been reading about DCI lately, but I think this example could go even further in moving business logic out of the AR model.

    class GuestbookLibrarian
      def initialize(rate_limiter, tweeter, spam_checker)
        @rate_limiter = rate_limiter
        @tweeter = tweeter
        @spam_checker = spam_checker
      end

      def add_entry(name, msg, ip_addr)
        raise PostingTooSpammy if @spam_checker.is_spammy? msg
        raise PostingTooFast if @rate_limiter.exceeded? ip_addr

        entry = GuestbookEntry.create(:name => name,
                                      :msg => msg,
                                      :ip_addr => ip_addr)
        @tweeter.tweet_new_guestbook_post(name, msg)
        entry
      end
    end

    class GuestbookController < ApplicationController
      ... SNIP ...
      rescue_from PostingTooSpammy, :with => :some_spam_handler
      rescue_from PostingTooFast, :with => :some_other_spam_handler

      def create
        # maybe these should be globals in an initializer somewhere if we
        # use them elsewhere? or in a :before_filter at least :)
        rate_limiter = UserContentRateLimiter.new
        tweeter = Tweeter.new
        spam_checker = UserContentSpamChecker.new

        librarian = GuestbookLibrarian.new(rate_limiter, tweeter, spam_checker)
        entry = librarian.add_entry(params[:name], params[:msg], params[:ip_addr])
        redirect_to entry, :notice => "Thanks for posting"
      end
    end
Something like that? Too Java-y? Feedback appreciated :)

Interesting approach. It'd be a bit surprising in a Rails app, however, since models usually handle their own validation.

That said, the fact that GuestBookEntry doesn't have validation as a concern does make it simpler.

I might refactor to something like this if a particular model had extremely complicated validation logic, but wouldn't ever do this as a first pass.

I think in general models can handle their own validation (especially if you use AR), but I guess the difference is that these validations hit external "services", instead of like a :max_length validation.

I agree on the first pass comment - I imagine that the user stories went something like:

    * v1 - An anon user can post to the guestbook
    * v2 - Guestbook comments are checked for spam
    * v3 - Post guestbook comments to twitter for Web 2.0-ness
And this is the refactored version after v3

Not too Java-y, but too "DataMapper-y" for most Rails devs, I think (the pattern, not the ORM - the Ruby DataMapper ORM implements the ActiveRecord pattern, though DataMapper 2 actually finally will implement DataMapper)

EDIT: This is an interesting overview of where DataMapper 2 is going as a parallel to your example: http://solnic.eu/2012/01/10/ruby-datamapper-status.html

Yet the refactored example is still showing application domain logic being built within the web framework. For simple applications that are only ever going to be delivered via the web and only ever rely on one particular web framework this might be ok. That being said, I'd like to see movement away from this back to using web frameworks or anything else simply as the communication platform between your application and the user.

Good point. There's some good discussion in the Rails community about ports-and-adapters architecture around that theme. Matt Wynne gave a talk at goruco this past year on the subject that's worth checking out: http://www.confreaks.com/videos/977-goruco2012-hexagonal-rai...

The Microsoft Word example is not entirely accurate. Most office applications in fact have well-defined interfaces through OLE [1]. This works from several languages, like Perl, VB, Delphi or Python.

[1]: http://www.adp-gmbh.ch/perl/word.html

Good point. It's not because the application itself seems monolithic that the underlying structure is.

Apart from that, how on earth would the author go about teaching an average Joe-that-wants-to-type-a-letter that instead of opening a single program and doing everything he can imagine, he now has to use 100 separate programs, each dedicated to a certain task?

Wow, I didn't realize the McIlroy who critiqued Knuth's famous literate program also conceived of shell pipes in the first place: http://www.leancrew.com/all-this/2011/12/more-shell-less-egg

I don't particularly see the unix philosophy ever shining through in object oriented design. The problem is that objects are too inflexible - you can't just glue methods together, piping arbitrary objects into methods like you can do with data at a command line prompt.

While this is certainly useful, it's pretty basic. Something that I find is a much closer fit to UNIX pipes is iterators. UNIX pipes work similarly in that all of the commands in the pipeline are executed in "parallel" and the OS passes data incrementally between each process.

I primarily work on a Python codebase, and I've found that using iterators for complex, fault-tolerant data pipelines allows decoupled design without many of the performance & additional complexity drawbacks often encountered with cleanly abstracted, decoupled code. For example, when executing a multi-get for objects by primary key, the pipeline looks roughly like this:

1. Fetch from heap cache

2. Fetch from remote cache

3. Fetch from backing store

4. Backfill the heap cache

5. Backfill the remote cache

6. Apply basic filters (e.g. deleted == False, etc)

At each step there are usually two or three layers of abstraction underneath. Much of the space requirements, and some of the overhead time at each step can be collapsed to O(1) instead of O(N).

For example, a cache multiget abstraction on top of memcache might look something like this:

    def deserialize_user(serialized_user):
        return json.loads(serialized_user)

    def build_prefixed_memcache_key(prefix, key):
        return "%s:%s" % (prefix, key)

    def get_users_from_remote_cache(user_keys):
        cached_users = get_from_memcache(user_keys, "user")
        deserialized_users = {key: deserialize_user(value) for key, value in cached_users.iteritems()}
        return deserialized_users

    def get_from_memcache(keys, prefix):
        rekeyed = {build_prefixed_memcache_key(prefix, key): key for key in keys}
        from_memcache = []
        for chunk in chunks(rekeyed.keys(), 20):
            from_memcache.extend(memcache_mget(chunk))
        unkeyed = {rekeyed[key]: value for key, value in from_memcache}
        return unkeyed
Notice how at each step there is a large amount of "buffering" that causes allocation, copying, and quite a bit of additional work. Each layer of abstraction adds a pretty large cost to the step. Using an iterator implementation, we can clean up this code and make it more performant:

    def get_users_from_remote_cache(user_keys):
        for key, user in get_from_memcache(user_keys, "user"):
            yield deserialize_user(user)

    def get_from_memcache(keys, prefix):
        for chunk in ichunks(keys, 20):
            rekeyed = {build_prefixed_memcache_key(prefix, key): key for key in chunk}
            for key, value in memcache_mget(rekeyed.keys()):
                yield (rekeyed[key], value)
It's clear how much cleaner this code is. Notice how this snippet avoids the large amount of "buffering" of data between steps and short-circuits quite a bit of code when possible (for instance, if all of the fetches miss). In real code that's heavily abstracted & layered, avoiding all of this work translates into significant performance & cost advantages.

Iterators also allow building portions of the pipeline with built-in functions that avoid the interpreter.

    def deserialize_users(users):
        return imap(pickle.loads, users)

    def get_users_from_remote_cache(user_keys):
        cached_users = get_from_memcache(user_keys, "user")
        only_values = imap(itemgetter(1), cached_users)
        return deserialize_users(only_values)
This block of code executes in only one pass through the interpreter (imap, itemgetter, and pickle.loads are all implemented as native functions). This is incredibly powerful because it means that iterator-based abstractions can be built by combining these native building blocks without the overhead of recursion within the interpreter at each step.

Pushing all the recursion into native code:

    def gprefix(prefix, delimiter):
        prefix_with_delim = prefix + delimiter
        prefixfn = partial(add, prefix_with_delim)
        unprefixfn = itemgetter(slice(len(prefix_with_delim), None))
        return prefixfn, unprefixfn

    def memcache_mget_chunked(keys):
        chunks = ichunks(keys, 20)
        result_blocks = imap(memcache_mget, chunks)
        flattened_results = chain.from_iterable(result_blocks)
        return flattened_results

    def get_values_from_memcache(keys, prefix):
        prefixfn = partial(add, "%s:" % prefix)
        prefixed_keys = imap(prefixfn, keys)
        key_pairs_from_cache = memcache_mget_chunked(prefixed_keys)
        values_from_cache = imap(itemgetter(1), key_pairs_from_cache)
        return values_from_cache
* I apologize for any code errors; this code wasn't tested in its entirety.

(Minor nit: the yield-based functions are generator functions, not iterators.)

This approach was also described by David Beazley at PyCon, the slides are available at http://www.dabeaz.com/generators/Generators.pdf. He extends this approach to generator multiplexing, coroutines etc., along with useful examples. An excellent read.

This is the principle of always creating loosely coupled components that can be glued together.

I think the OO community has already learned that interfaces, categories, traits, mixins, or whatever they are called in your favorite language, favor more re-usability than plain inheritance.

The trick is to learn when inheritance, delegation or composition make sense.
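A rough illustration of that distinction (hypothetical classes): inheritance models an is-a relationship, while delegation/composition hands work to a collaborator that can be swapped out.

```ruby
require 'forwardable'

# Inheritance: an AdminUser *is a* User and gets its behaviour wholesale.
class User
  def greet
    "hello"
  end
end

class AdminUser < User
end

# Composition + delegation: a Report *has a* formatter and forwards to it.
class UpcaseFormatter
  def format(text)
    text.upcase
  end
end

class Report
  extend Forwardable
  def_delegator :@formatter, :format

  def initialize(formatter)
    @formatter = formatter
  end
end
```

The composed version lets you inject a different formatter (or a test double) per instance, which inheritance can't do without new subclasses.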

As with everything in life, it takes time to properly learn them, with lots of failed experiences along the way.

Quite a long (although interesting) article. tldr version: http://tldr.io/tldrs/50b62f8abb22039977000471

> write programs that do one thing and do it well. Write programs to work together.

ASIDE: Just been reading The Wealth of Nations, and Smith talks about the "division of labour", which is similar to this specialisation. By concentrating on one task, a workman can increase his dexterity at it, not waste time switching between tasks, and find ways to do it better. "Do one thing and do it well" is a pithy exaggeration of this specialisation.

This division of labour is only possible because of trade: other workmen do the tasks that you aren't doing, and you barter with each other to get what you each need. Because you specialised, you each do your own task more efficiently, so you are both better off. Programs that "work together" is similar, because if you couldn't use another program, you'd have to include it in the first program. "Programs" would also include 3rd party libraries and frameworks I think.

He goes on to say that one limitation on how much labour can be divided is the size of the market: you need to be able to trade what you make. If people don't need much of what you produce, you've over-specialised. However, if you have access to a large enough market, then in aggregate, with many people using a little bit, you can survive. Larger markets therefore allow more specialisation, and therefore more wealth. Rich civilisations grew up around navigable rivers (especially with canals) and inland seas (calmer and safer to navigate than the open ocean), because water-carriage facilitates trade with more people (a larger market), over larger distances, at lower transport cost and less risk.

Does the analogy fall down with free programs, since you won't go out of business just because few people use it? Open source projects do seem to need users, for encouragement, bug reports, spreading the word, contributing bug fixes etc - users pay in attention, not money. Without attention, projects die.

Does the analogy fall down when it's the same person writing all the separate programs? You can certainly specialise, and therefore do a better job; and also "exchange" data between those parts. (Provided of course that specialisation is actually more efficient, and outweighs the costs of exchange). But the motivation doesn't apply in quite the same way, since the parts of the code don't get paid - not in money, not in attention. You're more like a communistic planned economy (or, within a firm).

But the interesting point I've been leading up to is: does a larger market for programs lead to greater specialisation? For example, Java has a large number of users, and has a ton of 3rd party programs - and that does seem to include very specialised libraries.

Or, are other factors at work, leading to more one-package programs, such as Word; or dividing the market into smaller ecosystems, such as Java itself, other languages like Python/Ruby/PHP, and even frameworks, like RoR? There are barriers between these "markets", in the extra difficulty of using a library from one platform in another. For JavaScript, for example, I get the impression there are many libraries doing the same web/forms/templating work. While competition is healthy, and these frameworks are definitely improving on each other, they aren't the "division of labour" of interest to this particular discussion.

In programs, does a larger "market" of fellow coders result in more specialisation in the "one thing" done well?

A program doesn't get better by doing the same task over and over again. (And I disagree with A. Smith anyway: a factory worker doesn't choose what process he is going to use to achieve his task. No matter how many times he executes his task, the way he has to work is decided by his boss. A. Smith has been proven wrong on many things, like the invisible hand.)

I've never read The Wealth of Nations, but Smith wrote before the industrial revolution had taken hold. So when he spoke of workers, he was referring to tradesmen and craftsmen. For them, specialization does allow them to get better by performing the same tasks over and over again.

In a way, the industrial revolution just changed the scope of his arguments, but I think the basic idea still holds. Now, instead of thinking in terms of a single worker, we could instead think in terms of a single factory. Each factory, comprised of many workers, would then be able to perform one specialized task well, and trade for the rest. It would be up to each factory to organize itself in an optimal manner.

In thinking in terms of a program, it wouldn't be the program that gets better, but the programmer. If they write a program to perform task X multiple times, that program will inevitably get better with each iteration. So, ideally, what you end up with is a series of optimized programs that can work together in different configurations in order to perform different tasks in the most efficient way.

At least, that would be the theory. I'll leave discussions of the invisible hand of the market for later :)

Smith's view of specialization is not the same as today's "interchangeable cogs" view of disposable labor. He's talking about skilled craftsmen and tradesmen who can both use specialized machinery and acquire skill in its use, as well as avoiding the task-switching costs of single-person fabrication. Smith talks of market clearing not in hours or days, but months and years. His view of labor mobility tends to be generational rather than of a person having multiple careers in a single lifetime, as is the present fashion.

And while a given software program on its own won't generally improve performance over time (absent some learning and adaptive algorithms, though these are still in very limited use), in an appropriate (e.g.: Free Software or other iterative / continuous improvement) environment, coders will make incremental changes to code, which is far easier to accomplish when a program does a single, clearly identifiable task (or can be divided into modules which perform similarly).

Smith's use of the "invisible hand" analogy is a grossly misinterpreted and minor element within The Wealth of Nations.

Code that is organized so that it minimizes context switches is often faster and more efficient than code that doesn't.

E.g. (with doA/doB/doC standing in for three independent per-element tasks) sometimes this code:

  foreach (x in xs)
    doA(x)

  foreach (x in xs)
    doB(x)

  foreach (x in xs)
    doC(x)

is faster/more efficient than this:

  foreach (x in xs)
    doA(x)
    doB(x)
    doC(x)
I disagree. When I write a program, I find more and more bugs the more often I run it. Just as the tradesmen refine their skills, I'm refining my program. (Both are our outputs.)

In addition, by being small and focused, my program will expose more of its bugs for each run because a higher proportion of the code is exercised. (Assuming that we're not talking about the exact same inputs, just like a tradesworker has some variability to work within.)

Well, maybe the author should have figured out that the way Unix works is closer to functional programming than to object-oriented programming before writing this blog post.
