Software reuse is more like an organ transplant than snapping Lego blocks (2011) (johndcook.com)
667 points by nilsandrey 12 days ago | 228 comments

This makes me realize: the one example I can think of where software has been "Lego-like" is Unix piping. I'm not a Unix purist or anything, but they really hit on something special when it came to "code talking to other code".

Speculating about what made it stand apart: it seems like the (enforced) simplicity of the interfaces between pieces of code. Just text streams; one in, one out, nothing more, nothing less. No coming up with a host of parallel, named streams that all have their own behaviors that need to be documented. And good luck coming up with a complicated data protocol built atop your text stream; it won't work with anything else, and so nobody will use your program.

Interface complexity was harshly discouraged just by the facts-on-the-ground of the ecosystem.

Compare that with the average library interface, or framework, or domain-specific language, or REST API, etc. etc. and it becomes obvious why integrating any of those things is more like performing surgery.
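To make the contrast concrete, here's a sketch of the kind of thing the enforced text-stream interface makes trivial (the log lines are invented for illustration):

```shell
# Each stage reads plain text on stdin and writes plain text on stdout;
# none of them know, or care, what the others are.
printf 'error: disk full\nok\nerror: disk full\nok\n' \
  | grep '^error' \
  | sort \
  | uniq -c \
  | sort -rn
```

None of these four tools were designed with the others in mind, yet they compose with no adapters beyond the pipe itself.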

I think the best way to describe pipes to the uninitiated is in terms of copy & paste.

Copy & paste is the basic, ubiquitous, doesn't-try-to-do-too-much IPC mechanism that allows normal users to shovel data from one program into another. As simple as it is, it's indispensable, and it's difficult to imagine trying to use a computer without this feature.

The same applies to pipes, even though they work a little bit differently and are useful in slightly different situations. They're the "I just need to do this one thing" IPC mechanism for slightly more technical users.

> difficult to imagine trying to use a computer without [copy & paste]

Remember the first iPhone? I wasn't into it, but a bunch of my (senior developer) colleagues were. I asked them how they lived without copy & paste and they all told me it was just no big deal.

A phone and a computer are used in very different ways. Even now, I probably use copy/paste once a week on my iPhone, and at least once an hour on my laptop, often more.

The Apple “Universal Clipboard”, as well as AirDrop, is another example of this: the ability to easily copy text and files from one device to another is game-changing compared to the infrastructure you need to run on other systems.

I use copy and paste on my phone constantly.

I would guess the reason you don't do it much on an iPhone is the lack of multitasking support.

Android alt tabbing has always been a basic part of life and now split screening is a thing too.

You lot getting widgets soon, maybe you'll get alt tab next year?

I use Android and iOS constantly (mobile dev), and iOS's multitasking on the iPhone X is as easy as Android's (swipe the bottom of the phone, the same action as double tapping the app switcher icon on Android). Same with split screen; I use it all the time on iPad.

What’s alt+tab without a physical keyboard?

An easily-understandable reference for non-Android users. One of Android's buttons (the square) goes to the app switcher, it functions as alt+tab on its own.

we have that on iOS though, it's the swipe up and/or to the right gesture. It's honestly the fastest way to switch between apps on any device I've ever used. the android dedicated button feels clunky in comparison.

I understand it's a "hidden gesture" and thus all the UX complaints apply, but from pure observation most Android users use the app switcher about as often as PC users use alt+tab... basically not as often as you think.

Most mobile phone users who are even slight technophobes will switch apps the old way, from the home screen or app drawer. At least with the iPhone, the "going home" behavior can accidentally surface the app switcher, which after enough exposure might train such users to use it.

Idk, I kinda hate all the UX paradigms that Android introduced. It often felt like they ignored much of what Xerox PARC taught as objectively good UX and tried to go it alone, only to relearn all the same mistakes that Mac and Windows learned over the last 3 decades.

I do it multiple times a day on my Android. I've even taken to using ctrl-c/x/v in Hacker's Keyboard because it's faster.

I've used copy/paste on my phone at least ten times today, and it's barely lunchtime.

> iPhone > no copy paste

(ignoring everyone young enough to not remember that the iPhone didn't have copy & paste for its first few years)

The iPhone NEVER replaced a computer.

It replaced the television. No need for copy paste there.

The iPad is today trying, very shyly, to hint at replacing a couple of small use cases of the computer.

They did feel the need to implement copy & paste a year or two later, so something seems off here.

Rich Hickey addresses this in his famous 2011 talk, titled Simple Made Easy.

> Are we all not glad we don’t use the Unix method of communicating on the web? Right? Any arbitrary command string can be the argument list for your program, and any arbitrary set of characters can come out the other end. Let’s all write parsers.

The way that I think about this is that the Unix philosophy, which this behavior is undoubtedly representative of, is at one end of a spectrum, with something like strict typing at the other end. Rich, being a big proponent of what is described in the article as "Lego-like" development, clearly does not prefer either end of the spectrum, but something in between. In my opinion as well, the future of software development is somewhere in the middle of this spectrum, although exactly where the line should be drawn is a matter of trade-offs, not absolute best and worst. My estimation is that seasoned developers who have worked in many languages and in a variety of circumstances have all internalized this.

> Let’s all write parsers

And yet, at least for the Unix tools I've used, nobody did write an elaborate parser. Instead, they all ended up using newlines to represent a sequence. There were never really nested structures at all. That least-common-denominator format they were forced into ended up making input and output really easy to quickly understand.

Maybe the problem with REST APIs is that JSON does too good of a job at making it easy to represent complex structures? Maybe we'd be better off using CSV as our data format for everything.

Agreed. Thinking in arrays is very powerful. Simple.
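The newline-separated line really does act as the one "array" type every tool agrees on, without any shared schema. A trivial sketch:

```shell
# sort, head, and wc all understand the same sequence type:
# one element per line. No schema, no negotiation.
printf 'cherry\napple\nbanana\n' | sort | head -n 2 | wc -l
```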

Say Unix tools returned JSON. What good is that without a schema? Instead of a parser, let's all write schema transforms?

> And yet, at least for the Unix tools I've used, nobody did write an elaborate parser.

sed and awk are the elaborate parsers that get used in complex Unix command-line scripting.

I think what the OP is getting at is that the people writing the other command line tools didn't have to write that logic (and thankfully we can instead centralize it in a limited set of tools like sed and awk)

I may be mistaken, but I assumed that they meant that no parsing needed to be done between components, and that they just stream together, which is not really the case. You often have to use sed and awk (elaborate parsers) between different components to get the necessary portions of data from one component into the other... they don't just fit together like Legos.

Yeah but sed and awk are amazing “sharp tools” for this sort of task: they have a learning curve, but 100 characters of awk can do some amazing stuff.
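For a sense of what a small amount of awk buys you, here's a one-liner well under 100 characters (the field names and data are invented) that aggregates a column by key:

```shell
# ~50 characters of awk: sum column 2, grouped by column 1.
printf 'disk 10\ncpu 5\ndisk 7\n' \
  | awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' \
  | sort
```

The trailing sort is there only because awk's for-in iteration order over the associative array is unspecified.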

There's some truth to that, although there's still something to be said for a lack of recursive structure. Regular expressions can only match a flat pattern. Working with a JSON expression, even if you already have a JSON parser on hand, is often going to add more complexity than parsing a flat line of raw text using a regular expression.

What trade-offs make the most sense are highly dependent on the problem being addressed. If there were ever an instance where text being the universal input and output type made sense, it would be in a Unix environment. That being said, what percentage of shell scripts in the wild would you estimate contain at least one use of grep or a regular expression in general?

Also, tools like jq, xsv and xmlstarlet really solve this so-called “problem”: you just need a widely-supported serialization format and good tools that can focus on the subset of structured data you need and pull it out into easily-manipulated text.
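A sketch of that workflow with jq (assuming it's installed; the JSON shape and field names here are invented): the structured parsing happens once, and everything downstream is back in flat, line-oriented text.

```shell
# jq handles the structure; the rest of the pipeline sees plain lines.
echo '{"servers":[{"host":"a.example","up":true},{"host":"b.example","up":false}]}' \
  | jq -r '.servers[] | select(.up) | .host'
```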

> Instead, they all ended up using newlines to represent a sequence.

The biggest mistake in the design of Linux is that they allow newlines in filenames. Seriously, what the heck?!

> Are we all not glad we don’t use the Unix method of communicating on the web? Right? Any arbitrary command string can be the argument list for your program, and any arbitrary set of characters can come out the other end. Let’s all write parsers.

Um, but that's exactly what we do use on the web. Oh, sure, there are some really popular formats for input and output with widely available parsers (HTML, XML, JSON), and HTTP itself (as well as other protocols) specifies headers and other formats that need to be parsed before you get to those body formats, including telling you what those headers say about the other formats, so you know which parsers you need to write, or use if you can find one written for you.

There is also an audience mismatch. Most of the criticism of pipelines is from professional programmers.

A lot of the value of scripting, pipes and associated concepts is low barrier to entry.

I agree with the value of strong typing. But I also remember how infuriating it was to learn to work with a type system when I was learning to write code.

When I need a quick answer of some sort, iteratively piping crap at a series of "| grep | awk" is exactly what the doctor ordered. Sure, I could bullet-proof it nicely and make it reusable by investing the time to write it in something saner, but there's zero reason to - I'm not likely to ever want to perform the same action again.

> In my opinion as well, the future of software development is somewhere in the middle of this spectrum

Unfortunately this is complexity in and of itself. I don't disagree that different cases require different tools; however, the split should be strongly weighted in one direction. Mostly IPC over pipes with a few exceptions, mostly REST and JSON with a few exceptions, mostly language X with a few exceptions. Everyone will have their own preferences, but I think it's important to pick a side (or at least mostly pick a side) or else you accept chaos.

I think of it as a matter of scale.

- At the scale of 1,000 to 1M lines of code, authors/programmers have some control, and you're dealing with a time scale of months or years.

- At the scale of 10M to 100M lines of code, no organization is an author with complete control, and you're dealing with a time scale of decades.

In the latter case, you can't make global changes to the system, so you end up with glue in the form of pipes and textual data dumps.

You're going to have some chaos because you can't go and change every system to be consistent.

So I would say the future of software development is at BOTH ends of the spectrum, not somewhere in the middle.


Good material on the subject: https://www.dreamsongs.com/Files/DesignBeyondHumanAbilitiesS...

If you look at the code in a web browser or operating system you will see this "lack of control". Once a codebase reaches a certain size, there's only so much you can do with it, and you end up with loosely coupled glue around it.

I think a reasonably minimal Debian system has to have somewhere between 100M and 1B lines of code in it, and that whole thing becomes a blob that you dump on a virtually free computer like a Raspberry Pi, etc.

I think if you look at the software that banks and airlines run on you will see the same thing. There's some code from the 60's or 70's written in some weird language at the heart of it. You don't really get to architect the system; it's more accurate to say that the system places its constraints on you.

> Are we all not glad we don’t use the Unix method of communicating on the web?

Uzbl is a collection of "web interface tools" that adhere to the Unix philosophy, that come together to create a browser.


> Are we all not glad we don’t use the Unix method of communicating on the web? Right? Any arbitrary command string can be the argument list for your program, and any arbitrary set of characters can come out the other end.

He clearly uses a different web to the one I use. >.>

In practice it’s totally inscrutable. I never remember or even feel comfortable guessing at anything more than the most basic. Meanwhile, any typed library in language X usually works immediately with no docs given a decent IDE.

I would argue that it thrives in those most basic cases, and isn't really suited to building truly complex systems. But I also don't think there's anything wrong with that. There's a use-case for simple pieces that are easy to snap together, and I think that use-case has been greatly under-served because lots of things that aim for it end up as complicated, multi-faceted APIs.

You could almost say that micro-services are trying to follow in the Unix tradition. But the problem is that a) they don't really get used in that ideal, small-scale use-case because they're almost always written and consumed internally, not exposed to the public, and b) they do get used in those huge, complex cases where their lego-ness stops being a virtue and starts being a liability.

> starts being a liability

SaaS is all the rage now, and the business of SaaS is disincentivized from making "lego-like" software. The easier it is to interoperate, arguably the easier it is to replace or reimplement your service.

Because a lot of the value of unix tools is derived from their interoperability, they play nice with each other.

Yeah, that is part of the problem too. Any for-profit corp would love to lock people in, and software megacorps no longer need to interoperate for their own survival. So we see increased lock-in across most of the (commercial) software world.

An interesting inverse-example is Slack, which does lots of interop with other services, because it isn't a trillion-dollar company that can attempt to give you everything you need under a single roof.

> I would argue that it thrives in those most basic cases, and isn't really suited to building truly complex systems

Right, just like LEGOs thrive in simple cases and are not suited for building complex systems.

That is why the idea of "Software LEGOs" is not reality. Or you can say that Unix Pipes are "software LEGOs", with similar limitations.

In your practice you have favored an alternative. (Not "in practice", which would imply an absolute.)

In my practice I have used both with great success. For logging, parsing, and displaying playback data from field systems, UNIX and UNIX-like tools have been incredible. VNLog in particular is a wonderful way to interact with data if you need just a bit of structure on top of unix outputs.

And anyway, getting from not-so-great data to something that a typed library in language X can parse is a great job for plain old Unix tools.

I’m saying in the sense of “time I put in to value I get out”, I’ve put quite a lot of time into bash, but even though I’ve done certain actions many times, I remember few.

I recently picked up Swift and it was a dream. Never used it in nearly the same degree, but I was highly productive quickly.

The thing is: I don’t think it’s a bad paradigm (everything a stream of text), it’s just the “IDE” sucks completely. If there was inline autocomplete, peek-ahead results, a “verbose mode” where the options are all more like typed full-English objects, instant fuzzy search of actions, multi line with an easy way to edit, etc, it would probably be fantastic and I’d be a 100x shell developer. It would also basically feel like an IDE at that point for a normal programming language!

In practice it's been massively successful for decades.

I think piping works well because it's an opinionated framework for IPC. The strong opinions it holds are:

1) data is a stream of bytes

2) data is only a stream of bytes

That's it. And it turns out that's a pretty powerful abstraction... except it requires the developer to write a lot of code to massage the data entering and/or leaving the pipe if either end of it thinks "stream of bytes" means something different. In the broad-and-flat space where it's most useful (text manipulation) it works great, because every tool agrees on what text is (kind of... pipe some Unicode into something that understands only ASCII and you're going to have a lousy day). When we get outside that space?

So while, on the one hand, it allows a series of processes to go from a text file to your audio hardware (neat!), on the other hand, it allows you to accidentally pipe /dev/random directly into your audio hardware, which, here's hoping you don't have your headphones turned all the way up.

This example also kind of handwaves something, in that you touched on it directly but called it a feature, not a bug: pipes are almost always the wrong tool if you do want structure. They're too flexible. It's way the wrong API for anything where you cannot afford to have any mistakes, because unless you include something in the pipe chain to sanity-check your data, who knows what comes out the other end?
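One mitigation is to make the sanity check itself a stage in the chain. As a sketch (the "name number" record format here is invented), a one-line awk filter can drop anything that doesn't match the shape the downstream consumer expects:

```shell
# awk as a validator stage: only lines shaped like "name number"
# survive; malformed input is dropped before it corrupts the output.
printf 'alice 42\ngarbage!\nbob 7\n' \
  | awk 'NF == 2 && $2 ~ /^[0-9]+$/' \
  | cut -d' ' -f2
```

It's still on you to remember to add that stage, which is exactly the point: the pipe itself enforces nothing.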

A bit of a tangent, but... audio hardware can't really be treated as a stream of bytes either.

You used to be able to cat things to /dev/dsp, but that used something like 8 kHz, 8-bit, mono audio. That's horrendous. With just a stream of bytes, you have to settle for the least common denominator: /dev/dsp had IOCTLs to set sample rate, number of channels, and bit depth, but with just a stream of bytes, you can't express any of that.

Similarly, video data via /dev/fb0 - AFAIK you don't even have defaults there to rely on; to display anything useful you need to do IOCTLs to find out about its format.

When do you not want structure? Seriously-

Plain, human-readable ASCII text is maybe a candidate - but even then there's implicit structure (things like handling CR/LF, tabs...)

Unicode text? You know that's structure. (Ever had a read() call return in the middle of a UTF-8 multi-byte sequence?)

CSV? That's structure.

Tab-separated columns? That's structure.

Fixed-width columns? Also structure.

You don't get to not have structure. Structure is always there. The question is whether you get to have a spec for your structure, or whether it's just "well, from the output it looks like column 3 here is always the hostname of the server I want, so I'll use 'cut' or 'awk' to extract it". That approach can work in practice, but...

Yes, the structure is always there. The trick with unix pipes is that interpreting the structure is always on the receiving side. I don't care if this is CSV or TSV if all I want is the number of lines. I don't care about concrete representations such as JSON or YAML if all I want is to change one string (like a hardcoded server name) to another (like a variable reference). Can you see that? How about this: why would one need a spec or schema in order to pipe an XML file through gzip?
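Two lines make the point: the same bytes flow through tools that need zero knowledge of the format, and only the final consumer interprets as much structure as it needs.

```shell
# wc counts lines whether they happen to be CSV or anything else...
printf 'a,b,c\nd,e,f\n' | wc -l
# ...and gzip round-trips XML without ever parsing a single tag.
printf '<x>1</x>\n<x>2</x>\n' | gzip | gzip -d | wc -l
```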

Yeah, I think people underestimate the value of an “escape hatch” when dealing with structured data: a Java-style API can be great at controlling downstream developers and easing library maintenance. But, as an “end-developer”, such APIs can be absolutely miserable to discover when you’re trying to quickly pull out a bit of data.

But pretending JSON or YAML is something you can safely "change one string" on is a recipe for disaster unless you have total control of the whole pipe chain, input to output.

How is it less safe than changing them in a (scriptable) text editor?

It's not. Neither is a safe approach. A safe approach to mutating JSON is something like a small amount of JS to read in the JSON, mutate it as javascript objects, and JSON.stringify the result.
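The same parse-mutate-reserialize approach is available from the shell via jq rather than JS (assuming jq is installed; the keys here are invented): the whole document goes through a real parser, one field changes, and the result is re-serialized.

```shell
# Parse -> mutate -> re-serialize, instead of string-twiddling the JSON.
echo '{"server":"old.example","port":80}' \
  | jq -c '.server = "new.example"'
```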

Good point. Perhaps it is more accurate to say that with pipes, the structure is purely up to correct semantic interpretation to maintain, whereas other abstractions carry structure alongside the payload as part of the spec of the abstraction.

Of course, one can always botch configuring that structure. You never really quite escape the rule of "garbage in, garbage out."

> That's it. And it turns out that's a pretty powerful abstraction

But that's not what an abstraction is! UNIX pipes are maximally low-level, as close to un-abstract as it is possible to get, except perhaps if it used bit-streams instead of byte-streams. It literally cannot get less abstract than that.

UNIX pipes are completely untyped, like a C codebase that uses only void* to pass all data. (Okay, some way of indicating end-of-stream is also needed, but it's still a good analogy.)

> pipes are almost always the wrong tool if you do want structure

Not almost the wrong tool, entirely the wrong tool.

You can't magically shoehorn types back into an untyped system when all the existing components assume untyped streams.

PowerShell's structured and typed object streams are more UNIX than UNIX: https://news.ycombinator.com/item?id=23423650

> You can't magically shoehorn types back into an untyped system when all the existing components assume untyped streams.

If this were true, then a typed system couldn't exist on top of machine code and random access memory, because the existing components (i.e. CPU + random access memory) have no knowledge of the types you are speaking of.

I think the assumption is false. Types can be shoehorned onto untyped data.

> But that's not what an abstraction is! UNIX pipes are maximally low-level

To put a spin on it: maximally low-level within its particular domain, perhaps, sitting on top of the transistors below them, sitting on top of quantum physics.

Thinking a bit further, an abstraction appears to be the mapping in both directions between two domains. That being said, the two domains themselves are independent and don't appear to be the abstraction.

The verb "abstract" would mean we are mapping from one domain to the other.

I think "homomorphism" is the mathematical term for what I'm describing.

What's my point? We can always go lower. :)

The pipe is itself a huge abstraction. If you need proof, read the implementation. Buffering, open vs closed, process suspension when the pipe is empty and resumption when there is data... Pipes hide a lot of complexity under the hood, even as simple as they are.

The pipe abstracts the transfer of data, not the data being transferred.

> I'm not a Unix purist or anything, but they really hit on something special when it came to "code talking to other code".

I agree, and it just hit me while reading your comment that the special thing is not just that you can plug any program into any other program. It's that if one program doesn't work cleanly with another, this enforced simplicity means that you can easily modify the output of one program to work with another program. Unix command-line programs aren't always directly composable but they're adaptable in a way that other interfaces aren't.

It's not great for infrastructure, don't get me wrong. This isn't nuts and bolts. It's putty. But often putty is all you need to funnel the flow of information this one time.

You put this very nicely.

This is why functional programming and Lisps are such fantastic development environments: you can use components (functions) that are not very opinionated about what they are acting on.

Another fascinating thing is how we as a community reacted to this simplicity. One thing in particular that I find interesting is the conventions that have been built up. Many tools don't just accept text streams but also react the same way to a common set of options and assume line-separation, among other things. None of these are defined in the interface, but they were good ideas that were adopted between projects.

It's a good analogy because you can make anything out of lego - even a car - but it won't be anything good for real use, just toys.

BTW if I was making UNIX command line today it would use LinkedHashMaps for everything instead of text streams.

Is this not the same description as a REST API? You pass in a text body and get back a text body, and everyone uses JSON instead of a complicated data protocol?

A UNIX pipeline is more like a set of operations on the same object passing through the pipeline, and typically that object is a text file. REST APIs represent an ability to interact with a repository of information with a really specific protocol, yet the response is not in that same format as the input. You couldn't pass the response of a REST API into another REST API without managing it externally.


    cat myData | toolStripsWhitespace > myData.tmp && mv myData.tmp myData

    var myData = RestApi->read(myDataRecordID)
    var mutatedData = someMutation(myData)
    RestApi->update(myDataRecordID, mutatedData)
You could of course write your ORM in a pipeline-like manner, many do. But that's got nothing to do with REST itself.

    var myData = new SomeObject(RestApi->read(myDataRecordID))

    RestApi->update(myDataRecordID, myData->toJSON())

Ah, so you're looking for something more like

    var myData = RestApi->get('/records', id=myDataRecordID)
    myData = RestApi->post('/tools/strips-whitespace', myData)
So, you can certainly write such HTTP endpoints to work exactly as you've described, but it doesn't really make sense to make these network calls when you can just run such a simple function in memory. So, we usually say make a call to an api, do whatever manipulating we want in code, make a call to another api, manipulate, and so on and so on. If we removed the ability to manipulate in code and forced you to make calls to '/tools' endpoints for all of that, then we'd be in the same place as UNIX pipelines, no?

On any old API maybe, but not a REST API. It's ignoring the core ideologies of REST, which I'll get to below the code, but despite that it's also just not the same in real world usage either:

    cat myData | stripWhitespace | allToUppercase | removeTheLetterC > myData.tmp && mv myData.tmp myData

    var myData = RestAPI->get('/records', id=myDataRecordID)
    myData = RestAPI->post('/tools/strips-whitespace', myData)
    myData = RestAPI->post('/tools/allToUppercase', myData)
    myData = RestAPI->post('/tools/removeTheLetterC', myData)
It's outside the scope of a REST API to have transformation endpoints. A REST API is supposed to allow simple access to a system's state, most often records in a database, in order to retrieve, store and update that state. REST standing for Representational State Transfer, and transformations don't fit into that at all. It's not a transfer of state from one system to another if you are giving it your data and having it manipulate it then give it back; it's asking one system to do work for you and give you the result back.

Of course in the real world, you can put whatever endpoints you want into your API, and there are dozens of wildly different APIs being called RESTful, but regardless of any of that, rarely are they conducive to being used in pipes, because the point the GP was getting at is that tools are made in the UNIX ecosystem with the pipeline in mind, and that is not the case with REST endpoints.

Thanks for the discussion by the way!

Right, I don't know if it's even appropriate really to be trying to compare HTTP APIs and Unix programs like this. As you've stated, they really are meant to serve different purposes. So the original discussion of "i wish i was building with legos" I'm not sure quite sure how you would even apply it to HTTP APIs.

I do think that there is an interesting point around "designed with pipelines in mind" though. I don't know if that would specifically make things more "lego-like", or if that would even specifically be a good thing. But it would be certainly interesting to analyze the list of "average library interface, or framework, or domain-specific language, or REST API" and evaluate each of them along those lines, and consider what it would look like if each was designed them to be more pipeline-centric.

Or maybe the original post came more from a feeling of "it's easy for me to think in terms of pipelines, so anything that forces me to do otherwise feels needlessly complicated"?

But, if we start thinking in terms of pipelines, then that's essentially what functional programming is, run a function, get this output, pass it to another function, get its output, run another function, get its output. So, maybe the love for pipelining comes out of actually a desire for a functional core. So it'd be more like "i only want to think in terms of functions at the core and not have to worry about side effects, and unix does this on a program level".

So, of course, you can't entirely get away from side effects, otherwise you'd literally be doing nothing. So we push it to the edges as far as we can. The analogy for side effects for Unix then would be `> myData` as the last step. So you could say that Unix's philosophy is to have programs that form a functional core, with there being only one possible side effect, which then only has one possible placement, which is at the end of everything. Sounds like a functional dream world.

Back to the list, if we wanted to design "average library interface" with this idea, then maybe you'd say here is a package with all the functional methods. And here is a package with all the side effect methods. And the design is to run a bunch of methods from the functional package before finally making a call to the side effect package.

If we then loop all the way back to REST APIs (and I mean REST spec this time) then I think I could make an argument that REST does in fact follow this functional core philosophy. You have GET endpoints which are your functions, that let you retrieve resources. And then you can POST against objects, which are your side effects, which let you create or update a specific resource. The spec limits the types of side effects you can make, almost to the point of making them dead simple. So, we then have a clear delineation between what is a function and what is a side effect, with a very limited and clear number of side effects that you're able to create.

Unix piping is basically functional programming.

If you ever wondered why some people are obsessed with functional programming this is the reason why:

Functional programming forces every primitive in your program to be a Lego Block.

A lot of functional programmers don't see the big picture. They see a sort of elegance with the functional style, they like the immutability but they can't explain the practical significance to the uninitiated.

Functional Programming is the answer to the question that has plagued me as a programmer for years. How do I organize my program in such a way that it becomes endlessly re-useable from a practical standpoint?

Functional programming transforms organ transplantation into lego building blocks.

The "lego" is the "function" and "connecting two lego blocks" is "function composition".

In short, another name for "Point free style" programming is "Lego building block style" programming.
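In shell terms the analogy is literal: each command is a function from text to text, and `|` is the composition operator. A toy sketch (the function names are made up):

```shell
# Two "lego blocks": each one is a function from text to text...
upper() { tr 'a-z' 'A-Z'; }
dedupe() { sort -u; }

# ...and | is how you snap them together, point-free style.
printf 'b\na\nb\n' | upper | dedupe
```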

You can't just compose two functions because they're both written in a functional programming language. The programmer has to have the foresight to make their types compatible. I think the novelty of the Unix pipe for interoperability is that they (typically) work on one agreed-upon kind of data: human readable, white-space separated. So a lot of tools "just work" with each other.

There's no reason you can't do this with functional programming, but obviously you can do it with non-functional programming too, and you could certainly fail to do this with functional programming.

Yes, but you write a short function/lambda for connecting them. Just like how you have sed/awk, or the many required flags to make any non-trivial operation "just work".

You're not free from the need to specify how to parse the strings just because your whole human readable system is stringly typed.

Nicely put. That aligns very nicely with the philosophy behind Clojure and for instance less so with Haskell. In Clojure you usually use only a few data types, often (immutable) hashmaps. A program then simply becomes a series of functions that transform data.
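A rough Python analogue of that Clojure style: plain dicts flowing through a series of small, non-mutating transformations (the record fields here are made up for illustration):

```python
def add_full_name(person):
    # Pure transformation: build a new dict rather than mutating the input.
    return {**person, "full_name": f"{person['first']} {person['last']}"}

def redact_email(person):
    return {**person, "email": "<redacted>"}

def pipeline(data, *steps):
    # A program as "a series of functions that transform data".
    for step in steps:
        data = step(data)
    return data

record = {"first": "Ada", "last": "Lovelace", "email": "ada@example.com"}
result = pipeline(record, add_full_name, redact_email)
```

Because each step returns a fresh dict, the original record is untouched and every step composes with any other step that speaks "dict".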

Don't people in Clojure try to wrap primitive types, say an Address type? Or is that not a thing?

Anytime someone mentions a nice thing about Haskell, you can be sure a Clojure programmer will appear shortly, without being previously referenced, to try and disparage them.

Not sure where the hostility is coming from. I didn’t read anything disparaging about Haskell in hencq’s comment. He/she just made a factual statement about Clojure’s approach to data types being analogous to Unix/strings when it comes to composition.

It comes with the same drawbacks (the lack of static typing is similar to strings with Unix; you have to be aware of the details of the data coming in and out because there is no type system to save you - unlike Haskell), but the benefit is easier composition.

Yeah, I didn't mean anything disparaging about Haskell at all. I only mentioned Haskell as an example of a functional programming language. I tried to make a point (apparently poorly) about the easy composition that using a limited number of (dynamic) types enables. Of course all the usual dynamic vs static tradeoffs apply. Thanks for doing a better job explaining that :-)

I don't think dynamic typing or static typing really applies.

In either case you have to compose functions with the correct types, otherwise you hit an error. The difference between dynamic types and static types is when this error occurs: run time for dynamic types, compile time for static types. Either way, some type error will always occur.

Though I agree with you if you have a limited number of types things become easier to deal with.

This is something that also works well in the nicer parts of javascript. Now nobody is ever going to accuse javascript of being a good functional language (disparage away), but at its core, you've pretty much got arrays, maps, numbers, strings, booleans and functions. An Object is really just a map of key-value pairs where some values are functions. You can .map() an array of objects to an array of strings by feeding it a function that turns an object into a string. It's really nice to build those little blocks and cobble them together into something that accomplishes something more complex.

I used to be pretty negative about JS, but TypeScript saves the day. The Lego aspect of functional programming always becomes more evident when you have static types.

I've been getting more mixed about Typescript lately. Quite often, you end up with everything being `any`, and every time you write something, the compiler makes you hunt down the missing types.

But being able to specify interfaces for lambdas is absolutely fantastic. Something like: "this function takes an object and returns a function that can take either a string or a number and returns a number" is easy to do in Typescript, and useful for figuring out where the lego block fits.
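For comparison, the same kind of signature can be spelled in Python's typing module (the names are invented for illustration, the point is that the lambda's "shape" is written down):

```python
from typing import Callable, Union

# "Takes an object and returns a function that can take either a string
# or a number and returns a number" -- spelled out as a type alias.
Getter = Callable[[dict], Callable[[Union[str, int]], int]]

def make_counter(obj: dict) -> Callable[[Union[str, int]], int]:
    # Returns a closure matching the inner part of the Getter signature.
    def count(key: Union[str, int]) -> int:
        return len(str(obj.get(key, "")))
    return count

counter: Getter = make_counter
```

A type checker (e.g. mypy) can then tell you exactly where this Lego block fits; at run time it behaves like any other function.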

Clojure's static analysis tools can warn you of incompatible types

E.g. clj-kondo will warn on (inc "hello").

It was kinda funny, and not nasty at all. Let's assume good faith here?

I mean what he describes is literally the same thing as haskell. Just untyped.

The message-passing protocol in Pure Data is whitespace-delimited, with semicolons ending commands. FUDI is underappreciated, and it enabled 'dynamic' programming of PD despite the available code having been built almost entirely with a mind towards instantiating all memory use ahead of time.

Good interfaces enable unexpected things.


If languages adopted structural typing, rather than nominal as the default, then it would be much easier to align the types (e.g. projection/renaming of columns). Most FP languages have a limited form of structural typing with tuples.
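Python's typing.Protocol is one mainstream example of opt-in structural typing: compatibility is decided by shape rather than by declared inheritance. A minimal sketch:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasName(Protocol):
    # Any object with a `name: str` attribute matches, no subclassing needed.
    name: str

class Employee:
    def __init__(self, name: str):
        self.name = name  # never mentions HasName, yet conforms to it

def greet(x: HasName) -> str:
    return f"hello, {x.name}"
```

Because matching is by shape, `Employee` "aligns" with `HasName` automatically, which is exactly the kind of alignment nominal typing forces you to declare up front.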

Objects in OCaml are structurally typed as well.

>You can't just compose two functions because they're both written in a functional programming language.

Obviously not, the types have to be compatible just like how a lego piece must be compatible.

>The programmer has to have the foresight to make their types compatible.

Just like a lego builder needs to have the foresight to see whether two lego components are compatible. The analogy still fits.

> I think the novelty of the Unix pipe for interoperability is that they (typically) work on one agreed-upon kind of data: human readable, white-space separated. So a lot of tools "just work" with each other.

The agreed upon data is just a type. Unix pipes compose functions that take in strings as input and deliver strings as output.

The space separation is arguably an oversight; the whole thing would have been cleaner if the type had been designed to represent multiple pieces of data, like a tuple of words, rather than a single string with words divided by spaces.
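The difference between implicit whitespace structure and an explicit tuple type can be sketched in Python (hypothetical helpers):

```python
def words_from_line(line: str) -> tuple:
    # The Unix convention: the structure is implicit in the whitespace.
    return tuple(line.split())

def line_from_words(words: tuple) -> str:
    # Back to the wire format; any run of whitespace collapses to one space.
    return " ".join(words)

def shout(words: tuple) -> tuple:
    # A filter over the explicit representation never has to re-parse.
    return tuple(w.upper() for w in words)
```

Note that the round trip collapses "foo  bar" to "foo bar": exactly the kind of information loss the implicit space-separated convention invites.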

>you could certainly fail to do this with functional programming.

Just like how two lego pieces can't compose if they're not compatible pieces facing the right direction. Sockets and plugs must be compatible. A functional program still fits the analogy of legos perfectly.

Take a look at lego pieces here: https://brickarchitect.com/2019/2019-most-common-lego-parts/

There exist Lego parts that can never compose with certain other parts. Just like functions and function composition.

What you're not seeing is how other styles of programming such as OOP or procedural programming fail to fit the analogy and literally become like organ transplantation.

Let's take a look at object composition. How does that work in the context of Legos? It's like a Lego block with a mutating hole in it: you can put different things in that hole, and that changes the overall block.

Also with OOP and procedural programming you get lego blocks that can mutate. Lego blocks can transform themselves and other lego blocks. They form an interconnected network of entities that are constantly mutating.

To fit the analogy of a lego block you need unchanging blocks.

Procedural programming and OOP are basically the art of building systems with mutating primitives that constantly change. Imagine constructing buildings using bricks that change shape. Now imagine refactoring code that does this... organ transplantation.

Also you really just need to try it. Doing the type of grafting and refactoring that's an intrinsic part of programming literally becomes lego-like once you program using the functional style. To really see the big picture I suggest you exclusively try the point free style using Haskell.

That being said. There are still massive tradeoffs to doing functional programming. But if your goal is to program like you're using lego blocks, functional programming is the path.

> Also with OOP and procedural programming you get lego blocks that can mutate. Lego blocks can transform themselves and other lego blocks. They form an interconnected network of entities that are constantly mutating.

That kind of sounds like a structure of living cells. They're more complex than a structure of lego blocks, but they can do a lot more too.

Don't take that as an argument in favor of OOP though. Building with legos is a lot easier to get right than building with cells would be.

You can create building blocks of unlimited complexity or pure simplicity.

The key lies in your choice of primitives. Do you start off with a set of primitives that are extremely complex, or a set of primitives that are simple but complete, in the sense that you can build entities of unlimited complexity by composing simpler primitives?

I would say the latter method is easier. But there's no reason why the former method won't work. Biological evolution has built many examples of working machines with primitives of unimaginable complexity.

Yes, it did. But it also had a couple billion years of trial and error to do so.

Basically, it's the infinite monkey theorem.

Yeah, I'm just saying it can work and it can work remarkably well and efficiently. It's just probably too complex for our limited intelligence to handle.

I think this is a rather simplified and naïve analysis. Getting functional programs and functional APIs to compose well with each other is just as much a challenge as in other language paradigms. Just because the logic is organized as functions doesn't magically make things fit together. Your APIs need to speak in a consistent way as well, and the arrangement of your data needs to be the same or easily convertible between your "Lego pieces". Having spent many years of my career writing functional code all day, I can say it is just as easy to make a mess of things in functional programs as it is in object-oriented programs. I do not believe either is inherently better at creating the "Lego" style.

It has to be pure functional programming to get the full benefits of compositionality. Most mainstream "functional programming" is not necessarily pure and side-effects are not controlled/tracked. Analogous to how "mostly secure" is not secure, "mostly functional" does not get the full benefits.

Pure functional programming is complex, one has to compose effectful functions differently to pure functions. But the pieces do really fit like Lego bricks, especially when the same mathematical abstractions are used consistently. The Haskell community has been extremely effective at creating such consistency, by promoting various abstractions using category theory as a guide. The object-oriented community is not so different in this regard with their promotion of "patterns".

I've been writing Haskell professionally for 8 years. The problems with Haskell are the tooling, language stability, the learning curve and the difficulty of reasoning about performance, especially space usage. But composition and re-use works.

I'm sorry but this is a lot of shallow hyperbole. It doesn't matter if your functions are pure if they return a unique data type for your API or library that is not understood easily by the caller. So the solution is to make that data easily convertible or generic.. and guess what? That's no different than doing the same in an OO language. In the end, lego pieces are ultimately about the data itself, not how it is manipulated.

The key ingredient in making software reusable and portable is the talent and experience of the engineer, regardless of language.

> It doesn't matter if your functions are pure if they return a unique data type for your API or library that is not understood easily by the caller.

If it's an abstract data type, it doesn't have to be understood by the caller, it just has to be passed to another "lego brick" which understands the abstract interface. If it's a structured data-type, then there's no reason why it shouldn't be understood by the caller, the type should tell you how to consume it. I do not understand your point.

> So the solution is to make that data easily convertible or generic.. and guess what? That's no different than doing the same in an OO language

I guess you mean structured data types here? But you have missed my point completely about side-effects, it is side-effects (coupling via back-channels) that prevent composition, in general, in an OO language. OO languages also typically have an obsession with nominal types that can impede reuse.

> The key ingredient in making software reusable and portable is the talent and experience of the engineer, regardless of language.

You appear to be suggesting that languages/tools don't matter? Why not aspire towards languages that encourage safe composition and re-usable software? Your argument reduces down to "good motorcyclists don't need helmets".

All of your points make great sense in theory, but in practice it rarely works out so smoothly.

> If it's an abstract data type, it doesn't have to be understood by the caller, it just has to be passed to another "lego brick" which understands the abstract interface. If it's a structured data-type, then there's no reason why it shouldn't be understood by the caller, the type should tell you how to consume it.

And this is exactly how any well constructed OO API works as well.

You can have really good and really bad APIs in any language, it really is up to the skill of the developers. I firmly believe that, there’s no language or tool that suddenly makes you create better things.

I have in my jobs interacted with high profile robust APIs in C++ as well as functional languages. They can be a joy to use when designed well, regardless of language.

The goal of OOP is also to be modular and composable. Modularity and composability are simply good design choices. It's what the industrial revolution / assembly line was based on. It's what vim keybindings are based on. Heck, it's what programming languages themselves are based on: here are some simple tools that each do an easily understandable thing, now put something together with them. Lego bricks are fun and useful, and they're not the domain of any one area of CS.

>The goal of OOP is also to be modular and composable.

Well it depends on the type of composition you're talking about. Anything when mashed together hard enough can compose.

You really just turned functional programming around for me. I learned CompSci object oriented (C++) but have always loved how easy data analysis was in unix output. Cheers for making me want to give it another look!

Cool! Glad I was able to convince you. Just note that a lot of people find the functional style unnecessarily mind-bending. For example, the functional style primarily uses recursion instead of loops for iteration, which is harder to reason about (and can be less performant). I actually agree with them, but I feel these people got lost in the details and missed the big picture about the benefits of functional programming from the perspective of modules, design and program organization.

I recommend you try building something complex with the point free style using Haskell. It will be a bit mind bending at first but if you get it then you'll see how FP is basically the same thing as using legos to build data pipelines... exactly the style used with unix pipes.

> Unix piping is basically functional programming.

Only in the same sense that all computing is Turing Machines or NAND gates.

This is a very common misunderstanding, but there is a reason that FP, which is transformational, had to adopt dataflow in order to sort-of handle reactive systems.

Functions run to completion and return their result. Filters tend to run concurrently and, importantly, do not return their results, they pass them on to the next filter in the pipeline.

It is possible to compose any two filters. It is not possible to compose any two functions, not even close, the default for functions is to not compose (arity, parameter types, return type,...).

>Filters tend to run concurrently and, importantly, do not return their results, they pass them on to the next filter in the pipeline.

Filters are functions. They are one and the same. For unix it's a function that takes in a string and outputs a string. The unix world reduces everything into a singular type. You will note that unix "filters" cannot compose with functions of "other" types either. If unix included types like arrays or ints in stdout you would have the same typing problems as you have with functions.

In the unix world all your types are strings so things seem simpler, but in reality your little unix programs need to deserialize the strings into proper types in order to do anything meaningful. You could achieve the same thing if you used functions that returned strings all the time then did internal deserialization but that's not really a solution.

I do get your point about arity and parameter types. You're referring to the fact that not all functions can compose UNLESS they have compatible types and an arity of 1 in the parameters and an arity of 1 in the return value (Golang for example can have arity > 1 in the return value).

However this arity issue doesn't really exist. A function with an arity of two is isomorphic to a function of arity 1 that takes in a 2-tuple as input:

  f(a, b) -> c
  f'(Tuple[a, b]) -> c
  g(d) -> Tuple[a, b]

Both f' and f are equivalent in theory. Plus you can compose g with f'.

The only compatibility problems with composition among functions in the end is just types (not arity) but this problem still exists even in the unix world... it's just hidden from you because you don't actually watch the programs perform deserialization and serialization.
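The curry/uncurry point above can be made concrete. A small Python sketch (`uncurry2` is a hypothetical helper, not a library function):

```python
def uncurry2(f):
    """Turn f(a, b) into f'((a, b)): the same relation, with a tupled argument."""
    return lambda pair: f(pair[0], pair[1])

def add(a, b):
    return a + b

def split_at_1(s):
    # g : d -> Tuple[a, b], splitting a string after its first character.
    return (s[:1], s[1:])

add_pair = uncurry2(add)                      # f'
composed = lambda s: add_pair(split_at_1(s))  # f' composed with g
```

The two-argument `add` could not be composed with `split_at_1` directly, but its tupled twin `add_pair` can, which is the isomorphism at work.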

> Filters are functions

As I explained before: no they are not.

They have some similarities, but they also have the important differences I outlined above.

The single type is also a restriction relative to the functional model, and crucial to (syntactic) composability.

> If unix included types like arrays or ints in stdout ...

But it doesn't...

> you would have the same typing problems as you have with functions.

...so you don't.

> Both f' and f are equivalent in theory.

But they are not actually equal (never mind the same), at least not in typed FP, and most FP is typed.

> it's just hidden from you

Exactly. It is hidden in Unix and not in FP. That's a difference.

> You could achieve the same thing if you used functions that returned strings

Two points:

(1) you write that you "could" achieve the same thing. Meaning that, once again, they actually are not the same. Otherwise the "could" wouldn't make sense (and it does).

(2) returning strings would not be the same at all, because filters do not "return" their results. They return status codes. The result is not returned, it is passed directly to the next filter, continuously.

So: it is good to recognize that 'X is similar to Y', but that doesn't imply that 'X is just Y'.

>Functions run to completion and return their result. Filters tend to run concurrently and, importantly, do not return their results, they pass them on to the next filter in the pipeline.

Let me address this point so it is more clear. Functions by definition do not need to "run" at all. The underlying implementation of a programming language may "run" a function lazily, eagerly or concurrently but that is not what the Definition of a function is describing.

A function is simply a deterministic relation between two sets: one called the domain, the other called the codomain. That is all.

So within Unix piping the domain is stdin, the codomain is stdout, and the function is the Unix program. This paradigm fits the definition of a function, and as long as that "function" remains deterministic and composable it fits within the paradigm of functional programming BY definition. There is no need to discuss "similarities" here; because it fits the definition, it is functional programming. Of course we can get pedantic about certain non-functional use cases, but basically you are fully aware of the generality I'm describing. No point in getting pedantic.

Also keep in mind when you "execute" these functions it does not matter HOW these functions are executing whether concurrently, lazily or eagerly. The only important thing is the relation between the domain and codomain. That is all.

>The single type is also a restriction relative to the functional model, and crucial to (syntactic) composability.

Every unix program must internally deserialize this string type and serialize it again for output. The typing must be handled regardless. It seems like you don't need to deal with it but it's dealt with regardless. You also get inevitable deserialization problems of incorrect types. You do not escape typing in the unix world.

>But they are not actually equal (never mind the same), at least not in typed FP, and most FP is typed.

A relation that takes a string from a domain and returns a string that is from a codomain is a function that is part of functional programming. This is EXACTLY what a unix program does when relating stdin to stdout and therefore it IS a function and it is FP By definition.

You can look at it from another direction. A function that returns an Array does not fit the definition of what a unix program should output to stdout therefore a function IS NOT a unix program. BUT a unix program IS a function.

Formally unix programs are just the set of all functions where the domain and codomains are strings.

>> Both f' and f are equivalent in theory.

> But they are not actually equal (never mind the same), at least not in typed FP, and most FP is typed.

The mathematical term is "isomorphic", meaning you get the EXACT same properties going one way vs. the other way. It's only the syntactic sugar that prevents composition. Using multi-arity functions versus single-arity functions with tuples only involves two additional parentheses:

   f(a, b)
   f'((a, b))

It's just syntactic sugar that causes a difference to occur.

Arity and tuple types are well described to be isomorphic in category theory. I'm not making this up.

One way to think about an isomorphism is that if two things are isomorphic then they are just different perspectives of the exact same thing. Any differences are superficial or illusory.

>Exactly. It is hidden in Unix and not in FP. That's a difference.

Exactly what? If you're not dealing with the type conversions, someone else has to. Just because it's hidden from you doesn't mean it doesn't need to be dealt with. Someone has to serialize and deserialize things. Even at the command-line level you cannot escape types. Say you have a Unix program that can only take numerical strings from stdin, and you pipe English letters to it; you will still have triggered a sort of type error, despite everything being typed as strings in Unix. It's just not handled explicitly as a system error.

You cannot escape typing even in the unix world because it is in the end just FP.

>So: it is good to recognize that 'X is similar to Y', but that doesn't imply that 'X is just Y'.

Except we're not talking about similarity. Like I said above X is just Y BY definition.

You keep claiming things are the same and then describing differences.

Not sure what else I can do to make you understand this.

As such, I am bowing out as I can only repeat what I’ve written.

No I did not do what you wrote.

Here's what I did: I defined what a function is, then I described how a unix program fits the definition of what a function is...

In short a unix program with stdin as domain and stdout as a codomain is a function but a function is not necessarily a unix program.

I think what's actually going on is you didn't read what I wrote very carefully. You sort of just skimmed over it. I can't blame you, it is rather long and detailed...

But if you want to have a meaningful discussion you need to read it and ask questions about things you don't understand.

As I wrote before:

> > Unix piping is basically functional programming.

> Only in the same sense that all computing is Turing Machines or NAND gates.

Yes, you can map and analogize long enough until you reach a point where you find what you perceive to be equivalences.

However, what you've done at that point is rediscover that all these mechanisms are computational and thus at some level equivalent and transformable into each other, just like you can implement all of this with just NAND gates.

You have not shown what you claim. ¯\_(ツ)_/¯

Want to build web apps from reusable blocks? Reason at a higher level about chatrooms, roles and permissions, credits? That was the thinking behind our open source project:


Reusability on the web. Here is where we are going:


Is there a sample project somewhere on Github? Would be helpful to see a concrete example of use to get a better idea of it.

> Unix piping is basically functional programming

Except you literally use side effects to communicate. Not really FP, that part.

How so? While programs themselves could have side effects, the pipe -- if its elements are deterministic, which is indeed not true of everything -- is just a series of composed transformations of text.

In a pipeline like

  seq 100 | number | grep '[a-z]' | sort | tr -d '.' | tr a-z A-Z

every component is a pure function (byte strings -> byte strings) and the pipeline's results are totally deterministic [if you don't change the locale or collation order with environment variables]. No component of the pipeline makes any change to the filesystem¹.

¹ although both stat(2) and inotify(7) mechanisms could allow other processes to detect the interactions with the filesystem that are caused by running the pipeline, among other process-related mechanisms that system as a whole could use to detect how many times the pipeline has been run, so Unix itself is definitely not side-effect-free for almost any operation
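The same pipeline shape can be modeled as a chain of pure string-to-string functions. A rough Python sketch with simplified stand-ins for sort and tr (the function names are illustrative, not the real tools):

```python
text = "banana\napple\ncherry\n"

def sort_lines(t: str) -> str:
    # Stand-in for sort: byte strings in, byte strings out, no state.
    return "".join(line + "\n" for line in sorted(t.splitlines()))

def upper(t: str) -> str:
    # Stand-in for tr a-z A-Z.
    return t.upper()

# Composition is just nesting -- the shell's `|` spelled as function calls.
result = upper(sort_lines(text))
```

Because every stage is deterministic and shares the single "text" type, re-ordering or inserting stages is trivial, which is the pipeline's Lego-like quality.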

> [if you don't change the locale or collation order with environment variables]

Or any other state change.

Because it's not stateless.

Hence not FP.

Most of those changes might be seen as akin to going in to your Haskell interpreter and changing definitions in the standard library or something. (Environment variable changes are weird because they are so local and ephemeral, but you still have to do them deliberately.)

Clearly not all command lines you can type in the shell are stateless (reading the disk, network, stdin, clock, or system RNG, among other things, isn't, and none of those things are rare or hard to do), and the shell doesn't specifically encourage you to write stateless code, nor does it have a formal way to verify whether or not a particular command line is stateless. All of these properties are pretty similar to most popular FP language interpreters except for the last two. I guess it is a relevant distinction that your FP interpreter would typically know (as a matter of static inference) whether an expression is stateless, while your shell would typically not know.

Unless you use some stateful program or write to a file, a Unix expression that purely reads and pipes is typically deterministic: run the same expression twice and you get the same result.

If you want to get pedantic fine, but the majority of use cases is stateless.

It's kind of an interesting pattern to think of having the pipeline as a whole be purely deterministic, but its initial input be something from the environment (kind of like the Haskell IO monad, right?).

Probably most complex shell pipelines follow this pattern in practice, but the first counterexample I thought of was shuf, or sort -R. Also, many pipelines that include any kind of interpolation or variable substitution also don't follow it.

A different insight about reasoning about pipelines might be that the byte string (or ASCII string) type is too weak to catch most kinds of errors, and it's not uncommon for one pipeline component to not, in fact, be a total function with respect to the actual data type that you're trying to capture (I don't know the right terminology for this). This most famously happens when you do regular expression substitutions but your regular expression doesn't actually match the full grammar that you're looking for. Then the pipeline can be incorrect as a whole for some inputs, but earlier and later stages don't notice. It can also happen anywhere that different tools have a different implicit understanding of the relevant grammar or structure.

That connects up with all sorts of other ideas which are actually about type safety and parsing more than FP. For example, PowerShell has tried to take the pipeline concept in a different direction, to the consternation of us Unix purists. Its use of typed objects in this context makes more explicit what the contract between programs in the pipeline is supposed to be. There is also a LANGSEC connection in terms of the risks of informal or underspecified parsers and grammars.

I know I've personally written lots of Unix pipelines that were correct for all the inputs that I personally threw at them, but definitely not correct for every possible input. I like to use ! in vim frequently to shell out to a command line to perform a text editing task, and vim has an associated undo and redo which means that sometimes I'm trying several variants until I find the one that successfully appears to do the edit that I intended. Sometimes I do this at an almost preconscious level, which is really strange in terms of thinking about the concept of the correctness of a program (in this case, where the program is literally only going to be used once, with the programmer looking over its shoulder).

Haskell hates side effects, but functional programming in general does not. FP has been fine with side effects for decades; even SML, the precursor to Haskell, has them, although they are frowned upon.

One can say similar things about interfaces in object-oriented programming.

Not really. OOP interfaces are usually much more intricate than FP interfaces. You might say that they are less like a lego interface and more like electronic connectors. There are several different electronic connectors types, and they are designed so that you physically can't plug the wrong things together. OOP is like that - a more complicated interface that makes it impossible to connect anything-to-anything the way legos can.

On the other hand, if you're trying to build something large (a house, say - a real house, not a toy one), you don't want the lego interface. Sure, you can plug anything to anything, but not all of those connections make sense. Also, for building something that large, legos are too small a building block to be convenient. You want some larger things - beams and sheets of wood and particleboard, pipes, air ducts. You don't want to have to build all of those out of tiny blocks.

In the same way, I wonder if building larger applications out of FP is going to be similarly tedious. I have never done it, so I will admit that I don't know.

One thing I appreciate about John D. Cook's blog is that he doesn't feel the need to pad out what he wants to say.

Here, he had a thought, and he expressed it in two paragraphs. I'm sure he could have riffed on the core idea for another 10 paragraphs, developed a few tangential lines of thought, inserted pull quotes -- in short, turned it into a full-blown essay.

Given that his blog serves, at least in part, as an advertisement for his services, he even has some incentive to demonstrate how comprehensive and "smart" he can be.

His unpadded style means I'm never afraid to check out a link to one of his posts on HN. Whereas I will often forego clicking on links to Medium, or the Atlantic, or wherever, until I have looked at a few comments to see whether it will be worth my time.

"I enjoy that John D. Cook doesn't pad his posts."

Please edit all of my writing. ;-)

Check out The Elements of Style by William Strunk Jr. He talks about how to write and stresses the importance of being concise.

I suspect the point was that brevity is harder than it looks.

Be concise. Be forcible. Have a plan to kill every sentence you meet.


This is my preferred summary of Strunk and White.

He's a concise blogger.

He blogs concisely.

cncse blg

It's an interesting statement, but how much discussion can we get from it as an audience?

I haven't thought about this very much, and there is a lot I'm curious about that he hasn't elaborated on.

What are the signs of rejection? What's an example of failure? Are there examples of that wonderful modular behavior that he admires?

It's a nice way to introduce a thought or observation, but I want to know more about why he thinks that, not just what he thinks.

Honestly I was on the fence about clicking the link until I saw where it was from--his content is reliably interesting and straight to the point. If it was on Medium I wouldn't have even bothered and, like you, would have gone to the comments. The compression is lossy but it's a great filter for crap content.

A science teacher at my high school had a rule along similar lines. For any kind of lab report, instead of "you must write a report of at least 3 pages", it was "your report must not be more than 2 pages long."

Not only am I sure it made it easier for him to grade, but it really forced students to write concisely about their work.

For what it's worth, as an advertisement for his services, conciseness is better. It's easier to disagree with parts of a detailed opinion than with a vague general statement.

You can then project your own opinions into the general framework, and you find you fully agree :)

As a consultant, "I 100% agree with you, you understand me" is exactly the feeling you want.

He writes the long articles that show off his smarts in fairly specialized areas, where you need to be an expert to disagree.

It's really clever, and I'm curious if it's intentional on his part, or just his style.

A few links would have been nice -- e.g. to any serious comparison of LEGO to software components.

This is the first time I've seen one of his posts. It caught me really off guard, but I completely agree with your sentiments. It is refreshing to see, and I wish more blogs took this approach to heart.

why lot words when few ok

Haskell is much closer to the lego blocks analogy than most languages I've tried due to the focus on composition and polymorphism.

The sticking point that grinds some people's gears is monads, which don't compose generally but do compose in specific, concrete ways a-la monad transformers. The next breakthrough here, I think, is going to be the work coming out of effect systems based on free(r) monads and delimited continuations. Once the dust settles here I think we'll have a good language for composing side effects as well.

In the current state of things I think heart-surgery is an apt metaphor. The lego brick analogy works for small, delimited domains with denotational semantics. "Workflow" languages and such.

I like Haskell, I write Haskell at my day job (and did so at my previous day job), and I help maintain some of the community build infrastructure so I’m familiar with a large-ish graph of the Haskell ecosystem and how things fit together.[0]

I don’t really think Haskell is _meaningfully_ superior to other languages at the things that OP is talking about.

Refactoring Haskell _in the small_[1] is much nicer than many other languages, I don’t disagree on that point. Despite this, Haskell applications are _just as susceptible_ to the failures of software architecture that bind components of software together as other languages are.

In some cases I would even suggest that combining two Haskell applications can be _more_ fraught than in other languages, as the language community doesn’t have much in the way of agreed-upon design patterns that provide common idioms that can be used to enmesh them cleanly.

[0] I’m mostly belaboring these points to establish that I’m not talking out of my ass, and that I’ve at least got some practical experience to back up my points.

[1] This is to say, when one refactors individual functions or small collections of interlocking abstractions.

I think what OP was hitting on is that functional programming likes to put function composition front-and-center.

Glomming together functions that operate on very abstract data structures feels a lot more like Legos than wiring traditional imperative/OO code.

> Despite this, Haskell applications are _just as susceptible_ to the failures of software architecture that bind components of software together as other languages are.

I think it's more complicated than this. Yes, you can push poorly-architected Haskell to production & be in a rough spot. However, my experience says that even the gnarliest Haskell is easier to improve than any other language.

Because of the types, purity, etc, I find that it's much easier to zoom around a codebase without tracing every point in between. I can typically make one small change to "crack things open" [1], follow GHC's guidance, and then go from there. I've been able to take multiple large Haskell projects that other engineers deemed unfixable (to the point where there were talks of rewrites) & just fix them mechanically and have them live & improve continuously for years to come.

The big thing with Haskell IME is you don't really need to have design patterns that everyone follows. I don't freak out when I see multiple different idioms used in the same codebase because idgaf about folk programming aesthetic. If an idiom is used, I follow it. It's all mechanical. I barely use my brain when coding professionally in Haskell. I save it all for the higher-level work. Wish I could say that about professionally programming in other languages of equal experience :/

So while it's just as susceptible (because good vs bad software architecture is more a function of time & effort) it's also typically pretty braindead to fix.

[1] A favorite technique is to add a new case to a key datatype and have its body be Void. Then I just follow the pattern-match errors & sprinkle in `absurd`. I now have a fork in the road that is knowably a no-op at runtime.

WAI is a great example of the sort of compat/interop interfaces that are more common and easier to roll out in Haskell than in non-(typed functional) languages.


Like WSGI or RACK?

What things make WAI easier than the equivalent abstraction in $other_languages?

(Not going for a "gotcha" tone here, genuinely interested as a person who has tried and failed a few times to write "useful" software in Haskell. Would like to get there one day).

A WAI `Application` is defined as the following type alias:

  type Application =
    Request ->
    (Response -> IO ResponseReceived) ->
    IO ResponseReceived
...which is to say that a WAI application is “just” a function that accepts an incoming request along with a “respond” callback (which consumes a response and may emit side effects), and which returns a ResponseReceived token (again, possibly emitting side effects).

This leads to a very nice definition for middleware as the following type alias:

  type Middleware =
    Application -> Application
...which says that any WAI middleware is just a function that turns one application into another.

This means that WAI applications have a nice and easy to understand top-level interface, and that complex chains of WAI middleware can be built up by chaining smaller middlewares together.

The potential benefit here (over other language frameworks) is that the “grammar” being used to describe applications and middleware is the same “grammar” that’s used in most other Haskell applications (i.e. function composition). Ideally, this should make it more easily understandable to a Haskell practitioner who might not be intimately familiar with the framework at first glance.
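Since the thread compares WAI with WSGI, the same shape can be sketched in Python terms — a middleware is just a function from handler to handler, chained by ordinary function composition (all names here are illustrative, not any real framework's API):

```python
from typing import Callable

# A toy "application": request dict in, response string out.
Handler = Callable[[dict], str]

def app(request: dict) -> str:
    return "hello " + request.get("name", "world")

# A middleware is just Handler -> Handler, mirroring WAI's
# `type Middleware = Application -> Application`.
def upper_middleware(inner: Handler) -> Handler:
    def wrapped(request: dict) -> str:
        return inner(request).upper()
    return wrapped

def exclaim_middleware(inner: Handler) -> Handler:
    def wrapped(request: dict) -> str:
        return inner(request) + "!"
    return wrapped

# Middleware chains are built by plain composition.
wrapped_app = exclaim_middleware(upper_middleware(app))
print(wrapped_app({"name": "hn"}))  # HELLO HN!
```

The point being made above is that this "grammar" (function composition) is the same one used everywhere else in the language, so no framework-specific plumbing concepts are needed.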

Not to be trite, but uh... yes, like WSGI or Rack.

I think the specifications are functionally equivalent and differ only in their approach. I think WAI is influenced directly by WSGI/RACK.

You've got far more Haskell experience than I do, but I have done some pretty heavy refactoring on large Java codebases. The process always seemed to be: tease out some interfaces, then switch the implementation of those interfaces. Over and over and over. I could lean on javac and tests, but some knots are hard to untangle and take a long time.

I believe you that in the large it's still hard. It just seems so much more pleasant, day to day, untangling that big ball of string with Haskell rather than Java.

That’s the whole point of functional programming: composition of small things to make bigger things.

Functional programming does it better, but it still suffers from the issue that when developing a solution we developers need to account for all edge cases (if we're doing it right), which requires a lot of decisions to be made. The decisions important to the module's intended use will be made carefully; the less important ones will be made arbitrarily. Almost no use of the module will require every edge case to be decided in a specific direction, but when that module is reused, the new consumer will probably have a slightly different requirement about which decisions go which way. This, I think, is the central pain point of software reuse.

Completely agnostic problems do exist and modules to solve those can be very strong - but that is a small subset of all the problems we want modules for.

Careful decisions can scale with abstractions, if they're made with some rigor.

In Python, for example, whenever I'm doing operations on lists, I need to consider every function in isolation - will this break for an empty list (zip), or will it break when given a generator (anything that traverses twice)? If I pass a callback with side effects, how will it behave differently if someone calls it twice?
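The two hazards named above are easy to demonstrate concretely (values here are just illustrative):

```python
# zip silently yields nothing when one input is empty --
# no error, just an empty result.
pairs = list(zip([], [1, 2, 3]))
print(pairs)  # []

# A generator can only be traversed once; a second traversal
# quietly produces nothing.
gen = (x * 2 for x in [1, 2, 3])
first = list(gen)
second = list(gen)  # already exhausted
print(first, second)  # [2, 4, 6] []
```

Neither behavior is a bug, but each one is an edge case you have to remember per function rather than per abstraction.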

In a more rigorous ecosystem, I can instead worry only about edge cases on classes of functions/data types. In Haskell, when doing the same operations, I can generally assume that traversal functions were written with consideration for the Functor/Foldable/Traversable laws, and those laws tell me how I can safely compose or swap them out without breaking expectations around those edge cases. Higher-kinded types let code be constrained to work strictly with those abstractions, so I mostly just need to worry about "does this typecheck" and, very rarely, "is this an unlawful traversal". With things like linear and dependent types entering the mix, library authors can communicate even more subtle expectations at compile time instead of leaving users to anticipate them or find them at runtime - when you do need to explicitly consider edge cases, the turnaround time on discovering them matters.

I don't disagree - these sorts of questions can be worked out to an academic degree with rigorous proofs. Databases and data guarantees around them are very widely applicable and have been isolated from business logic in modern applications due to the heavy complexity of that logic - once upon a time it was common for applications to "roll their own database" i.e. build some custom data storage approach that worked well enough (hopefully) for their purposes. I worked on a MUD based off a legacy that began in the 80s that used flat-file storage as a primary data storage method and race conditions abounded.

However, while databases are cost-effective to build and distribute, implementing that level of rigorous proof for your e-commerce app is not only generally undesired but often increases the difficulty of reaching profitability and can sink a company. Working on system development isn't limited to academic concerns; if you're guiding project development you have to make really hard calls about correctness vs. cost, and I say this as someone who likes to be a perfectionist and do things the right way whenever possible.

So I don't disagree that careful decisions can scale with abstractions - it isn't the case that it is impossible to build a well designed large system, but it will be expensive and at the end of the day we all (probably?) need to balance cost of investment vs. value of that investment - or are having someone higher up the chain making that decision for us.

I think learning to program for correctness has a cost, but it's a skill that stays with you. Once you know how to properly model things in a composable way, you can definitely work just as fast as, or even faster than, when you gave no care to correctness. And frankly, shipping an incorrect program isn't necessarily a worthy tradeoff, as you are simply deferring problems to the user or to later down the line. And you don't need rigorous proofs to enjoy the correctness protections of strongly typed languages like Haskell. The language does that for you; you just have to learn to model in it, which is a one-time thing.

I didn't need an academic degree to start working in Haskell and learn abstractions such as "Traversal" or any of the others. Most of them are painfully simple compared to the impression one gets from their names and the jargon that comes with the territory.

However, the benefit of sticking with those names and jargon is that they only need to be explained once. Analogies and metaphors help when you're starting out, but you don't need to keep them around. It makes speaking precisely with colleagues in the ecosystem easier.

Proofs are on another level. I've also written proofs and they require much more rigor than you will find in Haskell. It can still be done without a degree but I agree -- not useful for most line-of-business applications.

However, the power-to-weight ratio of the abstractions available in Haskell is high. They take effort to acquire, but they pay off in spades. That's where the "scale" comes from.

Polymorphism combined with composition means that once I know a type implements "Traversable", I can use the entire language of traversals with that data type without knowing anything else about the value I'm working with. The more things that implement it, the richer my programs become.

You do get this idea of composition in the form of generics or template metaprogramming, not lost on me there. Type classes in Haskell, in my humble opinion, are better at consistency, keeping things coherent, and requiring less nominal boilerplate.

Yes but that's what composition solves. If the module as a whole cannot be used, you can just recompose the parts. As long as you've made clearly defined modules/functions all the way down with clearly defined inputs and outputs.

To be fair, that was also the major point of OO, and of structured programming before that.

Could you elaborate?

I'll go one further. Can you think of a single programming paradigm in which "composition of small things to make bigger things" isn't a fundamental design principle? A paradigm where making big things from small things is actually an anti-pattern?

Well, if I were doing object-oriented modeling I may have a `class Person` and some methods that operate on Person data specific to my model. For example, I may model the person with name, age, job, and gender as a boolean... but in 2020 I may then have to reimplement the gender field entirely. If I later decide to have a `class Pet`, it may share some common fields, except for the "job" field. Having done this, I may now do the opposite of composition: factoring out the common properties of a Person and Pet and breaking them into smaller components, rather than assembling them from smaller components. If I were doing this functionally and strictly typed, I'd have started out with the leaf types of age, job, and gender, agnostic to their owners (Person or Pet). They'd be implemented in a pure way without dependencies or knowledge of a Person or Pet, and can be tested and developed before any part of Person or Pet is implemented. Yes, you could also compose classes, which are bundles of functions... but if you take that to the next level you get function composition.
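The leaf-types-first approach described in this comment can be sketched in a few lines of Python (the `Age`/`Job` names are just illustrative):

```python
from dataclasses import dataclass

# Hypothetical leaf types, defined with no knowledge of their
# owners; each can be built and tested before Person or Pet exists.
@dataclass
class Age:
    years: int

@dataclass
class Job:
    title: str

# The owners are assembled *from* the leaves (composition),
# instead of factoring a common base class out after the fact.
@dataclass
class Person:
    name: str
    age: Age
    job: Job

@dataclass
class Pet:
    name: str
    age: Age  # no job field, and no shared superclass required

p = Person("Ada", Age(36), Job("engineer"))
print(p.age.years)  # 36
```

Adding a new owner type later means composing the existing leaves, not restructuring an inheritance hierarchy.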

It's not that you can't do composable software with OOP... it's rather that functional programming is all about composition.

You don't need Haskell to make applications compose.

See JVM (garbage collection), React, Datomic

Functions and scalar values are probably enough.

And yet back in 1993, Visual Basic programmers were able to reuse software by literally snapping together controls like Lego blocks. There was a rich ecosystem of third-party controls. Competing tools such as Delphi had similar features. Since then the industry has gone backwards, or at best sideways, in many areas.

I wanted to make a similar point with piping UNIX commands. I can think of two reasons why the degradation happened:

1. Expansion of the software universe. Back in VB6 times, there were fewer programmers but many languages. Reusing components made with different languages was a big deal (VB used COM/ActiveX machinery to make this possible), but today there are so many more developers that each language/ecosystem is big enough to exist on an island and happily not interact with anything unless it's a gRPC/REST endpoint.

2. Transition to SaaS. We no longer use the same platform for building and running. Your VB app used to run on more or less the same computer it was built on. SaaS applications run on weird, custom-made, hard-to-reproduce, fragile computers called "environments". They are composed from all kinds of complex bits, and this makes SaaS applications less portable. Frankly, they sometimes feel more like "environment configuration" than "code".

The SaaS expansion could make componentisation easier - it would just be microservices, but from different companies. Somehow it doesn't. Possibly because it's not in their interest to do so.

On the other hand, look at how much software goes into e.g. car entertainment systems. How many systems have "curl" in them, for example? Have a look at the vast list of BSD license acknowledgements in many systems.

Look at the npm ecosystem, where people complain that there's too much componentisation.

"Microservice" carries no new meaning (although a few content marketing teams may disagree). It's just a process with an input and an output. Building applications consisting of multiple processes has been possible since forever. I would even argue that piping multiple UNIX commands together is a valid example of microservices :)

This is why containers are a _big deal_. They are bundling the "environment" with the "code"

We don't fully get back to component reuse, but it makes the sharing of services much more feasible, and more portable as well.

Imagine two large structures made of Legos, say a man figure and a car figure. How easy would it be to snap those together to make a "man in a race car" figure? A moment's thought shows it would be quite hard. Neither is likely to have a flat surface in the right place.

So basically, increased complexity of components, any components, makes them harder to combine - except in very carefully controlled circumstances.

Weren't web components supposed to be the holy grail that would bring us back to VB6-like composable front-end applications? What happened?

People liked React better.

Doesn't answer the question of why. React is the same old flawed component framework that will be replaced with the next cool thing eventually. A component system would allow objects built in whatever technology to co-exist over decades.

> There was a rich ecosystem of third-party controls.

So does HTML/CSS/JS, and they'll also often be adapted to the various popular front end frameworks of the day - Angular, React, Vue currently.

I also find it easier to customize and compose 3rd party UI components on the web than I did back in the 90's with VB, Delphi, MFC, COM, etc.

And in 2020 I can use Qt and Boost and FFmpeg and ... which are gigantic C & C++ libraries of things that I can reuse too. That's what reuse is.

How about modding cars?

It's neither as bad as an organ transplant, nor as easy as LEGO.

It is also highly variable, dependent upon the SDKs and API choices.

I've written SDKs for decades. Some are super simple, where you just add the dylib or source file, and call a function, and other ones require establishing a context, instantiating one (or more) instances, and setting all kinds of properties.

I think it's amusing, and accurate, in many cases; but, like most things in life, it's not actually that simple.

In a word: "Engineering"

If only there was a specific job title for people whose primary job was to "engineer" software.

Developers build row houses. Engineers design complex machinery.

I try not to go by "Software Developer" anymore because too many managers have gotten the idea that it's something you just do. Which it is if you're only working in frameworks... the stick houses of code.

There seem to be many definitions of "Engineer."

I consider myself to be an "Engineer," but I forgot all my calculus, many years ago (atrophied). There are those that assume any "Engineer" must know calculus.

I consider myself to be an "Engineer," but I fail miserably at most of the binary tree employment exams. There are those that assume that any "Engineer" must know binary trees (I never came up through a CS curriculum, and, in 35 years of coding, I have never run into one single binary tree).

I consider myself to be an "Engineer," but have never sat for any kind of board exam. There are those that assume that any "Engineer" should have at least one board certificate on their wall.

I consider myself to be an "Engineer," but hold no college degree or certificate even remotely connected to software development. There are those that assume that any "Engineer" should have at least one "Engineering" sheepskin on their wall.

What I do have, however, is 35 years of consistently delivering (as opposed to just "writing") software products. Not all were ones that have made me proud, but I have always been about "ship." I still keep that up, today, even for free, open-source projects.

I have Discipline. Lots of Discipline. I write reasonably efficient, localized, accessible, error-checking, high-quality code, that can be maintained after I move on.

I practice Configuration Management.

I have a consistent coding style.

I have consistent processes, even though it may seem as if I am "shooting from the hip." I write about that here: https://medium.com/chrismarshallny (several articles cover my process).

I try to keep up with current, practical technology without getting sucked into the "buzzword maelstrom."

I practice and support Usability, Localization, and Accessibility in my work.

I take all that extra time to make sure that my code is "ship" code; not just "code." That means lots of really boring stuff that always takes a lot longer than I'd like.

I test my code six ways to Sunday (My testing code usually dwarfs the product code. I write about that here: https://medium.com/chrismarshallny/testing-harness-vs-unit-4...).

I document the living bejeezus out of my code (I write about that here: https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c...).

I have always stood behind my code; taking Responsibility and Accountability. Never lying, and admitting (and fixing) failings, learning from those that went before me; while treating my superiors, peers, and subordinates with the utmost respect.

I have always been keenly aware of the way that my customers used my code, and have always welcomed negative feedback (I write about that here: https://medium.com/chrismarshallny/the-road-most-traveled-by... -scroll down to "Negative Feedback Is Incredibly Valuable").

I am not afraid to admit ignorance, or ask questions. Almost every project I start, I don't know how to do. I write about that here: https://medium.com/chrismarshallny/thats-not-what-ships-are-...

Apparently, my employers also considered me to be an "Engineer," despite these awful failings. I had them fooled, eh?

Lots of I

Yup. I learned not to use "you" very much on teh internets tubes. People like...well...you...don't like being dictated to.

I've learned the value of speaking from personal experience, as opposed to vague, hand-wavy, often passive-aggressive, "keyboard warrior" stuff.

I find that personal experience, and personal ethos tends to be a lot more relevant to most folks, over personal opinion, especially when it is applied as a projection.

In fact, one of my editing disciplines is to do a search-and-replace for "you" and replace it with "I" or "me."

You'll see a lot of "we," "us," and "our" in my writing, as well.

FUN FACT: Did you know that I nuked a number of passive-aggressive slaps from this posting? I'm a really experienced troll. I have made it a point to change my stripes. One way that I am doing this, is putting my brand and personal information behind everything I post.

When I’m working with Unix utilities like grep or doing stuff with xargs... it genuinely feels like I’m playing with legos.

I feel like this is trying to argue for more “consulting surgeons” when we need more “tooling machinists” who know how to make a good LEGO block.

xargs/etc are like arcane legos, where there's more than one brick vendor in the mix with different vendor's versions of the brick supporting different features, or the same feature but implemented in different ways. And even if you're only dealing with one brick vendor, there's still little uniformity in the interfaces of different bricks. GNU vs BSD xargs, -0 vs -print0, etc.

True Lego has the property that if two parts snap together, it's a valid configuration. That's definitely not the case for Unix utilities. You have to read a manual for each sort of brick.

I love unix for what it was. But it made a lot of concessions for 70s hardware that we're stuck with.

It is rather unfortunate for us that everything that's come afterwards has been in some way even worse.

I think the author is discussing large existing apps, for instance connecting something like an externally built authentication layer to an existing user management suite. It's not just plug and play but a series of careful surgical moves.

Obviously shuffling data around is a different (easier) beast, especially for one off tasks.

This is an astute metaphor. In my experience software reuse simplicity strongly depends on the following factors:

* interface surface area (i.e. how much of an interface is exposed)

* data types of data coming in and out (static or dynamic). Static languages have an advantage here as many integration constraints can be expressed with types.

* whether it is a very focused functionality (e.g. extracting EXIF from file) vs cross-cutting concerns (e.g. logging)

The more limited the surface area, the simpler the data types and invariants, and the more localized the functionality, the more it is like LEGO as opposed to an organ transplant.

For reusing software source, I agree. The only current way around this is the Unix pipe system, where you reuse software _executables_ instead of software _source code_.

The reason it works is that Unix programs agree to a simple contract of reading from stdin and writing to stdout, which is very limiting in terms of concurrency but unlocks a huge world of compatibility.
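That contract is small enough to sketch in a few lines of Python (the function and stream names here are illustrative):

```python
import io

def upper_filter(inp, out):
    # The entire "contract": read a text stream, write a text stream.
    for line in inp:
        out.write(line.upper())

# A real filter would call upper_filter(sys.stdin, sys.stdout);
# here the streams are simulated so the interface is visible.
src = io.StringIO("hello\npipes\n")
dst = io.StringIO()
upper_filter(src, dst)
print(dst.getvalue())  # HELLO\nPIPES\n
```

Any program written against this one-stream-in, one-stream-out shape can sit anywhere in a pipeline, which is the whole compatibility trick.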

I wonder if we will ever get software legos without the runtime bloat from forking.

ps: to anyone countering with examples of languages that are reusable through modules, that doesn't count because you are locked in to a given language.

> I wonder if we will ever get software legos without the runtime bloat from forking.

In a sense, shared object files / dynamically linked libraries meet this criterion -- they can be loaded into program memory and used by a single process.

There's also flowgraph-based signal processing systems, like gnuradio, which heavily use the concept of pipes (usually a stream of numbers or a stream of vectors of numbers) but, as I understand it, don't require OS forking. (Though they do implement their own schedulers for concurrency, and for gnuradio at least, blocks are typically shipped as source so I'm not sure whether that counts as reusing executables vs. reusing source code.)

Another current way is with COM in Windows.

IMHO this conflates "systems" and "components". Module and package management has never been better, and building "a system" from components, open source or otherwise, is extremely effective. Integrating software systems (with e.g. APIs, webhooks and event buses) is non-trivial, complex and difficult. They are not the same endeavor.

Really depends.

Building a system from components can be quite tricky, as the components tend to only play nicely together if they've been built for one another, or to some agreed upon spec. Otherwise you have to be very fluent in all of them, and write glue code. And there be dragons.

APIs tend to be all over the place, yes, but you can have a well defined API spec that really can allow specific implementations to be swapped out fairly trivially, such that it really is more like 'components'. Think SQL, for instance.

I think you may just be noticing that when things adhere to a spec, they tend to be able to combined pretty easily; if not, they don't.

It really depends on the complexity of the software, doesn't it? Libraries and packages often snap into my projects easily and without much modification at all, especially if I'm planning on using them. I can see this analogy working for more complex projects though, where you may be copy-pasting code from one project to another and trying to make it fit together.

With significant enough abstraction and a sensible public API you can make this claim, but more often than not you end up needing to dive into the internals to hook things up properly.

This is exactly what I'm working on right now. My approach is a little bit different. It's about building a community where people share LEGO pieces (I call them Assets), but they also share where and how in the code they've made the connections. After some time there will be enough connections that users can just re-use them. Only time will show how well it works.

I wrote about it few weeks ago: https://skyalt.com/blog/dsl.html

>I have a plan on how to do that on paper, but because connecting assets together can be complex, it's better If most of the users don't do that. The key thing is that users don't share only assets, but also how and where they are connected to each other. It's like when someone makes and publishes a library, but also example code for how to work with a library. But in SkyAlt, asset connections will be automated. This is also why it's very important to build and grow the community - more assets, more connections, which means easier to use.

Do users have incentive to document and share the connections, other than helping the community's long term goal?

Absence of such incentive appears to be the reason that open-source software is not always perfectly connectable -- few people have a significant incentive to ensure this design goal.

I'll launch it in a few weeks, so right now I don't know.

I believe that in the future most software will be open-source, and if people don't have a problem sharing blocks/widgets/assets, they will not have a problem sharing information about connections.

Side note: Every program has two parts - instructions and data. When people share assets and connections, both are instructions, so their data stays private.

Software re-use is limited by the primitive structures and algorithms in the language and runtime library. The more fundamental parts (queues, strings for example) that are left to the user to implement, the less likely it is that it will be possible to make compatible components.

I feel like even if we eventually achieve Lego-like reuse on a purely technical level, the business and social layer of things will ensure that somebody always requires an off-brand Lego block to be shoved in there. It's almost human nature.

The reason for this is that LEGO has a tightly regulated interface that each piece must adhere to. This interface regulation doesn't exist in software to that degree, nor is it necessarily desirable.

Well designed modular software is certainly like Lego. It just takes investment. It is a choice to have organ transplant software be the default. That choice is non investment.

Depends on the software. Or to torture the analogy a bit: you can build anything out of LEGO if you're willing to use a jigsaw, some Gorilla glue, a blowtorch...

The point of frameworks is to provide the standardization of the "shape" of various bits of software so it's a lot more like snapping together LEGO. But even then, LEGO isn't universally snappable; some blocks just don't click to other blocks in the product line. And then, of course, there's the "illegal" block hacks (https://crafty.diply.com/28429/14-illegal-lego-building-hack...) that work in practice but are not at all using the tool the way it's specified. When software reuse is like LEGO, we should expect (a) some things we want to do, we can't really do without jigsaws and glue and (b) sometimes, people will do things that the software technically allows but no sane person would call "desired" or "intended."

In fact, the LEGO-to-framework analogy works pretty great. And yeah, outside the context of a consistent (and, I'd argue, opinionated) framework, you're about as likely to have two pieces of software interoperate as you are likely to pick two random chunks of matter in the universe, slap them against each other, and have anything useful happen. I just tried it with a brick and this glass of water on my table. Now I have "wet brick in a pile of shattered glass," but I don't think anyone's going to give me a Series-A funding round for that.

Reuse of my own sloppy code is always easy. But reuse of complex code created by others is often far more difficult, since context is missing. Good reusable software blocks are often the lower-level APIs. The more context-specific software is, the harder reuse gets. It's the pain of generic vs. context-specific, IMHO.

Fred Brooks had a similar take, in an opinionated way, with the concept of "lead surgeon". Motivation to re-read the chapter in mythical man-month.

So. Every editor has copy/paste functionality and search/replace. Using that plus following the SOLID principles is my way to reuse code. And it is as if I were building with LEGO bricks.

I'll even tell you a secret. Every piece of software already has all the code you need to work with it - just copy/paste stuff. You only need to know where it is. Every piece of software is its own tutorial on how to make that particular piece of software.

I've done this for years. I call my method of working copy/paste method.

And by reading this, I think I know why I piss off so many programmers when I say what I do. And how fast I am with it.

Someone well-known in the IT field wrote about software reuse in the large vs. in the small about 10-15 years ago. The gist was that reuse in the small is a success, i.e. it's fairly simple to write a function that developers reuse within an application. When you try to generalize use beyond a certain context it becomes significantly more complicated to be successful. I think the motivation for the post was issues in object reuse in OO development. I'm trying to find the original post.

John D. Cook's post shines the light once again on the difficulty of writing reusable components.

I would love to have a link to that piece about software reuse in the large vs in the small!

I would say software integration is more like a docking mechanism between spacestation units that were designed in isolation by different countries than like snapping LEGO blocks. At least software pieces are products of human design and failure points can be identified without the many inscrutable emergent phenomena and often-unmeasurable unknowns of biology.

Lego is general graph structure. Our programs have roughly star topology - bunch of libraries and some control logic that drives it. Good libraries _are_ like snapping Lego blocks. Those have good scope around domain, well designed API and establish abstractions that don't leak. Think sqlite, zeromq, zlib. The central controlling part is too "involved" in domain and chosen tech stack to reuse easily.
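To make the "snap-in" quality of such libraries concrete, here's a small sketch using Python's stdlib binding for sqlite (table and values are just illustrative): no server process, no configuration, no build step, and the whole API surface is essentially "connect, execute, read rows."

```python
import sqlite3

# An in-memory database: no server, no config files, no setup.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT)")
con.executemany("INSERT INTO users VALUES (?)", [("ada",), ("grace",)])

# The library's narrow, well-scoped interface is what makes it snap in.
names = [row[0] for row in con.execute("SELECT name FROM users ORDER BY name")]
print(names)  # ['ada', 'grace']
con.close()
```

The controlling logic around a call like this is app-specific, but the block itself drops into almost any program unchanged, which is the point.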

The software that requires an organ transplant just wasn't written to be reused. It is a matter of effort and cost. You cannot just solve your problem and wrap it in a library - you need to take care of all common use cases, define edge case behavior, test, document... It costs a lot more, and that price is hard to justify. In the end it's the product that makes money, and the library is forgotten as long as it does its job. We need to fix the open-source income ecosystem.

I disagree.

Never has it been easier to find existing code, download it and snap it into your project. Npm, nuget, etc, etc.

Also, on the front end there are no shortage of plug and play widget libraries.

Sometimes we need to take a step back and appreciate how things have improved over the years.

Exactly - with these packaging services, developers have learned how to make their software more re-usable. The 3rd party component eco-system is a multi-billion dollar industry that continues to grow.

When reading the title I figured it would explain something different from what the author does: when doing an organ transplant (except for the liver, maybe) the organ is no longer usable to the donor.

Translatable to software: your reusable piece of code might not be as reusable as you think, missing lots of abstractions and having lots of specific code for just one specific consumer. Taking a piece of functionality from one part of your software and using it for something else might harm the original use due to broken APIs, introduced complexity, latency, failures, etc. You might even end up with two split versions of your reusable component, one per use case, that you now have to maintain separately.

This is not that accurate if we talk about libraries. Those mostly have a well-defined scope and a well-defined interface, and are designed to be integrated, so they most definitely feel like LEGO.

However, this is totally accurate when reusing the code from your previous project on your next one. I developed many large gov services, and each one reused a bunch of stuff from previous ones, like user management, complete areas of functionality, etc. Here, the analogy is adequate; artefacts from previous projects creep in constantly. This feels more like a single project evolving (or devolving) over a long period of time (years to a decade).

My favourite form of software reuse is when I just use my old code as reference but import or duplicate almost none of it.

Feels so good to write similar software in 1/10th the time because the hard fought ideas are already won.

That is knowledge reuse. It's hardly applicable with bigger projects where the knowledge is spread across teams or even lost; then we get things like NIH or big rewrites.

https://github.com/kohler/click is a library that I've found has a lot of reusability built into it, and operates a lot like snapping Lego blocks together.

When you're talking about not being able to reuse code, I think it depends on how you've designed your system. Design for reusability seems like it needs to be a key goal in development in order to achieve software reuse.

The reason Lego works is a simple, well-defined interface. Two bricks stick together by friction because the stud pattern on one matches the hole pattern on the other precisely. Fortunately, that's a model that translates well to software.

The way to make software reuse more like organ transplant than building Legos is to ignore the advice to keep the interfaces simple and precise. Or worse, don't even bother defining an interface, its data types, or error conditions.
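As a minimal sketch of what "simple and precise" can look like (all names here are hypothetical, just for illustration): the interface below has two methods and one documented error condition, and any implementation with matching "studs" snaps in, no inheritance required.

```python
from typing import Protocol


class KeyValueStore(Protocol):
    """A deliberately narrow interface: two methods, one error condition."""

    def get(self, key: str) -> str:
        """Return the value for key; raise KeyError if absent."""
        ...

    def put(self, key: str, value: str) -> None:
        ...


class DictStore:
    """A structural match for KeyValueStore - it snaps in like a brick."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str:
        return self._data[key]  # KeyError propagates, as documented

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def copy_key(src: KeyValueStore, dst: KeyValueStore, key: str) -> None:
    # Written against the interface, so it works with any implementation.
    dst.put(key, src.get(key))


store_a, store_b = DictStore(), DictStore()
store_a.put("k", "v")
copy_key(store_a, store_b, "k")
print(store_b.get("k"))  # v
```

The precision (exact types, one named failure mode) is what does the work; the moment `get` can also return `None`, or raise three undocumented exceptions, the studs no longer line up.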

When I hear "snapping software like Lego bricks" my mind immediately goes to the Spring Framework, which must be one of the largest, most-used, and most complete (modern) attempts at this paradigm. I haven't used it for anything large, but I would say it does not deliver on that promise.

I'm curious to hear from those who have worked on large, sophisticated, Spring Framework apps - does it really feel like snapping pieces together?

No, but Rails definitely did feel like snapping pieces together. RailsCasts was a blessing.

"People have been comparing software components to LEGO blocks for a couple decades. We should be able to assemble applications by snapping together modular components, just like LEGOs. There has been progress, but for the most part we haven’t realized the promise of LEGO-style software development."

  pg_dump -U postgres db | ssh user@rsync.net "dd of=db_dump"
... looks like Legos to me ...

OK, but going around saying reuse is not worth much because it has a high chance of failure is not helpful. Once I got burned in a quite good interview when I went a bit too far with the notion that reusability in software is fake. In the end you have to fix some things, but it's not like someone is going to die. Unless you really count on it and have a really constrained budget, but in that case you asked for it.

Having not used any of the following I may be wrong here, but don't dataflow languages (eg Labview?) come close to this? https://en.wikipedia.org/wiki/Dataflow_programming

And isn't GNU Radio effectively a domain-specific language for plugging signal processing elements together?
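Having only dabbled with these tools myself, my mental model of the dataflow style is roughly this toy sketch (block names are made up): each "block" consumes one stream and yields another, so blocks snap together into a flowgraph in any order that type-checks.

```python
# Toy dataflow "blocks" as Python generators: one stream in, one stream out.
def source(n):
    yield from range(n)

def scale(stream, k):
    for x in stream:
        yield x * k

def clip(stream, lo, hi):
    for x in stream:
        yield max(lo, min(hi, x))

# Snap the blocks together like a flowgraph: source -> scale -> clip.
result = list(clip(scale(source(5), 3), 0, 10))
print(result)  # [0, 3, 6, 9, 10]
```

Real dataflow environments add scheduling, buffering, and fan-out, but the composability comes from the same place: every block shares the one-stream-in, one-stream-out interface.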

See also: Joe Armstrong, 'Why do we need modules at all?'


As long as each single project keeps reinventing their own Lego stubs, we won't be going forward with reuse.

With Haskell and Scala's type-system we seem to come closer to universal Lego. Still, like Lego, it can be hard to combine larger scale composites.

That comparison is silly. I've never seen anything built with LEGOs in production.

I read every comment waiting to see if someone would point out the obvious.

This blog is 100% accurate, and nearly everyone took the wrong point away from it.

There is a very good reason we don’t build anything but the most trivial structures from Lego.

Composability and reusability are great, but not at the expense of suitability for purpose.

You could build a house or a car out of Lego, but it would be worse on almost every possible metric than a regular house (unless your primary use case for your house is to take it apart and reconfigure it easily).

The same goes for software. You could build everything from unix pipes, but that rarely happens past the most trivial scale.

In fact, the clue is in the article. Evolution is not very tolerant of inefficient systems; animals haven’t evolved hot swappable organs, they have a complex interconnected system because 99% of the time, it works better.

Not literally LEGOs, but many industries are based on repeated application of basic building blocks. The simplest and most ancient example is... bricks: you build walls by composing objects of a few standardized sizes. This sounds obvious, but it's not how things were always made; if you build a wall with stones, you have to invest more time into fitting them together neatly (and often you have to use a hammer to break larger stones apart). Construction has other examples of composition of prefabricated modules.

Another example, a lot of electronics was (and still is) implemented by combining discrete components with well known behaviours (things like https://en.wikipedia.org/wiki/7400-series_integrated_circuit...). At many levels in electronics you have abstractions upon abstractions (transistors -> logical gates -> multiplexers -> ALUs -> CPUs; communication busses, etc etc)
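That layering is easy to mimic in code. As a toy sketch (not how real hardware libraries are organized), everything below is composed from a single NAND primitive, with each level reusing only the one beneath it:

```python
# Level 0: the lone primitive.
def nand(a, b):
    return not (a and b)

# Level 1: basic gates, built only from NAND.
def inv(a):
    return nand(a, a)

def and_(a, b):
    return inv(nand(a, b))

def or_(a, b):          # De Morgan: a OR b == NAND(NOT a, NOT b)
    return nand(inv(a), inv(b))

def xor(a, b):          # classic four-NAND XOR
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# Level 2: an arithmetic block built from the gates.
def half_adder(a, b):
    return xor(a, b), and_(a, b)  # (sum, carry)

print(half_adder(True, True))  # (False, True): 1 + 1 = binary 10
```

The uniform boolean-in/boolean-out interface is what lets each layer treat the one below as interchangeable bricks, much like the 7400-series parts did physically.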

You can build a house with bricks. But you also need adhesive. And past a certain (very low) complexity, you probably use wooden framing, cut to size. And a different material for the roof, and glass for the windows and some kind of insulation, plastics for piping, and so on and so on and so on. And that’s not even getting started on more complex buildings.

As for electronics, yep, you start with discrete components, but as you said yourself, you quickly end up with specifically designed components for efficiency.

I don't think the goal of software reuse is to be zero-glue; perhaps that part of the LEGO analogy (the lack of glue) distracts a bit.

What often happens with software development is that despite all the abstractions that get built while building one piece of software, it often turns out those abstractions can't be just pulled out and reused in another context because often those abstractions are too leaky.

I think the "original sin" stems from how "cheap" it is to write a line of code, and another one, and another one. It's not that good abstractions are never being built during software development; but when they do they usually take a long time to mature and have to be sought purposefully (and they cost way much more to engineer).

What sets software development apart from other forms of engineering is that it's often "ok" to build a house with mud and sticks that will collapse on the first strong gust of wind because ... "because it's just a POC", "let's see if we really have customers before doing it right", "making this too well has an opportunity cost of not doing something else", etc etc

The point of Lego is not the zero glue, the point is that all components share a common interface.

>>What often happens with software development is that despite all the abstractions that get built while building one piece of software, it often turns out those abstractions can't be just pulled out and reused in another context because often those abstractions are too leaky.

My question is whether a non-leaky abstraction is even valuable.

Look at a real world example of non-trivial component reuse: a vehicle engine.

It has a single purpose: convert chemical energy to kinetic energy, and they're really expensive to develop.

But you can't just drop a v6 into a Tesla.

yet it's clearly useful to have the concept of "the engine" even if you cannot drop it in any vehicle.

E.g. it's a clear boundary between teams that collaborate on the "car" system, it's something some engineers specialize in that is quite different from "suspension system" (and such knowledge can be carried across engine design instances).

But you're right, when we're talking about LEGO-style composability, as you said, it's about having pieces that fit together and can be rearranged in many ways, and the real world is far messier and rarely you have universal composability; there are instances of "bounded" composability; e.g. the tyres for my car likely fit many other similar vehicles, but not all.

>> yet it's clearly useful to have the concept of "the engine" even if you cannot drop it in any vehicle.

Absolutely! I'm not arguing against modularity at all. I think the concept of discrete subsystems interacting is pretty much universal (organs, components, rooms, streets).

>> there are instances of "bounded" composability; e.g. the tyres for my car likely fit many other similar vehicles, but not all.

Again, completely agree.

What I'm arguing against is uniformity of interface.

You wouldn't connect a tyre to a wheel the same way you would mount an engine to a chassis.

> What I'm arguing against is uniformity of interface.

> You wouldn't connect a tyre to a wheel the same way you would mount an engine to a chassis.

Yes, you're right, LEGO is extremely uniform, and no real system is _that_ uniform.

And even a uniform interface doesn't mean any combination makes sense.

Attaching a LEGO brick with wheels on the roof of a LEGO car is not different than soldering an axle on the roof of a car: technically possible but utterly useless.

Actually, soldering an axle on the roof of a car is composing uniform interfaces: atoms in chemical bonds form a finite set of building blocks too!

I write full stack web in various functional programming languages, and I am constantly copy-pasting 'lego blocks' between projects. If the compiler is happy, I'm happy.

Accurate IMO. You can get the components or organs, but they don't snap together; more careful integration is required... like surgery.

Is software just so easy to write that everything becomes incompatible? Everyone has the next best idea. It's XKCD 927 all the time.

Now if you instead look at trucks and campers - rear hitches, fifth wheel hitches, gooseneck hitches - every truck you can buy on the market today (with some strange exceptions, Cybertruck cough) has very simple compatibility with all of these various trailer types. Yes, in some cases you have to buy a mod kit, but when you look at the mod kit it literally bolts into pre-drilled holes in your truck frame. It's all plug and play.

I feel like the reason it's like this is because no one person can go off and design a truck in the dark. It's designed by many people and is beholden to customers.

Software, on the other hand, just gets built by whoever wants to build something. It doesn't have to fit some kind of standard. And it's not like there are a lot of standards that already exist in whatever it is that the software is doing.

In the real world however, you have Kubota tractors that need to fit a CAT bucket.

I mean just read this page:


Small quirks to it, but yeah, the rest of the world really is Legos, Software is organ transplant.

Totally gonna use that for marketing purposes :D

If only organs could be connected with pipes

That's hilariously accurate.

> Software reuse is more like an organ transplant than snapping Lego blocks

Funny, but Lego fans almost interpreted "organ transplant" exactly as "snapping Lego blocks".[0]

[0] http://xkcd.com/659/

Have you guys heard of functional programming?

Have you guys wondered why some people are so fanatic about functional programming?

Functional programming forces your program to be lego blocks by definition.

A function with no side effects IS A Lego Block.

If I never discovered functional programming I'd be like John D. Cook, blogging about how programming is like organ transplantation.

Functional Programming is the answer to the question that has plagued me as a programmer for years: How do I organize/design my program in such a way that it becomes endlessly re-useable from a practical standpoint?

If this question plagues you as a programmer I urge you to take a closer look at functional programming. Specifically look at functions and function composition.
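For the curious, a minimal Python sketch of the idea (`compose` and `slugify` are just illustrative names): each function is a side-effect-free brick, and composition produces a new brick that is itself composable further.

```python
from functools import reduce

def compose(*fns):
    """Compose pure functions right-to-left: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)

# Three side-effect-free "bricks"...
strip = str.strip
lower = str.lower
def dashes(s):
    return s.replace(" ", "-")

# ...snapped into a new brick: strip, then lowercase, then dash-join.
slugify = compose(dashes, lower, strip)
print(slugify("  Hello World  "))  # hello-world
```

Because none of the pieces touch shared state, `slugify` behaves identically wherever you paste it, which is exactly the reuse property being claimed.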

Strongly typed functional programing ;)

...follow the types
