The plumber programmer (johndcook.com)
125 points by ColinWright on Nov 15, 2011 | 42 comments

Most of an enterprise developer's job is connecting components together. I hate it. I hate modifying existing systems and I hate connecting them together. It is usually hard, but the kind of hard that is not really fulfilling for me. It is about fighting accidental complexity created by other people. It is not why I started programming when I was 12 years old. But this is what needs to be done: it is rational from a business perspective. At least in my side project I create something new from scratch (a component-based UI system painted on HTML5 Canvas).

It's not that difficult when you work with the right tools.

Most of these problems boil down to a couple operations.

Most of the data comes from a couple places:

  Some kind of SQL system
  Text files
  Webservices (of the JSON/XML/HTML variety)
Most of the data goes back to the same places that it came from:

  Some kind of SQL system
  Text files
  Webservices (of the JSON/XML/HTML variety)
All you need to make are four types:

  inputs: () -> Some<T>
  outputs: Seq<T> -> ()
  pipes: Seq<T> -> Seq<Q>
  tees: Seq<T> -> () -> Seq<T> -> Seq<Q> (once partially applied, the tee becomes a pipe)
Using the usual sequence operators, pretty much anything is possible with those inputs, outputs, and transforms (pipes), and you avoid a lot of overhead when building data pipelines.
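A minimal sketch of those four shapes, assuming Python generators stand in for Seq (every name here is invented for illustration, not taken from any real library):

```python
from functools import partial
from typing import Callable, Iterable, Iterator

# input: () -> Seq<T>
def read_rows() -> Iterator[dict]:
    # stand-in for "some kind of SQL system / text file / webservice"
    yield {"id": 1, "amount": "10.5"}
    yield {"id": 2, "amount": "3.25"}

# pipe: Seq<T> -> Seq<Q>
def parse_amounts(rows: Iterable[dict]) -> Iterator[dict]:
    for row in rows:
        yield {**row, "amount": float(row["amount"])}

# tee: a side effect plus pass-through; partially applied, it becomes a pipe
def tee(side_effect: Callable[[dict], None], rows: Iterable[dict]) -> Iterator[dict]:
    for row in rows:
        side_effect(row)
        yield row

# output: Seq<T> -> ()
def write_rows(rows: Iterable[dict]) -> list:
    return list(rows)  # stand-in for writing back to SQL/file/webservice

seen: list = []
audit_pipe = partial(tee, seen.append)  # the "partially applied tee" as a pipe
result = write_rows(audit_pipe(parse_amounts(read_rows())))
```

Because everything is a lazy sequence, the stages compose with ordinary function application and nothing runs until the output stage pulls.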

I don't think creating the pipeline is the difficult part at all. When doing this kind of work I think more about things like:

- Designing usable APIs, which is to say identifying which aspects of System A are most important to System B, how the concepts should be transformed, and how the data should be aggregated.

- Catching errors thrown by one system when data is requested from another system and relaying, translating, or suppressing these errors. Also developing policies for dealing with unreliable services.

- Performance issues.

These are largely trees-for-the-forest type problems.

The first problem is the same as the second (i.e. which data goes where; errors ARE data).

Treat all your services as unreliable and that solves 90% of your issues, since unreliable services are then no longer a special case.

Performance issues in data work generally stem from two places: underspec'd hardware and latency. These are usually solved best by buying better hardware (more drives, not more CPU) and by buying a network nightmare box and putting 400ms of latency into the network. People stop designing chatty APIs pretty quickly at 400ms of latency.

I don't understand what you're saying.

The first problem is similar to the second, yes. They both still need to be solved, so what is your point?

Unreliable services are everywhere. So what's your universal solution for dealing with them? I'll give a concrete example: suppose I have a JSON service being consumed by a Javascript web UI, and my service needs to hit some kind of authentication backend. In the event that the authentication backend server is [pick one: down, giving 500 errors, being slow], what kind of response does my service give to the Javascript app, and what kind of message or visual cue does the app give to the user? If you think the answer is something other than "it depends on exactly what the application does", then I disagree.

If you think latency is the problem, why are you talking about building in latency? It seems like you actually think chatty APIs are the problem. And, yes, chatty APIs can cause slowness. But chatty APIs often exist because they are the simplest possible design. Once you realize that there is too much back-and-forth you may have to sacrifice API usability and simplicity by adding caching and eager-loading code. Again, you think this is just something that solves itself?

That particular problem is a ternary response: true, false, error, or in general a datatype of:

  type Response<'d,'e> =
    | Data of 'd
    | Error of 'e
which would be handled by a statement like:

  match response with
  | Data d -> doSomething(d)
  | Error e -> doSomethingElse(e)
Or perhaps

  match response with
  | Data d -> Some(d)
  | Error _ -> None
Any errors coalesce to an error on the client, and the client responds: "We're sorry, this doesn't work; we've been notified and are investigating. Here's your ticket #"

Each system in the chain should have a reference identifier to make tracking the requests across the system easy.

Chatty APIs often exist because a lot of programmers think that everything happens at the same speed, and they think that having a getter and setter for every field makes their code "object oriented". Putting in 400ms of latency gets programmers to stop thinking that; it gets them thinking "How can I issue a bunch of requests simultaneously, go do something else (like issuing more requests for someone else), and then respond to the client when I have all the data I need?" It gets them writing async code, or using MARS. Maybe 400ms is really excessive, but 100ms should still let your code run on systems with reasonable geographic separation.
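The "issue a bunch of requests simultaneously, then respond when everything is back" idea can be sketched with Python's asyncio (the endpoints and latencies are invented for illustration):

```python
import asyncio

async def fetch(name: str, latency: float) -> str:
    # stand-in for a remote call, with the injected network latency
    await asyncio.sleep(latency)
    return f"{name}-data"

async def chatty(latency: float) -> list:
    # one round-trip per field: total time is roughly 3 * latency
    return [await fetch(n, latency) for n in ("a", "b", "c")]

async def batched(latency: float) -> list:
    # all requests in flight at once: total time is roughly 1 * latency
    return list(await asyncio.gather(*(fetch(n, latency) for n in ("a", "b", "c"))))

results = asyncio.run(batched(0.01))
```

At 400ms of injected latency the chatty version costs ~1.2s per call chain while the batched one stays at ~400ms, which is exactly the pressure that gets people to stop writing one round-trip per getter.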

Chatty APIs aren't simple; they're generally really annoying, because for decades the predominant idea in programming has been to put as thin a veneer as possible on top of the implementation and to call that an interface. It makes for a simple implementation at the expense of a horrible interface. APIs are about the interface.

  > Using the usual sequence operators pretty much anything is possible
I would absolutely love it if this were true. However, I don't know of anyone who has constructed a convincing argument that it is true. Even if it were true, it is probably impractical to require all systems you interface with to be composed in these terms. Is it cheaper to work with a ball of mud (which poisons everything it touches), or is it cheaper to rewrite the ball of mud in the style you describe?

"People build BIG BALLS OF MUD because they work. In many domains, they are the only things that have been shown to work. Indeed, they work where loftier approaches have yet to demonstrate that they can compete.

"It is not our purpose to condemn BIG BALLS OF MUD. Casual architecture is natural during the early stages of a system’s evolution. The reader must surely suspect, however, that our hope is that we can aspire to do better. By recognizing the forces and pressures that lead to architectural malaise, and how and when they might be confronted, we hope to set the stage for the emergence of truly durable artifacts that can put architects in dominant positions for years to come. The key is to ensure that the system, its programmers, and, indeed the entire organization, learn about the domain, and the architectural opportunities looming within it, as the system grows and matures."[1]

Maybe you are correct. Probably. We all hope. But practically, in the 'bigger' problem domains, you can put a lot of smart and experienced people on a project, and it still comes out a ball of mud. Maybe there just aren't yet enough experts to go around.

[1] http://www.laputan.org/mud/ (conclusion)

BTW, if you know of any references that articulate your assertion without hand-waving, I would love to read them. I must read them. I'm currently devouring everything I can about FP, and there's a lot of concrete stuff, even more stuff with hand-waving, and then some alarming articles about people who did their startup in Haskell and wouldn't do it again, even if for out-of-band factors.

Well, I think it's only cheaper at first, and even if people know better they are forced to use a stupid Java framework (one that abstracts away the network), even when they know it will end up badly.

Another problem is that people often get taught that you need <insert some bad framework>, that you can't do distributed computing unless you use some kind of framework. At least the places I know would never tell you something like "just send JSON from one node to the other if that's all you need".

At Clojure Conj there were some talks about this, but the videos are not out yet. See this presentation on Concurrent Stream Processing (https://github.com/relevance/clojure-conj/blob/master/2011-s...) or this one on Logs as Data (https://github.com/relevance/clojure-conj/blob/master/2011-s...)

For another example that works in quite similar ways, look at Storm (in use at Twitter); it's all sequential abstractions. See this video by Nathan Marz (watch all the videos of his you can find): http://www.infoq.com/presentations/Storm

For a more philosophical perspective, watch the videos by Rich Hickey: http://www.infoq.com/presentations/Simple-Made-Easy

"durable artifacts that can put architects in dominant positions for years to come" that statement is why balls of mud work, because the primary alternative to balls of mud are space shuttles designed by architecture astronauts. Space shuttles are generally problems in search of a solution, which is why the ball of mud is seen as the only thing that works. Writing in an imperative language poking and prodding certain bits at certain addresses merely ensures a ball of mud will result.

I would never assert that every type of problem should be solved in this manner, but it's a pretty good framework for taking data from a bunch of different sources and outputting it to a bunch of other destinations. Piping and transforming data is not fundamentally a problem of hierarchical types (a problem somewhat solved by C++/Java/C#) but of type transformation and streams.

Keep your business logic and GUIs built in Java but use something like this for moving data around the organization and importing / exporting as needed by clients.

Enterprise programming would have been a LOT more fun if I'd been allowed to use filter/map/reduce directly, instead of writing thousands of lines of C++.

In the end, I just resigned to go and do iPhone software for a while. I suspect any one person only has a certain amount of programming in them, and you don't want to waste it.

Haha, I hear ya on that. I worked on a .NET team; apparently that meant C# only. When F# came out I started writing F# until someone found out. (What's this Fsharp.exe that's crashing the build!?!?!?) Luckily, I had written so much (in terms of C# code) that I was able to keep on writing in F# (technically, I wasn't allowed to create new F# code, but I was allowed to maintain old code; oddly enough, most new features made more sense as maintenance on the existing F# code).

Picture several thousand lines of C++ being reduced to 27 lines of Clojure and you get an idea.

That's not to say that you couldn't do something similar in C++, but I've met like 4 really really good C++ programmers in about 15 years of programming, so no, it's unlikely to happen.

What sort of tasks would make such a drastic reduction possible? Genuinely curious, having never written C++.

Higher-order functions. It's like the difference between calculus and arithmetic. If you're adding numbers together you're not going to see much difference. If you try to send a man to the moon... it's going to be a lot less code in a higher level language.

Imagine merging thousands or millions of records into disparate timelines by various attributes, merging similar records, or overwriting, changing lengths...

This is the sort of stuff that filter/map/reduce is really really good at, but also, it can be important code that people will pay you a lot of money to write over a longer period of time....
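A rough idea of what that filter/map/reduce style looks like on the record-merging problem described above (the data and field names are invented for illustration):

```python
from functools import reduce
from itertools import groupby

records = [
    {"key": "a", "t": 2, "val": 10},
    {"key": "b", "t": 1, "val": 7},
    {"key": "a", "t": 1, "val": 5},
    {"key": "a", "t": 3, "val": None},  # a bad record, to be filtered out
]

# filter out bad records, sort, then group into a timeline per key
clean = [r for r in records if r["val"] is not None]
clean.sort(key=lambda r: (r["key"], r["t"]))
timelines = {k: [r["val"] for r in g]
             for k, g in groupby(clean, key=lambda r: r["key"])}

# reduce each timeline down to an aggregate (here, a running total)
totals = {k: reduce(lambda acc, v: acc + v, vs, 0)
          for k, vs in timelines.items()}
```

Each stage is one line of intent; the equivalent hand-rolled loop-and-temporary-collection code in C++ is where the thousands of lines come from.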

It's true that connecting components is just a matter of mapping one representation to another.

As you say, this mapping can be specified with a function, composed of other functions. But if the mapping is complex, with different levels interacting, writing this function can be difficult. That is, isomorphic mappings are straightforward; but non-isomorphisms (the non-homomorphic aspect) can be tricky.

How well does this approach handle the tricky cases? (an example would be great, if possible)

To continue the plumber metaphor: some systems are like black boxes with a lot of pipes sticking out - and they will route their raw sewage into your freshwater connector depending on barely documented and sometimes non-deterministic configuration parameters.

The API of 10 year old closed source enterprise systems that have grown over time by buying this solution provider and that and where documented methods may work or not... shudders, no, that's not plumbing at all, that is more like playing Minecraft in a toxic waste dump.

What about all those enterprise integration tools (visual mappers for XML/relational/web services) like Altova MapForce, Microsoft BizTalk Mapper, and Stylus Studio? Oracle and IBM have similar tools.

I haven't used these - but they look like they'd nail this problem. Is there a problem with them?

I don't know the tools you mention, but I think that generally these kinds of tools are the problem, or at least they are treatments of the symptoms.

Enterprise tools tend to make everything more complicated than it is (somebody else mentioned map/filter as a cure), and to help you fight this complexity you need even more complex tools. In the end you just have tonnes of code and tonnes of tools with tonnes of configurations.

Abstracting over things that are hard to abstract is a typical error these kinds of tools make. You can't just abstract away networks or databases. Sure, a simple ORM is fine for most blogs, but if you end up writing 30 lines of Java code to do something you could have written in 2 lines of SQL, something is not right.

This video teaches the basic idea: http://www.infoq.com/presentations/Simple-Made-Easy

I saw this somewhere the other day: tedious work is a necessary part of good software engineering.

Definitely. An awful lot of programming is "Take data from place A (in format 1) and place B (in format 2) do magic thing J to it and put it in location Z (in format 3)".

Process J is usually the simple bit - the hard bit is taking it from A and B, converting (1) and (2) so that they can be used together, dealing with all the different eventualities that can cause things to go wrong when fetching the data, parsing the data into (3) and then writing to Z (while dealing with all the things that can go wrong when writing to it).
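One way to picture that A/B → J → Z shape, with the "things that can go wrong" handled at the edges rather than inside J (all sources, formats, and names here are invented for illustration):

```python
from typing import Iterable, Iterator, Tuple

def fetch_a() -> list:
    # place A, format 1: CSV-ish lines, including one malformed row
    return ["1,alice", "2,bob", "oops"]

def fetch_b() -> dict:
    # place B, format 2: a lookup table keyed by id
    return {1: "admin", 2: "user"}

def parse_a(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    # converting format 1, absorbing the eventualities (bad rows) at the edge
    for line in lines:
        try:
            ident, name = line.split(",", 1)
            yield int(ident), name
        except ValueError:
            continue  # malformed row: skip rather than poison J

def magic_j(a_rows: Iterable[Tuple[int, str]], b_map: dict) -> Iterator[dict]:
    # the "simple bit": join the two sources into format 3
    for ident, name in a_rows:
        yield {"id": ident, "name": name, "role": b_map.get(ident, "unknown")}

def write_z(rows: Iterable[dict]) -> list:
    return list(rows)  # stand-in for location Z and its own failure modes

out = write_z(magic_j(parse_a(fetch_a()), fetch_b()))
```

Notice that J itself stays a few lines; the bulk of the real work (and the real code) lives in the parsing, error absorption, and writing around it, which is the parent's point.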

"Yeah, meanwhile Z is temporarily unavailable and by the time it's available again, the data from A and B is no longer consistent, so you have to re-query again, ouch, your pointers to last read data on A are no longer valid anymore and yeah, we forgot to tell you this little funny thing about B... <the dramatic music in the background pauses here>" - this is what you should be thinking when looking at a small arrow drawn on a whiteboard.

In my experience in Big Freaking Enterprise development, the issue is closer to "this little funny thing about B" x 100, plus "group Q doesn't want this project to succeed", "you have to be compliant with Regulation F which means, most relevantly, that all good technical solutions to this problem are legally forbidden", "your success with this will threaten the budget of the Z department -- be prepared for a knife fight with the VP", "group P will veto the project if it is delivered with property G and group R will veto the project if it is delivered without property G (bonus points awarded for values of P equal to R)" and "by the way, this doesn't advance an actual business goal."

The technical details are usually pretty boring. Data goes in, data comes out, if it weren't for all the bloody people involved it would be very civilized.

I spent over two years trying to get the contents of a single drop-down on one form changed. Coding time: about 10 minutes. I guess if you want to look on the bright side, you can think about how much political weight a single HTML element can carry.

Sums up my day job exactly with just many many more places to take data from.

I don't think this makes sense. If it were true, that would imply that it would be easier to rewrite J to work on format 1, which is the flip side of claiming the conversion is the more difficult part. And if the arrows are more expensive than the components they are pointing to, this remains true for all such arrows, so adding more arrows means it's even cheaper to rewrite J n times than to write n converters; you can't make up the loss in volume.

This sounds superficially appealing but I'm not sure there's any actual wisdom here.

You're assuming that I have any ownership of J, which I don't - it was written to work in a particular pattern that the CIO favoured five years ago, before he left to work for a consulting firm.

Also, A is a 30-year old legacy system sitting on a mainframe, B is an Oracle database inherited when we bought over another company, and Z is a third-party who we're shipping data to for their just-in-time procurement system.

"You're assuming that I have any ownership of J, which I don't"

No, I'm not. I'm simply taking seriously the claim that the arrows are more expensive than the components. If that's true, then it's cheaper to rewrite J, period, by definition of the claim at hand.

If it's not cheaper to rewrite J, then the arrows are cheaper than the core components and the way in which they are more expensive is only in an artificial and useless measurement of "cost" that only holds up as long as you don't take it seriously... what's the use of it, then?

Aaah, gotcha.

The problem there is that there are already several arrows pointing to J. So if I change J then I have to update all of them too. And then have them regression tested.

(It's usually the cost of regression testing that causes the most crustiness.)

Very true.

Funny anecdote, last time I heard about "plumber programmers", the meaning was very different and actually quite pejorative. The person I was talking to was referring to the type of programmers that can only write applications by assembling third party components such as ruby gems while lacking the algorithmic skills to solve problems that haven't been solved before.

I've talked about this too. Sadly, this is what programming has started to become: slapping third-party components together and calling yourself a developer.

It's one of the reasons I tend to stay away from things like Django and RoR. They make it easy to develop applications quickly, but many times I have to create hacks to make any changes that don't fit into the one-size-fits-all libraries.

I think it depends on context.

A well-architected application (they do exist) where a lot of work was put into designing the arrows and boxes will suffer from this effect less so than one where the arrows and boxes were added on haphazardly. In fact, I would argue that spending too much time worrying about the arrows is a sign that something's likely wrong with your architecture.

Of course, that's not to say that architecture can cure this altogether. In particular, enterprise software tends to focus on integration a lot, and mostly because enterprises have a lot of boxes that need to be put together.

Startups will tend to do a lot less of this, but it's still important.

For the past 8 years I've been designing .NET enterprise integration solutions, and I love it. I love the problem solving that is required and working on getting square pegs to fit seamlessly into round holes. Recently I switched companies and was working with jQuery and PHP, doing front-end things like UI validation, and was losing my mind. It's not the same pencil-and-paper type of challenge as integration design.

Wow, could you elaborate more? This sounds like a very interesting viewpoint, and one that I've not seen before.

I'm with you on this one, we're a rare breed. I think people underestimate how much good software engineering can go into these "boring" enterprise projects. Of course, there are a lot of shops where programming skill is lacking, so you're actively discouraged from doing anything the least bit interesting. But if you can find a gig that welcomes skill and creativity, you can end up having a lot of fun doing these types of projects.

For me, an interesting project is not the topic but the technical challenges behind it. Your typical web front end stuff just doesn't have any appeal. It's just programming gruntwork.

I always shudder when people say it is just a simple integration project. The arrows are the hardest part.

I've used the "plumber" term to refer to my own deeds, if only because they don't resemble the shiny and showy stuff which is out there. I don't make games, and I don't generate graphics. My stuff tends to be the glue which makes other things happen in the first place.

I'm far more likely to write the server underneath that huge multiplayer game and deal with all of the systems stuff, in other words. The actual gameplay, graphics, sound, and user interface? That's someone else's realm.

Reminded me of this section from the RSpec book - http://cl.ly/2x231s0N2v3C2P1S1z04, http://cl.ly/2R1Z1D450Y1n1n3L170b - which applies to testing object-oriented code by mocking objects and putting assertions on the interactions between objects (vs. on their state).

"Object-oriented systems are all about interfaces and interactions. An object’s internal state is an implementation detail and not part of its observable behavior. As such, it is more subject to change than the object’s interface. We can therefore keep specs more flexible and less brittle by avoiding reference to the internal state of an object."

This 'Plumber Programmer' article is meant to be more general than object-oriented code, but I think the analogy applies. Building systems is about interaction and communication; system state is just an illusion that gives us a first approximation of how things talk to one another.

I was hoping this was going to be about programmers that moonlight as plumbers on the side.

At the telecom company I work for, this is called 'System Integration'. It takes most of our time.

Tangentially related: the plumber program from UNIX R11^W^W Plan 9 makes programs work together by piping text around. http://doc.cat-v.org/plan_9/4th_edition/papers/plumb

Again, there is more value in the connections than in the individual programs themselves.

Seymour Cray said that he was an overpaid plumber....
