Most of these problems boil down to a couple of operations.
Some kind of SQL system
Webservices (of the JSON/XML/HTML variety)
inputs: () -> Some<T>
outputs: Seq<T> -> ()
pipes: Seq<T> -> Seq<Q>
tees: (Seq<T> -> ()) -> Seq<T> -> Seq<Q> (once partially applied with an output, the tee becomes a pipe)
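These signatures might be sketched with Python generators; a minimal sketch, and all the names (`source`, `sink`, `pipe`, `tee`) are mine, not anything from the parent:

```python
from functools import partial

def source():
    """input: () -> Seq<T> -- pulls rows from somewhere."""
    yield from [1, 2, 3]

def sink(items):
    """output: Seq<T> -> () -- writes rows somewhere."""
    for item in items:
        pass  # e.g. INSERT into a table, POST to a service

def pipe(items):
    """pipe: Seq<T> -> Seq<Q> -- transforms as it streams."""
    for item in items:
        yield item * 2

def tee(side_output, items):
    """tee: (Seq<T> -> ()) -> Seq<T> -> Seq<T>

    Buffers the stream, copies it to the side output, and passes
    it through unchanged (here Q happens to equal T)."""
    buffered = list(items)
    side_output(buffered)
    yield from buffered

# Partially applying the tee with an output turns it into a pipe:
as_pipe = partial(tee, sink)
result = list(pipe(as_pipe(source())))
```

Composing them this way keeps every stage lazy except the tee, which has to buffer before it can copy.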
- Designing usable APIs, which is to say identifying which aspects of System A are most important to System B, how the concepts should be transformed, and how the data should be aggregated.
- Catching errors thrown by one system when data is requested from another system and relaying, translating, or suppressing these errors. Also developing policies for dealing with unreliable services.
- Performance issues.
The first problem is the same as the second (i.e. deciding which data goes where; errors ARE data).
Treat all your services as unreliable and you solve 90% of your issues, because then unreliable services aren't a special case.
Performance issues in data generally stem from two places: underspec'd hardware and latency. These are usually solved best by buying better hardware (more drives, not more CPU) and by buying a network nightmare box and putting 400ms of latency in the network. People stop designing chatty APIs pretty quickly at 400ms of latency.
The first problem is similar to the second, yes. They both still need to be solved, so what is your point?
If you think latency is the problem, why are you talking about building in latency? It seems like you actually think chatty APIs are the problem. And, yes, chatty APIs can cause slowness. But chatty APIs often exist because they are the simplest possible design. Once you realize that there is too much back-and-forth you may have to sacrifice API usability and simplicity by adding caching and eager-loading code. Again, you think this is just something that solves itself?
type Response<'d,'e> =
    | Data of 'd
    | Error of 'e

match response with
| Data d -> doSomething d
| Error e -> doSomethingElse e

match response with
| Data d -> Some d
| Error _ -> None
Each system in the chain should have a reference identifier to make tracking the requests across the system easy.
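A minimal sketch of what threading that reference identifier through each hop might look like; the field and function names here are my own invention, not a standard:

```python
import uuid

def new_request(payload):
    # Attach a correlation id once, at the edge of the system.
    return {"correlation_id": str(uuid.uuid4()), "payload": payload}

def forward(request, hop):
    # Every downstream system logs and preserves the SAME id, so a
    # single identifier ties the whole request chain together.
    print(f"[{hop}] handling {request['correlation_id']}")
    return {"correlation_id": request["correlation_id"],
            "payload": request["payload"]}

req = new_request({"user": 42})
req = forward(req, "billing")
req = forward(req, "warehouse")
```

The important property is that the id is generated exactly once and never rewritten in flight; grepping the logs for one id then reconstructs the path across systems.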
Chatty APIs often exist because a lot of programmers think that everything happens at the same speed, and they think that having a getter and setter for every field makes their code "object oriented". Putting in 400ms of latency gets programmers to stop thinking that; it gets them thinking "how can I issue a bunch of requests simultaneously, go do something else (like issuing more requests for someone else), and then respond to the client when I have all the data I need?" It gets them writing async code, or using MARS. Maybe 400ms is really excessive, but 100ms should still let your code run on systems with reasonable geographic separation.
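The "issue a bunch of requests simultaneously" style might be sketched with asyncio; `fetch_field` is a stand-in I made up for one remote call:

```python
import asyncio

async def fetch_field(name):
    # Stand-in for one remote call; imagine ~100ms of network
    # latency here instead of this tiny sleep.
    await asyncio.sleep(0.01)
    return f"value-of-{name}"

async def fetch_record(fields):
    # One round trip's worth of waiting instead of len(fields)
    # round trips: every request is in flight at once.
    values = await asyncio.gather(*(fetch_field(f) for f in fields))
    return dict(zip(fields, values))

record = asyncio.run(fetch_record(["id", "name", "price"]))
```

The per-field cost is the same; the difference is that the waits overlap instead of stacking, which is exactly what the per-getter-round-trip style can't do.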
Chatty APIs aren't simple, they're generally really annoying, because for decades the predominant idea in programming has been to put as thin a veneer on top of the implementation as possible and let's call that an interface. It makes for a simple implementation at the expense of a horrible interface. APIs are about the interface.
> Using the usual sequence operators pretty much anything is possible
"People build BIG BALLS OF MUD because they work. In many domains, they are the only things that have been shown to work. Indeed, they work where loftier approaches have yet to demonstrate that they can compete.
"It is not our purpose to condemn BIG BALLS OF MUD. Casual architecture is natural during the early stages of a system’s evolution. The reader must surely suspect, however, that our hope is that we can aspire to do better. By recognizing the forces and pressures that lead to architectural malaise, and how and when they might be confronted, we hope to set the stage for the emergence of truly durable artifacts that can put architects in dominant positions for years to come. The key is to ensure that the system, its programmers, and, indeed the entire organization, learn about the domain, and the architectural opportunities looming within it, as the system grows and matures."
Maybe, you are correct. Probably. We all hope. But, practically, in the 'bigger' problem domains, you can put a lot of smart and experienced people on a project, and it still comes out a ball of mud. Maybe, there just aren't yet enough experts to go around.
http://www.laputan.org/mud/ (conclusion)
BTW, if you know of any references that articulate your assertion without hand-waving, I would love to read them; I must read them. I'm currently devouring everything I can about FP, and there's a lot of concrete stuff, even more stuff with hand-waving, and then some alarming articles by people who did their startup in Haskell and wouldn't do it again, even if for out-of-band factors.
Another problem is that people often get taught that you need <insert some bad framework>; that you can't do distributed computing if you don't use some kind of framework. At least at the places I know, they would never tell you something like "just send JSON from one node to the other if that's all you need".
At Clojure Conj there were some talks about this, but the videos are not out yet. See this presentation on Concurrent Stream Processing (https://github.com/relevance/clojure-conj/blob/master/2011-s...) or this one on Logs as Data (https://github.com/relevance/clojure-conj/blob/master/2011-s...)
For another example that works in quite a similar way, look at Storm (in use at Twitter). It's all sequential abstractions. See this video by Nathan Marz (watch all the videos you can find):
For a more philosophical perspective, look at the videos by Rich Hickey:
I would never assert that every type of problem should be solved in this manner, but it's a pretty good framework for taking data from a bunch of different sources and outputting it to a bunch of other destinations. Piping and transforming data is not fundamentally a problem of hierarchical types (a problem somewhat solved by C++/Java/C#) but one of type transformation and streams.
Keep your business logic and GUIs built in Java but use something like this for moving data around the organization and importing / exporting as needed by clients.
In the end, I just resigned to go and do iPhone software for a while. I suspect any one person only has a certain amount of programming in them, and you don't want to waste it.
That's not to say that you couldn't do something similar in C++, but I've met like 4 really really good C++ programmers in about 15 years of programming, so no, it's unlikely to happen.
This is the sort of stuff that filter/map/reduce is really really good at, but also, it can be important code that people will pay you a lot of money to write over a longer period of time....
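A minimal sketch of the kind of job filter/map/reduce excels at; the order records here are made up for illustration:

```python
from functools import reduce

orders = [
    {"region": "EU", "amount": 120, "status": "ok"},
    {"region": "US", "amount": 80, "status": "failed"},
    {"region": "EU", "amount": 40, "status": "ok"},
]

# filter: drop the rows the destination can't use
ok = filter(lambda o: o["status"] == "ok", orders)
# map: keep only the field the destination wants
amounts = map(lambda o: o["amount"], ok)
# reduce: aggregate for the report
total = reduce(lambda acc, a: acc + a, amounts, 0)
# total == 160
```

Each stage is a one-liner, and swapping the source or the aggregation means changing one stage rather than rewriting a loop.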
As you say, this mapping can be specified with a function, composed of other functions.
But if the mapping is complex, with different levels interacting, writing this function can be difficult.
That is, isomorphic mappings are straightforward; but non-isomorphisms (the non-homomorphic aspect) can be tricky.
How well does this approach handle the tricky cases? (an example would be great, if possible)
The API of a 10-year-old closed-source enterprise system that has grown over time by buying this solution provider and that one, and where documented methods may or may not work... *shudders*. No, that's not plumbing at all; that is more like playing Minecraft in a toxic waste dump.
I haven't used these - but they look like they'd nail this problem. Is there a problem with them?
Enterprise tools tend to make everything more complicated than it is (somebody else mentioned map/filter as a cure), and to help you fight this complexity you need even more complex tools. In the end you just have tons of code and tons of tools with tons of configuration.
Abstracting over things that are hard to abstract is a typical error these kinds of tools make. You can't just abstract away networks or databases. Sure, a simple ORM is fine for most blogs, but if you end up writing 30 lines of Java code to do something you could have written in 2 lines of SQL, something is not right.
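The "2 lines of SQL" point might be illustrated like this (using sqlite3 with made-up data; the same query works against any SQL database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EU", 120), ("US", 80), ("EU", 40)])

# The "2 lines of SQL": the database does the grouping and summing,
# instead of pulling every row across the network into objects and
# aggregating in application code.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
```

An ORM that hides `GROUP BY` behind object traversal tends to fetch all the rows and loop over them in the application, which is exactly the 30-lines-for-2 problem.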
This video teaches the basic idea:
Process J is usually the simple bit - the hard bit is taking it from A and B, converting (1) and (2) so that they can be used together, dealing with all the different eventualities that can cause things to go wrong when fetching the data, parsing the data into (3) and then writing to Z (while dealing with all the things that can go wrong when writing to it).
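A hedged sketch of that shape, with A, B, and Z stubbed out (the record layouts and helper names below are invented for illustration):

```python
def fetch_a():
    # Stand-in for source A; real code would hit the mainframe and
    # deal with timeouts, partial reads, etc.
    return [("1001", "150.00")]

def fetch_b():
    # Stand-in for source B; amounts arrive already in cents.
    return [(1002, 9900)]

def convert_a(rows):
    # Conversion (1): normalize A's string-typed rows to cents.
    for order_id, amount in rows:
        yield {"id": int(order_id), "cents": int(float(amount) * 100)}

def convert_b(rows):
    # Conversion (2): B's rows are already numeric.
    for order_id, cents in rows:
        yield {"id": order_id, "cents": cents}

def process_j(a_rows, b_rows):
    # The "simple bit": merge the two normalized streams.
    return sorted([*a_rows, *b_rows], key=lambda r: r["id"])

def write_z(records, out):
    # The hard bit is everything that can fail halfway through;
    # here we collect failures instead of dying on the first one.
    failures = []
    for rec in records:
        try:
            out.append(rec)
        except Exception as exc:
            failures.append((rec, exc))
    return failures

out = []
write_z(process_j(convert_a(fetch_a()), convert_b(fetch_b())), out)
```

Almost all of the line count is fetch/convert/write; `process_j` is one line, which is the parent's point.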
The technical details are usually pretty boring. Data goes in, data comes out, if it weren't for all the bloody people involved it would be very civilized.
This sounds superficially appealing but I'm not sure there's any actual wisdom here.
Also, A is a 30-year old legacy system sitting on a mainframe, B is an Oracle database inherited when we bought over another company, and Z is a third-party who we're shipping data to for their just-in-time procurement system.
No, I'm not. I'm simply taking seriously the claim that the arrows are more expensive than the components. If that's true, then it's cheaper to rewrite J, period, by definition of the claim at hand.
If it's not cheaper to rewrite J, then the arrows are cheaper than the core components and the way in which they are more expensive is only in an artificial and useless measurement of "cost" that only holds up as long as you don't take it seriously... what's the use of it, then?
The problem there is that there are already several arrows pointing to J. So if I change J then I have to update all of them too. And then have them regression tested.
(It's usually the cost of regression testing that causes the most crustiness.)
Funny anecdote, last time I heard about "plumber programmers", the meaning was very different and actually quite pejorative. The person I was talking to was referring to the type of programmers that can only write applications by assembling third party components such as ruby gems while lacking the algorithmic skills to solve problems that haven't been solved before.
It's one of the reasons I tend to stay away from things like Django and RoR. They make it easy to develop applications quickly, but many times I have to create hacks to make any changes that don't fit into the one-size-fits-all libraries.
A well-architected application (they do exist) where a lot of work was put into designing the arrows and boxes will suffer from this effect less so than one where the arrows and boxes were added on haphazardly. In fact, I would argue that spending too much time worrying about the arrows is a sign that something's likely wrong with your architecture.
Of course, that's not to say that architecture can cure this altogether. In particular, enterprise software tends to focus on integration a lot, and mostly because enterprises have a lot of boxes that need to be put together.
Startups will tend to do a lot less of this, but it's still important.
For me, an interesting project is not the topic but the technical challenges behind it. Your typical web front end stuff just doesn't have any appeal. It's just programming gruntwork.
I'm far more likely to write the server underneath that huge multiplayer game and deal with all of the systems stuff, in other words. The actual gameplay, graphics, sound, and user interface? That's someone else's realm.
"Object-oriented systems are all about interfaces and interactions. An object’s internal state is an implementation detail and not part of its observable behavior. As such, it is more subject to change than the object’s interface. We can therefore keep specs more flexible and less brittle by avoiding reference to the internal state of an object."
This 'Plumber Programmer" article is meant to be more general than object-oriented code, but I think the analogy applies. Building systems is about interaction and communication - system state is just an illusion that gives us a first approximation of how things talk to one another.
Again, there is more value in the connections than in the individual programs themselves.