

Programs should be Small - mairbek
http://mkhadikov.com/2012/02/02/programs-should-be-small.html

======
meaty
I love all these articles which are wholly ignorant of how complex software is
in reality and how such advice isn't necessarily good.

Sometimes decomposition results in problems at the other end of the scale such
as communication performance, data duplication, extremely nested abstractions,
messaging complexity, contract and API versioning hell etc.

Getting the sweet spot between monolithic coupled blobs and fragmented latent
deathtraps is an art which can't be puked out in a blog post. It takes
literally years of experience and some guesswork and testing and thinking.

Ultimately, lots of small programs are just as painful as a single large one
if they have to talk to each other or do IO.

~~~
sorbits
Adding to that, his stated advantage of not having to limit yourself to one
platform also seems the opposite of my experience (i.e. keeping everything on
the same platform _is_ an advantage).

When you have a big system with disjoint parts written in different languages,
re-use and refactoring is a pain, and redundancy is almost certain to creep in
(and with redundancy often comes inconsistency).

~~~
meaty
Yes, homogeneous systems are much easier to deal with, although from
experience certain systems are a pig to deal with from end to end (anything
Microsoft, as a rule).

Different languages are just different forms of integration and the mantra of
_integration is hell_ should be in the forefront of everyone's mind, always.

~~~
mairbek
Depends on the tools you use. I've found Apache Thrift and Protobuf to be
pretty sophisticated tools for integration between services.
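Both tools work from an interface definition: you describe messages and services in a small IDL and generate typed client/server stubs for each language. A generic Protobuf sketch (my own illustration, with made-up names, not something from the thread) looks like:

```protobuf
syntax = "proto3";

// Hypothetical contract shared between two small services.
message UserRequest {
  int64 user_id = 1;
}

message UserReply {
  string email = 1;
}

service UserService {
  rpc GetUser (UserRequest) returns (UserReply);
}
```

The generated stubs give both sides a typed, versionable contract, which is much of what makes cross-language integration tractable.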

~~~
meaty
Yet they are still entirely impractical for what we do. There is no
one-size-fits-all methodology, which results in a heterogeneous communication
layer. This means that you end up with technology fragmentation and therefore
additional complexity.

------
DanWaterworth
The problem is that software has a tendency to become complex. The proposed
solution is to break up the software into smaller programs.

There are certainly advantages to having smaller components. It allows you to
rewrite components in a different language should you want to, for example.
But there are disadvantages too: smaller components mean dealing with failure
at a much finer granularity.

In my opinion, the reason large programs become complicated is that there has
been no emphasis on simplicity. Breaking components into smaller pieces forces
you to adopt robust interfaces, but there are better ways of creating simpler
programs.

My personal approach is to reason about parts of a program in terms of what
they mean rather than what they do. I also have a strict rule that says,
"don't change the meaning of a component, create a new one". This methodology
works for me.

~~~
dawkins
Can you please elaborate on "reason about parts of a program in terms of what
they mean rather than what they do"?

~~~
DanWaterworth
Of course.

A good example that many people are familiar with is parsing data. Compare
these two approaches. You could write functions that manipulate the input and
build up an output or you could create a structure that represents a grammar
for the data you are parsing (perhaps by using a parser combinator library).

In the first approach, the only feasible way of reasoning about the program is
operationally; when it gets to here, this function is called, causing ... . In
the second approach, you can reason about the program by considering the
grammar that you created. You don't need to know exactly how the parsing
happens in order to understand a grammar. I argue that this is because the
grammar has been given a meaning.

Parsing makes a good toy example, but this same technique of finding ways of
giving meaning to components is applicable to software in the real world.
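This "grammar as data" idea can be sketched with a few hand-rolled parser combinators (a toy illustration of my own, not code from the comment):

```python
# Each parser is a function (text, pos) -> (value, new_pos), or None on
# failure. The grammar is assembled from these functions as *data*, so it
# can be read declaratively instead of traced operationally.

def char(c):
    """Parser that matches a single expected character."""
    def parse(s, i):
        if i < len(s) and s[i] == c:
            return c, i + 1
        return None
    return parse

def choice(*parsers):
    """Try each alternative in order; return the first that succeeds."""
    def parse(s, i):
        for p in parsers:
            result = p(s, i)
            if result is not None:
                return result
        return None
    return parse

def many1(p):
    """Match one or more repetitions of p."""
    def parse(s, i):
        values = []
        result = p(s, i)
        while result is not None:
            value, i = result
            values.append(value)
            result = p(s, i)
        return (values, i) if values else None
    return parse

# The grammar now reads like its own description: a number is one or
# more digits.
digit = choice(*[char(d) for d in "0123456789"])
number = many1(digit)

print(number("123abc", 0))  # (['1', '2', '3'], 3)
```

Understanding `number` requires no knowledge of how `parse` threads positions around; the meaning lives in the grammar itself.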

------
Chris_Newton
Modular design at any scale has a natural tension between looser coupling and
higher cohesion. If you split up a large code base into many small parts, each
part can be simpler and looser coupling between parts may improve
maintainability. On the other hand, now you must co-ordinate those parts
somehow, and making up for the loss of cohesion introduces a kind of
complexity you didn’t have before.

This tension exists at any scale, from a single-developer hobby project up to
massive enterprise projects and OSS giants, so I challenge the original
premise of this blog post that having a large code base is the root cause of
the problem. Going too far in either direction can result in absurdity,
whether that’s “enterprise software” levels of boilerplate (too much tight
coupling) or DLL hell and typical Linux package management (not enough
cohesion).

------
karterk
Microservices are not necessarily bad, but one should also be aware of the
drawbacks of such an approach. If there is very tight coupling between two
modules, you will often find yourself having to make changes to both in
lockstep. The typical process goes like this:

1. While working on module 1, you realize you need something from module 2

2. Open module 2, add the new feature and publish changes

3. Go back to module 1, test the new feature and resume work

This process is fine once both modules 1 and 2 have matured, but painful to
deal with while the APIs are still taking shape. Hence it makes sense to keep
a good abstraction between potential components and spin them off as
individual services only when they're stable enough.
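One way to keep that abstraction boundary before spinning a module off (a sketch of my own; `UserStore` and friends are hypothetical names) is to hide module 2 behind an interface that module 1 codes against:

```python
from abc import ABC, abstractmethod

class UserStore(ABC):
    """The contract module 1 depends on; module 2 implements it."""
    @abstractmethod
    def get_email(self, user_id: int) -> str: ...

class InProcessUserStore(UserStore):
    """In-process implementation, used while the API is still taking shape."""
    def __init__(self):
        self._emails = {1: "alice@example.com"}

    def get_email(self, user_id: int) -> str:
        return self._emails[user_id]

# Module 1 only ever sees the interface, so swapping in a remote
# implementation later (HTTP, Thrift, Protobuf, ...) touches no calling code.
store: UserStore = InProcessUserStore()
print(store.get_email(1))  # alice@example.com
```

The interface stays cheap to change while both modules live in one process; only once it stabilizes does the cost of a network boundary become worth paying.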

------
arocks
This is probably another way of saying that Service Oriented Architecture
(SOA) works best for the enterprise. They probably already know that we should
have all functionality in coarse, self-contained services.

But often the plumbing required in the form of web services becomes really
painful to leverage. For instance, they require creating complex WSDLs and
workarounds to prevent timeouts.

~~~
mairbek
SOA is great; people in the Java EE world just use it wrong.

Instead of making small isolated services, they build one single gigantic WAR
file.

Instead of using the right tool for the job, everything is written in Java.

Instead of having services that implement business logic, they build services
that convert one DTO to another.

That sucks...

------
bbwharris
There is value in reading pages of legacy code. It's very common to watch new
hires solve an already-solved problem. Too many people seem to be allergic to
reading code.

Solving complex problems in the physical world usually results in complexity
in the source code world.

It is always overwhelming to jump into a new gigantic code base. Talk to
someone who's been on it a while and they won't have the same drowning
outlook.

------
bhauer
Certainly perpetuating the illusion that all Java code is "enterprisey" and
"monolithic" will get tiresome at some point, right? I sure am tired of
reading such views.

~~~
MojoJolo
That's what I thought before, until I met Grails and Groovy, plus their friend
Scala. I get really high productivity in those languages when developing web
applications: Grails for web stuff and Scala for non-web. Before them, I used
to think that Java was only for the enterprise, and I didn't like using it for
my web apps even though it was the language I was most proficient in.

------
taeric
This seems akin to the saying that "everything should be made as simple as
possible, but not simpler." Well, yeah: simplifying to that point isn't
exactly easy. And, worse, the act of simplifying your code to fit this
description is something that is usually done after you had it working. In
other words, instead of solving another customer problem, many folks spin
wheels "solving" their own "problems." Even worse, often the solution is taken
to that simpler place that the saying warns against.

------
crististm
The problem I see with lots of software is that you don't have an immediate
view of the scale. Suppose you're opening a random file. What do you see? Are
those the atoms or the gears? You want to see the gears but there is usually
no map to point you to the gears. The software is the map - yeah, right.

What are you supposed to do? Find the int main() and then make the program run
in your head?

I can make an analogy with a car: I don't know every single piece of it, but I
can infer from the context. The scale is evident.

------
squirrel
SOA is somewhat different to the "microservice" design it seems this author is
proposing. I summarised what I could find on this - ideas from James Lewis at
Thoughtworks, Dan North at DRW, and Fred George at Forward - in a recent
CITCON session:
[http://www.citconf.com/wiki/index.php?title=Continuous_rewri...](http://www.citconf.com/wiki/index.php?title=Continuous_rewriting)

------
mimog
Isn't this "the Unix way"? I have seen this style of small cohesive programs
promoted a lot in Linux/Unix literature, so the advice isn't really breaking
any new ground. Look at git, for instance. The advice is of course sound, but
it goes against the "enterprisey" way of doing things, in part because they
tend to be using huge frameworks from the get-go.

~~~
IvyMike
On the other hand, think back to the famous monolithic- vs. micro-kernel
debate. The Linux kernel itself is huge, and the "small cohesive service"
philosophy of the microkernel advocates ended up never really taking over the
world.

~~~
marcosdumay
That's because kernels perform differently from user software (they have more
optimizations available), and a lot of kernel code needs that extra
performance.

That fact still hasn't changed, but user-level code is getting more powerful
(mainly thanks to virtual machines), and computers are still getting faster.
So it's still too early to declare the race finished.

Anyway, none of that has any relevance to how one should organize user level
code.

------
michaelochurch
Yes yes yes yes yes yes yes yes yes. This is absolutely true.

The program-to-programmer relationship deserves to be many-to-one. It's a
rewarding way to do things. You solve a problem. You add value. It's Done. You
may have to go back to a program later to add features, but you don't end up
with massive codeballs.

When the program-to-programmer relationship is inverted and becomes one-to-
many, you get the enterprise hell with no feedback cycle, terrible code, and
unnecessary complexity. It's not rewarding. Problems are never solved and
software is never Done. Requirements are "collected", bundled into an
incoherent mess, and delivered to bored, underachieving developers who never
get to see their programs actually _do_ anything.

Large problems that require more than one person need to be solved with
_systems_ and given the respect that systems deserve. Single-program
approaches are a denial of the complexity (that comes whenever people have to
work together) and a premature optimization.

I wrote about the political degeneracy that this creates:
[http://michaelochurch.wordpress.com/2012/04/13/java-shop-
pol...](http://michaelochurch.wordpress.com/2012/04/13/java-shop-politics/) .
But it's unfair to associate it with one language. It's not that Java is any
more evil than C# or C++. Any company that calls itself an X Shop is doomed.

There _are_ cases where large single programs deliver value. For example, most
people experience a relational database as a single entity. There are a lot of
requirements (performance, persistence, transactional integrity, concurrency)
that are technically very difficult to meet and all have to work together. I
will also note that it has taken some very smart, very well-compensated,
people _decades_ to get that stuff right. The quality of programmers who tend
to stick around on corporate big-program projects is not high enough to even
attempt it, though.

So why is big-program development winning? There are a couple reasons for
that. First, it gives managerial dinosaurs the illusion of control. If
programs are Giant Things that can be measured in importance by "headcount",
then executives can direct the programming efforts... which they can't do if
the programmer's job is to go off and independently solve technical problems
they deem to be important. Second, big-program design gives a home to mediocre
programmers who wouldn't be able to build something from scratch if their
lives depended on it but who, in teams of 50, might be as effective as 0.37
good developers. It's about control and a failed attempt to commoditize
programmer talent, but it doesn't actually work.

~~~
anon1385
>So why is big-program development winning? There are a couple reasons for
that. First, it gives managerial dinosaurs the illusion of control.

So why do larger programs 'win' in the open source world as well[1]? Pop
psychology about management doesn't seem sufficient to explain the phenomenon
(although I'm sure it is a good way to sell a '101 habits of highly effective
managers' book or get paid to give talks about management). Large systems are
large; breaking them up into smaller pieces doesn't change that, but it does
make navigating the code base harder (although I assume you don't care about
that, since judging from your blog posts you don't think tooling is
important). It makes your interfaces less malleable (which can be good or
bad), and moves a lot of communication to places where the compiler can't warn
you about mistakes (again, if you don't care about tooling I guess this
doesn't matter… but I would argue that this is bad).

It seems to me that systems of many small separate processes are basically
dynamic OOP. Everything is late-bound, dynamically typed and async. It's easy
to make changes and also easy to break things. You can argue that this is
better for certain problems, but I don't think it's universally better, and
the community seems pretty divided on the issue too: look at the popularity of
Go, statically typed and building concurrency into the language rather than
using the OS like in the older C world.

As an aside: surely the web developer community is eventually going to grow
tired of talking about how terrible Java is and how $idea_of_the_moment is
good because it's 'not java'? As an outsider the obsession seems extremely
unhealthy, and leads you to bizarre places like arguing against automated
refactoring or interactive debugging or static type systems just because those
things are associated with Java. I guess to maintain credibility I also need
to point out that I don't and have never used Java…

[1] Firefox/Chromium vs uzbl, gcc/llvm/clang vs pcc, gdb vs printf debugging,
sqlite/mysql vs directories and plain text files, perl vs sed/awk/grep shell
scripts, emacs/vim vs ed/notepad etc etc.

~~~
michaelochurch
On the anti-Java bias, I think the issue is that there's more than one Java
culture. There's the horrid commodity developer culture, but that's not the
language's fault.

I'm actually a pretty big fan of static typing. You don't get static typing's
main benefits in C++ or Java, though. You have to use a language like Haskell
or OCaml, or the right subset of Scala, to see the major benefits of that.

Open-source is a bit different because people choose whether they contribute
to a project. The quality of code in the active open-source world is leagues
above what you find in typical enterprise codeballs, because of survivor bias.
No one has the authority to mandate that code be maintained by others, so the
messes are cleaned up by people who actually care, not people slogging through
it to keep a paycheck coming.

The big-program methodology of the corporate world is the evil. In FOSS, the
major projects are an unusual set-- code-quality at a high level just not seen
in the for-paycheck commodity-engineer world and large _because of_ success--
rather than the reverse. There's a survivor bias that occurs because the best
projects are the only ones people pay attention to.

The corporate world is screwy because projects become large or small based on
political reasons that have nothing to do with code quality. In the FOSS
world, code-quality problems related to growth will be self-limiting because
no one has the authority to "force" the program to grow.

~~~
anon1385
I should mention that I've never worked in the corporate world, so my reaction
is in that context; I can't speak to the corporate side of things, since I've
never experienced it.

Also yes static typing in Java does look pretty cumbersome. Personally I'm
hoping that Rust takes off, I've enjoyed playing about with it over the last
couple of weeks, although it has made me less happy using the more dynamic
languages I normally use to do real work.

------
moe
A good program is 500 lines or less.

~~~
pixl97
_I type at least 60,000 characters before inserting a line break._

I joke, but I've seen some PHP that went at least 600 characters before a line
break. I have no idea how they wrote it like that.

~~~
grimman
Broken 'enter' key; it's the only way I can imagine while clinging to sanity.

------
xrt
Beyond complexity management, breaking a design into many small programs opens
it up to a rich set of well-known and proven OS services. Things like hardware
memory protection, multi-processor support, queues, mutexes, monitoring, and
cross language support. I'll take Unix over some language+standard library
_any day_.
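As a concrete (if toy) illustration of my own: two separate processes cooperating through an OS pipe, with the kernel providing buffering, scheduling, and memory isolation for free:

```python
import subprocess
import sys

# Two independent "programs" cooperating through an OS service (a pipe)
# rather than through an in-language abstraction. Either side could be
# rewritten in any language without the other noticing.
producer_src = "for i in range(5): print(i)"
consumer_src = "import sys; print(sum(int(line) for line in sys.stdin))"

producer = subprocess.Popen([sys.executable, "-c", producer_src],
                            stdout=subprocess.PIPE)
consumer = subprocess.run([sys.executable, "-c", consumer_src],
                          stdin=producer.stdout,
                          capture_output=True, text=True)
producer.wait()

print(consumer.stdout.strip())  # 0+1+2+3+4 = 10
```

The same pattern extends to the other OS services the comment mentions: message queues, shared memory, and process-level memory protection all come from the kernel rather than from any one language runtime.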

~~~
vanwaril
And it also opens it up to a rich set of well-known problems: sharing between
processes and memory management, lock issues, system-call latencies, not to
mention dealing with the monolithic environments that are almost certainly
changing between each instance of your program you want to run (this
"environment" includes the shell, userspace services, kernel version and
features...). Decomposing a program into multiple programs isn't always a good
idea; there is a very broad trade-off here that needs to be evaluated against
the needs of every program.

~~~
mattgreenrocks
Absolutely. But it's better from a design perspective to start as pure as
possible and then make concessions when you find your back against the wall.

Most of the "it's too hard and there's too much to do!" crowd doesn't
understand the benefits of working clean.

------
niggler
"Open source that libraries if possible and you’ll get the feedback from the
peers."

Does anyone have real experience with this (open sourcing a core piece of
infrastructure and finding that others have found it, used it and provided
feedback)?

------
ams6110
[http://www.cs.nott.ac.uk/~cah/G51ISS/Documents/NoSilverBulle...](http://www.cs.nott.ac.uk/~cah/G51ISS/Documents/NoSilverBullet.html)

------
lukego
My current theory is that a good program should compile in less than 1 second
into an executable of less than 1 megabyte.

~~~
vanwaril
My current theory is that this one is arbitrary. It's really, really, really
easy to cross a megabyte with statically linked libraries.

~~~
csense
I think the grandparent intended that the size measurement exclude statically
linked libraries or assets, debug symbols, and compression technologies like
UPX.

It can sometimes be beneficial from a distribution/deployment standpoint to
have everything in one self-contained file. But you can't conclude much about
the code quality of e.g. a computer game engine based on how many megabytes of
graphics, music and sound effects a particular game based on that engine uses.

~~~
lukego
The rule is not meant to be universal, and I don't say other people should
adopt it, but I think it's suitable for the work that I am doing right now.

Constraints like this can really shape a piece of software, for better or for
worse. My inspiration is having worked with a really powerful firmware system
that had a hardware constraint to fit on a 1MB flash chip, everything
included, and was done so well that it looked easy. Give yourself unlimited
space and it's much easier to end up with UEFI...

I suspect quite a few programs out there would have turned out better if their
authors had picked a semi-arbitrary maximum value for lines of code / bytes of
RAM / bytes of disk / etc.

The actual rule I'm using for now is:

- 1 second compile excluding dependencies.

- 1 minute compile including dependencies (excl. C compiler).

- 1 MB executable including everything except libc and the base OS.

------
markhellewell
He's talking about adding a lot of "moving parts"...

