Every order of magnitude increase requires a new level of discipline. At 10^3 lines, you can do whatever you want -- all your function and variable names can be one or two letters, you don't need comments (or indeed any documentation), your functions don't need well-defined contracts because they only need to work in a few cases, etc. etc. At 10^4 lines, if you're smart you can still get away with making a mess, but it starts to become helpful to name things carefully, to add a few comments, to clear away dead code, to fuss a little over style and readability. At 10^5 lines, those things are not just helpful but necessary, and new things start to matter. It helps to think about your module boundaries and contracts more carefully. You need to minimize the preconditions of your contracts as much as practical -- meaning, make your functions handle all the corner cases you can -- because you can no longer mentally track all the restrictions and special cases. By 10^6 lines, architecture has become more important than coding. Clean interfaces are essential. Minimizing coupling is a major concern. It's easier to work on 10 10^5-line programs than one 10^6-line program, so the goal is to make the system behave, as much as possible, like a weakly interacting collection of subsystems.
There's probably a book that explains all this much better than I can here, but perhaps this conveys the general idea.
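The point about minimizing preconditions can be sketched concretely. This is a toy illustration of my own (the function names are invented), assuming a numeric-mean helper as the example:

```python
# A function with a narrow contract: every caller must remember the rules.
def mean_narrow(xs):
    # Precondition: xs is a non-empty list of numbers. Violations blow up.
    return sum(xs) / len(xs)

# A function with minimized preconditions: corner cases handled inside,
# so at 10^5+ lines callers don't have to mentally track the restrictions.
def mean_safe(xs):
    xs = list(xs or [])   # accept None, tuples, generators
    if not xs:
        return 0.0        # explicit, documented behavior for empty input
    return sum(xs) / len(xs)
```

The second version costs a few lines locally but removes a special case from every call site, which is exactly the trade that starts paying off as the codebase grows.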
That's also why I think that macros are so valuable: a single macro can abstract away an extremely complex and tricky piece of code into a single line or two.
With the right abstractions, we can turn 10^6 projects back into 10^3 projects.
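Python has no true macros, but a decorator gives a similar flavor of collapsing a recurring tricky pattern into one line per call site. A minimal sketch (the `retry` helper and `flaky` function are my own illustration, not from the comment):

```python
import functools

def retry(times):
    """Collapse a hand-written try/except retry loop into one decorator line."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(*args, **kwargs):
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise  # out of attempts: re-raise the last error
        return run
    return wrap

calls = []

@retry(3)   # one line replaces a retry loop at every call site
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"
```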
It is challenging to summarize in a brief reply, and I would not do it justice in any event: characterizing the relationships between the evolutionary pressures on software, the limits of the people and organizations who produce it in handling the attendant complexity of any such system, and the feedback loops which drive the process is not a one-paragraph post.
See http://users.ece.utexas.edu/~perry/work/papers/feast1.pdf for instance
My crotchety old geezer, "get off my lawn" take is that minimizing LOC count is vastly underappreciated (FEAST underscores how increasing LOC count decreases the ability to evolve/change/modify software), and that DSLs are therefore far more promising than "index cards" and "stories" and "methodologies" for decreasing LOC count and thus increasing agility.
[I]f we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
Valuable lesson I learned: code will likely never be rewritten or recaptured. It costs money.
But at least you get the chance to upgrade hardware and software once in a while, because it falls out of support or breaks down.
I bet this Java program is still a mess and running somewhere.
Thanks to this code I realized programming isn't something I want to do 24/7 :)
Why should I read a book when you have done a great job in one paragraph? Seriously, everyone who has built large systems should understand this immediately. This is a big part of the software engineering craft.
It's not "software" that has "diseconomies of scale"; it's the design process.
Then, because software has close to zero marginal cost, the design cost is the only one we end up paying.
Take the same milk the author is talking about, and imagine designing the system that gets the milk from the cow to 2 people every morning.
Not a big deal: get a cow, a farmer, and a truck, mix them all together, and you're pretty much done.
Now imagine serving a supermarket; you start to need something more than a cow, a farmer, and a truck.
Now imagine serving a whole city. What about a whole nation? What about the whole world?
Just designing how to serve milk to a small supermarket is a problem, but since the marginal cost of the milk is not approaching zero, the cost of the design process will always be less than the cost of the milk (otherwise it wouldn't make financial sense) -- hence the whole idea of "economies of scale."
To conclude, I believe the root cause of "diseconomies of scale" doesn't lie in the "software" part but in the "design" part.
Beyond that, it matters because in the future you may see the same problem in other industries... A very advanced 3D printer, for example, may bring this whole class of problems to the manufacturing industry.
If we learn how to manage the complexity of the design phase, we will be able to apply the same concepts to other fields.
It's only when you want a singular capitalist to get a cut of all these transactions that you need to complicate things.
Cheaper food means, for example, that people who couldn't afford to go on vacation, or to buy a computer for their kids, now can.
So, designing the logistics chain is worth it - it provides value for customers.
Same as making software.
The design process/writing software, on the other hand, is hard to do at a big scale from scratch. I'm not aware of any big logistics network that started big.
Software == Farm
Value == Milk
Economy of scale has nothing to do with the price of milk in different-sized containers, nor does it have anything to do with software complexity.
Economy of scale is about it being cheaper for a large farmer to produce a liter of milk than for a small farmer, because overhead costs don't increase linearly.
Likewise, software has economies of scale because, on a per-user basis, it's far more expensive to support a 1st, or 5th, or 100th user than it is to support a 100,000th.
The author is fine to note that software gets more complex and costly to maintain as it gets bigger, but that has nothing to do with economies of scale -- the author is completely confusing concepts here. Economies of scale are about the marginal cost per user, not the marginal cost per bugfix.
(a) The author correctly points out some diseconomies of scale in software, i.e. things that cost more when you do more of them.
(b) The author fails to identify that economies of scale typically far outweigh the aforementioned diseconomies. The main error seems to be basing the argument on this statement --
"This happens because each time you add to software software work the marginal cost per unit increases"
-- without considering the idea of dividing the expense by the number of users, which can increase dramatically for projects reaching a critical mass. To take a trivial example, most 10-employee businesses can't afford to pay for a 10,000-line software application. A company with 100 employees may need to write a 30,000-line application due to the increased complexity of its environment, but it can afford it, because the project now involves 300 lines of code per employee rather than 1,000.
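The division being described, with the comment's own numbers, as a quick check:

```python
# Numbers from the comment above: total lines divided by headcount.
small_loc, small_staff = 10_000, 10
big_loc, big_staff = 30_000, 100

loc_per_employee_small = small_loc / small_staff   # 1,000 lines per employee
loc_per_employee_big = big_loc / big_staff         # 300 lines per employee
```

The code triples, but the per-employee burden drops to less than a third -- that denominator is the economy of scale the article is said to ignore.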
In short, this author accounts for both the numerator and denominator in the milk analogy that's up front, but then effectively ignores the denominator in the discussion of software costs.
Of course this is why most programmers work for large organizations, at least relative to the average employee. It's also why a handful of large software projects make most of the money in the software business. I'm not happy about this, btw, but it is the case.
To my reading the author's use of "economies of scope" and "economies of specialization" are even further off base. For example, the trend over the last 50 years or so has been towards increasing specialization (which again, benefits larger teams, although the app economy may have provided a brief counterpoint, the same forces are at work there).
Overspecialisation also causes problems, like engineers who don't understand user experience or empathise with the user or even stop to ask whether the flow they're building is needlessly complex. Or designers who propose ideas that are not technically feasible given the platform.
If you can increase scale without increasing scope, like WhatsApp supporting 900 million users with 50 or so engineers, great. If you're increasing scope in order to increase scale, you can't assume that the former leads to the latter.
This example makes no sense. The user in either case still gets 1 crash per year. They're actually worse off in the many-releases case because in the annual update scenario, they can at least block off a few days to cope with the upgrade scenario (in the way that sysadmins schedule upgrades and downtime for the least costly times of years), but in the weekly release, they could be screwed over at any of 52 times a year at random, and knowing Murphy, it'll be at the worst time. '% of releases which crash' is irrelevant to anything.
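The arithmetic behind this objection is simple expected value. With illustrative numbers of my own (not from the article):

```python
# Expected crashes per year = releases_per_year * P(a given release crashes).
annual_releases, annual_crash_rate = 1, 1.0        # one big release, always bad
weekly_releases, weekly_crash_rate = 52, 1 / 52    # many small releases, rarely bad

annual = annual_releases * annual_crash_rate   # 1 expected crash per year
weekly = weekly_releases * weekly_crash_rate   # also ~1 expected crash per year
```

The per-release crash percentage looks 52 times better in the weekly model, yet the user's expected crashes per year are identical -- which is the commenter's point that "% of releases which crash" is the wrong metric.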
"In order to keep costs low on testing you need to test more software, so you do a bigger release with more changes - economies of scale thinking."
If you wanted to keep testing costs low, you wouldn't do bigger releases, you'd create automated tests. You may spend more effort up front on building the tests, but as long as you make the test components modular and target the tests at the right level of your application the 'cost' of testing will decrease over time.
Also writing good automated tests requires a great test developer. The thing is, anyone with such credentials would be a great developer and as such, not a tester.
Even if you go fully test-driven, which makes it much cheaper, the cost of a test-heavy development model is surprisingly high for any application of useful size.
Just imagine trying to write even something as simple as MS Paint with good test coverage.
Oh, and MS Paint would be easy to create a good test suite for. For what it's worth, I'm a software tester by trade, so perhaps it's straightforward for someone who creates tests for a living to know how to approach it; someone inexperienced wouldn't necessarily know where to start.
Second recommendation is to look into how to avoid test-induced damage, which is where code gets bloated and more complicated in order to make it more testable. One major source of problems in this area is the need to create mock objects. I'd recommend this video from Mark Seemann as a good starting point in this area, as it looks at how you can create unit tests without mocks:
If you can let me know which language you're mostly coding in I'll see if I can give you some more recommendations.
As you're also using JS, I'm guessing you're creating web apps, so I can recommend Selenium if you want to automate front-end tests. You can code Selenium tests with C# too, and you can also abstract away the details of the Selenium implementation and use SpecFlow to write the tests. If you have access to Pluralsight (if you have an MSDN licence, check to see if it's bundled - I think I got a 45-course Pluralsight trial with my MSDN Enterprise licence), there are some good courses on Selenium and SpecFlow, including one that takes you through combining the two.
Plus, they don't really verify application state, just that it doesn't crash. What if it just looks and acts funny?
As for "looks and acts funny", that's why you still have exploratory testing. Automated testing can drastically cut down on overall testing time, but there's still the need to perform exploratory testing to look for quirky issues.
(And please no one tell me that I should only hire developers who are already also expert testers. I have to operate in the real world.)
This is also a huge win for pair programming - code review is built in.
I've seen that argument several times "Why can't the developers test their work?" For the same reason professional writers have proofreaders, editors, translators, etc. Because it would be amateur not to have more than one pair of eyes and one brain looking at something. No matter how good those eyes/brain are, they'll miss things that would be obvious to someone else.
There are only a few types of fonts available - TrueType, PostScript, bitmap, maybe others? At any rate, you test a couple of examples of each type. Paint is a bitmap graphic generator, right? Apply letters in a test font to a bitmap, and compare the resulting bitmap to an expected value.
We don't need to test every possible font there is. We only need to test enough fonts to cover the likely failure cases and known bugs.
Speaking of bugs, suppose we find a bug with a specific font, or a specific size of font, or something like that. We write a test that exercises the bug, then fix the bug. Because it's a one-off test, it's not a tremendous burden to write. Moreover, that bug will never come back undetected again.
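The bitmap-comparison idea above can be sketched as a golden-image regression test. This is a self-contained toy: `render_glyph` stands in for whatever Paint-like code actually rasterizes a character, and the golden bitmap is one captured when the output was known to be correct:

```python
# Toy rasterizer: draws a vertical bar for "|" so the example is runnable.
# In real life this would call the application's actual rendering code.
def render_glyph(char, width=3, height=3):
    col = 1 if char == "|" else 0
    return [[1 if x == col else 0 for x in range(width)] for y in range(height)]

# Golden bitmap, captured once when the rendering was verified by a human.
GOLDEN_BAR = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

def matches_golden(char, golden):
    # The regression test: rasterize and compare against the stored bitmap.
    return render_glyph(char) == golden
```

Once a font bug is fixed, its golden bitmap goes into the suite and that bug can never come back undetected.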
+1. Putting too much faith in test coverage is a sign of too much faith in humans.
Some of my time goes to mobile/Android development these days, and here are some facts:
1. On Android, there is occasionally this one vendor that crashes randomly on some piece of code, because they modified the Android source code when they shouldn't have. And it's just one model among thousands of others. And it's a different model for different pieces of code.
2. The Android API is not exactly what one would assume. It returns null in cases where it should return an empty list, and vice versa. It does this occasionally, so you don't really know when an API returns null vs. an empty list, and some APIs (like getting the running processes) should never return empty, except that they do in some bizarre cases. So your code needs to test both the empty list and null, which adds to your test scenarios. Aside from null vs. empty list, there are many other examples.
3. When you release, all your tests pass, at least for what they test. Then there is this one client that uses things a bit differently than the others, and you fail 10% of the time for that client because of how they use an API. In Android especially, how an API is used matters. A simple example is using the Application Context vs. the Activity Context vs. a custom Context implementation. You'd assume they should all work fine, because Context is an abstraction and the Liskov substitution principle should apply, but it doesn't, because it's a leaky abstraction. Thus, some things work with one context while others work with another, and you don't really know which.
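The null-vs-empty-list problem in point 2 is usually handled by wrapping the flaky API once, so the rest of the code sees one canonical shape. A minimal sketch (the function names are my own, not an Android API):

```python
# Defensive wrapper: treat None and [] the same, so callers test one case
# instead of two at every call site.
def normalize(result):
    return list(result) if result else []

# Hypothetical caller that would otherwise need separate None/[] branches.
def process_names(api_result):
    return [p["name"] for p in normalize(api_result)]
```

This doesn't remove the need to test both inputs, but it concentrates the tests on the wrapper instead of on every consumer.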
I came to the conclusion that in most cases testing is a must to verify the correctness of known scenarios and known regressions. For automated upgrades/updates, however, testing alone will not work. Not at all. Automated updates/upgrades should be a controlled release: it makes more sense to do the release A/B-test style, if possible, where a small population (say 1% of the users) gets the control/previous release and another disjoint population of the same size gets the experiment/new release. Then you compare your baseline metrics: number of HTTP connections, number of user messages sent, number of crashes, etc.
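The comparison step of such a controlled rollout can be sketched in a few lines. Everything here is illustrative (the metric, group sizes, and the 10% tolerance are assumptions of mine, not a recommendation):

```python
# Compare a baseline metric (crashes per user) between a control group on the
# old release and an equal-sized disjoint group on the new release.
def crashes_per_user(crashes, users):
    return crashes / users

def safe_to_roll_out(control, experiment, tolerance=1.10):
    """Allow rollout if the new release is within 10% of the old crash rate."""
    return crashes_per_user(*experiment) <= crashes_per_user(*control) * tolerance

control = (50, 10_000)       # (crashes, users) on the previous release
good_release = (52, 10_000)  # roughly the same crash rate: ship it
bad_release = (200, 10_000)  # 4x the crash rate: hold the rollout
```

A real system would add statistical significance testing on top of this, but the shape of the decision is the same.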
Testing can only cover what humans can think of. Unfortunately, we don't know what we don't know. That's the main problem.
When the bug that slips through the testing is discovered, the first thing to do is write a test that exercises the bug and makes it repeatable. This is sometimes difficult in the case of intermittent or highly condition-dependent bugs, but simply learning how to repeat the behavior is an important learning experience.
The bug should not be fixed until there's an automatic test that induces the bug. That way, you can test to know that your fix actually works.
Although it would be great if we went through the exercise of "how would we write an automated test for this?" at a high level, even if it doesn't get implemented. That might get people thinking about how to architect their code so that it's more testable from the get-go.
Every time I do the "I don't need a test for this, it's just an obvious little bug" thing, and I implement a test-free fix, it fails! Every. Single. Time. Maybe it's because I'm stupid, but I like to think I'm above average. So if above-average programmers have a high risk of releasing "bug fixes" that don't actually fix correctly (or cause other problems) because they didn't write good tests, that really turns the "not worth the risk" response on its head.
It's not that it's not worth the risk to write tests. It's not worth the risk to not write tests!
Or are you implying there will be a working hotfix soon after the release?
What about security updates then?
See, in a weekly model you can probably safely skip a cycle. Not so with a big release. If the release has some critical functionality broken, you might be waiting a long time for it to get fixed.
Also, increasing the number of people makes things even worse as communication problems increase, but I think we all agree on that.
But the main problem with the article is that milk is always the same, whether in a glass or in a tank car. Software is completely different: more functionality requires more lines of code, which requires more people... and more cost.
- system testing?
- bulk purchasing of licenses for development?
- architecture in the sense leveraging consistent frameworks, naming etc. across a team?
- development time too long? (according to COCOMO there is an optimal time and beyond effort increases although slowly)
- management of "hit by a bus" risk
On the other hand, what in traditional industries would be called production, that is producing the specific website for one request, has a rather absurd economy of scale. With a single server and a static site you can serve a sizable fraction of the world population, with negligible marginal costs of serving an additional user. Actually Metcalfe's law suggests that the marginal cost of serving an additional user is negative and hence we get the behemoths like Google and Facebook instead of competition of a few different corporations.
Bloated bureaucracies and bad processes can negatively impact any company in any industry, not just software. Some of the article’s logic doesn’t seem to differentiate software development from anything else, such as “working in the large increases risk”. So while a large monolith application is risky, building a hundred million widgets is risky too. Better to iterate and start with a prototype and expand, but the same goes for other industries too: better to prototype and market test your widget before mass production. Seems to me like the article is talking more about lean and agile and process in general than about economies of scale.
Making and transporting large milk bottles is very efficient today (because it's all automated and there are well-tested processes in place),
but it wasn't necessarily always like this.
When people were still figuring out how to make glass bottles by hand (through glassblowing), bigger bottles were probably more
challenging to make (more prone to flaws and breakage during transportation) than small bottles. So they probably just figured out
what the optimal size was and sold only that one size.
With software, it's the same thing: we don't currently have good tooling to make building scalable software easy.
It's getting there, but it's not quite there yet. Once Docker, Swarm, Mesos and Kubernetes become more established, we are likely to
see the software industry behave more like an economy of scale.
Once that happens, I think big corporations will see increased competition from small startups. Even people with basic programming knowledge will be able to
create powerful, highly scalable enterprise-quality apps which scale to millions of users out of the box.
Automation happens a lot in the software development world, but instead of depriving the developer of work, it just piles more onto the developer's shoulders. For example, today, with AWS/Docker/TDD/DDD/..., I basically do the work that would have taken a team of 5 people only 15 years ago.
The thing is, there is always going to be somebody who sits at the boundary between the fuzzy world of requirements and the rigorous technical world of implementation, and those people are going to be developers (of course they will not be programming in Java but in something else -- something rigorous enough that the activity is still called programming).
Unless AI takes over, but it probably means that work as we know it has changed completely.
The article is talking about large software, not small software deployed to scale.
As for tools to help build large software, we have them in spades and they will continue to improve. But some things still don't seem to scale. Tools help with managing 500 developers but not enough to really make 500 developers as effective as 50 developers on a smaller project.
In most industries scale of firm goes hand-in-hand with scale of distribution. Software breaks the paradigm, so we have to be careful to say which scale we're talking about.
Obviously, with distribution, software has insane economies of scale, since we can copy-paste our products nearly for free. That's why we can have small firms with a large distribution, unlike most industries.
With scale of firm, we face some of the same diseconomies as other industries. Communication and coordination problems grow superlinearly with firm size.
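The superlinear growth in coordination has a well-known closed form: pairwise communication channels grow as n(n-1)/2, the observation popularized by Brooks' "The Mythical Man-Month". A one-liner makes the blowup concrete:

```python
# Pairwise communication channels in a team of n people: n * (n - 1) / 2.
def channels(n):
    return n * (n - 1) // 2
```

Going from 50 to 500 people multiplies headcount by 10 but potential communication paths by roughly 100 (1,225 vs. 124,750), which is the firm-scale diseconomy being described.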
Effort and resources needed also grow superlinearly with product scale. That's also true of other engineering disciplines, though. Making a tower twice as high is more than twice as hard. Part of it is the complexity inherent in the product, and part of it is that a more complex design needs a bigger team, so you run into the firm diseconomies of scale mentioned above.
> Even people with basic programming knowledge will be able to create powerful, highly scalable enterprise-quality apps which scale to millions of users out of the box.
I must disagree. The real problem with scalability is that any system that scales enough must become distributed, and distributed systems are obnoxiously difficult to reason about, and as such remain difficult to program and to verify.
Talk to me about Docker and Swarm and the like hosting technology platforms and frameworks that make it trivially straightforward to program distributed systems reliably, and really hard to program them wrong, and we might have the utopia you speak of.
The promise is almost always false. All abstractions are leaky, and if you do serious development with them inevitably bugs will bubble up from below and you'll have to dive into the messy internals.
For example, ZeroMQ makes distributed messaging relatively painless. Someone with very little knowledge of the network stack can write simple programs with it easily. But for any serious enterprise application with high reliability requirements, you'll eventually run into problems that require deep knowledge of the network stack.
>Talk to me about Docker and Swarm and the like hosting technology platforms and frameworks that make it trivially straightforward to program distributed systems reliably, and really hard to program them wrong, and we might have the utopia you speak of.
The only downside I can think of is that tinkering with those internals can become trickier in some cases (because now you have to understand how the container and orchestration layer works). But if you pick the right abstraction as the base for your project, then you may never have to think about the container and orchestration layer.
The lie is that this tool is so easy, you just have to read this 30-minute tutorial and you'll be able to write powerful software, and you don't even need to learn its internal mechanics.
I haven't used Kubernetes; it's possible it's so good that you don't need to learn the messy details. I'm just sceptical of that claim in general.
I was a Docker skeptic before I stumbled across Rancher http://rancher.com/. In Rancher, you have the concept of a 'Catalog' and in this catalog, you have some services like Redis which you can deploy at scale through a simple UI with only a few clicks.
I think this concept can be taken further: we can deploy entire stacks/boilerplates at scale using a few clicks (or by running a few commands). The hard part is designing/customizing those stacks/boilerplates to run and scale automatically on a specific orchestration infrastructure. It's 100% possible -- I'm in the process of making some boilerplates for my own project http://socketcluster.io/ -- but you do have to have a deep understanding of both the specific software stack/frameworks you're dealing with and the orchestration software you're setting it up for (and that's quite time-consuming).
But once the boilerplate is set up and you expose the right APIs/hooks to an outside developer, it should be foolproof for them -- all the complexity of scalability lives in the base boilerplate.
This also ignores much of the physical advantage that existing powerhouses have. Amazon, as an example, will be difficult to compete with, not because AWS the software is so amazing, but because the data centers that house AWS are critical. Google and others have similar advantages.
Deep learning benefits from scale: both in computation power and in a larger corpus.
From the example: is the final product the cow, the milk, or the nutrition it provides? Is a software product the lines of code, the app to download, or the service it provides?
It seems like the article's milk analogy is applied to the wrong thing: another pint of the same exact product.
Maybe it's more apt to make the analogy with a pint of a different type of milk product, a type of milk with brand new features. I'm talking about real-world products like these which have appeared in the last 10-20 years:
* Almond Milk
* Cashew Milk
* Coconut-based Milk products (not traditional coconut milk)
* Soy-based Milk products (not traditional soy milk)
* Lactose Free Milk
* Grass Fed Cow Milk
* Organic Cow Milk
* Omega 3 Enriched Milk
We would not expect the marginal cost of these to be less than another pint of conventional regular cow's milk, and indeed, it isn't.
That said, I suspect that the article is basically going in the right direction. I mean, it seems like additional features on a complex software project really do cost much much more, on a percentage basis, than a new kind of milk beverage does.
I agree with OP's premise that productivity diminishes with project complexity - a decades old problem addressed by Fred Brooks' Mythical Man-Month. But, a lot of the complexity is now wrapped by reusable components. It is now possible to write a component that is shared by 1000's of projects. Somewhat akin to a milk carton that can hold many kinds of milk.
However, maintenance costs in software are usually really high, especially since there is a tendency to gradually "improve" the same code base instead of ripping it out and building a new one every n years (like buying a new car).
This is captured well in the marginal productivity of labor: the returns on adding more labor decline as you add more labor.
Talking about actual economies of scale, software has massive economies of scale. The marginal cost of serving one additional copy of software (in the old "software as a product" way) is close to zero.
Very few production lines get cheaper as they get more complex.
But software, more than most products gets cheaper per unit, as you scale the number of units.
Software does have economies of scale. How much did a copy of Windows 7 or Office 2013 cost? Only about $100? That's because the more we produce/supply/consume the cheaper it gets, just like milk.
The notion that there's an optimal amount of human capital needed for a project is nothing new. We've all heard of "Too many cooks spoil the broth."