When I did a startup many years ago, I made the mistake of paying too much attention to the architecture of the software I was writing, and not enough attention to the product/customer side of it.
The last couple of years I've been de-emphasizing software architecture as an interest, and have been paying much more attention to how product teams build successful products, what the patterns are, etc. I was lucky enough to work at Facebook for a while and got to see (and learn) a very successful working model of product development.
So, while I'm not saying that software architecture is not important (it is), also pay attention to the product/customer side: what choices (software, organizational, hiring, business) allow you to move fast and iterate, to release early and often, to run A/B tests, etc.
I think good software engineers are just as much product guys (and data guys) as they are software guys.
Facebook has been extremely successful commercially, but I think it's dangerous to read too much into how a unicorn like that develops its software, partly because there is survivorship bias at work here, and partly because the most important things Facebook has achieved have relatively little to do with software.
Facebook's golden goose is the network effect. Once it reached a critical mass of users, it was all but unstoppable. Arguably its most impressive technical feat was achieving enough scalability in its infrastructure that it could keep up with that many users and that much data. That was a remarkable success story by any standard, but while it surely has a software element, no doubt it involved much more than just code.
On the other hand, to a first approximation Facebook has had infinite resources for most of its existence. It operates in an online environment where problems can be fixed even in production. And it doesn't really do anything that is going to cause catastrophic, unfixable consequences if something does break for a while. That is a list of luxuries that few software development teams enjoy, and what works if you can make those assumptions won't necessarily be a good idea if you can't.
As you say, you do have to pay attention to other factors as well, but there are not many organisations that have as much room to manoeuvre on the software side as Facebook does.
The fallacy appears when it turns out that there are lots of other companies doing the same things, and those companies aren't successful. This is the problem with all those blog posts examining the daily habits of highly successful people (they wake up early, they eat energy bars to save time, etc.). Clearly, just doing those things won't make you a billionaire.
For what I wrote, I believe this doesn't apply, because of the stark difference between the culture I experienced at FB and other companies. (Admittedly, this is not a statistically significant sample.) So in my experience it's _not_ true that other companies are doing the same thing...
Having said that, you're right that FB is a one-in-a-million company, and probably nobody reading this will be the next Mark Zuckerberg... But still, if you want to be the best in your domain (e.g. the best todo app), I strongly think these are good patterns to follow.
Well, there is no shortage of web startups trying to be all agile and fast-moving that still fail, and among the ones that survive long enough to become established, there seem to be plenty of problems with reliability and security issues that better software design might have avoided.
Facebook seem to have significant problems quite often too. I've seen teams redistribute their planned Facebook spending for entire ad campaigns across other channels, because the Facebook UI for setting up the ads was so broken on that day that it was impossible to run the intended FB campaign, or because FB's approval system for promotional content rejected something for obviously incorrect reasons. In one case, FB made a rookie mistake in their payment processing that stopped everything to do with Facebook ads dead for that business until the problem could be resolved, which itself was only possible because of some personal contacts who happened to work at Facebook and could escalate the issue internally.
The thing is, if you're Facebook, you can survive repeatedly causing this sort of hassle for the people paying your bills, because you're big enough that they'll probably come back and try again another time anyway. It's still x% of your potential revenue that you're throwing away, but you don't need that revenue to remain a viable business. However, if you're almost anyone other than Facebook, those kinds of quality control issues will damage your reputation and ultimately sink your business if they become serious enough.
Arguably that's the property of good architecture. Standardizing your entire codebase also works, but only as long as those initial standards stay smart (1), and even then, that relies on discipline, tools, and incentives that are typically not all in place.
(1) A standard might be "all webservers will be written in Java 6". The benefits of choosing that standard tend to be front-loaded with a gradual decline into net-negative with no affordable path to a better standard.
Well-documented and well-tested interfaces make a component easy to replace or rewrite if needed.
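As a minimal sketch of that idea (all names here are hypothetical, not taken from any system discussed in this thread): if the caller and the tests depend only on a documented interface, swapping the implementation behind it becomes a local change, validated by the same suite.

```python
from typing import Optional, Protocol

class KeyValueStore(Protocol):
    """The documented, tested contract: get() returns the stored value or None."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...

class InMemoryStore:
    """One concrete implementation; a disk- or network-backed one could replace it."""
    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

def contract_tests(store: KeyValueStore) -> None:
    # Written against the interface, not the implementation, so any
    # replacement component can be checked with exactly the same suite.
    store.put("a", "1")
    assert store.get("a") == "1"
    assert store.get("missing") is None

contract_tests(InMemoryStore())
```

The point is that the rewrite-or-replace decision then hinges on passing `contract_tests`, not on reading the old implementation.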
Nearly every time I have worked on or been involved with a project where I or someone else tried to design specifically for a vague future requirement that wasn't entirely clear or guaranteed, it didn't work out: either the requirement never happened, or it was so materially different that what we wrote was wrong. As a result, the code went unused or, worse, just got in the way and ended up as unnecessary technical debt.
I'd say solve the right problem well, without spending time on requirements you don't yet have. That includes speculative scalability work and over-modularization, like breaking a product out into 10 "microservices."
Having been a contractor at many companies, I'm pretty sick and saddened to see how common it is for every layer of the product to be implemented as a global-state singleton.
At least on a high level. It would help a lot of people here.
It's a bit like eating well and working out. We all know we should do it, but most people don't actually eat well and work out. Then, when you see somebody who does it and looks great, you ask them, "What's your secret?". But it's not a secret, it's just that most people don't do it, because it's hard :)
One story: when I was at FB, I happened to know that a team of size S conducted X experiments in 6 months (I can't disclose the numbers). As it happens, I have worked on similar-sized teams at other companies, and there the number was ~X/20, and sometimes 0. It doesn't matter how good your software architecture is if you're trying out just 1 thing instead of 20 things... I call this velocity.
Another good story: many semi-successful companies end up in a place where there's an initial product/codebase that gets them a lot of growth, and then ~5 years into the startup, the bigger, more mature team decides that the old legacy code is holding them back, and they're going to REWRITE IT. Maybe in some fancy new language, or a fancy new architecture like microservices. The estimate is 6-9 months to get to first light. But it will probably end up taking 3-5 years, because the legacy code had a lot of fine-tuning in it, it turns out many of the problems are hard to fix, moving production over to the new thing is REALLY HARD, and all those fancy new technologies are actually far from perfect, plus the current team doesn't have a lot of experience with them.

Compare this to what Facebook did with its PHP codebase: at some point it became a bottleneck (the crappy language and the runtime speed), but there was never a from-the-ground-up rewrite. Instead they wrote several iterations of better runtimes, and since they were changing the runtime anyway, they "fixed up" the language (while keeping it mostly backward compatible). The new language is called Hack and the new runtime is called HHVM. The cool thing is, in all this time there wasn't a rewrite, so they were able to keep shipping new features on Facebook, run A/B tests, and iterate on the product. Compare this to one of the companies I worked at, where they did a rewrite, and now customers have to choose between the old and the new thing; it's not transparent, because <software issues>. There's a book about Hack/HHVM, one of the Facebook guys wrote it; iirc the first chapter is about this whole story. See this blog post for links: http://bytepawn.com/hack-hhvm-second-system-effect.html
In general, the principles I've seen to work really well:
- write good code, but don't do big rewrites
- cont. delivery: always be shipping to master and production in small increments (don't keep big git branches out of production; at the end it will be scary to merge them and put them into production)
- cont. integration: have tests and run them on every commit (you probably need some testing nazis to enforce this...)
- if you (your team) can't write a good monolith, you (your team) also can't write a good MSA
- invest heavily in linting and other automated ways to catch problems and normalize code when it is committed
- programming language doesn't matter that much, just pick one for each domain (eg. web, mobile, etc), and stop thinking about it, and don't let people waste their time on arguing over it too much; instead invest heavily in tooling that supports all the other aspects (like code reviews, perf, experimentation, etc)
- 1 other person should review and okay the code before it goes into production
- aggressively remove obstacles to shipping stuff to production
- make it easy to run experiments, and run a lot of experiments
- don't hire people who just want to write code, or think their job stops there
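To make the "run a lot of experiments" point concrete, here is one common building block (a generic sketch, not FB's actual system): deterministic bucketing, so a given user always lands in the same variant of a given experiment with no stored assignment and no coordination.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user: same (user, experiment) -> same variant.

    Hashing the experiment name together with the user id means different
    experiments get independent splits over the same user population.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Stable across calls and across machines, roughly uniform across users.
assert assign_variant("user42", "new_feed") == assign_variant("user42", "new_feed")
```

With something like this in a shared library, launching experiment number 20 costs about as much as experiment number 1, which is the whole velocity argument above.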
Exactly! FB has the most productivity per developer of any company I've seen, and this efficiency is a direct result of management systematically lowering barriers to writing code and investing in developer productivity tools. Other companies have elaborate rules and procedures and committees and style guides built around stopping code. Facebook encourages writing code.
(And before you say it: no, the codebase doesn't take a quality hit. It turns out that all this ceremony around writing code is just unnecessary, contrary to popular opinion.)
Which part of the codebase? Facebook's back-end systems do a very impressive job given the scale involved. On the other hand, its front-end systems appear to be mediocre in most important respects. A business without Facebook's advantages that wrote a UI as slow and buggy as Facebook's main or advertiser UIs often are could be in serious trouble.
But I would say software architecture matters more for infrastructure like storage systems, operating systems, and programming languages than it does for products. I think a lot of the literature on software architecture is about those domains.
Those things are harder to do iteratively. And language choice matters more in those domains.
Programming is a huge field now, and a lot of choices are domain-specific. Including how much you should care about architecture and programming languages.
I used to think that spending too much time on planning is a waste of time, but then I noticed that teams at Facebook also spend a lot of time at the beginning of each half (6 months) doing planning. However, I wasn't able to pick up tricks or patterns that made FB good at this, other than obvious/useless things like all the people making the plans being really smart domain experts. I did notice that the planning process was very loose: there was no methodology, and often the outcome was a bunch of bullet points. And they were very conscious about the value of the plan: they knew that if things go well and they move quickly, a lot of things will probably change in 3-4 months, and they will diverge from the plan. The value is in thinking things through, agreeing on goals, spinning up teams, building things (e.g. dashboards) to track progress, etc.
Another story about the value of planning: at another company we hired a manager, and one of the things this guy did was introduce Capacity Planning. At its core it's a simple thing: make a big matrix, where the rows are the projects people want to work on and the columns are the available people/teams. Collect all the projects, and have people put down their estimates for each (requires X time from team Y, etc.). The value is that it's a very explicit way to show what people want to work on; obviously it will be more than you can actually do. So then you can make a very explicit choice about what you're doing and what you're not doing in the next X months. Back when we did this, there was an interesting insight the first time around: since we could count person-days, we learned we were spending iirc ~70% of our time keeping the lights on (infra, etc.), and only ~30% working on new things.
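The matrix itself is trivial to sketch; the numbers below are invented for illustration. The point is that summing each column makes over-commitment, and the keep-the-lights-on share, explicit:

```python
# Rows: projects people want to work on; columns: available teams.
# Cells: estimated person-months each team would need (numbers are made up).
demand = {
    "keep the lights on": {"backend": 8, "web": 3},
    "new checkout flow":  {"backend": 4, "web": 5},
    "search revamp":      {"backend": 6, "web": 2},
}
capacity = {"backend": 12, "web": 8}  # person-months available this half

# Total asked of each team vs. what it actually has.
asked = {t: sum(row.get(t, 0) for row in demand.values()) for t in capacity}
print(asked)     # {'backend': 18, 'web': 10} -- both teams are over capacity
print(capacity)  # {'backend': 12, 'web': 8}

# Share of all requested work that is just keeping the lights on.
lights_on = sum(demand["keep the lights on"].values()) / sum(asked.values())
print(f"{lights_on:.0%}")  # 39%
```

The explicit gap between `asked` and `capacity` is what forces the "what are we NOT doing" conversation.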
But despite this insight, I thought it was a waste of time: too slow, too formal, too sluggish. But then I saw that FB also spends significant time on planning (although it's much less formal, because they can get away with it, for a long list of reasons), so I thought okay, spending time on planning is probably the way to go.
Then I went to a company which was suffering from a focus problem. Every 2 weeks they'd have a prioritization meeting where projects would get re-prioritized, which leads to obvious problems. Here I realized that, for this problem, Capacity Planning would be a good thing, because one of the things it gets you is a longer-term commitment to certain projects.
Overall, my takeaway is to spend 10-20% of your time on planning and related meta-activities. Be conscious that the value is in thinking things through, and that the actual path may diverge, but that's okay. Depending on the culture, there are a number of things planning gives you, e.g. a commitment to work on X for Y time. Make metrics and dashboards to track things, but don't overdo it.
Yeah, and often they are women too.
And tomorrow when inclusion for neurodiversity/autism becomes the next social cause, you can talk about how you value diversity SO MUCH, all the while telling them to "get a clue" when their "diversity" manifests itself.
Even besides that, this extremist view of "if saying X offends Y, we must ban saying X" is bullshit. If you are offended by the use of gendered pronouns as part of the 0.1% who don't fall into male/female, tough shit. It's not used out of some hatred towards non-binary people; it's simply a matter of convenience. You are talking about "inclusivity", but arguing over which goddamn pronoun to use... there are real problems to solve re: inclusivity.
And it's great that tiptoeing around gendered pronouns is easy for you, but it might not be so easy for people who don't speak English well. I will be "inclusive" of them by assuming good intentions rather than assuming they are trans-hating assholes simply because they used a "he" instead of a "they".
Anyways, great job fighting for inclusion via nitpicking pronouns. Rosa Parks would be pumped knowing all it takes to bring about racial equality is to get the slaveowners to stop saying the N-word.
That's a name. Nobody's going to think you're only talking about men named 'Guy'. Context is king (ooor queen!) - it can certainly be either alone, or mixed.
From the abstract: "Complexity is the single major difficulty in the successful development of large-scale software systems. Following Brooks we distinguish accidental from essential difficulty, but disagree with his premise that most complexity remaining in contemporary systems is essential."
I would have expected the authors of a paper that makes such revolutionary and sweeping claims to have more of a trail.
But of course, you have to judge a paper by its content.
The reason I stopped is that they quote heavily from a well-respected book (often described as an updated SICP) that I am studying now, "Concepts, Techniques and Models of Computer Programming", and my interpretation of what the book says is at odds with how they apply it in the paper.
For instance, in the book the difference between a formal specification and an informal one is not precision, but that the formal one uses a mathematical language. The paper, however, says that a formal specification is the same as formal requirements (synthesized by the engineer), which are different from the informal ones from the user. These definitions can't both be right.
They also claim that in the ideal world control (basically ordering) can be entirely omitted. But what if, from the user's informal requirements, we must deduce that there are events the user expects to be ordered?
The paper also claims that concurrency is not relevant in an ideal world, given that all operations are instantaneous. But this would be impossible if there were indeed essential control, given that parallelism in the world is inescapable.
At this point I was convinced I was wasting my time, and now I know why this paper hasn't had any impact on the mainstream, as one commenter wondered.
It is advocating a particular architecture. But that architecture is essentially LAMP, as far as I can tell. It's what we ALREADY do!!!
From the paper:
FRP is currently a purely hypothetical approach to system architecture that has not in any way been proven in practice. It is however based firmly on principles from other areas (the relational model, functional and logic programming) which have been widely proven.

In FRP all essential state takes the form of relations, and the essential logic is expressed using relational algebra extended with (pure) user-defined functions.

By user-defined we mean specific to this particular FRP system (as opposed to pre-provided by an underlying infrastructure)
- All essential state takes the form of relations: this is a database. (SQL deviates from the relational model, but I don't view that as important here. An SQL database stores everything as relations.)
- Logic is expressed using pure user-defined functions: this is PHP / CGI / FastCGI. PHP is imperative, but the entire program is a pure function, because the request state is cleared between every request.
What am I missing? I'm being totally serious -- this is what I got out of it when I read it 5 years ago.
You can quibble with the details of PHP or Rails not being pure functions, but I believe what's important is the architecture, not how the source code looks. The essential state is in the database, in the form of relations. Accidental state is thrown away.
TL;DR -- PHP/MySQL is functional and relational.
"Logic is expressed using pure user-defined functions This is PHP / CGI / FastCGI. PHP is imperative, but the entire program is a pure function, because the request state is cleared between every request." - the entire program is not a pure function as seen by clients; to be functional, sending an identical request would always have to display the same information, but it won't if the page was updated between requests. Functional means that a given input (the request in this case) would always return the same output (page), just like a function in the mathematical sense would.
"PHP/MySQL is functional and relational." In a behavioural view of "functional", what matters is whether an operation always produces the same output for a given input; this is certainly possible in imperative languages, with some discipline. We can't apply a definitional view in this case, because PHP is multi-paradigm, supporting both stateful and functional programming, so you can't write operations that are functional by definition; you have to know what you are doing.
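Both readings can be illustrated in a few lines (a toy sketch, not actual PHP semantics): the handler is pure with respect to its explicit inputs, but the endpoint a client observes is not, because the database contents change between requests.

```python
def render_page(request: dict, db: dict) -> str:
    """Pure in the explicit-inputs sense: same (request, db contents) -> same page."""
    return "\n".join(db.get(request["user"], []))

db = {"alice": ["hello"]}
page1 = render_page({"user": "alice"}, db)   # "hello"
db["alice"].append("world")                  # a write lands between two requests
page2 = render_page({"user": "alice"}, db)   # same request, different page
assert page1 != page2  # not "functional" as observed by the client
```

So whether "PHP/MySQL is functional" depends on whether you count the database state as part of the function's input or as hidden mutable state.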
BTW, I haven't read the paper yet; I guess it may change my views once I do.
I should have said stateless Python front ends. Not even CGI/FastCGI, but just plain HTTP front ends. Many large websites use this architecture (YouTube, Instagram, etc.)
An example of something that doesn't follow the architecture is a stateful node.js or Go server.
I would say that the typical web architecture is similar to what they are talking about, with the benefit of existence :)
But you're right in the sense that they are trying to be more strict, starting on page 50:
- benefits for state -- avoid useless accidental state. This is the same philosophy behind SQL.
- benefits for control -- They are being more strict here, but I think it is missing a lot, because sometimes you need control flow.
- benefits for code volume -- I would need to see a real system to evaluate this claim. It's not fair to compare systems that exist with ones that don't :)
- benefits for data abstraction -- SQL agrees here. You don't abstract data. People sometimes make this mistake in their OOP languages, but that's incidental.
Not to be confused with functional reactive programming [EH97], which does in fact have some similarities to this approach, but has no intrinsic focus on relations or the relational model.
Paper about Akamai - "Nuggets in content delivery"
Also, Adrian Colyer's blog "The Morning Paper". He reviews and explains one paper every day. It's great because you get a sense of what's current.
End-to-End Arguments in System Design,
I think Rich Hickey (of Clojure) makes lovely points about application system design. I know you are looking for papers, but Rich's talks have influenced me greatly of late.
Edit: added link to slides
Clean Architecture (https://www.amazon.com/dp/0134494164) - Uncle Bob has been talking about this for years, but it's a really good exploration of how to build systems.
Turning the Database Inside Out is a talk that really made Apache Kafka and stream processing click with me. https://www.confluent.io/blog/turning-the-database-inside-ou...
Going through and implementing the Raft algorithm was also very formative for me - it's the first time I really started to grok the challenges with building large scale distributed systems. https://raft.github.io/
And to add a paper to the list to not totally go off topic - Birrell's paper "An Introduction to Programming with Threads" I thought was a very useful read - in part for the historical context, but he also breaks down many fundamental concepts that will look very familiar to the modern reader (it was written nearly 30 years ago).
It's also very readable. https://birrell.org/andrew/papers/035-Threads.pdf
Developer life is like google-fu, extreme edition.
CS papers work best when dealing with very specific, nitty gritty areas that require research after defining a hypothesis or problem to solve.
'Software architecture' is fairly broad and papers on those topics would read more like sociology papers than CS papers (e.g. how do people write code? here are some examples, here are some general patterns people like etc).
Looking back on reading that sort of stuff, it was nice but I very rarely apply it in day to day software development compared to readings that are more niche to the problem I'm solving that week.
At the bottom of page 3 it says that on UNIX V7 they added an unbuffered (-u) option to cat and then removed it on V8. Does anybody know why they removed it?
I checked out the man page for the GNU version and it says
Not only was it the paper that first put into written word the CAP theorem, but it also prescribes design principles for scaling out applications with graceful degradation and "orthogonal decomposition" (i.e. service-oriented architectures).
It's also a fairly short and easy read at only 4 pages, and I'd definitely recommend that everyone interested in practical modern distributed systems take a look.
> Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
"An incomplete list of classic papers every Software Architect should read" https://blog.valbonne-consulting.com/2014/06/09/an-incomplet...
"Introduction: A programming system called LISP (for LISt Processor) has been developed for the IBM 704 computer by the Artificial Intelligence group at M.I.T. The system was designed to facilitate experiments with a proposed system called the Advice Taker, whereby a machine could be instructed to handle declarative as well as imperative sentences and could exhibit "common sense" in carrying out its instructions."
The premise of the talk is that while software engineering goals & processes are well-defined and well-understood, the same does not apply for software design. He sets up a framework of 'purposes' and 'concepts', which in an ideal design should be a 1:1 mapping. He points out many examples of 'design smells', where this 1:1 mapping is violated, for whatever reason (engineering reasons, product management failures, etc.)
Definitely useful not just to developers, but anyone involved in writing software/affected by the process of writing software.
To really appreciate how much thought went into the design of this course, I recommend reading "A New Approach to Teaching Programming" 
As for the Concepts and Purposes you refer to, there is more on this in the MIT 6.170 course.
I found this very intriguing:
Towards a Theory of Conceptual Design for Software
Not sure the paper itself aged so well.
 - https://static.googleusercontent.com/media/research.google.c...
- John Backus 1979. "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs." (a famous case for functional programming)
- Victor R. Basili and Albert J. Turner 1975. "Iterative enhancement: A practical technique for software development" (Agile in 1975? Actually it seems iterative development can be traced back to the 50s)
- Andrew D. Birrell and Bruce Jay Nelson 1984. "Implementing remote procedure calls" (the idea of making a distributed language operation similar to a local language operation)
- Edsger W. Dijkstra. "Go To statement considered harmful" (the mother of all those "considered harmful" titles)
- Carl Hewitt 1977. "Viewing control structures as patterns of passing messages" (seminal study of asynchronous messaging)
- Carl Hewitt, Peter Bishop, and Richard Steiger 1973. "A universal modular ACTOR formalism for artificial intelligence" (idem)
- Charles Antony Richard Hoare 1974. "Monitors: An operating system structuring concept" (an early refinement of a very influential concurrency concept)
- Charles Antony Richard Hoare 1979. "Communicating sequential processes" (an early presentation of invariants for state machines)
- Gilles Kahn and David B. MacQueen 1977. "Coroutines and networks of parallel processes" (non-preemptive concurrency, an alternative to threads for collaborative concurrency)
- Robert A. Kowalski 1979. "Algorithm = logic + control" (a logic program can be "read" in two ways: either as a set of logical axioms (the what) or as a set of commands (the how); the idea behind relational programming, as in Prolog)
- Nancy Leveson and Clark S. Turner 1993. "An investigation of the Therac-25 accidents" (the mother of all race-condition papers? Focuses the mind on the dangers of interleavings)
- Henry Lieberman 1986. "Using prototypical objects to implement shared behavior in object-oriented systems" (influenced JS? Understanding delegation)
- George A. Miller 1956. "The magical number seven, plus or minus two: Some limits on our capacity for processing information" (this might be pushing it, but maybe the original case for managing complexity)
- Chris Okasaki 1998. "Purely Functional Data Structures" (how to design functional algorithms)
- John C. Reynolds 1975. "User-defined types and procedural data structures as complementary approaches to data abstraction" (the fundamental abstract data types concept)
The current trends were mostly blog driven. (Notable exception being the design approach to distributed systems and databases, which has been noted by many in this thread. Brewer's CAP theorem, etc., of course, were hugely influential.)
In terms of "CS" papers, there were a few that argued against the then prevailing canon such as object orientation. In terms of theory, the theoretical foundation of much of the current approach can be found (surprisingly) in papers from the 70s and 80s, e.g. Hoare's CSP (which influenced Go) is circa '85.
"Manifesto for Agile Software Development" https://en.wikipedia.org/wiki/Agile_software_development#The...
"Catalog of Patterns of Enterprise Application Architecture"
Fowler > Publications ("Refactoring", etc.)
"Design Patterns: Elements of Reusable Object-Oriented Software" (GoF book)
## Distributed Systems
CORBA > Problems and Criticism (monolithic standards, oversimplification):
Bulk Synchronous Parallel: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel
Raft: https://en.wikipedia.org/wiki/Raft_(computer_science)#Safety
CAP theorem: https://en.wikipedia.org/wiki/CAP_theorem
1. actor model of computation (https://arxiv.org/abs/1008.1459)
2. map reduce (https://research.google.com/archive/mapreduce.html)
(On second thought, it helps to have Armstrong's thesis on Erlang as a bridge between them, but I can't find an easily downloadable link at the moment.)