Facebook’s code quality problem (2015) (darkcoding.net)
502 points by setra on March 13, 2017 | 226 comments



I look at programming as all design, whether deliberate or not. Martin Fowler drove the point home for me (https://www.martinfowler.com/articles/newMethodology.html).

He said that some people drew inspiration from construction, where there are designers and builders. One or two highly paid architects draw up the plans. Then you can hire a bunch of cheap labor to build it, say a bridge. This belief leads to a dichotomy in software companies, where one person is the "architect" and others are just "regular" programmers --- often outsourced to the lowest bidder.

From Jack Reeves he cites the epiphany "that in fact the source code is a design document and that the construction phase is actually the use of the compiler and linker." There is little repetitive, mindless work in programming, because as most of you know, "anything that you can treat as construction can and should be automated." Therefore, "In software all the effort is design . . ."


I feel your sentiment, and I usually oppose the construction-driven approach you describe, but it is really easy to play devil's advocate here. I mean, let's just be realistic. There's nothing sacred in programming.

The fault in your argument is that there's no design/construction dichotomy, but rather a gradient: everything looks like design from close enough, and everything looks like construction from high enough above. For business people, all of programming is just very, very low-level construction, and there's no design to it. Damn, even hiring programmers and creating companies is just a construction stage to them.

And from the practical standpoint, the truth is that treating a lot of programming "just as construction" simply works. For better or worse. It is cheaper, and usually reliable enough, to design the system so that many of the components are independent plugins that can be outsourced to anyone. Even if some fail, the system continues to work while you hire new people to rewrite those components. Sure, maybe it would be more efficient and "cool" to hire a really smart guy (or do it yourself, if you believe you can) to write it as a really optimized monolithic system, but if that isn't required, it is usually less risky to build something less sophisticated and improve it if need be. It is usually OK that some parts of the system may fail. Fuck it, let's not lie to ourselves: there are always parts of the system that fail, whether you think that's "acceptable" or not, so let's just take it into account.


> There is little repetitive, mindless work in programming, because as most of you know, "anything that you can treat as construction can and should be automated."

Sure, but there is a cost to automation. We all know the programmers who will attempt to automate God if left to their own devices. cough At some point there is a tradeoff between automating something and sending an email to an intelligent human being who maybe isn't god level at software architecture but who can fill out some organizational context with pidgin code.

I think the idea that the only code that needs to get written is type-of-a-decorator-on-a-subtype Haskell kind of stuff is wrong. It's the same fallacy as people who say we just need physics and all of the natural sciences will shake out of it. We need Richard Stallmans and Terry Davises to write that stuff for sure, and maybe they are the only True Programmers out there. But there's an awful lot of automation engineering that needs to get done too, and some of that really is just putHotDogIn(basket) (unless Debbie from accounting emails you to say no) and wget http://wordpress.com/dist.zip

Let me say it another way: there are and will always be gnarled programmers who think in koans and concoct the words we code in, but spread across that will be a thick layer of humans embedded in human systems mapping the minutia of the day in code. I daresay we'll all be doing it.

I suppose I agree that work is all design. But the code isn't the part that's being designed. The code is just the log format.


I read Richard Gabriel's "Worse Is Better" essay in my first month as a professional programmer, almost 12 years ago. Since that time the lessons of that essay have only increased in significance for me, with this FB story as additional proof.

Their back-end code might be bollocks, and I can certainly believe that judging by how sluggish their FB app feels on my phone, but the fact is that they've conquered the Internet (together with Google and a couple of other companies). It's a fact that I personally hate, but they're still winners in the end.


>I read Richard Gabriel's "Worse Is Better" essay

Fyi to help set the record straight... RG's "Worse Is Better" is not shorthand for "worse quality" is better than "better quality".

Richard Gabriel was trolling readers with the word "worse" for comical effect. His essay is: "simpler with less features" wins in the marketplace of ideas more than "complex". His observation isn't about "quality".

That said, there is another meme with the exact same 3 words called "worse is better" which does stand for "worse _quality_ is better than better _quality_" but that's not related to RG. It's just the more common interpretation of those 3 words -- especially for 99% of people that are not familiar with RG's original essay.

tldr: Richard Gabriel trolled people so hard that "worse is better" has taken on a life of its own divorced from his original thesis


>Richard Gabriel was trolling readers with the word "worse" for comical effect. His essay is: "simpler with less features" wins in the marketplace of ideas more than "complex". His observation isn't about "quality".

Only it IS about quality.

The "simpler and with fewer features" examples he gave were examples of broken implementations that, while successful in getting "out of the door" and gaining viral acceptance, have resulted in lots of issues, lost man-months, and piles upon piles of hacks.

That's regardless of what Gabriel himself thought he was saying.

And while "getting out of the door" pronto might be good for a product, it's not good for infrastructure software.


> Richard Gabriel was trolling readers with the word "worse" for comical effect. His essay is: "simpler with less features" wins in the marketplace of ideas more than "complex". His observation isn't about "quality".

I have to say that reading the thing again I don't really think that's accurate, since he also talks about sacrificing correctness for the sake of ease of implementation.


>I don't really think that's accurate, since he also talks about sacrificing correctness

RG's use of "correct" doesn't mean that it's ok for a program to return "2+2=5" which is "better" than "2+2=4".

To refer back to the original essay[1], RG's use of words like "correct" and the "Right Thing" requires careful reading because he's deliberately abusing those terms to burnish the opposing side's philosophy. He's playing with the adjectives "worse/correct/right" to present both sides of the New Jersey vs MIT philosophy.

RG says that the MIT approach of "backing out of the system routine" is the "correct" way for argument's purposes. It doesn't mean it's The Universally True Correct Way. RG doesn't state it explicitly, but the New Jersey approach of adding user code to test for a failure is also "correct". There are two competing semantics of "correct". His observation is that the one with the simpler implementation will spread.

Maybe another analogous example would be clearer. Suppose a C++ programmer would label the "~destructors()" as the "correct" approach to clean up code. No extra user code has to be written to do housekeeping after each object goes out of scope.

However, a C programmer disagrees: to them, destructors/constructors are complicated "spooky action at a distance". They'd rather write explicit wrapper functions like "cleanup()".

Both the C and C++ camps are disagreeing about which "correctness" is better. In this case, RG would award the "correct" label to the C++ philosophy for the sake of creating a provocative essay.

(It's not an exact analogy, because the more complicated C++ has survived alongside the simpler C for decades.)
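
To make the analogy concrete, here's a minimal sketch of the two styles (a made-up File wrapper, nothing from either camp's actual code):

    #include <cstdio>

    // C++-style "correctness": the destructor guarantees cleanup when the
    // object leaves scope, so callers never write housekeeping code.
    struct File {
        std::FILE* fp;
        explicit File(const char* path) : fp(std::fopen(path, "r")) {}
        ~File() { if (fp) std::fclose(fp); }   // runs automatically
    };

    void cppStyle(const char* path) {
        File f(path);
        // ... use f.fp ...
    }   // f is destroyed here, so the handle is closed even on early return

    // C-style "correctness": the caller does the housekeeping explicitly,
    // trading convenience for code with no action at a distance.
    void cStyle(const char* path) {
        std::FILE* fp = std::fopen(path, "r");
        if (!fp) return;
        // ... use fp ...
        std::fclose(fp);   // the explicit cleanup() step
    }

Both versions clean up the file; they just disagree about where the responsibility for doing so should live.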

Whether one prefers MIT over New Jersey, it doesn't change the fact that RG is not saying that "low quality is better than high quality".

[1] https://www.dreamsongs.com/RiseOfWorseIsBetter.html


> RG's use of "correct" doesn't mean that it's ok for a program to return "2+2=5" which is "better" than "2+2=4".

But it means it's OK to occasionally return null instead of a guaranteed 4.


I doubt even the most "get it out the door" shop in the world would want code that flubs the arithmetic, so I'm having a hard time seeing the distinction you want to argue for.


>I doubt even the most "get it out the door" shop in the world would want code that flubs the arithmetic,

You're interpreting my example literally instead of just seeing it as one example of "correctness" that is a binary TRUE or FALSE. My point is that RG is not writing about "correctness" in that binary sense.

> so I'm having a hard time seeing the distinction

The NJ approach of writing user code to check for an error is also correct. It's just a different approach to "correctness."

If you only see "correctness" in binary terms instead of competing abstraction levels of simplicity-vs-complexity, then yes, you will not see RG's distinction.


Let me get straight to the point I mean to make -- what is it that Facebook has done that is literally "incorrect" in the same way flawed math is?


>what is it that Facebook has done that is literally "incorrect"

I don't understand where the confusion in conversation happened. If I'm trying to dismiss literal "correctness" as a binary interpretation, why would I think it applies to Facebook?

To make it more explicit: I don't think the OP's (paganel) link from RG's "worse is better" to Facebook is relevant. He interpreted RG incorrectly, or he misremembered what that essay was actually about, since it's been 12 years.


You're making an implicit assumption in this comment: that Facebook is in an optimal position, that every decision they made contributed to that position in a positive way, and so that any change would be for the worse.

But what if they aren't in an optimal position? What if having a higher quality code base all along would have let them dominate social media more and faster, and perhaps be more successful in areas they have stumbled in the past?

A lot of people (especially non-technical folks and inexperienced coders) seem to think that code quality comes at a cost and that a bad code base full of technical debt is just a side effect of fast iteration and the cost of being quick to market. I say that's crap. A higher-quality codebase with less technical debt actually improves speed tremendously, because it lets you iterate faster with fewer rewrites down the road, and it will be more stable, resulting in fewer distractions from putting out fires when unforeseen bugs make it into production.

A sloppy codebase with technical debt is much more costly to your ability to go to market, add features, and iterate.


> But what if they aren't in an optimal position? What if having a higher quality code base all along would have let them dominate social media more and faster, and perhaps be more successful in areas they have stumbled in the past?

Well, I don't know. What if spending so much time on quality would have led to it never catching on? It is curious that the world is full of compromises like Unix, QWERTY, C, VHS, etc., all of which had arguably superior competitors that failed to catch on.


> What if spending so much time on quality

Your reply includes the very fallacy that I mention in my comment that you're replying to. Namely that better quality code requires more time. Again, I say, that's not really the case. If you write better code you will spend less time debugging, adding features and iterating in the future, and I would argue it takes less time to write better code once you're in the habit of it.

But of course any number of things are possible, they could have written better code and failed. They could have written even worse code than they did and failed. Who knows. My point wasn't "Facebook should have written better code and then Zuckerberg would have another $10 billion." My point was that the comment I was replying to was based on some assumptions that, for me, aren't true.

> It is curious that the world is full of compromises like Unix, QWERTY, C, VHS, etc., all of which had arguably superior competitors that failed to catch on.

But it's the arguably superior that matters here. For each of them there are circumstantial reasons why they beat out competitors, and it's usually not really because of the mythical "worse wins" baloney. In reality there's usually a number of reasons that combine together for a given result, and that's certainly the case with, for example, VHS vs. Beta. Costs of tapes and players, Sony's tight control of Beta, camcorders, etc. all combine together.

Anyway, we're way off on a tangent now. My point was that it's a fallacy that writing worse code is a shortcut. It's like driving badly vs. driving well. You're still going to the same place on the same roads.


I'm with you that deliberately writing shitty code does not really save time. The technical debt from cut corners comes back to bite you surprisingly quickly.

However, given a minimum bar of quality, where I've seen the biggest quality issues crop up is in the layering of code over time, written by different developers with different views of the problem domain and different priorities. Solving these kinds of issues often requires major time investment in refactoring to align the new worldview with the old. Doing so successfully requires a clear vision that is harder and harder to coalesce as the code base gets more complex. Good architecture mitigates this, but is only possible to the extent that the fundamental business logic allows.

Deciding when to refactor and realign older code and architecture is where the hard decisions are, not so much in today's coding choices.


> If you write better code you will spend less time debugging, adding features and iterating in the future, and I would argue it takes less time to write better code once you're in the habit of it.

Yes, that's probably true. But you'll spend more time now. Sometimes that might be time you don't have.


You're just reaffirming the fallacy.

You don't save time by skimping on quality, it's foolish to think otherwise.


I was asked for an estimate for a project I was working on. I said ~65 days. Now the deadline will give me around 40 days of work on it. I'll need to cut corners to make that, but I know in the end it won't be a finished product that we will be releasing and it will cost more time in the long run as I'll need to go back and redo parts of it. But we have a deadline now so I'll need to cut corners. The obvious way to do that is to focus on the most used parts of the application being done well, and finish / refactor the less used parts later.


Well, no, I don't agree. There are lots of opportunities to say "this really ought to be refactored... but whatever, it more or less works" when working on a typical application.


But it's not better, it is worse as clearly put forward by this article. ;)

We shouldn't attribute Facebook's success to poor quality code, unless we really believe that 429 people working on that slow, sluggish app is productive. Do you?

Maybe Facebook would be in a better situation than they are if they had a firm foundation? What happens when a competitor comes along and knocks on their door? That mess of code becomes an anchor.

They succeeded not because "worse is better" but because ...they succeeded, maybe because they had the greatest social network in the world to expand within, or because of some other reason we don't know. But they didn't succeed because of worse is better, they succeeded in spite of it.


No one is actually arguing that worse is better. It's just a catchy title. A more accurate title would be "worse is faster than better, and being fast but mediocre is better than being slow and amazing."

"Worse is better" makes the observation that doing the right thing slows you down. Often, an OK solution is much quicker to finish than a great one. And it concludes that the projects that prioritize getting stuff done over code quality will succeed.

Would facebook be as successful as it is today if it had taken longer to ship features, but maintained higher code quality? Who knows.

But the intuition in worse is better, which matches with my own experiences, is that making that trade off would have cost Facebook, as other faster moving projects closed the gap.


I agree with you that it's sometimes good to focus on getting things out the door fast instead of on the code quality behind them. But surely there must be a point when the pendulum swings back and you should switch focus to quality. I would argue that Facebook has definitely reached that point: they own so much of the market, and with their infinite resources they should be able to ship things with good code quality. Sooner or later the costs of supporting all that crappy software will make a dent even in Facebook's chest of gold.


They have written their own PHP interpreter, haven't they? That's a pretty good indication that they left some things too late. That said, if they have the resources to make that sort of decision sensible, then it's probably not going to be too much of a dent in their pot of gold.


As you said, I just don't think anyone can know that alternate history.

It's very possible that a better architecture would have allowed them to move quicker and make more changes without breaking as much stuff. It's possible considering features before haphazardly implementing them would be beneficial overall.

Nobody can really know that, and to apply intuition to what is an extreme outlier seems off.


> Do you?

It's written between the article's lines, so to speak, but it seems like the bad code quality was generated in part by FB moving at a very fast pace. I think they wouldn't have been as successful as they are had they chosen a slower path with more "correct" code and architecture.

> What happens when a competitor comes along and knocks on their door?

They successfully managed to "eat" and integrate both Instagram and WhatsApp, both worthy contenders at the time, and they're now on their way to stealing Snap's lunch, so they've proved to be pretty resilient on this front. Not to forget how they successfully kept Google+ in check.


how would that success have been different if they had used different engineering practices?

a big part of their success was delivering results... if they hadn't quickly crapped everything out of the door, how would facebook look today?


Agility trumps code quality. Stability can be addressed downstream (which Facebook does with heavy testing and rollouts of changes). Code quality problems or not, Facebook has been more stable than its competitors (MySpace, Twitter, etc.) and faster at pushing out changes.


Is it more stable than those two?

I can't see the point in Twitter (neither could a number of my colleagues when it was mentioned at a dinner a couple of weeks ago; none of them used it).

MySpace was a genuine competitor, but allowing teenagers to style their own HTML made for some horrendous pages.

Facebook had a nice clean interface, as well as network effects. I was signed up to a similar service at the time - HighFive I think it was called, but the network effect was what snowballed with Facebook and led to its success. At the time I didn't see either one being better or more stable than the other.


"Exhibit C: Our site works when the engineers go on holiday"

Is not all so different from "Fewer patients die when heart surgeons are on vacation". Of course your site is going to be more reliable when nobody is changing it! You should be worried if reliable isn't the steady state, and it requires constant changes to stay up!


My immediate thinking was this as well: when they stop touching it, it's fine, which means that whatever bugs are checked in seem to be caught quickly. What's more, the periods of high reliability are during periods when you might expect there to be heavy load - assuming people don't check Facebook at work (har har).

What counts as an incident?

> Figure 1 includes data from an analysis of the timing of events severe enough to be considered an SLA (service-level agreement) violation. Each violation indicates an instance where our internal reliability goals were not met and caused an alert to be generated. Because our goals are strict most of these incidents are minor and not noticeable to users of the site.

So these are minor issues. The parent article is paraphrasing at best, and jumping to conclusions at worst. From Facebook:

> We believe this is not a result of carelessness on the part of people making changes but rather evidence that our infrastructure is largely self-healing in the face of non-human causes of errors such as machine failure.

They then list a number of sane strategies to mitigate this.

http://queue.acm.org/detail.cfm?id=2839461


I also took issue with this part of the article; it seems obvious to me that rolling out new features would directly cause reliability issues (and the only alternative, of not rolling out new features, is, obviously, a cure worse than the disease).


I am not sure that most Facebook users would actually agree about this.

But then their view is slanted towards UI changes. Performance improvements and other subtleties are features too.


Engineers should not brag about this, in fact it's the opposite of something we should brag about. "Our site doesn't go down when we're not here devising ways to break it"


I don't see this as bragging but as a reality/consequence of introducing change. We do a lot of things to minimise the occurrence and impact of change, but there always is (and will be?) that corner case that breaks things. In another context, that is often a point against upgrading old systems that have been working and doing the job for a long time.


Yes, I expect the graph in Exhibit C is perfectly normal for any company.


It's hard to argue with the business sense of pushing out features quickly at the cost of code quality.

But I've learned that there are two kinds of development teams:

(A) the teams that are "moving fast and breaking things" while "creating business value"

and

(B) the teams that are "only maintaining legacy code", "not getting anything done", "costing a lot of money" and "complaining too much" about code quality that team A wrote.

As an engineer, I've learned that it's less work and more rewarding to be on the (A) teams than on the (B) teams.


I like type (C) whose motto is "move fast and fix things" https://githubengineering.com/move-fast/



I like how even the URL has a "bug" (year 2106?).



Aw, it looked so much like a minor mistake with the Y-M-D date format considering it happened on 2016-january-28th!


Sure, shit happens to the best of us. It's inevitable.


I am type (D): don't move particularly fast, but produce decent code - when management isn't pushing me to be type (A).


The best is to work in companies where people maintain their own code and are responsible for fixing what they broke. The moment the split is like you said, the A team will produce progressively crappier and harder-to-maintain code, and the B team will have high turnover and require higher salaries to keep people.

It is bad organizational structure.


The difference is that Team A can leave and a project is delayed. If Team B all left the company would fold.


Funnily I've actually seen a Team B leave.

There was much panic and thinking it would all go horribly wrong, and indeed it was madness for the first week. Then the people who took over started to stabilise some of the worst parts of the system and it all sort of fell into place after that.

Fast forward a couple of months: the project was pretty much hands-off, and we all realised that keeping a whole team on that project had pretty much been a giant waste of money. They did nothing but create fires for themselves to fix. Team B's exodus was actually a really great thing for the company.

I suppose the thing to learn here is that you need redundancy in your teams. If we hadn't had other devs to step in and take over, it would have been a bit different. Still, I'm not convinced that a handful of contractors couldn't have been brought in to the same result, just more expensive.


How big was team B, and did they leave through a random event (2 guys got hired at Google, and 1 guy found a new job), or some kind of mass "let's show them" quitting event?


Both are necessary. If Team A leaves, and no more business value is produced, and the devs only care about beautiful code, the company will fail too.

Both kinds of teams are needed. Or even better, teams should not be divided like this: everyone should do their share of breaking things and creating value, as well as cleaning up the mess and scaling things.


I've learned from working in one of the largest companies that there are two kinds of development teams: (1) the teams that mostly depend on the products of other teams; (2) the teams that create products that others depend on. And when upper management starts rushing the deadlines, sacrificing quality for the deliverables, guess which team becomes (A) and which becomes (B)? I currently work on a (B) team, and I try as hard as I can to "move fast and break things", but there is simply no way of doing that, because every significant code change that I make requires changing code we don't own. Since the code we don't own is written in such a scrappy way that you'd better not touch it or else it breaks, our team has to resort to only minor, insignificant code changes.

> As an engineer, I've learned that it's less work and more rewarding to be on the (A) teams than on the (B) teams.

It is certainly more rewarding to be on the (A) team, but do we always have a choice?


Yes, thank you. I've been trying to write out that very idea for so long. It comes down to, the heroes are the people making the new stuff, and the janitors are the ones stuck with all the bugs and tech debt of the new stuff. Guess who gets promoted.

When you find yourself in the janitor role, time to move on, because your employer doesn't think much of you.


Yes and that's the perversity of the incentives in the industry. Automation and coding are such high value activities that even the smallest, shittiest piece of code can produce a lot of $$$ that can make up for its crappiness and its maintenance costs.


That's because team "B" is actually an operations team and, believe me, you don't get into ops for "fun". :)


Some of FB's ~18,000 Obj-C header files from a post linked in the article: [1]

* FBFeedAwesomizerProfileListCardViewControllerListenerAnnouncer.h

* FBBoostedComponentCreateInputDataCreativeObjectStorySpecLinkDataCallToActionValue.h

* FBEventUpdateNotificationSubscriptionLevelMutationOptimisticPayloadFactoryProtocol-Protocol.h

* FBGroupUpdateRequestToJoinSubscriptionLevelMutationOptimisticPayloadFactoryProtocol-Protocol.h

* FBMemReactionAcornSportsContentSettingsSetShouldNotPushNotificationsResponsePayloadBuilder.h

* FBProfileSetEventsCalendarSubscriptionStatusInputDataContextEventActionHistory.h

* FBReactionUnitUserSettingsDisableUnitTypeMutationOptimisticPayloadFactoryProtocol-Protocol.h

oh my god

[1]: http://quellish.tumblr.com/post/126712999812/how-on-earth-th...


Those read like classes generated by tools. Or maybe I really don't want to think humans are capable of such names.


These are completely credible class names, just ask any Enterprise Java developer. I particularly enjoy FactoryProtocol-Protocol. Who would be satisfied with just a FactoryProtocol?


Yea, I was going to say, Java's almost got them beat with SimpleBeanFactoryAwareAspectInstanceFactory[1]

1: http://docs.spring.io/spring/docs/2.5.x/javadoc-api/org/spri...


Invoking the Spring framework here is just cheating -- they have dozens of classes like that! :-)

Just the other day I was dealing with Spring's declarative transactional support. Here's a fun read: http://docs.spring.io/spring-framework/docs/4.2.x/spring-fra...


Well, the humans could be tools...


They could be. The effective coding standards for iOS are to make class names stupidly long.


Makes me wonder how much dead code makes it into that abnormally large binary.


long names are good for both searchability and documentation purposes, and not a problem with decent tooling/autocomplete

If the names were more abstract, as in some java code, then it would be somewhat dubious in value. Most of these seem very precise however.


It is a problem. The class names don't fit in the class browser window, which means you constantly have to resize it to read the end of the class name.


The problem isn't with the names. The problem is with overcomplicated architecture that leads to names like that.


It is ironic that a codebase with so many classes is likely to occur precisely because of dogmatic adherence to those "best practices" which were originally intended to improve code quality --- aggressive refactoring/modularisation being a likely culprit. What I think they really need is not more design, not more architecture, not more of anything but KISS and YAGNI.


The thing that having an overall architecture buys you is conceptual clarity. That lets someone much more easily understand what is going on. You need to have a minimal set of ideas that are well thought out and explained.

Just saying "YAGNI" and "KISS" doesn't actually make the resulting code simpler. Why? You do have some things to build and refusing to lay out a plan for what that is going to be doesn't stop that. It just means that the complexity doesn't have a clear narrative


It seems to me your point about KISS and the author's observations are compatible.

Facebook is successful for reasons that have almost nothing to do with their code, however, and as long as that's true there isn't a reason for them to change how they develop their software. In that sense, code quality doesn't actually matter (at least, until it does).


I spent a few months at Facebook and left because of that. Not necessarily because of poor code quality but because of all the issues it was causing, and a culture of turning a blind eye to (or worse still, being proud of!) those issues. It's obviously true that quality of their code does not affect their market position, but it sure affects who they hire and who they retain.


That may be true. But in the end, does it matter?

Until we see someone taking market share away, I'd argue it doesn't. Facebook still retains top tier people regardless of any perceived quality problems.

Sorry you left. I just started here, and I do agree that what I feel about code quality isn't generally shared here. But as I get older, I start realizing that in the end, as long as things work relatively well at our scale and in our problem domain (we're not medical device software; we can fail), all's well.


I did some contract work for FB around when this article was written. I don't think that nobody cares about clean code. It's more like... results will vary widely based on who reviews you. And the governing philosophy is more focused on the automated checks and balances that try to keep you from doing harm to the overall system. While human code reviews were pretty lax, the machines would slap you around for all sorts of things, none of which would force you to write beautiful code.

Personally, I like for humans to be pickier than they seemed to be at Facebook, but it really opened my eyes to the value of what I think of as a "keep your garbage in your own trash can" approach to quick-and-dirty coding. As long as you didn't spill your trash into the hallway, the machines were satisfied, and your reviewer didn't find your work to be _insane_, your work shipped.

[EDIT: And I find that this can often be a wonderfully pragmatic way to work if you have the tools to keep you from doing harm.]


As I got older I realized my threshold for tolerating bullshit is getting lower and lower. At the end of the day as an engineer, I'm as much (if not more) motivated by the quality of work I'm doing as an individual, team, and company as I am by purely monetary rewards. I could trade this justified pride for a feeling that I'm building something good and important, but FB is probably a net zero on both counts.


It matters because profits are lower. 400 devs on an iOS app is not free.


But FB is a public company. If shareholders thought that was the case, they could vote with their dollars. Looking at the stock price, they seem to have no problem with the way FB is hiring engineers. The market is okay with it.

There's no reasonable way to say FB should have less engineers without being a high level director of the company.


> There's no reasonable way to say FB should have less engineers without being a high level director of the company

I'm going to assume you don't literally mean only company directors can levy logical criticisms of a company.

Consider the fact that the FB app itself was built with a very small team to begin with and only once it became 80-90% feature-complete did the exponential growth in engineers suddenly become "needed".

I can understand why Microsoft needs tons of engineers. OSes are complicated and have lots of moving parts (drivers, file systems, kernel, etc) that are all absolutely required and must be constantly updated, even without bringing their other products into it. But Facebook? At its core, it's a single webapp that serves simple relatively static pages. Yes, at web scale, so it's not entirely trivial.

My explanation is that Google and Facebook face a similar dilemma: most of their revenue comes from a single, relatively simple core product. To really optimize that core product, they might need a couple of dozen highly skilled engineers, along with a lot of equipment to scale it.

But that's not why they hire thousands. It's because they're both constantly trying to branch out for growth. In terms of revenue it is fair to say that strategy has been a failure for Google and except for a few of the "bells and whistles" FB has added like IM, probably for them as well. This problem is caused by the way that tech companies are primarily valued by perceived growth potential rather than actual earnings. They can get away with hiring lots of useless people because capital costs in this industry are fairly low.


I would like to argue that, in most of these cases, it's not "code quality" that is at fault, but "design quality" (before coding) - which is often absent entirely.

The problem with design, in software, is not that most people forget to do it. It's that they never learn to do it. It always comes back to bite you.

I don't want to start a discussion on design, and how most people mess it up because of lack of skill or experience therein. But hacker culture seems to be allergic to design, and hacker culture seems to be what everybody strives for these days.


I think you hit the nail on the head. My motto for developing code is "make it as simple as possible, but no simpler". Whoever created a design that uses 18,000 classes did not follow that rule.


I believe "our site works when the engineers go on holiday" thing is not fair. Of course that application is less stable when it's actively modified. There's no way to make it more reliable on weekdays than on weekends except for stopping the development altogether or maybe deploying on weekends.


No way at all? FB has a lot of servers and a lot of users. They have opportunities for quality practices that few other organizations get.

Here's a simple example:

Split the user population into 365* groups. Assign them to days of the year. Test new code on only the group whose day has come up. Follow them until they stop having problems. Now you can deploy to a month's worth of groups. All good? Deploy to everyone.

Yes, that means that you can't have more than 365 changes in simultaneous development. Tough.

*Yes, yes, leap years. Take a day off from deploying.
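
A rough sketch of what that grouping could look like (hypothetical names; it assumes user IDs hash evenly, and it's obviously not how FB actually does rollouts):

    #include <ctime>
    #include <functional>
    #include <string>

    constexpr int kGroups = 365;   // one cohort per day of the year

    // Hash each user into a stable cohort, 0..364.
    int groupFor(const std::string& userId) {
        return static_cast<int>(std::hash<std::string>{}(userId) % kGroups);
    }

    // Expose the new build only to the cohort whose day has come up.
    bool shouldSeeNewCode(const std::string& userId) {
        std::time_t now = std::time(nullptr);
        int dayOfYear = std::localtime(&now)->tm_yday;   // 0..365
        if (dayOfYear >= kGroups) return false;          // leap day: deploy to nobody
        return groupFor(userId) == dayOfYear;
    }

Once the day's cohort stays healthy, the same check can be widened to a month's worth of cohorts, and then to everyone.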


They graph the number of incidents. Even if a bad release impacts only 0.3% of the user base, it's still an incident.

They have to investigate it, revert or fix the bad code and start the deployment process again.


Ack. Also, the system introduces a new problem: if you are deploying on the weekend, most devs are not around to help solve the problem. Thus, the outages would be longer.


That is usually how all companies at that scale release code (at least the ones that I know of).

Only that it's not 365 groups, because at the size of FB that would be several million people.


They might already be doing A/B testing (or in your example A1/A2/.../A365). It's not clear what's the definition of an "incident". At my workplace, even if a bad code push affects say 0.1% of users, it would be classified as an incident.


That's deploying on weekends, isn't it?


What other engineering discipline would say "there's no way to improve our reliability except stopping work altogether"? There's always a way to improve reliability. Arguably, with formal verification, you could ensure large parts of your system are perfectly reliable given simple assumptions.

The problem isn't that it's impossible -- it's that it's more expensive than just hiring one more engineer to keep papering over the problems.


> What other engineering discipline would say "there's no way to improve our reliability except stopping work altogether"?

All of them. Are your roads more reliable when they're constantly being changed or when they are just being maintained? Is NASA achieving its reliability by constantly changing the designs of its ships, or by reusing the same design over and over?

> Arguably, with formal verification, you could ensure large parts of your system are perfectly reliable given simple assumptions.

Yes. But a fixed formally verified system will still be more reliable than a formally verified system being constantly changed.

What was said wasn't that FB couldn't be more reliable. It's that they are already so reliable that only new changes introduce problems. Sure you can still work on minimizing those problems, but that's a different point.


I think I misread the OP. You're right that there's no straightforward way to make Facebook more reliable on weekdays than on weekends.


I do think that FB, and most large orgs, have code quality problems. But the article does a pretty bad job at making its case.

> "That’s 429 people working, in some way, on the Facebook iOS app. Rather than take the obvious lesson that there are too many people working on this application, the presentation goes on to blame everything from git to Xcode for those 18,000 classes."

How does the author know that 429 is too many? How does the author know that FB's goals and functionality can be best achieved with fewer people/classes. This just reads like a classic "Why is Google so big, I can do that in a weekend" comment (https://danluu.com/sounds-easy/)

> "Our site works when the engineers go on holiday"

This is pretty much universally true for any dynamic site where engineers are constantly adding new features. Change always comes with risks. Changing code is always more likely to break something in the very short term, compared to not changing code. I have a hobby site (shameless plug, http://thecaucus.net) which has been running on auto-pilot for the past year and almost never breaks, because there are virtually no moving pieces. The fact that the FB site breaks more often when engineers are making changes to it is just a universal law of software development.

I do think that the organizational structures at most large companies are bloated, inefficient, non-transparent, and produce sub-par code. I had high hopes when I read the article's headline, but the arguments presented simply aren't very persuasive.


Code quality is important, but for an organization the size of FB, good systems engineering is much more important than good software engineering. If the overall system is organized sanely, failures of subcomponents (which happen due to software bugs, hardware, humans, etc.) can be isolated, firewalled, and fixed in reasonable time without bringing the system down.


As pointed out, this is from 2015, but I'd love to hear some updates. Has any Facebook employee created newer presentations that discuss, say, their 18,000 iOS classes and the fact that in a single week 429 people contributed to the same iOS app?

I'd also love to hear an update to the "Our site works when the engineers go on holiday" claim.


> I'd also love to hear an update to the "Our site works when the engineers go on holiday" claim.

Isn't this an inevitable outcome once your systems are actually reliable? I've been reading the Google SRE book and they make the same point: every new feature (read: change) is risky. Hence they delay deployments once their "error budget" starts running low.

Seems to me that the only systems that don't work more reliably when the engineers are out are the ones which require hands on deck just to keep them running.


I'd bet that the 18000 classes is just Thrift-generated stuff or whatever.


You don't have to bet, you can see them here: http://quellish.tumblr.com/post/126712999812/how-on-earth-th...


Even with a list my imagination is still not powerful enough to understand how an app like FB needs 18000 classes :(


It needs them... to keep all those programmers busy.


what's up with the ZR* and ZZ* classes?


Well, when there is a codebase of the scale of Facebook's, with so many active contributors, of course there will be a code quality problem. I see so much stuff at Google breaking or becoming inconsistent too (away from the core search features).

Even if they didn't move fast and break things, they would still have code quality issues. It's just the nature of a huge codebase like they have. And let's face facts - the development process needs to be suitable to the type of problem being solved. This is not a life and death situation and users would value new features, more than extreme reliability and consistency.


I believe, from time to time, you just need to stop developing new stuff for a while and dedicate the team to improving overall quality: paying down technical debt, hunting bugs, removing legacy code, and refactoring what remains.


I worked at a place where non-technical management didn't like hearing things had to be refactored occasionally. So the technical managers became very good at including refactoring of old code in estimates for new (related) features.

This generally worked, except when a lot of refactoring was needed. Occasionally, we would learn after the fact that one of our designs, programs, etc. was fundamentally flawed in some way, but we couldn't fix it with a series of moderate changes. Those cases really need a dedicated refactoring effort.


I agree with you. This is one of the main issues small-to-medium scale companies I've worked for have: they just have a dev team that has to keep on creating stuff constantly.

Although I've never worked at a company the size of FB, and I believe the same logic wouldn't apply to them. They have to keep creating stuff, as FB still has a lot of potential to become way more profitable.


Sounds essentially like GC. And as with GCs, it seems preferable to have pause-free collection rather than stop-the-world.


And fortunately your competitors are always happy to pause and wait while you do that.


Even so, with an organization the size of Facebook, it'd still cause a lot of churn. I don't know if there's any hierarchy there, like, lead developers and architects that can align architecture and maintain quality and consistency.


Why not plan tasks like that continuously and make it part of your normal iterations?


It is worth noting that when something goes viral, we still don't know why. Facebook represents the ultimate in viral success. Given that we do not understand why things go viral, it stands to reason that the underlying software methodology is not the cause. This means that the fickle finger of fate could easily be visited upon Facebook the same as with MySpace, and Facebook would have no control over it. This is independent of any software methodology they use.

Claims made without evidence can be dismissed without evidence.


> The second exhibit is from Facebook Research... wouldn’t you just write your disk files to a ramdisk? Surely they noticed this too

I'm not sure the author understands what a Research team is and what it does. Trying out a new solution to an existing problem is sort of their job. I'm also not sure how a research team publishing a paper discussing an alternate solution indicates anything about the company's "code quality".


It also seems that improvements in code quality caused by React happened not because they were driven by management, but because a few "heroic" engineers thought it was an interesting side-project. At least, that's my impression from the publicly available talks they gave.


Reality check is that most companies, including most startups, attempting to create software of such size end up tangled in their own mess way sooner and fail. Keeping it maintainable at such scale is not as easy as keeping it maintainable when there are four of you.


So?

I've become very cynical over the past year or so because there is a lot of noise/articles on the net, all knowing what's best for you or wrong with what you are doing, yet without context or true authority to talk about it. My question these days when I read such article, "What has the author done that is equivalent?" There is rarely anything to be found.

Facebook might have a "code quality problem." But until you have worked at an organization as big as Facebook, please hold your tongues, or fingers in this case. Startup principles don't apply, nor does academia. Facebook's "code problem" is what really does happen in the real world with such humongous business enterprises. Lots of moving pieces in code, in people, in ideas, and the amalgamation of these results in what most programmers see as a "code quality problem" yet the market sees as billions of dollars.


>My question these days when I read such article, "What has the author done that is equivalent?" There is rarely anything to be found.

The thing about critique is that you don't need to have made an equivalent project/effort/artwork/whatever to be able to critique it with a valid objection.

You just need to be able to spot an actual issue.

Sure, some might think they've spotted an issue when there is none. Or not understand that the issue exists for a reason (e.g. engineering tradeoff).

But those are orthogonal to the subject.

The key is that critique can come from any source, and it doesn't need someone to have done something "equivalent" or to have specific credentials to be valid.

The latter can at best serve as a rough filter to throw out some noise. But it will throw out a lot of valid critique as well, which can come from anybody, including a layman.


> You just need to be able to spot an actual issue.

My problem with the article is that it's extremely hard for humans in general, and critics in particular, to accurately model the alternative. I think that's the heart of what the commenter is talking about.

Unless you've worked at Facebook's scale, it's really difficult to tell what happens. (I don't work anywhere near that scale, so maybe take my opinion with a grain of salt.) This critic is reading signals and attributing them to a failure on Facebook's part, but who knows if an alternate scenario (a) actually makes things better, or just results in different trade-offs; or (b) is even feasible for a company of Facebook's size to do. I have no idea. There are only a handful of Facebook-sized enterprises that have ever existed.


> My problem with the article is that it's extremely hard for humans in general, and critics in particular, to accurately model the alternative.

The lack of a visible solution does not make all criticism less valid. Criticism can be an expression of flaws without including a drubbing for not doing better. It's often made public as a combination of the two, but the latter can be disqualified without impeaching the validity of the former.

Even if the criticized thing is the best possible expression of the goal(s) given all the other constraints, it is still right to accept and express that limitations exist. -- Pretending they don't is just an ugly attempt to prevent the critic from refining their criticism to include the goals and self-imposed constraints they view as causing the flaw.


It's nearly impossible to read this article without receiving the impression that the author is heavily implying that there is indeed a better way (vague insinuations of slowing down and focusing on quality).


I never said it wasn't. My comment making a point about criticism in general addressed both possibilities.


Sure, but the problem I (and I presume others) have with the author not presenting an alternative, or even an acknowledgement that he doesn't necessarily know of one, is that the message loses a bit of credibility. I think the author knows that code quality isn't actually the most important output/outcome for Facebook, but as far as I can tell, he does not relay that.

In short, he states that Facebook has a code quality problem, one which most of us would expect of an organization of that size, but is it really a problem? Would their bottom-line be better with higher-quality code that took longer (higher cost) to develop? Make that argument. Convince us why code quality is a problem and how improving it would make sense to all stakeholders (not just engineering). I think this message resonates with an audience of software developers (myself included), but it doesn't pay attention to all of the competing interests.


I think he looked at publicly available data and drew his own conclusions in line with his expertise. All valid. Feynman never built a space shuttle (I believe) but was instrumental in identifying reasons for the Challenger disaster.


The difference here is that the shuttle information was easy to verify with experimentation.


To OP's point, at what point do we consider the Chesterton's fence effect? Pointing out code quality problems is of course easy when the effects of those quality problems are obvious, but as OP mentioned, Facebook is working at a scale that isn't likely familiar to most critics. Sometimes, when things get big enough, you have to do things that nobody else is doing, because nobody else has the problems that you have.

If all you've ever built was houses, it's hard to give too much credence to your critique of a skyscraper, especially if you aren't privy to the underlying geographical features like bedrock depth, wind velocity at altitude, etc.


> If all you've ever built was houses, it's hard to give too much credence to your critique of a skyscraper, especially if you aren't privy to the underlying geographical features like bedrock depth, wind velocity at altitude, etc.

This analogy isn't apt. We're talking iOS app development versus Facebook's iOS app development. These are very easy things to compare. Yes, at Facebook's scale they likely run into some interesting issues that many will never see, but that doesn't make it something vastly different. It's still an iOS app that may have more optimizations than other apps.

It's like comparing a regular house being built and a house that has to support every type of person in existence. Both are still houses.


I find the skyscraper comparison actually quite apt. You build an app that has 10 features, while the main Facebook app itself has hundreds of features. When you build a building that is more than 20 stories high, you have to change everything about it: no more wood construction, you have to use steel, you have to think about the bedrock, you need to put in elevators, and so on.

I bet if Facebook were able to ship an app that only came with the most-used 20% and could download 100kb of binary for each of the mini-features inside their app on demand, it would be a far more scalable application. But they can't, because iOS stops them.


Not a skyscraper, then; rather, a house, but one which had to be built of brick-sized prefab components, where a constant revolving door of contractors and architects worked either by adding or replacing bricks one-at-a-time.

To put that another way: sometimes the problem isn't the problem. Sometimes the ungainliness of the tool or process used to solve the problem is the problem. If you've got a 1000-ton hammer, you might (in theory) be able to hammer a nail, but you've got some other logistical challenges to solve first.


Chesterton's fence is a good reason to lament rather than forgive the ugliness in a large existing code-base. That's because it makes it hard to distinguish what is genuinely necessary from the genuinely bad ideas.

My experience with huge real-world code bases is that yes, they do encode lots of important functional knowledge about real-world requirements that a new programmer will not see. But they also contain even more plain old stupidity.


The issue is that it's much easier to critique and call out issues than to solve them. Most of the time, the individuals involved are aware of the issues already - critique is only valuable once the criticizer has made an effort to truly understand the problem and what's already been done / is being done.


Criticism is easy. Anyone can spot issues. So what?


Facebook is a really successful company, and they blame everybody but themselves for these problems.

The obvious conclusion is: Because they're so successful, they're right and XCode, Dalvik, Git et al are wrong, and their hacks are entirely justified.

The less obvious conclusion is: despite them being so successful, they have a code quality issue, and the tools are entirely justified in breaking under the load that's being put on them.

Arguably, the big problem here is that the obvious conclusion is wrong, and the less obvious conclusion is right. And, because the right conclusion is the less obvious one, there's plenty of value in advising the kind of onlooker that's likely to want to emulate facebook's success (read: the sort of startup that's precisely HN's target audience!) that they shouldn't accept FB's excuses at face value, and should think about avoiding these problems in the future.


> they have a code quality issue, and the tools are entirely justified in breaking under the load that's being put on them

That's not one conclusion, it's two.

Suppose for the sake of argument we grant the first part and say Facebook has a code quality issue and they should have less code. (I'm not familiar enough with Facebook to have an opinion on the matter, but I'm willing to postulate this for the sake of argument.)

That does not in any way imply the second conclusion! One of the most important differences between a good tool and a shoddy one is precisely that a shoddy tool can only withstand the load that should be put on it, whereas a good tool can go beyond that.


There's above-tolerance load, and then there's misapplication. A good tool might work beyond tolerance, but a great tool should "fail early" if it can tell it's clearly being misapplied.

You don't want space-shuttle O-rings that fray under high tension; you want space-shuttle O-rings that fray under low tension, because they're never supposed to be under any tension and so incorrect installation that puts tension on them should result in them shredding apart before they ever get out of the factory.


Right. So if Git detects a SHA checksum failure, it should complain immediately instead of silently trying to muddle along, and so it does. But I don't think that's what was going on in this case?


Sure, kind of. I agree: good tools should be able to handle being pushed beyond what counts as reasonable usage. But all tools break, at some point. Now we're splitting hairs: Is Facebook's use of the tools merely somewhat unreasonable, such that the tools should still hold? Or is it sufficiently past reasonability that they shouldn't have to hold any longer?

To answer that, the fact that they managed to break not one but several tools is highly informative. I'm willing to accept that some tools might not work at scale. I'm less willing to accept that what basically amounts to the whole stack for mobile development is _that_ weak, not without very solid evidence.

Near as I can tell, there haven't been reports of this sort of issue with MS Office for either Android or iOS, or from other applications that should, by any reasonable metric, be more complex than facebook.


> Is Facebook's use of the tools merely somewhat unreasonable, such that the tools should still hold? Or is it sufficiently past reasonability that they shouldn't have to hold any longer?

A fair question. The most reliable way to answer it is to see whether there are similar tools that hold up better under load. For example, are there rival version control systems that scale better than Git? Ditto for the other tools.

(Let's all watch out for the human tendency to interpret an affirmative answer as taking a side in e.g. a Git vs X competition. A more useful way to interpret it would be as a source of ideas on how to improve Git.)


Totally agree; it's a case of fallacy by results-oriented thinking. I don't actually know if that's recognized in any of the official lists of logical/rhetorical fallacies, but it ought to be. Well, I guess it's a variant of the 'Post hoc ergo propter hoc' fallacy, but seems like an important special case.


I'm no fan of the "knowing-all-that's-best" trend of articles either, but here's why I don't mind this:

Facebook popularized the idea of "move fast and break things." Move fast, by all means, and don't be paralyzed by fear of all change whatsoever, but I really didn't appreciate it when my classmates thought that by being willing to break things constantly in our group project they were showing us all their potential to be the next Mark Zuckerberg or Steve Jobs. Or when some of those classmates were hired as my team mates and brought that crap into a customer-facing service. I interviewed at Facebook and was scared by the cowboy mentality I sensed in every room and with every interviewer. I'd like the glorification of that to be balanced by, "hey here's some concrete data demonstrating they have some real issues that could clearly be addressed better."


I've seen that "move fast and break things" mantra used in projects, and yeah, some people think they're being the next Zuckerberg when doing so. But what I've (almost) never seen is someone who actually fixes the shit when it's broken.

Fine, go ahead and break stuff, but be willing and able to actually FIX it afterwards. That part never seems to get picked up.


Better yet, move fast and have good test coverage that can run fast and does so automatically.
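Something like the following is the shape I have in mind - a toy example, nothing to do with Facebook's actual code, just the kind of fast, dependency-free test a CI job can run on every push:

    # test_spam_filter.py -- run automatically with `pytest` on every push.
    def is_spam(message: str) -> bool:
        # Deliberately trivial stand-in for a real filter, just for the example.
        return "buy now" in message.lower() or message.isupper()

    def test_obvious_spam_is_flagged():
        assert is_spam("BUY NOW: CHEAP PILLS!!!")

    def test_normal_message_passes():
        assert not is_spam("Are we still meeting at 3pm?")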


"The client won't want to pay for testing"

"We'll circle back and add tests later"


"The client won't want to pay for testing" becomes "the client will pay for not testing"

My deepity for the day...


>"What has the author done that is equivalent?"

Being able to build something equivalent is not a requirement for criticizing things.

In fact, some of the best and most valuable criticism often comes from people who have no idea about how the thing they are criticizing works.


Do you have some examples? I think the most valuable criticism is that which comes with some ideas for improvement, which it is difficult for people who don't know how the thing they're criticizing works to give. In my experience, the criticism of the non-experts that you're lauding, most often points out things that the experts already know about, but simply haven't been able to solve. In general, I think identifying problems is the easy part, and that good ideas for solutions is where the value is.


"user testing"


>"What has the author done that is equivalent?"

Remember your words the next time you complain about the food in a restaurant.


> My question these days when I read such article, "What has the author done that is equivalent?" There is rarely anything to be found.[...]Facebook might have a "code quality problem." But until you have worked at an organization that is big like Facebook please hold forth your tongues or fingers in this case.

Sorry but, in my opinion, this is dangerous thinking. Saying you shouldn't say anything unless you've done equivalent / better creates the implication that anyone who hasn't worked on a Facebook scale mobile app doesn't know what they're talking about or can't provide something to the conversation that is at least informational or useful.

Do you complain about politics? If so then I would expect you to have at least held office in the past otherwise you should hold your tongue.

Have you complained about a truck driver on the road? Well, have you driven a truck? If not you should not complain.

etc...


This is an absurd leap of logic. There are many quality programmers who have never worked at a large company, and many more terrible programmers who have.

Anyone can share their opinion, and regardless of where they have worked it may or may not be valuable. If you are judging value based on the market cap of their LinkedIn profile, then you are probably missing out on most of the good information out there.


>> My question these days when I read such article, "What has the author done that is equivalent?"

Well I happen to (vaguely) know the author and have worked with him in the past and he has done a lot of great work, not all of it in the public domain though.

Comparing him with an organisation that employs 17k people is a little unfair but he comes out of it favourably IMHO!


> My question these days when I read such article, "What has the author done that is equivalent?" There is rarely anything to be found.

Literally an ad hominem.


It is also intrinsically hypocritical.

Poster questions author's qualifications to criticize Facebook. Poster's own blog post about software quality never hit the front page of HN.

To quoque, dude. Tu quoque.~


Rephrase it to "What experience does the author have that indicates their experience would actually scale to Facebook levels?"

Is it bullet proof? Of course not. As an easy example, there are many people more overweight than I am who know nutrition better than I do. That said, it does seem somewhat logical to question health advice from clearly unhealthy people.


>As an easy example, there are many people more overweight than I am who know nutrition better than I do. That said, it does seem somewhat logical to question health advice from clearly unhealthy people.

Using your easy example with OP's logic, a 100-pound overweight person trying to lose 100 pounds by throwing up and taking dangerous diet pills should also discount the advice of a physician who's never lost 100 pounds themselves or successfully helped a patient lose at least 100 pounds.


Hmm... if that is what the OP intended, yes. Mayhap I was giving too much charity, but I would take it more as "Where is the evidence that you have successfully helped a patient lose 100 pounds?"

That is, again, not bullet proof. But it is all too common for someone to have experience that does not scale to the speed and size of Facebook in this regard. It is too easy to argue in the extremes of straw men. Wanting to know the experience that backs an argument is perfectly reasonable.


I agree to some extent, but think of it more as a "this is how the sausage is made" type of thing. As engineers we like to opine about perfect code, functional this, and monad that. It pains us to see that business success requires very little of even good code, much less perfect code. The reality is that the code rarely matters as long as it can be held together long enough for the next user or investor check to clear the bank.


This is a reality I've had a great deal of difficulty coming to terms with. I haven't internalized it fully, but I can't help but think not acknowledging this has harmed my career.


Isn't that an appeal to authority? Because someone hasn't done X they don't have the right to criticize X?

Facebook's "code problem" is what really does happen in the real world with such humongous business enterprises.

But does it have to be this way?


Until someone provides a counter-example, the assumption is that yes, this is just what happens. We have entire industries of competing methodologies, some almost cult-like in their slavish devotion to "the one true process", and none have birthed any great successes. Why is that?


Let me provide a counter-example. Facebook, with revenue of $27 billion in 2016, maintains approx. 60 million lines of code, including the backend (source: http://www.informationisbeautiful.net/visualizations/million... find Facebook in the list).

NASA's budget is $19 billion. I can't find an official public record of their code size, but the ISS is approx. 6 million lines of code, and I don't think the other missions they ran (33 unmanned space missions in total) are too far off, so let's give each an average of 5 million per mission. That's 165 million lines of code. And arguably, probably more difficult code than Facebook's domain.


And all of the code for the NASA HR system, and budget management tools, and conference system, etc. The Facebook code base is not just the web pages you see and backend to serve them, but encompasses just about everything used to run the company. It is also built and maintained by a much smaller group of coders for a much more ambiguous target domain. (And no, I do not think that the NASA domain is in any way more difficult than Facebook's.) I guess I have to question the relevancy of your counter-example.


>And no, I do not think that the NASA domain is in any way more difficult than Facebook's.

You might not think it, but you'd be wrong. Facebook was, for its first several years, just a set of PHP CRUD pages and heaps of servers to run them well.

Since then they added their own compiler and such, but the core of what they do is serving web pages at scale.

All known problems, solved by tens of teams all over the world, even at larger scales (Google for one).

NASA's problem on the other hand is quite unique, any errors can cost lives, and their tasks frequently include totally novel solutions for totally novel problems related to space travel, guidance, simulations, materials science, and so on.

Not the same difficulty at all.


"Facebook was for several first years just a set of PHP crud pages and heaps of servers to run them well."

Personally, I think we've all got a piece of the elephant here. It is true that there are no known methodologies that could build Facebook at the speed it has been built without producing a globally-incoherent design. Anything that could produce a globally-coherent design would have slowed them down so much that they wouldn't have been such big winners in the first place. (So "globally incoherent" isn't much of a criticism here, really.)

On the other hand, there are options other than "a big set of PHP CRUD pages and heaps of servers" available to us now, too, and I expect those options to continue to advance in usability. Even the various projects that improve on PHP would bring more benefit if you could use them from day one instead of a retrofit.


Facebook didn't win because of the software, but because of marketing. At its inception, it felt exclusive. You were special if you got an invite.

The other social networks were open to anyone. Facebook made some good design choices with the simple-looking UI, and didn't offer the customisations MySpace had. But that's a design choice, not a software engineering choice.


No one wins because of the software (code quality). They win on marketing and on that software (UI/UX) doing a needed task people are willing to "pay" for.

HN tends to be rather myopic in thinking that programmers exist to program. Nope. Programmers exist to use a tool to solve a real-world problem. The best way to do that is with the minimal amount of design and time required to accomplish the task at the desired level of reliability. Very few projects require anything resembling the "code quality" talked about on HN - in most cases I would say trying to enforce such principles generally results in worse outcomes, and folks would have been better off spending 1/10th the money on a 16-year-old off elance.com and simply fixing problems as they came up.

Typically the thing that is done the quickest wins, even if the implementation would make you cry. The fact is from a business standpoint - it truly rarely matters. Very little is as critical as people think.

I've noticed the industry becoming vastly disconnected from this fact recently.


> Facebook didn't win because of the software.

Probably true, but it's also important that they did not lose because of the software. There are a lot of great ideas that have gone down the drain because of sloppy software.


Many successful companies run the worst software you will ever see. Yesterday there was a post about MUMPS which gave me flashbacks to clients who showed me software running their entire 100m+ euro / year factory, mostly written in MS Access / Excel on a shared drive (with the lovely locking Windows does!) - this particular client had a CRM and an ERP from Dutch companies 'lying on the shelf' but 'never got around to it'. One of the biggest EU factories making belts for conveyor belts runs its ERP on the worst PHP code I have ever seen. Many sites that are not Google or Facebook etc. run on hacked-together crap in whatever language and tech, and do fine.

Twitter was crap at the start, most software in electronics was beyond unusable until recently (in some systems) etc.

You can of course lose because of sloppy software, but if your marketing is done well, I don't see why that would happen. There is an enormous overestimation here on HN of how much people care about that kind of thing. From the first decade of Windows versions to tons of Twitter outages after the launch, to getting their computers hacked, passwords stolen, CCs stolen, privacy taken, slow-as-molasses systems, forced reboots, many crashes, bodged updates, forced updates, virus/malware scanners taking up 50%+ of your /still too expensive/ resources too much of the time, the really bad iteration of the Facebook app at the moment (at least on Android), broken airplane booking forms, 404 support pages etc etc etc - and yet no-one goes away, because most people curse and move on to whatever. Unless your marketing is bad, aka no-one uses it, it usually won't fail because of sloppy software.


If programmers at NASA make a serious mistake in, say, the space station's code, people will die and billions of dollars of equipment will be lost. The same cannot be said for a social website which serves a bunch of content. NASA has way more on the line than Facebook.

Here's a great article on how NASA did development on the space shuttle's code and the importance of process and quality: http://inst.eecs.berkeley.edu/~cs162/sp13/hand-outs/They-Wri...


Well, if someone at Facebook screws up and inadvertently reveals certain names or locations or the existence of particular groups, then within some countries people get arrested, tortured, and die. Last I checked, NASA screw-ups only kill a couple of people at a time. So you are right, they are not even close to comparable. See how easy it is to make false equivalences? I am familiar with the referenced article, and if anyone in the software industry outside of a few specialized areas tried to push a process like that, they would need to start the endeavor by polishing their resume, because they would soon be looking for a new job.


For some reason I went the other direction with your phrase "start the endeavor". Had to reread to understand you didn't mean launch a shuttle.

Not an attack or support for your position, just thought it was an amusing choice of words.


History is littered with counter-examples and great successes. Each generation of company is more efficient or solves more complicated problems than those before it. Facebook itself is an industry leader in open source innovation, which is why their product is so smooth. But not all companies specialize in hard tech, and all companies get old.


>Facebook itself is an industry leader in open source innovation

In the sense that they gave us React and a couple of JS tools? What are their other "industry leading open source innovations"?

That's hardly a major "open source innovation". Heck, Google itself has produced 2-3 similar JS pieces (Angular, GWT, Closure compiler, Dart, etc).


> In the sense that they gave us React and a couple of JS tools? What are their other "industry leading open source innovations"?

OpenCompute (http://opencompute.org), Hack (http://hacklang.org), Phabricator (now spun off, still OSS, https://secure.phabricator.com), GraphQL (http://graphql.org)...none of these are JS tools, all have had major industry impact. GitHub's entire API, for example, is now GraphQL based. And that's just a small sampling.

I can't quite decide if you're being deliberately provocative or just misinformed.


>I can't quite decide if you're being deliberately provocative or just misinformed.

Or maybe I genuinely disagree with your assessment that any of those technologies (apart from GraphQL, and that is still nowhere near major league) had any major "industry impact".


It's ad hominem; the author is being attacked to invalidate the argument, instead of attacking the argument itself. Which is never valid.

Appeal to authority usually takes the form of skipping making an actual argument for something and instead resting upon "X said so" and not explaining further. This is also not valid but is sometimes a shorthand for saying "X made argument Y," which can be valid if argument Y is valid (in which case the fact that X made the argument is purely incidental trivia).


Well, sure, they can. But don't you think someone who has been involved in a massive company would have more trenchant insights into massive companies?


You have the right to criticize, but don't expect your criticism to be taken too seriously. There are two reasons for this: a.) you don't have a frame of reference to compare whether Facebook's result is better or worse than other companies'; b.) you don't know what tradeoffs are necessary to make it work, nor what it takes to achieve success.


So by the same logic, criticism against the government shouldn't be taken seriously unless it comes from a former prime minister?


Be careful of casting a float to a boolean.

Criticism of the government from someone who has experience running a large bureaucracy, negotiating, balancing groups with competing interests et cetera will rightly, other things equal, be taken more seriously than from someone whose experience of politics doesn't extend past ranting on Twitter.

Of course, other things may not be equal. If the latter person has sufficiently convincing arguments, can back them up with reference to sufficiently solid evidence, then these things may carry the point on their own merit.
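To make the float-to-boolean metaphor literal: the cast keeps only zero vs. non-zero and throws away the whole gradient, which is exactly the information loss being warned about. A toy Python illustration (the "credibility" score is invented purely for the analogy):

    credibility = 0.15        # hypothetical: some, but not much, relevant experience
    take_seriously = bool(credibility)
    print(take_seriously)     # True -- the cast erased how *much* credibility there was

    # Weighting instead of casting preserves the gradient:
    weight = min(max(credibility, 0.0), 1.0)
    print(weight)             # 0.15 -- discounted, not discarded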


Government criticism from people who have experience running a government or being in politics tends to have much higher quality than criticism coming from a sixteen-year-old high school class president.


What tends to be the case is irrelevant. The contents of what is being said is all that matters.


So when it comes to the highest level of government, only those who have reached such heights may say a peep. Only criticism from former US presidents regarding Trump, please!


When a random dude, possibly with Asperger's, gives advice/criticism on how to negotiate, a reasonable government official will ignore him - even if the criticism came in the form of a blog. Likewise, successful activists ignore armchair advice from anonymous people on Twitter. Not sure how that is controversial. Torvalds does not spend his time worrying about whether random bloggers agree with the Linux core style guide.

"You did not demonstrate enough knowledge/experience for me to take your criticism seriously" is a valid answer.


The Asperger's comment is completely unnecessary, has nothing to do with the substance of your argument, and may potentially be offensive to people.


Criticism against governments by former governors should and is taken more seriously than criticism by some joe who's never had to wrangle a bureaucracy.


So they reject any outside observations, because they're precious snowflakes with unique problems that no outsider understands ?


Appeal to authority is not a logical fallacy. It's a valid argument unless said authority is false in some way.


It's fallacious if you use it merely as "Authority X said so," no matter whether the authority is true or false. A fallacy happens when the conclusions do not follow from the premises of an argument. E.g. it doesn't follow from the fact that "Al Gore says that Global Warming is happening," that Global Warming is happening.

If, on the other hand, you say "Al Gore said Global Warming is happening, Al Gore's argument is premise A and B, therefore C" that's not fallacious because your argument doesn't actually rest on the identity of the authority, but the argument they made.

But all of this is irrelevant because it's not an appeal to authority. It's an ad hominem. The author is being attacked for not-being-an-authority.


>So?

So, there's a huge lesson when an organization starts blaming every one else for scale problems; if everywhere you go there are issues, it's you.


Hold forth = To talk at great length.

Have you used it as such?


Is "hold forth" an older or less common phrase? I don't know if I've ever run into it, and I wouldn't have guessed initially that is its definition.



It's a phrase that I've usually encountered in the past tense. Google Ngrams shows that in 1800 "held forth" was used about three times as frequently as "hold forth" [1], but this ratio decreased as both uses became less common.

[1]: https://books.google.com/ngrams/graph?content=held+forth%2Ch...


I don't know a lot about the ngram corpus, but assuming there's lots of fiction everything will be skewed toward the past tense.


Here is a new submission that includes a use of "hold forth" as well: https://news.ycombinator.com/item?id=13858516


It is a bit old. I've seen it in English sources from the 19th century. I think it's common in literature for that period though less common now.


Facebook's "code problem" is what really does happen in the real world with such humongous business enterprises. Lot's of moving pieces in code, in people, in ideas and the amalgamation of these results in what most programmers will see as "code quality problem" yet the market sees as billions of dollars

There's something called waste. The issue is that we don't know how much better the situation can be but we can recognize that it's pretty shitty at the moment. If they're making billions of dollars and they're a big company and have all these code quality issues, what hope does a small company have when trying to make quality software? How many more billions could have been made with less churn and less maintenance costs?


"What has the author done that is equivalent?

The author has picked some data points and written an article. You have said 'So?'.


That's a good point you make. But that's also why we have a formal system for making logical arguments. Critical essays which break things down into a premise, arguments, and a conclusion. This allows anyone to challenge anyone else without falling prey to the bias of deferring to authority.

Thus we can and should challenge writers and pundits, but probably from the perspective of first asking whether they are making a logical argument or an emotional one, instead of making it about their experience levels.


Yeah, fuck that noise. Not a fan of Facebook or its CEO. But clearly there is no code problem – look at the stock price over extended period. QED


So you are saying it's reasonable to have 18,000 Objective-C classes in a single iOS app?


It's interesting to note though that they have code quality problems but that doesn't prevent them from being massively successful.


It's a very hard pill for engineers to swallow, but many (if not most) startup or IT company successes and failures are very independent of technological matters.

Once you get to "average" quality, then business and product decisions become far more important than anything else.


And it only needs to be average, or approaching average, in your niche. There's a lot of truly awful enterprise software out there, and it gets sold on promises of features and support, not quality.

I wish software engineering had the barriers and requirements of other engineering fields. I deal with industrial control systems, and we have the national electric code, UL, CE, NFPA 79, etc. With all that you can still make a machine that doesn't work well, but at least it's going to conform to certain conventions to make installation and maintenance easier, it will have a robust safety system, and it will be well-documented. Most fields of software engineering have no such minimum standards.


> I wish software engineering had the barriers and requirements of other engineering fields.

It does where I live. You're required to be licensed if what you're doing meets these 3 criteria:

1. any act of planning, designing, composing, evaluating, advising, reporting, directing or supervising (or the managing of any such act)

2. that requires the application of engineering principles

3. concerns the safeguarding of life, health, property, economic interests, the public welfare or the environment, or the managing of any such act.

This includes software development. It is illegal to use "engineer" in your professional title unless you're accredited.

I think this is a good thing. Better software that works towards the public interest requires liability and professional accreditation. You can't take the "move fast, break stuff" approach when your software is being used to monitor and control water filtration plants or food safety processes in a manufacturing facility. That works for the Facebooks of the world, but they are by no means the be-all and end-all of professional software development.


But seriously (i.e., ref to my previous comment). All the tools. All the talent. All the leadership and management. All the money. And still even the mighty FB struggles.

Point being: This shit might not be rocket scientist hard, but it ain't easy either. And when you don't have the war chest of the likes of FB you're in for an ongoing and never ending (quality) battle.


this strikes me the same way as the story of the guys who figured out where to add armor to bombers during WW2. the places on the plane that had bullet holes after a mission clearly were operational enough to return to base. the places that never had any damage were probably critical, so that's where armor was added.

if a company can make billions with a poor-quality codebase, clearly quality isn't a bottom-line concern.

what is a concern is shipping the damn product.


one look at hhvm tells me they have poor quality development practices. :P

i get the impression that they use the struct keyword to avoid having to type public everywhere for instance...


code quality can't handle our scale


At this point it is clear that software size increases linearly with the size of your software team, whatever problem you solve.

This affects every organization and is something that should be actively fought against. Having accidental, unplanned, unaccounted-for costs should not be the default path.


Can someone explain the example with the ramdisk? It's not clear to me how a ramdisk is relevant to the problems put forward in the paper, i.e. the disk is too slow, they rely on lazy page allocation, and they have insufficient RAM to hold two copies of everything simultaneously.


> The article moves on, without wondering whether releases regularly breaking your app are a normal part of the software engineering process.

To be fair, that's absolutely their call to make. Nothing of value is ever lost if your platform does not provide any non-ephemeral value.


Makes me wonder how their decision for an open-plan office is related with this...


It's important to understand that Facebook is incredibly large, and the fact that any software scaled with them to the point they are presently at is an "on the shoulders of giants" situation that seems to actually be working well enough for them to become a billion-dollar enterprise.

Also, building infrastructure for scale that hasn't yet been required is somewhat inefficient. But it does raise the question: if Facebook is limited by its infrastructure, what responsibility do they have to build software that continues to scale for future organizations?


I don't think all the things discussed are still relevant, as it's a fairly old post. Also, "best practices" are not always best, depending on the scalability of the problem itself.


I was doing iOS dev three years ago, and the Facebook lib for iOS was the worst piece of shit I ever had to deal with.

Their app also used to be unbearably sluggish despite not doing anything too fancy.


I thought the joke was that they had interns write that? I remember an API that would hold onto completion blocks and call them repeatedly instead of just once. They would also accidentally make non backwardly compatible changes to the API.

They didn't care, since that API period propelled them to more dominance and growth for a while. After they shut down the viral channels, the developer ecosystem died... but it had proven its usefulness.

The only issue now is developers don't trust them and they can't get enough developers to build atop them.


> From Jack Reeves he cites the epiphany "that in fact the source code is a design document and that the construction phase is actually the use of the compiler and linker."

Ha. That is laughable. If it were so, then we would be able to automatically graft features from one product to another. I contend this is exactly how SoCs are designed, and there's no reason why we shouldn't copy the methods of the digital design community (proven to scale, or you wouldn't be reading this).


> The “Hack” and “Move fast and break things” culture must make it very hard for developers to focus on quality.

Well, yeah, that's what the "break things" part means. The problem is when people/companies try to have it both ways. "Move fast and have high quality" isn't possible.


I recommend reading They Write the Right Stuff[1] to anybody interested in NASA's approach to writing software.

[1] https://www.fastcompany.com/28121/they-write-right-stuff


Anecdotal evidence, but I have a friend who complained that some critical parts don't have tests, and a lot of developers are loath to write them. He got bit by it when he made a change in an area with no test coverage and broke the spam filter.


> "when Facebook employees are not actively making changes . . . the site experiences higher levels of reliability."

This seems like a great thing to me. i.e. The system is stable and the error budget is being used to facilitate change.


Bob Martin, a.k.a. Uncle Bob, always says that people consider patterns like MVVM, MVC, MVP, etc. to be architecture, when they are actually just for the UI layer. It happens commonly in mobile development.


Rub lavender oil on the servers, say "web-scale" 3 times, click your heels, and everything will be OK. Oh wait, but the Russians.


Only Kent Beck can save them. :)


What's the point of an app? Generating high quality code or generating revenue? Until you hit intersections where revenue can be increased by changing code, the priority will always be on new or changed features.


Move fast...and just leave things broken. Users be damned.


Links to slides are all dead... in the parent article and referenced articles.


While the author wrote this article FB made millions of dollars.


[2015]


If a company hails React as a sane way to develop software, you know where it stands.


Just trolling or could you argue that?


OK so what should have they used instead?



