Static Analysis at Scale: An Instagram Story (instagram-engineering.com)
137 points by YoavShapira 68 days ago | 34 comments



A third static analyzer for Python! I wonder what this landscape is going to look like in a few years... (There are probably others, but this is the first I'd heard of Pyre, and I'd only known about mypy and pytype up until today - if there are others I'd love to hear about them!)

On a more substantive note, static analysis is one of those things that sounds like you shouldn't even need it but in reality is a surprisingly powerful tool, for the same reason that all code has bugs: humans honestly suck at writing code. (For those of you that have seen Bret Victor's "Inventing on Principle", I feel like static analysis is one of the first steps in getting to the kind of feedback loop he envisions.)

We have all these ideas around how things should work, but so frequently our mental model diverges slightly from what's actually happening (which dynamic typing only makes worse, and NPEs even more so) and that's on top of the mistakes we make (like this one in the JDK, which was caught using ErrorProne, a static analyzer for Java: https://bugs.openjdk.java.net/browse/JDK-8176402).

Also, a fun tidbit about using static analysis to apply automated code fixes: this is basically when you realize you need auto-formatters, because you still want your code to be indented like a sane person would after applying a change like this. (And imagine how much more complex the autoformatter/autofixer have to be when you think about how they have to do things like preserve comments, etc!)
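
To make the comment-preservation point concrete, here is a minimal sketch using only the standard library `ast` module (Python 3.9+ for `ast.unparse`). The variable names in the sample line are made up for illustration. It shows that a plain AST round trip silently drops comments, which is one reason an autofixer needs more than an AST:

```python
import ast

# A one-line program with a trailing comment (names are illustrative only).
source = "total = price * qty  # include tax later\n"

# Parse to an AST and print it back. The AST has no node for the comment,
# so the regenerated source cannot contain it.
round_tripped = ast.unparse(ast.parse(source))
print(round_tripped)
```

The printed line keeps the assignment but not the comment, so any fixer built this way would need a separate mechanism to carry comments and whitespace through.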


> if there are others I'd love to hear about them!

I've been collaborating with Duo Security to build a new Python static analysis tool that focuses on security deficiencies: https://github.com/duo-labs/dlint

In general, I agree, static analysis is a very powerful technique. I'd like the computer to double-check my work as thoroughly as possible when I'm working with code. Static analysis tools are often very fast and essentially "free" to run, so why not? False positives often become the limiting factor, but in my experience they at least point to locations in code that someone has deemed noteworthy, and perhaps should be investigated. Squelching false positives is also typically an easy process.

Whether it's simple stylistic recommendations for code consistency, security best practices, or even disseminating codebase information (e.g. function deprecation notices from the Instagram article), static analysis is a very useful technique.


Interesting article. I’d love to hear more about how other folks are getting on with typing in Python these days. What’s the preferred tool for linting etc.? I’ve been playing with the Microsoft Python Language Server recently, along with mypy (via vim coc). I haven’t got a config that works well yet, but it shows promise, and I’m spurred on by how well TypeScript is working for us in that setup.


Instagram is building a tool called Pyre (https://pyre-check.org/).


At Google we've been using pytype.


mypy has saved me from a lot of bugs. I use pre-commit to run it before every commit.
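
As a rough illustration of the kind of bug a checker like mypy tends to catch, here is a small sketch. The function names and data are hypothetical, not from the article. Without the `None` check, a type checker would be expected to flag the addition, since `find_user` may return `None`:

```python
from typing import Dict, Optional


def find_user(users: Dict[str, int], name: str) -> Optional[int]:
    """Return the user's id, or None if the name is unknown."""
    return users.get(name)


def next_id(users: Dict[str, int], name: str) -> int:
    user_id = find_user(users, name)
    # Removing this check leaves `user_id` typed as Optional[int], and a
    # type checker should then reject `user_id + 1` before the code runs.
    if user_id is None:
        raise KeyError(name)
    return user_id + 1
```

At runtime the unchecked version only fails for names that are missing, which is exactly the sort of path that tests often skip.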


> we have hundreds of engineers shipping hundreds of commits every day

Can someone with (any) experience explain to me why seemingly perfectly functional websites need to change all the time? Is the production version hacked together or what? Why can't websites be coded once and left to run, with the rest of the effort devoted to maintenance and adding more servers as the load increases?

I admit that I know almost nothing about how large codebases function (which might be apparent from the question).


The changes are not only about making the website/app stable, it's mostly about shipping new features and product ideas.

The space where Instagram operates is very competitive, so they need to keep innovating and exploring new ideas to grow the product, increase usage, and improve retention...


They are adding new features all the time, alongside all the behind-the-scenes integrations, optimisations, performance improvements, and A/B tests.


I think a big part is just technical debt acquired through the years. When you are just starting, you hack together some version of the site that works because you need to grow fast, but that's not scalable. So part of becoming a big company is to "refactor" or "re-architect" your app, either towards more modern designs or, in some cases, into a completely different app (on the inside).


I quickly looked at Pyre and pytype. I wonder: is there a static analyzer for Python that can check termination guarantees for small snippets of code?

The idea is to have a "safe" and an "unsafe" Python (à la Rust) according to some guarantees. Proving termination of a program is hard, but it is doable if dev time is dedicated to it and the algorithms are well chosen.

That way, some core libraries could be rewritten and give stronger guarantees.

The idea is used in https://github.com/google/starlark-go which is a subset of Python. People who are used to Python can use it to write imperative config files, and the files are guaranteed not to mess with sensitive stuff.

edit : some starlark design choices explained : https://github.com/bazelbuild/starlark/blob/master/design.md
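
A very rough sketch of what a "safe subset" check could look like with the standard library `ast` module. This is only an assumption about one possible approach: it flags `while` loops and direct self-recursion, which is the kind of restriction a Starlark-style dialect imposes. It is not a termination proof (mutual recursion and `for` loops over unbounded iterators slip through), and the function name is made up:

```python
import ast
from typing import List


def find_unbounded_constructs(source: str) -> List[str]:
    """Report `while` loops and direct self-recursion in a snippet."""
    problems: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            problems.append(f"line {node.lineno}: while loop")
        elif isinstance(node, ast.FunctionDef):
            # Look for calls to the function's own name inside its body.
            for inner in ast.walk(node):
                if (
                    isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == node.name
                ):
                    problems.append(
                        f"line {inner.lineno}: {node.name} calls itself"
                    )
    return problems


snippet = "def f(n):\n    return f(n - 1)\n\nwhile True:\n    pass\n"
report = find_unbounded_constructs(snippet)
```

A snippet that produces an empty report would count as "safe" under these two rules only; anything stronger needs a real analysis.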


Would be cool if they could quantify the gains in productivity, e.g. before we put in the static analysis we had x level of software defects, and following these improvements we saw a drop to level y. I'm trying to introduce similar improvements at my current place, but it’s a hard sell without data.


The coverity paper may be interesting to you: http://delivery.acm.org/10.1145/1650000/1646374/p66-bessey.p...


The link doesn’t work, I’m interested to read it though.


Try this link. It should be the same article as linked previously. http://www.cs.columbia.edu/~junfeng/14fa-e6121/papers/coveri...


Whoops! Should've noticed that was an ephemeral link, but yes, mdibiase@ linked the one I meant to. It's Coverity's "A Few Billion Lines of Code Later" ACM paper.


Quantifying the number of bugs sounds a bit too close to "zarro boogs found," though I grant that additional data would be nice.


I wonder if there was a point in the history of Instagram's codebase at which it would've been cost-effective to rewrite it in a statically typed language.

If you find yourself in a hole, stop digging.


How would static typing help with "method fn is deprecated, use foo" linting?
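
For reference, a name-based version of that lint needs no type information at all. The sketch below uses the standard library `ast` module; the `DEPRECATED` mapping and the class and function names are hypothetical, chosen to match the `fn`/`add` example from the article:

```python
import ast
from typing import Dict, List

# Hypothetical mapping of deprecated function names to their replacements.
DEPRECATED: Dict[str, str] = {"fn": "add"}


class DeprecatedCallFinder(ast.NodeVisitor):
    """Collect a warning for every call to a bare deprecated name."""

    def __init__(self) -> None:
        self.warnings: List[str] = []

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id in DEPRECATED:
            new_name = DEPRECATED[node.func.id]
            self.warnings.append(
                f"line {node.lineno}: {node.func.id} is deprecated, use {new_name}"
            )
        self.generic_visit(node)  # keep walking into nested calls


def lint(source: str) -> List[str]:
    finder = DeprecatedCallFinder()
    finder.visit(ast.parse(source))
    return finder.warnings
```

Where types would plausibly help is disambiguation: this version cannot tell two unrelated functions named `fn` apart, or see a call made through an attribute or alias.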


Quoting the article: "The benefits of better typed code are obvious, but leads to yet another benefit: Having a fully-typed codebase unlocks even more advanced codemods."


It's curious to me that commonly in programmer discussion, we'll say on the one hand that the programming language you use doesn't matter, while on the other hand tout the benefits of adding types and static analysis.

There are some languages where types and static analysis are part of the language.

It's paradoxical that we as an industry hold these two things to be both equal and different.


Citation needed. I recall people saying they prefer the concise syntax and speed of development with interpreted languages that do not force type annotations and others who prefer the runtime speed and compiler warnings in a typed language. I don’t recall anyone saying they are interchangeably equivalent.



> Lets say we needed to deprecate a function named ‘fn’ for a better named function called ‘add’.

Wondering why they would deprecate fn instead of renaming it to 'add' everywhere. I thought that was one of the big selling points of the monolith.


At a certain scale, it becomes prohibitively difficult to rename everything at once, for a few reasons:

* Hyrum's Law - people do terrible things, and a change that seems safe almost never is. Doesn't matter what people should do, if they can they'll do it. Having good tests and scalable CI is the only guard for this.

* Moving a target is hard. If you try to change a moving target A into B, and you're changing a couple thousand references to A, chances are pretty good that while you're getting your A->B change reviewed, someone will add more code that depends on A, and then you can't submit your A->B change without fixing the new reference. The solution here is you check in B, tell people to use B instead of A (which checks the growth of references to A), make A point at B, maybe as an implementation detail (which causes all references to A to depend on B), and incrementally change indirect dependencies on B through A to be direct dependencies on B.

* Depending on a moving target is annoying. If N people write code depending on A, and a commit goes in changing A->B, then that's N people whose productivity you've hurt. In practice this one isn't really that bad, just something to think about.

* Small changes are easier to roll forward than large ones. This assumes your change is rolled back, but consider this scenario: your change A->B, unbeknownst to you, tickles some non-deterministic behavior (maybe a race condition) when used in a specific way, and your CI doesn't fail when you submit the change at EOD. You come in tomorrow morning and find out that the change was rolled back, because flakiness for N tests went from 0.1% to 25%. If your change was a small, targeted one, it'll be much easier to trace down the non-determinism and understand how it caused this change; but if it was a big one, not only might it be harder to trace down the non-determinism, there might even be other similar non-deterministic bugs that your change is causing. All of these issues will conspire to make it harder for you to make progress on your large-scale change.

When you're changing 10 references, maybe even 20-30 or so (and for some changes this number can go even higher, e.g. renaming an internal Java package), it definitely makes sense to do an A->B type change in a single commit. (And in this situation, the monolith does come in handy, because you don't have to wait for a version bump to propagate the change.) But at O(1K) LOC, this isn't super tenable.
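
The "check in B, make A point at B" step could look roughly like this in Python. The `fn` and `add` names come from the article's example; the two-argument signature is an assumption made for illustration:

```python
import warnings


def add(a: int, b: int) -> int:
    """The new name (B). All real logic lives here."""
    return a + b


def fn(a: int, b: int) -> int:
    """The old name (A), kept as a thin shim that forwards to add()."""
    warnings.warn(
        "fn is deprecated; use add instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return add(a, b)
```

Existing callers of `fn` keep working and now depend on `add` indirectly, so references can be migrated in small commits, and the shim is deleted once the last one is gone.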


Later in the post, they talk about how their tool enables "codemods", such as automatically renaming 'fn' to 'add' across their codebase.
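
A toy version of such a rename codemod, using only the standard library `ast` module (Python 3.9+ for `ast.unparse`). This is a sketch and not the tool from the post; the class and helper names are made up, and because it regenerates source from an AST, original formatting is not preserved:

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to a bare function name `old` into calls to `new`."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            replacement = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def rename_calls(source: str, old: str, new: str) -> str:
    tree = RenameCall(old, new).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rename_calls("total = fn(1, fn(2, 3))\n", "fn", "add"))
```

Like any purely name-based rewrite, it would also rename an unrelated local function that happens to be called `fn`, so a real codemod needs scope or type information on top of this.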


[flagged]


As an enthusiastic Go programmer: this language-war comment has nothing at all to do with the point of the article, which is a pretty excellent in-depth discussion of how a large, important product operationalized AST-based linting. Please don't chaff up threads with comments like these; this thread is perhaps the last place on HN we'd want to have a drawn-out discussion about the merits of two different languages.


Migrating millions of lines of python to go would be deeply stupid. A few extra servers are worth it.


Tens of thousands of extra servers. Perhaps over one hundred thousand. There are 95 million photos and videos uploaded every day.

No one said that you need to rewrite everything tomorrow, but would it be worth migrating over several years? Maintaining 100,000 servers vs 10,000 servers has a cost.

Maybe this story is simply a warning to new startups?

When you listen to the guy who says don't worry about it until you get to the point where it's too late.


> There are 95 million photos uploaded every day.

I'd be surprised if the upload service was all Python.

> would it be worth migrating over several years?

No. The cost of developers (and the potential risk of adding bugs) means it would be infeasible.

It's far, far cheaper for Facebook to figure out how to speed up Python (they did something similar for PHP with HHVM).


What about the environmental impact? 100k servers use a lot more energy than 10k.


The folks at Facebook know programming and teach a lot of us about it. They know how to scale distributed systems and which parts to write in Python, Go, or just assembly. They also built new network architectures to optimize utilization of all links at scale, and open compute systems for large-scale DCs. So let us just appreciate and learn from what they have to share.

No, I do not or did not work at FB. Also do not use FB. But I learn from their writing.


So ask for a carbon tax. That'll sort it.


Wondering whether another language would be better in their situation is interesting, but you completely missed the point of the article. It's not about infrastructure costs but about large-codebase reliability and developer productivity.

More relevant questions:

- Would Go avoid or reduce the need for all these static analysis tools? (probably yes)

- Is the cost of migrating millions of lines of code to Go lower than the cost of setting up all these static analysis tools? (probably no)

- Would the extra cost of migrating be worth it? (hard to say)



