On a more substantive note, static analysis is one of those things that sounds like it shouldn't even be necessary, but in reality it's a surprisingly powerful tool, for the same reason that all code has bugs: humans honestly suck at writing code. (For those of you who have seen Bret Victor's "Inventing on Principle", I feel like static analysis is one of the first steps toward the kind of feedback loop he envisions.)
We have all these ideas about how things should work, but so frequently our mental model diverges slightly from what's actually happening (which dynamic typing only makes worse, and NPEs even more so), and that's on top of the outright mistakes we make (like this one in the JDK, which was caught using Error Prone, a static analyzer for Java: https://bugs.openjdk.java.net/browse/JDK-8176402).
Also, a fun tidbit about using static analysis to apply automated code fixes: this is basically when you realize you need auto-formatters, because you still want your code to be indented like a sane person would indent it after applying a change like this. (And imagine how much more complex the autoformatter/autofixer has to be when you consider that it has to do things like preserve comments!)
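To make that concrete, here's a toy sketch of my own (stdlib `ast` only) of why a bare AST-based autofix isn't enough: the rename itself is trivial, but comments and formatting don't survive the round-trip, which is exactly what real autofixers and formatters have to preserve.

```python
import ast

# A snippet with comments and deliberate formatting.
source = """\
def area(r):
    # pi times r squared
    pi = 3.14159  # close enough
    return pi * r * r
"""

# A naive "autofix" pipeline: parse to an AST, rename a variable, unparse.
tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.Name) and node.id == "pi":
        node.id = "PI"

fixed = ast.unparse(tree)  # Python 3.9+
print(fixed)
# The rename worked, but both comments (and the original spacing) are gone,
# because the AST simply doesn't store them.
```

This is why serious fixers work on a concrete syntax tree (or re-run a formatter afterwards) rather than round-tripping through a plain AST.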
I've been collaborating with Duo Security to build a new Python static analysis tool that focuses on security deficiencies: https://github.com/duo-labs/dlint
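To give a flavor of how a checker like this works internally (a toy sketch of mine, not dlint's actual code), here's an `ast`-based check that flags calls to `eval`, a classic security lint:

```python
import ast

def find_eval_calls(source: str) -> list[int]:
    """Return line numbers of bare eval() calls -- a toy security check."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            hits.append(node.lineno)
    return hits

code = """\
x = eval(user_input)
y = int(user_input)
"""
print(find_eval_calls(code))  # -> [1]
```

Real tools like dlint plug checks in this spirit into flake8, so findings show up alongside your other lint output.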
In general, I agree, static analysis is a very powerful technique. I'd like the computer to double-check my work as thoroughly as possible when I'm working with code. Static analysis tools are often very fast and essentially "free" to run, so why not? False positives often become the limiting factor, but in my experience they at least point to locations in code that someone has deemed noteworthy, and perhaps should be investigated. Squelching false positives is also typically an easy process.
Whether it's simple stylistic recommendations for code consistency, security best practices, or even disseminating codebase information (e.g. function deprecation notices from the Instagram article), static analysis is a very useful technique.
Can someone with (any) experience explain to me why seemingly perfectly functional websites need to change all the time? Is the production version hacked together or what? Why can't websites be coded once and left to run, with the rest of the effort devoted to maintenance/adding more servers as the load increases?
I admit that I know almost nothing about how large codebases function (as is probably apparent from the question).
The space where Instagram operates is very competitive, so they need to keep innovating and exploring new ideas to grow the product, increase usage, and improve retention...
The idea is to have a "safe" and an "unsafe" Python (à la Rust) according to some guarantees. Proving termination of a program is hard, but it is doable if dev time is dedicated to it and algorithms are well chosen.
That way, some core libraries could be rewritten and give stronger guarantees.
The idea is used in https://github.com/google/starlark-go, which implements a subset of Python. People who are used to Python can use it to write imperative config files, and the files are guaranteed not to mess with sensitive stuff.
edit: some Starlark design choices explained: https://github.com/bazelbuild/starlark/blob/master/design.md
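As a rough illustration of the "restricted subset" idea (my own toy sketch in plain Python, nowhere near Starlark's real guarantees), you can reject config code whose AST contains disallowed constructs before evaluating it:

```python
import ast

# Node types we refuse in "config" code, roughly in the spirit of Starlark,
# which bans while-loops and recursion so that evaluation always terminates.
FORBIDDEN = (ast.While, ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal)

def check_config(source: str) -> list[str]:
    """Return a list of problems; an empty list means the code passes."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN):
            problems.append(f"line {node.lineno}: {type(node).__name__} not allowed")
    return problems

ok = "servers = [f'web{i}' for i in range(3)]"
bad = "while True:\n    pass"
print(check_config(ok))   # -> []
print(check_config(bad))  # flags the while-loop
```

Starlark enforces much more than this (hermeticity, determinism, frozen values), but the basic move is the same: shrink the language until you can make promises about it.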
If you find yourself in a hole, stop digging.
There are some languages where types and static analysis are part of the language.
It's paradoxical that we as an industry hold these two things to be both equal and different.
The idea that programming language does not matter is a pretty commonly-held view.
Wondering why they would deprecate the fn instead of renaming it everywhere to 'add'. I thought that was one of the main big sells of the monolith.
* Hyrum's Law - people do terrible things, and a change that seems safe almost never is. Doesn't matter what people should do, if they can they'll do it. Having good tests and scalable CI is the only guard for this.
* Moving a target is hard. If you try to change a moving target A into B, and you're changing a couple thousand references to A, chances are pretty good that while you're getting your A->B change reviewed, someone will add more code that depends on A, and then you can't submit your A->B change without fixing the new reference. The solution here is you check in B, tell people to use B instead of A (which checks the growth of references to A), make A point at B, maybe as an implementation detail (which causes all references to A to depend on B), and incrementally change indirect dependencies on B through A to be direct dependencies on B.
* Depending on a moving target is annoying. If N people write code depending on A, and a commit goes in changing A->B, then that's N people whose productivity you've hurt. In practice this one isn't really that bad, just something to think about.
* Small changes are easier to roll forward than large ones. This assumes your change gets rolled back, but consider this scenario: your change A->B, unbeknownst to you, tickles some non-deterministic behavior (maybe a race condition) when used in a specific way, and your CI doesn't fail when you submit the change at EOD. You come in tomorrow morning and find out that the change was rolled back, because it caused flakiness for N tests to go from 0.1% to 25%. If your change was a small, targeted one, it'll be much easier to trace down the non-determinism and understand how it caused this; but if it was a big one, not only might it be harder to trace down the non-determinism, there might even be other similar non-deterministic bugs that your change is causing. All of these issues conspire to make it harder for you to make progress on your large-scale change.
When you're changing 10 references, maybe even 20-30 or so (and for some changes this number can go even higher, e.g. renaming an internal Java package), it definitely makes sense to do an A->B type change in a single commit. (And in this situation, the monolith does come in handy, because you don't have to wait for a version bump to propagate the change.) But at O(1K) LOC, this isn't super tenable.
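The "make A point at B" step above can be sketched as a deprecation shim (illustrative names only, borrowed from the fn/'add' example upthread):

```python
import warnings

def add(a, b):  # B, the new name
    return a + b

def fn(a, b):  # A, the old name, now just an alias that nudges callers
    warnings.warn("fn() is deprecated; use add()", DeprecationWarning,
                  stacklevel=2)
    return add(a, b)

# Old call sites keep working (they just warn), so the thousands of
# references to fn() can be migrated to add() incrementally, commit by commit.
assert fn(1, 2) == add(1, 2) == 3
```

Because A is now a one-line forwarder, new code that sneaks in depending on A still lands on B, and the migration can't be blocked by the moving-target problem.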
No one said that you need to rewrite everything tomorrow, but would it be worth migrating over several years? Maintaining 100,000 servers vs. 10,000 servers has a cost.
Maybe this story is simply a warning to new startups?
When you listen to the guy who says "don't worry about it" until you get to the point where it's too late.
I'd be surprised if the upload service was all Python.
> would it be worth migrating over several years?
No. The cost of developers (and the potential risk of adding bugs) means it would be infeasible.
It's far, far cheaper for Facebook to figure out how to speed up Python (they did something similar for PHP with HHVM).
No, I do not and did not work at FB. I also don't use FB. But I learn from their writing.
More relevant questions:
- Would Go avoid or reduce the need for all these static analysis tools? (probably yes)
- Is the cost of migrating millions of lines of code to Go lower than the cost of setting up all these static analysis tools? (probably not)
- Would the extra cost of migrating be worth it? (hard to say)