Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What was the best CS paper you read in 2017?
558 points by avinassh on Dec 30, 2017 | hide | past | favorite | 129 comments

I highly recommend people give this paper a read. I think it points the way to a radical redesign of fundamental parts of the system stack over the next 5-10 years. If you work in systems and you aren’t thinking about this stuff, you’re about to be lapped.

The Case for Learned Index Structures:


I'm gonna throw some cold water on this and say this is not a new paradigm by any means.


However, it's certainly possible that the time for this idea has come. Google is probably in the best position to apply it.

I will say that after having worked at Google for over a decade, some of it on data center performance, there are plenty of inefficiencies that have nothing to do with algorithms or code, but have more to do with very coarse-grained resource allocation. In other words, they are "boring management problems" rather than sexy CS problems.

I should also add that you're expanding the claims made in the paper, which appears to be a common mistake. If you look at the related work and citations, you'll see that it's mostly about B-Trees, bloom filters, and indices. But I have seen people generalizing this to "a revolution in systems research", which I don't believe is justified.

Indexed data structures are important but a revolution in them is not going to cause a revolution in all of systems. Off the top of my head, the general area of "scheduling" (in time and space) is probably more important and more potentially impactful.

Have you seen the results when they let a trained model manage Borg? The power reductions were immediate, non-trivial, and performance stayed the same. There's your scheduling result for you.

Look at it this way. As the paper points out, a Hashtable is just a heuristic that works fairly well in the worst case and reasonably well in the average case. No one would argue that you couldn't hand-roll an algorithm that is better for a specific, narrowly defined data set. This paper is demonstrating that you don't have to do it by hand, you can train a model automatically without the expected caveats like: size (doesn't have to be huge), latency (the model trains very quickly) or specialized hardware (they used bog-standard CPUs, not GPUs or TPUs).

This is obviously just my opinion, but I think it's pretty big. It's not big because of the paper itself, although that's amazing in isolation (they improved on hashtables!). As I said above, it _points the way_. They have a throw-away sentence about co-optimizing data layout along with lookup and not showing those result. My guess is that beats a hashtable in every way. More importantly, if a model can beat bloom filters, b-trees, and hashtables, you'd better hold onto your socks because most algorithms are just waiting to fall. To me this paper is Deep Blue vs Kasparov. We all know what comes next.

It sounds like a useful technique, but I'm saying that what you're claiming is a far cry from what the paper claims. They're describing a fruitful research direction and outlining how you might get around certain problems (evaluating models is slow, retraining on inserts, etc.)

Also, predicting power usage is a different problem than scheduling. It's not "managing Borg" by any stretch of the imagination. I'm not saying that's impossible, but it's not what was claimed.

Also, what did Deep Blue lead to? I'm not saying it wasn't impressive, but it didn't generalize AFAIK. You are making a huge leap and claiming that a technique that isn't even deployed generalizes.

What did Watson lead to?




Link to paper that cedes control of Borg?

Slight correction, they let an ml algorithm control the cooling system in the data center, not Borg. Borg is what schedules Google's software tasks, this was not changed during this experiment.

The article you link ends on a pretty weak claim, math didn't obsolete biologists and ML won't obsolete systems folks, but every time you write a heuristic you would probably get better results from a model.

A heuristic is a model. A successful heuristic encodes knowledge of the distributions at hand. Heuristics are designed by people who had good insight into the data rather than learned directly from data, but they are models in the same sense.

It's not anywhere near as simple as "replace heuristic with model". This paper is a good one to disabuse yourself of that notion:


This is from the team at Google with the longest experience deploying machine learning (early 2000's, pre-deep learning).

(Contrary to popular belief, machine learning was basically unused in Google's search ranking until pretty recently.)

Can you please give some examples of resource-mis-allocations ?!

AFAIK storage is not the system bottle it used to be. We always want more, but network and cores are relatively plentiful.

If we could magically (and safely) modify the software stack, which areas could give x2 or x3 improvements ?

As far as Google goes, the easiest place to get better end-user latency/performance (2x-3x) is ... fixing the JavaScript.

I'm being totally serious. Backends are generally fast, and the backend engineers are performance-minded.

Front end engineers are not as cognizant of performance (somewhat necessarily, since arguably they have a harder problem to solve). Back in the mid-2000's, Gmail/Maps/Reader days Google had a lot of great JS talent, but it seems to have ceded some of that ground to Facebook and Microsoft.

If you have heard Steve Souders speak, he always mentions that he was a backend guy. Until he actually measured latency, and realized that the bottleneck is the front end. That was at Yahoo, but it's very much true for Google too.


I would like to see a machine learning system rewrite JavaScript code to perform better and make UI more usable. I believe that's beyond the state of the art now, but it's probably not out of the question in the near future.


As far as scheduling that was just one example of an important systems problem that hasn't been solved with machine learning. Not saying it can't be, of course. Just that this is a research direction and not a deployed system.

It's also important to note that there are plenty of other feedback-based/data-driven algorithms for resource management that are not neural nets. If neural nets work, then some simpler technique probably works too.

> I would like to see a machine learning system rewrite JavaScript code to perform better

Well sure, we all want a God compiler.

Network latency right now is the biggest issue we have. If we could magically (using your term here!) get computational resources and data dramatically closer to end users, it would easily give 2 or 3x improvement. Doing this safely of course means consistently in this case, and being able to solve things like safe replication of large data sets. I dunno how to do it, but you asked and that's the biggest thing I can think of.

So backend-to-frontend latency?

Or are there also plenty server-to-server scenarios?

After re-reading your comment, are you referring to end-users inside the enterprise perimeter ?

e.g. devs using remote build system ? A local workload which access a mostly-remote db ?

Any resources you can recommend on the scheduling topic?

As mentioned, that was off the top of my head. Some of it is "inside baseball" at Google, but there are a bunch of published papers about cluster scheduling. This one is a good overview, and has numbers, evaluation, lessons learned, etc.:


My overall point is that even if learned indexes replace traditional data structures (which is a big if), plenty of important systems problems that remain. Some that I could think of:

- Fixing front end performance as mentioned in a sibling comment.

- System administration of a billion Android phones. The state of these seems pretty sorry.

- Auditing app stores for security bugs.

- Eliminating security bugs by construction. Hell, even just eliminating XSS.

- Software that runs transparently in multiple data centers. (Spanner is a big step in that direction, but not a panacea.)

I could name like 10 more... none of these have anything to do with learned indexes. The point is that the OP is generalizing from a really specific technique to some kind of hand-wavy "machine learning writes all the code" secnario.


I can't find the exact paper, but you can find a lot after reading this article popularizing the topic:


The future is here. :)

Interesting article I read from Zalando, which in the same vein as the Learned Indexes paper, takes a previously computationally expensive routine (Optimal Cart Pick) and learns it as an approximation to a complex function that, in the end, has faster run-time.



There was a time when people used to get PhD's for researching on index. This paper actually turns the table.

nice pun

Yo! This is what I'm talkin' about! Replace directories with self-consistent neural nwots [neural blackbox thingie of weights and activation vals]. It's cool, but for every convenience we take in the direction HD space we backpeddle in processing [meaning, although the dir structure could be super condensed this way with a learned index, this approach is less flexible to cold or not-cpu-related analysis]. It's very cool actually to consider the three branches of processing, hdd, and memcache ... You want to have a balance but in the ideal computer we do no calculation, and that is something worth mentioning.

Reading again of the abstract, allow me to add that it's an optimization beyond structures such as Btrees which are already not really decipherable datasets to the naked eyes. So any optimization in a space that's not designed to be human-legible can take whatever shape, and this is certainly a welcome achievement!

> but in the ideal computer we do no calculation, and that is something worth mentioning

Can you expand on this bit? I'm not sure I understand what you mean

Absolutely! This is a statement on minimization of computational effort. When the solution has already been provided we need not do any calculation. One way to think about the quoted statement and how I meant it is to consider big O notation and how some operations take O(n) or (O lg n) steps dependent on input in order to reach a solution or complete a process or spit out a result; in this hypothetical best case our operations all take O(1). This is not really a calculation but in fact knowledge coming to light.

Since writing this comment I have text-to-speech'd the whole of the article and I listened to it one afternoon over smooth classical music and I have come to appreciate the fact that many computers will have neural network processors like tensor processors or at the very least really good GPU, making it all the more likely that such algorithms [especially used in the specific case of fault intolerant data organization and file hierarchy] will actually not require so much computational effort as traditionally believed.

A very interesting read indeed!

Besides HN how can one find great papers like this? How did you find out about it?

1) The Morning Paper - blog.acolyer.org - A new super interesting paper review everyday

2) This is my personal ritual. Every weekday I check out recent entries in "Hardware architectures" and "Emerging Technologies" section of arXiv.org Many other sections are gaining traction with me, like Database.

Hope this helps.

Thank you for The Morning Paper

You'd also like Fermat's Library

Oh yes, I do read Fermat's library, but that's more into pure mathematics side, which means I need to put in 3X the efforts to understand the papers, which I do.

I'd recommend taking a look at Papers We Love


Loved the two-minute papers channel that I came across on this list. Thanks!

I found this paper after reading through Jeff Dean's talk at NIPS. I recommend that, too, obviously!


You can generally find a paper for each section.

I saw this paper on reddit.com/r/machinelearning - usually the more interesting ML papers are posted there.

Additionally, there is http://www.arxiv-sanity.com/ which sorts new machine learning papers by popularity.

That was a good read, thanks for sharing this.

I've read that paper, and I wasn't particularly impressed. Is there any concrete evidence that learned indices scale to databases with complex schemata and a large transactional volume? Could learned indices be profitably used, say, in the database backend of an ERP system?

There are immediate applications to data warehousing. I hope that's obvious to you, but if not I'm happy to unpack that.

Whether or not it applies to online databases probably depends on several factors. The ones I can think of off the top of my head are the read/write ratio, the write rate in general, and obviously size of your data set (although the approach works at GB, not TB, so most can take advantage).

Here's a way, I, a systems engineer who doesn't design databases, could imagine using it immediately:

1) Do I have any indexed queries? If not, nothing to do here. 2) Does the data change rapidly? If so, nothing to do here. 3) Okay, now I'm in the zone of profitability:

1) investigate just letting the model do all the work. This is unlikely to be successful in many cases, but it's pretty easy to try.

2) investigate "model-assist", in this mode I let the model give me an answer and, if I don't like it, I use the B-tree. The model is fast enough that this is not a huge hit to my overall performance.

3) investigate "model-replace", in this mode I build my b-tree and do everything like normal. In the meantime I'm training my model. Once my model seems well-trained (I could hand it a randomized subset of queries and, once the accuracy and performance beats the b-tree I consider it well-trained), I switch to using it. As data changes I could potentially go back to the b-tree, update the model (literally throw it away and re-build) and do it again.

4) Investigate "co-opetition" -- make the b-tree a component of the model. If you read the paper, you see how they layer different models. A b-tree is just a particular kind of model that happens to be computationally expensive but has high accuracy. You could include it in the model graph and let it compete for queries like everything else.

Anyway, I'm speculating here, but it seems obvious this has immediate impact in one large space relating to databases (data warehousing) and potential impact in many of the rest.

> There are immediate applications to data warehousing.

I am not too worried about data warehouses. Data warehouses can be fed by batch processes, which you can run, say, once every day or once every week. This leads to the design of algorithms that optimize aggregate measures of performance (e.g. amortized complexity, average complexity, throughput) rather than microscopic measures of performance (e.g. worst-case complexity, latency). Machine learning techniques have a track record of delivering good aggregate results.

However, online databases require optimal worst-case performance. The occasional massively slow operation is not okay - we'd rather every operation be a tiny bit slower instead.

> Do I have any indexed queries? If not, nothing to do here.

Of course I do. Needless to say, complex schemata come with lots of indices that are necessary to speed up queries that pull data from 10-15 tables each.

> Does the data change rapidly? If so, nothing to do here.

Of course the data changes rapidly. Crucially, an online transactional database's input comes from external parties (e.g., public-facing web servers) that cannot be considered trustworthy. Many machine learning techniques are vulnerable to so-called “adversarial examples”. Feeding adversarial examples to an ML-powered index could cause the database's performance to drop drastically in ways that are provably not possible with B-trees.

So, would you take a bet that this won't have been used in any self-evident way within the next 5 years?

Because I would take the other side of that bet. :)

I'm not in the business of predicting the future, since that is beyond my control. What other people do with learned indices is up to them.

I'm just stating my concerns.

Data warehousing is one obvious application: data is immutable; have a sort key. So NN as a builtin once index structure is perfect for such case. It is only a matter of time for commercial analytic DB to pick up this idea, I would presume.

You're right, data warehousing seems like a compelling use case for learned indices. I'm primarily worried about OLTP systems, though.

Leslie Lamport's 1978 "State the Problem Before Describing the Solution" [0]. On his web page the author adds that "The title says it all. This one-page note is as relevant today as when I wrote it. Replace "describing the solution" by "writing the program" and it becomes a practical recipe for improving software."

Herewith, the paper in full:

"After several years of writing papers in computer science, I discovered the basic expository rule embodied in the title of this note. As obvious as this rule may seem, there are fields in which it is seldom observed. (Computer networking is one example.) A typical paper in such a field is organized as follows:

(1) a brief informal statement of the problem;

(2) the solution;

(3) a statement and proof of the precise correctness properties satisfied by the solution.

In order to abide by the rule, the following organization should instead be used:

(1) a brief informal statement of the problem;

(2) the precise correctness conditions required of a solution;

(3) the solution;

(4) a proof that the solution satisfies the requisite conditions.

Although it may not be obvious at first glance, there is a profound difference between these two approaches. In the first, the precise correctness conditions can be (and usually are) stated in terms of the solution itself. Some results are proved about the solution, but it is often not clear exactly what problem is being solved. This makes the comparison of two different solutions rather difficult. With the second approach, one is forced to specify the precise problem to be solved independently of the method used in the solution. This can be a surprisingly difficult and enlightening task. It has on several occasions led me to discover that a "correct" algorithm did not really accomplish what I wanted it to. I strongly urge everyone to observe the rule.

(I am ignoring as unworthy of consideration the disturbingly large number of papers that never even attempt a precise statement of what problem they are solving.)"

[0] https://lamport.azurewebsites.net/pubs/state-the-problem.pdf

> As obvious as this rule may seem, there are fields in which it is seldom observed.

Indeed. Many -- perhaps even a majority of -- startup landing pages break this rule.

Thanks for this, this one is very relevant to industry(or at least to mine). The amount of software designs and programs that don't actually fit the technical/business problems from the get go is astounding.

Statecharts: a visual formalism for complex systems: http://www.inf.ed.ac.uk/teaching/courses/seoc/2005_2006/reso...

Even though it's from 1986 it's enlightening how useful it is to think about reactive systems using statecharts - especially as a UI developer this seems to make total sense and was a ah-ha moment when I first read it a couple months ago.

Elegant and simple. Thank you for sharing it even though it is over 30 years old.

Next time you happen to be at a store that has the rare, old fashioned-type automatic doors that swing out (not slide in/out), hurry out before your friends and stand on the pressure pad that is in the direction of the door swing out. No one will be able to exit.

This is a simple example of a state machine in the real world that I like to use to explain more complex state machines.

It's an old one, but this year I read The Emperor's Old Clothes by C.A.R. Hoare for the first time and it got me started in a Hoare rabbit hole, just a brilliant guy.

I read the annotated version on Fermat's Library - great source of interesting papers (http://fermatslibrary.com/s/the-emperors-old-clothes).

seL4: Formal Verification of an OS Kernel (from 2009).


seL4 is about 9000 LOC. So this gives a good indication of what formal verification (Isabelle/HOL) is currently capable of. seL4 is also quite fast as a result of removing unnecessary checks.


seL4 is smaller than L4Ka::Pistachio and it's also capability based which L4 isn't. They could have called it L5 or seL5.

Note that this is formal verification of an imperative, C-based program manipulating low-level structures done years ago. I wouldnt say that represents what we're currently capable of in general case. For starters, a functional program or imperative without pointers (eg in SPARK) is much easier to verify than a C program. Plus, the seL4 organization recently reduced effort for filesystems down by a few multiples with a functional language.

Truth is the seL4 team set themselves up for a harder-than-usual job. For good reason but most folks (esp managed languages) won't see as much difficulty.

I want to learn formal verification on my own. Do you know of any resources? I got Rolf Dreshler's book on circuit verification, but I would like to master both hardware and software. Would you kindly provide with some pointers?

It's a broad subject and I'm only interested in software verification. It's specialized enough that you should be reading papers and reading about systems. My favorite systems+papers are:

  Coq: The world’s best macro assembler?

  Vale: Verifying High-Performance Cryptographic Assembly Code


Two schools:

* Coq/dependent types. Check out Software Foundations by Benjamin Pierce, et. al. Everything's online. Also see Idris, which has a good book from Manning.

* SMT-solver-based verification of existing languages. See SPARK/Ada (there's a good book but I can't remember the name presently) and the GNAT website. Also, Frama-C, although the documentation is more spotty. Then there's Rustan Leino's work on Dafny and Boogie. Oh, and Why3. And some work on verifying Java code that I haven't played with yet.

Three, three schools:

* Model checking/high level formal specs. Lamport's TLA+, Alloy, possibly Z. There are some good TLA+ things online, I just couldn't get into it. (I feel bad.)

Isabelle/HOL (the one sel4 was done in) appears in none of your two schools ...

This might be because it is the best combination of interactive theorem proving with automated methods that currently exists out there.

As far as I know, Isabelle/HOL is fairly similar to Coq, perhaps with more automation. If there is something like Software Foundations for Isabelle, I'd be slobbering all over myself in excitement to hear about it.

(And I'm still a little confused about how the C code of seL4 is verified against the formal specs. Is there something like Frama-C involved?)

Both are interactive theorem provers, so yeah. But the two school comment said Coq/dependent-types, and Isabelle/HOL is neither Coq, nor does it do dependent types :)

If you are looking for developments in Isabelle/HOL, there is the AFP: https://www.isa-afp.org/

There is also a recent book that might interest you: http://www.concrete-semantics.org/

Sweet, thanks!

Chord: http://nms.lcs.mit.edu/papers/chord.pdf

I think this paper amazing because it solves a complex problem with a simple solution. How do you create a hash function that adjust to the varying number of underlying buckets ? Solution: hash to a circle.

https://en.wikipedia.org/wiki/Consistent_hashing was actually introduced in 1997 by one of the authors of that paper.

If you like Chord then you'll also like Kademlia.

The raft consensus algorithm article [1] and the follow up that modifies it slightly with some changes that are useful when implementing it [2]. Both are written very clearly and very focused on actually using the algorithm in practice. Problems that arise when implementing it are discussed and solutions are proposed as well.

[1] https://raft.github.io/raft.pdf

[2] http://openlife.cc/system/files/4-modifications-for-Raft-con...

I found Knuth’s “Dancing Links” paper [1] very well written and a somewhat easy read (I had to reread certain parts a couple times). I had to write a sudoku solver for one of my classes and I read that dancing links and algorithm x was one way to do it [2]. I then read some things around the internet to apply dancing links to sudoku solving [3] [4]. If you read the Wikipedia entry on exact cover problems there is a section on sudoku as an exact cover problem [5] this is one thing necessary to understand in order to implement a sudoku solver with algorithm x and dancing links.

Knuth even talks about dancing links in his new 2017 Christmas Tree Lecture [6] specifically here at 4:28 [7]. Basically he uses dancing links to solve for certain n in the problem he sets up in that lecture.

[1] https://arxiv.org/abs/cs/0011047

[2] https://en.wikipedia.org/wiki/Sudoku_solving_algorithms#Exac...

[3] http://garethrees.org/2007/06/10/zendoku-generation/

[4] https://www.ocf.berkeley.edu/~jchu/publicportal/sudoku/sudok...

[5] https://en.wikipedia.org/wiki/Exact_cover#Sudoku

[6] https://www.youtube.com/watch?v=BxQw4CdxLr8

[7] https://youtu.be/BxQw4CdxLr8?t=4m28s

You may like to know that Knuth's fascicle 5C of The Art of Computer Programming Volume 4 is going to be entirely about Dancing Links. He's still working on it, but the “incomplete draft” is available at the (hidden) link https://cs.stanford.edu/~knuth/fasc5c.ps.gz — check it out if you're interested; it's already 130 pages of fun.

Wow, this really is good I started it and skimmed the rest of it the first 34 pages are explanations and algorithms of dancing links and the rest of the 130 pages is all exercises and answers to those exercises. Truly awesome, thanks again for the link.

I think he mentioned that it would be in his new book, but I had no idea there would be over 100 pages of dancing links. That is sweet! I’d love to take a look at it, so I could get a better grasp of how it applies to other problems.

Knuth is, almost universally, a joy to read.

An Incremental Approach to Compiler Construction[1] by Abdulaziz Ghuloum.

The author presents a very approachable technique for building pedagogical compilers by starting with a tiny subset of the language - a language that can accept integers and print it and incrementally grows it into Scheme. Every step yields a fully working compiler for a progressively expanding subset of Scheme. Every compiler step produces real assembly code that can be assembled and executed directly by the hardware.

I've dabbled with compilers for a while now and in my experience I found this to be among the best ways to get people excited about compilers. There is some code available from Nada Amin[2] for the curious and I'm re implementing it from scratch again[3].

1. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

2. https://github.com/namin/inc

3. https://github.com/jaseemabid/inc

Some Were Meant for C: The Endurance of an Unmanageable Language


Bringing The Web Up To Speed With WebAssembly


That link seems to broken. Here the original link https://github.com/WebAssembly/spec/raw/master/papers/pldi20...

Definitely a good read.

Adrian Colyer's review on WebAssembly was very helpful.

I really liked Vlad Zamfir's papers on "correct-by-construction" consensus protocols. What I love about this work is that it unifies traditional BFT consensus with blockchain consensus to make something better than both! Really exciting!

Abstract CBC: https://github.com/ethereum/research/blob/master/papers/cbc-...

Casper the friendly GHOST: https://github.com/ethereum/research/blob/master/papers/Casp...

I found the Abstract CBC paper a little easier to understand, as it goes into more detail about the correct-by-construction process.

The bitcoin white paper (https://bitcoin.org/bitcoin.pdf)

Yeehaw! This was the best paper I read in 2012, and then again in 2013, 2015, 2016, and 2017.

What paper was the best in 2014?


Programming languages

Slepak et al - An Array-Oriented Language with Static Rank Polymorphism [1]

Distributed systems

Kiayias et al - A Provably Secure Proof-of-Stake Blockchain Protocol [2]

Cognitive computing

Boahen - A Neuromorph's Prospectus [3]

Machine learning

Silver et al - Mastering Chess and Shogi [4]

1 http://www.ccs.neu.edu/home/shivers/papers/rank-polymorphism...

2 https://eprint.iacr.org/2016/889.pdf

3 https://pdfs.semanticscholar.org/3767/1e53c9949fa08d9dc150ad...

4 https://arxiv.org/pdf/1712.01815.pdf

* What Developers Want and Need from Program Analysis - An Empirical Study

* Weird machines, exploitability, and provable unexploitability

* What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices

* What Developers ...: http://ieeexplore.ieee.org/document/7582770 (Sorry, no open version seems to be available)

* Weird ...: http://ieeexplore.ieee.org/document/8226852

* What You ...: http://s3.eurecom.fr/docs/ndss18_muench.pdf

Openly accessible "What Developers Want ..."


obtained by pasting the title into scholar.google.com

A General-Purpose Counting Filter: Making Every Bit Count

A better alternative to bloom filters and they provide a well written implementation as well https://github.com/splatlab/cqf


Interestingly this work has nearly identical algorithm to a previous paper. Good ideas are rediscovered.

Counting with TinyTable: Every Bit Counts! http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...

A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing


"Foraging Goes Mobile: Foraging While Debugging on Mobile Devices" (ftp://ftp.cs.orst.edu/pub/burnett/vlhcc17-foraging-mobile.pdf)

"Toward Principles for the Design of Navigation Affordances in Code Editors: An Empirical Investigation" (http://dl.acm.org/authorize?N37917)

"The Patchworks Code Editor: Toward Faster Navigation with Less Code Arranging and Fewer Navigation Mistakes" (http://dl.acm.org/authorize?N84177)

The most useful paper I read this year was Tidy Data (Wickham) http://vita.had.co.nz/papers/tidy-data.html

I read a lot of good ones but I'll answer what was most important rather than best. We're seeing a revival of formal methods ranging from lightweight (TLA+) to heavy. Two problems are in my sight: lack of framework plus consistent, empirical data for evaluating how suitable a given method is for a specific project or company; many who would use lightweight or practical methods will quit because someone convinced them to start with heavy ones. The paper below is a start on the former:


I think that was a great start on an evalution criteria because its comparisons match what I read in the various experience reports. ASM's and TLA+ coming in the lead on usability but B method on code generation. It will also help to note what projects have been completed or reusable libraries available in each. A person building verified compilers for functional languages might want to learn Isabelle/HOL due to CakeML, doing a new TLS extension learn F* due to miTLS, type-safe assembly learn Coq due to CoqASM, and so on.

For the other issue, I made a proposal on HN first and then on Lobste.rs about avoiding people quitting early from finding formal methods too overwhelming or useless. I proposed having a default intro or lure that matched the goals of different people to tools with a warning of some's difficulty. It got some interesting responses (below) I'll turn into a mini-essay with links eventually.


TLA+ is by no means considered a lightweight formal method. It is proven relatively complete, i.e. anything that can be proven about a program, can be proven in TLA+. It can be used to only partially specify and verify a system, but so can Coq (usually, the term lightweight formal methods is applied to tools that can only specify/verify partially, or that that is their main intent).

The main difference between TLA+ and other "deep" formal tools (Coq/Isabelle/Lean etc.) is that it is much easier to learn and apply. This is not because it is by any means lighter-weight in terms of software verification, but because of specific design choice which make it unsuitable to the primary goal of those other tools, namely formal proofs of "high" mathematics and the definition and exploration of novel logics. Those tools are so much harder to learn and use than TLA+ mainly because verifying programs is not their main goal.

As a general math/logic proof assistant, TLA+ could certainly be considered lightweight, though :)

Maybe I should've said TLA+ w/ model checking since that's definitely lightweight versus full, formal verification. Far as Lean, it's specifically designed for proving stuff about "complex systems."

> Maybe I should've said TLA+ w/ model checking since that's definitely lightweight versus full, formal verification.

Why do you say that? Model checking is the most common form of full formal verification. It is true that because most TLA+ specifications -- unlike most software that's verified with model checkers -- has an infinite state space and that TLC, the model checker that's packaged with the TLA+ toolbox is an explicit state exploration model-checker with not support for infinite state spaces, then most of the time it is used on a finite-state derived specification and so doesn't fully verify the spec. But that is, AFAIK, not what is normally meant by "lightweight formal methods" (although I am not sure there is a very precise, well-accepted definition). But people take all sorts of short-cuts when working in Coq, too, and very rarely fully verify every aspect of the program. In fact, the only programs that were ever "fully" verified was 1. very small (much smaller than the vast majority of programs used in industry; I think jQuery is several times bigger than the biggest program ever "fully" verified end-to-end), 2. were heavily simplified in terms of algorithms used, and 3. took a loong time and required experts. So if that's how you'd classify "lightweight", then "full" doesn't exist in any meaningful sense.

> Far as Lean, it's specifically designed for proving stuff about "complex systems."

No, it's a Coq-style, general-purpose proof assistant. In any event, Coq, Lean and Isabelle are all designed for researchers. They are not aimed at industrial use at all. I know that the people behind Lean -- all accomplished experts in formal mathematics -- have made some claims about its suitability for verifying software, but AFAIK, none of them has any practical experience in software verification in industry, and so I find their claims to be entirely hypothetical. It's a beautiful proof assistant, but pragmatic software verification is not its first nor second goal. In contrast, TLA+ was designed with the help of actual "ordinary" industry engineers (at Compaq), who were using it for a large, real-world project, as it was being developed. That's the main difference between TLA+ and Coq/Lean -- it was designed as a product for engineers, not as a research tool for logicians.

" But that is, AFAIK, not what is normally meant by "lightweight formal methods" (although I am not sure there is a very precise, well-accepted definition). "

There isn't. The general usage in CompSci for that is notations that beginners can learn easily, apply with minimal effort, and get results with. TLA+ model-checker knocking out hard-to-test errors in protocols without having long process of formal verification qualifies it. SPIN was popular before it for a lot of similar stuff. Alloy is another commonly called lightweight in the literature with lots of resources for learning. Abstract, State Machines (ASM's) was another one engineers picked up super-easy that caught problems in all sorts of application areas. I think the consensus is the training time has to be tiny, the method should be applicable without a paid specialist constantly helping, and the method should work on useful software with not much extra cost. I say useful rather than practical because some lightweight methods are just shown to work better on academic examples than full formal whereas others were used in actually solving practical problems. There's often overlap, though, where small problems former handles indicates they might shake problems out of a component in a real project which sometimes happens.

"Model checking is the most common form of full formal verification. It is true that because most TLA+ specifications -- unlike most software that's verified with model checkers -- has an infinite state space and that TLC, the model checker that's packaged with the TLA+ toolbox is an explicit state exploration model-checker with not support for infinite state spaces, then most of the time it is used on a finite-state derived specification and so doesn't fully verify the spec."

You just contradicted your own point (and mine) there. The model checkers usually only do partial verification due to state explosion. "Full, formal verification" shows the property applies in all executions typically using some logical methods. The model-checkers are therefore not full, formal verification because they don't cover all states. They're partial verification that gets one some assurances with minimal effort vs full, formal verification that aims for max assurance at high effort.

The combo of ease of use with finite-space, model checking makes that form of TLA+ a lightweight, formal method. One could certainly use it for heavyweight stuff with proving techniques. Most won't, though. That's why my generic comments on TLA+ are about model-checking. I still will be more specific in future.

"They are not aimed at industrial use at all. I know that the people behind Lean -- all accomplished experts in formal mathematics -- have made some claims about its suitability for verifying software, but AFAIK, none of them has any practical experience in software verification in industry, and so I find their claims to be entirely hypothetical."

I was saying it was designed for verifying programs/systems which you said it was not. You contradicted the authors' own claim about its purpose. I corrected that. I don't follow Lean closely enough to have been making any broader claim past what their intention was. Coq and the HOL's are used in industry for high-assurance systems. I guess whether your claim about them is true depends on what the definition of "aiming for industrial use is." Others used in industry in past or present include ACL2, Z, and B method.

I'll be the first to agree each has a main goal of being designed by experts in such tools for use by veterans of or experts in such tools. They surely aren't putting a lot of effort into making the tools easy to apply in industry like we saw in TLA+. That's not their priority. They do get used, though, so not useless for that. Just needs costly specialists. So, they are tools with industrial application that have gotten excellent results in quality/security delivered that are just hard and expensive to apply with a need for specialists. There's a lot of things in industry that work like that where average person you pull out of college can't replace the expensive specialist. We don't usually pretend in those cases whatever the specialist is doing isn't fit for industry. We just recommend or use it very carefully while minimizing a need for it.

That said, I'm definitely one of the people in favor of getting all that stuff or something similar in shape for better industry adoption. Alternatively or in parallel, attempting to stretch user-friendly methods like TLA+, Alloy, or Design-by-Contract to hand-prove whatever properties the others do. Also, doing one or both until there's an ecosystem of reusable, verified pieces that can be composed for future projects like concept of Verified Software Repository or how so much is building on CompCert's Clight. One of only ones with industrial focus was SPARK Ada. There's a world of difference between effort involved for a newcomer to learn and use that versus Coq for a similar program even they were gonna try to run the Coq stuff through an automated prover. Like TLA+, SPARK was designed and improved by engineers for other engineers being tested regularly by use in engineering projects. I agree with you that the field needs a lot more of that thinking and investment into delivering it.

> TLA+ model-checker knocking out hard-to-test errors in protocols without having long process of formal verification qualifies it.

But that's the mainstream of formal methods. 99% of formal methods in industry is using model checkers or static analyzers. What you call "a long process of formal verification", meaning, I assume, formal manual proofs is almost nonexistent outside academia. So what you call "lightweight" formal methods is normally just called "formal methods". What you call "full" formal methods is, for the moment at least, a research conjecture.

> The model checkers usually only do partial verification due to state explosion.

That's quite inaccurate. To be called a model checker, a model checker must check a model (in the formal logic sense) -- i.e. it must exhaustively search through the entire state space (some model checkers can exhaustively check infinite state spaces), as opposed to, say fuzzers or concolic tests. What happens in practice most of the time TLA+ is that the model checker exhaustively checks a finite-state specification (or a finite-state instance), which is derived (by the user) from an infinite-space one. Model checkers are not "partial verification". They can be used to do partial verification. But so can proofs.

The theoretical difference between model checkers and proofs is that the latter is based on proof theory, while the former is based on model-theory. The challenge for model checkers is trying to automatically cover huge or infinite state-spaces; proofs have the same challenge, but they work manually with the hope that the human prover would be able to assist (this can actually be done with model-checkers as well, using refinements in TLA+, but TLC specifically is not that sophisticated). This hope is far from realized. In practice, most of the time, proofs hit a limit much sooner than model checkers, which is why they don't scale as well in practice, and so are barely used (when they're used, it's usually to tie together results from model-checkers and/or other automated tools).

> The combo of ease of use with finite-space, model checking makes that form of TLA+ a lightweight, formal method.

Again, virtually all formal methods are like that, except those that are mostly used in academia. Manual formal proofs are not only not used because they are so prohibitively expensive, but because they're simply not needed.

> You contradicted the authors' own claim about its purpose.

That's right. Those authors -- as accomplished as they are in their field -- have no experience in industrial software verification. Their claim is mere conjecture. I don't know how someone is supposed to design a tool to serve a certain market if they've never researched that market. I find Lean to be very interesting (I'm playing with it as a hobby), but its authors make some unsubstantiated claims that are well outside their area of expertise. I remember their introductory paper sometimes says things like "in practice", when it is clear to anyone who's actually done that in practice that, at best, they could say "in theory".

I should, however, qualify that Lean does have some features more oriented towards software verification, like -- most importantly -- integration with SMT solvers (which pretty much amounts to a model checker). But Lean does not differ in that regard from Why3 or Boogie, and I assume that's not the part of Lean you were referring to.

> I'll be the first to agree each has a main goal of being designed by experts in such tools for use by veterans of or experts in such tools.

Those are not just veterans and experts, but mostly academics working on research projects. There has not been a single piece of large or even medium-sized software verified end-to-end (let alone "fully") by anyone, ever, experts, veterans or otherwise. What you call "full" formal verification does not exist in any shape or form outside our hopes and dreams.

I think that the confusion between methods that are now purely research stage and are nowhere near ready for practical adoption, and actual working formal methods, does a lot of harm to formal methods. In general, I see a worrying trend of pure researchers, who have done little or no applied research, let alone actual industry work, presenting their pure research in a way that can be mistaken for "nearly ready for primetime" does a lot of harm for both research and industry. The field of formal methods in particular suffered a research winter (dried up funding) precisely because of such over-promises. Again, no one has ever "fully" verified a complex, large or even medium-sized real-world software, and even small, simplified software has virtually never been "fully" verified end-to-end outside academia. It's fascinating research, but its no more than (pure!) research at this point.

In fact, I know a company that makes a commercial formal verification product (a very sophisticated sound static analyzer) that can also integrate manual proofs in Coq or Isabelle. One of their customers wanted to try writing Coq proofs, and the team -- all formal math academics -- actively tried to dissuade them (I assume they succeeded) from doing that. How? By showing them how it's done.

> That said, I'm definitely one of the people in favor of getting all that stuff or something similar in shape for better industry adoption. Alternatively or in parallel, attempting to stretch user-friendly methods like TLA+, Alloy, or Design-by-Contract to hand-prove whatever properties the others do.

All formal methods designed for practical use are more-or-less user-friendly (or, at least aim to be), or they wouldn't be used by anyone. "All that stuff" as in tools that are actually able to "fully" verify large, complex software cannot be made user friendly as it does not exist at all. I also don't understand what you mean by "stretching" TLA+. I've used TLA+ extensively and have been learning Lean for quite some time, and there is nothing Lean can do when it comes to software verification that TLA+ can't (although, there are different relative strengths). In fact things like refinement or specification/verification of interactive/concurrent/distributed software -- all things that are of great importance in practice -- cannot be easily achieved in Coq/Lean/Isabelle without some serious work of embedding a logic like TLA in them (as, in fact, has been done when specifying/verifying such software). If anything, I would say that out of the box, TLA+ is more powerful than Isabelle/Coq/Lean for a big portion of real-world software (as most of the software people write today is interactive/concurrent/distributed).

Which is why I don't understand in what sense TLA+ is any "lighter" than Coq or Lean. The difference in power between Coq/Lean and TLA+ has nothing to do with software, and everything to do with researching novel logics and formal mathematics.

I think that you are grossly misestimating what has actually been done in CompCert and other similar projects. Even the main author of CompCert readily acknowledges that model checkers scale better. A closer look at what they did would give you a better picture of the true state of affairs.

Lifeguard : SWIM-ing with Situational Awareness


I probably enjoyed 'Mastering the game of Go with deep neural networks and tree search' the most and especially appreciated the 'Methods' section where they go into a good amount of detail about their implementation.


Dullien, Thomas F. "Weird machines, exploitability, and provable unexploitability." IEEE Transactions on Emerging Topics in Computing (2017). http://ieeexplore.ieee.org/abstract/document/8226852/

“The Evolution of LISP”, Steele/Gabriel (1993).

PDF: https://www.csee.umbc.edu//courses/331/resources/papers/Evol...

80 pages, very well written and interesting.

"Quantum advantage with shallow circuits" by Bravyi, Gosset, and Koenig https://arxiv.org/abs/1704.00690

What's beautiful about this paper is that it shows a provable separation in the power of quantum circuits over classical circuits. That is in contrast to many cases where quantum advantage is only suspected (the most notorious being Shor's algorithm: Factoring could be in P).

Another cool thing is that this separation is fundamentally connected to quantum entanglement and the challenge of simulating measurements on multipartite quantum states with shared randomness and classical communication. This gives me some hope that this points to a path for future deep insights separating quantum from classical computing.

Does anyone have any suggestions on finding CS papers worth reading beyond those papers listed here?

Follow the citations of papers you liked. Note the authors and read their other publications.

Look at other papers in the same conference. Watch conference videos on YouTube.

Skim at first, and then reread papers that pop back into your mind.

In my experience, you read a lot of papers that sound interesting and then some of them turn out to be good.

What topics are you interested in? Following the top ACM and IEEE conferences in that area is a good start!

For example, I'm interested in human-computer interaction so I read CHI and UIST papers each year, and for software engineering I read ICSE and FSE.

Ah, if only CHI and UIST had good papers these days. :/

Hey, I had a CHI paper this year! I think it publishes many great papers each year.

Welcome to the echo chamber.

In my experience, the only people who think CHI publishes great papers are the people publishing at CHI.

(Disclosure: I've published at CHI a few times myself. I'm done with it.)

A company recruited me at CHI for an internship/collaboration and implemented portions of my research into their product, so it isn't a complete echo chamber.

Cool! What company and product? I'd love to see a positive example.

I remember LISA and USENIX.

See PapersWeLove

Emergence of Invariance and Disentangling in Deep Representations


"we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion"

Not originally a CS paper, but this one describes a distributed algorithm medieval merchants used to simplify their debts graph: http://eh.net/eha/wp-content/uploads/2013/11/boerner.pdf

I highly recommend it to anyone interested in CS, history & finance.


My second real job out of college was with IBM's Taligent Project Office (where I independently invented mobile IP routing); I bailed as the wheels were coming if the IBM/Apple partnership.

May I ask why people read cs papers to gauge if I should be reading them?

I read CS papers to add more tools to my (mental) tool box.

Dynamic Routing Between Capsules the long awaited paper from Geoffrey Hinton (although Sara Sabour is principal author, Hinton has been hinting at this for a while now) on an alternative to back-propagation https://arxiv.org/abs/1710.09829

who down-votes a paper suggestion, and is too afraid to explain why.

Here are some experiments on more complext data and with more layers: https://arxiv.org/abs/1712.03480

Unfortunately, it does not look very promising so far.

"Memcomputing NP-complete problems in polynomial time using polynomial resources and collective states" http://advances.sciencemag.org/content/1/6/e1500031.full

Could you tell us a bit about the reñevance of this work?

They claim they can actually build things that compute np-complete problems in polynomial time for real.

"We show an experimental demonstration of an actual memcomputing architecture that solves the NP-complete version of the subset sum problem in only one step and is composed of a number of memprocessors that scales linearly with the size of the problem. We have fabricated this architecture using standard microelectronic technology so that it can be easily realized in any laboratory setting"

This is traditionally what people look to quantum computing to solve, but that seems much farther off in practice that the technology described here, at least, as described.

TL;DR Memcomputing has been shown to have the same power as non-deterministic turing machines. They claim to have made some real ones with promising results. It looks like it's getting commercialized over at http://memcpu.com/.

I'm just summarizing the papers/data, not commenting on it for real, it could all just be snake oil

They have pretty much exaggerated their capability. I implore any curious HN reader to check out Scott Aaronson's review of their work & how he debunks it.

thanks, link for the impatient https://www.scottaaronson.com/blog/?p=2212

What Makes A Great Software Engineer?


Metaphors We Compute By Code is a story that explains how to solve a particular problem


Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact