Author here. The algorithm used here is based on Google's 2007 paper "Scaling Up All Pairs Similarity Search." Since then, I am sure they have started to look at billions of sets. Generally speaking, exact algorithms like the one presented here max out around 100M sets on not-crazy hardware, but going over a billion probably requires approximate algorithms such as Locality Sensitive Hashing. You may be interested in the work by Anshumali Shrivastava.
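
For anyone unfamiliar with Locality Sensitive Hashing, here is a rough sketch of the MinHash flavor of it for Jaccard similarity. This is only an illustration, not the algorithm from the paper and not code from this project. The function names (`minhash_signature`, `estimate_jaccard`, `lsh_band_keys`) and the parameters (64 hashes, 16 bands) are hypothetical choices made for the example, and it assumes non-empty sets.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """Build a MinHash signature: for each seeded hash, keep the minimum value.

    Assumes `items` is a non-empty set; elements are hashed via str().
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(str(x).encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for x in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_band_keys(signature, bands=16):
    """Split a signature into bands; sets sharing any band key become candidates."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


a = {"apple", "banana", "cherry", "date"}
b = {"apple", "banana", "cherry", "elderberry"}
sig_a, sig_b = minhash_signature(a), minhash_signature(b)
print("estimated Jaccard:", estimate_jaccard(sig_a, sig_b))
print("exact Jaccard:", len(a & b) / len(a | b))
print("candidate pair:", bool(set(lsh_band_keys(sig_a)) & set(lsh_band_keys(sig_b))))
```

The point of the banding step is that you only compare sets that land in the same bucket for at least one band, instead of comparing all pairs. That is what makes it approximate: some similar pairs can be missed, and the similarity itself is only an estimate.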

