Playing devil's advocate here: might I ask why? The paper seems pretty thorough w.r.t. its description of the corpora, models, and hyperparameters used. They even point to the exact implementation of their evaluation scoring and include a few examples in the paper itself. Even if they put up a demo instance for the required infrastructure, it would be dead as soon as it hit HN and, as research code goes, likely a security hazard for wherever it's hosted.

In my view there seems to be enough here to replicate and validate the claims yourself if you wanted to. With a basic level of trust in academic integrity, I'm completely fine with this paper.

