A big part of writing the supremacy paper was optimizing the simulators. We did three different styles of simulation:
1) A state vector simulator with hand-rolled SIMD assembly (called qSim; not yet publicly released). This required too much space at 53 qubits. (Well, unless you're going to use the majority of all disk space on Summit, which brings its own obstacles. IBM says they can run it that way in a few days, but we'll see.)
2) Treating the quantum circuit as a tensor network and doing optimized contraction to avoid the space blowup (called qFlex https://github.com/ngnrsaa/qflex). This required too much time at 53 qubits.
3) Custom code written to run on a single supercomputer rather than a distributed cluster of machines.
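The space blowup that kills option 1 is simple arithmetic: an n-qubit state is a vector of 2^n complex amplitudes, so memory doubles with every qubit. A minimal sketch of that arithmetic (my own illustration, not the qSim code; the 8-bytes-per-amplitude figure assumes single-precision complex numbers):

```python
def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory to hold a full n-qubit state vector.

    Assumes complex64 amplitudes (8 bytes each); double precision
    would double every figure below.
    """
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (30, 40, 53):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.0f} GiB")
# 30 qubits fits in a laptop (8 GiB), 40 needs a big server (8,192 GiB),
# and 53 needs 67,108,864 GiB = 64 PiB -- hence spilling to Summit's disks.
```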
There's only so much effort you can put in before you have to call it. At this point I think it's more likely for an algorithmic break to save a factor of 10 than for further optimization to save a factor of 10, although someone should probably try using GPUs or FPGAs.
I also take the view that if it takes a month to produce a new optimized implementation of a classical simulator that beats the quantum hardware, then the quantum hardware is still outperforming classical hardware for that month. The theoretical bounds are important, but in the day-to-day context of a race they aren't directly relevant.