Incidentally, radar systems face the same problem: if more than one radar pulse is in flight at a time, an echo can't be unambiguously matched to the pulse that produced it.
One solution is to not send single-frequency pulses, but instead send chirps (https://en.wikipedia.org/wiki/Chirp). These pulses _can_ overlap, since the end of one pulse and the beginning of the next can be distinguished by their frequency components.
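To make the chirp idea concrete, here is a rough sketch (not how any particular radar does it) of two overlapping chirp echoes being pulled apart by correlating against the transmitted chirp. The function names, the normalized frequencies (cycles per sample), the delays, and the gains are all made up for illustration.

```python
import math

def make_chirp(n_samples, f_start, f_end):
    """Linear chirp; frequencies are in cycles per sample (assumed units)."""
    rate = (f_end - f_start) / n_samples
    return [math.sin(2 * math.pi * (f_start * n + 0.5 * rate * n * n))
            for n in range(n_samples)]

def matched_filter(received, template):
    """Slide the template over the received signal and correlate at each lag."""
    m = len(template)
    return [sum(received[lag + i] * template[i] for i in range(m))
            for lag in range(len(received) - m + 1)]

def find_echoes(correlation, count, guard=5):
    """Pick the `count` strongest peaks, ignoring lags within `guard` of a peak already taken."""
    scores = list(correlation)
    delays = []
    for _ in range(count):
        best = max(range(len(scores)), key=scores.__getitem__)
        delays.append(best)
        for i in range(max(0, best - guard), min(len(scores), best + guard + 1)):
            scores[i] = float("-inf")
    return sorted(delays)

chirp = make_chirp(200, 0.05, 0.45)

# Two echoes of the same 200-sample chirp, starting only 80 samples apart,
# so they overlap for most of their duration.
received = [0.0] * 400
for delay, gain in [(100, 1.0), (180, 0.8)]:
    for i, sample in enumerate(chirp):
        received[delay + i] += gain * sample

echo_delays = find_echoes(matched_filter(received, chirp), count=2)
print(echo_delays)
```

Because the chirp's frequency keeps changing, it only lines up with a copy of itself at one specific lag, which is why the overlapping echoes still produce two separate sharp peaks. A single-frequency pulse of the same length would give a broad, ambiguous correlation instead.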
> But unlike network packets, we lose all context once the waves leave our pipeline and we have no way of uniquely identifying each wave.
It would take additional processing, but the loss of context (which causes the problem above) could also be remedied by encoding real information, such as a sequence number, into the signal using ordinary signal modulation techniques.
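As a rough illustration of the sequence-number idea, here is a sketch using binary frequency-shift keying, which is just one of many modulation schemes that would work. The symbol length, the two tone frequencies, the 8-bit width, and the function names are all assumptions picked to keep the example small.

```python
import math

SYMBOL_LEN = 64          # samples per bit (assumed)
FREQ_ZERO = 4 / 64       # cycles per sample for a 0 bit (assumed)
FREQ_ONE = 8 / 64        # cycles per sample for a 1 bit (assumed)
BITS = 8                 # width of the sequence number (assumed)

def encode_sequence_number(seq):
    """Turn an 8-bit sequence number into a tone burst, most significant bit first."""
    samples = []
    for bit_index in reversed(range(BITS)):
        freq = FREQ_ONE if (seq >> bit_index) & 1 else FREQ_ZERO
        samples.extend(math.sin(2 * math.pi * freq * n) for n in range(SYMBOL_LEN))
    return samples

def tone_energy(chunk, freq):
    """Energy of `chunk` at `freq`, insensitive to the tone's phase."""
    in_phase = sum(x * math.cos(2 * math.pi * freq * n) for n, x in enumerate(chunk))
    quadrature = sum(x * math.sin(2 * math.pi * freq * n) for n, x in enumerate(chunk))
    return in_phase ** 2 + quadrature ** 2

def decode_sequence_number(samples):
    """Recover the sequence number by comparing the two tone energies per symbol."""
    seq = 0
    for start in range(0, BITS * SYMBOL_LEN, SYMBOL_LEN):
        chunk = samples[start:start + SYMBOL_LEN]
        bit = 1 if tone_energy(chunk, FREQ_ONE) > tone_energy(chunk, FREQ_ZERO) else 0
        seq = (seq << 1) | bit
    return seq

burst = encode_sequence_number(173)
print(decode_sequence_number(burst))
```

This assumes the receiver already knows where the burst starts. In practice you would also need some kind of sync marker, and tones chosen to survive whatever the signal path does to them.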
More information on it and audio latency in general can be found in the Ardour manual here: