I imagine that the lack of stable reference points would be an issue on the ocean, both for the algorithms under test and for validation. (I use "algorithm" liberally here as the superset of both the algorithm and the hardware it runs on; I think in reality these tests are mostly for correcting parameters that are hard to determine with sufficient accuracy on paper.)
If you want to write off an entire booster, adding a dummy tower somewhere in the desert would not add much to the bill and could well be worth it. Or maybe even a not-quite-so-dummy tower, but off-site, where even in the best case the booster would be scrapped on location instead of reflown, however hypothetical that best case might be?
But in a real landing, they wouldn't want to be accurate relative to some GPS goalpost; they need to be accurate relative to whatever physical reality turns out to be. Reconciling expectations about where the landing location was supposed to be with where sensor input says it actually is must be of major importance.
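As a toy illustration of that reconciliation (everything here is my assumption, not anything SpaceX has published): a minimal inverse-variance fusion of an absolute GPS fix with a hypothetical tower-relative measurement shows how, once the relative sensor gets tight enough on final approach, the GPS goalpost stops mattering.

```python
# Hypothetical sketch: inverse-variance fusion of an absolute GPS fix with a
# tower-relative measurement (radar/vision/whatever). Sensor names and noise
# figures are illustrative assumptions, not an actual guidance stack.

def fuse(gps_pos, gps_sigma, rel_pos, rel_sigma):
    """Combine two noisy 1D position estimates, weighting each by 1/variance."""
    w_gps = 1.0 / gps_sigma**2
    w_rel = 1.0 / rel_sigma**2
    fused = (w_gps * gps_pos + w_rel * rel_pos) / (w_gps + w_rel)
    sigma = (w_gps + w_rel) ** -0.5
    return fused, sigma

# Far out: GPS (sigma ~1 m) vs. a coarse long-range relative fix (sigma ~5 m).
print(fuse(102.0, 1.0, 98.0, 5.0))   # GPS dominates the estimate

# On final approach: tower-relative sensing tightens to ~0.05 m, so the
# absolute GPS coordinate barely influences the fused position anymore.
print(fuse(102.0, 1.0, 98.0, 0.05))  # relative measurement dominates
```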
That link tells us that the signals are intended to be sent with an accuracy better than 200 cm 95% of the time. Using more frequencies really only helps with avoiding errors from bad/misleading reception, so it's hardly relevant in "ideal situations".
Long-term stationary receivers can resolve positions to millimeters, but dropping rockets from orbit isn't exactly something that would be considered "stationary" in this context.
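To make the "stationary" point concrete, a toy sketch (assumptions mine; real millimeter-level results also rely on carrier-phase processing, not plain averaging): a receiver sitting still can average many fixes of one unchanging truth, so its error shrinks roughly as 1/sqrt(N). A descending booster never revisits the same position, so it gets no such luxury.

```python
# Toy illustration, not a GNSS implementation: averaging N independent fixes
# of a fixed position shrinks the standard error by sqrt(N). The noise model
# (i.i.d. Gaussian, sigma = 1 m per fix) is an assumption for illustration.
import random

truth = 0.0   # fixed position of a survey receiver, metres
sigma = 1.0   # per-fix noise, metres

for n in (1, 100, 10_000, 1_000_000):
    fixes = [truth + random.gauss(0, sigma) for _ in range(n)]
    mean = sum(fixes) / n
    print(f"N={n:>9}: error of mean ~ {abs(mean - truth):.4f} m "
          f"(expected ~ {sigma / n**0.5:.4f} m)")
```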