This seems too easy a recipe to be worth it in the medium term - there is no moat. Better to cover your data with very strict laws, like Google does with its exclusive deals for medical data use.
Like you say, it's the access to data itself that is valuable. At the risk of oversimplifying: building models is the easy part. There are plenty of smart people who can do that.
Data that is expensive to acquire is the best long-term play for an ML company. Either expensive due to regulations or expensive due to the quality of sources. Ideally both.
That's why this is unfair. The health industry incentivizes (and often mandates) open publishing of scientific results, but patient data is reserved with gold chains for the exclusive use of Google.