IMO dataframes are the reason dynamic typing fits data science so well. It's certainly possible to represent a single dataframe as a static type, but representing all the slicing, column removal, joins, etc. is actually pretty hard without dependent tricks. So bypassing types for dataframes is preferable. On your ML engineering point, the other side of it is that once your dataframe's schema is finalized, it really should be statically typed so that assumptions can safely be made about what is and isn't inside of it.
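To make the "hard without dependent tricks" part concrete, here's a minimal sketch (names like `TypedFrame`, `Users`, and `Orders` are hypothetical, not from any real library) of a pandas wrapper that carries its schema in the type. The sticking point is that the schema of a join result is a computation over two schema types, which Python's type system can't express:

```python
from typing import Generic, TypeVar

import pandas as pd


class Schema:
    """Marker base class; subclasses list columns as annotated attributes."""


class Users(Schema):
    user_id: int
    name: str


class Orders(Schema):
    user_id: int
    amount: float


S = TypeVar("S", bound=Schema)
T = TypeVar("T", bound=Schema)


class TypedFrame(Generic[S]):
    """A pandas DataFrame tagged with a static schema type."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def join(self, other: "TypedFrame[T]", on: str) -> "TypedFrame[Schema]":
        # Ideally the return type would be "TypedFrame[S merged with T]",
        # but that merge is a type-level computation -- exactly the
        # dependent-type machinery mentioned above. Without it we either
        # hand-write a schema class for every possible join result or fall
        # back to the untyped base Schema, as done here.
        return TypedFrame(self.df.merge(other.df, on=on))


users = TypedFrame[Users](pd.DataFrame({"user_id": [1], "name": ["a"]}))
orders = TypedFrame[Orders](pd.DataFrame({"user_id": [1], "amount": [9.99]}))
joined = users.join(orders, on="user_id")  # static type: TypedFrame[Schema]
```

Column removal and slicing have the same shape of problem: each one maps an input schema to a new schema, and that mapping has to happen at the type level.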
> representing all the slicing, column removal, joins, etc. is actually pretty hard without dependent tricks
Disagree in the strongest possible terms, tbh.
It's the lack of static typing that gets you 3/4 of the way down your experimental pipeline, only for your code to fail because the column "trianing_batch" can't be found. Huge productivity loss, even with rapid iteration.
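A toy illustration of that failure mode (the pipeline and column names are made up): the typo'd column name isn't flagged anywhere until the line that uses it actually runs, after the earlier steps have already finished.

```python
import pandas as pd

df = pd.DataFrame({"training_batch": [0, 1, 1], "loss": [0.9, 0.7, 0.6]})

cleaned = df.dropna()                                    # step 1: fine
aggregated = cleaned.groupby("training_batch").mean()    # step 2: fine

# step 3: raises KeyError: 'trianing_batch' -- only now does the typo surface
summary = cleaned.groupby("trianing_batch")["loss"].sum()
```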
We must work very differently. I couldn't fathom that happening to me, if only because I compulsively peek at samples of the data frame every step of the way, in order to make sure the data look reasonable all the way through.
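Roughly the habit I mean, as a sketch (the `peek` helper, pipeline steps, and column names are made up for illustration): print a small sample and the shape after every transformation so anything off shows up immediately.

```python
import pandas as pd


def peek(df: pd.DataFrame, step: str, n: int = 3) -> pd.DataFrame:
    # Show shape plus a small random sample so mistakes surface right away.
    print(f"--- after {step}: shape={df.shape} ---")
    print(df.sample(min(n, len(df))))
    return df


raw = peek(pd.DataFrame({
    "day": ["mon", "mon", "tue", "tue"],
    "clicks": [3, 3, 5, 2],
}), "load")
deduped = peek(raw.drop_duplicates(), "dedupe")
daily = peek(deduped.groupby("day", as_index=False)["clicks"].sum(), "aggregate")
```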