
There’s really no harm in doing that, and it’s still a pretty good idea.

I generally try to take my data sources as far as possible within the database, then leave framework/language-specific things to the last step. That means that, if nothing else, someone else picking up your dataset in a different language/framework toolset doesn't need to pick up yours as a dependency, and you're not spending time re-implementing what a database can already do (and can do more portably).
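A minimal sketch of what "leave it in the database" looks like in practice, using Python's stdlib `sqlite3` with an in-memory database and a made-up `sales` table (table and column names here are illustrative, not from the thread). Because the aggregation lives in the SQL, an R user could run the exact same query without any Python dependency:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('east', 10), ('east', 20), ('west', 5);
""")

# The GROUP BY happens database-side; the client just receives the result.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 30.0), ('west', 5.0)]
```

The same `SELECT` is portable to any client that can speak to the database, which is the point being made about not re-implementing aggregation per language.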




The only downside to letting the database do some of the pre-processing is that I don't have a full raw data set to work with within either R or Python. If I decide I need an existing measure aggregated to a different level, or a new measure, I've got to go back to the database and bring in an additional query. So I have less flexibility within the R or Python environment. But you make a good point: there are trade-offs either way, and keeping the dataset as something like a materialized view on the database makes it a little more open to others' usage.
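A sketch of the trade-off described above, again with stdlib `sqlite3` and hypothetical table/view names. SQLite only has plain views (PostgreSQL would use `CREATE MATERIALIZED VIEW` for a persisted, refreshable one); the view is the shared, pre-aggregated dataset, and needing a different grain means going back to the base table with a new query rather than reshaping in R/Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (day TEXT, user TEXT, n INTEGER);
INSERT INTO events VALUES
  ('2024-01-01', 'a', 2), ('2024-01-01', 'b', 3), ('2024-01-02', 'a', 1);
-- A plain view as a stand-in for Postgres's materialized view:
CREATE VIEW daily_totals AS
  SELECT day, SUM(n) AS total FROM events GROUP BY day;
""")

# The shared dataset: anyone with database access can query it directly.
daily = conn.execute("SELECT * FROM daily_totals ORDER BY day").fetchall()
print(daily)  # [('2024-01-01', 5), ('2024-01-02', 1)]

# Needing a different aggregation level means a new query against the
# base table, not a reshape of the pre-aggregated data in the client.
per_user = conn.execute(
    "SELECT user, SUM(n) AS total FROM events GROUP BY user ORDER BY user"
).fetchall()
print(per_user)  # [('a', 3), ('b', 3)]
```

The second query is the "go back to the database" step: the per-day view can't recover per-user totals, so the flexibility lives with whoever can write SQL against the base tables.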



