Ask HN: I am a data analyst and my code is a mess
5 points by elliott34 on Dec 3, 2014 | 11 comments
I've been wondering for a while whether I'd get laughed at for asking this question, but it's gotten to the point where I really need some guidance.
I have a spaghetti code problem. I am a data scientist/analyst, and my day-to-day work is entirely in Python/scikit-learn/pandas: data munging and running models. Right now my code is several hundred lines of data processing steps, filtering, lots and lots of joins and SQL queries, pickle dumps and loads, and print array.shape calls. I try to create as many functions as possible to help organize the code, and I put different parts of the project into different scripts. I use IPython Notebook in the cloud for the interactive portion of my analysis, and Sublime Text 2 for the fixed data processing scripts.
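To make the shape of that workflow concrete, here is a rough sketch of a filter, join, and pickle checkpoint split into small functions. It is not my actual code: all of the names (filter_rows, join_on_id, checkpoint, run_pipeline, joined.pkl) and the sample data are made up for illustration, and plain lists of dicts stand in for pandas DataFrames so the example runs with the standard library alone.

```python
import os
import pickle
import tempfile


def filter_rows(rows, min_value):
    """Keep only rows whose 'value' is at least min_value."""
    return [r for r in rows if r["value"] >= min_value]


def join_on_id(left, right):
    """Inner join two lists of dicts on the 'id' key."""
    right_by_id = {r["id"]: r for r in right}
    return [
        {**row, **right_by_id[row["id"]]}
        for row in left
        if row["id"] in right_by_id
    ]


def checkpoint(obj, path):
    """Pickle an intermediate result to disk, then load it back."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)


def run_pipeline(measurements, labels, workdir):
    """Run each step in order; every step is a plain function call."""
    filtered = filter_rows(measurements, min_value=0.5)
    joined = join_on_id(filtered, labels)
    result = checkpoint(joined, os.path.join(workdir, "joined.pkl"))
    # Stand-in for the print array.shape habit: (rows, columns).
    print("shape:", (len(result), len(result[0]) if result else 0))
    return result


if __name__ == "__main__":
    # Hypothetical sample data, only here to exercise the functions.
    measurements = [
        {"id": 1, "value": 0.9},
        {"id": 2, "value": 0.1},
        {"id": 3, "value": 0.7},
    ]
    labels = [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}]
    with tempfile.TemporaryDirectory() as workdir:
        run_pipeline(measurements, labels, workdir)
```

Even in this toy version, each step takes its inputs as arguments and returns its output, so the steps can be tested and reordered independently of each other.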
Long story short, I have a physics background and was never taught how to properly structure my workflow for this type of coding. Should I be creating more classes and objects?
Are there any resources out there on how to code and structure large machine learning projects like this? Or is it doomed to be spaghetti code?