
This is exactly why I'm asking. So, since you are apparently senior in the field, do you have any concrete recommendations?



Sure. Let me start by saying that there is no agreed-upon standard for the data science interview/competency game. That is to say, even within our own company, where we are supposed to have standard guidelines, different organizations have different perspectives. So I'm sure other data scientists may disagree with me.

It's hard for me to just hand you a complete project idea, since there is a whole world of possibilities. However, I can share some heuristics that you may find useful.

The first is, do you have any domain knowledge that would lend itself to a data science project? This would be one step in differentiating your idea, and allowing it to build off of existing ideas you have, as opposed to an off-the-shelf classifier project from a data science project site. This could be anything from biomedical data, to sports data, to market data etc. This will let you highlight your ability to dive deep and apply data science tools to a specific problem. Even if I'm interviewing someone who worked with medical data before, their ability to do data research building off domain knowledge is a strong signal that they will be able to do it again in a new domain.

The second is, can you get a semi-novel dataset? Even if it's just writing a full-fledged Python script to scrape some APIs or (maybe) web pages, you want something that shows you hunted down the data and wrangled it, as opposed to downloading data_science_project.csv.
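To make the "hunt down the data" step a bit more concrete, here is a minimal sketch of collecting records from a paginated JSON API using only the standard library. The `?page=N` query parameter and the `items` key are assumptions for illustration; real APIs paginate in different ways, so check the documentation (and the rate limits) of whatever source you use.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str) -> dict:
    """Download one URL and parse the body as JSON (standard library only)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_records(
    base_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 100,
    pause_s: float = 1.0,
) -> Iterator[dict]:
    """Yield records page by page until the API returns an empty page."""
    for page in range(1, max_pages + 1):
        # Hypothetical pagination scheme: ?page=N, records under "items".
        payload = fetch(f"{base_url}?page={page}")
        items = payload.get("items", [])
        if not items:
            break
        yield from items
        time.sleep(pause_s)  # be polite to the server between requests
```

Passing `fetch` as a parameter is a design choice, not a requirement: it lets you test the pagination logic with canned pages instead of hitting the network.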

Once you get your data, try to think of a properly engineered way to store it. A csv on your laptop isn't always bad, but it's a plus to show familiarity with AWS/Azure APIs and to store your data in the cloud in a 'nicer' format (e.g. Parquet), or if necessary, in a database.

In your code, can you have a lightweight API to retrieve your data? Again, I'd be looking for something that tells me you can get, store, and retrieve data in line with best practices, so if you're hired and the data is messy or hard to get at, you can manage it yourself rather than needing an engineer to do all the work for you, with your job only starting once you have a csv on your local machine.
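As a rough illustration of a "lightweight API" over your stored data, here is a small sketch that uses SQLite from the standard library as a stand-in for whatever storage you actually pick (cloud storage, Parquet files, a hosted database). The `MeasurementStore` class, the table layout, and the column names are all hypothetical; the point is only that the rest of your code calls `save` and `load` instead of touching files directly.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


class MeasurementStore:
    """Tiny data-access layer: callers never write SQL or open files."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS measurements ("
            " source TEXT NOT NULL,"
            " day TEXT NOT NULL,"  # ISO dates sort correctly as text
            " value REAL NOT NULL,"
            " PRIMARY KEY (source, day))"
        )

    def save(self, rows: Iterable[Tuple[str, str, float]]) -> None:
        """Insert rows, overwriting any existing (source, day) pair."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO measurements VALUES (?, ?, ?)", rows
            )

    def load(
        self, source: str, start: Optional[str] = None, end: Optional[str] = None
    ) -> List[Tuple[str, float]]:
        """Return (day, value) pairs for one source, optionally date-bounded."""
        query = "SELECT day, value FROM measurements WHERE source = ?"
        params = [source]
        if start is not None:
            query += " AND day >= ?"
            params.append(start)
        if end is not None:
            query += " AND day <= ?"
            params.append(end)
        query += " ORDER BY day"
        return self.conn.execute(query, params).fetchall()
```

If you later move the data to the cloud, only the inside of this class has to change; the analysis code that calls `load` stays the same.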

Once you have all this, can you thoughtfully try out some different methodologies, along with some interesting exploratory data analysis? This part is harder to give concrete recommendations on, but I'd like to see something that considers the problem space and the data type, and chooses the right algorithm. Then for the algorithm you chose, I'd like you to have a medium-depth understanding of how it works under the hood. The bad case is you just get some data, throw it at xgboost or a nnet, and say "well I read the API docs and sorta know how they work."

(As a side note, try not to over-complicate the problem. Always do a simple model as well as the exciting model you want to try, because exciting models are usually hard to manage in production.)
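One way to read "always do a simple model" is to start with the simplest possible baseline and make the exciting model beat it. Below is a minimal sketch for a classification problem, assuming a majority-class baseline; the function names are hypothetical and any model exposed as a `predict(features)` callable can be scored the same way.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_baseline(train_labels: Sequence[Hashable]) -> Callable[[object], Hashable]:
    """Return a predictor that always answers with the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common_label


def accuracy(
    predict: Callable[[object], Hashable],
    features: Sequence[object],
    labels: Sequence[Hashable],
) -> float:
    """Fraction of examples where the prediction matches the true label."""
    hits = sum(predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)
```

If the fancy model's held-out accuracy is not clearly above what `accuracy(majority_baseline(train_labels), test_features, test_labels)` gives you, that is worth knowing before you spend time on anything more complicated.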

Lastly, put it on your github, and really highlight it on your resume or in the interview. I often gloss over portfolio project bullet points on a resume, but I'll always check a github if it exists.

Even if the project is half-baked or not as exciting as you want, having concrete github code I can read is worth so so so much more than any coding question I could ever ask.

Finally, my recommendation is for a data scientist generalist type. I do know some data scientists who are extremely valuable, more valuable than I am, who can't do any of that stuff. Usually they just work in a Jupyter notebook using data handed to them. In their case it tends to be because they are so talented/trained in, say, deep learning, that they are most valuable to the team when someone else does everything else for them while they tweak hyper-parameters.


Thank you very much. I really like your advice. I would like to build on my domain knowledge. I'm a PhD student in CS focusing on algorithms.


While I’m actually that guy who tweaks DL hyperparameters, this is excellent advice. These are the things I’d be looking for in a DS candidate.



