
>The problem is that real world, physical business data is taken from what one can get, not what one would want.

Yea, sorry, but part of your job in data sci is to collect the right data. Data doesn't magically exist and we are not stuck with what's out there. A data sci's job is to figure this stuff out. Tech has a weird culture of not doing their job. Kind of like the ZipRecruiter ads. "Working as a hiring manager, hiring new people is the worst part of my job." Bitch, that IS your job. If you don't do that, what's the point in keeping you around? Beekeepers collect honey. Yea, it's not exactly easy if you're not careful, but they don't bitch about it because they knew what they signed up for.

Data sci/analysis is about collecting and analyzing data in ways that aren't straightforward. Because if it were easy and didn't require any effort, why would they be needed?



You're both right. Sometimes we have to work with the data we have. Other times we have to create or buy the data we need.

Some companies aren't experienced at building infra to collect data and don't know how to do it. Or their environment is too complex or expensive to sample data from. The data scientist's job in such cases is to do their best with what exists, show success and make a business case for investing resources into data collection infrastructure.

In other cases, when the required sensors don't exist and the information is critical to decision making, you can either buy the data or work with an engineering group or external vendor to integrate and build out the sensors needed. Need foot traffic data? You can buy from a data marketplace like https://datarade.ai, where there are various vendors (like SafeGraph -- which was recently used in a COVID-19 study published in Nature) aggregating foot traffic data from cell phones. There are also datasets that can be used as inferential proxies (so-called "alternative data") for the actual data one needs.

Need to collect in-store data? I was at the NRF conference (the world's largest retail tech conference) in NYC back in January and there were a boatload of vendors hawking different types of retail analytics sensors.

In certain small-scale operations, you can even engage field operations and get the in-store retail staff to help collect data and upload it manually. (You'll need a good relationship with the field supervisor, of course.)

Sometimes the data does exist but is inaccessible, say in the ERP or in some proprietary format -- then you have to negotiate with certain business groups or with OEM vendors in order to get the data out.

It all boils down to whether the data has value that exceeds (by a margin) the cost of collecting it. If the answer is yes, there's often a way to do it (albeit sometimes imperfectly).

Is it part of the data scientist's job description to create or participate in creating data collection infrastructure? I guess this depends on the company but for many companies the answer is yes.


I agree with you too. I think it's a mix of execs and management not fully grasping the job and the implications of shortcutting too much. At the same time, too many data sci are in it for the keyword/sexiness of the job and are not the personality type to take hardline stands. Inexperience leads to a lack of trust from higher-ups. A lack of backbone from the experienced results in more incompetent work, which results in even less trust. Experienced personnel leave, more inexperienced people come in and do things the cheap, shortcut, buy-bad-data way, with no backbone to push back when they see it... and you can see how this spirals into the shitshow that I've been noticing in some consulting projects I've been on.

But yes, data collection should be part of their job. I'm having a hard time understanding why the person who analyzes the data shouldn't at least have a say in what data is collected.


Have you collected data from and deployed products to a 2000+ store environment?


Okay, how is my argument changed if I answer yes or no? Is what you're talking about a data sci's responsibility or not? If collecting, analyzing and deploying data in reports or a db is too difficult for you, data sci isn't for you. I'm not telling them HOW to do their job. I'm clarifying that you have to DO the job if you signed up for it. Don't like it? Get out. We've all screwed up by taking jobs we didn't like. Nothing wrong with that. Get out of the kitchen if you don't like the heat or the smell.


I'm curious about the relevant experience you're speaking from.

You're trying to make a point about the fact that you need to push the business and/or obtain data yourself, but I'm saying that can be a vastly more difficult problem than you think (or just flat impossible) at scale.


Why should a single data scientist be responsible for obtaining data from 2000 stores? Data at scale requires people at scale. Data engineers would do this job.



