I'm now doing the NYC Data Science course (https://online.nycdatascience.com) but am getting concerned I may have missed the boat. What does HN think?
- The fundamental skills that you need are mathematics and software engineering. Depending on your background it might take years of additional studying.
- There is a big oversupply of people for the junior-mid level data science jobs. There are more people who want to get in the field than there are jobs. If you drastically switch careers, you'll take yourself out of a field where your skillset is incredibly rare and your competition is limited and put yourself in a place where everyone else wants the same job.
- The fact that you have a PhD is going to help you. Personally, I don't think that a PhD in a field other than mathematics/computer science is that relevant but employers tend to favor applicants with PhDs mostly because there are too many candidates for any given job and asking for a PhD is just a strong initial filter. There are also research jobs within data science for which a PhD requirement (in a relevant discipline) makes more sense but these are a small proportion of all the data science jobs.
- If you're already employed with your agriculture PhD, there must be a number of opportunities for you to apply the techniques that you're currently learning without leaving the industry. That's probably the path that I would suggest - it would allow you to expand your skillset without taking big risks and you'll have more options in the future. Use the career capital that you already have and explore your options instead of making a sharp turn in your career direction that might leave you disappointed.
This is real gold. If you have existing knowledge about some area, then you can learn and apply those things; you'll get the traction that you want earlier, and it will be more rewarding in the end.
If there is a chance, I won't miss that opportunity.
Without knowing all the details of your situation, it seems at least a much lower risk path to acquire some data science skills--maybe your company will even pay for it--that you can pair with your existing domain knowledge.
This means it has nearly infinite caveats and assumptions. A specifications doc or readout will never sufficiently express all of these. Especially if humans were involved in the data generated.
Consequently, the most useful data products are going to turn on whether or not you (did this small thing) to (correct for this obvious bias or flaw that anyone familiar with the industry knows).
Yea, sorry, but part of your job in data sci is to collect the right data. Data doesn't magically exist and we are not stuck with what's out there. A data sci job is to figure this stuff out. Tech has a weird culture of not doing their job. Kind of like the Zip Recruiter ads. "Working as a hiring manager, hiring new people is the worst part of my job." Bitch, that IS your job. If you don't do that, what's the point in keeping you around? Bee keepers collect honey. Yea it's not exactly easy if you're not careful, but they don't bitch about it because they knew what they signed up for.
Data sci/analysis is about collecting and analyzing data, in not straightforward ways. Because if it were easy and didn't require any effort, why would they be needed?
Some companies aren't experienced at building infra to collect data and don't know how to do it. Or their environment is too complex or expensive to sample data from. The data scientist's job in such cases is to do their best with what exists, show success and make a business case for investing resources into data collection infrastructure.
In other cases, when the required sensors don't exist and the information is critical to decision making, you can either buy the data or work with an engineering group or external vendor to integrate and build out the sensors needed. Need foot traffic data? You can buy from a data marketplace like https://datarade.ai, where there exist various vendors (like SafeGraph -- which was recently used in a COVID19 study published in Nature) aggregating foot traffic data from cell phones. There are datasets that can be used as inferential proxies (so-called "alternative data") for the actual data one needs.
Need to collect in-store data? I was at the NRF conference (the world's largest retail tech conference) in NYC back in January and there were a boatload of vendors hawking different types of retail analytics sensors.
In certain small scale operations, you can even engage field operations and get the in-store retail staff to help collect data and upload manually. (you'll need a good relationship with the field supervisor of course)
Sometimes the data does exist but is inaccessible, say in the ERP or in some proprietary format -- then you have to negotiate with certain business groups or with OEM vendors in order to get the data out.
It all boils down to whether the data has value that exceeds (by a margin) the cost of collecting it. If the answer is yes, there's often a way to do it (albeit sometimes imperfectly).
Is it part of the data scientist's job description to create or participate in creating data collection infrastructure? I guess this depends on the company but for many companies the answer is yes.
But yes, data collection should be part of their job. I'm having a hard time understanding why the person who analyzes the data shouldn't at least have a say in what data is collected.
You're trying to make a point about the fact that you need to push the business and/or obtain data yourself, but I'm saying that can be a vastly more difficult problem than you think (or just flat impossible) at scale.
Part of that is that they don’t understand what we do and weren’t trained. The other part is that the business was just told they needed data people.
What businesses need today is to understand what they’re interested in first, and then hire people with proven experience and knowledge to accomplish those goals.
Having a PhD in any applicable subject in addition to sufficient practical knowledge of data science, machine learning, and AI would be a substantial plus for a niche position in the industry, a non-profit, or even in finance.
Don’t give up.
This is key. What do companies actually want when they hire a data scientist? Actionable products that make their business better.
What does it take to produce actionable products? A data strategy (collection, ingress, normalize, enrich, store, expose), a compute provisioning strategy, data engineering (pull from source system(s), land in target stores in a reliable, available, automated manner), data science, and data application (reporting, integration with target systems, app development).
Which of those components do they typically have? A data scientist. Because they just hired one.
Career-wise, the more unifaceted your skillset is, the more you're limited to employers that already have all the other pieces in place. Which effectively limits you to very large enterprise (~T100).
Start to be able to fill some of the other roles yourself, and you can compete for and succeed in smaller and more interesting opportunities.
So much this. If I have to interview another junior-level DS who has a MNIST project in their github and still somehow can't manage fizzbuzz or a fibonacci function I'm probably going to take up religious asceticism.
EDIT: I said junior, but I meant Senior. We're talking people with PhD's who claim to have done extensive software engineering in previous roles.
My hypothesis is that good stats people probably don't visualize (Aphant) or visualize very specific types of data in a unique way. Without visualization, people tend to fall back to logical thinking - or emotional thinking, depending.
For example, I have a friend who can look at 2D seismic data and see what the underground formation looks like in his mind, in 3D.
It's good though because they raise some really valid points about the importance of intuition. I joke with my colleagues that all we're doing is encoding data as a Hilbert space and slapping an algorithm on it, but that elides the fact that intuition like the child is talking about is important for knowing how to build that Hilbert space.
The rest is some unsupervised stuff like k-means and PCA.
What would you like to see instead? (INB4 CNNs/RNNs other deep learning topics)
One minor thing to add is that not only is the data science field overflowing with junior/mid candidates, but so is the entire computer science field as a whole.
Honest question: is there any field where that isn’t true right now? Seems like there are no junior jobs in any sector.
- embedded systems programmers, meaning people who know real-time systems, have good C/C++ knowledge, know their way around the Linux kernel and are also able to do basic things with a scope.
- good(!) C++ programmers in general
- people who know devops and software development infrastructure
C++ might not be sexy, but there's a vast amount of legacy software out there which is not going away anytime soon.
By the way, "junior C++ developer" is a euphemism because no company will hire a junior C++ developer. It's one of the most difficult programming languages, and it can take years of practice and study to be able to use it.
And there are a lot of foot guns around... ;-)
Why were excellent C++ skills sometimes harmful?
Also, it did in our case take you away so far from the bare metal, that the code was elegant, but slow. It did not play well with our static allocators and allocated/deallocated way too much, especially temporaries.
You might i.e. check out talks like .
Maybe it could be that he didn't know all the requirements up front and his mediocre colleague accidentally wrote code that better suited the company's unstated requirements (e.g. portability to niche systems). But that's hardly the most likely explanation.
I feel that it's rare that companies choose the second road so usually they pay a premium.
The most fun example that I currently have is that my LinkedIn profile is super wide. I market myself as a generalist software dev, because that's what I am. I'm not a specialist and I've dabbled in almost everything.
Naturally, no recruiter contacts me because of that. But you know who does? A European company looking for a reverse engineer.
I found it quite surprising that they were capable of reading that I have some IDA Pro experience, because it's in the fine print of my profile and not at all readable when you skim it.
Wouldn't searching profiles for "IDA Pro" find it?
They search for keywords, and rely on the search algorithm prioritisation and a good looking one-line summary of your profile to pick you out of pages of search results.
They'll find your "IDA Pro" if that's in the list of terms they are looking for, no matter how deeply it's buried in the description of your past jobs. If they don't read your profile they might not even know that it's buried.
I feel this is true for most tech jobs. I see tons of senior job postings, but relatively very few entry-level or mid-level postings.
Having a PhD shows that you have committed to one field hard enough and long enough to be granted the title, even if it is a PhD in philosophy.
Also, if you switch from another career try to figure out how you can use your past experience to differentiate yourself from other people.
For example, I have studied theoretical math and I have spent 20 years as a software developer working on varied projects from embedded to algorithmic trading to backend to frontend applications.
I am currently switching to electronics and will use my knowledge to work on projects that require a high level of both math and software engineering skills. Think in terms of realtime control systems, building reliable and performant stuff (reliability to differentiate from competition), etc.
Never think of moving as rebooting your career. Always start with the mindset of building on top of what you know -- always move forward, not backward.
A little change in mindset goes a long way.
I too am a pure maths PhD with a background in software, but I am somewhat disappointed with the data science world.
> Having a PhD shows that you have committed to one field hard enough and long enough to be granted the title, even if it is a PhD in philosophy.
Yes, exactly. That's what I meant by "strong initial filter".
Excellent summary but this one bit is trying to gloss over reality. It will most certainly take multiple years of additional studying when you come in contact with anything related to software engineering, because that's just what it is.
Of course there is no shortage of people who learn one language, one framework, one tool and sit on their ass depending on the people they work with to pick up the slack. If that's you, none of the things being said in this thread matter - just go get your job, make your money and live your life :)
Which shouldn't be too much of a problem because they have the skills to analyse the market and find a local optimum that suits them.
I'm interested in this space; I do some work with agricultural data acquisition hardware and software (e.g., soil moisture, environmental conditions, sap flow, plant/fruit growth monitoring, irrigation, fertiliser application) and I'm interested in ways this data could be used in predictive models, but I'm not at the stage of being able to focus on that aspect yet (still getting the core data logging/display tech working well, though we’re nearly there).
Feel free to get in touch (email in profile) if you'd like to connect and discuss.
To address your question, I think the world is still mostly at a very basic stage in its use of data analysis and statistics. Most of the talented people are employed on big salaries by a relatively small number of large companies with huge budgets for specific applications (e.g., ad targeting, risk assessment, algorithmic trading).
But outside of that, not much is happening, so I think there are big opportunities to apply data science in new fields and make the benefits more widely distributed.
I see what you did there.
But seriously, if you have a background in agriculture (and don't hate it) and want to get into data science, aim for the intersection of the venn diagram between agriculture and data science.
I understand agriculture is getting quite technical and data driven these days, and that can surely only become more the case in the future. Especially if vertical farms and robot tractors become a real thing.
Whoever can crack ground level data with something like Planet's bird's eye view is going to be onto something groundbreaking.
This is the most straightforward route imo. Just pick up some of the skills required to get going, which it sounds like they are based on the post, and just start tweaking with stuff in your field. They already have a great advantage of specialized knowledge about the subject they would be applying it to and Ag, from someone who grew up and worked on a row crop farm, seems very ripe for exploration through data science.
Company B - have no clue what ML or AI is and feeling the heat. They could be a multi million dollar company or a small SMB.
You will always find both these A & B, at least until ML and AI are well democratised. They are not, not even close. We are at the early stage of the curve still, but moving forward there will be rapid growth in the next 5-8 years.
You have a few options:
1. Start with SQL. It’s not hard; join as an analyst and learn to code. Make sure the team or product you join deploys models.
2. Learn basic Python and some orchestration tools (Airflow, Spark or the AWS/Azure equivalent). Join as a data engineer along with basic SQL skills.
On the other hand the analytics/sales teams have many DS and MLEs, a DA is basically a “junior” for them, but they routinely have to reach over to an engineering team to do pretty basic SWE skills that they are supposed to cover themselves
What boat do you think you're catching besides a reliable middle class job?
You haven’t even missed the boat in terms of being able to make money off raw buzzwords and zero skills.
At the very basic technical level, there’s infinite work to be done optimising machine learning systems. This includes not just the fashionable issues of faster more accurate (or even less accurate in terms of floating point!) deep learning, but also moving Bayesian approaches like MCMC to multiple cores and GPUs.
There’s infinite work to be done on finding the right topology for a machine learning system. This applies not just to neural network layers but also to traditional stats (i.e. multiverse analysis).
There’s infinite work to be done in understanding, cleaning and preparing datasets. Something as clunky as tidyverse can’t be close to the final form here. We’ve only just started talking about feature stores etc.
There’s infinite work to be done improving notebooks, integrating better software engineering practice into the workflow but also in terms of productionisation of models created therein.
All this is just platform stuff as well. It doesn’t even touch on the fact that businesses everywhere are terrible at formulating questions to be answered by stats, terrible at communicating those answers and terrible at even knowing this is a valuable endeavour in the first place.
I cannot imagine a boat harder to miss.
I find the data scientist label misleading.
Roughly 70% of the data scientists I've encountered are actually Excel analysts with little experience outside of a Windows desktop bar Facebook on a Mac. They're unable to use basic software engineering tools such as git, vscode and python. Excel users and their managers are hostile to solutions that aren't Excel-like. They will fist-fight you if you restrict them from downloading and exploring data on their computer. Few understand their compliance/legal obligations.
Another 20% are familiar with a wide range of tools - such as Matlab, R, Jupyter notebooks and various ML/AI toolsets. As developers they're unaware of the tech stack, short of "I installed Anaconda and it doesn't work", but are happy to work in the cloud and learn new tech. They understand PII requirements and memory/cpu limits but don't always demonstrate the latter in practice. Nonetheless they produce the bulk of your analysis, having studied classification, and are reasonably cost-efficient if you pair them with a SWE.
The final 10% have mastered containers, venvs, wheels, cloud sdks and how to configure their software in an environment-independent way. They require help to achieve production quality but are great self-starters. Given enough time and support they're able to quickly replicate this effort and teach others. As relative superstars they're in high demand which makes capacity planning difficult. This pushes up their premium.
IMO the best data scientists are 1 in 10. Because we're desperate for quality, almost anyone can assume the title, meaning the market is open to newcomers - you just need to be skilled in Excel (harder than it seems - most developers can learn a lot observing an analyst/consultant use Excel).
To answer your question: No - you're not too late. Just by posting here I expect you'll be in the top 30% - an asset in demand.
The reason it's misleading is because the 70% above (who may be called data scientists) are not actually data scientists, at best they are data analysts.
In general, the core difference between data scientists and data analysts is that the former can code in at least one language (SQL doesn't count, unfortunately).
However, because the term data science became so popular, everyone re-branded their analyst roles as data scientists leading to this concern.
Additionally, the post I'm replying to is pretty biased, as the OP talks about productionising models. While this is a major facet of DS work, it's not the whole thing. TBH, I can find people to productionise models a lot quicker than I can find people who can figure out what to model, and how to measure it.
Some of those people are most comfortable with Excel, and while I'd prefer they used a different tool, I can't argue with their output.
Also, the OP here is focused on deployment of Python ML models, which again is a subset of a very, very broad field.
That being said, I agree with most of the categorisations, except that the two critical attributes of good data scientists are a strong background in statistics and data common sense.
Data common sense is a weird attribute where you look at the numbers and can tell whether they are reasonable. For example, if you are running a mobile gaming company and see an ARPU of $5, something has either gone horribly wrong, or you're going to be a billionaire (assuming you have equity).
This attribute is actually not that common amongst DS people, so it tends to be the limiting factor, rather than ability with containers and deployment (which I do agree is very important).
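A minimal sketch of what such a sanity check could look like in code. ARPU is just revenue divided by active users; everything else here (the function names, the daily figures, and especially the "plausible" range) is a made-up assumption for illustration, not an industry benchmark.

```python
def arpu(total_revenue, active_users):
    """Average revenue per user: total revenue divided by active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total_revenue / active_users


def looks_plausible(value, low, high):
    """Return True if the metric falls inside the range we expect."""
    return low <= value <= high


# Hypothetical daily figures for an imaginary mobile game.
daily_revenue = 50_000.0
daily_active_users = 10_000

# Assumed plausible range for daily ARPU; pick your own from domain knowledge.
PLAUSIBLE_DAILY_ARPU = (0.01, 0.50)

value = arpu(daily_revenue, daily_active_users)
if not looks_plausible(value, *PLAUSIBLE_DAILY_ARPU):
    print(f"ARPU of ${value:.2f} is outside the expected range; check the pipeline")
```

The code is trivial on purpose. The hard part is knowing what range to expect, which is the domain knowledge the comment is describing.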
Unfortunately the phrase's usage has been corrupted by HR departments, and the BA types of job listings now outnumber the "real data scientist" listings.
Unfortunately, it was such a great name that everyone stole it, and they eventually had to call all of their analytics people data scientists (as otherwise they couldn't hire).
I remember being very angry when they changed all the product analytics people to be data scientists as many of them (the ones I knew, at least) were strictly SQL monkeys.
I believe it makes no sense that you discard the knowledge you already have. You can apply data science methods with the knowledge that you have going to work or creating a new company that works in agriculture using new methods.
Do you want to spend your life doing surveillance and spying on people like everybody else? This is fashionable but people are starting to resist and develop antibodies for it as they understand it more. The TV or the phone that I bought spying on me is not acceptable.
Agriculture will grow enormously in the future with things like LED and other methods to give energy to plants, or plankton or whatever. Drones controlling pests or humidity or temperature. Using natural insect predators for bio farming. Growing materials like cotton directly from cell cultures.
The methods that are used today for growing marijuana indoors will be applied for more common things when prices go down.
Nobody better than you to identify the markets that will grow in the future. It is also a very good idea if you know (or associate with someone who knows) economics and marketing and selling.
Reasoning from first principles is extremely useful for identifying new waves that will carry you in the future with no effort.
All those things are hard problems. I have worked on those in the south of Spain and in Holland as an engineer and entrepreneur.
The real life is not Academia, your title means nothing if you can not apply it and give results, but means a lot if you can. So you will need some time to adapt to a different mentality.
On top of that, you have something incredibly valuable to a budding data scientist: domain expertise. Being able to manipulate data is great, but to most effectively solve real-world problems you have to understand (or communicate very closely with someone who understands) the main problems in a domain. I can't count the number of times I've heard scientists frustrated by their lack of data skills, and data scientists frustrated by some arcane domain fact that stymies their model production.
Far from being a liability (or just a sunk cost) your background in agriculture will make you extremely valuable as a data scientist.
Bonus point: it is very easy to learn.
That's a bit of an exaggeration but I will say: most of data science is not the fun stuff. Everyone goes into the field thinking it's all about doing machine learning to uncover astounding insights that will fundamentally transform the business.
In reality about 5% of most data scientists' time is spent doing that. Maybe less. The bulk of the work is getting the data and cleaning it up, doing a tiny bit of the sexy stuff, then writing it up into a report or a presentation to give to people who will either not believe it or scoff because they already knew it.
Think about this - there are a bunch of other software developers who know SQL very well. If your advice was true, then every backend developer would be able to immediately land a data science job and do great at it without having to learn a bunch of math, ML-specific stuff and a whole other tech stack.
For many, the math and "ML-specific stuff" ends up being a very small part of the process. For them, data quality and data cleaning take up the overwhelming majority of hours in a given project, and SQL chops will take you much farther in that kind of an environment.
Plus SQL is not going anywhere anytime soon. So worst case scenario, OP will learn a skill that's not likely to be dated in a few more ticks of the hype cycle.
I find it hard to imagine successful data scientists who don't know SQL.
OTOH, I find it hard to imagine (even though I've met some) successful data scientists who only know SQL.
I suppose it's necessary but not sufficient.
I maintain one machine learning model that is very core to our business but doing 'machine learning' is a very small portion of my job.
> Think about this - there are a bunch of other software developers who know SQL very well. If your advice was true, then every backend developer would be able to immediately land a data science job and do great at it without having to learn a bunch of math, ML-specific stuff and a whole other tech stack.
In some companies data scientists are very software development oriented, but that is not the case everywhere.
Think about this : software developers who know SQL very well usually don't like cleaning data, they don't necessarily have good interpersonal skills required to solve business problems, they are not necessarily interested in solving business problems, and they may tend to think that more software is the solution to all problems.
I fully disagree. Most backend developers don't know SQL beyond their ORM library or CRUD statements. The business intelligence world has utilized SQL to analyze data and make effective business decisions for 40+ years.
ML is 90% hype to check a box for investors, and the actual business problems could be solved by a semi-competent analyst armed with Excel or SQL, not a bunch of overpaid "scientists" who completed a few Andrew Ng courses.
SQL can become super tricky as well (depending on the context). Say you want to get the list of users who were active for 'n' consecutive days from a dataset that has daily user activity for a year. It's not very difficult but needs some effort.
However, for a data science beginner, SQL is the best place to start.
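For the consecutive-days question, one common approach is the "gaps and islands" trick: number each user's active days in order, subtract that row number from the day number, and consecutive days end up sharing the same difference. Below is a sketch using Python's built-in `sqlite3` module. The `activity` table, its columns and the sample rows are all hypothetical, and the query assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Within one user's run of consecutive days, (day number - row number)
# stays constant, so grouping on it isolates each streak.
STREAK_QUERY = """
WITH numbered AS (
    SELECT user_id,
           CAST(julianday(day) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
    FROM (SELECT DISTINCT user_id, day FROM activity)
)
SELECT DISTINCT user_id
FROM numbered
GROUP BY user_id, grp
HAVING COUNT(*) >= ?
ORDER BY user_id
"""


def build_demo_db():
    """Create an in-memory database with a few made-up activity rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (user_id TEXT, day TEXT)")
    rows = [
        ("alice", "2020-01-01"), ("alice", "2020-01-02"), ("alice", "2020-01-03"),
        ("bob", "2020-01-01"), ("bob", "2020-01-03"), ("bob", "2020-01-04"),
        ("carol", "2020-01-30"), ("carol", "2020-01-31"),
        ("carol", "2020-02-01"), ("carol", "2020-02-02"),
    ]
    conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    return conn


def users_with_streak(conn, n):
    """Return users who were active on at least n consecutive days."""
    return [row[0] for row in conn.execute(STREAK_QUERY, (n,))]


conn = build_demo_db()
print(users_with_streak(conn, 3))
```

With this sample data, alice (three days in a row) and carol (four days across a month boundary) qualify for `n = 3`, while bob's longest streak is two days.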
I totally agree with that statement. Being a beginner myself in the DS field, I'm living through this right now in my job. And, as a plus, working with SQL everyday is also helping me a lot to have different perspectives in handling the Python/Pandas DataFrame.
I think previously they'd been used to consuming data from exports and CSVs, scraping websites and plugging into APIs directly. Having to navigate (often messy) database schemas wasn't what they imagined they'd be doing!
Learn Python, it is used universally.
Don't learn R.
Advanced data analysis / machine learning isn’t dated or old-fashioned at all, and I guess it will continue to stay (or: become even more) relevant for at least another decade. Not all ships have left the harbor.
What I've seen in more recent years with growing supply and maturation of departments is the need for specialization. Can you do hardcore statistics? Are you an ML practitioner? Are you a data architect? Basically, the blended role DS hacker is more commonly (and correctly IMO) relegated to various analytical and strategy roles.
Honestly you haven't missed the boat and you don't need a formal education, but I would highly recommend having a depth of skills in one or more areas of data science generally with an example or two to back it up. Basic skills are just table stakes at this point.
1980- Guys I want to learn about micro-controllers, is it too late?
1990- GUI programming
2000- Linux, Internet, you name it
In 30 years' time (at the very least) there will still be Data Science. So if you are really up to it, it does not matter whether you start now or should have started 5 years ago. If you suck at it or really don't like it, it would make no difference either.
50-60's: Operations Research
80's: KDD (knowledge discovery in databases)
90's: analytics (statistics again)
00's: Data science
The tools and problems may have changed, but the core skills (statistics, some coding and data awareness) are identical.
For those with solid Math, Stat, Tech skills that require years to master: it is your time to shine
This is a bit too bitter and jaded (I'm not quite this pessimistic), but I think it needs to be said to counter a lot of the rosier advice.
Having a PhD in a non-CS field is a _massive_ negative in the eyes of potential employers. Even if you're looking at moving into a role where your domain knowledge would be immensely relevant (lots of ag + remote sensing startups these days), you will be seen as underqualified compared to someone with only a BS. You'll be seen as underqualified compared to someone with no BS or a BS in an irrelevant field. You need to be strictly better at multiple roles than anyone they could hire to be considered, and even then you'll be expected to work for 1/3 to 1/2 of what they'd pay someone with only a BS and no experience. Folks usually despise domain experts because they see the role of their company as "disrupting" all of the prior knowledge. You represent what they want to replace and you're likely to disagree with them about key approaches.
You will be much more effective, but no one cares about how much you contribute to the company's bottom line. People only care about appearances.
The appearance companies want is a "self taught college dropout". That holds for data science and machine learning positions every bit as much as it does for developer positions, in my experience.
The upshot of this is that you likely know how to learn quickly. Pick up multiple additional skillsets.
You won't get hired because you're an expert in X field that the company needs + a data scientist. You'll get hired because you can throw together a crappy web app on short notice, or debug their crazy duct-tape-and-glue CI system, or save a lot on their AWS bill by switching some things around. You need those skills _on top of_ being a domain expert and a data scientist.
You have to be able to do more than anyone else they could possibly hire for the role to even be considered. Otherwise, you'll never overcome the fact that you have a PhD in their eyes.
Again, that's the bitter/jaded view. Take the above with a grain of salt.
PS: You might find the book Cloud Computing for Science and Engineering useful - https://cloud4scieng.org/
I see the future of data science benefiting from better software design, e.g. transformative frameworks like pytorch and sklearn, which are powerful tools but hardly fully automated. We'll continue to need skilled workers who are current in the latest software stacks.
It also benefits from what I'll call the "lotto effect", where data scientists will occasionally multiply the bottom line by 10x or more. This is of course rare, but companies will continue to chase that fantasy and hire data scientists because it's too tempting to ignore.
My only advice would be lean heavily into the software side of things. There's too many data scientists who are novice programmers.
It's a 'new kind of job' that's going to be here for the foreseeable future.
While competition might increase for jobs, the number of jobs is likely only grow over time.
Because I must confess that I don't see the immediate connection between an MSc and PhD in agriculture and a (assumingly generic) data scientist job. You should be perfectly capable of performing data science tasks in an agriculture context, but it seems to me you are asking for something different, a "pivot".
At the risk of sounding rude (for which I apologise in advance), are you asking simply whether there's still space on the current bandwagon? If so, I must advise you against it, because employment bandwagons are awful things to get on. Crowded, badly paid, poorly understood, not that useful, scarcely productive; in short, short-term and not very fulfilling.
Is it just a matter of making lots and lots of money with the skills you clearly have and that you must have worked hard to acquire? Well then, there should be much, much better placements for you, outside the lab, in the sector you studied about.
Edit: in the interest of full disclosure, I'm a CS graduate with an MSc in data science and studying for a PhD in AI, but I'm not looking for data scientist jobs and am not interested in them, because I find them boring, unproductive and unfulfilling. I have actually worked as a (freelance) data scientist for a while.
I've personally spoken to a few large enterprises in the agricultural sector who are just beginning to build out their data science department. It seems like the industry as a whole is just getting started in data science.
Also, there are many promising startups emerging in this space, e.g. vertical farming.
Ag, farmers, lawn owners are all errr...
"ripe for disruption" as their genetic systems lag and only innovate at the paltry rate of once per season.
There are many ways to play the field (no pun intended)
from predicting and capitalizing on misfortune to minimizing the same.
I hope some will be most interested in using our talents to mitigate crop failures, maximize sustainable nutrition
or find the optimal low-risk carbon sink.
"Data Smart: Using Data Science to Transform Information into Insight" by John W. Foreman
This is likely the quickest way to start.
Joking aside, this is a very good overview of data science and engineering for the year 2020:
Try to remember that the very beginning of data science is often cleaning up data in Excel by hand, then learning to do it with Excel functions, and then learning to do the same in Python.
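For what it's worth, here is a rough sketch of what that last step can look like, using only the Python standard library. The data, the column names (`field`, `crop`, `yield_kg`) and the `clean_rows` helper are all made up for illustration; a real export will have its own quirks.

```python
import csv
import io

# Made-up, deliberately messy export: stray spaces, mixed case,
# a thousands separator and a missing value.
RAW = """field,crop,yield_kg
 North ,Wheat,"1,200"
south,wheat ,
East,MAIZE,950
"""


def clean_rows(text):
    """Trim whitespace, normalise case, parse numbers, drop rows with no yield."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        field = row["field"].strip().title()
        crop = row["crop"].strip().lower()
        raw_yield = row["yield_kg"].strip().replace(",", "")
        if not raw_yield:
            continue  # skip rows with a missing measurement
        cleaned.append({"field": field, "crop": crop, "yield_kg": float(raw_yield)})
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

These are the same steps you would do with TRIM, LOWER and a filter in Excel, just written down so they can be re-run on next month's file.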
The market seems to be flooded with students with CS skills. Actually, people from all fields — mechanical engineering, computational biology, EE, math etc — are entering data science. It is worrisome.
The actual skills of light programming (R, Python), data literacy/manipulation and some basic modeling (statistical/machine learning, Bayesian methods, time series, etc) are useful for many job roles, and I think will be considered basic skills for college graduates in the near future. This isn't new - operations research, particularly Six Sigma and quality control, has used statistics and some light programming to solve business problems for decades.
By itself, I don't see Data Science evolving the way promised by schools and boot camps. Most of the positions named "Data Scientist" (at non-tech companies) are really just senior business analysts; I work with a group of them at my company and 90% of their day to day is just extracting and analyzing various reports for other managers and directors. When an interesting (and potentially lucrative) business problem does come along, they usually outsource to a specialized analytics firm and the data scientist helps coordinate that project. (If you have a good data dictionary, a clear outcome in mind, and some basic knowledge of the field it's relatively easy to outsource the advanced work.)
st1x7 had the best advice below -- learn the basic methods and then apply them to your field. If you google "agriculture iot research papers" you'll find tons of examples of people using sensors for data collection and then analyzing the data to improve some process.
TL;DR I see Data Science melting into other roles, but the basic skills/data literacy are useful for almost everyone.
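To make "learn the basic methods and then apply them to your field" a bit more concrete, here is a minimal sketch of one such basic method, an ordinary least-squares line fit, in plain Python. The soil-moisture and yield numbers are invented for illustration, and `fit_line` is a hypothetical helper, not something from any particular paper or library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented sensor readings: soil moisture (%) against yield (t/ha).
moisture = [10.0, 20.0, 30.0, 40.0]
yield_t = [2.1, 2.9, 4.1, 4.9]

slope, intercept = fit_line(moisture, yield_t)
print(f"yield ~ {slope:.3f} * moisture + {intercept:.3f}")
```

In practice you would reach for a library and check the fit properly, but the point stands: the method is simple, and the value comes from knowing which variables matter in your field.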
Will data science get you out of the "lab"?
Maybe you can become a farmer, and iterate on patentable new agri-tech you will inevitably develop along the way; then sell that.
Also known as Linear Regression.
1/ No, there's no "incredible harm" to society, not more than from any technological revolution in the past, and it's bringing more good than bad, like any technological revolution.
2/ You can perfectly well do a data science training program (or preferably an ML one). There are thousands of free courses on the net, but if you get into a renowned program it's better. Then go find work. Also try to find a niche such as embedded systems, or robotics, or whatever is scarcer than doing "image recognition" lol
Incorrect at best, apathetic and myopic at worst:
Any atrocity and injustice has been done at all scales by humankind. AI will not amplify that, and I actually think that, used well, it could help fight them.
You are just choosing to see the wrong side of it because you think it makes you look woke.
Did I once say that they were?
What I doubt are your abilities to actually comprehend.