> the second phase of machine learning involves pulling in as many features as possible and combining them in intuitive ways. During this phase, all of the metrics should still be rising
As Google points out, after you build an initial model, the next step to increase accuracy is to perform feature engineering. They explain that this can be done manually or automatically using something like deep learning. Another option that people here might consider is using a library like Featuretools (https://github.com/featuretools/featuretools) for "automated feature engineering". Note: I am one of the developers.
Our goal is to help you increase the performance of your models without sacrificing the interpretability of your features. We have a post up about how our algorithm works here: https://www.featurelabs.com/blog/deep-feature-synthesis/. There are also plenty of real-world demos on our website: https://www.featuretools.com/demos
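A minimal sketch of the general idea behind automated feature engineering, assuming a toy parent/child pair of tables: apply a handful of aggregation functions across the relationship and keep the generated feature names readable. This is plain Python and not the Featuretools API; the tables, the `PRIMITIVES` dict, and `aggregate_features` are all made up for illustration.

```python
from statistics import mean

# Hypothetical toy data: one parent table (customers), one child table (orders).
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]

# Aggregation "primitives": each maps a list of child values to one number.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def aggregate_features(parents, children, child_name, key, column):
    """Return one row per parent with one generated feature per primitive."""
    rows = []
    for parent in parents:
        # Collect the child values that belong to this parent row.
        values = [c[column] for c in children if c[key] == parent[key]]
        features = dict(parent)
        for name, fn in PRIMITIVES.items():
            # The feature name records how it was built, e.g. "MEAN(orders.amount)".
            label = f"{name}({child_name}.{column})"
            features[label] = fn(values) if values else None
        rows.append(features)
    return rows


feature_matrix = aggregate_features(customers, orders, "orders", "customer_id", "amount")
for row in feature_matrix:
    print(row)
```

Each generated column name says exactly which aggregation over which child column produced it, so a generated feature can still be explained to a human.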
Secondly, for some reason data science just doesn't excite me as much as typical software development does. Like, why am I not excited enough to go down the path of specializing in data science in the field of machine learning? Even if there is more money in it, I'm still not extremely motivated to learn it.
What I do particularly enjoy is good ol' back end web development. I don't have a degree in computer science, but I'm working on an information systems degree with a focus on "programming". I dream of, and am working my ass off to join, the cult of "software engineer" type II, a sophisticated software developer/programmer. I love building layers, optimizing code, learning new tools, algorithms and data structures (without knowing math), creating unit tests, following programming paradigms. It excites me so much. And the core skill I want to dive into is blockchain. I love studying that topic too and all the algorithms that come with it.
But when I see data science, no excitement. All I imagine is image manipulation and fancy charts. I know I sound a bit ignorant, but that's how it is.
Mindshare or, more generally, PR. Also to "collect" the talent on their platforms (TensorFlow, Google Cloud, ...). Also, these guides were repurposed from existing (internal) guides and are a few years old by now, so the cost is low.
You further describe the role of a data engineer or ML engineer. If you approach data science with a focus on engineering and tool use, you could be one of the few dangerous data scientists who are able to go end-to-end (that should be safe for at least 5 years, until such pipelines evolve without much human intervention).
> But when I see data science, no excitement. All I imagine is image manipulation and fancy charts.
This is because, while there is legit substance to the hype, the hype is real and it is focused on deep learning on ImageNet (and later GANs, Atari games, Go). Being able to show deepdreamed images and cat neurons is like catnip to journalists. Computer vision is but a very small part of ML and lots of data-driven companies have no need for such skills. Charts are made by analysts.
Everything (including blockchain) will move closer to the ML paradigm of learning software. Data infra engineers will see their infra increasingly used for ML. It all remains software (very advanced, but accessible to anyone) and hardware (still an asymmetry here between industry lab and practitioner). Don't get left out: Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.
> Secondly, for some reason data science just doesn't excite me as much as typical software development does
Fair enough. Part of the reason is that "data science" has been so jam-packed with nonsense and with people who don't do the actual work of building things, as you describe below.
> What I do particularly enjoy is good ol' back end web development. I don't have a degree in computer science, but I'm working on an information systems degree with a focus on "programming". I dream of, and am working my ass off to join, the cult of "software engineer" type II, a sophisticated software developer/programmer. I love building layers, optimizing code, learning new tools, algorithms and data structures (without knowing math), creating unit tests, following programming paradigms. It excites me so much. And the core skill I want to dive into is blockchain.
OK, this makes sense. But I'd be worried about 5 years from now. When all the little gears and things that go on in the backend become a commodity (or get abstracted away in the "cloud"), what are you going to do?
> I love studying that topic too and all the algorithms that come with it.
That spark of interest in the algorithms (which is really about logic, which is what math is about in the end) is the essence of what makes "data science" so attractive.
Well, over the last 8 years or so I started out in a similar kind of place, and have gotten quite good at building CRUD and business logic and glue, and fixing crap on the front end, and configuring servers.
Maybe I can stand in for the OP a few years down the line?
Over the last quarter, I've been splitting my time between things like Linux admin automation and a set of pre-calculus core classes.
To answer your question on my personal scale, my whole ability to do this kind of work with my mediocre CS education (my BA is in Philosophy, and my PhD work is in Lit) is premised on leveraging the points in the systems where "all the little gears and things that go on in backend have [become] a commodity"... hence I just integrate ERP systems with WordPress, or try to clean up some business's AWS Drupal hosting setup, or some crap like that. That's been a fun and rewarding conjunction of my love for systems and the commodification of parts of IT/programming work.
My hope is that by the time all the little bits of these data science topics become "abstracted away" over the next couple of years, I will understand the general underlying things well enough to use them. But who knows if that is a good bet or not... certainly not me.
However, it feels perfectly fine to learn things like math... I'm way, way better at it than I was as an undergrad 20 years ago and so it's quite a lot more fun for me. It's not like knowing some math has no application outside of this narrow field.
I dunno if my personal answer (keep learning, and enjoy fixing crap) matches the OP or helps extend your points/ question, but I've been getting a lot of fun (and some money) out of following my answer.
Consider a larger organization (1000+ people, perhaps). If groups within that org can train their people with these materials, or even send them to Google to be trained in this subject matter, they can come back with a nice shiny credential. Whether that ultimately becomes useful to the individual or the group is up to them, but it really helps Google foster a relationship with the main organization and eventually snag higher contract values.
That probably made no sense, but I thought I'd give my two cents (however crummy they might look).
s/Google/someone at Google/
20% time leaves discretionary time for people who're motivated to get something like this started. Official approval may come along the way.
They want to sell TPUs; this is part of generating the demand.
By all means, keep at it! Better to be an exceptional backend dev than an average ML engineer. No one can predict the future anyway. It's certainly possible that the ML job surge is gonna stop abruptly when most of the advances have been captured by APIs.
The advantage over other techniques is that one can easily trace the exact math of a conclusion, and tune it as needed. The disadvantage is that one probably has to manually tune it all rather than let the machine "learn". However, a hybrid approach could be used whereby "pure" AI suggests words and phrases to encode.
rule.addList("nigerian, prince", rank=7);
rule.addPhrase("great opportunity", rank=5);
rule.addPhrase("lisa smith", rank = -4); // probably good
Total: 9 Threshold Exceeded!
Category: Tech Support
Total: 1 Insufficient total
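A minimal runnable sketch of the weighted-rule scheme above, in Python. The original snippet doesn't spell out its semantics, so these are assumptions: a list rule fires when every word in it appears anywhere in the message, a phrase rule fires on the exact phrase, and the threshold of 8 is invented. `RuleSet` and its method names are hypothetical.

```python
import re


class RuleSet:
    """Hand-tuned rules; every rule that fires adds its rank to the total."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed cutoff, not taken from the original
        self.rules = []  # (description, matcher, rank) triples

    def add_list(self, words, rank):
        # Assumption: fires when every comma-separated word appears somewhere.
        terms = [w.strip().lower() for w in words.split(",")]
        matcher = lambda tokens, text: all(t in tokens for t in terms)
        self.rules.append((f"list: {words}", matcher, rank))

    def add_phrase(self, phrase, rank):
        # Fires when the exact phrase appears on word boundaries.
        pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
        matcher = lambda tokens, text: pattern.search(text) is not None
        self.rules.append((f"phrase: {phrase}", matcher, rank))

    def score(self, message):
        """Return (total, hits) so the arithmetic behind a verdict is traceable."""
        text = message.lower()
        tokens = set(re.findall(r"[a-z0-9']+", text))
        hits = [(desc, rank) for desc, match, rank in self.rules if match(tokens, text)]
        return sum(rank for _, rank in hits), hits

    def is_flagged(self, message):
        total, _ = self.score(message)
        return total >= self.threshold


rules = RuleSet(threshold=8)
rules.add_list("nigerian, prince", rank=7)
rules.add_phrase("great opportunity", rank=5)
rules.add_phrase("lisa smith", rank=-4)  # probably good

for msg in (
    "A Nigerian prince has a great opportunity for you",
    "Lisa Smith says this is a great opportunity",
):
    total, hits = rules.score(msg)
    print(total, rules.is_flagged(msg), hits)
```

Returning the list of hits alongside the total is what gives the traceability: every point in the score can be attributed to a named rule and retuned by hand.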
From the past, we learn that these systems are brittle and break continuously.
For example, what happens when spammers start using different words, or send legitimate-looking emails that are actually spam? Do you think you can build rules to catch 70%, 80%, 90% or 99.99% of spam?
If your goal is simply showing the rules being applied, you can still learn the rules with ML but display them in this way (for example GP suggested looking at Naive Bayes which was the most common method used to fight spam; I'd also point you to decision trees which are easy to visualize).
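One way to sketch that: train a tiny multinomial Naive Bayes model and print each word's log-odds, which reads a lot like a table of hand-assigned ranks. The training messages and function names below are invented for illustration, and a real filter would need far more data and better tokenizing than `split()`.

```python
import math
from collections import Counter


def train_word_weights(spam_docs, ham_docs, alpha=1.0):
    """Return {word: log-odds}; positive leans spam, negative leans ham."""
    spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
    ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    # Laplace smoothing (alpha) keeps unseen words from producing log(0).
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)
    return {
        w: math.log((spam_counts[w] + alpha) / spam_total)
        - math.log((ham_counts[w] + alpha) / ham_total)
        for w in vocab
    }


def score(message, weights, prior_log_odds=0.0):
    """Sum the learned weights of known words; a total above 0 leans spam."""
    return prior_log_odds + sum(weights.get(w, 0.0) for w in message.lower().split())


# Invented toy training data.
spam = ["nigerian prince great opportunity", "great opportunity act now"]
ham = ["meeting notes from lisa smith", "lisa smith sent the report"]

weights = train_word_weights(spam, ham)
prior = math.log(len(spam) / len(ham))  # equal class sizes here, so 0.0

# Display the learned weights the way hand-written rules would be listed.
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {weight:+.2f}")

print(score("great opportunity from a prince", weights, prior))
print(score("report from lisa smith", weights, prior))
```

The weights are learned rather than typed in, but the verdict is still a sum of per-word numbers that a power user can inspect.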
But what I described had additional purposes such as sub-routing to various departments. It was a multi-purpose email categorizer in the early days of spam. Each approach has trade-offs. I'm not sure how you'd apply a "decision tree" using weights in a way that makes sense to a power user. A non-weighted decision tree seems too blunt an instrument. One generally needs multiple "clues" (factors) voting in tandem.
A good example is the text preprocessing flowchart (also shared by fchollet on Twitter): https://developers.google.com/machine-learning/guides/text-c...
I think there's a tendency on Hacker News and other tech websites to diminish the importance of having a PhD in ML fields. The problem solving and communication skills you learn during the course of a PhD program are precisely the skills companies value when they're trying to solve hard problems. It's important to know not just how to apply ML algorithms, but when they're appropriate.
What always confounded me was the choice of the number and width of hidden layers. This is even more confusing now with the advent of deep and recursive networks. We need empirical work on this that can be taught in much the same way that gravity is taught as an apple falling from a tree.
We need a determination of the entropy of a network, and of how to route that entropy and exploit it. Specific scenarios are not adequate.
Is this more advocating for a theory of neural networks rather than empirical evidence?
Honest question, why?
We used to give degrees (albeit hundreds of years ago) for material that now is covered, at a high level, in a single course (e.g. physical sciences). The amount of material to cover, and to master, increases dramatically over time. It makes sense to compress the knowledge to be delivered to a compendium so as to simply keep up with progress.
ML is a last resort for problems you don’t understand. There are lots of these, but understanding the problem is better.
I don't follow blockchain very much, but it seems like it was 6 years ago.
Indeed, Machine Learning/Deep Learning has become much more accessible thanks to the number of free guides such as this. But that means data science job placement will become more difficult as competition increases, with more gatekeeping/requirements (e.g. Masters/PhDs).
Most companies are going to utilise ML to some extent. Once technology and tooling improve, they'll need boots-on-the-ground engineers and not labs with R&D teams.
I completely understand why there is such a stigma around bootcamps. Nobody can deny that they don't afford the same depth that you'd get at a "real" program. But they can be amazing for career switchers like me, who had no real direction in college. Don't look down your nose at them.
Neither Facebook nor Netflix offer outsiders access to their ML platform, and you completely forgot Azure, which IMHO has the most mature offering of the big 3 in this space.
Of course understanding the theory will be helpful in knowing which architectures are most likely to be productive and what-not, but this whole field is very empirical anyway. So if your experimenting is a little less guided by intuition rooted in theory, that's not exactly the end of the world.
The only reason you think it is easy is that you are just copying what others have been doing without changing anything. Try to go beyond that and you will change your mind quite quickly.
The point is, you can do a lot of very useful things with ML, without needing the entirety of the theoretical underpinnings. Of course you can't do everything but not everybody needs to be able to do everything.