Ask HN: What do you use Machine Learning for?
170 points by endorphone 245 days ago | 160 comments
What real-world applications are you finding it useful for right now?

I use it as something to worry about not knowing how to use, and how that might make me unemployable in a few years, while also having no obvious need for it at all, and therefore no easy avenue towards learning it, especially since it requires math skills which have completely rusted over because I also never need those, so I'd have to start like 2 levels down the ladder to work my way back up to it.

I've found it very effective in the role of "source of general, constant, low-level anxiety".

This is true of the software engineering profession in general. I think there's a quote somewhere stating "the half-life of a software engineer is 2-5 years." There are constantly new programming languages, new frameworks, new platforms, new tools, new paradigms, etc. It becomes harder and harder to keep up with all of that as you age, take on more responsibilities, and need more stability because of a mortgage and offspring (although there are certainly individuals who can keep up with it).

Those who are still in school or fresh out of it will be up-to-date with the latest theory and trends. And they will be eager to pour hours of overtime into their new career. Sure, they will have years of practical experience to gain, but that's not usually what they will be interviewed on (from my experience). They will be interviewed on the latest tech and the algorithms and theory they learned in school. Their whiteboard interviews will be just another day in class.

Given the influx of fresh blood, I view the software engineering profession as an entry-level type of job, regardless of the varying levels of seniority titles available. The only way to escape the constant churn is to move to other roles, such as architect, management, or executive, or to start a business, such as niche consulting or a startup. Or you can eschew the typical life of mortgage + kids and dedicate yourself to keeping up with every latest trend, hopping jobs to stay current and hoping you never slow down.

"Those who are still in school or fresh out of it will be up-to-date with the latest theory and trends. "

The theory doesn't change that fast... and trends? Maybe in terms of a new JS library. But I think it's more that new grads will work for cheap and will put up with more bullshit. They are also more likely to relocate for the job.

Yeah, sadly, that's a common problem in software development. People often use technologies on projects because their resume needs them, not because the project benefits from them.

I wonder what our technology choices would look like if this weren't the case. In other words, suppose someone said to you "you'll be responsible for these software applications for the next 10 years" vs "you'll need to find a new job every 2-3 years for the next 10 years." Would we adopt and apply new technologies this aggressively? Would we be changing javascript frameworks constantly? Would we necessarily be using them at all, vs a wait and see approach? (btw, yes, depending on the project sometimes we would use them, but my guess is that we would far less often).

It's happened a bunch of times in my career by now. Currently, I think our huge over-application of technology is in the realm of constantly evolving front-end JavaScript frameworks. I still believe, firmly, that most smallish apps would be better handled through an integrated framework like Rails or Django, with a bit of judiciously applied JavaScript when necessary. Or, perhaps, even no framework at all.

But unfortunately, that doesn't get you jobs. Almost all the Rails jobs out there now require expertise in Ember, React, etc. So the truth is, it's rational to add these things to your current Rails project even if you don't think the trade-off is quite worth it.

Just to be clear, yeah, sometimes it is worth it. I look at it this way - for a while there, most land animals had exoskeletons, even some fairly large ones. Then, endoskeletons came along, and it turned out that to scale up to be a giant land animal, you needed an endoskeleton. Some of the larger animals with exoskeletons started to go extinct, as the more scalable endoskeletons drove them out of their ecological niches.

Here's the part that they don't cover on the nature channel: recruiters started to ask giant centipedes if they had any experience with endoskeleton. Senior architects and technology directors started insisting that fleas, ticks, and flying gnats have endoskeletons, because they attended a conference that told them that if you want to grow to woolly mammoth size, you need endoskeleton.

Meanwhile, the creators of endoskeleton frameworks are (rightly) all like, "Don't put this on me, I never said a gnat should have an endoskeleton".

Oh, as for machine learning... yeah, I suppose there may be some use of Spark, MapReduce, Hadoop, and so forth that exists to gain experience with these technologies rather than because they're needed. That definitely happened with some of the NoSQL stuff. But honestly, the unnecessary churn on the back end/data science side pales in comparison to what happens out there in the web world.

I'm a person (software engineer) who knows some ML and I tell this to my coworkers when they express similar sentiments: if you're worried about ML taking your job then you're not thinking straight. If the world has automated to the degree where programmers are no longer needed in 3 or 10 years then you will definitely have other things to worry about. Truckers, farmers, taxi drivers, cops, and construction crews will likely all be automated before programmers are, and when that happens it will cause a huge global crisis and leave millions unemployed. Programmers will come sometime later, when we figure out general AI.

Alternative scenario: loss of jobs to automation and inadequate social safety net cause social unrest and communist revolution. We've seen this kind of thing before.

In short: if you're looking at Hacker News then doing ML is likely irrelevant.

My understanding is that the anxiety isn't about a fear of programming being automated in the near future, but rather about ML growing into a required, essential skill for a software developer in the next few years.

> My understanding is that the anxiety isn't about a fear of programming being automated in the near future, but rather about ML growing into a required, essential skill for a software developer in the next few years.

Let's say you were an automobile assembly line worker in 1965, and you were hearing about robots taking over your job at some point. What would you do? That is, what would be the best thing you could do?

If you answer "Get an education in something that robots aren't doing, and try to get out of this career path" - go to the front of the line.

So - if you are a software developer - and worried that sometime in the future (I'd argue that "sometime" will soon mean "in 5 years, maybe less" - just as you note), ML is "growing into a required, essential skill for a software developer" - what would you do?

Sit back, throw your hands up, and hope?

Or maybe get that education now on how ML works, how to use it, learning the hard math, etc?

Guess which route I have taken? Perhaps others should as well, if they want to continue in their careers...

If robots can design websites in 5 years, it won't be much longer until they can devise statistical ML schemes. Also, for reference, if you don't have a background in math, you'll only just be finishing your studies of math/CS after 5 years.

Projects like TPOT and DataRobot are already showing a huge amount of promise for automating the tedious/time consuming parts of ML. But this doesn't mean they'll get good at the hard part any time soon.

I'd say your time/energy is better spent lobbying governments for socialist reform than learning ML at this point so we can avoid the worst case scenario(s).

It's more like concern that not having those skills (or something comparably advanced, but also with no immediate use in my work or life) will force me to slide down, rather than up, if/when competition from overseas finally hollows out the middle of the US developer market—though I'm not sure why this hasn't already happened, so maybe I'm wrong to worry about that at all. Not concern that ML programs will start writing other programs well enough to be a direct threat.

Overseas competition? I'm not sure what you're on about now - it's currently the hardest[1] time in history to get an H1-B with Cheeto man in the White House. Also I think that USA-born and educated developers are still significantly more sought after because of language barriers. If you've tried to get a job in tech recently you'll notice that... it's pretty damn easy.

Honestly it sounds like you're paranoid about your job security and externalizing it. "It's ML, no it's foreigners, no it's the aliens!" I'm a programmer, not a psychologist, though, so don't take my word for it. You can have a look at the sklearn docs though, good place to start on your ML quest.


[1] Not intended to be a factual claim, just a point about the current political climate

H1B? No I don't care about that. I just think it's a bit concerning that the only things keeping this cornucopia full are the circumstances of a language barrier that's only relevant because foreign software companies largely aren't yet competitive with those in the US, and that no-one's fully cracked making remote work a thing on a large scale. That's a tissue-paper thin barrier protecting the wages of 95+% of the people employed as software developers in the US who aren't doing anything remarkable.

My job security's fine as long as all these dumb "Facebook for Dogs", mid-sized software company doing nothing all that amazing (so: 99% of mid sized software companies), and "Non-software-company software division" jobs that comprise a giant percentage (almost certainly the vast majority) of US development jobs remain in the US (or other expensive OECD states). I'm just not certain why they would remain here, in the medium-term. It's not paranoia so much as "gee, why the hell am I paid so much?" The reasons seem pretty weak, which... isn't comforting. It's not like I'm curled up shaking in the corner over it, though.

That last paragraph is funny as hell, and I get what you're saying. I guess I'm just more pessimistic about anyone solving the problems of the language barrier and remote work soon - as with ML, once we truly have global, language-agnostic communication and physicality is irrelevant the world will be a very different place.

I guess my overarching point here is that global economics is one of the more dizzyingly complex things you can choose to worry about so I tend not to.

IMHO, the anxiety is well-placed. I did my first CSci homework on punchcards. FORTRAN was a teaching language in those days. The reason I am still relevant as a developer is that over the years I have learned to recognize progress when I see it. (Why am I the one dragging 30-year-old Luddites into the world of Python 3.x???? 2.7 is history. But I digress....)

Back on point... this is one of those times in my career when I have to say "This changes everything." ML changes everything. How we formulate problems, how we architect solutions, how we program solutions.

For practical advice: I started the Udacity self-driving car Nanodegree. (I won't be finishing it, but that is another story.) The first term is pretty much all ML. I found the Udacity course exactly what I needed to get a practical, hands-on intro and ramp-up to the current world of ML. I'm no expert, but I now at least recognize the vocabulary, have rubbed my brain cells against a few simple ML problems, and can move forward and learn more. Udacity also has an ML Nanodegree. YMMV.

> I started the Udacity self-driving car Nano-degree. (I won't be finishing it, but that is another story.)

I'd be curious to know why, myself.

I'm about 2/3rds of the way through the second term (I'm in the November cohort), and I plan to finish the nanodegree.

What I have found, though, with this second term, is that it seems to be a rehash of the CS373 course (I took that back in 2012, when Udacity first started), though it has also taught me a few new concepts (I like the fact that it is mostly done in C++; my C++ had gotten really rusty, but now I am liking C++ more).

I'd also recommend the Coursera ML course - from what I know, it is basically identical to (if not the same material as) the original ML Class from 2011 (I took that one, which was my intro to MOOCs). If you can wrap your mind around the programming language used (Octave - basically an open-source clone of MATLAB - where the main primitive is the matrix), you'll find it does a great job at explaining many concepts that you might have had trouble with in the past (when we got to neural networks and backprop, it finally clicked for me - same with how to think in terms of vectors and matrices to figure out how to parallelize processes).
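To make the "think in vectors" idea concrete, here is a minimal sketch in Python/NumPy rather than Octave (the language swap is an assumption for illustration): the same linear-regression hypothesis computed with explicit loops and then as a single matrix-vector product. Function names and toy numbers are illustrative only.

```python
import numpy as np

def predict_loop(X, theta):
    """Hypothesis h(x) = theta . x, computed one example and one feature at a time."""
    preds = []
    for row in X:
        total = 0.0
        for x_j, t_j in zip(row, theta):
            total += x_j * t_j
        preds.append(total)
    return np.array(preds)

def predict_vectorized(X, theta):
    """Same hypothesis as one matrix-vector product; NumPy hands this to
    optimized linear-algebra routines instead of a Python-level loop."""
    return X @ theta

# First column of ones is the bias (intercept) term.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
theta = np.array([0.5, 2.0])
print(predict_vectorized(X, theta))  # values 4.5, 6.5 and 10.5
```

Both functions return the same numbers; the vectorized form is the habit the course tries to build, because it is shorter and lets the library parallelize the arithmetic.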

OK, since a couple people are curious I'll spend the bandwidth. It really is an issue of personal circumstances, not any failing of the Udacity course.

I like the video lectures a lot, and got a lot out of them. I like the quizzes and short problems. The large projects are a little too color-by-numbers for my taste; I would really like to see an incremental exercise step in between in order to flesh out understanding more solidly, but if I had enough time I could probably fill that in myself.

But as I said above, my decision not to follow through is mostly personal circumstances. When I signed up for the course I was casting around for something new to do and thought the course would add to my robotics knowledge and cred. By the time my class cohort started, I was working at a robotics startup. Aside from the startup taking a lot of time and energy, doing robots by day, robots at night, and robots on the weekend just got to be a bit much. Something in my life had to give, and Udacity didn't make the cut this time.

Hey - that's a perfectly reasonable answer, and I'm glad for you on the startup bit (sounds fun).

You're right on the "color by numbers" part - but I think there are a couple of reasons for it. I think the course isn't geared only to ML and self-driving vehicles, but also serves as a means toward general software development. So they don't expect everyone to be at the self-starting coder level.

Secondly, if they were to, say, have you code each project from scratch with only a bare outline of what is needed, I think they'd have to likely expand the course (more lessons to teach things more in depth), as well as extend the time for the lessons; all of this would mean the course would be much longer, and that students would have to dedicate even more time to the effort, beyond what they already are doing.

Some could do that, but I think that many wouldn't be able to; they have to strike a balance somewhere, and this has been their answer. Plus, I have a feeling that many of their students are currently students going to university in some manner, and not all of them in the United States, but abroad internationally. Many of these students are likely struggling just to make the payments for this nanodegree as-is (and may have contributed to the drop-out rate from term 1 to term 2). Making the course longer and more expensive might put it completely out of reach for those students.

Why didn't you finish? I did the 3xxx self driving course on Udacity and was considering doing the nanodegree as well. Would really love to hear your perspective.

If you can afford it, do the nanodegree.

I did the CS373 course back in 2012 - and I have found that the nanodegree (which I am taking; I am about 2/3 of the way through the second term) touches on more than a bit of new stuff that I didn't find covered in the CS373 course (at least back in 2012).

First term alone is worth the price, I think: you learn how to use OpenCV, TensorFlow and Keras, then use that knowledge to "drive" a virtual car (I built a version of the NVIDIA End-to-End CNN and trained it using CUDA and my GPU at home, using data I derived in the simulator by steering the virtual car with a steering wheel controller I bought just for that purpose).

Second term so far has felt mostly like a re-hash of what was learned in the CS373 - although, learning about extended and unscented Kalman filters for integrating LIDAR and RADAR data was not something covered in CS373 (only the basic Kalman filter - useful in itself, though).
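For anyone who hasn't met the basic Kalman filter, here is a minimal one-dimensional sketch in Python: it alternates a measurement update (combining two Gaussians) with a motion prediction (shifting the mean and growing the variance). The noise values and toy measurements are made up for illustration; the extended and unscented variants mentioned above exist to handle nonlinear motion or measurement models, which this linear sketch does not attempt.

```python
def kalman_update(mean, var, measurement, meas_var):
    """Combine the current Gaussian belief with a Gaussian measurement."""
    new_mean = (meas_var * mean + var * measurement) / (var + meas_var)
    new_var = 1.0 / (1.0 / var + 1.0 / meas_var)
    return new_mean, new_var

def kalman_predict(mean, var, motion, motion_var):
    """Move the belief by a known motion; uncertainty grows by the motion noise."""
    return mean + motion, var + motion_var

def run_filter(measurements, motions, meas_var, motion_var, mean=0.0, var=10000.0):
    """Start from a very uncertain belief and alternate update/predict steps."""
    for z, u in zip(measurements, motions):
        mean, var = kalman_update(mean, var, z, meas_var)
        mean, var = kalman_predict(mean, var, u, motion_var)
    return mean, var

# Toy data: noisy position readings and the commanded moves between them.
measurements = [5.0, 6.0, 7.0, 9.0, 10.0]
motions = [1.0, 1.0, 2.0, 1.0, 1.0]
mean, var = run_filter(measurements, motions, meas_var=4.0, motion_var=2.0)
print(mean, var)  # roughly 11.0 and 4.0
```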

My goal is to finish this puppy.

From what I have heard (anecdotally), when the initial class started, for the first term there were approximately 30,000 students enrolled. When term 2 started, only about 300 of those students enrolled for the new term.

So far, I'm doing pretty well with this. Considering I don't have an advanced degree, that I am a self-taught software developer, and that I am going to have my 44th birthday in about a month - well, you know the story about old dogs and new tricks; it doesn't apply to me.

Sounds good, thanks for the insight.

Machine learning is very good at solving certain types of problems -- classification being chief among them. I think the current hype is a little overblown. Yeah, great scientific advancements have been made, but that doesn't mean "AI" is about to replace all programmers. It's just another family of algorithms we have in our toolbox for when we need them.

ML can certainly handle continuous problems as well, but for many problems, classification works well. Plus, most problems can be reduced to classification, and classification problems can usually be solved by simpler "classic statistical" methods (regression methods such as logistic regression), so they are easy and fast to code (no need for any fancy neural network or deep learning or such, or massive datasets).
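As a sketch of that "classic statistical" route, here is logistic regression trained by plain batch gradient descent in Python, with no libraries. The toy data, learning rate and epoch count are invented; centering the feature is a design choice that keeps gradient descent well conditioned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * (x - mu) + b) by gradient descent on mean log loss."""
    mu = sum(xs) / len(xs)  # center the single feature
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * (x - mu) + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * (x - mu)
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mu

def predict(w, b, mu, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * (x - mu) + b) >= threshold else 0

xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b, mu = train_logistic(xs, ys)
print([predict(w, b, mu, x) for x in xs])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In practice a library such as scikit-learn does the same job in a couple of lines; the point here is only how little machinery a basic classifier needs.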

I don't think ML or AI (or whatever you want to call it) is going to replace programmers any time soon, either. I do think, though, that having the skills in such domains will be something that employers are going to look at in differentiating potential candidates for job placements. So if you have the knowledge or can get it now, it may prove useful in the future.

I agree that ML knowledge is useful and differentiating, but I don't think it justifies the level of dread in the parent comment.

Ha! I've been in the same position, reading what I can and looking out for a good application to try it on.

I saw a quote around here like "Data has gravity; the computation will move to it." (I think this was Nadella at Build.) This is true for ML in a way that was not true for previous computer applications (I mean, mainstream applications of computers). You could write an image filter or an audio effect (or a spreadsheet, or a mail client, etc.) with no data at all (or a small amount of ad-hoc test data). But this suggests an inversion of the problem-finding process: instead of thinking about what problems algorithms could solve, you have to look for large corpora of data. It does feel like a paradigm shift that should engage our self-interest, if nothing else.

> looking out for a good application to try it on.

Self driving vehicles - or other robotics localization and similar problems. Those are fun, at least, I find them fun.

I've also considered stuff in the graphic arts - like the various "deep dreaming" things, or the various ML filtering of images (turning a painting into a "photo", for instance); there are some interesting possibilities there.

But for me - robots are where it's at. The idea of making a machine that can "do its own thing" intelligently - well, that has fascinated me ever since I was a small child and watched the original 1970s "Buck Rogers" series with my dad.

I find this from Bezos to be a good TL;DR for programmers about ML/AI:

"Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder."

Three years back, I used to sweat about not knowing Angular and Rails. They were all the rage back then. I still don't know them, and I can see how React has become a thing now. It's hard to put into words how misplaced the fear of "ML taking our jobs" is once you take a few moments to look beneath the tip of the iceberg and see how awful the software the world depends upon is.

As a guy who's going into ML, I'm worried about AI and products like DataRobot clobbering the market for data scientists. DataRobot gets glowing reviews for automating the setup of dozens of ML models, replacing weeks of effort by teams of ML specialists. It's very expensive though.

I used to feel the same way in regards to a lot of technology. I think the problem with trying to be a generalist is that it seems like you need to be an expert at everything. Machine learning might be a great area to specialize in these days, but if it is not for you I wouldn't stress out about it.

There are also a lot of resources for ML that don't require diving off the deep end, such as tutorials/intro classes on Coursera. In my opinion, picking what you learn helps as well. Trying to skip straight to neural networks might not be the best place to start if you don't know anything about linear regression or classifiers.
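As an example of that gentler starting point, here is a minimal pure-Python sketch of ordinary least squares for a single feature, using the closed-form slope and intercept. The data points are invented so that the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
print(fit_line(xs, ys))  # (2.0, 1.0)
```

A classifier like logistic regression replaces the closed form with an iterative fit, and a neural network stacks many such units, so this is a reasonable first rung.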


I find it frustratingly hard to learn ML on my own, when I don't have a real problem to solve, just exercises.

You could participate in one of the Kaggle competitions. I personally have found it extremely useful to improve my ML skills.

Hack your car! Make it drive itself!

Or - go smaller (probably safer, too!) - make a self-driving RC car!

Like encryption, the math is necessary when you build it, not when you use it. You can go quite far by just treating it as a black box.

Well, yes, until you stumble upon something that is not really working as you expect and you need to find out why... ML without understanding the math can be a very strange place to roam around. I would suggest wrapping your head around at least the basic math. The first part of the Deep Learning book is very nice if you want to do this: http://www.deeplearningbook.org/

Any good resources for how to use it as a black box for online services?

I don't know of a teaching resource, but I have found TensorFlow and Keras to be like Lego when it comes to ML; I am taking the Udacity Self-Driving Car Engineer nanodegree, and in the first term we used both (after having implemented our own simple ANN library in Python - to understand the basics) - I found it to be so much simpler to implement things with (especially neural networks for deep learning).

I would not have expected to find my same definition so common as to be the top comment.

I feel for you :D

I'm not using it at this moment, but in a couple of weeks I'll have a newborn, and I was thinking of taking pictures of him every time he is upset, and then tagging the photo with what ends up being the resolution (feed, change, or nap) and seeing if I could build a classifier that could figure out what he needs just from a picture of his angry face.
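As a rough sketch of how the tag-each-photo plan turns into supervised classification, here is a nearest-centroid classifier in Python. It assumes each photo (or cry recording) has already been turned into a numeric feature vector by some extractor that is not shown; the two features, labels and numbers are all hypothetical, and a real attempt would likely need a learned model and far more data.

```python
import math
from collections import defaultdict

def train_centroids(samples):
    """samples: list of (feature_vector, label), where the label is the
    resolution that ended the crying, e.g. 'feed', 'change' or 'nap'.
    Returns the mean feature vector (centroid) for each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical 2-D features, e.g. [cry loudness, eyes-closed score].
samples = [
    ([0.9, 0.1], "feed"), ([0.8, 0.2], "feed"),
    ([0.1, 0.9], "nap"), ([0.2, 0.8], "nap"),
    ([0.5, 0.5], "change"),
]
centroids = train_centroids(samples)
print(classify(centroids, [0.85, 0.15]))  # feed
```

Note that the label is "what made the crying stop", which is exactly the noisy-label problem raised further down the thread.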

Semi-related, but if anyone is using the AWS tools for their AI, please ping me. I'm looking for a speaker for a community event in SF in June. (contact info in profile)

Suggestion from a new father: you'll have a better chance analyzing your newborn's voice. They have very little control over their facial expression, but there might be some useful information in their crying. Good luck with your new projects!

Yeah I agree, you figure out very quickly by the voice/cry what they want.

I was actually planning to do both and see which, if either, produced any useful response. :)

:) I really like the idea and get ready for the excitement of having a family.

I think the problem you'll end up having is that in some/most cases you are never really certain what the problem was.

Feeding might make the crying stop but you have no way of knowing if the reason for the crying was hunger. Similar problems for sleep, diaper changing, etc.

Every time the crying stops you can never be certain if you just made them temporarily forget about the root problem or if you did properly identify the root cause.

TL;DR: my hypothesis, after raising my own kids, is that the data will be almost all noise with very little perceptible signal.

I already have a kid, so I know how noisy (pun intended) the data can be. I honestly don't expect it to work at all, but it will be fun to try!

I seem to remember nappy (diaper) changes were easy to predict; they were usually preceded by a bright red face.

Such a cool use case. I'd love to see the results if you could build a classifier for it, though I do wonder how quickly it could identify the symptom.

Hey I'm having a newborn as well and wouldn't mind contributing to the sample set if you make a web form or API for me to post to.

> I was thinking of taking pictures of him every time he is upset, and then tagging the photo

You won't be able to have anywhere near enough data for that.

Unless somewhere out there is a labeled dataset of newborn baby faces, with around 20-30,000 unique samples (the more data, the better).

Your problem isn't really that difficult; it could probably be done with a very simple neural network and a few hours of training. If you wanted to be robust, you could hack the ImageNet CNN into something to do this, too.

I would suggest, though, while what you are doing sounds fun - enjoy being a father and deal with this the old fashioned way. It'll probably be more satisfying.

/heck, what do I know - I don't have kids...

Imagine a mobile app that parents can use to identify what their angry newborn needs, brilliant :)

We analyze images of skin lesions and give a probability whether they are malignant melanoma. The analysis is performed on-board on mobile phones.

That's super cool. We built a similar simple app at a hackathon. We are getting 90% accuracy nowadays. We open-sourced it at https://github.com/kuboris/smartoscope. We should add an app for download as soon as our Android developer fixes some minor bugs. The plan is to make it open to any model, so people can train and provide their own. Any advice? :)

do you worry about the weights of your net being stolen off the app through reverse engineering?

It is a bit of concern, but we are not out in the wild yet so we have time to look at ways to protect the IP.

Hey, sounds interesting. Would you share the name of your company/ startup?

At this point, it is university research. The work has been published, and we are in clinical trials in prep for commercialization.

Is this just for melanoma, or has it expanded into pap smear and bronch territory? My mother's a cytotech and every couple of years somebody claims they're going to replace her with a shell script. Hasn't happened yet, though.

We are expanding it to a number of skin and ocular diseases.

Do you mind sharing a link to the research?

Detecting what pornstars, actions, production and categories appear in 5mil+ adult videos.

Solving the important problems ;)

Is hiring ML talent in that industry any more difficult than generally?

It appears to be not too difficult. Many people are interested in the quantity of data to work with.

of course they are

Is this somewhere online? Github, site or what?

OP works for Pornhub, so I suspect it is on their site.

Classifying future elite race horses, before they've raced.[1][2]

I didn't develop the original approach but am involved in working to improve the predictive outcomes.

[1] https://www.breezeupiq.com

[2] http://www.performancegenetics.com

I work for a large semiconductor manufacturer. In our R&D operations we require a large volume of newly designed parts to be purchased from suppliers. These parts are not off the shelf but are all custom created by us. Once a design is approved we try to get them into the lab the fastest way possible so the earlier we can get the designs quoted with a supplier, the better.

I use machine learning to predict if a newly designed part will have to be purchased soon to go in the lab. That prediction saves me about 3 weeks of lead time to get it to a supplier. Before machine learning, the first time the purchasing department knew that it will need to be purchased is when the designer filled out a form. Now they know as soon as he submits his design and can get a head start.

That sounds really interesting. What sort of data do you use to do this?

Burning CPU cycles on an opaque, needlessly complicated model created by a canny professor who realized that putting ML in grant proposals gets you 5x the cash compared to regression.

Unfortunately I bet this will be the case for the next few years until funding agencies figure out not everything has to be done with machine learning.

I run a website called Hey Am I Fat. We've trained a classifier based off a simple softmax regression in TensorFlow that tries to detect whether people are fat or not. If you submit a picture of yourself to HeyAmIFat.com we can tell you if you're fat or not within minutes.
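For readers curious what "softmax regression" amounts to, here is a minimal NumPy sketch trained with gradient descent on cross-entropy. It is not the site's TensorFlow code: the features are toy 2-D vectors standing in for whatever the real model extracts from photos, and labels 0 and 1 are arbitrary class indices.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, num_classes, lr=0.5, epochs=500):
    """X: (n, d) features, y: (n,) integer labels. Returns W (d, k) and b (k,)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]        # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)        # predicted class probabilities
        grad = (P - Y) / n            # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Index of the most probable class for each row."""
    return softmax(X @ W + b).argmax(axis=1)

X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.0], [0.9, 0.2]])
y = np.array([0, 0, 1, 1])
W, b = train_softmax_regression(X, y, num_classes=2)
print(predict(X, W, b))
```

With two classes this is equivalent to logistic regression; the softmax form simply extends to more than two labels.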

A nice email / phone capture portal you have there.

Does anybody know what the value of such lists is? Having no other information with them, I can't imagine it is much.

I don't store this information after we send the response, but that would be one way to make money if you don't care about being an asshole to your users. This is mostly a joke and I don't make any money from it.

Likely fair enough - I'm just suspicious of anything that takes contact details when it could just hold the late open and let me know when it's done

... Well, you know if they are fat or not. So you can market to people that now want to get fit, with their new found realization that they are fat, or people who want swim suits now knowing that they are fit.

I think you would get more traction if you had called it doILookFat.com or doesItMakeMyButtLookBig.com. Then automatically find and link to products (jeans, hairstyles, etc.) that can improve the situation, and make money from product referrals.

Hahaha, the website was just made as a joke but you're probably right that it could be optimized to get more traction...

I don't think it would be that much of an asshole move to sell the data so that a user starts to see ads for their local gym in their Facebook feed all of a sudden.

You'd be making the world a better place by encouraging exercise and lowering the cost of healthcare. It's pretty much up to you now!

Gyms would be interested in this demographic in January. The highest bidder during the other 11 months would be junk food delivery.

We use it for automated seizure detection in EEG data. Most seizures have no clinical manifestation (e.g., shaking or trembling), and if a seizure goes on too long (~30 minutes) the patient can suffer permanent brain damage or even die. This is particularly problematic for patients in the ICU since they tend to seize silently more often and it isn't easy to tell that they are seizing without having a neurologist examine the EEG data.
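As a hedged illustration only (not the commenter's system), here is one simple hand-crafted feature that appears in the EEG seizure-detection literature: line length, the summed absolute difference between consecutive samples, computed over non-overlapping windows in Python. The fixed threshold stands in for the trained classifier a real detector would use, and the signal and threshold are made up.

```python
def line_length(window):
    """Sum of absolute sample-to-sample differences; grows with
    high-amplitude, high-frequency activity."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))

def flag_windows(signal, window_size, threshold):
    """Split the signal into non-overlapping windows and return the
    indices of windows whose line length exceeds the threshold."""
    flagged = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        if line_length(signal[start:start + window_size]) > threshold:
            flagged.append(start // window_size)
    return flagged

quiet = [0.0, 0.1] * 4   # 8 samples of low-amplitude activity
burst = [0.0, 5.0] * 4   # 8 samples of large, fast oscillation
signal = quiet + burst + quiet
print(flag_windows(signal, window_size=8, threshold=10.0))  # [1]
```

In a learned detector, features like this (plus many others) would be fed to a classifier trained on EEG segments labeled by neurologists.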

We also use machine learning for detecting other features of EEG data and removing artifacts. (Eye blinks, for example, cause big artifacts.)

Replacing project managers and introducing reward/score-based AI bots that track the progress of individuals as a function of the stories they are working on, and that send out reminders (Slack chat bots, perhaps). This can help improve productivity at large firms with multiple (and sometimes unnecessary) levels of management hierarchy, such as IBM, Dell, HP, oil companies, etc. Let's be honest: a given programmer has a pattern to how he/she works: time spent on a problem, number of initial bugs, number of commits, time spent vs. difficulty of a given problem, and so forth. We have all this data to quantify and train a bot. I mean, why not :)

This seems potentially terrifying if it's not able to understand the subtleties of what 'value' means on a project.

That sounds a lot like the workflow in Dave Eggers' "The Circle". I think if you read the book you'd be less likely to implement anything like this. :P

I have been meaning to read that book; it was recommended by a friend working in tech. Apparently there's a movie out now too :)

That would replace less than 1% of what my PM does. But I work for a smaller company so perhaps I don't understand the use case.

I really wish more quantitative analysis was done in this portion of the field, thank you.

(In progress) Automating an e-commerce business completely, including customer/supplier e-mail responses, voice communication, identifying the data structure of supplier feeds and converting automatically between formats for integration, sales estimation, identifying hidden platform variables (how do you win the buy box?), price competition, etc. The tech is based on Python, DL (Keras), RNNs, GANs, SVMs, decision trees, etc.

Checking for issues with loan photos as they are posted. It's been interesting to find that many ML platforms and libraries don't do a great job of recognizing humans with dark skin. Hopefully, by adding Kiva's 1 million+ images to the mix, we can give them a better basis to learn from. We're hoping to use it for a lot of other things as well, but we're still learning how to leverage it.

That's a great insight to use Kiva as training data, because I agree that not recognizing dark skinned people is a huge problem with current facial recognition datasets.

This will get worse as more retail establishments use image recognition in their everyday operations.

Building a smart home security system, so you can be notified when your friends have arrived, or if someone is trying to break into your home. Machine learning is very useful here, as you can train the system to recognise what your friends look like, or what a break-in might look like.

How do you get training data for what a break-in looks like?

Couldn't you just train it on genuine visitors and then use the probability of it being a genuine visitor to determine whether it is a break in?

I imagine only training on genuine visitors would be tricky with any traditional classification approach. Even having a 90/10% split of positive/negative training data is difficult since a lot of classifiers will just degrade to a majority vote.

Maybe a Restricted Boltzmann Machine or something similar?

I'm guessing this would also use time-of-day data and whether anyone is at home.

Possibly some kind of anomaly detection, but I'm not sure how you would model the data.
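One way to make the "train only on genuine visitors" idea concrete is one-class anomaly detection: model what normal arrivals look like and flag anything too far from that, which avoids needing break-in examples at all. A minimal sketch, assuming each visit is summarized as a numeric feature vector (the `hour_of_day` and `seconds_at_door` features are hypothetical) and, as a strong simplification, that features are independent Gaussians. Library options such as scikit-learn's `OneClassSVM` or `IsolationForest` would be the more usual choice.

```python
from statistics import mean, pstdev


def fit(genuine_events):
    """Per-feature (mean, std) estimated from genuine visits only."""
    columns = zip(*genuine_events)
    # Guard against a zero std when a feature never varies in training.
    return [(mean(col), pstdev(col) or 1e-9) for col in columns]


def anomaly_score(model, event):
    """Largest absolute z-score of the event across all features."""
    return max(abs(x - mu) / sd for x, (mu, sd) in zip(event, model))


def is_anomalous(model, event, threshold=3.0):
    """Flag events more than `threshold` standard deviations from normal."""
    return anomaly_score(model, event) > threshold


# Hypothetical features per visit: (hour_of_day, seconds_at_door_before_entry)
genuine = [(18, 5), (19, 6), (18, 4), (20, 5), (19, 5)]
model = fit(genuine)
print(is_anomalous(model, (19, 5)))  # False: looks like a normal arrival
print(is_anomalous(model, (3, 90)))  # True: 3 a.m., lingering at the door
```

The threshold trades missed break-ins against false alarms, and it is the only knob here that has to be chosen without labeled break-in data.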

How do you collect training data for break-ins?

Did you put an ad out on Craigslist asking people to attempt it, or do you have some criminal connections?

I work for a large multi-hospital corporation.

We use machine learning to predict the risk that a patient, after an inpatient admission, will be readmitted within 30 days of discharge. If the risk is high, we proactively put processes in place to reduce the readmission risk.
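The comment doesn't name the model. A common baseline for a binary outcome like "readmitted within 30 days" is logistic regression, which outputs a probability that can be thresholded to trigger follow-up. A minimal sketch, trained with plain batch gradient descent on toy data; the two features (prior admissions and length of stay, both assumed already scaled to [0, 1]) and the 0.5 cutoff are hypothetical and not clinically meaningful.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(w, b, x):
    """Predicted probability of readmission for one patient's features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: [prior_admissions, length_of_stay], both pre-scaled to [0, 1];
# label 1 = readmitted within 30 days. Entirely made up.
X = [[0.0, 0.2], [0.1, 0.3], [0.0, 0.1], [0.8, 0.9], [1.0, 0.7], [0.6, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)

RISK_CUTOFF = 0.5  # hypothetical; in practice tuned to follow-up capacity
for patient in ([0.0, 0.1], [0.8, 0.9]):
    risk = predict_risk(w, b, patient)
    print(patient, round(risk, 2), "follow up" if risk > RISK_CUTOFF else "routine")
```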

Can you elaborate a little bit more on the process?

Can you reveal which corporation?

Forecasting loan payment defaults of 90 days or longer, 3 months in advance, using a tree-based model.
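The comment doesn't specify the tree model; production systems typically use ensembles such as random forests or gradient-boosted trees. Their basic building block is a split chosen to make the resulting groups as pure as possible. A minimal sketch of a single-split tree (a decision stump) using Gini impurity, on toy data with hypothetical features (days past due last month, debt-to-income ratio) and a label meaning "90+ days delinquent three months later":

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Find the split x[feature] <= threshold that minimizes weighted Gini
    impurity. Returns (feature, threshold, left_rate, right_rate), where the
    rates are the default fractions in each leaf."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


def predict(stump, x):
    """Default rate of the leaf that x falls into."""
    j, t, left_rate, right_rate = stump
    return left_rate if x[j] <= t else right_rate


# Hypothetical features: [days_past_due_last_month, debt_to_income]
X = [[0, 0.2], [0, 0.5], [5, 0.3], [30, 0.4], [45, 0.6], [60, 0.3]]
y = [0, 0, 0, 1, 1, 1]  # 1 = 90+ days delinquent three months later
stump = fit_stump(X, y)
print(stump)                      # (0, 5, 0.0, 1.0)
print(predict(stump, [40, 0.2]))  # 1.0
```

Deeper trees apply the same split search recursively inside each leaf, and ensembles combine many such trees.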

What's your training set?

past loan performance data

Lending Club?

hedge fund

How large a dataset are you looking at? I'm curious whether Lending Club's publicly available data covers a long enough period to get meaningful, accurate results for general 5-10 year loan predictions. Though it is nice that an economic crash happened to occur in the middle.

Of course more data is always better, but there are plenty of flexible models that can learn from small data, so just try the Lending Club data, and if your model isn't learning anything, choose a more flexible one...

Very interesting stuff. Is it just you for personal interest?


I am just getting into it, but I'm primarily looking at it because I have about 8 months' worth of home automation sensor data stored in a MySQL database, thanks to my home automation software (thanks, home-assistant!), and I want to see if ML can find patterns in that sensor data and infer what to do in my apartment.

For instance when I get home (gps sensor) and when I open my front door (door binary sensor), can a system understand that I typically turn my TV on when that happens?
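Before training anything, a simple frequency count can show whether the pattern exists in the logged data. A minimal sketch, assuming the sensor history has been exported as time-sorted `(minute, event_name)` pairs; the event names and the 5- and 10-minute windows are hypothetical, not Home Assistant's actual schema. It finds front-door openings that follow an arrival and measures how often the TV is turned on shortly afterward.

```python
def door_openings_after_arrival(events, arrive_window=5):
    """Times of front-door openings that occur within `arrive_window` minutes
    after an arrival event (e.g. from a GPS/presence sensor)."""
    arrivals = [t for t, name in events if name == "arrived_home"]
    return [
        t for t, name in events
        if name == "front_door_open"
        and any(0 <= t - a <= arrive_window for a in arrivals)
    ]


def rate_followed_by(trigger_times, events, action, window=10):
    """Fraction of trigger times followed by `action` within `window` minutes."""
    if not trigger_times:
        return 0.0
    action_times = [t for t, name in events if name == action]
    hits = sum(
        1 for t in trigger_times
        if any(0 <= a - t <= window for a in action_times)
    )
    return hits / len(trigger_times)


# Hypothetical export of the sensor history: (minute, event_name), time-sorted.
events = [
    (100, "arrived_home"), (102, "front_door_open"), (105, "tv_on"),
    (600, "arrived_home"), (601, "front_door_open"), (608, "tv_on"),
    (900, "front_door_open"),  # leaving: no arrival just before
    (1500, "arrived_home"), (1503, "front_door_open"),  # no TV this time
    (2000, "arrived_home"), (2001, "front_door_open"), (2004, "tv_on"),
]
triggers = door_openings_after_arrival(events)
print(triggers)                                     # [102, 601, 1503, 2001]
print(rate_followed_by(triggers, events, "tv_on"))  # 0.75
```

This is a count, not a learned model; an automation could be suggested once the rate stays above some chosen threshold over enough arrivals.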

Automatically answering HR queries worded in different ways

Different ways of wording automated HR answers.

I use Stanford NER to extract book titles and authors from HN comments. Previous thread: https://news.ycombinator.com/item?id=14202557

How do you disambiguate similar/identical titles?

Did you evaluate other NER tools against CoreNLP? E.g., how does spaCy compare?

EDIT: answering my own question -- research here [1] ranks CoreNLP as the winner among CoreNLP, NLTK, spaCy and LingPipe, but AFAICT spaCy is competitive.

[1] https://aclweb.org/anthology/W/W16/W16-2703.pdf

I found spaCy's documentation lacking in detail on how to train NER from scratch (at least, I couldn't make sense of it). That's why I decided to stick with Stanford's NER.

Regarding disambiguation: people usually mention books together with their authors, so it takes one query against the local database to check whether the entity is indeed a particular book by a particular author. When no author is mentioned, I check the parent comment (if any) to see whether the given book is mentioned there. And if the comment turns out to be standalone, I query the Goodreads / Google Books APIs. Given books with identical titles, these APIs return the most popular option.
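The cascade described in this comment (local lookup when an author is present, then the parent comment, then an external API) can be sketched roughly as follows. The local database is assumed to be a dict from lowercase title to a set of lowercase author names, and `api_lookup` is a hypothetical stand-in for a Goodreads / Google Books query, not either real API.

```python
def resolve_book(title, author, local_db, parent_text=None, api_lookup=None):
    """Resolve a book mention to (title, author, source), or None.

    local_db: dict of lowercase title -> set of lowercase authors (assumed).
    api_lookup: hypothetical callable(title) standing in for an external API.
    """
    key = title.lower()
    # 1. Author mentioned: one local lookup confirms the (title, author) pair.
    if author is not None:
        if author.lower() in local_db.get(key, set()):
            return (key, author.lower(), "local_db")
        return None
    # 2. No author: check whether the parent comment names this title
    #    together with one of its known authors.
    if parent_text is not None and key in parent_text.lower():
        for known_author in sorted(local_db.get(key, set())):
            if known_author in parent_text.lower():
                return (key, known_author, "parent_comment")
    # 3. Fall back to the external API, which returns the most popular match.
    if api_lookup is not None:
        return api_lookup(title)
    return None


db = {"dune": {"frank herbert"}, "the road": {"cormac mccarthy"}}
print(resolve_book("Dune", "Frank Herbert", db))
# ('dune', 'frank herbert', 'local_db')
print(resolve_book("The Road", None, db, parent_text="Loved The Road by Cormac McCarthy"))
# ('the road', 'cormac mccarthy', 'parent_comment')
```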

I'm using it to isolate acapellas from music for making mashups/remixes. Not quite "real world" yet, since even my most recent models aren't perfectly reliable, but it's getting there.

Could you share what sort of model you're using? Something WaveNet inspired?

This is very cool

Predicting mistakes made by humans during audio/video transcription, giving them auto-complete suggestions, classifying severity of changes, predicting the difficulty level based on audio characteristics.

I've been working with clustering algorithms to create a personal news aggregator from sitemaps and RSS feeds, using an NLP library (Stanford CoreNLP) to do feature extraction. [1] I'd like to extend it to classify articles into different categories so I could filter things like transcripts, opinion columns, web scrapes gone wrong, but I'm not sure how to set that up yet.

[1] Very rough prototype: https://confabulator.io/newsclustering/

Hi writeslowly,

A friend and I are working on a tech investment news aggregator (manually curated right now -- http://circulaat.com), and I think your clustering algorithm might be really useful for it. I can't figure out how to DM you via the Hacker News system, but I would love to get in contact.

I've added my contact information to my profile. Feel free to email me if you'd like to talk

Detecting moments of user frustration in web apps (e.g., rage clicks, shaking the mouse, refreshing the page): https://logrocket.com
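LogRocket's actual detection method isn't described here. For illustration, a rule-based rage-click signal (which could also serve as an input feature for a model) might look for bursts of clicks that are close together in both time and position. The thresholds in this sketch (3 clicks, 1000 ms, 30 px) are hypothetical.

```python
import math


def find_rage_clicks(clicks, min_clicks=3, max_interval_ms=1000, radius_px=30):
    """Return start indices of bursts of >= min_clicks clicks that all fall
    within max_interval_ms of the burst's first click and within radius_px
    of its position.

    clicks: list of (timestamp_ms, x, y), sorted by timestamp.
    """
    bursts = []
    i = 0
    while i < len(clicks):
        t0, x0, y0 = clicks[i]
        j = i + 1
        # Extend the burst while clicks stay close in time and space.
        while (j < len(clicks)
               and clicks[j][0] - t0 <= max_interval_ms
               and math.hypot(clicks[j][1] - x0, clicks[j][2] - y0) <= radius_px):
            j += 1
        if j - i >= min_clicks:
            bursts.append(i)
            i = j  # skip past this burst
        else:
            i += 1
    return bursts


clicks = [(0, 100, 100), (5000, 400, 300), (5200, 402, 301),
          (5350, 399, 305), (5500, 401, 298), (9000, 50, 50)]
print(find_rage_clicks(clicks))  # [1]
```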

Fraud detection (credit cards, online banking)

Detecting illegal mining activity via satellite

Network intrusion detection (packet analysis)

Predicting machine failure

Stock market trend prediction (up/down)

SIM box fraud

Finding similar parts for large vehicles to help engineers

I could keep going, but this is a sample of what I've seen over the years. We mainly work with time series data for enterprises, basically "things Google doesn't do". The bulk of what we do doesn't appear in research papers because it's not "GAN art", i.e., "the current hype".

I use some basic tf-idf machine learning for clustering similar articles for http://tracket.com
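The comment doesn't detail the pipeline, but a basic version of tf-idf similarity clustering might look like this: weight each term by its frequency in the article times the log inverse document frequency, compare articles by cosine similarity, and group them greedily. The whitespace tokenizer, the greedy single-pass grouping and the 0.2 threshold are all simplifying assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (dicts) with tf = count / doc length, idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold=0.2):
    """Greedy single pass: each article joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    vecs = tfidf_vectors(docs)
    clusters = []
    for i, v in enumerate(vecs):
        for c in clusters:
            if cosine(v, vecs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "apple releases new iphone",
    "new iphone released by apple",
    "senate passes budget bill",
    "budget bill passes senate vote",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```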

At Transloadit we're going to use it for predicting how many machines we'll need to encode incoming video and audio files in near realtime, and then scaling that capacity up in parallel, as files are still being uploaded or imported. Already half a year in the making, but in our early tests it seems we can outperform the custom algorithms we had in place for this, by a lot.

Detecting malware. https://www.barkly.com/product

Not ML per se, but ANNs have proven to be quite useful in tumor growth modeling. I am using them to model cancerous cells' genotypes.

Classification of contents, segmentation of medical images. Neural networks can also be used for denoising, dehazing and super-resolution.

* financial forecasting

* resource demand forecasting

* balancing distributed energy resource loads

* automating metadata generation

* categorizing 'events'

* finding anomalies in satellite data

Most important:

* running bots on my xbox

I use it in my software engineer screening service to determine whether a user should answer more questions on a particular topic and dive deeper into that subject. I'm using Azure ML, and it's been working out well so far. I went from messing around for days with other ML libraries to being up and running within 20 minutes on Azure.

Hotdog/Not Hotdog

We use machine learning for parsing jobs and resumes and then to match them with each other.

To see the results - post your resume here: https://www.postjobfree.com/post-resume

We use ML to identify users who might try to commit fraud when purchasing things from our site, as well as to prevent identity fraud when users verify their ID with us. We are also looking into adaptive design to improve UX and conversion.

I interned at a KYC/KYB company last summer, and it's all manual there. They've got people checking the submitted forms/scans manually. How good are you getting? I truly believe they can all be replaced very quickly! Talk to me :)

There's research on the high frequency of certain digits when bogus accounts are created. I've also done work with KBAs, geos and name origins, with interesting results. Very exciting area!

- Customer analysis / clustering / behavior visualization

- Financial analysis / predictions

- Image classification

Bump on this one.

- Sales Promotion/Offer Prediction

- Customer Analysis / Clustering / Site Content Display

I've got a client that's wrangling their data into a place where I can do some targeting for upselling, end-to-end - I'm really looking forward to that. Lots of value on the table, IMO, and really interesting and unusual datasets.

Automatic text summarization.

Mind sharing which algorithm you are using?

I'm a complete novice in ML and I've been looking for something like this for a while. Would you mind sharing your process for how you did this?

Predicting day-to-day harvest yields of perishable fruit, like strawberries.

I find it useful for learning from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

At work we're using it to battle fraud, we're also using it to recommend apps and themes to our users and we're using it internally for various forecasting tasks.

Evolving mobile games to discover variants that earn 5 star reviews. Using https://improve.ai

At my work we use it to do large-scale analysis of remote sensing data, in particular segmentation of satellite imagery to provide land monitoring services.

image processing and effects using deep learning: http://somatic.io

improving relevancy of search results - http://www.coveo.com/en/platform/machine-learning

Multilingual named entity recognition and disambiguation

predicting the evolution of running cases, and the arrival of new cases, for capacity planning in a call center

Do you believe ML is adding value there (instead of a standard prediction/forecasting model)?

Recommender system using visual features

Trying to predict horse racing

Me too :-)

What do you use? I found that an SVC trained on proper data gives satisfying performance for predicting the first four horses (though not in the right order).

Is that working?

I am still lacking data to test it more thoroughly, but so far it is promising enough for me to keep working on it. And it's a fun way to learn. What do you work on?

Not me, but a friend of mine heads a company that uses drones to identify defects/damage on containers. The images captured by the drones are fed through an ML pipeline that detects such damage and flags it for insurance claims.

By containers, do you mean shipping containers? Sounds interesting!
