Facebook Field Guide to Machine Learning – video series (fb.com)
552 points by tosh 6 months ago | 63 comments

Facebook's guide comes a couple weeks after (EDIT: FB's guide was originally published on May 7th, so it's actually a few months old) Google published their ML guides (https://news.ycombinator.com/item?id=17595611).

That's not a bad thing; the more guides from reputable sources, the better. Just don't read them and say you're an ML expert afterwards.

>> Just don't read them and say you're an ML expert afterwards.

Too late. I get like 2 random LinkedIn invites a day from randos claiming to be simultaneously "AI" and Blockchain experts. You look at their profile and they have some codecamp course in React. I don't get it, do these people get jobs this way?

> I don't get it, do these people get jobs this way?


I don't think anyone's said it better than Joel Spolsky in his 2006 article on why bad developers are over-represented on the job market:


I wonder this too. As a relatively young professional, I regularly have folks add me who are my age or younger (mid-20s) and have a portfolio of experience that reads as if they are a Fortune 500 CEO. Combine that with having 5000+ "connections" and a slew of unverified and uncorroborated skills, and LinkedIn continues to feel frivolous.

Yet most job listings these days require a link to your profile, so it's clear there's some value in having a "complete" profile.

There was an article on HN a few weeks ago about a guy making a game out of LinkedIn. He made a fake account, filled it with buzzword skills, degrees, and colleges, and started adding every notable person he could. Then it became a domino effect. Once he had some influential people as connections, other people started adding him, thinking he was a big shot. Then, once he had a ton of influential connections, recruiters from major companies started reaching out to him with job offers and interview requests.

Ultimately LinkedIn does have a good business use, but it can also be gamed pretty hard.

EDIT: Found the link. It didn't originate on HN, but I saw it from a post about it on HN.


Thank you for sharing. Fun read.

No, they don't really, which is why they're always out and about making noise. The good people just keep getting promotions and salary increases and don't spend a lot of their time "networking." When's the last time you got a LinkedIn spam from Jeff Dean looking for a job?

I had to block one because they kept asking me to endorse and recommend them for shit that I don't even know about.

I'm like dude, when did you ever work on "designing collimation towers" for NASA? You are like 19. Unless you were like a child prodigy, which I doubt, since you are asking me for an endorsement.

Well they do get some jobs, they just don't happen to keep them for very long...

They don't, otherwise they wouldn't have to spam LinkedIn.

Can't the inverse also apply?

If it didn't work they wouldn't spam LinkedIn.

I feel like the timing is more related to this: https://news.ycombinator.com/item?id=17706997

It may be just a coincidence, but I have started noticing that very often, when a post criticizing a company for some behavior is trending, another post immediately follows showing the "generosity" of the same company, showcasing some OSS or a blog post about some popular technological subject.

It got me thinking that maybe those companies have bots ready to upvote nice things about them when some criticism surfaces.

I've seen this happen with both Google and Microsoft so far.

I think you will find that large companies attract criticism continually, and release open source software continually.

Well then. Your 2 cents on how to become one after going through those videos? I am genuinely asking. Looking for a career in it. At the beginning now. Read a fair bit, basics videos and all that online. Nothing formal. Can do a bit of C, C++.

I strongly recommend doing personal projects with ML, particularly projects/datasets that haven't been done before. (i.e. not that Titanic dataset or sentiment analysis of Trump's tweets)

Work through a ton of Coursera courses and implement your own projects. If you don't know what projects you could do, you can use Kaggle to get ideas and datasets.

The MOOC courses are "just" there to teach you the basics and background. Projects are absolutely necessary for you, because without a degree you will only be able to convince prospective employers based on having a lot of practical experience.

Like anything... do it. If you have the academic background get an entry level job. If not, you'll have to build things using ML.

Of note, there aren't really any entry-level ML jobs (Data Analyst isn't the same, although it would be a valid stepping stone).

Sure there are.

I get freelancers to do data preprocessing for me frequently, and sometimes put it through some off the shelf model.

Generally it's hard to find the right people for this, but that isn't exactly unique.

> Just don't read them and say you're an ML expert afterwards.

"ML expert" is the new "web developer"

As someone going through a Master's in CS to get into ML, this makes me a little sad..

(though I do understand what you're getting at)

Actually, the Google guides were initially published in February (and publicly available before that from Martin Zinkevich since 2016!). But I agree, it's great to see more resources around these best practices.


I read the summaries and skimmed through the videos but couldn't find anything on ethics, fairness, transparency or data protection. Is there anything?

If not, I really think that these topics should be addressed (even if only briefly) in any "field guide" to machine learning. Especially FB should give those some more attention after their numerous scandals and "mishaps".

Now, before you downvote this or say "but this is just about the methods and tools" please take a moment to think about how much power ML models can have over people when deployed at FB scale.

I would like to see at least a (brief) discussion of the following topics in such guides:

- Data Provenance & Data Ethics (Can I use this? Should I?)

- Data Protection & Security (How can I protect personal information when doing ML?)

- Model Fairness (How can I ensure my model is fair and does not discriminate?)

- Model Transparency (How can I explain the results of the model to users and colleagues?)
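To make the fairness item a bit more concrete, here is a minimal sketch of one possible check: comparing the rate of positive predictions across groups (often called demographic parity). This is only one of several fairness criteria, and the predictions, group labels, and function names below are made up for illustration; nothing here comes from the FB videos.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Return the fraction of positive (== 1) predictions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs and group membership, for illustration only.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rates(preds, groups)
gap = demographic_parity_gap(preds, groups)
print(rates, gap)
```

A large gap does not prove discrimination by itself and a small one does not rule it out, but it is a cheap number to track alongside accuracy.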

This has become a new CSR-type trend (esp. GOOG vs FB). Steps:

* Open Source ML Tools and Press covers it

* Create a video Series about ML and a blog post about it

Thus make people believe:

* You are a leader in ML

* You care about democratising ML or AI whatever

* You care about sharing knowledge

* You care about Open Source

Look, I don't much care for Facebook as a company or as a product. FAIR is a bit separate and does a lot of pure research. FAIR has put out a lot of good work, tools (it supports PyTorch), and has a lot of good people working there with good intentions.

Not everyone that works for a company that a given person dislikes is "evil" or acting malevolently. No company is homogeneous.

Yep, FAIR is good for the ML community. They have a number of open source projects that are useful.

I don't want to Godwin this, but it totally could be Godwinned.

In this case, a tutorial from Facebook carries more informative weight than a Medium thought piece (and Facebook legitimately is a leader in ML).

> and Facebook legitimately is a leader in ML

I am curious how you came to that conclusion. I would think that Google is a leader in ML - unless I am missing something.

- Facebook employs some of the most accomplished and widely respected researchers in artificial intelligence, such as Yann LeCun [1]

- Every year, researchers from the Facebook Artificial Intelligence Research Lab (FAIR) present their research at top conferences such as NIPS [2]

- FAIR also produces some of the most capable and widely used open source software in machine learning, such as Torch and Caffe2 [3]

If you'd like to see how Facebook uses machine learning, take a look through its blog posts related to recruiting [4].


1. https://research.fb.com/people/

2. https://research.fb.com/facebook-research-at-nips-2017/

3. https://research.fb.com/downloads/

4. https://www.facebook.com/careers/life/machine-learning-at-fa...

Thanks for the info, I didn't know about FAIR or that PyTorch came from Facebook research.

Also note that FB's open source ML projects are generally easier to use than Google's.

Things like FastText are completely standalone which makes them really easy to integrate into other things.

I've been working with FastText almost daily and I completely agree. The code rocks and it is really easy to quickly do your own modifications i.e. the source is clean and well-structured.

> "a leader in ML"

not /the/ (only) leader in ML.

Google, FB and several of the Chinese tech conglomerates are leaders within ML.

I am not trying to be a fanboy here. I am genuinely curious. What makes someone a leader in ML applications?

Google has more than studies and blog posts about machine learning. Apart from research and open source ML applications, they are heavily dependent on machine learning in almost all of their major products that I know of. For the most part they are extremely good at it.

Where and how does FB use machine learning in its products?

FB uses ML to do image recognition, face recognition, spam filtering and news ranking. They probably have many many more applications.

PyTorch by itself would be enough to make FAIR a leader. PyTorch is being used in many research papers and was/is a leader in "eager execution" mode for neural nets. But there are many other things, such as fastText (see more projects here https://github.com/facebookresearch).

Machine learning is basically the product that Facebook actually sells. At the end of the day, they are an advertisement delivery service. They get paid based on user interaction with ads and sponsored content. The way they determine what ads to show to whom and when is all ML.

On the user side of things, they use it to suggest friends, detect/recognize faces in pictures, etc.

Picking which items to show you in your newsfeed; making recommendations for places based on your friends; automatically generating blind-friendly captions for uploaded photos (read the hover text for some of your friends’ photos); suggesting events you might be interested in; friend suggestions; ads.

One example would be the automatic face detection of people that were not originally tagged in uploaded photos.

The newsfeed is one of the most powerful AIs in the world. It is constantly learning, from engagement and ad spend, when to serve various types of content to various types of people. A glorious paperclip optimizer that 2 billion people feed data to daily.

> Thus make people believe ... you care

Corporations aren't people. They don't have minds (you can say that a corporation is an AI made out of people instead of transistors), and using concepts like "care", originally created to describe the human mind, is a leaky abstraction.

What I care about is what corporations do. And when they actually release valuable resources for free, we can just acknowledge it instead of trying to guess what's going on in the mind of an entity that doesn't have a mind.

Does anyone else find it odd that the best engineers are basically selling ads?

Most engineers at large Ad-supported companies are not working on ads in any meaningful way.

Ads pay the bills, but only to the extent that you have something people want, to put the ads next to, and most of the engineers are there trying to make something people want.

My brother, who joined Facebook during the whole Cambridge Analytica fiasco, said that he had "sold his soul for more money".

Was it worth it?

I've promoted a lot of good products that people wouldn't have known about without Facebook ads.

If the technology used to sell more ads can be also used to cure cancer, it's fine by me.


Recommendation engines are used in bio research.

I've always felt the same way too. Lots of really smart people doing what they can to get more people to click on ads ... the source of 99% of their revenue. Seems a little sad. Same but less so with Google, as at least they make $ from their cloud product ... whose development is funded by ad clicks.

Not particularly. That's where the money is. It's the same as how a lot of brilliant math people go into quantitative finance.

Not at all. Connecting producers and consumers is a field that's been changing in the most radical way for the last 15 years, and all of humanity has reaped enormous benefits from it. A lot of people I personally know started making jewellery, cakes, and other small artisanal items and selling them on Instagram. Quite a lot of others started importing small niche items from China. Most of my friends started buying some items like that.

All these people got their lives improved thanks to engineers who worked on selling ads. There's a popular stereotype that "ads" is some evil capitalist thing designed to manipulate you to buy some shit you don't need. Looking at the world around me, I see something different.

I'm halfway through Google's crash course on ML, which I think is helpful. As a machine learning field guide, I also find Andrew Ng's short paper series Machine Learning Yearning helpful. I watched the first FB video and didn't feel like they added anything particularly interesting. It's almost like they feel obliged to put something out there under the FB name.

I am an electrical engineer interested in Machine Learning applications.

So far, based on the myriad of literature I have read about Machine Learning stuff, a lot of results and their quality depend on the type of network, training, error correction methods, etc.

Say I learn how to make computers build and train models. Is that enough to get good results? Are there any resources that will guide me in choosing a good topology or network parameters (say, the number of hidden layers, etc.)?

How do developers who use Machine Learning in production environments confidently come to a particular network topology/parameter set for a given class of problem?

Do Caltech's free online course "Learning from Data" to learn ML in a principled way. You will get the answers to all those questions and then some. Before tools, you basically need to learn some basic theory and concepts (linear models, nonlinear transformations, VC dimension, learning curves, regularization, validation, etc.) to deal with those questions. The professor in the video lectures is excellent.
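For what it's worth, the "validation" part of that list is the usual practical answer to the topology/parameter question: treat each setting as a candidate, fit on a training split, and keep whichever scores best on held-out data. Below is a minimal sketch of that idea using only the standard library. It uses a k-nearest-neighbour regressor as a stand-in for a network, with k playing the role of the hyperparameter; the synthetic data, the candidate list, and all function names are assumptions made up for illustration.

```python
import random


def make_data(n, seed):
    """Synthetic 1-D regression data: a noisy quadratic (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, 1.0)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def select_k(train, val, candidates):
    """Pick the candidate k with the lowest error on the validation split."""
    errors = {k: mse(train, val, k) for k in candidates}
    best = min(errors, key=errors.get)
    return best, errors


data = make_data(300, seed=0)
# Three disjoint splits: fit on train, choose k on val, report on test.
train, val, test = data[:200], data[200:250], data[250:]

candidates = [1, 3, 5, 9, 15, 31, 61]
best_k, val_errors = select_k(train, val, candidates)
test_error = mse(train, test, best_k)
print(best_k, val_errors[best_k], test_error)
```

The same loop applies if the candidates are numbers of hidden layers or learning rates instead of k. The test split is touched only once, after the choice is made, so the reported error is not biased by the selection itself.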

ML is slowly approaching the blockchain bubble, I think. You have a lot of people just chucking random bs data into [insert popular model] and hoping for the best.

More practically, Facebook's ads run off ML at this point, meaning you need to feed them a ton of data to get decent results. It's good to learn their take on it for advertisers.

As an advertiser, this! Lookalikes and optimizing for conversions are the key to effective FB advertising at scale.

Is it just me or does it feel like these are very generic ideas floated in the videos, and that concrete examples would have helped a lot more?

Interesting how this kind of news always drops whenever there is bad press about Facebook.
