Google Brain Residency (tinyclouds.org)
337 points by aleksi on June 7, 2017 | 55 comments



I enjoyed the section on negative examples and the things that didn't work out. It makes me feel a lot better about the ton of things I've tried that didn't quite pan out the way I hoped.


Interesting to see that the Google Brain Residency isn't that much more advanced than other graduate work or fellowship programs out there.

I disagree with him about needing a "Rails for deep learning." I think things are quite fine in practice now, especially since a DL codebase is typically much smaller than a web app's: a file for the data pipeline, a file for the model, and a file for training and/or inference. Not sure it really needs to be much more complicated than that.
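To be concrete, here's a toy sketch of that split, with all three "files" inlined into one listing (file and function names are just illustrative, nothing here comes from the article):

    # --- data.py: the data pipeline ---
    import numpy as np

    def batches(n=32):
        """Yield (inputs, targets) minibatches; random data stands in for a real loader."""
        while True:
            x = np.random.rand(n, 64).astype(np.float32)
            y = (x.sum(axis=1) > 32).astype(np.float32)
            yield x, y

    # --- model.py: the model (a logistic-regression stand-in) ---
    def init_params(d_in=64):
        return {"w": np.zeros(d_in, dtype=np.float32), "b": np.float32(0.0)}

    def predict(params, x):
        return 1.0 / (1.0 + np.exp(-(x @ params["w"] + params["b"])))

    # --- train.py: the training loop ---
    def train(steps=100, lr=0.1):
        params, data = init_params(), batches()
        for _ in range(steps):
            x, y = next(data)
            p = predict(params, x)
            grad = p - y                          # dLoss/dLogit for the logistic loss
            params["w"] -= lr * (x.T @ grad) / len(y)
            params["b"] -= lr * grad.mean()
        return params

    if __name__ == "__main__":
        train()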


I actually strongly agree with him. In fact, progress in deep learning has stalled due to inefficiencies in sharing data and code between researchers. Today, to replicate a study you have to spend several days downloading data, making sure all dependencies are installed, etc. This is worsened by the lack of standard formats for even simple things such as representing a 2D bounding box on an image.
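To make the bounding-box point concrete, here's the kind of trivial mismatch I mean: the two most common layouts for the same box, which silently disagree unless you convert (the coco_style/voc_style names are just my labels for the usual [x, y, width, height] and [x_min, y_min, x_max, y_max] conventions):

    def xywh_to_xyxy(box):
        """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
        x, y, w, h = box
        return [x, y, x + w, y + h]

    coco_style = [10, 20, 100, 50]        # [x, y, width, height]
    voc_style = xywh_to_xyxy(coco_style)  # [x_min, y_min, x_max, y_max]
    print(voc_style)                      # [10, 20, 110, 70]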

People forget that the reason deep learning came to prominence was engineering that enabled the use of GPUs (thanks to Alex K.) and the large dataset from the ImageNet competition (thanks to Fei-Fei Li).

The importance of software engineering in moving the field forward is often under-appreciated. This blog post [1] beautifully illustrates several instances where a great implementation made it easier for researchers to speed up experimentation and led to breakthroughs in computer vision.

I am building Deep Video Analytics, which aims to become the Rails/Django/MySQL/(favorite analogy) of visual data analytics [2,3].

[1] http://www.computervisionblog.com/2015/01/from-feature-descr...

[2] https://github.com/AKSHAYUBHAT/DeepVideoAnalytics/

[3] http://www.deepvideoanalytics.com/


I also strongly agree with him, and with you. The work of annotating, curating, prepping, etc., is huge. Every industry could split off 10% of its workforce just to prep its data for ingestion and then design the interfaces to actually use the inferences the algorithms emit.


Well, my professional experience has been completely different; maybe hardcore research on images is different. At any rate, isn't TF already the Rails/magical framework of DL, in some sense?


I don't know...

There's plenty of horrible code out there, particularly TensorFlow code, that hits all the classic markers of 'spaghetti code'.

The complexity of the implementation might be hidden away by the framework, but there's no excuse for writing massive hundred+ line functions that do multiple things with copy-pasted code blocks.

That's just bad code, in ML or not.


I'm not an expert, but Keras seems much closer than TF.


They're complementary. Keras typically uses TF as a back-end, letting users model standard architectures quickly, while TF lets you express new and complex graphs of computation.
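For example, a standard classifier in Keras is a few declarative lines, while the backend still lets you drop down and build arbitrary graphs when you need them. A minimal sketch (layer sizes, optimizer, and the MNIST-shaped input are arbitrary placeholders, not anything from the article):

    from tensorflow import keras

    # Declare a standard feed-forward classifier; the backend builds the graph.
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)   # given MNIST-shaped arrays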


I'm aware; I'm using Theano as my backend. I just mean that Keras seems more analogous to Rails than TF does - TF would be Ruby in that analogy.


> Interesting to see that the Google Brain Residency isn't that much more advanced than other graduate work or fellowship programs out there.

I think they accepted residents from different backgrounds and different levels of experience.


> Interesting to see that the Google Brain Residency isn't that much more advanced than other graduate work or fellowship programs out there.

Why would you assume that a temp job at an advertising company would be "more advanced" than work conducted at an actual research institution?


Even though that's how it makes the lion's share of its revenue, I don't think anyone actually considers Google just an advertising company.


Maybe they don't "consider" Google an advertising company, but that's what it is.

Comcast isn't just a cable company these days, but it is still a cable company.


That advertising company did create software that beat the best Go player in the world, a decade earlier than estimated.

They also created the most powerful NN-processing silicon, which uses far less power than anyone else's. A pod doing 11.5 quadrillion FLOPS is pretty impressive. Just sayin'.


Sure, stipulated. Google has demonstrated conclusively that if you throw enormous amounts of money and compute infrastructure at problems, you can make progress. Those are still engineering developments, not the kind of groundbreaking research that happens at universities or national labs.

It's not even the kind of fundamental research that places like Xerox PARC, IBM, or Bell Labs used to invest in.

Frankly, it's kind of insulting to imply that a one-year Google residency would be "more advanced" than years-long graduate programs at places like MIT, Caltech or CMU.


I don't really see the point of prime classification; I feel like you're never going to do better than a deterministic sieve algorithm. Even if it did do quite well in terms of accuracy, probabilistic models aren't 100% correct, so you still have to check for primality deterministically.
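For reference, the deterministic baseline I have in mind is just a sieve of Eratosthenes, which labels every number up to n exactly:

    def sieve(n):
        """Return a list where is_prime[i] is True iff i is prime, for 0 <= i <= n."""
        is_prime = [False, False] + [True] * (n - 1)
        for p in range(2, int(n ** 0.5) + 1):
            if is_prime[p]:
                for multiple in range(p * p, n + 1, p):
                    is_prime[multiple] = False
        return is_prime

    flags = sieve(50)
    print([i for i, f in enumerate(flags) if f])  # [2, 3, 5, 7, 11, ..., 47]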

However, the twin prime factorizer idea was interesting and could potentially lead to some factorization speedups (based on heuristics) if done correctly.

Either way though, prime number distribution is quite tricky - we should probably be looking in other number bases as well rather than just base 10.


> Even if it did do quite well in terms of accuracy, probabilistic models aren't 100% correct so you still have to check for primality deterministically.

Deterministic primality tests are rarely used in practice -- they're too slow, and the probabilistic tests have such a high accuracy that it doesn't matter.
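For example, Miller-Rabin is the usual probabilistic test; a minimal sketch (for a composite n, k rounds let a false "probably prime" slip through with probability at most 4^-k):

    import random

    def is_probable_prime(n, k=40):
        """Miller-Rabin: False means definitely composite, True means probably prime."""
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        # Write n - 1 as d * 2^r with d odd.
        d, r = n - 1, 0
        while d % 2 == 0:
            d //= 2
            r += 1
        for _ in range(k):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False               # a witnesses that n is composite
        return True

    print(is_probable_prime(2**61 - 1))    # True (a Mersenne prime)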


> Either way though, prime number distribution is quite tricky - we should probably be looking in other number bases as well rather than just base 10.

This interested me, so I went looking for how primes would work in other bases; this answer [1] implies that primes are the same regardless of base:

A prime is a prime no matter which base you use to represent it. On the surface one might think that in Hex you would have 3 x 5 = 15 as "usual," but it really turns out that 3 x 5 = F.

The example 21 doesn't work too well because it is not prime. The base ten number 37 is better, because it is prime, but its Hex representation is 25, which sort of looks non-prime. Hex 25 is not, however, repeat not, 5 squared.

Okay, enough for examples. The fact of being prime or composite is just a property of the number itself, regardless of the way you write it. 15 and F and Roman numeral XV all mean the same number, which is 3 times 5, so it is composite. That is the way it is for all numbers, in the sense that if a base ten number N has factors, you can represent those factors in Hex and their product will be the number N in Hex.

Relating to your question about base 13, the base ten number 13 will be represented as "10" in that system, but "10" will still be a prime, because you cannot find two numbers other than 1 and "10" that will multiply together to make "10".

I hope this helps you think about primes in other bases.

[1] http://mathforum.org/library/drmath/view/55880.html
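A quick sanity check of the examples above, in Python (hex 25 really is decimal 37, and it is not 5 squared):

    n = int("25", 16)
    print(n)                      # 37
    print(n == 5 * 5)             # False: hex 25 only looks like a square
    print(int("F", 16) == 3 * 5)  # True: hex F is 15, composite in any base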


"this answer implies that primes are the same regardless of base"

Right. If you have some quantity of apples, can you lay them out in a regular rectangular grid that is at least 2 apples wide and at least 2 apples tall? If not, the quantity is prime. You can count the apples in any base you like.
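That picture is literally trial division; a tiny sketch of it:

    def is_prime(count):
        """Try every rectangle width > 1; if none tiles the apples evenly, it's prime."""
        if count < 2:
            return False
        for width in range(2, int(count ** 0.5) + 1):
            if count % width == 0:           # a width x (count // width) grid works
                return False
        return True

    print([n for n in range(2, 30) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, ...]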


This is the far more intuitive answer, and the one I use when explaining it to people.


> I believe my motivating demo will be achieved some day soon—you will watch Charlie Chaplin in 4K resolution and it will be indistinguishable from a modern movie.

Wow, where does it stop, then? Can we upsample modern 4k films? Or deeper dynamic ranges? Stereo projections from mono images?

I suppose that the limiting factor is the resolution/color depth/perspective channels of the training images?


It will stop when you hand a computer a novel and a cast as well as the style of your favorite director and then it will produce the film. Think of it as the ultimate in video compression.


> It will stop when you hand a computer a __(representation)__ and an __(embodiment)__ as well as the style of your favorite __(artist)__ and then it will produce the __(artwork)__. Think of it as the ultimate in __(*)__.


Considering how hard it is to pick something to watch on Netflix, I think it will stop when the computer pre-empts what you want to see - 100 channels of interesting generated content!


> 100 channels of interesting generated content

If I have to choose a channel, or even choose whether to turn the TV on or off, there's still room for improvement.


And there still won't be anything on TV!


> I believe my motivating demo will be achieved some day soon—you will watch Charlie Chaplin in 4K resolution and it will be indistinguishable from a modern movie.

How do these sorts of techniques perform with respect to temporal coherence?


Since this would probably work more like inferring the signal by correlating movies that have similar elements, you would need a dataset of 8k signals / dynamic ranges / stereo projections, and then you could attempt to perform the upsampling. You were right in your last statement.


I don't see why you'd need a dataset of 8k images in order to upsample 4k to 8k, as long as there are 4k images of that object from at least twice as close as you're shooting from.

(Actually I think twice as close might be 16k, but you get the point...)


4k refers to the linear resolution (roughly the horizontal pixel count), so twice as close doubles it to 8k.
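Spelling out the arithmetic (using the common 3840x2160 definition of 4k): halving the distance doubles each linear dimension, so it quadruples the pixel count but only takes you from 4k to 8k.

    w4k, h4k = 3840, 2160
    w8k, h8k = 2 * w4k, 2 * h4k        # twice as close: double each dimension
    print(w8k, h8k)                    # 7680 4320, i.e. 8k
    print((w8k * h8k) / (w4k * h4k))   # 4.0x the pixels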


All the "what happened to Ryan Dahl" questions have been answered.


For context: the author of this article, Ryan Dahl, is also the creator of Node.js.


Thank you; that explains the line "I uprooted myself from Brooklyn and moved, yet again, to the Bay Area in pursuit of technology," which has a link to Node.js in it.


This is interesting:

The program invites two dozen people, with varying backgrounds in ML

... "with varying backgrounds in ML" -- does that mean that relative beginners have a chance of being accepted?

(I have a personal stake in this! I'm an academic researcher who is getting into ML and working on legal applications, and I'd totally apply for this if I thought you didn't have to basically be Andrew Ng to get in...)


I work as a researcher on the Brain team.

An experienced machine learning researcher like Andrew Ng would probably not join the team as a Brain resident. We hire experienced machine learning researchers and engineers all the time (see https://careers.google.com/jobs#t=sq&q=j&li=20&l=false&jlo=e... ) and the residency program is probably not appropriate for people who are already experts. It is a program designed to help people become experts in machine learning.

For residents we look for some programming ability, mathematical ability, and machine learning knowledge. If an applicant knows absolutely nothing about machine learning, it would be strange (why apply?). We accept people who are not machine learning experts, but we want to be sure that people know enough about machine learning to be making an informed choice about trying to become machine learning researchers. Applicants need to have enough exposure to the field to have some idea of what they are getting into and have the necessary self-knowledge to be passionate about machine learning research.

You can see profiles of a few of the first cohort of residents here: https://research.google.com/teams/brain/residency/

See the old job posting which should hopefully explain the qualifications: https://careers.google.com/jobs#!t=jo&jid=/google/google-bra...


Thank you so much! I may well apply in a year or two, after doing more ML work and talking my way into sabbatical time. :-)


I was accepted to the 2017 class, so I can at least provide one data point. My background is in astronomy where I did lots of computational work, but no ML. After I got my PhD I went into industry working on applying NNs to EEG and EKG interpretation. I had a little over a year of industry experience in ML at the time I applied. I'm maybe not quite a beginner anymore, but I'm much closer to that end of the spectrum than being at the world-renowned expert end!

I haven't met the other residents yet (the program doesn't start until mid-July), but based on a few online interactions and meeting some of them at the interviews it seems that a fair number are undergraduates or grad students with some research experience in ML or computer vision. Others seem to have backgrounds more similar to mine, i.e., they were researchers in physics or neuroscience or something and are transitioning to ML.


Coolness---thanks!


The colouring algorithm is very interesting. What if we give it a dinosaur image as input after training the network on images of all 10,000 reptile species?


It will probably be colored somewhat grayish-green, since most reptiles are small animals that benefit from being able to hide in foliage. I doubt it would give better insight into dinosaur coloration than just having the illustrator who made the image also do the coloring (and they at least have the option of incorporating fossilized pigments into their educated guess).


I admit I'm a little disappointed that "Google Brain Residency" isn't a prototype for uploading your consciousness to "the Net".

Loved the article though. Dashed cyberpunk dreams notwithstanding.


I was half-expecting it to be about their invention assignment agreement.


yup, the very best machine learning models are still made from distilled postdoc tears...


This is just incredibly impressive on the surface. I wonder if it could be used for compression of videos and pictures?


If you google something like "neural net compression" you'll see a paper on it. It doesn't beat standard compression right now, as far as I know.


There's some cool new work -- to be presented at ICML in August -- that claims practical improvements over existing compression methods by using neural nets for "adaptive compression". The authors are strong academic researchers and appear to have formed a startup around this tech. ICML is a rigorous venue, subject to selective peer review. This is also my area of expertise, though I'm not affiliated with the authors of this paper.

Link to project: http://www.wave.one/icml2017/


I'm waiting for someone to train an NN to make over-compressed Spotify streams sound not-crappy.


It's almost time for youmightnotneedmachinelearning.com

AFAIK Spotify uses Vorbis for their streaming. Perhaps if they switched to Opus things would sound better.


Perhaps the solution could be approached from the receiving end, during decompression, given some kind of information about the appropriate form of the decompressed stream.


Like: feed in a compressed file (OGG or whatever) and a lossless file (FLAC, say), and have the [ML] system create an algorithm to run at the receiver that will more closely approximate the lossless output.

Surely this has been done before -- like compression with varying decompression algorithms, whose characteristics are sent along with the compressed stream to enable more faithful reproduction of the originally encoded data?
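Something like this sketch, assuming you already have time-aligned frames of decoded-lossy and lossless audio as arrays (decoding the OGG/FLAC is left out, and the architecture, frame size, and training settings are arbitrary placeholders, not a known-good recipe):

    import numpy as np
    from tensorflow import keras

    frame = 1024  # samples per training frame

    # Map a decoded-lossy frame to (an estimate of) the lossless frame.
    model = keras.Sequential([
        keras.Input(shape=(frame,)),
        keras.layers.Dense(2048, activation="relu"),
        keras.layers.Dense(frame),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Stand-ins for real aligned (lossy, lossless) frame pairs.
    lossy = np.random.randn(256, frame).astype("float32")
    lossless = lossy + 0.01 * np.random.randn(256, frame).astype("float32")
    model.fit(lossy, lossless, epochs=2, batch_size=32, verbose=0)

    enhanced = model.predict(lossy[:1])    # run at the receiver, after decoding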


Great article! I especially like how it's understandable and relatable to folks not wholly within the ML space.


https://how-old.net/ has improved a lot



That XKCD feels spot on... https://xkcd.com/1838/


That was literally at the end of the article.



