Google is acquiring Kaggle (techcrunch.com)
810 points by Perados on Mar 8, 2017 | 156 comments

This is obviously a talent acquisition in more ways than one (the Kaggle team, but also their ability to source machine learning talent). I wonder to what degree it's also a Tensorflow promotion move? It seems like Google is very interested in growing a community around it.

For example: some friends who run a seed-stage biotech deep learning startup were offered a considerable discount by the Google Cloud folks. Their ask? That the company switch to Google Cloud, rewrite some proprietary software in Tensorflow, and heavily publicize both moves.

I wonder if we'll see Kaggle gain a specific bent towards that ecosystem.

Not clear to me why this is a talent acquisition. The Kaggle team (Ben in particular) have some talents in ML, but I'd be surprised if they have anyone there working day to day on ML tasks.

It seems to me more like an old school product-and-media acquisition: Google like the product, and love the audience. This is a good way to get both.

I think parent's focus was on the "sourcing ML talent" part rather than the Kaggle team itself.

It was; should have emphasized it more. The Kaggle team is talent for sourcing ML talent.

Plus Kaggle is a good tool for weeding out talent. Sending an ML candidate a kaggle competition is much better than a traditional code interview.

I don't think Google actually has that problem.

The whole thing is strange.

Kaggle can bring out unknown or underprivileged gems into the spotlight. I remember reading an article about a top performer on Kaggle who was a school teacher somewhere in SE Asia (Singapore?).

1) Why do you believe that the hypothetical Singaporean isn't going to apply to Google? Google has no shortage of applicants. And if the applicant believes that Kaggle could help them, why not simply put the score on the application / resume?

2) If Google is trying to recruit people from Kaggle accounts, why not simply index the accounts?

Neither approach requires purchasing Kaggle at all.

Singapore is a bad example. How about a (hypothetical) guy/gal learning ML from Coursera and living in a remote village in Indonesia? No way to go to college because it's simply too far and he/she has to support their family. The person stumbled upon Kaggle, and started to compete with the best in the world.

Only Kaggle has the full data to be able to make an accurate decision. I don't really think indexing account pages is even remotely enough to find the really talented people among the noise.

I think Google acquired Kaggle for one of the following two reasons: 1) they wanted to expand their talent acquisition reach[1], or 2) they wanted to build a platform like Kaggle aimed at Google Cloud, but figured out that it was just easier to acquire Kaggle itself.

[1]: Google will NEVER be satisfied with its talent pool given their size and rate of expansion. The company is prepared to do a ton -- perhaps even acquiring Kaggle -- to get the best of the best, wherever they are.

If you own all the user data then you know you have access to, and control of, all of it.

But if you index it. You have it.

Not the team themselves, the competitors...

>This is obviously a talent acquisition in more ways than one (the Kaggle team

Last I heard was Kaggle runs atop Azure and is heavily a C# shop. It'll be interesting to see the transition to Google Cloud if that's the case.

I can confirm that Kaggle runs on Azure because I block all Microsoft IPs (to avoid the ninja Windows 10 upgrade) and must disable the blocker in order to go on the site.

As skrebbel said, don't they charge for the upgrade now? That said, Never10[1] was (still is?) a great tool to prevent the Windows 10 auto-upgrade. Also, according to the Never10 page, Microsoft now has an optional update to get rid of the GWX stuff.[2]

[1] https://www.grc.com/never10.htm

[2] https://support.microsoft.com/en-us/kb/3184143

> to avoid the ninja Windows 10 upgrade

What ninja upgrade? You always had to opt-in. Yes, they were really pushing the offer annoyingly hard, but I had no problems whatsoever keeping one of my machines on Windows 7.

Anyway, you can stop doing so now, the time for a free upgrade is over.

This is incorrect. There was an opt-out phase where the Windows 10 install started automatically in the middle of work. I've experienced this myself, there's a moment where Windows 7 just shuts down and starts installing Windows 10 and I had to wait 30 minutes until I could press "I disagree" to the EULA and then it would start rolling back the Windows 10 it just installed.

Entirely off topic, but I thought they now charge for the Windows 10 upgrade and don't force it anymore?

They do charge now but you can get it free if you say you will use an accessibility feature.

Why not upgrade to Windows 10? It's my favorite Windows OS yet, and has me even rethinking whether I want our house to be all OS X...

This thread from a while back covers some of people's objections to Windows 10, outside the usual privacy concerns:


So you don't get windows updates?

At this point presumably a system not running Windows 10 is not getting updates anymore. Unless it's an enterprise install, in which case the ninja update is irrelevant.

I get updates on Win 7

This is a great idea. Adding it to my DNS blackhole as we speak.

It's really not a great idea. Either you don't run Windows, and it's not an issue, or you just blocked Windows Update and other important services Microsoft provide that work in tandem to keep your systems safe.

Blocking Windows Update sounds like a feature to me, not an issue.

> Either you don't run Windows, and it's not an issue,

Not a solution for those of us who run Windows boxes for various reasons...

And to clarify, I plan on occasionally letting updates through (I'm already on Windows 10) but this is a great way to prevent data collection / backdoor activation, which I hadn't considered. Seems like the simplest way to add a lot of privacy to Windows.

There are a shed load you can block without interfering with updates.

Yet that's not what the parent and its parent were talking about/implying. It clearly said "blocking all Microsoft IPs".

And considering the Windows 10 upgrade was being pushed through Windows Update I'm not sure how you'd want to prevent that specific update by blocking an IP and not interfere with Windows Update as a whole.

Makes sense. Azure LBs do not support ICMP and all ping packets are dropped. You can't ping any Azure-hosted services. Kaggle.com fits the description.

I'm pretty sure it supports ICMP, as TCP/IP cannot work properly without it. I guess you mean ICMP echo. Also there are like four kinds of Azure load balancers and this is only true for some of them.

I can ping bing.com, but does that mean bing is not hosted on Azure? [Though it redirects to pinging something like a-0001.a-msedge.net]

It may just not use the Azure LB service (e.g. running HAProxy on virtual machines instead).

They are also known to have used F#, and even provided a testimonial to this effect: http://fsharp.org/testimonials/. Can't say if it's still used, though. That's two recent high-profile acquisitions (with Jet.com) for F# shops.

> At Kaggle we initially chose F# for our core data analysis algorithms because of its expressiveness. We’ve been so happy with the choice that we’ve found ourselves moving more and more of our application out of C# and into F#. The F# code is consistently shorter, easier to read, easier to refactor, and, because of the strong typing, contains far fewer bugs.

> As our data analysis tools have developed, we’ve seen domain-specific constructs emerge very naturally; as our codebase gets larger, we become more productive.

> The fact that F# targets the CLR was also critical - even though we have a large existing code base in C#, getting started with F# was an easy decision because we knew we could use new modules right away.

Google Cloud supports Windows, right? What would be the problem? (Honest question)

None whatsoever, unless they're heavily bought into Azure-specific services.

The idea that if you do C# you must be on Azure (or the other way around) has been outdated since Azure started. The first startup I ran tech at hosted C# on Mono in Docker containers on DigitalOcean and had devs on all 3 major OSes.

I'd be surprised if there isn't a decent amount of C# somewhere in the Google ecosystem.

I'd be interested if anyone knows anything about this. Especially given the recent updates for running .NET Core on Linux/Mac, a company like Google could make great use of C# without needing to shell out for Windows licenses.

Relevant 10 year old blog post [1].

[1]: http://steve-yegge.blogspot.com/2007/06/rhino-on-rails.html?...

Don't know how true this still holds, but there was a time at least where it sounds like anything outside of C++, JVM languages and Python was off limits.

The IP of kaggle.com reverse DNS is cloudapp.net which is a Microsoft Azure domain so I think that this makes sense.

That's really interesting to hear. I wouldn't read too much into it, I was mostly just speculating. It's quite likely that they mostly scooped them up for the rolodex that is their user database.

In any case, congrats to the Kaggle team!

I think this may have something to do with Jeremy Howard's time as president there - I remember watching a few of his tutorials a couple of years ago when he was still at Kaggle and he was really into C#.

I wonder if Nest has support contracts for any Java 6/7 they are still using.

Why does Google want to promote TensorFlow? To make people use more of their cloud offerings?

Likely to avoid their mistake with MapReduce, where by around 2011 candidates were coming in to interviews and saying "MapReduce? That's sorta like Hadoop, right?"

There's value in controlling mindshare; keep everything proprietary too long, and people just use open-source clones that may be inferior but can actually be used by the majority of the talent pool.

More specifically, Amazon Elastic MapReduce (EMR) beat Google to market. By years, if I recall correctly.

Does the downvote indicate my memory is faulty?

I believe I was already using EMR when Google's MapReduce service was announced. I'm not referring to their internal tool, but the external service.

EMR beat Google Cloud MapReduce to market, but you're forgetting that before there was such a thing as cloud services, we relied on open-source frameworks and set up our own clusters. EMR is based on an open-source framework called Hadoop, which itself was built on a closed-source Google framework called MapReduce that Google released a paper about. MapReduce came out in 2003, Hadoop in 2006, Amazon EMR in 2009, and Cloud MapReduce in 2015.

...which is sorta my point. People remember the version of the technology that makes it accessible to them, not the first one that comes out. When Google keeps things proprietary forever and only releases academic papers, people quickly forget just how far ahead they were.

That's all true, but what may matter more to Google was the missed business opportunity of being first to market with a relatively easy distributed computing paradigm.

That's exactly backwards - the MapReduce paper was intentionally released as vaporware to make the rest of the industry spin its gears trying to replicate an imaginary result. And that's why we have Hadoop.

You realize you're arguing with an ex-Googler who has worked on production MapReduces that were first written around 2005 and has read the initial MapReduce commit?

I thought the MR paper described an actual working implementation. It had performance test results, descriptions of issues they encountered and solved, and some sample source code of how MR is used. It seems like a lot of effort was put in for it to be a hoax.

I imagine part of it is that businesses built on Tensorflow play nice with Google Cloud and their TPUs, but mostly I suspect it's just a mindshare thing. If Google becomes the place that all the top data scientists want to work – such that they don't even have to be poached – that's a Very Good Thing for them. It probably doesn't hurt if those data scientists come in already familiar with a tool Google uses internally.

Kind of reminds me of the genius move by Tesla to crowdsource collection of self-driving car information. Experts want to get where they have the data to train their models, and if Tesla propels itself ahead of the pack for number of miles of real-world training data, then that makes them very attractive to talent.

If all machine learning experts use TensorFlow, all the machine learning chips coming out will be highly optimized for TensorFlow. Higher competition among TensorFlow chips = better acquisition prices for Google. They also don't have to go around convincing chip makers to support TensorFlow (like they did, for instance, with the VP8/VP9 codec).

I am curious to see what will happen to TensorFlow. I hope the code will get cleaned up... I also hope they will eventually pay somebody to do it, as the open source option clearly generates a heterogeneous nightmare.

The rewrite in TensorFlow is somewhat worrying though, since TensorFlow is open source, meaning that there's no real benefit to google if it's written in TensorFlow (except for recruitment purposes).

It's worrying since it suggests that google might be planning to make it, or at least parts of it proprietary in the future....

For the record, I don't think that google will, but I'm still worried about the possibility....

Doubt it.

They made Angular and they didn't somehow make it proprietary.

The more worrisome stuff is when they close shop on services or completely change a framework.

TensorFlow isn't a service so we don't need to worry. And I doubt they would change TensorFlow as drastically as the Angular 1 to 2 to 3 kind of deal. If that does happen, the Keras library abstracts it away, IIRC.

I think their goal is to get people to use their cloud services, IMO. They do the same with the Nexus, shipping it without an SD card slot to push people to the cloud.

Also I think it's almost like the idea of controlling a framework instead of being on the whim of some other company. I'm looking at Oracle and Java here.

Facebook has its own NN framework. Google has its own. So they don't have politics to deal with.

Recruitment is an important purpose, though. Having a steady supply of pre-Tensorflow-trained engineers available is presumably why they opened up Tensorflow to begin with. They're not going to benefit more than that anytime soon by closing it off again.

I think their play has been building specialized hardware that executes TensorFlow better than anyone else. "You could use a GPU to do this, but check out our custom ASIC that does it 400x faster for 1/5th the cost..."

NO ONE does it 400X faster...

And the hardware advantage is easily negated. For example, our startup is building something like this.

The 400x was clearly hyperbolic – and sure. But you probably don't have one-click integration with an existing battle-tested IaaS platform.

Vertical integration is powerful, and by open-sourcing Tensorflow Google is achieving useful synergies in sales and recruiting. At their vast scale, even small ROIs (as a percentage) can be massive.

No, but our customers, ie AWS will...

> It's worrying since it suggests that google might be planning to make it, or at least parts of it proprietary in the future....

That would be the Google-only Tensorflow acceleration hardware they have.

I have a soft spot in my heart for Kaggle. I was motivated to get into the software industry 5 years ago when they ran their first Facebook hiring challenge. How else to break into an industry I had no degree in?

I didn't do so well in the competition but it got me coding every day and it gave me enough to talk about that I figured I could sell all my things and ride a motorcycle to California and start knocking on doors. It worked, after a fashion.

I also have a soft spot in my heart for Kaggle because I interviewed there during my first month in San Francisco and it was absolutely the worst interview of my life.

I can relate to the "bittersweet Kaggle memories" phenomenon.

I participated in their first-ever competition, which I thought I would have a good shot at because Kaggle was brand-new (thus not much competition), and because it was in my wheelhouse, a biological application of ML. And at that time, c. 2010, ML was not all that well-known.

I did OK (placed somewhere in the top-middle IIRC) but it was quite humbling. Now it's not really worth doing except for fun or to be recruited by someone because the competition is so fierce and there are people with a lot of time to devote to it. The difference between 1st and 25th place is often measured in the 3rd decimal place of performance, making success kind of random. But the postmortems by winners are always good to see some real-world best-practices and different workflows.

As for the business model, I'm pretty ambivalent about it. My wife is a graphic designer, and in that field, "compete to see who has the best design" is a somewhat common thing. But it's scummy and designers hate it because it's a way to basically get free work out of lots of people and it erodes salaries in the industry.

Work should probably not be gamified, especially when the gamification takes the form of "you only get paid if you win". And "hey, you might get recruited if you do well without winning" is not a lot of consolation. It's pretty exploitative for anyone not A) doing it purely for fun/learning or B) willing and able to assume the risk of making their money from competitions. (Just to be clear, I've never done Kaggle except for fun, but I know others do it for serious career purposes or money, as those are obviously express intentions of the site)

> Work should probably not be gamified, especially when the gamification takes the form of "you only get paid if you win"

Wanna join a startup? Huge equity! :)

Good point. I work in the "relatively safe" area of academic research, but the point still holds.

Even more broadly, your thought has made me wax a little philosophical about capitalism: we believe that 1) everyone should work, 2) I only want winners to work with/for me, and 3) not everyone can be a winner. I guess you can't have all three, but we sure try.

If you put it in that light, maybe Kaggle isn't so bad. But OTOH, we do make the distinction between employees and entrepreneurs for a reason.

Don't leave us hanging like that, why was it so bad?

Please tell us the story about your worst interview. I'm curious.

This was about 5 years ago.

It was a mess from the get-go since I was applying for a data scientist position. At the time I didn't have the slightest idea what working as a data scientist entailed, nor that despite 7 years unsuccessfully pursuing a PhD in bioinformatics I wasn't sufficiently statistically oriented to excel as one. I just knew that I was good at math and had taught myself enough Perl and Ruby to be dangerous. I should have been applying for software/data engineer roles but I didn't know what that work was like at the time. The only engineer I knew had lived across the country from me and seemed more like a god than a regular human.

So after doing not-terribly in Kaggle's machine learning competition (which I did through some unholy Perl scripting and this bizarre network propagation model which had nothing to do with any normal machine learning techniques) I viewed Kaggle as a dream job. I think I did some phone interviews, I let them know that I was already local (crashing on couches and staying in transient hotels), and they scheduled me for some in-person interviews.

Waking up with flea bites from the dogs staying in my transient hotel next door to The Armory, I walked outside to find my motorcycle placed upright on its stand but somewhat demolished and a large pile of its broken parts piled politely on the seat. No note. I had about 20 minutes to make it to my interview. This would have been sufficient lane-splitting with a functional motorcycle, yet I hadn't left myself enough time to take public transportation instead of my busted ride. I brushed the jetsam off and tried to ride to the interview.

I had no left or right footpegs; my clutch lever was bent; my front brake lever had snapped off and was barely two inches long; my handlebars had been wrenched into a 135* bend forward and upward on the righthand side. As I rode this shambling contraption to my interview with my left foot anchored on top of the shift lever and my right foot reached back on the passenger peg I must have resembled a Street Fighter character diving forward in some sort of headlong right hook. I couldn't shift out of 1st and I couldn't really brake.

I arrived at the interview miraculously on-time but absolutely drenched in sweat from the harrowing journey through hostile traffic. So we'll start with that good first impression. My interviewers seemed to all have PhDs and significant post-doc experience and more than half of my interviews consisted of questions about why I dropped out of my PhD program. I mean, repeated hammering about it. I didn't really want to talk about it in great detail because I was emotional about it, mainly because my research supervisor died unexpectedly at the age of 42 and the whole situation still makes me sad to this day. So instead of actually asking questions about the company, or the role, or my skills, about half the time was spent being grilled about this sad story of my graduate career. So I was in a wonderful mood at that point and was happy to finally move on to the technical assessment because I was worried about tearing up in front of the interviewer.

THE TECHNICAL ASSESSMENT: a bunch of softball questions that I don't remember followed by a standard algorithms question. Come to think of it I don't think I got any statistics/ML questions because we got stuck on the algorithms question. The problem was that I never took CS courses, nor had I studied the CLRS book at the time, so I didn't know the "expected" way of doing some problems. I tend to get too creative.

The problem: given a list of n words (as strings of length k), return all sets of anagrams. The somewhat clever O(n k ln k) solution is to sort the letters of each string and then look at all the sorted strings that have multiple words mapping to them. The cleverer solution is to build a dictionary for each word, using the letters as keys and the number of times each appears as the value. This is linear in the word length to construct, though it takes a memory penalty that is usually nugatory.
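For anyone who hasn't seen the two "standard" answers, here is a minimal Python sketch of both. It is not what the interviewers had on their script (that was never revealed); the function names `group_by_sorted` and `group_by_counts` are made up for illustration, and using a `frozenset` of (letter, count) pairs as the hashable key is just one convenient way to compare the per-word dictionaries.

```python
from collections import Counter, defaultdict


def group_by_sorted(words):
    """Group anagrams by the sorted letters of each word: O(n k log k)."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams share the same letters, so they sort to the same string.
        groups["".join(sorted(word))].append(word)
    # Keep only keys that more than one word mapped to.
    return [group for group in groups.values() if len(group) > 1]


def group_by_counts(words):
    """Group anagrams by letter counts: building each key is linear in k."""
    groups = defaultdict(list)
    for word in words:
        # Counter is the letter -> count dictionary; freezing its items
        # makes it hashable so it can serve as a dictionary key.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    return [group for group in groups.values() if len(group) > 1]


words = ["listen", "silent", "enlist", "google", "elgoog", "cat"]
by_sorted = group_by_sorted(words)
by_counts = group_by_counts(words)
```

Both calls return the same groups for this input; "cat" is dropped because nothing else maps to its key.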

But oh no, I didn't think of either of these bog-standard solutions off the bat. Not having ever practiced this problem I immediately got creative, having too much fun with it. My initial thought was to create a dictionary where the keys were integers and the values were arrays of words. The integer keys were produced by reading the characters of the word in sequence, mapping the alphabet to the first 26 prime numbers, and taking the product of all the primes represented by the characters in the word. By the fundamental theorem of arithmetic (a.k.a. unique factorization into primes), any anagram would create the same integer key, and all words mapping to that key would get pushed into the value array. Voila, and a very compact data structure to boot!
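A rough Python sketch of that prime-product idea, under some stated assumptions: words contain only the letters a-z (uppercase is folded to lowercase), and the names `PRIMES`, `prime_key` and `group_by_prime_product` are invented here for illustration. Python integers are arbitrary precision, so the product cannot overflow in this sketch, although keys for long words get large; a fixed-width integer type would need extra care.

```python
from collections import defaultdict

# The first 26 primes, one per letter: a -> 2, b -> 3, ..., z -> 101.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_key(word):
    """Multiply together the primes assigned to each letter of the word."""
    key = 1
    for ch in word.lower():
        # Assumes ch is in a-z; anything else is outside this sketch.
        key *= PRIMES[ord(ch) - ord("a")]
    return key


def group_by_prime_product(words):
    """Group anagrams by their prime-product key."""
    groups = defaultdict(list)
    for word in words:
        # Unique factorization means equal keys imply equal letter counts.
        groups[prime_key(word)].append(word)
    return [group for group in groups.values() if len(group) > 1]


words = ["listen", "silent", "enlist", "google", "elgoog", "cat"]
by_primes = group_by_prime_product(words)
```

Multiplication is commutative, so letter order never matters, and two words get the same key only if they use each letter the same number of times.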

This was really, really bad. Basing the correctness of your interview solution on Euclid's Elements is never going to get you any traction with software engineers. As we all know, engineers are allowed to interview candidates to make themselves feel smarter and more fortunate for already having a job. Getting into an argument about prime number factorization with a candidate isn't going to support either of those goals.

So I was gloating a bit, feeling good for finding such a nice clean solution after that bit of emotional wrenching, figuring I had explained it well with a little math flair and eager to move on to the next problem.

"Um, are you sure that works? Will the numbers always work out like that?" 'Yes, that's why I chose the first 26 prime numbers, it wouldn't work with composite numbers.' ". . . Could you do it a different way? Could you do it another way that isn't this solution?" 'Could I do it slower than this? Sure, but why?'

It seemed like they had a list of possible answers on their interview script and I had gone way off of it. Now, the funny thing was I couldn't think of the sorting solution or the comparing dictionary solution, and I had a correct and well-explained answer that was superior in runtime and memory requirements (and before you object with arbitrary length words and integer overflow I'm just going to stop you with replacing the product of the primes with the sum of their logs and . . . I can explain but this margin is too small to contain). I could only think of really dumb things, like calculating every possible permutation of every single word and comparing them exactly. In fact I was kind of taking a perverse delight in trying to figure out the most inefficient way I could answer their question, seeing how I was very certain there was no better answer than what I had already come up with first. They kept asking me, "is there another way you could do this? Faster than comparing all permutations?" and I kept returning to my prime number solution and they kept saying "but is there ANOTHER WAY you could do this?" Round and round this went and I never got the solution they were looking for, whatever it was.

I don't even remember the rest of the interview process as I had the distinct feeling that it ended prematurely. I didn't catch that at the time since this was to be the first of my in-person interviews in the tech world, but looking back I'm sure they hadn't originally planned on hustling me out the door after a half-hearted tour of the loft workspace.

There are of course two sides to every story, and I'm sure there's someone from Kaggle who tells a story about a leather-clad sweaty-toothed madman who kept prattling on about his dead professor trying to score sympathy points and who gesticulated wildly for an hour while screaming FUNDAMENTAL THEOREM OF ARITHMETIC in a poorly disguised Southern accent. But, regardless of their perspective I can definitely assert that was the worst interview I'd ever had. I went 'home' to buy some calamine lotion, call my insurance company, and tell my mom that the interview at the 'dream job' didn't quite pan out.

Thank you, this is the best HN comment i have read in a year (or possibly more)!

Your solution to the anagram problem is very ingenious, indeed. I have seen this question a bunch of times and know all the "standard" approaches - but your solution, which i've never heard of before, is correct, clever and far superior. Kudos to you for coming up with it under such stressful conditions!

Your description of the interview process is very sad. The interviewers' attitudes and response induced an icky feeling in me.

I think that the attitude you encountered at Kaggle is an artifact of rather poor quality, insecure and insufficiently experienced / accomplished engineers. Whenever i interact with really accomplished and senior technical people, the conversations tend to be of stellar quality - they like to discuss actual past work in depth, are able to comprehend new and (perhaps unusual to them) concepts, and are generally a lot more respectful and pleasant. On the other hand, everything you described reeks of junior, inexperienced and rather insecure technical talent.

This reminds me of my own very-first tech interview. I had spent 5 years in a research lab, working on autonomous vehicles, writing software for things like signal processing algorithms, error correction codes, ML / reinforcement learning algorithms optimised to run on power-constrained devices and the like.

I went to interview with a VC-funded startup. They were looking for "experts in signal processing and machine learning experts". I figured my years of signal processing / ML experience might be a good fit. After the initial pleasantries, i get asked "How will you design a server that can detect anagrams of any word in the english language?"

For the life of me, i couldn't come up with a reasonable solution. I went home and hung my head in shame.

Thanks for the compliment, I didn't realize it would turn out so long. I think I ought to start writing more often.

I joke about the interview process existing to inflate the ego of the interviewer, but it isn't that far off. Everyone wants to be Google so they have to "act like Google". Meanwhile Google stopped doing that five years ago, but they're not going to tell you that.

I don't know what happened at that Kaggle interview and I'm not going to single out the people or the culture there because it really could have happened anywhere. Maybe I could ascribe some of it to having so many people coming from academia, since we (the PhD and post-doc dropouts) tend to have such enormous chips on our shoulders. But it's just our interview culture in general, so much unconscious bias.

I really would love for interviewers to look at your resume, say "well you've been gainfully employed doing this for 5 years, let's delve into your soft skills and see if you are actually a communication / working fit for this particular team" instead of just trying to see if you meet some arbitrary algorithmic bar and hope that correlates with job success.

Kaggle is a great idea, but it's steadily getting more annoying to use.

1) Cruft on all landing pages and having to click through to get to the comps page which is the site.

2) Annoying focus on exploratory notebooks. Inevitably they aren't powerful enough and people link through to external sites.

3) Forcing the use of 3rd party compute platforms to enter comps. Half the fun for me is messing around with my own ideas and this just gets in the way. These should be optional rather than required.

4) Poor incentives. Many of the comps have tiny prizes for the value of work that gets done. They're also concentrated way too much at the top. Unless there's something I want to try out, the expected value of participating is way too low to do it just for the giggles.

I do analytics for a huge corporation and have been quite happy; however, some of my peers who are unhappy with the pay here participate in Kaggle for the opportunity to do well and get a better (higher paying) job.

Some of the inherent value of the work for the small prize pool is more the opportunity of doing well and being recognized for that work.

Data Science, or trendy statistics, is inherently fun which is also what makes kaggle fun. Discovery in data will always be popular among people who love to solve problems.

To your other points, I don't disagree with you-- all the steps just to participate are becoming more work than it's worth, at least for me. I do a lot of the same problems asked in kaggle naturally at work.

>>> participate in Kaggle for the opportunity to do well and get a better (higher paying) job

Obviously it's anecdotal data at best, but still curious, what are the results? Because it sounds very similar to the frequently given advice for software engineers 'push code to github to land a great job'.

I can say anecdotally that my ranking on Kaggle helped me recently land a good data scientist job offer, transitioning from academia. I have spent a lot of time on Kaggle though, probably it would have been more efficient (but less fun) to spend that time spamming job boards and studying machine learning, stats, and computer science.

I've hired many people, and I don't know anyone that's ever looked at either kaggle, or stack overflow, or github commits for anything. I've seen them on resumes before, but only from very junior people, and typically from people outside of the US.

Quite frankly it's a rather bullshit signal, since its presence only tells you that the person spends all their free time on the computer. Maybe they know something, but a traditional interview will tell you that and more.

I disagree. From junior people, it shows that they can actually do something in practice, and it's not all theory that they don't know how to apply.

A person just outside of university does not have heaps of past jobs to show. So they should just leave it blank and describe their hobbies?!

No one cares about hobbies, and Kaggle is a hobby.

An NCG should write more about class projects. Everyone has class projects.

If an NCG wants to put it down, fine. But don't color me impressed. Why should I select someone that spends their evenings alone tweaking out an extra 0.001% on an AUC curve, when I could conceivably get a more rounded individual with better team skills?

> 3) Forcing the use of 3rd party compute platforms to enter comps. Half the fun for me is messing around with my own ideas and this just gets in the way. These should be optional rather than required.

Especially this. I loved playing around with data there, but the moment most competitions have datasets with sizes in the order of tens of GBs, I'm out. I can always take the opportunity to learn AWS / Google Cloud processing methodologies, but that kills a bit of the fun of the first days.

Points 2) and 3) seem limited in scope. For 3) that was only 1 competition that I can think of, but I agree that was a terrible competition format and executed poorly. For point 2), you can just ignore the exploratory notebooks if you so choose. I agree with point 1), the website is steadily getting slower and worse, and was already a pretty slow website. I am not a web developer so I don't really know why it is so slow, but it is really noticeable. I agree wholeheartedly with point 4), if you want to make money directly from the competitions Kaggle is not a wise investment. However if you want to transition to data science Kaggle is a great resource for learning.

https://www.crowdAI.org is an open source alternative. Disclaimer, my research group at EPFL started the platform, because we think there should be a community-based open source version that is open to anyone. Always looking for contributors!

Edit (1): Github https://github.com/crowdAI/crowdai Edit (2): We're currently re-designing the whole site to look & feel better.

I just looked at the site and it sounds real exciting (but way too difficult for me). Can I ask how you guys are funded? I saw that there is a ~$2000 payout for the winner of the most recent challenge.

The platform itself is funded by institutional research funding we get at EPFL. For some of the monetary prizes, these typically come from the corresponding projects.

That's disappointing. Google will probably keep the service alive for recruiting and the consumer base, while most of its technologies will probably be shut off. Being owned by Google might also mean that some companies might not want to post challenges on Kaggle anymore, like Facebook or Microsoft.

I really don't understand this assumption that all acquisitions are going to lead to disaster. I work in the Firebase team at Google and couldn't be happier that they've joined (it's what got me to return to Google). Google doubled down on the product and it's grown in ways that Firebase could never have achieved on its own. All while integrating into the broader ecosystem of Cloud.

Firebase then acquired DivShot and people cried doom. Yes DivShot was shut down--after completely rearchitecting Firebase's CLI and Hosting to have DivShot's open source web hosting framework with the features of both product lines. The CEO of DivShot now runs Firebase Hosting's product line and has massive resources at his disposal to push his (great) agenda of simple and speedy static web services.

What is kaggle's technology really? The notebooks? It's a rather experimental feature that doesn't work all that well most of the time, and shouldn't be going anywhere as a lot of people still appreciate them.

I agree for big companies though, even if that doesn't make a whole lot of sense (as there's no private info involved here, the competitions being in the open including the data provided by these companies).

Some people spend years competing and trying to get top rankings. It was a great signal to show to recruiters/potential employers. If Google shuts it down, all this work will be gone.

This is the worst news I've read today. Kaggle as an independent serves the community better than it would as the baby of some large giant. I love Kaggle and I am very disappointed that Google acquires everything we love.

I'm a little sad about this. What will Google do with it? Are they going to drain its soul? I think at a minimum the people behind Kaggle won't feel the same urge to keep building, maintaining, and growing it the same way as before, especially as the $$$ flows into their pockets. People at Google will probably change its direction, and I'm not sure that's a good thing: if they were really good at doing stuff like this themselves, they would have built something like this, or a better version of it, on their own.

They just officially announced it at NEXT. It was presented by Fei-Fei Li, who is known for the ImageNet project, which was one of the first big open datasets that really helped advance this field.

The way she presented the news is that they will aim to advance that vision, but we'll have to wait and see how their vision pans out.

Why does the article say that Ben Hamner was involved in the founding in 2010? He joined years later. Some basic fact checking would be nice, even in tech articles...

(Ben has been a great contributor, mind you.)

Probably because Crunchbase has Ben's title as "Co-founder & CTO", which is also what his LinkedIn says.

It's not unheard of for people who came on ex-post facto to be offered cofounder titles to sweeten the deal or for other reasons (titles are cheaper than equity, I suppose).

Edit: just rechecked and his LinkedIn notes Nov 2011 as the start date for Kaggle. Guess it was easier to condense it into one sentence than to explain the difference to TechCrunch's audience?

Yeh as I recall Jeremy Howard was chief boffin at the start, and left some time later to start his own biomedical data analysis company (also in SF).

Funny thing: the comment you're replying to was posted by Jeremy Howard. Pardon if I'm missing some tongue-in-cheekiness.

heh. I was unaware :)

Congrats to the Kaggle team! One great thing about Kaggle was that the team listened and sought out feedback from users (even if they didn't always follow the feedback). I hope that doesn't change with the acquisition.

It's amazing how focused Google is on AI compared to the other giants. I think it's a great investment on Google's part and congrats to Kaggle.

I hope the mission of the site doesn't change. I think Facebook did a great job with whatsapp and instagram. I expect the same with Kaggle.

DrivenData.org is a solid competitor without much publicity. Maybe they'll take over some of the traffic if Kaggle changes for the worse.

Well, supposing this is correct...Congratulations to Anthony and the rest of the Kaggle team! Those guys do a great job. Hopefully they get rewarded for it.

Congrats to Jeff and the rest of the team. I'd be interested to hear how much .NET survives the transition!

This could well end up being a fantastic move for Google to also acquire customers in its platform. If Kaggle moved large pieces of its competition to be automatically hosted on GCE it might be a good win for Google. So like Kaggle's "kernels", GCE machine learning tools would become an extension that's usable with it in a really simple way. Not entirely sure what that might look like, but it feels like this kind of integration would be the best for both parties.

Since we're sharing alternatives:

https://empiricalci.com is a dashboard to keep track of your experiments & compare them on public benchmarks.

> Kaggle, which has about half a million data scientists on its platform, ...

Are there really that many data scientists? I thought it was a niche specialty. Is there enough work for that many people?

Think they maybe put a 0 in the wrong spot. Kaggle's leaderboard only shows ~50k: https://www.kaggle.com/rankings

I think they mean unique users.

A lot of them are academics who participate out of interest, and I'm sure a significant amount are regular software engineers trying to get their hands dirty with ML.

Best of luck to the Kaggle team. We attended a data scientist conference they presented at in 2012 which led to our YC application, and formation for SimpleLegal. Hats off...

Google probably wants to use Kaggle as Google Cloud's entry point for the data science community. Kaggle has a lot of students and entry-level data scientists. Getting those users to start using Google Cloud could drive the growth of lots of potential customers.

I think you hit the nail straight on the head. Sure, Tensorflow will also probably get pushed in the form of tutorials, etc., but I certainly think it's more about finding a way to popularize GCS.

I'm not sure if this is good or bad news. I wonder what Google's motives are and how they will influence Kaggle if this becomes reality.

Hate to think that some of the beauty of Kaggle being independent will be lost but that's likely

As with many of Google's hires, chances are they see it as less about acquiring a "product" and more about getting access to what that product produces - an extremely large number of leads in a high-demand space that they're currently trying to ramp up themselves.

Interesting, though, as they never needed to own that platform to mine it for hiring leads.

Only Google can spend this much on what's ultimately a recruiting project.

They have 500,000 developers. Do the math at a 30% commission for each: assuming a $180k salary, that's about $50k per hire. Even if they hire 0.1% of them, that adds up to $50k * 500 = $25M. They probably paid a few times more than that, but not a few hundred times, which makes this a pretty sweet deal for Google, assuming the community keeps growing.
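
For anyone who wants to poke at that estimate, here is a rough sketch of the same back-of-envelope arithmetic. All inputs are the commenter's own assumptions (user count, 30% commission, $180k salary, 0.1% hire rate), not real figures, and the function name `recruiting_value` is made up for illustration. Integer math is used so the result is exact.

```python
def recruiting_value(users, one_hire_per, salary, commission_pct):
    """Back-of-envelope value of a user base as a recruiting pipeline.

    users          -- claimed number of users on the platform
    one_hire_per   -- hire 1 out of every `one_hire_per` users (1000 = 0.1%)
    salary         -- assumed annual salary in USD
    commission_pct -- assumed recruiter commission, in percent of salary
    """
    hires = users // one_hire_per
    fee_per_hire = salary * commission_pct // 100
    return hires, fee_per_hire, hires * fee_per_hire


# The figures from the comment: 500,000 users, 0.1% hired, $180k, 30%.
hires, fee, total = recruiting_value(500_000, 1000, 180_000, 30)
print(f"{hires} hires x ${fee:,} = ${total:,}")
```

The unrounded commission is $54k rather than $50k, so the total comes out at $27M instead of $25M; the order of magnitude is the same. Changing `users` or `one_hire_per` shows how sensitive the estimate is to those two guesses.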

That 500,000 developers is a vanity metric. The real number is around 50,000 who have completed a challenge, and around 5,000 who are active on the site. Also of those 5,000 users the large majority are employed somewhere else.

Do you have to buy Kaggle to hire Kaggle members?

Well I'm not sure if they expose their members' contact info or if it'd be easy for Google to advertise on their site as effectively.

Given that acquihires in AI seem to go for like $10M/head, Google's access to that Rolodex would pay out pretty quickly.

You don't think this would drive data scientists to use Google's cloud platform? I.e., if the most well-known data science competition uses the platform, then they will use it after the fact since it's what they know best. Right?

Good luck to Kaggle's employees. They have done a phenomenal job.

I think having a dataset on who is really interested in machine learning and applying it in practice can only help Google. Plus, if they kind of lurk on the side, you don't get enough of the Google brand overwhelming Kaggle so that it disrupts the community, but in the back of the minds of people going into competitions and who are in the know, it might help incentivize people who think "Hey, Google is really interested in this".

How could a company called "Google" not acquire a company called "Kaggle"? This makes me giggle.

Guess that means Kaggle users can expect it to be shut down in the next five years.

Sadly I think 5 years to sunset is an optimistic estimate.

Don't know why. Just don't think this is going to happen.

Deep Mind keeps acquiring appendages

Really excited about this acquisition! Might open new avenues for the data science community.

This must imply Kaggle has some internal software that Google want?

Nah, they just want a good recruiting station.

I am wondering if it's a marketing and recruiting play

According to the article, Google mainly wants the Kaggle community.

I wonder if that's the only one they want, or if they are also going to try to get other relics such as the Knife of Exact Zero, the Fleece-Crested Scepter of Que-Teep, or the Orb of Ti-Teleest?

Hopefully there will be no uncertainties in the acquisition. If not they can form a team to fix them. But I'm joking around: this is a Google-Kaggle niggle gaggle giggle.

Does anyone know what the price was?

Whoever named this company has literally never spoken to a woman.

Care to explain what kaggle means to a woman?

Do you mean kegel, the exercise?

I think the name is a little odd too. Does anyone know how they came up with it?

> I didn’t have any money when I started the company to purchase a domain name so I built an algorithm that iterated phonetic domain names and printed out a list of what was available. My wife and I went through the list and “Kaggle” was the one we picked. It’s algorithmically generated.

> It’s a terrible name because most Americans pronounce it “kagel” [rhymes with “bagel”] which sounds like the pelvic floor exercises. Australians pronounce it “kaggel” [rhymes with “haggle”].

-- Anthony Goldbloom, http://www.intelfreepress.com/news/a-marketplace-for-data-sc...

Ah, interesting. I was thinking there must have been a German immigrant to the US whose name, Kegel, was anglicized as Kaggle (the officials at Ellis Island have not always been known for their deep knowledge of English orthography). But I could find no evidence of such a person.

-- Also, I see by following your link that the company was founded in Australia, so the pronunciation that rhymes with "haggle" is actually the original one! Cool! I'll use it.

> It’s a terrible name because most Americans pronounce it “kagel” [rhymes with “bagel”] which sounds like the pelvic floor exercises. Australians pronounce it “kaggel” [rhymes with “haggle”].

Speaking as an American, this is an awful example. Why are "bagel" and "haggle" not supposed to rhyme with each other? What difference is there in Anthony Goldbloom's mind?

Bagel = Kay-gel

Haggle = Kah-gl

I wonder about your dialect, if for you, "bagel" rhymes with "gaggle".

I'll accept bagel basically anywhere along the continuum you're trying to describe. (Well, assuming you think the vowel in the first syllable of "haggle" is like "bag" and not like "bog".)

California English.

As an Australian living in the US, it never occurred to me that it would be called anything but the latter.

As a self-described redneck in Arizona - the latter seems most appropriate.

Why anyone would pronounce it to rhyme with "bagel" makes no sense to me (same as pronouncing "gif" with a soft "G"); IIRC (and I am no linguist), in English there's a rule about how something is pronounced based on surrounding letters - and I think that double consonant vs singular consonant preceded by a vowel is one of those rules.

I'm sure there are exceptions, after all (it's English...) - but I have a feeling that if you looked at such words you would find the general pattern to fit.

Again - I am willing to admit that I really don't know what I am talking about; I'm not a linguist, I'm not an expert in English. I'm just some guy who last studied English in high school years ago...

I feel the same way about "GIF". Alas, the inventor of the GIF format insists on the soft "G" [0].

[0] https://en.wikipedia.org/wiki/GIF#Pronunciation_of_GIF

haggle gaggle straggle

There's really no excuse for any English speaker to interpret kaggle as kagle or kagel.

As an American living in the US, I feel the same


good news, I did not like the whole Kaggle concept anyway: thousands of people over-engineering solutions for one problem, paid peanuts, while there are more rewarding problems than talent available. It was a huge waste of scarce brainpower. I am launching my Kaggle alternative, landing page here: http://startcrowd.club/ Thanks Google for eliminating my competitor.
