For example: some friends who run a seed-stage biotech deep learning startup were offered a considerable discount by the Google Cloud folks. Their ask? That the company switch to Google Cloud, rewrite some proprietary software in Tensorflow, and heavily publicize both moves.
I wonder if we'll see Kaggle gain a specific bent towards that ecosystem.
It seems to me more like an old school product-and-media acquisition: Google like the product, and love the audience. This is a good way to get both.
The whole thing is strange.
2) If Google is trying to recruit people from Kaggle accounts, why not simply index the accounts?
Neither approach requires purchasing Kaggle at all.
Only Kaggle has the full data to be able to make an accurate decision. I don't really think indexing account pages is even remotely enough to find the really talented people among the noise.
I think Google acquired Kaggle for one of the following two reasons: 1) they wanted to expand their talent acquisition reach, or 2) they wanted to build a platform like Kaggle aimed at Google Cloud, but figured out that it was just easier to acquire Kaggle itself.
Google will NEVER be satisfied with its talent pool, given its size and rate of expansion. The company is prepared to do a ton -- perhaps even acquiring Kaggle -- to get the best of the best, wherever they are.
What ninja upgrade? You always had to opt in. Yes, they were really pushing the offer annoyingly hard, but I had no problems whatsoever keeping one of my machines on Windows 7.
Anyway, you can stop doing so now, the time for a free upgrade is over.
Not a solution for those of us who run Windows boxes for various reasons...
And to clarify, I plan on occasionally letting updates through (I'm already on Windows 10) but this is a great way to prevent data collection / backdoor activation, which I hadn't considered. Seems like the simplest way to add a lot of privacy to Windows.
And considering the Windows 10 upgrade was being pushed through Windows Update, I'm not sure how you'd prevent that specific update by blocking an IP without interfering with Windows Update as a whole.
> At Kaggle we initially chose F# for our core data analysis algorithms because of its expressiveness. We’ve been so happy with the choice that we’ve found ourselves moving more and more of our application out of C# and into F#. The F# code is consistently shorter, easier to read, easier to refactor, and, because of the strong typing, contains far fewer bugs.
> As our data analysis tools have developed, we’ve seen domain-specific constructs emerge very naturally; as our codebase gets larger, we become more productive.
> The fact that F# targets the CLR was also critical - even though we have a large existing code base in C#, getting started with F# was an easy decision because we knew we could use new modules right away.
The idea that if you do C# you must be on Azure (or the other way around) has been outdated since Azure started. The first startup I ran tech at hosted C# on Mono in Docker containers on DigitalOcean and had devs on all 3 major OSes.
I don't know whether this still holds, but there was a time, at least, when it sounded like anything outside of C++, JVM languages, and Python was off limits.
In any case, congrats to the Kaggle team!
There's value in controlling mindshare; keep everything proprietary too long, and people just use open-source clones that may be inferior but can actually be used by the majority of the talent pool.
I believe I was already using EMR when Google's MapReduce service was announced. I'm not referring to their internal tool, but the external service.
...which is sorta my point. People remember the version of the technology that makes it accessible to them, not the first one that comes out. When Google keeps things proprietary forever and only releases academic papers, people quickly forget just how far ahead they were.
Kind of reminds me of the genius move by Tesla to crowdsource collection of self-driving car information. Experts want to get where they have the data to train their models, and if Tesla propels itself ahead of the pack for number of miles of real-world training data, then that makes them very attractive to talent.
It's worrying since it suggests that Google might be planning to make it, or at least parts of it, proprietary in the future...
For the record, I don't think that Google will, but I'm still worried about the possibility...
They made Angular and they didn't somehow make it proprietary.
The more worrisome stuff is when they close shop on services or completely change a framework.
TensorFlow isn't a service, so we don't need to worry. And I doubt they would change TensorFlow that much, like the Angular 1-to-2-to-3 kind of deal. If it does happen, the Keras library abstracts it away, IIRC.
I think their goal is to get people to use their cloud services. They do the same with their Nexus devices, omitting the SD card to push people to the cloud.
Also, I think it's partly the idea of controlling a framework instead of being at the whim of some other company. I'm looking at Oracle and Java here.
Facebook has its own NN framework and Google has its own, so they don't have politics to deal with.
And the hardware advantage is easily negated. For example, our startup is building something like this.
Vertical integration is powerful, and by open-sourcing Tensorflow Google is achieving useful synergies in sales and recruiting. At their vast scale, even small ROIs (as a percentage) can be massive.
That would be the Google-only Tensorflow acceleration hardware they have.
I didn't do so well in the competition but it got me coding every day and it gave me enough to talk about that I figured I could sell all my things and ride a motorcycle to California and start knocking on doors. It worked, after a fashion.
I also have a soft spot in my heart for Kaggle because I interviewed there during my first month in San Francisco and it was absolutely the worst interview of my life.
I participated in their first-ever competition, which I thought I would have a good shot at because Kaggle was brand-new (thus not much competition), and because it was in my wheelhouse, a biological application of ML. And at that time, c. 2010, ML was not all that well-known.
I did OK (placed somewhere in the top-middle IIRC) but it was quite humbling. Now it's not really worth doing except for fun or to be recruited by someone because the competition is so fierce and there are people with a lot of time to devote to it. The difference between 1st and 25th place is often measured in the 3rd decimal place of performance, making success kind of random. But the postmortems by winners are always good to see some real-world best-practices and different workflows.
As for the business model, I'm pretty ambivalent about it. My wife is a graphic designer, and in that field, "compete to see who has the best design" is a somewhat common thing. But it's scummy and designers hate it because it's a way to basically get free work out of lots of people and it erodes salaries in the industry.
Work should probably not be gamified, especially when the gamification takes the form of "you only get paid if you win". And "hey, you might get recruited if you do well without winning" is not a lot of consolation. It's pretty exploitative for anyone not A) doing it purely for fun/learning or B) willing and able to assume the risk of making their money from competitions. (Just to be clear, I've never done Kaggle except for fun, but I know others do it for serious career purposes or money, as those are obviously express intentions of the site)
Wanna join a startup? Huge equity! :)
Even more broadly, your thought has made me wax a little philosophical about capitalism: we believe that 1) everyone should work, 2) I only want winners to work with/for me, and 3) not everyone can be a winner. I guess you can't have all three, but we sure try.
If you put it in that light, maybe Kaggle isn't so bad. But OTOH, we do make the distinction between employees and entrepreneurs for a reason.
It was a mess from the get-go since I was applying for a data scientist position. At the time I didn't have the slightest idea what working as a data scientist entailed, nor that despite 7 years unsuccessfully pursuing a PhD in bioinformatics I wasn't sufficiently statistically oriented to excel as one. I just knew that I was good at math and had taught myself enough Perl and Ruby to be dangerous. I should have been applying for software/data engineer roles but I didn't know what that work was like at the time. The only engineer I knew had lived across the country from me and seemed more like a god than a regular human.
So after doing not-terribly in Kaggle's machine learning competition (which I did through some unholy Perl scripting and this bizarre network propagation model which had nothing to do with any normal machine learning techniques) I viewed Kaggle as a dream job. I think I did some phone interviews, I let them know that I was already local (crashing on couches and staying in transient hotels), and they scheduled me for some in-person interviews.
Waking up with flea bites from the dogs staying in my transient hotel next door to The Armory, I walked outside to find my motorcycle placed upright on its stand but somewhat demolished, with a large pile of its broken parts piled politely on the seat. No note. I had about 20 minutes to make it to my interview. That would have been enough time lane-splitting on a functional motorcycle, but I hadn't left myself enough time to take public transportation instead of my busted ride. I brushed the jetsam off and tried to ride to the interview.
I had no left or right footpegs; my clutch lever was bent; my front brake lever had snapped off and was barely two inches long; my handlebars had been wrenched into a 135° bend forward and upward on the righthand side. As I rode this shambling contraption to my interview with my left foot anchored on top of the shift lever and my right foot reached back on the passenger peg, I must have resembled a Street Fighter character diving forward in some sort of headlong right hook. I couldn't shift out of 1st and I couldn't really brake.
I arrived at the interview miraculously on-time but absolutely drenched in sweat from the harrowing journey through hostile traffic. So we'll start with that good first impression. My interviewers seemed to all have PhDs and significant post-doc experience and more than half of my interviews consisted of questions about why I dropped out of my PhD program. I mean, repeated hammering about it. I didn't really want to talk about it in great detail because I was emotional about it, mainly because my research supervisor died unexpectedly at the age of 42 and the whole situation still makes me sad to this day. So instead of actually asking questions about the company, or the role, or my skills, about half the time was spent being grilled about this sad story of my graduate career. So I was in a wonderful mood at that point and was happy to finally move on to the technical assessment because I was worried about tearing up in front of the interviewer.
THE TECHNICAL ASSESSMENT: a bunch of softball questions that I don't remember followed by a standard algorithms question. Come to think of it I don't think I got any statistics/ML questions because we got stuck on the algorithms question. The problem was that I never took CS courses, nor had I studied the CLRS book at the time, so I didn't know the "expected" way of doing some problems. I tend to get too creative.
The problem: given a list of n words (as strings of size k), return all sets of anagrams. The somewhat clever O(n k ln k) solution is to sort the characters of each string and then look at all the sorted strings that have multiple words mapping to them. The cleverer solution is to build a dictionary for each word, using the letters as keys and the number of times each appears as the value. This is linear in the word length to construct, though it takes a memory penalty that is usually nugatory.
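For reference, here's a minimal Python sketch of the first "standard" approach described above, grouping words by their sorted characters (the function name is mine, just for illustration):

```python
from collections import defaultdict

def anagram_sets(words):
    # Canonical key: the word's letters in sorted order.
    # Sorting each of n words of length k costs O(n * k log k) overall.
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    # Anagram sets are the keys that more than one word maps to.
    return [group for group in groups.values() if len(group) > 1]

print(anagram_sets(["listen", "silent", "enlist", "google", "banana"]))
# -> [['listen', 'silent', 'enlist']]
```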
But oh no, I didn't think of either of these bog-standard solutions off the bat. Not having ever practiced this problem I immediately got creative, having too much fun with it. My initial thought was to create a dictionary where the keys were integers and the values were arrays of words. The integer keys were produced by reading the characters of the word in sequence, mapping the alphabet to the first 26 prime numbers, and taking the product of all the primes represented by the characters in the word. By the fundamental theorem of arithmetic (a.k.a. unique factorization of primes), any anagram would create the same integer key, and all words mapping to that key would get pushed into the value array. Voila, and a very compact data structure to boot!
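The prime-product idea sketched above might look like this in Python (names are mine; Python's arbitrary-precision integers conveniently sidestep the overflow objection raised later):

```python
from collections import defaultdict

# One prime per letter a-z (the first 26 primes).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_key(word):
    # Product of the primes assigned to the word's letters.
    # By unique factorization, two words get the same key
    # if and only if they are anagrams of each other.
    key = 1
    for ch in word.lower():
        key *= PRIMES[ord(ch) - ord("a")]
    return key

def anagram_sets(words):
    groups = defaultdict(list)
    for word in words:
        groups[prime_key(word)].append(word)
    return [group for group in groups.values() if len(group) > 1]

print(anagram_sets(["listen", "silent", "enlist", "google"]))
# -> [['listen', 'silent', 'enlist']]
```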
This was really, really bad. Basing the correctness of your interview solution on Euclid's Elements is never going to get you any traction with software engineers. As we all know, engineers are allowed to interview candidates to make themselves feel smarter and more fortunate for already having a job. Getting into an argument about prime number factorization with a candidate isn't going to support either of those goals.
So I was gloating a bit, feeling good for finding such a nice clean solution after that bit of emotional wrenching, figuring I had explained it well with a little math flair and eager to move on to the next problem.
"Um, are you sure that works? Will the numbers always work out like that?"
'Yes, that's why I chose the first 26 prime numbers, it wouldn't work with composite numbers.'
". . . Could you do it a different way? Could you do it another way that isn't this solution?"
'Could I do it slower than this? Sure, but why?'
It seemed like they had a list of possible answers on their interview script and I had gone way off of it. Now, the funny thing was I couldn't think of the sorting solution or the comparing dictionary solution, and I had a correct and well-explained answer that was superior in runtime and memory requirements (and before you object with arbitrary length words and integer overflow I'm just going to stop you with replacing the product of the primes with the sum of their logs and . . . I can explain but this margin is too small to contain). I could only think of really dumb things, like calculating every possible permutation of every single word and comparing them exactly. In fact I was kind of taking a perverse delight in trying to figure out the most inefficient way I could answer their question, seeing how I was very certain there was no better answer than what I had already come up with first. They kept asking me, "is there another way you could do this? Faster than comparing all permutations?" and I kept returning to my prime number solution and they kept saying "but is there ANOTHER WAY you could do this?" Round and round this went and I never got the solution they were looking for, whatever it was.
I don't even remember the rest of the interview process as I had the distinct feeling that it ended prematurely. I didn't catch that at the time since this was to be the first of my in-person interviews in the tech world, but looking back I'm sure they hadn't originally planned on hustling me out the door after a half-hearted tour of the loft workspace.
There are of course two sides to every story, and I'm sure there's someone from Kaggle who tells a story about a leather-clad sweaty-toothed madman who kept prattling on about his dead professor trying to score sympathy points and who gesticulated wildly for an hour while screaming FUNDAMENTAL THEOREM OF ARITHMETIC in a poorly disguised Southern accent. But, regardless of their perspective I can definitely assert that was the worst interview I'd ever had. I went 'home' to buy some calamine lotion, call my insurance company, and tell my mom that the interview at the 'dream job' didn't quite pan out.
Your solution to the anagram problem is very ingenious indeed. I have seen this question a bunch of times and know all the "standard" approaches - but your solution, which I've never heard of before, is correct, clever, and far superior. Kudos to you for coming up with it under such stressful conditions!
Your description of the interview process is very sad. The interviewers' attitudes and response induced an icky feeling in me.
I think that the attitude you encountered at Kaggle is an artifact of rather poor-quality, insecure, and insufficiently experienced / accomplished engineers. Whenever I interact with really accomplished and senior technical people, the conversations tend to be of stellar quality - they like to discuss actual past work in depth, are able to comprehend new (and perhaps unusual to them) concepts, and are generally a lot more respectful and pleasant. On the other hand, everything you described reeks of junior, inexperienced, and rather insecure technical talent.
This reminds me of my own very-first tech interview. I had spent 5 years in a research lab, working on autonomous vehicles, writing software for things like signal processing algorithms, error correction codes, ML / reinforcement learning algorithms optimised to run on power-constrained devices and the like.
I went to interview with a VC-funded startup. They were looking for "experts in signal processing and machine learning". I figured my years of signal processing / ML experience might be a good fit. After the initial pleasantries, I get asked: "How will you design a server that can detect anagrams of any word in the English language?"
For the life of me, I couldn't come up with a reasonable solution. I went home and hung my head in shame.
I joke about the interview process existing to inflate the ego of the interviewer, but it isn't that far off. Everyone wants to be Google so they have to "act like Google". Meanwhile Google stopped doing that five years ago, but they're not going to tell you that.
I don't know what happened at that Kaggle interview and I'm not going to single out the people or the culture there because it really could have happened anywhere. Maybe I could ascribe some of it to having so many people coming from academia, since we (the PhD and post-doc dropouts) tend to have such enormous chips on our shoulders. But it's just our interview culture in general, so much unconscious bias.
I really would love for interviewers to look at your resume, say "well you've been gainfully employed doing this for 5 years, let's delve into your soft skills and see if you are actually a communication / working fit for this particular team" instead of just trying to see if you meet some arbitrary algorithmic bar and hope that correlates with job success.
1) Cruft on all landing pages and having to click through to get to the comps page which is the site.
2) Annoying focus on exploratory notebooks. Inevitably they aren't powerful enough and people link through to external sites.
3) Forcing the use of 3rd party compute platforms to enter comps. Half the fun for me is messing around with my own ideas and this just gets in the way. These should be optional rather than required.
4) Poor incentives. Many of the comps have tiny prizes for the value of work that gets done. They're also concentrated way too much at the top. Unless there's something I want to try out, the expected value of participating is way too low to do it just for the giggles.
For competitions with small prize pools, much of the inherent value is the opportunity to do well and be recognized for that work.
Data science, or trendy statistics, is inherently fun, which is also what makes Kaggle fun. Discovery in data will always be popular among people who love to solve problems.
To your other points, I don't disagree with you -- all the steps just to participate are becoming more work than it's worth, at least for me. I naturally work on a lot of the same problems posed on Kaggle at my job.
Obviously it's anecdotal data at best, but I'm still curious: what are the results? Because it sounds very similar to the advice frequently given to software engineers: 'push code to GitHub to land a great job'.
Quite frankly it's a rather bullshit signal, since its presence only tells you that the person spends all their free time on the computer. Maybe they know something, but a traditional interview will tell you that and more.
A person just outside of university does not have heaps of past jobs to show. So they should just leave it blank and describe their hobbies?!
An NCG should write more about class projects. Everyone has class projects.
If an NCG wants to put it down, fine. But don't color me impressed. Why should I select someone who spends their evenings alone tweaking out an extra 0.001% on an AUC curve, when I could conceivably get a more rounded individual with better team skills?
Especially this. I loved playing around with data there, but the moment most competitions have datasets with sizes in the tens of GBs, I'm out. I can always take the opportunity to learn AWS / Google Cloud processing methodologies, but that kills a bit of the fun of the first days.
Edit (1): Github https://github.com/crowdAI/crowdai
Edit (2): We're currently re-designing the whole site to look & feel better.
Firebase then acquired DivShot and people cried doom. Yes, DivShot was shut down -- but only after Firebase's CLI and Hosting were completely rearchitected around DivShot's open-source web hosting framework, combining the features of both product lines. The CEO of DivShot now runs Firebase Hosting's product line and has massive resources at his disposal to push his (great) agenda of simple and speedy static web services.
I agree for big companies though, even if that doesn't make a whole lot of sense (as there's no private info involved here, the competitions being in the open including the data provided by these companies).
The way she presented the news is that they will aim to advance that vision, but we'll have to wait and see how their vision pans out.
(Ben has been a great contributor, mind you.)
It's not unheard of for people who came on ex post facto to be offered cofounder titles to sweeten the deal or for other reasons (titles are cheaper than equity, I suppose).
Edit: just rechecked and his LinkedIn notes Nov 2011 as the start date for Kaggle. Guess it was easier to condense it into one sentence than to explain the difference to TechCrunch's audience?
I hope the mission of the site doesn't change. I think Facebook did a great job with whatsapp and instagram. I expect the same with Kaggle.
https://empiricalci.com is a dashboard to keep track of your experiments & compare them on public benchmarks.
Are there really that many data scientists? I thought it was a niche specialty. Is there enough work for that many people?
Google: https://cloudplatform.googleblog.com/2017/03/welcome-Kaggle-... (HN: https://news.ycombinator.com/item?id=13822635)
Kaggle: http://blog.kaggle.com/2017/03/08/kaggle-joins-google-cloud/ (HN: https://news.ycombinator.com/item?id=13822727)
Do you mean kegel, the exercise?
> It’s a terrible name because most Americans pronounce it “kagel” [rhymes with “bagel”] which sounds like the pelvic floor exercises. Australians pronounce it “kaggel” [rhymes with “haggle”].
-- Anthony Goldbloom, http://www.intelfreepress.com/news/a-marketplace-for-data-sc...
-- Also, I see by following your link that the company was founded in Australia, so the pronunciation that rhymes with "haggle" is actually the original one! Cool! I'll use it.
Speaking as an American, this is an awful example. Why are "bagel" and "haggle" not supposed to rhyme with each other? What difference is there in Anthony Goldbloom's mind?
Haggle = Kah-gl
I wonder about your dialect, if for you, "bagel" rhymes with "gaggle".
Why anyone would pronounce it to rhyme with "bagel" makes no sense to me (same as pronouncing "gif" with a soft "g"). IIRC (and I am no linguist), in English there's a rule about how something is pronounced based on the surrounding letters - and I think that a double consonant versus a single consonant preceded by a vowel is one of those rules.
I'm sure there are exceptions, after all (it's English...) - but I have a feeling that if you looked at such words you would find the general pattern to fit.
Again - I am willing to admit that I really don't know what I am talking about; I'm not a linguist, I'm not an expert in English. I'm just some guy who last studied English in high school years ago...
There's really no excuse for any English speaker to interpret kaggle as kagle or kagel.