Clearly there isn't a linear relationship between the first coordinate and the second. In other words, if you plot these four points and try to approximate them with a single line in the plane, you just can't do a very good job.
The person's manager suggested sorting the second coordinates in this list while keeping the first coordinates fixed, resulting in the following:
These points are still not collinear, but they certainly can be better approximated by a line than before. The problem is that this is simply a completely different set of points, so a linear approximation here implies nothing about the original dataset.
It's no different from taking a list of key/value pairs, sorting the keys and the values independently, and hoping to get a meaningful result, which is absurd.
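To make that concrete, here's a tiny numpy sketch (the numbers are made up for illustration, not the points from the talk): the original pairs have essentially no linear relationship, but sorting each column on its own manufactures a near-perfect one.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([7.0, 1.0, 9.0, 2.0])    # no linear trend against x

    r_original = np.corrcoef(x, y)[0, 1]
    r_sorted = np.corrcoef(np.sort(x), np.sort(y))[0, 1]

    print(f"correlation of the original pairs: {r_original:+.2f}")
    print(f"after sorting each column independently: {r_sorted:+.2f}")
    # Both columns are now monotone, so the correlation jumps toward +1,
    # but the (x, y) pairs no longer correspond to each other, so the "fit"
    # says nothing about the original dataset.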
It's an interesting watch - I'd recommend it if you're interested in learning about it.
(Heck, I'd recommend the channel. Tom does some great videos on a number of different topics.)
To see this most clearly, read up on estimation of distribution algorithms (EDAs), which generalise the ways in which GAs work.
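For anyone who doesn't want to dig through the literature first, a univariate EDA (UMDA-style) fits in a few lines; this toy version on OneMax is my own sketch, not from any particular paper. The crossover/mutation machinery of a GA is replaced by re-estimating a probability distribution from the selected individuals and sampling the next generation from it.

    import numpy as np

    rng = np.random.default_rng(0)
    n_bits, pop_size, n_keep, n_gens = 20, 100, 20, 30
    p = np.full(n_bits, 0.5)            # model: one Bernoulli probability per bit

    for _ in range(n_gens):
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)   # sample a population
        fitness = pop.sum(axis=1)                                # OneMax: count of 1s
        elite = pop[np.argsort(fitness)[-n_keep:]]               # keep the best individuals
        p = elite.mean(axis=0)                                   # re-estimate the model

    print("final bit probabilities:", np.round(p, 2))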
I'm guessing that, in the sciences, that is a big no-no. Imagine if doctors, seeing that a new drug being trialled is failing to cure a disease, simply started chucking out the sick subjects until all the ones that were left were healthy, declaring the sick ones to be "noise" and the trial a success. Somehow, I don't think that would fly...
Data manipulation is different from data transformation. Manipulation changes the nature of the data; transformation does not. PCA is data transformation, not data manipulation.
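As a small illustration of that distinction (my own toy example, on random data): with all components kept, PCA is just centering plus a rotation into the principal axes, and the original data can be recovered exactly.

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).normal(size=(100, 3))

    pca = PCA(n_components=3)           # keep every component
    Z = pca.fit_transform(X)            # rotate into the principal axes
    X_back = pca.inverse_transform(Z)   # rotate back

    print(np.allclose(X, X_back))       # True: the data itself is unchanged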
The hype is only real if you systematically work it into a measurable process, not virtuoso jamming.
The same approach works with pig entrails: a bunch of people make predictions, the ones that fail go away, the ones that happen to succeed a few times "must work".
Or in stock market terms, "past performance is no guarantee of future results."
> Or in stock market terms, "past performance is no guarantee of future results."
Past performance isn't a guarantee, but under mild conditions it is very strong evidence that there will be future results.
I'm trying to remember where I read about this but, allegedly, there used to be a gentleman in New Orleans (if memory serves) who went around handing pregnant women, at random, sealed envelopes with "boy" or "girl" written on a piece of paper inside.
The idea was that he could expect to hit the right sex of the unborn child a few times and that those lucky hits would make people think he had some sort of gift for seeing the future. As to the misses, the women would be too preoccupied with having just given birth to raise a stink. Note the envelopes were handed out for free. It was his advertisement, see.
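A quick simulation makes the mechanism obvious (the numbers are arbitrary, just to illustrate): with enough coin-flipping forecasters, someone ends up with a perfect record by chance alone, and that's the only one anybody hears about.

    import numpy as np

    rng = np.random.default_rng(0)
    n_forecasters, n_predictions = 1024, 10

    # every forecaster guesses at random on 10 binary outcomes
    correct = (rng.random((n_forecasters, n_predictions)) < 0.5).sum(axis=1)
    perfect = int((correct == n_predictions).sum())

    print(f"forecasters with a perfect record: {perfect}")
    print(f"expected by chance alone: {n_forecasters / 2**n_predictions:.1f}")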
Which is exactly how we got from single-cell organisms to human-level intelligence. Of course, it took 4 billionish years for that to happen. Life, and hence intelligence, is survivorship; that's the selection mechanism.
The rule of thumb is that if Google or Facebook releases some toolkit for something, the next 3-5 years are going to be a hellscape of idiots clogging up the channels for the relatively small number of people who may have an actual, legitimate use for these tools. But since the idiots are bored at work and can tell their bosses, "Well, Google does it this way, so it must be the best! We're important just like Google, right?", we have to languish through their tiresome drivel, and watch as they drag their companies through a quagmire, only to propose the next fad as the savior a couple of years later.
I'm no ML expert by any means, but I've seen several bachelor/master theses and even ML competitions where ensembles performed best. Sure, this isn't necessarily aimless stirring and could combine models that really capture different aspects of the data. But often enough it's just several algorithms that do the same general thing, combined to achieve a slightly higher score.
Imho this is most relevant when competitions provide data that is not readable by humans (e.g. simplified: "classify these documents where all words are given as word IDs and never as actual strings").
To me this has a touch of pouring in data, stirring (build many classifiers and plug them together in an ensemble), and getting answers on the right side.
Optimizing hyperparameters goes in a similar direction, imho. I can really see an analogy to stirring.
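For what it's worth, the "stirring" being described is nearly a one-liner these days; here's a hedged sketch with scikit-learn, where the dataset and the three models are placeholders rather than anything from a real competition. They all do roughly the same general thing, and averaging their probabilities usually buys a small score bump.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=5000)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0)),
        ],
        voting="soft",                  # average the predicted probabilities
    )

    print(cross_val_score(ensemble, X, y, cv=5).mean())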
At least, most of the time you don't.
I mean more than they already are.
Reservoir computing. Some are critical of this method.
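For readers who haven't met it, the idea is almost literally "pour the data in and fit a readout": a fixed random recurrent network is driven by the input, and only a linear output layer is trained. Here's a minimal echo-state-network toy of my own (predicting the next step of a sine wave), not any canonical implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_res, leak = 200, 0.3

    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep the spectral radius below 1

    u = np.sin(np.linspace(0, 20 * np.pi, 2000))      # input signal
    states = np.zeros((len(u), n_res))
    x = np.zeros(n_res)
    for t, ut in enumerate(u):
        x = (1 - leak) * x + leak * np.tanh(W_in * ut + W @ x)
        states[t] = x

    # only this linear readout is trained (ridge regression): predict u[t+1] from the state at t
    X, y = states[200:-1], u[201:]                    # drop a warm-up period
    W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
    print("one-step-ahead training MSE:", np.mean((X @ W_out - y) ** 2))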
We actually might learn a lot from Tibetan tantric practitioners. It seems that Wall Street guys and economists did.
Nice to see Ukrainians reinventing the German stab-in-the-back mythology from the interwar period.
(In modern days too: all my friends who like football insist their teams never lose a match; it's always the referee who is on the other team's side.)
No one wants to blame the Everyman and his masculine valor for failing the country. So the parts of the national leadership who started the war need to deflect blame, and they do it by attaching themselves to martial myths and posing as defenders of the Everyman. Not like those other dastardly effete leaders who oppose war, who are in a rhetorically weaker position because they're correctly acknowledging that individual valor doesn't play a major role in the outcome of the war. They're easily portrayed as trivializing valor and not glamorizing it sufficiently.
If you talk to old white American soldiers, you can hear the same thing about Vietnam (damn liberals!) and more recently Iraq (damn liberals!). I'm sure you could hear similar things from old Brits nostalgic for Empire.
Something tells me Randall Munroe had a run-in with some over-eager data scientists recently.
Check whether a photo is of a bird.
For instance, could an algorithm conclusively identify the birds in all of these pictures without having too many false positives?
This is not a rhetorical question by the way, I genuinely don't know the state of the art in this field. If it's indeed possible to do that today I'll be extremely impressed.
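Not state of the art, but here's a rough sketch of how one might at least attempt it today with an off-the-shelf ImageNet classifier (torchvision ResNet-50). Loud caveat: BIRD_CLASS_IDS below is a placeholder; ImageNet-1k has a few dozen bird synsets and the real index list has to be pulled from the class labels, and a purpose-built detector or a fine-tuned model would handle hard, cluttered photos far better.

    import torch
    from PIL import Image
    from torchvision import models, transforms

    BIRD_CLASS_IDS = set(range(7, 25))   # placeholder: fill in the actual ImageNet bird synsets

    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def probably_a_bird(path, threshold=0.5):
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(img), dim=1)[0]
        return sum(probs[i].item() for i in BIRD_CLASS_IDS) > threshold

    print(probably_a_bird("photo.jpg"))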
Convert the string "this small bird has a pink breast and crown, and black primaries and secondaries." into a photo.
If you wanted a general algorithm working on non-curated data (like tagging facebook photos for instance) I'm sure it would be significantly harder.
It's only ~50% accuracy, but the photos are terrible. Much worse than Facebook pics.
OTOH, this is classification into hundreds of classes, not millions like in the case of FB face recognition. (Although of course FB can use the connectivity graph as a filter there too.)
The research group I am part of, Salesforce Research (formerly MetaMind), has a model that does this "accidentally" - and there's even an example image of a bird! The model is only meant to provide a caption for an image, not to segment the image into the various objects, but learns to "focus" on the bird as part of describing the image. For those particularly interested, check out the paper "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning".
Systems made specifically to segment an image into objects would obviously do far better. For an example of that, check out "CRF as RNN - Semantic Image Segmentation Live Demo". There are many more systems of this style floating about.
Human experts can get enough clues from the bird shape and the context to do that in the sample photos. I doubt your captioning system can.
This is a good example of a standard problem in ML - underestimating the complexity of the problem domain.
You could argue that your system only needs to do the simpler task to be useful, and that's likely true. But if the goal is to approach human expert levels of classification, it needs to improve by at least a few levels.
I suspect getting it there would run into some interesting performance constraints, and possibly some theoretical issues too.
These are way better than anything a non-expert human can do. For example, it can distinguish between the Rhinoceros Auklet and the Parakeet Auklet.
I'm not sure what expert performance is, but around 94% is where humans top out on most tasks.
Also, the parent poster knows what they are talking about: https://www.semanticscholar.org/author/Stephen-Merity/337544...
For example, what if we could take a photo of Noah's Ark loading up every animal?
Do you just loop through each NN you have, one per species?
There's also image segmentation as another poster has pointed to.
In the case of FB face tagging, they'd have learned an embedding space for faces, and when a new image comes in they'd place it in the embedding space along with all the person's connections and find the nearest neighbors.
See https://arxiv.org/abs/1503.03832 or the implementation https://cmusatyalab.github.io/openface/
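Here's a sketch of the nearest-neighbour step being described, under the assumption that some FaceNet/OpenFace-style model has already produced an embedding per known face; the names, vectors and the distance threshold are all stand-ins. Restricting the candidates to the uploader's connections is the graph filter mentioned above: it turns a many-million-identity problem into a few hundred distance comparisons.

    import numpy as np

    known_faces = {                      # name -> precomputed 128-d face embedding
        "alice": np.random.default_rng(1).normal(size=128),
        "bob":   np.random.default_rng(2).normal(size=128),
    }

    def identify(query_embedding, candidates, max_distance=1.0):
        """Return the closest candidate face, or None if nothing is close enough."""
        best_name, best_dist = None, np.inf
        for name in candidates:
            d = np.linalg.norm(known_faces[name] - query_embedding)
            if d < best_dist:
                best_name, best_dist = name, d
        return best_name if best_dist < max_distance else None

    # a query embedding close to alice's, searched only among her connections
    print(identify(known_faces["alice"] + 0.01, candidates=["alice", "bob"]))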
More importantly, the progress that has been made in recent years actually builds very heavily on work since the early 1990s, so not only is it not complete, what has been achieved took a great deal longer than 5 years.
Now that we are in the range of having the correct hardware, the whole "it's taking decades" issue will go away.
Is there a way to know the date (1425) was released?
No, it is because estimating software tasks is difficult, the penalty for underestimating is that people think you are dishonest/flakey, and there isn't anywhere to get an education in how to do it well. The default advice given to junior engineers is therefore: "take your intuition and triple it." I hate that this is the state of the industry. My interactions around estimation over the past 5 years since uni have literally made me feel nauseated and near fainting on multiple occasions. I would love for Joel or Kalzumeus or Uncle Bob or someone else to fix it and produce a good course on how to create estimates.
Probably the best you're going to get is the book "Software Estimation: Demystifying the Black Art".
Even applying those techniques you get it wrong.
Most experienced software companies have adopted agile, and accept reductions in scope to meet deadlines as something that happens.
Of course, all this leads to bad blood between techies and the business side: how long will it take? -> probably about 3 weeks, but this requires using a library we haven't used before, so in the worst case even 2 months -> what? so long? get it done in 4 days, this is required next week -> no, that's not really possible -> make it happen -> it happens, and either it sucks when it's delivered (if it's delivered at all), so the deadline gets extended anyway to iron out all the bugs, or it causes lots of problems in the future.
"OK you have implemented it as requested, but finally the customer does not like it, it needs to be slightly different. Can you do it quickly?"
Sometimes it is easy to adapt, sometimes next to impossible.
When the outcome is bad/not what humanity wants, we get/give a negative response and hope the outcome next time will be better.
"It made a few episodes of webcomics obsolete: xkcd: Tasks (totally, by Park or Bird?), xkcd: Game AI) (partially, by AlphaGo), PHD Comics: If TV Science was more like REAL Science (not exactly, but still it’s cool, by LapSRN)"
If not, what is your interpretation of the XKCD cartoon?
ML is powering most or even all self-driving car efforts underway; it powers online translation services, numerous vision projects, and speech recognition, besides winning competitions and meeting or exceeding human performance on the same data.
I'm just as allergic to those that hype some technology as I am to those that will snarkily discard something with arguments that have already been laid to rest, in some cases multiple years ago.
Xkcd is fun, but it isn't necessarily prescient, nor does it have to be accurate; when this cartoon was published the writing was already on the wall, and that was 3 years ago. It's fine to be skeptical about new technology, but before you start criticizing it make sure that you have at least a rough idea of where things stand, lest you end up looking foolish.
Sure, ML is abused, and if we're not careful we will see another AI winter because of silly hype and ascribing near-magical properties to ML. But at the same time, snark, condescension and a priori dismissal of what is most likely the biggest landslide in computing since the smartphone is - especially on a site that deals with both hacking and novelty - something that I would not expect.
Compared to the HN love for the next JS framework or language fad this attitude is surprising to say the least.
Like the neural nets which generated nonsensical but realistic-looking C++ code.