As in, they knew who they were looking for from the start, and just worked with the data to find the known conclusion.
Also, there's no actual machine learning in this really, except calling out to a hosted language processing service...
Personally, I think this is "good" marketing.
Edit: I’m not sure how that was worthy of a downvote. Not everything is a conspiracy, but thanks for valuing contradictory opinions.
The world is awfully exploitative and cunning. The world of business doubly so.
Perhaps so, but that doesn't make it not true.
I could tell it wasn't going to be a super deep dive or deeply technical just from the layout/style of the post -- but it wasn't supposed to be deeply technical, right? It was kind of just a fun post.
They were extremely helpful in making the "conclusion" bits at the end of every section very obvious, though (I aim to write like that as well, to save readers time), so I basically only read those and skipped to the bottom, where they unveiled and clicked the button...
I was most surprised at them glossing over the activity patterns with guesses based on their assumption of the target's sleeping patterns -- their guess of a time zone would have been stymied by someone who liked to sleep early/late or had an unusual work schedule, but there was no mention of that or their reasoning.
That all made sense given patrickaljord's comment about it all being one big Microsoft ad, though.
Looking for anomalies at the same time, or in sequence, easily turns "What is going on?" into "Alright, why is there so much more stuff coming into the system, and why is that increased ingress causing increased memory usage per event?"
As you said - this only makes sense because it's a Microsoft ad and they could have arguably known the answer to start with anyway.
Given that this is focused on someone who's at least interested in software development, they made some pretty specific assumptions.
There's no way anyone (unless they somehow literally think there is nothing outside America) would just discount the concept of it being someone in another country, and/or with non-'regular' work hours.
But this is Microsoft after all. The imagination of a brick. I suspect if you asked about remote work, they'd think you want to work on remote desktop.
Too bad for a (fun and clever) Microsoft advertising.
When I saw the part with Azure Cognitive Services Text Analytics, I burst out laughing. Earlier, they had quoted: "Half of the time when companies say they need 'AI', what they really need is a SELECT clause with GROUP BY."
Their motivation of using AI is even below that threshold.
Now awaiting some horse_js comment about this absurdity.
Looking forward to your response!
I'm very surprised they didn't find him based on the least quoted people's followers actually.
With a nifty and, I think, necessary touch of themselves still being in the dark: I very much doubt that the data they gathered can really reveal the author's identity, and the result they arrived at (Tom Dale) seems to largely originate from the "quotes one person far more than others" metric.
You could almost consider it an anti-metric: which intentionally pseudonymous author would dare to retweet their own nym? However, to counter this analysis you'd have to blend in with the average Twitter user in your niche, so it then comes down to a psychological game of "what would horse_js do?".
What good is an "ethics paragraph" anyway? Isn't it like prefacing an offensive statement with "no offense, but"? It's one thing if you have a disclaimer to protect the user, since the user is making the decision. It's another thing to make a disclaimer that just lectures the user about privacy, but does nothing to protect the doxxed. That just seems like a lame attempt to make the site less liable, like when people post copyright content with the disclaimer "I do not own this."
Are you volunteering all the time to set that up for me? ;-)
As someone commenting at 4 AM, this might not be a great assumption to make ;)
Perhaps. The data here is not 100% conclusive. There are some critical assumptions holding up our conclusion and [...] has never confirmed (or denied) our findings.
Perhaps the horse lives to tweet another day...
Ironically this highlights one of the main problems with how machine learning is used.
On a very high level, I think you can sum up machine learning algorithms as finding patterns in enormous heaps of noisy data ("training"), then trying to apply the discovered patterns to novel data and using the result to guess the answer to a question you posed ("predicting").
The key word being guess here. Unlike algorithms not based on learning, there is no guarantee that the answer is correct, because you usually don't know whether the training data you supplied was sufficient or whether the learned patterns were the ones you need. If you knew, you could just hard-code the patterns directly and get rid of the whole learning overhead altogether.
Researchers know and communicate this. However, in the press, "AI" seems to be seen as almost the exact opposite: Not only can those fantasy AI systems answer questions about fuzzy human concepts with the precision of a computer, their answers are even better than the human ones - which is why the things we need to worry about are ethics discussions and humanity becoming obsolete...
This could be funny if it were just restricted to science fiction and public discussion, but it becomes problematic when "AI" systems are used to make life-changing decisions like setting insurance premiums or declaring persons suspicious to law enforcement.
I'd also argue it's how our brains work. Many times, as we come to a decision, we are going off of confidence, not true correctness. In the case of declaring suspicious persons: well, by definition they are suspects based on confidence, not truth. Even in court we determine verdicts based on human probabilistic confidence derived from the evidence.
Hasn't the latter already happened?
I don't have a link/source, but I seem to recall reading about tests of a humanoid-looking, AI-driven "attendant" at a border somewhere that would judge people based on looks/temperament and try to guess if they were lying about what's in their luggage.
Luckily, some critical areas like insurance calculation are regulated (in some parts of the world) to require explainable algorithms, which prevents this kind of discrimination -- so it's not as bleak as it's often made out to be. Of course, it's also important that it stays that way.
How does this work? You pay for your function's run time in serverless, so surely you wouldn't want the function to just sleep for x minutes or however long it gets rate limited. I can see a way to do it using a service bus queue (push the message with a delay of x minutes, and have the function set up to run on messages from that queue), but they specifically said timers. Does Azure let you programmatically set the timer for a function from inside that function (e.g. "run me again in 3 minutes")?
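To make the queue workaround concrete, here's a local toy simulation of the pattern I mean (pure Python with a heap as the "queue"; `DelayQueue`, `handler`, and the message names are all my own, not the Azure SDK). Instead of sleeping through a rate limit, the function re-enqueues its own trigger message with a delay and exits, so you stop paying for run time while you wait:

```python
import heapq

class DelayQueue:
    """Toy stand-in for a queue with delayed/invisible messages."""
    def __init__(self):
        self._heap = []  # (visible_at, message)

    def send(self, message, visible_at=0):
        heapq.heappush(self._heap, (visible_at, message))

    def receive(self, now):
        # Deliver only messages whose delay has elapsed.
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[1]
        return None

def handler(queue, message, now, rate_limited):
    if rate_limited:
        # Rate limited: requeue ourselves 3 "minutes" out and exit
        # immediately, instead of sleeping (and paying) for 3 minutes.
        queue.send(message, visible_at=now + 3)
        return "deferred"
    return "processed " + message

q = DelayQueue()
q.send("job-1")
print(handler(q, q.receive(now=0), now=0, rate_limited=True))  # deferred
print(q.receive(now=1))                  # None -- not visible yet
print(handler(q, q.receive(now=3), now=3, rate_limited=False))
                                         # processed job-1
```

A real service bus or storage queue gives you the `visible_at` part for free via a per-message delay/visibility setting; the key idea is that the function never blocks while waiting.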
If the article explained why they wanted to identify this account, then fair enough. However, you are going to end up on an ethical slippery slope where it will be used to doxx people who are controversial, trolls, political dissidents, and whistleblowers.
What more do you want? It's not like we can take away these dangers by not talking about them openly.
> We ran all of @horse_js tweets from the last 2 years through Azure Cognitive Services Text Analytics service. This service identifies keywords in phrases.
How was that necessary in comparison to a simple "split by whitespace, count occurrences"? :P
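The baseline I mean is literally a few lines of plain Python, no hosted service required (the sample tweets here are made up for illustration):

```python
from collections import Counter

tweets = [
    "javascript is fine",
    "javascript was a mistake",
    "everything is fine",
]

# "Split by whitespace, count occurrences" -- the whole keyword extractor.
counts = Counter(word for tweet in tweets for word in tweet.lower().split())

print(counts["javascript"])   # 2
print(counts.most_common(3))  # the top "keywords" with their counts
```

Sure, a real keyword service also handles stemming, stop words, and multi-word phrases -- but for ranking a pseudonymous account's favorite words, the Counter gets you most of the way there.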
Also, great website design. Simple and clean!
But it does have the tone of a type of “data journalism” that we should see more often.
I would appreciate a site that treats all news the way 538 treats politics.