What happened to Google Alerts? (gwern.net)
151 points by gwern on Sept 25, 2013 | 39 comments

Google Alerts pretty much only alerts me of news stories. Unless it would show up in Google News, new links never make it to my e-mail.

For example, a customer posted a nice video review of Improvely on YouTube today which I can find through Google limiting the date range to today with the "Search Tools" button. No e-mail from Google, despite the alert set up for the brand name.

On the other hand, I have one set up for "Surface Pro" and get daily e-mails when the big tech blogs mention it. Smaller blogs and forums, which are no doubt talking about Surface often too, never show up in those alerts. The e-mails even say "News" up top [1].

A few years ago, every mention would trigger an alert. Something did change. 3rd-party apps like Mention [2] alert me more often.

1: http://i.imgur.com/XeXDUG4.png

2: https://en.mention.net/

He mentions that they dropped the RSS functionality, but this is not the case at all. If you edit one of your existing alerts, you can change it to a feed.

Ref: http://www.google.com/alerts/manage


But the blog post OP cites provides some convincing evidence that they were gone for at least a moment last July. http://googlesystem.blogspot.com/2013/07/google-alerts-drops...

I guess they changed their mind and brought them back? Weird.

It may have had something to do with the Google Reader shutdown. The reason I say this is that I recently launched an RSSOwl instance that I had populated with a list of feeds some time ago, including quite a few Google Alert RSS feeds. Strangely, none of the Google Alerts were working, and when I started investigating, I found that all of them had a path that included something like


which suggests that, when those feeds were originally created, the delivery mechanism somehow involved GReader.

I went back to Google Alerts and re-copied each of the feed URLs and see that in the new URLs, the path looks like:


Not definitive, but it hints at some kind of connection.

As jrochkind1 says, they certainly did drop the RSS feeds, but restored them recently. I'll update that detail, but I can't help but suspect that I'll be needing to go back and update it again in the near future...

Great article, case studies of skilled hackers solving real problems like this one are so rare, wish we had more of this. Only comparable thing I can think of at the moment are Peter Norvig essays:



If anyone can recommend similar things, I won't mind :)

Like others have said, Google alerts only notifies you of news articles.

My company http://www.Alertification.com takes a more general approach and alerts you when something on any public website changes. For example, you'll get an email or text message when an Amazon price drop occurs, when a college class opens up, or even when concert tickets go on sale.

Pro tip: advertise an alert for when a popular product goes back in stock. (eg popular electronics)

I made a one off app for the nexus 4 release to account for low inventory, and got quite a bit of traffic for low/no effort.

Worth mentioning that gwern writes some of my favorite things on the web, and the wealth of information on his site is worth spending time poring over. Always nice to see new stuff from him.

One thing that I hope people don't miss is that the problem "Google Alerts" solves is an information retrieval problem that is still unsolved (at least in the open literature ;-)

Conventional search ranking algorithms give you some score from 1 to 0 and the only meaning of the score is that a document with a higher number is more likely to be relevant than a lower number. The results usually are good at the top and they gradually get worse as you go down. You stop either when you're satisfied or when it feels like a waste of time.

Suppose, however, you wanted to search scientific papers or news articles about a topic and see the results ordered in time. All of a sudden the junky documents that were hidden are visible; the results are embarrassing even for world-class search engines.

You might say, "let's filter out documents that have a score less than, say, 0.8".

It doesn't work, at least not very well. You run into two problems. Search engines that crush TREC search evaluations have worse than 70% precision when the score approaches 1. Also, you'll see plenty of cases that are obviously a direct hit and the score is 0.5.

The difficulty of the problem is one thing, but the academic approaches people have taken in IR are another part of the problem. The methods used for most TREC evaluations are designed NOT to give search engines credit for "knowing what they know", because to score well on "knowing what you know" you need to do a super job on easy queries and recognizing they are easy queries, and if you don't do that, how well you do on hard queries won't shine through.

Another one is the whole idea that you need to normalize scores from 0 to 1. You don't. A while back I developed a topic similarity scoring system that just counted the number of traits two things have in common, rather than using a dot product or K-L divergence or anything like that. It turned out that when the score was 40 you knew the results had to be good, because 40 pieces of evidence is a lot of evidence. If you had 4 pieces of evidence, it was clear that things were iffy. I might have gotten "better" results in some sense with a more complex algorithm, but the scores from the simple count were meaningful -- from my point of view, the better algorithms are stupider because they are erasing their knowledge about their own confidence.
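
To make the point above concrete, here is a minimal sketch. The trait sets and function names are hypothetical, and cosine similarity on binary trait vectors stands in for "a normalized score"; it is not the system described above.

```python
import math

def evidence_count(a, b):
    """Raw number of shared traits; deliberately not normalized."""
    return len(a & b)

def cosine_similarity(a, b):
    """Normalized overlap in [0, 1] (cosine on binary trait vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical data: a sparsely described pair and a richly described pair.
sparse_x = {f"trait{i}" for i in range(4)}
sparse_y = {f"trait{i}" for i in range(4)}
rich_x = {f"trait{i}" for i in range(40)}
rich_y = {f"trait{i}" for i in range(40)}

for name, x, y in [("sparse", sparse_x, sparse_y), ("rich", rich_x, rich_y)]:
    # Both pairs get the same normalized score, but very different counts.
    print(name, evidence_count(x, y), cosine_similarity(x, y))
```

Both pairs come out with the same normalized score, while the raw count still says how much evidence sits behind each match.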

It's also a big problem in machine learning: often you use an SVM or Bayes or a neural network, you get some score, and you say that if the score is greater than some threshold it is in the class, otherwise it isn't. Because these algorithms almost always get the wrong idea about the prior distribution, you can often make a "failing" machine learning algo very useful if you do logistic regression on the output and use that to convert the output into a probability score.
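
A rough sketch of that last step, usually called Platt scaling: fit a one-dimensional logistic regression that maps raw classifier scores to probabilities. The scores and labels are made up for illustration, the function names are hypothetical, and plain gradient descent is used only to keep the example dependency-free.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def to_probability(score, a, b):
    """Convert a raw classifier score into a calibrated probability."""
    return sigmoid(a * score + b)

# Hypothetical held-out scores from a classifier whose default cutoff of 0
# is badly placed: several true positives have negative scores.
scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.6, -0.4, -0.2, 0.1, 0.5]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
for s in (-1.5, -0.4, 0.5):
    print(s, round(to_probability(s, a, b), 3))
```

The fit should be done on held-out data, not on the examples the classifier was trained on, or the probabilities will be overconfident.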

Anyhow, if you want to learn about this and stop making 'stupid' intelligent systems, stop what you're doing and read the issue of the IBM Systems journal about IBM Watson because that's what Watson is all about -- it converts all of the signals it gets into comparable probability estimates, and then uses decision theory to take actions that maximize its utility function (i.e. "business value").

Thanks for an interesting post.

The IBM journal publication - is it this one? http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=617771...

Although I'm not sure if that's the mentioned publication, you might want to take a look at https://uima.apache.org/.

More info on how Watson is using UIMA: https://blogs.apache.org/foundation/entry/apache_innovation_...

Yep, thanks for the link!

Isn't Google Alerts simply based on keyword/phrase matches? So if I want to get an alert for the keyword "recipe", it'll give me web pages that are about recipes, as well as articles that simply have the word 'recipe' (ie "Customer development is the recipe to startup success"). I don't think it ever marketed itself as a topic search alert system.

Yeah, but here's a Fermi problem for the crowd.

How many web pages get created every day that contain the word "recipe"?

I'm certain you'd be buried in notifications if Google sent you an alert for every recipe, so it has to have some selection mechanism to send you particular recipes...

I thought Google Alerts were to tell you when specific phrases were encountered, like "MyMostlyUniqueFullName" or "MyCompanyName". "Recipe" doesn't seem that useful -- or at least doesn't match my M.O. for Google Alerts.

I'm guessing that most Google Alerts are "vanity alerts," as your examples suggest. The putative decline of the service could thus fit with the idea that such functions are meant to be subsumed by Google+.

I once tried to search for Avatar the movie, but the SERP returned many results for Avatar the video game. I added -game, but realized that if a blog post is titled "Why I like Avatar (the movie, not the game)", it might be filtered out by the -game keyword.
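
A toy illustration of that failure mode, assuming a naive matcher that treats "-game" as "drop anything containing the word". The matcher and the titles are hypothetical; this is not how Google implements exclusion.

```python
import re

def matches(title, include, exclude):
    """True if every include term and no exclude term appears as a word."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    has_all = all(term in words for term in include)
    has_excluded = any(term in words for term in exclude)
    return has_all and not has_excluded

titles = [
    "Why I like Avatar (the movie, not the game)",
    "Avatar: The Game walkthrough",
    "Avatar movie review",
]

for title in titles:
    # Equivalent of the query: avatar -game
    print(matches(title, include=["avatar"], exclude=["game"]), title)
```

The first title is exactly the post the searcher wants, and the exclusion throws it away along with the walkthrough.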

But not the TV show? That's disappointing.

Well, if there's a ton of results, I'm sure they use some sort of way to filter out the unpopular ones, such as pagerank.

Nice points. This is pretty much why the old Google blog search failed, too.

The problem with your similarity scoring idea is that it fails badly in adversarial conditions (as I'm sure you are aware). It's easy to work around that failure, but then you end up using something like dot-product. I'm not at all convinced that normalizing the scoring is throwing anything away at all.

On another point:

Would you mind explaining this a bit more: Because these algorithms almost always get the wrong idea about the prior distribution, you often make a "failing" machine learning algo very useful if you do logistic regression on the output and use that to convert the output into a probability score.

In the case I'm talking about we were using RDF data that was curated, so we had no enemy.

Adversarial IR is a problem that came with Google and will go away with Google.

Bing has the problem too because they are trying to be Google.

If you accept Sturgeon's law,


and realize that it's more like "99.9% of the web is crap", you can look at it as a whitelisting problem rather than a blacklisting problem. If you search for "WOW Gold" you're going to get some article from Wired about how people are working 18 hours playing video games under horrific conditions in a Satanic mill somewhere in Shenzhen. And that's it.

Google can't whitelist because of business and political reasons. Smaller companies, particularly vertical focused, can.

As for the prior, I was working w/ Thorsten Joachims and an undergrad years ago on classifying papers from the physics arXiv. If you want to separate out astrophysics (which was the biggest category) from everything else, the negatives in your training set will greatly outnumber the positives, and in this situation the SVM gets the idea that it's safer to bet against astrophysics than it really is. If on the other hand you have a balanced number of pos and neg examples, it's also getting a wrong idea.

We tried using the SVM out of the box and had lousy examples and then Joachims told us to try


and we found we could tune the cutoff to get performance that was much more satisfying.
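
For anyone who wants to see what tuning the cutoff can look like, here is a minimal sketch. It is not necessarily what Joachims suggested: the decision values and labels are invented, the function names are hypothetical, and F1 is just one possible target.

```python
def f1_at_cutoff(scores, labels, cutoff):
    """F1 when everything scoring above `cutoff` is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(scores, labels):
    """Try a cutoff just below each observed score and keep the best F1."""
    candidates = [s - 1e-9 for s in sorted(set(scores))]
    return max(candidates, key=lambda c: f1_at_cutoff(scores, labels, c))

# Hypothetical SVM decision values on an imbalanced validation set
# (9 negatives, 3 positives); the classifier leans toward "negative".
scores = [-2.1, -1.8, -1.6, -1.5, -1.3, -1.1, -1.0, -0.9, -0.7, -0.4, -0.3, 0.2]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

default_f1 = f1_at_cutoff(scores, labels, 0.0)
tuned = best_cutoff(scores, labels)
tuned_f1 = f1_at_cutoff(scores, labels, tuned)
print(default_f1, tuned, tuned_f1)
```

The cutoff should be chosen on a validation set and then checked on separate test data, since picking it on the evaluation set itself flatters the result.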

Most machine learning books go on for hundreds of pages about kernel theory and whatever and spend two or three pages on ROC analysis (and its friends, like logistic regression -> probability score).

A big problem with things like TREC and Kaggle is that they need to pick one definition of "accuracy" so that a whole crowd of intelligent but unwise people can fight for the last 0.2%, but it doesn't lead to applications because in the real world some mistakes cost more than others, and you could use simple methods and ROC/logistic analysis to make something that maximizes business value with 1/10 the effort.
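
As a sketch of what "some mistakes cost more than others" means in practice: once the classifier output is a calibrated probability, the standard decision-theory rule is to act when the expected cost of a miss exceeds the expected cost of a false alarm. The cost numbers and function names below are made-up assumptions.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has lower expected cost.

    Predict positive when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(p, cost_fp, cost_fn):
    """Return True to flag the item, given calibrated probability p."""
    return p > cost_threshold(cost_fp, cost_fn)

# Hypothetical costs: missing a relevant document is 9x worse than
# showing an irrelevant one.
p = 0.2
print(cost_threshold(1, 9))          # threshold when misses are expensive
print(decide(p, cost_fp=1, cost_fn=9))
print(decide(p, cost_fp=1, cost_fn=1))
```

The same 20% probability leads to opposite decisions depending on the costs, which is information a single "accuracy" number never sees.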

I've set up a bunch of Google Alerts within the past few weeks, and most of them have not been triggering when relevant (and very public) content is published.

I have been using Google Alerts for a few months now. I set up a few alerts for physiotherapy jobs with keywords like 'job', 'physiotherapy', 'vacancy' et cetera (in Dutch). But I only get an alert about once a month, and even then it only finds some sort of news article about the profession, not anything about jobs. I cannot imagine that so few jobs are posted online for this profession. Twitter search or plain Google search gives me more relevant information.

I'm not sure Google Alerts is useful at all this way. I can manually search every day of course, but these are things I thought could be perfectly automated and done by Google Alerts.

I created a service that performs a predefined search and notifies you via email when new results appear. You can search job aggregation websites. The service URL is http://vertascan.com; email me at sasha vertalab com and I will help set up a scanner for you.

I still receive the same alerts I did a few months ago, but now I receive more. Far more, to the point where some have become useless.

I had alerts like: "This" -"Not that" -site:notnews.com

The filters stopped working for me. I removed and re-added the alerts. Now I'm pounded with results. They were amazingly effective before.

I can't say I'm surprised, since I don't see much of a business model in it, but I was surprised that it randomly happened to all of my alerts and I haven't seen a word about a change.

Google Trends appears to be next. They changed the interface under "Explore". It has fewer features and works poorly on tablets.

If you were curious, my shutdown analysis ( http://www.gwern.net/Google%20shutdowns ) estimated Trends had a 52% chance of surviving 5 years (dating from March).

I saw your study when it was released. Really excellent work. I suspect Trends will survive in some form but it has really changed over the last 24 months. It's becoming far less useful as a research tool, which seems to be Google's overall trend from transparent->opaque.

For a while, spammers were using data from Google Trends to find new phrases/topics to target for autogenerated spam. See http://www.stopthehacker.com/2010/04/05/google-trends-for-se... for example.

The new Google Trends UI is still useful for hot topics (Hey, Oracle just won the America's Cup!), but less useful for spammers.

Not just auto spam; a lot of sites haunted the trends board and wrote stories. The most hilarious example was when 4chan bulk-searched for "justin bieber syphillis" and created a burst of stories across the web.

That's unfortunate. It was a really useful tool. The Hot Topics are not particularly useful, well, unless you want to know what People Magazine is going to print in their next issue.

Yeah, I'm a little worried that I'm going to have a demarcation problem when I do the 5-year followup - where Trends will have changed so much that I won't know whether I should consider it alive or dead. Even while I was researching it for its entry, I was coming across all sorts of complaints from different time periods about how the functionality was being constantly chopped down.

Ya it's all but dead these days. Private companies (like Moz with their Fresh Web Explorer) are trying to fill the gap.

Google certainly look to be actively walling their data off.

See also Google Keyword Tool, which has taken quite a step backwards recently, and the removal of organic keyword sources ("(not provided)") from Google Analytics.

Bad times for marketers.

Today I learned about Change Point Analysis. Thanks!

Today I got a Google alert for a Twitter account I created 8 months ago.

The data seems a bit dated. RSS functionality, for example, has since been restored: http://searchengineland.com/google-quietly-brings-back-rss-f...

looks for a tl;dr version...

Isn't that what the first section is?

"Has Google Alerts been sending fewer results the past few years? Yes. Responding to rumors of its demise, I investigate the number of results in my personal Google Alerts notifications 2007-2013, and find no overall trend of decline until I look at a transition in mid-2011 where the results fall dramatically. I speculate about the cause and implications for Alerts’s future."
