

We're Entering The Worst Period In Modern Search History - rpm4321
http://www.readability.com/articles/pjqnipoq

======
rohansingh
> The minute the 2013 presidential inauguration started, it was as if the 2009
> inauguration had never even happened — at least as far as Google was
> concerned. Searching for 2009-vintage photos of the first family, of the
> performers, and even of the President himself became much, much harder on
> January 21st, as did finding contemporaneous news accounts of the first
> election. The information was still out there, but it had been pushed well
> below the surface. Finding it was more like excavating than searching.

"obama family", from January 2008 through December 2012, including images:
[http://www.google.com/search?q=obama+family&tbs=cdr%3A1%...](http://www.google.com/search?q=obama+family&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2008%2Ccd_max%3A12%2F31%2F2012)

One would have hoped the author would have at least mentioned this feature.
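
For anyone who wants to build such links by hand, here is a minimal sketch in
Python. The `tbs`, `cdr`, `cd_min` and `cd_max` names are copied from the link
above; the helper name `google_date_range_url` is made up for illustration,
and whether Google keeps honoring these parameters is an assumption.

```python
from urllib.parse import urlencode


def google_date_range_url(query, start, end):
    """Build a Google search URL restricted to a custom date range.

    start and end are M/D/YYYY strings, in the same form as the link above.
    """
    # cdr:1 switches on the custom date range; cd_min/cd_max bound it.
    tbs = f"cdr:1,cd_min:{start},cd_max:{end}"
    # urlencode percent-escapes ':', ',' and '/' and turns spaces into '+'.
    return "http://www.google.com/search?" + urlencode({"q": query, "tbs": tbs})


url = google_date_range_url("obama family", "1/1/2008", "12/31/2012")
print(url)
```

With those arguments the printed URL is the same as the full target of the
link above.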

~~~
tveita
Really this is how it should work. It's a fairly safe default assumption that
people are searching for recent information if the query matches a current
news story.

As an example of the opposite: In the days right after the earthquake and
tsunami that struck Japan in 2011, I happened to search for "japan tsunami",
without quotes, in both Google and Bing.

The first page in Google was full of relevant news stories, with an extra link
at the top to Google's own crisis assistance page. The first page in Bing was
full of stories about 10 or 20 year old incidents. As I recall it, there was
nothing whatsoever about the 2011 event. It was almost disconcerting.

As an aside: why would the submitter use a readability.com link? That seems a
bit rude to the original page, even though the article was tripe.

------
klez
Thanks, but next time let me choose if I want to read the article in
readability or not. Having done this you are hiding the article's source and
making sites like hnsearch (that can track discussions based on links) less
useful.

Original link:

[http://www.buzzfeed.com/jwherrman/were-entering-the-worst-pe...](http://www.buzzfeed.com/jwherrman/were-entering-the-worst-period-in-modern-search-h)

~~~
drucken
Agreed. Also, on the original site I can at least read what is just text
without needing JavaScript.

------
thaumaturgy
Eh, it's not so bad really: we've also entered an age where any programmer, if
they choose to do so, can roll their own search engine and build something
pretty OK in a reasonable amount of time. That's _amazing_ if you think about
it; back in 2000, I could not have imagined building my own Excite.

The kind of parsing that Facebook is doing (or planning to do -- I don't use
Facebook, so I don't know if it's implemented yet or not) isn't all that hard.
I do it with my own toy search engine. Comments like
<http://news.ycombinator.com/item?id=5099787> are compiled from results for
searches like, "top submissions with 'hackny'", or "recent submissions from
qz.com". I'm currently compiling a huge data set that should let me do neat
things like, "front page threads with comments from [user] and [user] from
last week" -- or just about any other logical query you can think of.
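
To give a rough idea of what parsing queries like these can look like, here is
a small Python sketch. It is not the code behind the toy search engine
described above; the pattern, the field names (`order`, `term`, `site`) and
the `parse_query` helper are all hypothetical, and it only covers the two
query shapes quoted in this comment.

```python
import re

# Hypothetical grammar: "<top|recent> submissions with '<term>'"
#                    or "<top|recent> submissions from <site>"
PATTERN = re.compile(
    r"^(?P<order>top|recent)\s+submissions\s+"
    r"(?:with\s+'(?P<term>[^']+)'|from\s+(?P<site>\S+))$"
)


def parse_query(text):
    """Turn a query string into a dict of filters, or None if it doesn't fit."""
    match = PATTERN.match(text.strip().lower())
    if match is None:
        return None
    # Keep only the groups that actually matched.
    return {k: v for k, v in match.groupdict().items() if v is not None}


print(parse_query("top submissions with 'hackny'"))
print(parse_query("recent submissions from qz.com"))
```

The resulting dict could then be translated into whatever filter and sort
clauses the backing data store understands.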

Technology is really damn cool sometimes.

~~~
rpm4321
Just curious, when you say that it's easy to roll your own web search engine,
are you referring to things like Yahoo BOSS and Common Crawl, or are there
other OSS projects / data sources out there I'm not aware of yet?

It's relevant to an important side project that I'm working on, but I'm new to
search, so any links or other info you might have would be greatly
appreciated.

Thanks in advance.

~~~
thaumaturgy
In my case, I initially just built an interface for ThriftDB's HN data; I've
recently started building my own data set in MySQL (partly because I wanted to
learn how to tune MySQL, partly because I'm starting to expand beyond what's
available in ThriftDB). I have a little custom crawler that regularly
retrieves the stuff I'm currently after.

I've been looking at using Sphinx
(<http://sphinxsearch.com/docs/current.html#about>) for full text search
capability on that data set, but it's not set up yet. It seems to be the most
recommended way to handle text search with a MySQL data store.

I wanted to have something that can do a good job of answering questions like,
"popular articles about that space mining company from last week" (because
that's how my brain works). I probably can't offer any useful information for
what you're looking for.

edit: Sorry, to more directly answer your question: "Just curious, when you
say that it's easy to roll your own web search engine, are you referring to
things like Yahoo BOSS and Common Crawl, or are there other OSS projects /
data sources out there I'm not aware of yet?"

Sometime around 2000, full text search of just the contents of your hard drive
was still a really icky problem. Programmers spent a ton of time coming up
with reasonable ways to do that. It blows my mind sometimes that now we can
fairly easily set up our very own SQL database with just a few commands and a
little bit of forethought, and then use any one of a number of programming
languages to interface with a standard communication protocol to connect to
other servers around the world and then download the data, use fairly decent
libraries to parse it all, shove it into our own database, and then quickly
search the database.

I wasn't really thinking about a particular API or service.
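
As a rough illustration of the parse / store / search steps described above,
here is a minimal Python sketch. It makes several assumptions: it uses SQLite
from the standard library instead of MySQL so that it runs without a server,
it does a naive substring match instead of real full text search (Sphinx or
similar would replace that part), and the table layout and helper names
(`extract_text`, `open_store`, `index_page`, `search`) are invented for the
example. The download step is left out; it takes HTML strings directly.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    return db


def index_page(db, url, html):
    # Store only the extracted text; re-indexing a URL overwrites the old row.
    db.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
        (url, extract_text(html)),
    )
    db.commit()


def search(db, term):
    # Naive substring match (LIKE wildcards in term are not escaped).
    rows = db.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (f"%{term}%",)
    )
    return [row[0] for row in rows]


db = open_store()
index_page(db, "http://example.com/a",
           "<html><body><h1>Space mining</h1><script>var x=1;</script></body></html>")
index_page(db, "http://example.com/b", "<p>Tuning <b>MySQL</b></p>")
print(search(db, "mining"))  # ['http://example.com/a']
```

A crawler would feed `index_page` with pages fetched via something like
`urllib.request.urlopen`; that part is omitted here so the sketch runs
offline.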

------
anonymouz
This whole article only makes sense when one restricts one's scope of the web
to "social web". The notion that Facebook and Twitter contain the web's most
valuable information seems ridiculous otherwise.

Except for information on trending current events, I personally find almost
the opposite to be true: Facebook and Twitter add lots of noise and little
information compared to what can be found outside of them. Posts on Google
Plus are more likely to be interesting to me than on Facebook, and blog posts
tend to be still more interesting than Google Plus posts. My impression is
that the further away I move from the walled garden "core" of the social web
(Facebook/Twitter), where everyone feels obligated to add his chatter to the
general noise, the more valuable the content becomes.

As long as Google is able to decently index the complement of these walled
gardens I see little reason for despair.

------
super_mario
If you excluded Facebook and Twitter from the Internet, nothing of value would
be lost. I'm perfectly fine with not having that information indexed and
searchable; it would just get in the way. Also, if these disappeared there
would be a lot more real time information posted in hopefully more open and
accessible data stores (in the true spirit of the web) rather than in private
corporate silos.

~~~
rpd
I agree. You picked up on a point in the article which said, in summary, that
Google solved a problem back in 2000. I believe that Google is still solving
that problem, and that in 2008 it also attempted to solve the "trending"
problem.

I think the trending problem is "solved" by Facebook and Twitter because of
their structure, so the next step is to search the structure for data. Google
doesn't do this.

So, I agree that nothing is lost, however, we clearly have two modes for
searching: an Archival mode, and a Trending mode. They tend to interfere with
one another when both results are displayed in the same list (Search Engine
says: "You clearly want the most recent stuff, right?") . The next step is to
now separate the two result sets so that more relevant information can be
found more easily within the context of the search query.
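
A minimal Python sketch of what separating the two result sets could look
like, assuming each result carries a publication date. The seven-day window,
the `(title, published)` tuple shape and the `split_results` name are
arbitrary choices for illustration, not anything a real search engine is
known to use.

```python
from datetime import datetime, timedelta


def split_results(results, now, window=timedelta(days=7)):
    """Split (title, published) pairs into (trending, archival) lists.

    Anything published within `window` of `now` counts as trending; the rest
    is archival. Each list keeps the input (relevance) order.
    """
    cutoff = now - window
    trending = [r for r in results if r[1] >= cutoff]
    archival = [r for r in results if r[1] < cutoff]
    return trending, archival


results = [
    ("2009 inauguration photos", datetime(2009, 1, 20)),
    ("2013 inauguration live", datetime(2013, 1, 21)),
]
trending, archival = split_results(results, now=datetime(2013, 1, 22))
print([title for title, _ in trending])  # ['2013 inauguration live']
print([title for title, _ in archival])  # ['2009 inauguration photos']
```

Showing the two lists side by side, rather than merged into one ranking, is
the separation suggested above.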

------
recursive
Is there any non-modern period in search history?

