
There was a time when search engines were a thing, and it seems they still are - CrocodileStreet
http://boston.conman.org/2018/06/30.1
======
jacquesm
To me the main way of determining search engine quality is whether or not it
can find pages that I know exist because I've read them recently (they're
still in my history) based on the terms that to me make sense for that page.

By that metric there are no good search engines at the moment and the older
the pages the worse this effect gets. It's really nice to see Google do lots
of 'moonshots' and interesting tech demos but I'd be far happier if they fixed
search and kept their focus on that.
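jacquesm's test asks whether an engine can find a page you know exists when you type the terms you'd naturally use. That is essentially known-item search, and it is easy to score. Below is a minimal sketch. It assumes a hypothetical `search(query, k)` function that returns ranked URLs. The toy index and history pairs are placeholders, not real data:

```python
from typing import Callable, List, Sequence, Tuple

# A search backend: (query, k) -> ranked list of result URLs.
SearchFn = Callable[[str, int], Sequence[str]]


def known_item_success_rate(
    search: SearchFn,
    history: Sequence[Tuple[str, str]],
    k: int = 10,
) -> float:
    """Fraction of (query, known_url) pairs whose URL shows up in the top-k results."""
    if not history:
        return 0.0
    hits = sum(1 for query, url in history if url in list(search(query, k))[:k])
    return hits / len(history)


# Toy in-memory "engine" standing in for a real one (hypothetical data).
_toy_index = {
    "vintage personal homepage": ["https://example.com/homepage"],
    "old forum post about modems": ["https://example.com/unrelated"],
}


def toy_search(query: str, k: int) -> List[str]:
    return _toy_index.get(query, [])[:k]


history = [
    ("vintage personal homepage", "https://example.com/homepage"),
    ("old forum post about modems", "https://example.org/modem-post"),
]
print(known_item_success_rate(toy_search, history))  # 0.5
```

To use it for real, point `search` at an actual engine and fill `history` from pages in your browser history.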

If a page doesn't show up in either Google or Bing for sensible queries, then
that page effectively ceases to exist. The perverse incentive these companies
have to _avoid_ sending you to a page with relevant results, as long as you
spend more time on pages carrying their advertising, ensures that more and
more content will end up missing in action.

I'd be happy to pay for a search engine that:

- actually _really_ works

- also allows you to search past page 10

- has a working API with reasonable limits

~~~
InclinedPlane
Yup. Google has been pretty awful for the past, oh, 5-10 years or so. They
used to be a high quality search engine, a way to index the depth of material
on the internet. Now they are just a semantic front-end for the most popular
content on the internet. Do you want to find something from Wikipedia,
YouTube, Medium, the New York Times, Amazon, etc.? Google does great. Do you
want to search for something that thousands of other people also search for
routinely? Google does great. This role is also the easiest to monetize for
Google (through promoted links). But if you want to search for something
highly technical or very specific, Google is now terrible; in fact it's worse
than it was 10 years ago.

~~~
wruza
I attribute this to the mass of mobile/social users who changed the search
market. Most people in these groups search for naturally popular things like
Saylor Twift [legs] or whatever is trending right now. I wish we had unpopular
search engines that suck at pop.

I also miss directories: while never fully complete, they gave a good overview
of technology sections and many more that you can’t just google, since you
don’t know they exist. Before the internet my family had a big encyclopedia
collection that I, as a kid, occasionally opened, skimmed, and read about
something new in. That isn’t possible with Wikipedia and the internet, which
are now overwhelming and have no good place to start anymore. Our attention is
so narrow relative to the amount of information that it became a product.

Also, I miss the days when you had to investigate a topic, make yourself
fluent in it, and enter ‘the club’ of highly interested people. Now anyone can
google some shallow pop info on anything and pretend to be educated in it in
minutes. That degraded many good groups.

~~~
bryogenic
Technical topic directories are alive and well!

[https://github.com/sindresorhus/awesome](https://github.com/sindresorhus/awesome)

------
reality_czech
Yahoo hasn't had its own search engine for years. In 2010, they became
essentially a frontend for Bing. In a later 2015 deal they switched the
backend to using Google.

DuckDuckGo is a metasearch engine, technically, but mostly it delegates to
Bing.

As far as I can tell, there are only two and a half real search engines that
still exist: Bing, Google, and Wolfram Alpha. (I count Alpha as a half because
it's not really what most people are looking for.) I'm curious if anyone else
knows of other real search engines still in existence.

~~~
PhasmaFelis
Did Yahoo ever have its own search engine, technically? In the early web it
was a directory maintained by humans, which made sense at a time when the
total number of pages in existence on any given subject was no more than a few
hundred; I thought that when that era passed they went straight into licensing
other search engines' results.

~~~
greglindahl
The sequence was: Yahoo Directory -> Yahoo showing Google results -> Yahoo
builds their own search engine -> Yahoo uses Bing

~~~
anothergoogler
Yahoo search was provided by AltaVista for some time as well.

~~~
greglindahl
Yes, before they used Google. It's a pretty interesting story, actually, how
Yahoo felt that they should use the best underlying search engine with a
"white label" approach, and how Google succeeded in eventually building a very
strong brand despite being invisible.

------
swsieber
I'd love a search engine that doesn't search content, but instead (edit:
primarily) searches for characteristics. For example:

- origination date of the website

- number of theme changes

- whether ads are present

- estimated data usage / load time

- general website size

- type of website - personal, company, etc.

- last update time

- presence of javascript

- browser compatibility (well, feature usage detection & browser
compatibility inference from that).
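A characteristics-first search like this comes down to faceted filtering over per-site metadata. The rough sketch below assumes a crawler has already recorded the hypothetical `SiteProfile` fields. Detecting things like theme changes or browser compatibility is the hard part and isn't shown:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SiteProfile:
    # Hypothetical per-site metadata a crawler could record.
    url: str
    first_seen: date
    last_updated: date
    has_ads: bool
    uses_javascript: bool
    page_weight_kb: int
    site_type: str  # e.g. "personal" or "company"


def filter_sites(
    sites: List[SiteProfile],
    first_seen_before: Optional[date] = None,
    has_ads: Optional[bool] = None,
    uses_javascript: Optional[bool] = None,
    max_page_weight_kb: Optional[int] = None,
    site_type: Optional[str] = None,
) -> List[SiteProfile]:
    """Return sites matching every criterion given; None means "don't care"."""

    def matches(s: SiteProfile) -> bool:
        if first_seen_before is not None and s.first_seen >= first_seen_before:
            return False
        if has_ads is not None and s.has_ads != has_ads:
            return False
        if uses_javascript is not None and s.uses_javascript != uses_javascript:
            return False
        if max_page_weight_kb is not None and s.page_weight_kb > max_page_weight_kb:
            return False
        if site_type is not None and s.site_type != site_type:
            return False
        return True

    # Most recently updated first, so old sites that are still maintained rise to the top.
    return sorted((s for s in sites if matches(s)), key=lambda s: s.last_updated, reverse=True)


sites = [
    SiteProfile("https://example.com/~alice", date(2001, 5, 1), date(2017, 3, 2), False, False, 40, "personal"),
    SiteProfile("https://example.com/shop", date(2012, 1, 9), date(2018, 6, 1), True, True, 3500, "company"),
]
old_quiet = filter_sites(sites, first_seen_before=date(2005, 1, 1), has_ads=False, uses_javascript=False)
print([s.url for s in old_quiet])  # ['https://example.com/~alice']
```

Topic search ("content on top of this") would be an extra text-match step applied before or after this filter.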

Mostly this is just me missing the websites of the early 2000's, and trying to
figure out a way to rediscover them.

And I'd probably want content on top of this. (Edit: e.g. search by topics)

Lastly, it'd be nice to restrict things to sub-genres, but I'm not sure. E.g.
when I'm doing a search I'd love to reference things related to
microcontrollers, so maybe I'd put in Arduino to get into the realm. Sort of
like what Google does for you without telling you (tailoring your searches by
some magic context).

A man can dream...

Edit 2: Search engines these days seem to be answer engines; I want a research
engine.

~~~
flatline
> Mostly this is just me missing the websites of the early 2000's, and trying
> to figure out a way to rediscover them.

Apparently somethingawful and ebaumsworld are still around in some form, and
Slashdot of course. The thing is, these sites have largely been replaced by
better versions of themselves. That’s resulted in a lot of centralization into
a few sites like reddit, which is a combination aggregator and blogging
platform for people who are too embarrassed to attach their real name to what
they write, which is apparently a good share of the population. Then there are
sites like YouTube, LiveLeak, Facebook, that just offer something that no one
could or did in the 2000s. And with mobile and apps, there’s a level of
engagement that doesn’t leave much room for a thousand little sites with
quirky, regular, custom content.

~~~
swsieber
I'm not talking about the large sites, though; I'm talking about the small
sites. Perhaps I should have said the feel, not the sites, of the early 2000's.

Right now Google thinks it knows what you want and when you search for things,
it returns the same few sites (mostly). You used to come across people's
personal sites into which they poured their soul. And while those exist less
frequently now, I bet they still exist.

------
throwaway2016a
I don't know if this is still the case but when I worked for Lycos 10 or 11
years back they owned Hotbot so including both in the list is a bit redundant.

They also -- despite being one of the first ever search engines -- didn't do
their own search in 2008. They outsourced to Yahoo. Though there was an effort
at the time to become a search engine again. I don't know if anything came of
it.

Edit:

It's hilarious they labeled Lycos as...

> Lycos—is still around!

Because even at the time I worked there, the number one response I got from
people when I told them was "They still exist?"

~~~
jccalhoun
What was it like to work there? I know it was a decade ago but I am still
curious what it would be like to work at these mostly forgotten companies that
still manage to exist.

~~~
throwaway2016a
Maybe an AMA for another time :)

But the short summary: I loved it. It was a really fun company with a lot of
great people, and we got to launch some really great products. Most of those
people were let go during the Great Recession (myself included), but it was
fun while it lasted.

On the flip side, every time we launched a product the news media treated it
as a novelty instead of a serious thing. Which was insanely frustrating. Some
of our tech was way ahead of its time.

------
tgb
So why aren't there more search engines these days? Google is great but we
constantly talk here about how it's losing its edge for certain kinds of more
specific searches like technical ones. So it seems like there's room for engines
that are more tuned for special use cases and the ability to index web pages
has only gotten cheaper since Google started. Has the size of the web made
this impractical? Or do I just not know about these options?

~~~
SerLava
The problem is that Google and Bing really are duopolistic.

You'd think that a monopoly could just break instantly if all it required was
typing in a different URL.

But modern search engines are reliant on machine learning on mind-bogglingly
enormous troves of real human interaction data.

If you truly outsmart Google by inventing a better mousetrap, it's worth fuck-
all. Your solution will probably require more usage data than you will ever be
able to collect, because nobody will use your search engine while it still
produces poor results.

~~~
greglindahl
This is the conventional wisdom. On the plus side, it means that those of us
who want to build innovative search engines don't have much competition!

~~~
SerLava
Well, ML on 100 billion searches is probably going to more closely approximate
user intent than a living brain in a tank, because that brain doesn't know
what the hell "lkw attachment" means.

Looks like German-English bilingual logistics professionals are looking for
truck parts vendors, while teenage Americans are looking for hidden locations
of laser focusing and enhancement devices within the video game Wolfenstein
The New Order.

Do you envision a second route to that answer?

~~~
greglindahl
Well, the interface offered by blekko's Izik tablet search engine was that it
would show 2 categories in the answer, one related to automotive and one
related to games.

~~~
wutbrodo
Google has done this for a while, but at the level of recognized entities, not
at the level of individual queries. Is that how the engine you're referring to
worked too? Eg, would the "lkw attachment" example above have partitioned
results?

~~~
greglindahl
In Izik's tablet interface, each category was a separate row of results. So in
this example, there would be 2 rows, one for automotive parts and one for
games, and if you scroll horizontally in a row you get more results in that
category.

I think that's what you meant by partitioned results.

Google computes this internally but I've never seen them use it for anything
other than having diversity in their top 10 results.
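The row-per-category layout described for Izik can be sketched as a grouping step over a result list that is already ranked and labeled. The thread doesn't say how Izik assigned categories, so the sketch assumes the labels come from some classifier. The titles are made up:

```python
from typing import Dict, List, Tuple


def group_into_rows(
    ranked_results: List[Tuple[str, str]], max_per_row: int = 10
) -> Dict[str, List[str]]:
    """Split a ranked list of (category, title) pairs into one row per category.

    Rows are ordered by where each category first appears in the ranking, and
    each row keeps the original rank order (the horizontally scrollable part).
    """
    rows: Dict[str, List[str]] = {}  # dicts preserve insertion order (Python 3.7+)
    for category, title in ranked_results:
        row = rows.setdefault(category, [])
        if len(row) < max_per_row:
            row.append(title)
    return rows


ranked = [
    ("automotive", "truck-parts page A"),
    ("games", "game walkthrough page B"),
    ("automotive", "truck-parts page C"),
]
print(group_into_rows(ranked))
# {'automotive': ['truck-parts page A', 'truck-parts page C'], 'games': ['game walkthrough page B']}
```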

~~~
wutbrodo
Yea it's explicitly separated, just not shown at the same time. For example,
if you search for "kings", there will be a couple of bubbles at the top of the
page with different entities: "Kings" (2017 film), "Sacramento Kings"
(basketball team), etc. Clicking on one of those will show you a list of results
that only pertains to that entity. This feature has been around for years, and
is part of the series of "things, not strings" features they've been working
on.

As I said, Google is pretty conservative about this and other entity-based
features, so they definitely wouldn't do it for something like "lkw
attachment". My question was whether Izik triggered this feature in such cases
or not.

~~~
greglindahl
That Google feature is like "related queries", that kind of feature has been
around for more than a decade. If you click on the "Kings (2017 Film)" link it
runs a search for [Kings 2017]... which just adds 2017 as a keyword to a
conventional search. No semantic search is involved.

Izik would show you film-related website results for the film category.

------
scruffyherder
If only things like desktop search would become more popular; I prefer the
distributed and fragmented model.

As always, I can shill for my utzoo early Usenet search, which combines
AltaVista desktop + a few hacks to make a specialized search:

[http://altavista.superglobalmegacorp.com/altavista](http://altavista.superglobalmegacorp.com/altavista)

I know it's niche, but it's great for anything historical on the internet
from 1981 to early 1991.
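Here is what a small local (desktop-style) search does underneath, in its simplest form: a minimal inverted index with AND queries. This is a generic illustration, not how AltaVista's desktop product or the utzoo search is built. The sample articles are invented:

```python
import re
from typing import Dict, List, Set


class InvertedIndex:
    """Tiny in-memory inverted index supporting AND queries over plain text."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = {}  # term -> ids of docs containing it
        self.docs: List[str] = []

    @staticmethod
    def tokenize(text: str) -> List[str]:
        # Lowercase and split on anything that isn't a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in self.tokenize(text):
            self.postings.setdefault(term, set()).add(doc_id)
        return doc_id

    def search(self, query: str) -> List[int]:
        """Ids of documents containing every query term, in ascending order."""
        terms = self.tokenize(query)
        if not terms:
            return []
        matching = [self.postings.get(t, set()) for t in terms]
        return sorted(set.intersection(*matching))


idx = InvertedIndex()
idx.add("net.unix-wizards: question about uucp setup")
idx.add("net.games: rogue strategy question")
print(idx.search("uucp question"))  # [0]
print(idx.search("question"))  # [0, 1]
```

Everything stays on your own disk, which is the appeal of the distributed, fragmented model.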

------
ctrlp
"What is it with these nearly twenty year old sites still up?" Not sure but I
believe some of the answer lies with adtech distribution needs. The search ads
demand traffic, however astro-turf it be.

------
bakztfuture
Search is alive and well. I'd recommend reading some of the latest textbooks
and research papers on information retrieval. The industry was given new life
about 5 years back with knowledge graphs and has been reborn again with recent
innovations in machine learning, cloud computing, and data mining
technologies.

I'm working on a project now which has indexed billions of pages and answers
queries similar to a web search engine like Google:
[https://www.AtSign.co/](https://www.AtSign.co/)

The only difference is that it's a keyword + location based business contact
information engine but operates on the same principles as a real web search
engine client.

We're a small team, and it would have been unthinkable even a few years back to
launch something of this scale effectively... But here we are! Amazing space
to be in right now.

~~~
greglindahl
People search has been around for a long time, aren't there a bunch of current
competitors in the "business contacts" space?

~~~
bakztfuture
Many assume you know the name or website domain of the company beforehand.

Also, ours is keyword based. We index the site similar to how Google does. So
you can get very specific company matches and then export to CSV.
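In its simplest form, a keyword + location lookup with CSV export is a filter over crawled company records. This sketch is not AtSign's implementation. The `Business` record, its fields, and the sample data are hypothetical. It also uses plain substring matching where a real engine would use an index and ranking:

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class Business:
    # Hypothetical record; a real index would store much more.
    name: str
    city: str
    website: str
    page_text: str  # text crawled from the company's own site


def search_businesses(records: Iterable[Business], keywords: List[str], city: str) -> List[Business]:
    """Businesses in `city` whose crawled text contains every keyword (case-insensitive substring match)."""
    kws = [k.lower() for k in keywords]
    return [
        r
        for r in records
        if r.city.lower() == city.lower() and all(k in r.page_text.lower() for k in kws)
    ]


def to_csv(records: Iterable[Business]) -> str:
    """Export matches as CSV, leaving out the bulky crawled text."""
    buf = io.StringIO()
    # extrasaction="ignore" drops dict keys (here, page_text) that aren't in fieldnames.
    writer = csv.DictWriter(buf, fieldnames=["name", "city", "website"], extrasaction="ignore")
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


records = [
    Business("Acme Welding", "Boston", "https://example.com/acme", "Custom steel welding and fabrication."),
    Business("Beta Bakery", "Boston", "https://example.com/beta", "Fresh bread daily."),
]
print(to_csv(search_businesses(records, ["welding"], "boston")))
```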

------
ChuckMcM
All of those are 'meta' search engines. There are three English language
indexes of any size available, Google, Bing, and Yandex. _All_ of the other
search engines go to one of those three for most if not all of their queries.
Some of the bespoke engines have local indexes of things like Stack Overflow
or Wikipedia (both fairly easy to index) to save on cost, but all the others
use the big three (mostly the big two, because Yandex pulled their servers out
of Nevada, which added 300-800 ms of latency to their searches).
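The cost-saving split described here answers what it can from a cheap local index of something like Wikipedia and pays the big index for the rest. It could be routed roughly like this. Both engines are stubs, `min_local_hits` is an arbitrary threshold, and no real Bing, Google, or Yandex API is called:

```python
from typing import Callable, List, Sequence, Tuple

Engine = Callable[[str], List[str]]


def metasearch(
    query: str,
    local_indexes: Sequence[Engine],
    paid_backend: Engine,
    min_local_hits: int = 3,
) -> Tuple[List[str], str]:
    """Answer from cheap local indexes when they return enough hits; otherwise pay for a backend call.

    Returns the results plus which source produced them.
    """
    local_hits: List[str] = []
    for index in local_indexes:
        local_hits.extend(index(query))
    if len(local_hits) >= min_local_hits:
        return local_hits, "local"
    return paid_backend(query), "backend"


# Stubs standing in for a local Wikipedia-style index and a licensed web index.
def local_wiki(query: str) -> List[str]:
    return [f"wiki:{query}:{i}" for i in range(3)] if "python" in query else []


def paid_web(query: str) -> List[str]:
    return [f"web:{query}"]


print(metasearch("python generators", [local_wiki], paid_web)[1])  # local
print(metasearch("lycos history", [local_wiki], paid_web)[1])  # backend
```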

Most of these used BOSS (aka Yahoo!'s old Build your Own Search Service API),
which used Bing as its index, although Google has started paying more and more
people to send their search traffic to Google.

Bing charges $7/thousand [1] for their "quality" searches and $3/thousand for
their so-so searches (not as current, and the index doesn't go as deep; this is
the tier BOSS used to call until it was turned off in 2016).
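With per-thousand pricing, API spend scales linearly with query volume. Here is a back-of-the-envelope helper using the $7 and $3 per-thousand figures quoted above. Those prices date from this 2018 comment, and the 1M queries/day volume is an arbitrary example:

```python
def monthly_api_cost(queries_per_day: float, usd_per_thousand: float, days: int = 30) -> float:
    """USD spent if every query goes to the paid search API (no caching)."""
    return queries_per_day * days * usd_per_thousand / 1000.0


for tier, price in [("quality", 7.0), ("so-so", 3.0)]:
    print(tier, monthly_api_cost(1_000_000, price))
# quality 210000.0
# so-so 90000.0
```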

That $7/thousand lets you send them up to 250 queries per second. For
reference, that is about 1-5M uniques per day. It looks like 21M searches a day
(250 × 86,400 seconds), but for English most of the searches come during the
day from Europe and the US, so you're really only going to do 10-15M searches
per day at that rate.

If you are clever you can cache results, so for a repeated search you can just
reuse the cache rather than paying for another result. This is nominally
frowned upon but hard to defend against. If you manage to make a deal with a
phone supplier to be the 'standard' search engine, a lot of queries will just
be 'facebook' or 'reddit', so you don't really need to query those at all.

You will want to find some ad networks to provide you ads. Bing will do that
too, but you will quickly figure out that if you could make money reselling
Bing results with Bing ads, they could do that too, so you'll find the margins
pretty thin and negative at times. You'll have to pay for a machine that takes
those queries, calls out to whatever ad networks you want, fills out a results
page (SERP), and sends it back to the consumer. If you are just fronting Bing
or Yandex, that is pretty straightforward to do with an nginx server on an AWS
"large" instance.
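The 21M figure is the 250 queries/second limit multiplied out over a full day. Real traffic peaks during European and US daytime, hence the lower 10-15M estimate. The caching trick is a lookup table in front of the paid call. Below is a minimal sketch with a stubbed `fetch` function. The comment notes that caching is nominally frowned upon, so check the provider's terms before doing this for real:

```python
from collections import OrderedDict
from typing import Callable, List

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_queries_per_day(qps_limit: int) -> int:
    """Ceiling on daily queries if traffic were perfectly flat around the clock."""
    return qps_limit * SECONDS_PER_DAY


class CachedSearch:
    """LRU cache in front of a paid search call so repeated queries are only billed once.

    No expiry: a real deployment would also need results to age out.
    """

    def __init__(self, fetch: Callable[[str], List[str]], max_entries: int = 10_000) -> None:
        self.fetch = fetch
        self.max_entries = max_entries
        self.cache: "OrderedDict[str, List[str]]" = OrderedDict()
        self.paid_calls = 0

    def search(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        results = self.fetch(key)
        self.paid_calls += 1
        self.cache[key] = results
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return results


print(max_queries_per_day(250))  # 21600000

engine = CachedSearch(lambda q: [f"result for {q}"])
for q in ["facebook", "Facebook ", "reddit", "facebook"]:
    engine.search(q)
print(engine.paid_calls)  # 2
```

Navigational queries like 'facebook' could skip even the cache and go straight to a fixed redirect.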

If you negotiate well and market well you can be a dogpile or a startpage with
some schtick that makes you different than just going to Google or Bing. The
more privacy you afford the clients the more margin you give up (because you
can't sell that information as well).

Bottom line is that it's a hard way to make a living.

[1] [https://azure.microsoft.com/en-us/pricing/details/cognitive-services/search-api/web/](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/search-api/web/)

------
wildpeaks
I wish Google would let us use _both_ Verbatim (use all the keywords I entered
as-is, instead of assuming it knows better than I do what I meant to search
for) and filter by date (to get the most recent results first), because right
now you have to choose between relevant but outdated results and irrelevant
but recent results, both of which are frustrating.
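Combining the two wishes amounts to strict term matching followed by a date sort. Google doesn't document how Verbatim works internally. This sketch therefore treats "verbatim" as exact whole-word matching, with no stemming and no synonyms. The `Doc` records are made up:

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Doc:
    url: str
    published: date
    text: str


def _words(text: str) -> List[str]:
    # \w+ keeps underscores, so identifiers like proxy_cache stay one word.
    return re.findall(r"\w+", text.lower())


def verbatim_newest_first(docs: List[Doc], query: str) -> List[Doc]:
    """Keep docs containing every query word exactly (no stemming, no synonyms), newest first."""
    terms = set(_words(query))
    matches = [d for d in docs if terms <= set(_words(d.text))]
    return sorted(matches, key=lambda d: d.published, reverse=True)


docs = [
    Doc("https://example.com/a", date(2012, 4, 1), "Configuring nginx proxy_cache for upstreams"),
    Doc("https://example.com/b", date(2018, 6, 20), "nginx proxy_cache tuning notes"),
    Doc("https://example.com/c", date(2018, 6, 29), "nginx caching tips"),  # "caching" != "proxy_cache"
]
print([d.url for d in verbatim_newest_first(docs, "nginx proxy_cache")])
# ['https://example.com/b', 'https://example.com/a']
```

The newest page (c) is left out because it never uses the exact term. That is the trade-off the comment is asking to control.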

------
known
The world’s most valuable resource is no longer oil, but data
[https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data](https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data)

------
chrisseaton
What on earth is this title supposed to mean?

~~~
Arnavion
I think the title is supposed to be "metasearch engines", since that's what
the article is about.

~~~
dang
Ok, we'll use that. Thanks.

~~~
tgb
Seems like that's the wrong title though: the article is showing that search
engines (other than Google, Bing, Yahoo) are still a thing. Maybe "alternative
search engines are still a thing"?

~~~
spc476
Author here. Back in the mid 90s, search engines were a thing, and you had
many companies trying to provide search results in the emerging web. In
1996-97, it was the _in_ thing to run a search engine.

And then Google happened ...

~~~
dang
OK, we'll change the title back.

------
NeedMoreTea
Talk of metasearch engines and not mention Dogpile? They have been going since
'95 and they're still there. Doubt they get much traffic these days.

~~~
jmts
Dogpile is listed (though not by name) under the Webcrawler entry.

