
Shirt Without Stripes - elsamuko
https://github.com/elsamuko/Shirt-without-Stripes
======
DenisM
This problem is known as "attribution" \- you have a "no" or "without" in the
sentence, but you don't know where it belongs. One could (and one does) argue
that the problem cannot be solved with statistical methods (ML), especially
not in any domain where accuracy is required, such as medical record
analysis: "no evidence of cancer" and "evidence of no cancer" are very
different things.
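To make the "no evidence of cancer" / "evidence of no cancer" point concrete: a pure bag-of-words representation cannot tell the two phrases apart, because they contain exactly the same words. The Python sketch below uses plain whitespace tokenization (a simplifying assumption) to show that. It also shows that adjacent-word bigrams at least record what the "no" sits next to. Bigrams capture only adjacency, not the true scope of the negation, so this illustrates the problem rather than solving it.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

def bigrams(text: str) -> set[tuple[str, str]]:
    """Adjacent word pairs, which keep a little local word order."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))

a = "no evidence of cancer"
b = "evidence of no cancer"

# Same multiset of words, so a bag-of-words model sees the phrases as identical.
print(bag_of_words(a) == bag_of_words(b))
# Bigrams differ: ("no", "evidence") vs ("no", "cancer") records what "no" attaches to.
print(bigrams(a) == bigrams(b))
```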

Zooming out, the language field breaks into several subfields:

\- A large group of Chomsky followers in academia are all about logical rules
but very little in the way of algorithmic applicability, or even interest in
such.

\- A large and well-funded group of ML practitioners, with a lot of
algorithmic applicability but an arguably very shallow model of the language
that fails in cases like attribution. Neural networks might yet show improvement,
but apparently didn't in this case.

\- A small and poorly funded group of "comp ling", attempting to create
formalisms (e.g. HPSG) that are still machine-verifiable, and even generative.
My girlfriend is doing a PhD in this area, in particular dealing with modeling
WH questions, so I get some glimpse into it; it's a pity the field is not
seeing more interest (and funding).

~~~
smoyer
While this is indeed an example of the attribution problem, I'd argue that
this particular query will never be solved. I don't search for a "shirt
without stripes", I search for a "solid <insert color here> shirt", or a
"<insert color here> hawai'ian shirt".

I'd be curious to see how many sentences with attribution problems actually
have other structural issues. If I want to write clearly and without
ambiguity, I rewrite sentences that have these problems. Why wouldn't I do the
same for search queries?

~~~
derivativethrow
Consider the query, "non-glass skyscrapers", which suffers from the same
problem.

What do you call a skyscraper like that if you want to refer to it? They
exist, but you can't find them using that search term on Google.

~~~
js2
I'd call it windowless:

[https://www.google.com/search?q=windowless+skyscraper&tbm=is...](https://www.google.com/search?q=windowless+skyscraper&tbm=isch)

~~~
DenisM
You missed the Seattle Tower Building. It has windows, but very little in the
way of visible glass.

[https://www.emporis.com/buildings/119453/seattle-tower-seatt...](https://www.emporis.com/buildings/119453/seattle-tower-seattle-wa-usa)

Windowless is a superset of glassless.

~~~
bscphil
To be clear (for the benefit of anyone else who reads this), the _windows_ in
the Seattle Tower are made of glass, but the exterior of the building is not
that modern all-glass-and-steel look[1]. This is a third interpretation of
non-glass I hadn't thought of and it took me a minute to figure it out.

"non-glass skyscraper":

1\. No glass used in the exterior construction at all -> implying no windows

2\. No glass used in the exterior construction at all -> implying the windows
are made out of something other than glass

3\. A skyscraper in which glass is not a prominent architectural feature, but
the building does contain features like windows and doors that contain glass.
(This comment)

[1]
[https://en.wikipedia.org/wiki/Wilshire_Grand_Center](https://en.wikipedia.org/wiki/Wilshire_Grand_Center)

------
rgovostes
The point that the author is making, in a very understated way, is that all
three companies have PR websites that breathlessly describe their advanced AI
capabilities, yet they cannot understand a very simple query that young
children can.

~~~
Alex3917
At least this is relatively innocuous. Until recently if you did a Google
Image Search for "person" or "people", it only showed white men.

~~~
bartread
I couldn't quite believe your comment when I read it so I did a Google image
search for "person" and the results weren't a lot better than you'd suggested.
Mostly white men, a few white women, a _very_ few black women, a handful of
Asians, and multiple instances of Terry Crews.

The net result of that Google search, combined with the "Shirt Without
Stripes" repo, leaves me even more unimpressed with the capabilities of our AI
overlords.

~~~
jedberg
If you really want to be disappointed, search for [doctor] and [nurse].

Unless things have really changed, [doctor] will be mostly white men and
[nurse] will be mostly white and Filipino women.

But don't blame the AI. The AI has no morality. It simply reflects and
amplifies the morality of the data it was given.

And in this case the data is the entirety of human knowledge that Google knows
about.

So really you can't blame anyone but society for having such deeply engrained
biases.

The question to ask is: does the programmer of the AI have a moral obligation
to change the answer, and if so, guided by whose morality?

~~~
encom
What bias? Who is biased? Quick duckduckgoing indicates there are far more
male than female doctors in the US. So statistically, it would be correct to
return mostly male doctors in an image search. If you want a photo of a
specifically gendered doctor, it's not hard to specify. Not really seeing a
problem here.

~~~
jedberg
> What bias? Who is biased?

I would contend that society is biased. There is no evidence that says men are
better doctors than women, and in fact what little research exists on this says
that women make better doctors than men (and this is reflected in the more
recent med school graduating classes, which are majority women).

So it's a question of what you are asking for when you search for [doctor].
Are you asking for a statistical sampling or are you asking for a set of
exemplars?

> So statistically, it would be correct to return mostly male doctors in an
> image search.

And that's exactly it. The AI has no morality. It's doing exactly what it
should, and is amplifying our existing biases.

------
seiferteric
I have noticed in the past few years Google results have become noticeably
worse for similar reasons. Google used to _surprise_ me with how well it was
able to understand what I was really looking for, even when I put in vague
terms. I remember being shocked on several occasions when putting in half
remembered sentences, lyrics, expressions from something I had heard years ago
and it being the first! result. I almost never have this experience anymore.
Instead it seems to almost always return the "dumb" result, i.e. the things I
was not looking for, even trying to avoid using clever search terms. It's
almost like it is only doing basic word matching or something now. Also,
usually the first page is all blogspam SEO garbage now.

~~~
nikanj
Your search for "skiing Norway" mostly returns results for skiing in the
French Alps, because those pages have much higher visit rates.

Google is a dumbass nowadays, and regularly ignores half your search terms to
present you with absolutely irrelevant results, that have gotten lots of
visits in the past.

~~~
holler
I've noticed this too, and frequently wonder why there aren't new and better
search startups...

~~~
jedberg
There are, like DuckDuckGo. But the first complaint is usually "their results
aren't as good as Google" and that's because Google in reality still gives
better results because of their (lack of) privacy.

People want better results but don't want to be tracked, and those things are
in opposition to each other.

~~~
pbhjpbhj
It took me something like 6 years, but I've gone over to DDG. Their results
were poorer than Google's for me, so when I tried to switch I used to end up
repeating and adding !g to every search. I don't think DDG got better, but
Google results are bad enough that I think they're equal in quality now (for
me). I don't login to Google, have tracking disabled, use uBlock and pihole;
FF/Brave.

~~~
jedberg
> I don't login to Google, have tracking disabled, use uBlock and pihole;
> FF/Brave.

I mean, that's probably why they are equivalent for you. You've chosen privacy
over better results (which is a totally legit choice to make!).

~~~
pbhjpbhj
Well it's hard to tell objectively but it seems to have got worse without me
changing the privacy settings. I guess that's their quid pro quo though.

~~~
holler
I concur: the UI/UX has gone downhill and the results themselves feel
less reliable.

Have you tried viewing pages past the first page? Oftentimes they're just filled
with what looks like foreign hacker scam websites.

------
transreal
Searching "men without pants" versus "men with pants" gives much better
results.

This is a case where, while it makes sense to say the sentence, it's not a
common use of language, and at the end of the day, the search engine will find
what's written down; it's not a natural language processor yet (despite any
marketing).

Shirt stores don't advertise "Shirts without stripes - 20% off", they describe
them as "Solid shirts" or "Plain shirts". Men's fashion blogs talk about
picking "solid shirts" or "plain shirts" for a particular look. If I walked
into a clothing store and asked for "shirts without stripes", the sales person
would most likely laugh and say "er, you mean you want plain shirts?".

Plain shirts/solid shirts are the most common way to refer to these, and
people seem to be searching this way:

[https://trends.google.com/trends/explore?date=all&q=solid%20...](https://trends.google.com/trends/explore?date=all&q=solid%20shirts,plain%20shirts,shirts%20without%20stripes)

Regarding moving towards natural language processing - the "without" part is
not as important as knowing the context.

My kids will ask me to get from the bakery things like "the round bread with a
hole and seeds", which I know means "sesame bagel", or "the sticky bread",
which means "cinnamon twists" \- which I understand because I know the
context. Sometimes they say "I want the red thingy", and I need to ask a bunch
of questions to eventually get at what they want (sometimes it's a red
sweater, sometimes it's cranberry juice).

Unless Google starts asking questions back, I don't think there is any way it
can give you what you want right away.

~~~
pbhjpbhj
Thankfully, "men without pants" shows me exclusively men wearing pants,
underpants that is, as I'm in the UK.

Searching "pants" only shows me "trousers", that's a big fail for Google IMO,
I'm accessing google.co.uk.

------
wkyle
Vaguely similar to a joke from Ninotchka that Zizek often uses about the
difference between 'coffee without cream' and 'coffee without milk'. He
usually uses it to reference the concept of negation in the Hegelian
dialectic, but he's also mentioned the difficulty of computers understanding
negation in the context of the coffee/cream example.

The joke from Zizek:
[https://www.youtube.com/watch?v=wmJVsaxoQSw](https://www.youtube.com/watch?v=wmJVsaxoQSw)

------
albertzeyer
Why should it not be possible to solve this with statistical methods? The
model just needs to be able to understand the important meaning of "no" in
here, in the context of the whole sentence. I would guess that most modern NNs
from the NLP area (Transformer or LSTM) would be able to correctly
differentiate the meaning. The problem is, I think there is no fancy NN (yet)
behind Google search, and the other web searches.

To extend on that, you can think of the human brain as just another (powerful)
statistical model.

~~~
animalCrax0rz
"there is no evidence of cancer" and "there is evidence of no cancer" are two
different statements with different meaning, so it's more complex a task than
just understanding the importance of "no" in a sentence. It involves
semantic analysis of the sentence. The paper I linked to below describes a
technique they call "deep parsing." Check it out for more context.

~~~
londt8
I tried to search for "cheese without holes" on Google and it yielded good
results. I think the problem here is that the query is something people would
rarely search.

~~~
jjnoakes
I just searched google images for "cheese" and "cheese without holes" and I
got roughly the same results (about 1/3 of the images had holes in both
cases).

~~~
Miraste
"pictures" and "pictures without color" show that it does get some of these,
although not the way I expected.

------
caust1c
My favorite was "What do vegetarians eat" which was broken for years:
[https://twitter.com/Caust1c/status/855193855422943234](https://twitter.com/Caust1c/status/855193855422943234)

~~~
gumby
I'm significantly more concerned about what _human_itarians eat, especially
in today's economy.

------
GuB-42
Fun experiment on Google:

\- Shirt Without Stripes: shirts where the description contains both "without"
and "stripes". Example: a shirt without collar, with stripes.

\- "Shirt Without Stripes": a mess, with and without stripes, suggesting an
unusual search query. In fact, the linked article site is the first result in
web search.

\- Stripeless shirt: sexy women in _strap_less shirts

\- "stripeless shirt": pictures of Invader Zim...

\- "stripeless" shirt: mostly shirts without stripes, but there are some
shirts with stripes that are described as stripeless...

The last one may give us a hint at the problem. If you have to mention a shirt
is without stripes, you are probably comparing it to a shirt _with_ stripes.
For example imagine a forum, some guy is posting a picture of a shirt with
stripes, I can expect some people to ask questions like "do they sell this
shirt without stripes"? Or maybe the seller himself may have something like
"shirt without stripes available here (link)" in the description. So the
search engines tie "shirt without stripes" to pictures of shirts with stripes.

I remember an incident where searching for "jew" on Google led to antisemitic
websites. The reason was simply that that exact word was rarely used in other
contexts. Mainstream and Jewish sources tend to use the words "jews" and
"jewish" but not "jew". And because Google doesn't look at the dictionary
meanings of words but rather what people use them for, you get issues like
that.

~~~
knodi123
> The reason was simply that that exact word was rarely used in other contexts

I had a similar problem when I was trying to convince a friend that homeopathy
was a complete and utter fraud with absolutely no basis in reality. She was
convinced that the internet's overwhelming consensus was that homeopathy was
valuable and regular doctors were control-freaks who make things up when they
don't know the answers.

To prove her point, she did an internet search for allopathic medicine and
showed me how the majority of the results were negative.

[https://en.wikipedia.org/wiki/Allopathic_medicine](https://en.wikipedia.org/wiki/Allopathic_medicine)

Just a humorous anecdote, not trying to start any conversations about the
relative value of different medical paradigms.

------
bentona
To me, the most interesting implication here is that this must not adversely
affect Google's ad revenue. If it did, they would surely fix it. This, in
turn, means that apparently we have been trained to interface with search
engines such that this is not a problem.

Sometimes I wonder how much my brain has changed to use search engines / how
much of it is dedicated to effective googling. Makes me feel like a cyborg.

~~~
Saaster
That sounds like an ad business version of the efficient market theory. E.g.,
that can't possibly be a hundred dollar bill on the ground, because if it was
someone else would surely already have picked it up by now.

I think you're overestimating Google's sophistication.

~~~
jjeaff
Exactly. And how would Google know whether this would improve ad revenue?
They have never created and tested it.

------
captainmuon
This is something that has annoyed me since the Altavista times. I want to
search for "madonna but not the singer", and find pictures of the holy icon. I
can do "madonna -singer", but that fails if the page mentions the word
"singer" a single time. Even if it is "This is a page about madonna statues,
but not about the famous singer."
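A minimal sketch of the keyword semantics being described, assuming a query is reduced to a set of required terms and a set of excluded terms (the `matches` helper and the sample pages are hypothetical). A single mention of the excluded word anywhere on a page is enough to drop it, even when the page explicitly says it is not about the singer:

```python
def matches(doc: str, include: set[str], exclude: set[str]) -> bool:
    """Keyword semantics: every include term present, no exclude term anywhere."""
    words = set(doc.lower().replace(",", " ").replace(".", " ").split())
    return include <= words and not (exclude & words)

# Hypothetical pages.
docs = {
    "statues": "This is a page about madonna statues, but not about the famous singer.",
    "icon": "A painted madonna icon in a Catholic church.",
    "concert": "Madonna the singer announces a new tour.",
}

# Equivalent of the query "madonna -singer".
hits = [name for name, text in docs.items() if matches(text, {"madonna"}, {"singer"})]
print(hits)  # the statue page is lost along with the concert page
```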

It would be great if I could add negative keywords to a website, or mark text
as "don't index" or "index with a negative weight". But probably, people would
game this in ways I can't imagine.

There is probably a clever ML solution for this, like having meaning-vectors
for distinct ideas, and pushing pages that are close to one meaning away from
the other meaning. Classification is easy if you have keywords like
"painting" and "catholic", but if it is "virgin" or "prayer" then it could be
either meaning, so there is never a bullet-proof solution.
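A toy version of the "meaning-vector" idea, assuming each page already has an embedding. The three-dimensional vectors below are made up by hand for illustration, not produced by any real model. Pages are ranked by similarity to the wanted sense minus similarity to the unwanted one, so a statue page that mentions the singer once is no longer thrown out, while an ambiguous page scores zero, which is the "never bullet-proof" case:

```python
import math

def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hand-made, hypothetical "meaning vectors"; axes: (religious art, pop music, generic).
ICON_SENSE = (1.0, 0.0, 0.2)
SINGER_SENSE = (0.0, 1.0, 0.2)

pages = {
    "madonna statue page": (0.9, 0.2, 0.3),  # mentions the singer once, in passing
    "tour announcement": (0.0, 0.95, 0.2),
    "ambiguous page": (0.5, 0.5, 0.3),       # e.g. only words like "virgin" or "prayer"
}

def score(vec, wanted=ICON_SENSE, unwanted=SINGER_SENSE) -> float:
    """Reward closeness to the wanted sense, penalize closeness to the unwanted one."""
    return cosine(vec, wanted) - cosine(vec, unwanted)

for name, vec in sorted(pages.items(), key=lambda kv: score(kv[1]), reverse=True):
    print(f"{score(vec):+.2f}  {name}")
```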

------
Tade0
My Operating Systems professor (Tomasz Jordan Kruk, PhD) in college had an
appropriate anecdote for this:

"Humans usually don't intuitively understand the word 'no'. Please imagine a
non-pink elephant."

~~~
davesque
Maybe you mean, "Please don't imagine a pink elephant." Imagining a non-pink
elephant seems pretty easy.

~~~
natefox
Along these lines, I once heard somewhere that people do not process the word
'don't'. As a coach, I've had to shift my vocabulary to focus on the 'do's
rather than the 'don'ts'.

Eg: If you're doing a sport where leaning forward is bad, avoid telling
yourself 'don't lean forward', as your mind only hears 'lean forward', thereby
reinforcing the thing you're trying to avoid. Instead, tell yourself
'lean back' or 'stay straight' or whatever you're focusing on for that
maneuver or drill.

~~~
rurp
Interestingly I've found this same approach from good coaches across
completely different niche sports. I imagine this phenomenon has been
discovered a number of times by various smart people. It certainly wasn't
intuitive to me, but since learning to use affirmative advice in real-time
sport situations, my advice got noticeably more effective.

------
cscurmudgeon
A few years back (in around 2012) I attended an NLP talk.

The theme of this talk was how they did a study that showed prepositions and
articles _do_ have meaning. A big deal was made out of the results.

I think things like this happen when people, over time, come to treat
engineering approximations such as bag of words as the truth.
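For example, with a stopword list that happens to contain "with" and "without" (the list below is hypothetical; real lists differ between libraries), a bag-of-words indexer maps "shirt without stripes" and "shirt with stripes" to the same terms, so the preposition that carries the whole meaning never reaches the ranking step:

```python
# Hypothetical stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "with", "without"}

def index_terms(text: str) -> list[str]:
    """Tokenize on whitespace and drop stopwords, as a simple indexer might."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

print(index_terms("shirt without stripes"))
print(index_terms("shirt with stripes"))  # same terms as the line above
```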

~~~
blahedo
I have a PhD in NLP (which is what we often call it on the CS side, but is
almost synonymous with CL="computational linguistics" on the
cognitive/linguistics side of the field). I remember a talk at our annual
conference, well-attended, perhaps around 2003 or so. The speaker was from one
of the labs that was really leaning into "big data", which was only just
becoming possible at that point, and argued persuasively that we should all
just throw out our parsers and formalisms—ditch the computational linguistics
side, basically—because we were on the edge of functionally infinite
(unsupervised) data, and supervised and partially supervised systems would
never ever be able to keep up. He presented performance numbers showing how the
unsupervised systems needed a lot more data to compete with the supervised
systems, but that data was available, and he threw more and more and more data
at the system and it got better and better. (I no longer remember the specific
task he was using to illustrate his point.)

There were gasps in the room and a kind of depressed acquiescence: geez, he
might be right. And the pendulum indeed swung in that direction, hard, and the
field has been overwhelmingly dominated by the statistical machine learning
folks on the CS side of the field, while the linguists kind of quietly keep
the flame alive in their corner.

But I thought then, and I still think now, that it really just was another
swing of the pendulum (which has gone back and forth a few times since the
birth of the field in the 1960s). Perhaps it's now time again for someone to
ring up the linguists and let them apply their expertise again?

------
meritt
The point of this isn't asking how to apply boolean search operators, it's
showing that the largest AI-focused companies in the world absolutely suck at
NLP.

~~~
packetlost
Why would you really apply NLP to a search engine though? Generally speaking a
weighted keyword search is good enough 95% of the time and requires
significantly fewer resources to perform.

~~~
binarymax
I work specifically in this field with clients, and deliver training on
applying NLP to search.

You’d be surprised how effective NLP is for identifying query intent,
and pulling out modifiers that should apply as metadata filters.

Weighted keyword search works a lot of the time, but it fails hard for many long-tail
queries (especially in e-commerce and other attribute heavy domains).

IMO there really isn’t a good excuse for these firms to fail at queries like
this. The query itself isn’t particularly difficult when using a decent NLP
stack and following well known practices.
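A rough sketch of "pulling out modifiers that should apply as metadata filters" for this exact query. It assumes a catalog where each item already carries a `pattern` attribute, plus a hand-written vocabulary of pattern words; both are hypothetical, and a production stack would use a trained tagger rather than regexes. Negated modifiers become exclusion filters, and the leftover words are matched as ordinary keywords:

```python
import re

# Hypothetical vocabulary: surface phrase -> catalog attribute value.
PATTERN_TERMS = {"stripes": "striped", "striped": "striped", "plaid": "plaid"}
NEGATORS = r"(?:with\s+no|without|no|non)"

def parse_query(query: str) -> tuple[list[str], set[str], set[str]]:
    """Split a query into residual keywords, required patterns, and excluded patterns."""
    text = query.lower()
    include: set[str] = set()
    exclude: set[str] = set()
    for term, value in PATTERN_TERMS.items():
        neg = re.compile(rf"\b{NEGATORS}[\s-]+{term}\b")  # "without stripes", "non-striped"
        pos = re.compile(rf"\b(?:with\s+)?{term}\b")       # "with stripes", "plaid"
        if neg.search(text):
            exclude.add(value)
            text = neg.sub(" ", text)
        elif pos.search(text):
            include.add(value)
            text = pos.sub(" ", text)
    return text.split(), include, exclude

def search(query: str, catalog: list[dict]) -> list[str]:
    keywords, include, exclude = parse_query(query)
    results = []
    for item in catalog:
        words = set(item["title"].lower().split())
        if not all(k in words for k in keywords):
            continue  # residual keywords must all appear in the title
        if include and item["pattern"] not in include:
            continue  # several required patterns are treated as OR
        if item["pattern"] in exclude:
            continue
        results.append(item["title"])
    return results

# Hypothetical catalog.
catalog = [
    {"title": "Navy oxford shirt", "pattern": "solid"},
    {"title": "Breton striped shirt", "pattern": "striped"},
    {"title": "Red plaid shirt", "pattern": "plaid"},
]

print(search("shirt without stripes", catalog))
```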

~~~
skewart
If it's technically possible then presumably it's a deliberate product choice
to not have better search results for "shirt without stripes". And that seems
entirely plausible.

Google is already by far the most widely used search engine, so they don't
really need to innovate or improve the search product very much in order to
attract and retain users. Presumably capturing more advertising spending from
the companies paying for ads is a bigger priority.

Microsoft under Satya Nadella has been all about enterprise and cloud, and I
doubt Bing is a strategic priority any more, so it's not surprising that they
wouldn't put a lot of resources into making it better.

Amazon is a little surprising. You'd think they'd have a lot to gain from
making it easier for people to find what they're looking for. But maybe less
than perfect search results are deliberate? Maybe it's like how supermarkets
put basic items in the back of the store and high-margin impulse buys in the
front - so you have to walk past chocolates and chips if you want to buy a
carton of milk.

If Amazon is deliberately nerfing search results then maybe Google would stand
to benefit from having better shopping-related results - people would get
frustrated trying to find a shirt without stripes on Amazon and just use
Google instead, letting Google profit from advertising in the process. But
maybe people selling shirts aren't willing to pay much for ads, so there isn't
much money for Google to make by getting better at finding specific types of
shirts.

I dunno if any of these conjectures are anywhere near accurate, but it's
interesting to think about.

------
ChuckMcM
I love this. It is such an easy to grasp example of what is "wrong" with
search. Historically, searching was keyword-based, so documents with "shirt"
and "stripes" would rank highly, even though none of those pages had the
keyword "without".

As humans we know immediately that the search is for documents about shirts
where stripes are _not_ present. But the term 'without' doesn't make it
through to the term compositor step, which feeds terms into a binary
relationship. We might express such a relationship as

Q = "shirt" AND NOT "stripes"
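As a sketch, that relationship can be evaluated over a tiny inverted index (token to set of document ids), where AND NOT is just a set difference of posting lists. The documents are made up; doc 3 mirrors the "shirt without collar, with stripes" example from elsewhere in this thread, and it is exactly what a plain conjunctive keyword match on all three query words returns:

```python
from collections import defaultdict

# Made-up documents for illustration.
docs = {
    1: "white shirt with stripes",
    2: "plain white shirt",
    3: "shirt without collar, with stripes",
}

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: each token maps to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().replace(",", " ").split():
            index[token].add(doc_id)
    return index

index = build_index(docs)

# Keyword match on all three query words: only doc 3, which *has* stripes.
all_terms = index["shirt"] & index["without"] & index["stripes"]
print(sorted(all_terms))

# Q = "shirt" AND NOT "stripes": a set difference of posting lists.
shirt_not_stripes = index["shirt"] - index["stripes"]
print(sorted(shirt_not_stripes))
```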

You could onebox it (the Google term for a search short-circuit path that
recognizes the query pattern and takes some specific action; calculations, for
example, are a onebox) and then you get a box of shirts with no stripes and a
bunch of query results with stripes.

You can n-gram it, by ranking the without-stripes n-gram higher than the
individual terms, but that doesn't help all that much because the English
language documents don't call them "shirts without stripes", generally they
are referred to as "plain shirts" or "solid shirts" (plain-shirt(s) and
solid-shirt(s) respectively). But you might do okay punning without-stripes =>
plain or without-stripes => solid.

From a query perspective you get better accuracy with the query "shirts
-stripes". This algorithmic query uses unary minus to indicate a term that
should not appear in the document, but it isn't very friendly to non-engineer
searchers.

Finally you can build a punning database, which is often done with
misspellings like "britney spears" (ok so I'm dating my tenure with that :-))
which takes construction terms like "without", "with", "except", "exactly" and
creates an algorithmic query that is most like the original by simple
substitution. This would map "<term> without <term>" => "<term> -<term>". The
risk there is that "doctors without borders" might not return the organization
on the first page (compare results from "doctors without borders" and "doctors
-borders", ouch!)
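The punning-database idea can be sketched as a small rewrite step that runs before retrieval. Everything here is an assumption for illustration: `PROTECTED` stands in for a list of known entities, `PUNS` for the pun table, and the generic rule only handles a single-word term after "without". The protected list is what keeps "doctors without borders" from turning into "doctors -borders":

```python
# Hypothetical entries: known names that must not be rewritten.
PROTECTED = {"doctors without borders", "engineers without borders"}
# Hypothetical pun table: negated attribute -> how documents usually phrase it.
PUNS = {"without stripes": "plain"}

def rewrite(query: str) -> list[str]:
    """Return the original query plus candidate rewrites, most literal first."""
    q = " ".join(query.lower().split())  # normalize case and whitespace
    if q in PROTECTED:
        return [q]  # never turn "doctors without borders" into "doctors -borders"
    candidates = [q]
    # Pun substitution: "shirt without stripes" -> "plain shirt".
    for phrase, replacement in PUNS.items():
        if phrase in q:
            candidates.append(" ".join([replacement] + q.replace(phrase, " ").split()))
    # Generic rule: "<term> without <term>" -> "<term> -<term>" (single-word tail only).
    head, sep, tail = q.partition(" without ")
    if sep and head and tail and " " not in tail:
        candidates.append(f"{head} -{tail}")
    return candidates

for query in ["Shirt without stripes", "Doctors without borders", "coffee without cream"]:
    print(query, "->", rewrite(query))
```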

When people get sucked into search it is this kind of problem that they spend
a lot of time and debate on :-)

------
c3534l
If you select "I don't like this recommendation" for a video on youtube, you
will get to provide feedback on why you did so: either "I don't like this
video" or "I've already watched this video." I've pressed the latter on
literally thousands of videos at this point, and after well over a year of
this, YouTube still hasn't figured out that I don't want to be recommended
videos that I've already watched.

Likewise, Google says I should log into their website for personalized search
results, but after years of always clicking on Python 3 results over Python
2.7 results, it never learned to show me the correct result.

Eventually I realized that personalized recommendations are more or less just
a thin cover for collecting vast amounts of data with no benefit to the
consumer. I believe we have the technology to do better, but we don't use it.
In fact, we seem to be using it less and less.

~~~
disqard
My experience has been that most ads that "follow me around" peddle the exact
same thing I purchased most recently. No, I will not buy a second electric
kettle, let alone the exact make and model that I now own. I'd rather
have generic ads, so I might discover some new product that I could (at least
in theory) actually buy.

------
ggggtez
Perhaps, but would you really say "Hi, I'm wearing a shirt without stripes"?

It's a completely artificial construct. Simply the fact that this hacker-news
entry is the #1 search result shows that _real human people_ do not perform
this search in significant quantity. But we can quantify that with data to
back up the assumption [1][2]. When people want to buy a shirt without stripes,
they do not describe the shirt by what it _doesn't_ have.

In fact, it's trivial to cherry-pick a random selection of words that on the
face of it sound like something a human might search for, but that turn out
never to occur in practice. Add to that the fact that the term is being searched
without quotes [3], which results in the negation not actually being attached
to anything.

Do you go to a store to buy it along with your Pants Without Suspenders, Socks
Without Animal Print, and other items defined purely by what they don't have?

[1]
[https://trends.google.com/trends/explore?geo=US&q=%22white%2...](https://trends.google.com/trends/explore?geo=US&q=%22white%20tshirt%22,%22shirt%20without%20stripes%22)
[2]
[https://trends.google.com/trends/explore?geo=US&q=%22plain%2...](https://trends.google.com/trends/explore?geo=US&q=%22plain%20shirt%22,shirt%20without%20stripes)
[3]
[https://trends.google.com/trends/explore?geo=US&q=plain%20sh...](https://trends.google.com/trends/explore?geo=US&q=plain%20shirt,shirt%20without%20stripes)

------
VohuMana
Is it just me or does it feel like in the last couple years all of these
companies have had the quality of their search go down? I've noticed large
portions of my search will go ignored and it will just grab the most popular
terms in my search rather than searching all terms.

------
rbetts
This is also confusing what you search for vs. what the vendor thinks you will
buy. Product catalog searches often _intentionally_ return items outside your
search parameters.

------
shanecleveland
I would never search for something this way. If I wanted to find a 4WD car, I
wouldn't search for "cars without 2WD."

Likewise, here, I would search for solid-colored shirts.

And these services are limited to the content/terminology utilized by the
cataloged sites/products.

If I am selling a "black shirt" or a "solid black shirt," it is not google's
job to catalog it as a "shirt without stripes," unless I advertise it as a
"black shirt without stripes."

I would use natural language to test a service's NLP ability.

~~~
jorvi
But what if you just hate shirts with stripes but do like polka dots or
other patterns? You can do a fancy advanced search query with OR and EXCLUDE
tricks but that is not what this post is trying to emphasize.

~~~
shanecleveland
Let's assume the point of the OP is that Google sucks at understanding this
search that most people would logically understand.

But let's also logically assume that most content on the web is telling the
visitor what their page or product is, not what it is not.

I would expect a shirt to be advertised and described as what it is: “black
shirt” not “shirt without stripes.”

So if a content creator does not include the exact term “without stripes” in
the description of the shirt, then you are relying on google to infer meaning
on behalf of both you and the content creator.

Now, this is relatively inconsequential for a shirt and perhaps not very
representative, as fashion-related searches are different from many other searches.
If I search for “news without coronavirus,” should I expect only articles that
do not refer to coronavirus? I wouldn’t.

If I was allergic to peanuts and I searched for “food without peanuts,” I
would expect results from content creators and sellers of products who took
care to include the term “without peanuts,” because they are advertising their
product as safe for those with peanut allergies. I would not rely on google or
amazon to make that determination for me.

Both for google and individual sites, there are better options to further
narrow results. If you don’t want to narrowly define your result to a specific
pattern or color, then first search more broadly and then use advanced
settings or filters to omit terms and/or include others.

------
wooders
We're a company coming out of the YC W20 batch working on the product
attribution problem [http://glisten.ai/](http://glisten.ai/).

There are too many products nowadays for them all to be manually attributed (e.g.
pattern=stripes), making it hard to return good results even with entity
resolution for queries. We train classifiers to categorize products, including
what something is _not_, using their images and descriptions.

------
schmichael
Google Photos' search is a similar source of amusement for me. While it's
quite good, it also fails fairly regularly and sometimes amusingly. For me
"turtle" includes understandable mistakes like fish, a snail, and a rock that
does look a bit like a turtle. However "turtle" also includes this, a picture
of sequined slippers reflecting light?!
[https://i.imgur.com/4aSlA4B.jpg](https://i.imgur.com/4aSlA4B.jpg)

I'm guessing one of those reflections looks like a turtle? Or maybe a pattern
on the floor, wall, or rug?

Although there are examples where I'm unsure if the AI is dumber than my 4yo
or smarter than me. This is a result for "truck":
[https://i.imgur.com/JcgXZAG.jpg](https://i.imgur.com/JcgXZAG.jpg)

Even (especially?) my 4yo knows those are Brio trains, not trucks. However,
trains have components called trucks!
[https://en.wikipedia.org/wiki/Steam_locomotive_components](https://en.wikipedia.org/wiki/Steam_locomotive_components)
I'm unsure whether or not any of the wheel assemblies on these toy trains are
considered trucks, so either the AI is extremely smart or slightly dumber than
a 4yo.

~~~
dicytea
I can almost see the left slipper looking kinda like the head of a turtle

------
antman
A good display of the current state of search engines.

~~~
nmstoker
Although search within apps can be even worse. I was looking through Google
Movies the other day for the film 2001 and instead it swamps the results with
those from the year 2001 - one could argue that there are lots of people who
are massively keen to buy films based on the year of production, but I suspect
it's better to satisfy those looking for years in the title first and then
after that brief interruption list the year based results, rather than the
other way around. Similarly, looking for "The Book of Why" on Audible is
dismal: even when it's in quotes, the exact match doesn't show up until the
42nd result, with a load of useless, not-obviously-connected results turning
up first. Both these failures interest me as they have a tangible financial
implication (I clearly had money to spend) and yet they remain unfixed.

~~~
noworriesnate
I realize the obvious fix for the Audible problem is to interpret double
quotes as "only search for this phrase," but I wonder if an alternative
solution would be to do TF-IDF for combinations of words, not just individual
words. For example, the search crawler could watch for the phrases "the book",
"the book of", "the book of why", "book of", "book of why", and "of why", and
weight the search results accordingly.
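
The idea above, TF-IDF over word n-grams rather than single words, can be sketched in a few lines. This is a minimal illustration, not how Audible or any crawler actually ranks: `build_index` and `search` are hypothetical helpers, the IDF uses the smoothed form log((1 + N) / (1 + df)) + 1, and the four titles are made-up test data.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """Yield every contiguous word n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def build_index(docs, max_n=3):
    """Return one {ngram: tf-idf weight} dict per document."""
    counts = [Counter(ngrams(d.lower().split(), max_n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # how many docs contain each n-gram
    n_docs = len(docs)
    index = []
    for c in counts:
        total = sum(c.values())
        index.append({
            g: (k / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1)  # smoothed IDF
            for g, k in c.items()
        })
    return index

def search(query, docs, index, max_n=3):
    """Rank docs by the summed weights of the query n-grams they contain."""
    q = set(ngrams(query.lower().split(), max_n))
    scores = [sum(w.get(g, 0.0) for g in q) for w in index]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]

DOCS = ["The Book of Why", "Why We Sleep", "The Book Thief", "Why Nations Fail"]
INDEX = build_index(DOCS)
print(search("the book of why", DOCS, INDEX)[:2])
```

Phrases such as "book of why" are rare across the catalog, so they get a high IDF and pull the exact title to the top even without quotes.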

~~~
nmstoker
Yes, that would be a better user experience, as: 1. not everyone will think to
use quotes; 2. it would probably be more resilient to minor user errors too
(e.g. if it had been misheard as The Book of Way, you'd still have a fair
chance of it returning the desired result fairly high in the list).

------
joshmn
Proof that SELECT with GROUP BY doesn't work if your tags aren't correct.

Joking aside, it doesn't surprise me that this isn't being picked up — aren't
most of these AI teams more R&D than actual public-facing? Maybe I'm just
cynical though.

------
dEnigma
This contrasts with my query of "guys in jean jumpers singing too ra loo ra
loo" a few years back, which Google correctly identified as "Come on Eileen"
by Dexys Midnight Runners. To this day my favourite search experience.

------
js2
If it were butter, you'd want an unstriped shirt. If it were provolone, you'd
want a non-striped shirt. But because it's neither of those, I think you just
want a "shirt" or maybe, a "plain shirt". Indeed, I get much better results
with either of the latter two search terms. There's no need to mention stripes
at all, since no pattern is the default state, isn't it?

------
lifeisstillgood
Weirdly, searching for

'shirt no stripes'

on Google returned _this_ web page at top of the organic results.

So at some point, searching for a shirt online will involve this conversation.
Even more confusing.

(Although I expect my filter bubble will play a part in that)

~~~
shanecleveland
Because Google does understand that "no" and "without" are interchangeable. But,
understandably, it does not correlate "shirt without stripes" as being the
same thing as "solid-colored shirts." Why? Because no one advertises or
describes a solid-colored shirt as a shirt without stripes, and no one searches
that way. It's an irrelevant point, in my opinion.

~~~
lifeisstillgood
My point was that _this very web page on HN_ got to the top of Google search in 42
minutes or less on this search term:

''' Shirt Without Stripes | Hacker News news.ycombinator.com › item 42 mins ago
- The point that the author is making, in a very understated way, is that all
three companies have PR websites that breathlessly describe their ...

'''

~~~
shanecleveland
Sorry. I meant the assumed point of the OP.

------
throw345hn
Only slightly related, but a couple of years back I got an Alexa as a gift.
When you open the Alexa app, there is an option to add a list of to-dos as a
reminder. The first thing I did was to say something like, "Alexa, add a
reminder to get milk and eggs and paper." The app literally added a single item
like this: milkANDeggsANDpaper.

After that I facepalmed myself and turned it off.

~~~
lokedhs
Every once in a while I try the voice recognition by trying to speak normally
to it. Normally, as in saying things like: "please set a reminder for five...
umm... no I mean 6 o'clock".

Normal humans do this all the time, and if I can't do it, speaking to it
becomes incredibly frustrating to the point that I never want to do it again.
I don't want to plan ahead what to say before I say it.

Granted, it's been a couple of years since I last tried so maybe they're
better now.

------
kator
Joe: "Hey is Lisa back from vacation?"

Larry: "I saw a red Lamborghini in the parking lot!"

Most people will assume Lisa is driving a red Lamborghini and is back from
vacation; meanwhile, all the bots are searching for Lamborghini vacations and
trying to figure out what's going on in the conversation.

------
partomniscient
"shirts without stripes" results:
[https://www.amazon.com/s?k=shirts+without+stripes&ref=nb_sb_...](https://www.amazon.com/s?k=shirts+without+stripes&ref=nb_sb_noss)

"shirts -stripes" results:
[https://www.amazon.com/s?k=shirts+-stripes&ref=nb_sb_noss_2](https://www.amazon.com/s?k=shirts+-stripes&ref=nb_sb_noss_2)

So basically the AI doesn't convert "without x" to "-x" even though the basic
capability needed is there. This is why AI is a hard problem, especially when
it meets the real world.

It's 2020 and we're still quibbling about the terminology used in SQL, what
did we expect?

------
mabbo
It's not enough to say "Oh, we should add a rule that 'without' means negate
the next word" because that only applies to this one situation, in this one
language. Let's generalize the problem: We aren't correctly translating from
English (or other spoken languages) to Computer/Logic.

The state of the art in machine translation (from what I've read at least) is
translating from language-A to a language-less "concept space" and then from
there to language-B. Could that be done where the output language is something
a search engine can use to find what you want correctly?

Given that pattern, I suspect we could see much better results in cases like
this.

------
slaymaker1907
I think that this is actually really encouraging in showing that we still have
a ways to go in improving search engines. A lot of people treat search engines
as a solved problem, at least for non-question answering aspects.

~~~
hrktb
I am amazed at the amount of things people are willing to treat as "solved
problem".

At one point there was a TED talk explaining that social networks were a solved
problem now that Facebook was dominant. Recycling was seen as a solved problem
until it wasn't, etc.

I wonder how many actual "solved problems" we have.

~~~
scabarott
Reminds me of this: [https://www.amazon.com/End-History-Last-Man/dp/0743284550](https://www.amazon.com/End-History-Last-Man/dp/0743284550)

~~~
disqard
Thank you for that reference. It's astonishing how "experts" can construct
complex frameworks for justifying their personal failure of imagination.

------
alanbernstein
Or you can do this:
[https://www.google.com/search?q=shirt+-stripes](https://www.google.com/search?q=shirt+-stripes)

~~~
sulam
The '-' operator stopped working for me with Google a while back. Not sure
why, it's fairly frustrating.

~~~
speedgoose
They want to always return results even when they don't have any, so they will
remove some of your query. This past week Google even returned months-old
results when I asked for the last 24 hours. It's so frustrating. I wish the
Google team that decided on this nonsense had to migrate Python 2 projects to
Python 3 until they retire.

------
varelaz
The problem here is not negation; it's that no product is described
as a "shirt without stripes". "Stripes" and "shirt" will co-occur in a
different sense: since Google cannot find the whole phrase, it has to match
parts. For example, try "shirt without shoulders".

------
imgabe
Humans can kind of make some assumptions based on context, but it's really
just a poorly defined, vague query.

What if you walked into a store and asked an associate for a shirt without
stripes? What would you get?

Probably some further questions for clarification. What about checked shirts?
Floral prints? Plaid? Do you want no pattern at all? T-shirt? Polo shirt?
Dress shirt?

Granted, the AI results are particularly bad because they give you the one
thing that you specifically didn't ask for, but that's also the only
information you provided. Defining a query in terms of what you don't want
instead of what you do want isn't going to go well.

What if you went to google and said "Show me all the webpages that aren't
about elephants"? Sure, you'd get something, but would it be anything useful?

------
raindropm
Apparently this reproduces in Thai too (and, I think, in other languages as
well). The search keyword is "เสื้อไม่มีแถบ", which literally translates as
'Shirt without stripes'. These are common spoken words, unlike 'without' in
English.

The results, of course, show shirts with some kind of stripe, albeit less
prominent than in the English results.

------
pugworthy
Interestingly, Google can handle these searches just fine...

"birds without flight"

"cars without wheels"

"cats without tails"

"dogs without hair"

"intersections without lights"

"poems without rhyme"

"shirts without collars" (also "sleeves", "shoulders", "buttons", "logos",
"pockets", and more)

~~~
lolc
That's because all these things are readily labelled by humans as such. So
they don't have to understand the sentence. Just match it.

~~~
pugworthy
I somewhat agree, but for me personally, "car without wheels" and "shirt
without stripes" are about equal in terms of usage regularity.

I'd also say nobody uses the phrase "birds without flight", but instead
"flightless birds".

I'd imagine people say "hairless dogs" a lot more often than "dogs without
hair".

And the shirt examples show that there are many uses of the lead phrase
"shirts without" that work just fine, but for some reason "stripes" really
stands out far beyond the others. "Shirts without shoulders" is kind of a
bizarre term to match, when almost all search results show "off the shoulder"
or "cold shoulder" as the thing being matched.

~~~
lolc
I have to agree with you on "birds without flight". Google does manage to
infer "flightless" from "without flight".

------
hombre_fatal
This is a good example of the bar HNers must have these days when they
bafflingly assert that Google is somehow getting worse from what they
remember.

Google has gotten better, it's just HNer expectations that have changed as
they expect more and more magic.

For example, the subtitle on the repo is "Stupid AI" when this query has
_never_ worked in these search engines, and it won't anytime soon.

You'd think the technical HN crowd would be more advanced than to make the
same mistakes that (they complain that) stakeholders/users/gamers make when
they mistakenly think everything is much easier than it actually is. Things
aren't "stupid" just because they can't yet read your mind.

~~~
Nextgrid
> has never worked in these search engines

I would expect there to be an e-commerce site or blog post somewhere
containing a page with the exact title "shirts without stripes" and I'd expect
it to be the first match.

------
rjurney
That darn conceptual search sure is hard :) The technical approach to
achieving this involves a sentence embedding that then uses vector search to
match documents based on a distance metric like cosine similarity. If you
encode a description of a shirt in an embedding trained on all shopping item
descriptions, it should match up with the search query. The trick is in
getting a sentence embedding from a short query to match a longer description
in a document description: long summaries of text in embeddings tend to
average too much and cloud meaning. The other problem is including the vector
search feature without screwing up other searches.
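
The matching step described above can be sketched as brute-force cosine ranking. This is a sketch under heavy assumptions: the three-dimensional vectors are hand-made stand-ins (hypothetical axes: shirt-ness, striped, solid) for what a trained sentence encoder would output, and `vector_search` is a made-up helper, not any library's API. The genuinely hard part, an encoder that maps "shirt without stripes" close to "solid white shirt", is simply assumed here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical item embeddings; axes are roughly [shirt-ness, striped, solid].
CATALOG = {
    "navy striped shirt": [0.9, 0.9, 0.1],
    "solid white shirt": [0.9, 0.0, 0.9],
    "striped beach towel": [0.1, 0.9, 0.1],
}

# Assumed encoder output for the query "shirt without stripes".
QUERY_VEC = [0.9, 0.05, 0.8]

def vector_search(query_vec, catalog, k=2):
    """Return the k catalog entries most similar to the query vector."""
    ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]), reverse=True)
    return ranked[:k]

print(vector_search(QUERY_VEC, CATALOG))
```

Production systems replace the linear scan with an approximate nearest-neighbour index, but the ranking criterion is the same.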

------
pvtmert
Since no context is provided, I do not expect it to understand
prepositions by itself.

Given the exact query, a human creates the environment, and thus the context,
themselves.

It may also depend on whom you are asking. For example, I come to this site
to find news about software & tech. Also, since 'Stripe' is a company name, I
assumed the link would list shirt shops that do not accept Stripe as a payment
method/provider (some kind of protest-related thing).

I literally thought that yesterday and skipped the page, thinking
"That's too much for tonight".

Now I see the topic is quite different.

------
ltbarcly3
Here's another fun fact about how commerce search engines work (I spent a
couple of years on this):

Negations sidestep almost all of the algorithms that try to provide an
improved result set, and fall through to pure text relevancy. So try searching
on Amazon for "shirt", then search for "shirt -xkxkxkxk". Since xkxkxkxk doesn't
match any documents, the negation should have no effect, but it does: it
sidesteps all the fancy relevancy work and hardcoded query rewrite rules,
domcat rules, demand and sales/impression statistics, etc., and gives you
basically awful search results. You don't even get shirts.

------
kinkrtyavimoodh
On a meta note, I am a bit tired of HN submissions being used more as "Writing
Prompts" rather than as links to substantive material.

This thread is an excellent example. The author of the linked page didn't have
the decency to actually make a substantive point, instead sharing three
screenshots and posting the link here, chumming the HN waters with the kind of
stuff that brings in the sharks from far and wide.

Bashing on big cos: Check

Vague pronouncements about AI: Check

Generic side-swipes about 'ad revenue': Check

This is why a coherent thesis is required to even initiate a proper
discussion, because in the absence of that it invariably devolves to lowest-
common-denominator shit-flinging.

------
crimsonalucard
In terms of AI, the following is literally the best I have ever seen and it's
not even done by a professional (meaning you can make it too):

[https://aidungeon.io/](https://aidungeon.io/)

------
twodave
I'm actually not sure I expect this much from a search engine. Typically there
is going to be a useful word to describe what you want without having to hope
it can understand "no" or "without" (for example, without stripes -> "solid"
or even "NOT striped" in many cases).

Anyone with a programming background knows there is an art to forming useful
search queries--it is an acquired skill. I'd personally much rather the engine
bring back predictable results given mundane rules and keywords than attempt
to understand sentences using an opaque method of understanding.

~~~
elicash
I think Google should design _primarily_ for people who DON'T know a ton about
crafting queries, even if it's at the expense of a much smaller number of
folks who are experts.

That said, this seems like an obvious place for improvement where both groups
can be made happy.

~~~
twodave
I agree that's a valid argument, probably a more sensible approach, even if it
does make me a little sad :)

------
civil_engineer
Wikipedia gets it wrong too: Try “men without hats”
[https://en.wikipedia.org/wiki/Men_Without_Hats](https://en.wikipedia.org/wiki/Men_Without_Hats)

------
thedeviantdev
Try searching Google for 'white couples' and 'black couples'.

The former returns lots of mixed race couples, mostly not white couples.
However the latter returns black couples.

What is going on here? Similar phenomenon perhaps?

------
heavenlyblue
To be fair, the only thing that Google needs to do _internally_ is to match
this query to “shirt -stripe” and then you’ll get the necessary answer. The
bigger question is why they are not doing that.

------
quickthrower2
"Plain shirt" works a charm though. What is a 'shirt without stripes' anyway?
That could be a shirt with diamonds? Or a plain one? Or a Hawaiian shirt?

What is the expected result, can we agree?

~~~
notRobot
> What is a 'shirt without stripes' anyway

It is a shirt with anything _except_ stripes.

~~~
quickthrower2
Cool but what is the search intent? It's a bit like saying "Google gimme
something that ain't a ukulele"

------
V-2
I believe the future of AI, as showcased by this simple usecase, is not one
central AI such as Google search engine recognizing the context, but rather
each of us having a "smart assistant" with a personalized, trained
understanding of the contexts that we mean.

And it's only that smart assistant that automates coping with the deficiencies
of a one-size-fits-all central solution, finding me shirts with no stripes by
using a rather dumb search engine. (Or "a pizza I would like", etc.)

------
dk8996
I'm kinda late to this conversation, but there are companies and engineers
trying to solve this problem, basically by adding more "semantics" to visual
content. A good place to start is this blog post from Pinterest.

[https://medium.com/pinterest-engineering/pinsage-a-new-graph...](https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48)

------
lgessler
A good demonstration of the linguistic fact that far from being meaningless,
prepositions (adpositions, more generally) are actually highly consequential
for meaning and are highly ambiguous between different meanings. Here's a
paper that'll give you a good appreciation of this from an NLP perspective if
you're curious:
[https://www.aclweb.org/anthology/W16-1712.pdf](https://www.aclweb.org/anthology/W16-1712.pdf)

------
dailypeeker
Why is everything a git repo when it could have been a blog post?

~~~
jjjjjjjjjjjjjjj
Why do you think a blog post would've been better?

~~~
Minor49er
Because a repo is meant for source control. People are using the issue tracker
to leave general comments. There's no reason to branch or fork a blog post. It
pollutes the contributions metrics. That said, I don't really care how people
use GitHub, but there are better blog solutions out there already that would
make more sense to use. However, this smacks of the old adage "when all you
have is a hammer, everything looks like a nail".

------
obarthelemy
Reminds me of the challenge:

"Don't think of a cow!"

What did you just think of? A cow, of cowrse.

If you want a shirt w/o stripes, just google "plain shirt" or "dress shirt
-stripes".

------
dtunkelang
As others have pointed out, most search engines don't support natural language
search in general, let alone natural language negation in particular.

There are several reasons for this, including the following:

1) Natural language understanding for search has gotten a lot better, but it
is still not as robust as keyword matching. The upside of delighting some
users with natural language understanding doesn't yet justify the downside of
making the experience worse for everyone else.

2) Most users today don't use natural language search queries. That is surely
a chicken-and-egg problem: perhaps users would love to use natural language
search if it worked as well or better than keyword search. But that's where we
are today. So, until there's a breakthrough, most search engine developers see
more incremental gain from optimizing some form of keyword search than from
trying to support natural language search.

3) Even if the search engine understands the search query perfectly, it still
has to match that interpretation against the document representation. In
general, it's a lot easier to understand a query like "shirt with stripes"
than to reliably know which of the shirts in the catalog do or don't have
stripes. No one has perfectly clean, complete, or consistent data. We need not
just query understanding, but item understanding too.

4) Negation is especially hard. A search index tends to focus on including
accurate content rather than exhaustive content. That makes it impossible to
distinguish negation from not knowing. It's the classic problem: absence of
evidence is not evidence of absence. This is also a problem for keyword
and boolean search -- negating a word generally won't negate synonyms or other
variations of that word.

5) The people maintaining search indexes and searchers co-evolve to address --
or at least work around -- many of these issues. For example, most shoppers
don't search for a "dress without sleeves"; they search for a "sleeveless
dress". Everyone is motivated to drive towards a shared vocabulary, and that
at least addresses the common cases.
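
Point 4 can be made concrete with a toy keyword filter. This is a minimal sketch with made-up catalog titles; `exclude` is a hypothetical helper, and the variant list is hand-written rather than any engine's real synonym data. It shows both halves of the problem: negating "stripes" does not negate "striped" or "pinstripe", and an item whose pattern was never recorded passes the filter simply because nothing says otherwise.

```python
import re

def tokens(text):
    """Lower-cased word tokens of a product title."""
    return set(re.findall(r"[a-z]+", text.lower()))

def exclude(docs, term, variants=()):
    """Keep docs containing neither `term` nor any listed variant (exact token match)."""
    banned = {term, *variants}
    return [d for d in docs if not (tokens(d) & banned)]

CATALOG = [
    "Classic oxford shirt with blue stripes",
    "Breton striped shirt",
    "Pinstripe dress shirt",
    "Plain white shirt",
    "Linen shirt",  # pattern never recorded: unknown, not the same as "no stripes"
]

# Plain keyword negation removes only the exact token "stripes".
print(exclude(CATALOG, "stripes"))
# A hypothetical hand-written variant list catches more, but "Linen shirt" still
# survives: the index cannot tell "not striped" from "nobody said".
print(exclude(CATALOG, "stripes", variants=("striped", "stripe", "pinstripe")))
```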

None of this is to say that we shouldn't be striving to improve the way people
and search engines communicate. But I'm not convinced that an example like
this one sheds much light on the problem.

If you're curious to learn more about query understanding, I suggest you check
out
[https://queryunderstanding.com/introduction-c98740502103](https://queryunderstanding.com/introduction-c98740502103)

------
bryanrasmussen
I think, looking at "shirt without stripes" and "shirt with out stripes" in
Google Images, that "without" is decompounded, which then ends up giving you
shirts with stripes. However, the slight difference between the two searches
"shirt without stripes" and "shirt with out stripes" is that there are some
exact hits mixed in as well, so there are some results for "shirt without
stripes" mixed in with the decompounded query.

Just my theory.

------
jpswade
Google search doesn’t work that way, it’s still based on how we link to
things.

Nobody would describe a plain shirt as a shirt without stripes unless it’s
within that context.

------
holdenc137
Wake me up when I can google "anything but crocodiles"

------
rubatuga
Also, what really angers me is when websites don't support the minus operator
for search queries. It's a simple feature introduced decades ago!

------
neycoda
How does a search engine know whether you wanted shirts that didn't have
stripes or results that contained the words shirts, without, and stripes?

------
bryanrasmussen
[https://www.google.com/search?q=plain+shirt](https://www.google.com/search?q=plain+shirt)

[https://www.amazon.com/s?k=plain+shirt](https://www.amazon.com/s?k=plain+shirt)

on edit:
[https://www.google.com/search?q=shirt+-stripes](https://www.google.com/search?q=shirt+-stripes)

~~~
leephillips
So the - is working, contrary to the claims of several people here. The bottom
line is that it is super easy to get results with shirts that don't have
stripes, if you don't expect your search engine to understand English (which
it would be wildly unrealistic to expect, and, besides, is not what you want).

~~~
ofTheMountain
While it is easy to get results based on certain keyword matching, Google has
been publicly touting their ability to use NLP in search to better understand
English. So this just shows a small limitation in their BERT models
([https://www.blog.google/products/search/search-language-unde...](https://www.blog.google/products/search/search-language-understanding-bert/)).
Doesn't make it any less impressive.

------
rjurney
The latest embeddings/networks like BERT can handle encoding this logic. They
take the surrounding words in context when they're encoded.

Google can do this now, for example in a prototype. The tough thing is to get
it to consumer-grade quality without messing up other searches. The QA process
is utterly brutal because one weird search can be a scandal.

------
ddebernardy
The correct query would have been "shirt -stripes". That works fine, or at
least does on Google. But yeah, sentence parsing fail.

~~~
PeterCorless
Literally the first image that comes up for me is a striped shirt.

~~~
ddebernardy
No idea where you're from or what you did...

[https://imgur.com/a/XBzfOsF](https://imgur.com/a/XBzfOsF)

Might your search history be so contrarian that Google suggests contrarian
results? :D

------
wordabby
On a positive note, Google used to have trouble with a query like "words with q
without u"; now the top 5 pages at least all show the correct results, e.g.:
[https://word.tips/words-with/q/without/u/](https://word.tips/words-with/q/without/u/)

------
vekker
I worked on an ingredient parser a few years ago. This exact kind of thing
made things a lot more difficult than they seemed at first.

------
CPLX
Related:
[https://www.google.com/search?q=mountains+without+women+in+a...](https://www.google.com/search?q=mountains+without+women+in+a+bikini&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiygeHatPfoAhWGknIEHe8QDSkQ_AUoAXoECAwQAw&biw=1747&bih=947)

------
aj7
Since I was a teenager, if someone energetically asserts a statement is “true”
or “false,” I drop the true or false and evaluate the statement. In essence,
their only communication to me is, ‘I think this is important!’ Often, why
they think it’s important is more pressing than whether the statement is true.

------
adamredwoods
I wonder if humans need to learn search query syntax: "-stripes" instead of
"without stripes".

Or does input need to have basic filters applied before handing it to ML?
"without X" or "no X" = "-X"? That can be foiled with "shirt without having
stripes".
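
The proposed filter, and the way it gets foiled, fits in a regular expression. This is a hypothetical rule for illustration only (no search engine is known to use exactly this); `NEGATION` and `rewrite` are made-up names, and X is assumed to be the single word right after "without" or "no".

```python
import re

# Hypothetical rule: "without X" or "no X" becomes "-X", X being the next single word.
NEGATION = re.compile(r"\b(?:without|no)\s+(\w+)", re.IGNORECASE)

def rewrite(query):
    """Apply the naive negation rewrite to a free-text query."""
    return NEGATION.sub(lambda m: "-" + m.group(1), query)

for q in ["shirt without stripes", "shirt no stripes",
          "shirt without having stripes", "Doctors Without Borders"]:
    print(f"{q!r} -> {rewrite(q)!r}")
```

The first two come out as "shirt -stripes", but "shirt without having stripes" becomes "shirt -having stripes", and the proper noun "Doctors Without Borders" (raised elsewhere in this thread) becomes "Doctors -Borders", which is exactly why a one-word rule does not generalize.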

------
mirimir
Searching "plain shirts" does in fact yield results for shirts without
patterns. And "paisley shirts" works too.

So it's not such a big deal that negation doesn't work.

Also, "shirts -stripes" does seem to work in both Amazon and Google. Or at
least, I see no striped shirts.

------
leonardopucci
I think that query analysis in terms of the volume of actual people using this
query will show that very few people, if any, actually type "shirt without
stripes". Once enough people do it, feedback accumulates that the results are
bad (by CTR analysis), and the results will auto-correct.

------
aaron695
Not sure if people actually search for "Shirt Without Stripes", or if this was
picked for academia over what is actually needed.

But make a script that scrapes the top X results for these sites. Get your own
AI / humans to rate it.

Make it competitive for these large sites <==> give them an incentive.

------
moultano
Y'all might be interested in this paper.
[https://arxiv.org/abs/1907.13528](https://arxiv.org/abs/1907.13528)

> _in particular, it shows clear insensitivity to the contextual impacts of
> negation._

------
carapace

        shirt -stripes
    

> "Am I going crazy or is it the world around me!?"

Fishbone - Drunk Skitzo
[https://youtu.be/SaPGH4Yd_zc?t=231](https://youtu.be/SaPGH4Yd_zc?t=231)

(Apologies for the snarky low-content flip reply.)

------
softwaredoug
In search we know it’s easy to cherry pick queries and criticize any search
engine. A search engine is optimizing for billions of queries. Most of which
are on the long tail.

The real question is: is “shirts without stripes” really a query people
enter, or is it representative of a real pattern in the data?

~~~
altfredd
> A search engine is optimizing for billions of queries. Most of which are on
> the long tail.

Citation needed.

As far as my personal observations go, Google is NOT optimized for the long
tail at all. It is always trying to return the most popular results from a
cache of the most popular results. Once the cache is exhausted, Google starts
to return completely irrelevant trash (anything after the first two pages of
search is pure spam and meaningless keyword soup).

If you try to look up some obscure keyword and find nothing, try again after a
couple of months. There is a very high likelihood that you will see dozens of
"new" results, most of them from pages that are several years old. Perhaps the
actual long-tail searches still happen somewhere in the background, but you
are not going to see their output right away; instead you need to wait until
they get committed to the nearby cache.

Another alarming change that happened relatively recently (4-5 years ago) is
a tendency to increase the number of results at the expense of match
precision. A long time ago Google actually returned exact results when you
quoted a search phrase. Then they started to ignore quotes. Then they started
to ignore some of the search terms if doing so resulted in a greater number of
results. Finally, Google gained the horrifying ability to ignore MOST of the
search terms. OP's example probably has the same cause: Google's NLP knows the
meaning of the word "without", but Alphabet Inc. can't afford to hose all
those websites that use AdWords to sell you STRIPED SHIRTS. This would mean a
loss of money! THE LOSS OF MONEY!!!

~~~
b3kart
I think what the original poster meant is that the tail is really long, and
it's hard to cover it, hence these bad examples.

------
cfv
While I'm sure this is A Hard Problem to solve by NLP I for whatever reason
was under the impression that this is trivial to special-case.

As in, "X without Y" sounds like a common enough use case to have its own
little parser branch in places as big as Google or Amazon.

~~~
hbosch
I mean, if I google the phrase "shirts -stripes" and click the Images tab I
see mainly shirts without stripes.

So it's essentially the same input, and essentially the same expected output,
but there must be quite a knot between understanding the word "without" and
literally just using the - operator.

~~~
cfv
Right, but the - operator requires prior knowledge that the search engine
understands boolean operators and that "a -b" is an alias for "listings with a
in their text that don't also have b in their text", whereas "a without b" is
immediately recognizable by whoever is writing the search as "I want something
of kind A without property B".
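
The literal semantics of "a -b" described above can be sketched as a tiny boolean matcher. This is a minimal illustration under stated assumptions: `parse_query`, `matches`, and `search` are hypothetical helpers, the documents are made-up one-line listings, and every non-minus word is treated as required (real engines rank rather than filter).

```python
def parse_query(query):
    """Split a query into required terms and '-'-prefixed excluded terms."""
    include, exclude = [], []
    for tok in query.lower().split():
        if tok.startswith("-") and len(tok) > 1:
            exclude.append(tok[1:])
        else:
            include.append(tok)
    return include, exclude

def matches(doc, include, exclude):
    """True if doc has every required term and none of the excluded ones."""
    words = set(doc.lower().split())
    return all(t in words for t in include) and not any(t in words for t in exclude)

def search(query, docs):
    include, exclude = parse_query(query)
    return [d for d in docs if matches(d, include, exclude)]

DOCS = ["red shirt with stripes", "plain red shirt", "stripes poster"]
print(search("shirt -stripes", DOCS))         # operator syntax
print(search("shirt without stripes", DOCS))  # "without" is just another required word
```

With the operator, only "plain red shirt" survives; read literally, "shirt without stripes" demands all three words and matches nothing, which is why engines fall back to partial matches that happily include the stripes.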

------
realo
Tried this on amazon.ca (instead of .com) and got quite a different, but also
amazing, result...

[https://www.amazon.ca/s?k=shirt+without+stripes&ref=nb_sb_no...](https://www.amazon.ca/s?k=shirt+without+stripes&ref=nb_sb_noss)

------
arnaudsm
I wonder why this problem hasn't been resolved yet, considering we had NLP
systems capable of this for a decade now. Maybe it's too hard to scale to
production. Or Pagerank is still better most of the time. Or plain old
monopoly and risk aversion.

~~~
atupis
My bet is that sites are generally optimized for keyword search, and this kind
of NLP search engine would return subpar answers in most cases.

------
harimau777
It seems to me that the problem isn't so much that this search performs
incorrectly. Rather it is that many search engines have removed the tools that
allowed a user to specify exactly what they are looking for (e.g. shirt
-stripes).

------
GnarfGnarf
Why not simply say "shirt -stripes" (negation in front of "stripes").

------
need_more_bort
So I’d have to ask: is the problem that the AI doesn’t intuit “without
stripes”, find shirts that satisfy that condition (what kind of shirts? Dress
shirts? T-shirts?), and then do an image search identifying shirts and their
quality of stripeyness?

------
cachestash
Key question here, do all three even profess to using AI/ML in the search
feature?

------
dpcan
Yes, exclusive search is a huge problem.

You have to know to search for "solid colored shirt", but when you can't think
of this variation of search, or maybe there isn't one, exclusion is your only
option, and it's broken.

------
rammy1234
I see Bing is the poorest of the lot. It did not understand the "without" keyword.

------
DangerousPie
Counterexample: [https://www.google.com/search?client=firefox-b-d&q=Doctors+W...](https://www.google.com/search?client=firefox-b-d&q=Doctors+Without+Borders)

~~~
crote
Is it, though? Try "Doctors with borders".

------
kitplummer
Why is this a Github repo? I can't get past the abuse of a git repository.

~~~
sceptically
+1

------
mv4
Not surprising. Also, "shirt -stripe" might produce better results.

~~~
jayd16
This works quite well but don't add the quotation marks.

~~~
mv4
right

------
otikik
Poignant but accurate.

On Amazon's side of things I would also include the obnoxious "Hey you just
bought a pair of sneakers so now I will change all your recommendations to
sneakers".

------
MaysonL
Reminds me of the difficult time I had finding socks without elastic.

------
tiborsaas
I get the author's point, but if you think about it, a search engine is a
database that serves you results you want to see. Why should a search engine
be fine-tuned towards things you don't want to see?

If it's meaningful for some reason, then it works:

[https://www.google.com/search?q=woman+without+makeup&tbm=isc...](https://www.google.com/search?q=woman+without+makeup&tbm=isch)

If it's a user error (like a dumb query), it fails, and that shouldn't be a
surprise:

[https://www.google.com/search?q=sea+without+ships&tbm=isch](https://www.google.com/search?q=sea+without+ships&tbm=isch)

------
l0b0
"shirt -stripes" seems to work on google.com at least, even though Google and
others like DDG have recently been getting really bad at honoring "-foo"
exclusions.

------
ddlutz
Noticed something interesting, if you search for 'shirt without sleeves' in
google images, you DO get sleeveless shirts. So why doesn't this work with
stripes?

~~~
mr_toad
I’m guessing that BERT understands that ‘without sleeves’ and sleeveless means
the same thing, and there are images labelled as sleeveless shirts.

But there probably aren’t many images labelled as stripeless.

I’m not sure why BERT doesn’t try shirt -stripes.
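A minimal sketch of the rewrite described above, not a description of how Google or BERT actually work: map "without X" to an "X-less" label when the index happens to carry one, and otherwise fall back to the `-X` exclusion operator. The label set `KNOWN_LABELS` and the naive plural-stripping rule are assumptions for illustration.

```python
import re

# Hypothetical labels an image index might already carry; purely illustrative.
KNOWN_LABELS = {"sleeveless", "shirt", "striped", "plain"}


def to_less_adjective(noun: str) -> str:
    """Naively turn a plural noun into an '-less' adjective: 'sleeves' -> 'sleeveless'."""
    stem = noun[:-1] if noun.endswith("s") else noun
    return stem + "less"


def rewrite_without(query: str, labels: set[str] = KNOWN_LABELS) -> str:
    """Rewrite 'X without Y' as 'Yless X' if that label exists, else as 'X -Y'."""
    match = re.fullmatch(r"\s*(.+?)\s+without\s+(\w+)\s*", query, flags=re.IGNORECASE)
    if match is None:
        return query  # no 'X without Y' pattern: leave the query alone
    head, excluded = match.group(1), match.group(2).lower()
    adjective = to_less_adjective(excluded)
    if adjective in labels:
        return f"{adjective} {head}"  # e.g. a 'sleeveless' label exists
    return f"{head} -{excluded}"  # no such label: use the exclusion operator


print(rewrite_without("shirt without sleeves"))  # sleeveless shirt
print(rewrite_without("shirt without stripes"))  # shirt -stripes
```

The fallback is the point of the question: "stripeless" is not a label anyone uses, so the only safe rewrite left is the exclusion operator.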

------
fpoling
Searching in Russian on Yandex gives the same ridiculous results.

------
Despoisj
What about using the "-" sign to filter results instead of relying on complex
language understanding?

=> "shirt -stripes" works pretty well on google at least

------
peter_retief
If you search for "plain shirt", it's good. If you search for "plain shirt no
stripes", it adds stripes. Strangely, "striped shirt" returns some plain results.

------
lavp
For the google search, I get better results by typing "shirt -stripes". Still
not perfect, but it's better than the seemingly redundant 'without'.

------
prvc
On a related note, Google seems to favor forum replies which instruct users to
perform a search in order to find the answer to the question that they had
asked.

------
Skunkleton
Good thing search engines generally support a more machine-centric process for
communicating intent. Try searching for "shirt -stripes".

We are in a funny place with UIs.

------
dvh
[https://www.google.com/search?q=leap+years+in+1900s](https://www.google.com/search?q=leap+years+in+1900s)
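One likely reason this query is a stumbling block: under the Gregorian rule, century years are leap years only when divisible by 400, so 1900 was not a leap year even though it is divisible by 4. A quick check, assuming "the 1900s" means the years 1900 through 1999:

```python
def is_leap(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


leap_years = [y for y in range(1900, 2000) if is_leap(y)]
print(len(leap_years), leap_years[0], leap_years[-1])  # 24 1904 1996
```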

------
fortran77
Here's a better example:

"shirt without sleeves"

That's something that someone may actually search for. (At least the guys at
my gym would!) And Amazon gets it mostly wrong.

------
melvinram
`shirt -stripes` gives the type of results one would expect. I guess we
haven't reached that level of natural language processing yet.

------
willdearden
One time I ordered from stitch fix and asked for “a shirt which is not red,
white, and blue” and got a red, white, and blue shirt.

------
wwarner
I thought the minus prefix instead of "without" would exclude "stripes", but
it doesn't (any more).

------
__ryan__
Similarly, ask Siri to “play all of my music, except classical music”. Siri
responds “OK, classical music coming up.”

------
activatedgeek
I'm sorry but what's the point being made here?

Search results could be better? Sure.

Can we find adversarial examples? Almost always.

------
hajimuz
It’s like the plot of the movie Inception. How do you plant an idea like
“don’t think of an elephant” in a human mind?

------
flamtap
In the Amazon app, a search for “shirt without stripes” now gets corrected to
“shirt without strips”.

------
impostervt
This should be the next captcha model.

------
cgb223
When the Robot Wars come we’ll all be wearing striped shirts as camouflage to
confuse their AIs

------
sceptically
Just a short question: Why does someone choose to publish something like that
on GitHub?

~~~
detaro
because it is easy for them?

~~~
larrybud
With one more click, they could have turned this into a 'real' website using
GitHub Pages.

------
robbiemitchell
In my experience, semantic embedding is simply not very good at taking
negation into account.
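One hedged illustration of why: if a query embedding is built by averaging word vectors (a simplification; modern encoders such as BERT are contextual), a single negation word barely moves the average. The vectors in `TOY_VECTORS` are invented for this example.

```python
import math

# Invented 3-d word vectors for illustration only; real embeddings are learned,
# much larger, and (in models like BERT) depend on context.
TOY_VECTORS = {
    "shirt": [0.9, 0.1, 0.0],
    "stripes": [0.1, 0.9, 0.2],
    "with": [0.0, 0.1, 0.1],
    "without": [0.0, 0.1, 0.2],
}


def mean_pool(query: str) -> list[float]:
    """Average the word vectors of a query (a simple, order-blind sentence embedding)."""
    vectors = [TOY_VECTORS[w] for w in query.lower().split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


sim = cosine(mean_pool("shirt with stripes"), mean_pool("shirt without stripes"))
print(f"{sim:.3f}")  # close to 1: the negation barely moves the average
```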

------
skizm
Could this be an SEO opportunity to capture some simple negative phrases like
this?

------
billiam
It's a shopping problem, not a language problem, according to these companies.

------
voldacar
Doesn't Peter Norvig work at Google? Maybe they should pick up his book.

------
java-man
Bag of words does not work.
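To make the point concrete: a bag-of-words representation throws away word order, and if "without" is treated as a stopword it throws away the negation too. The stopword list here is an assumption for illustration.

```python
from collections import Counter

# A small, hypothetical stopword list; real engines use their own.
STOPWORDS = {"a", "the", "with", "without", "no"}


def bag_of_words(query: str, drop_stopwords: bool = True) -> Counter:
    """Count lowercase tokens, ignoring order (and optionally stopwords)."""
    tokens = query.lower().split()
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)


# With stopwords dropped, the negation disappears entirely.
print(bag_of_words("shirt without stripes") == bag_of_words("shirt with stripes"))  # True
# Even keeping "without", order is lost: who is without what?
print(bag_of_words("stripes without shirt", drop_stopwords=False)
      == bag_of_words("shirt without stripes", drop_stopwords=False))  # True
```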

------
tempodox
Stripes seem to hold an irresistible attraction for impostor “AI”.

------
diegorbaquero
This is expected, though not wanted. I would expect some semantic analysis
translated into "shirt -stripes", but what you really mean is "solid color
shirt". This is a tough one, but surely something that can be tackled with
research.

~~~
saagarjha
shirt -stripes does not mean "solid color shirt", as the - operator looks for
that text on the page, instead of performing a semantic "I don't want
stripes".
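A toy sketch of that point, assuming exclusion is a plain token match (real engines also stem and normalize): the plain shirt whose description says "no stripes" is dropped, while the pinstripe shirt survives. The listings are made up.

```python
# Hypothetical product listings; the text is what a lexical engine would index.
LISTINGS = {
    "oxford": "solid blue oxford shirt, no stripes",
    "breton": "classic breton shirt with horizontal stripes",
    "suit": "navy pinstripe dress shirt",
    "hawaii": "hawaiian shirt with palm print",
}


def search(query_terms, excluded_terms, listings=LISTINGS):
    """Keep listings containing every query term and none of the excluded terms (exact tokens)."""
    results = []
    for name, text in listings.items():
        tokens = set(text.replace(",", " ").split())
        if all(t in tokens for t in query_terms) and not any(t in tokens for t in excluded_terms):
            results.append(name)
    return results


print(search(["shirt"], ["stripes"]))  # ['suit', 'hawaii']
```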

~~~
SAI_Peregrinus
And "shirt without stripes" doesn't mean solid color shirt either. Could have
spots. Could be Hawaiian print. Is plaid striped?

------
holdenc137
wake me up when I can google "Anything but crocodiles"

~~~
montroser
You're joking, but perhaps you're right. It's unnatural to search in the
negative -- "solid shirt" or something like that would be more likely from a
human with that actual intent.

This just suggests to me that real humans haven't issued that type of search
query enough for the AI to know what to do with it. Which wouldn't be so big
of a problem.

------
AA-BA-94-2A-56
Did nobody tell OP you can search "shirt -stripes"?

------
paulftw
Are they also building AI cars that drive without accidents?

------
DonHopkins
Stay positive: "solid shirts" works just fine.

------
_pmf_
But muh "autonomous driving is almost there".

------
longtermd
A pretty accurate description of the state of AI ;D jk

------
dvduval
Most important is what the advertisements at the top show. The organic results
are so yesterday. The Google Ads AI should already be teaching you that all
your base are belong to us.

------
downshun
It's called plain, as in plain shirt. PEBKAC

------
PeterCorless
Someone needs to learn how to use the ¬ operator.

------
ChrisArchitect
hm, 'without' is a tough one. You're not looking for a zebra without stripes.
You're looking for a horse.

~~~
moron4hire
Not necessarily

[https://www.nationalgeographic.com/animals/2019/09/zebra-pse...](https://www.nationalgeographic.com/animals/2019/09/zebra-pseudo-melanism-kenya-masai/)

------
kleer001
"shirt -stripes" works, wtf?

------
jlmcguire
without must be a tough one? It does seem that bing is the worst at figuring
this out from the pictures.

~~~
lucb1e
None of them make an attempt at filtering this. While I found it amusing
because they advertise AI magic in other areas, those search engines all
present themselves as keyword searches, and one of them being "the worst"
means it's either random (since none of them are trying) or it's the best
(since it matches the stripes keyword).

To be honest, I wish they actually _were_ keyword searches and that the machine
didn't try to be smarter than you. Many times when I carefully specify which
keywords must appear on the page, it'll ignore parts of the query or add
unrelated synonyms. Usually one can work around it with operators, but it's
tedious and doesn't work reliably.

------
softwarejosh
It's obvious, but the solution is "shirt -stripes" until we make AI interpret
attributes.

------
danielovichdk
Result without correct match!

------
sparrish
Learn you some Google-fu. Don't say what it isn't. Say what it is: "shirt
solid color"

~~~
spelunker
Google-Fu == working around machine learning limitations which is the point of
this post.

------
noughtme
Probably not surprising that most people don’t know, but negative keyword
search works on all these platforms:

shirt -stripes

~~~
derekp7
It would be really nice if the search engines would automatically apply the
"-" when they see the word "without".

~~~
oliveshell
I’m not so sure.

What about a search query for “Doctors without Borders“ or “Men Without Hats“?

Surely interpreting “without“ as the negative operator would ruin those
searches.

~~~
derekp7
What happens when someone asks you about one of those items? You consult your
memory reserves, and you find that those are proper names of specific
entities, so your brain returns those entities. Now, there may very well be a
high school band called "Shirts without Stripes", but you would most likely
call up plain shirts or shirts with non-striped patterns. There's no reason
that a search engine couldn't follow the same rules (i.e., Google must have
millions of entries for Doctors without Borders and Men without Hats).
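A minimal sketch of that rule: check the query against a list of known named entities first, and only otherwise turn "without" into the exclusion operator. The entity set `KNOWN_ENTITIES` and the whole-query match are simplifying assumptions; a real engine would consult something like a knowledge graph.

```python
# Hypothetical entity list; a real engine would consult a knowledge graph.
KNOWN_ENTITIES = {"doctors without borders", "men without hats"}


def apply_without_operator(query: str, entities: set[str] = KNOWN_ENTITIES) -> str:
    """Turn 'X without Y' into an exclusion query unless the query is a known proper name."""
    if " ".join(query.lower().split()) in entities:
        return query  # proper name such as a charity or a band: leave it alone
    words = query.split()
    lowered = [w.lower() for w in words]
    if "without" not in lowered:
        return query
    i = lowered.index("without")
    head, excluded = words[:i], words[i + 1:]
    if not head or not excluded:
        return query  # 'without' at either end: nothing sensible to rewrite
    # Exclude a multi-word remainder as a quoted phrase, a single word directly.
    phrase = " ".join(excluded)
    negated = f'-"{phrase}"' if len(excluded) > 1 else f"-{phrase}"
    return " ".join(head + [negated])


print(apply_without_operator("Doctors Without Borders"))  # Doctors Without Borders
print(apply_without_operator("shirt without stripes"))    # shirt -stripes
```

Quoting the multi-word case matters: rewriting "shirt without red stripes" as `shirt -red -stripes` would also drop every red shirt.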

------
eggie5
simple LTR w/ clickstream data would fix this easy
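One possible reading of this suggestion, sketched under heavy assumptions: derive pairwise preferences from clicks with the "click > skip above" heuristic (a clicked result is preferred over unclicked results shown above it), then fit a linear scorer with a pairwise perceptron. The session, the result names, and the two features are invented, and the perceptron stands in for a production learning-to-rank model.

```python
# Hypothetical session: results shown for "shirt without stripes", top to bottom,
# each with invented features [has_stripes_tag, text_match_score].
SHOWN = [
    ("striped_tee", [1.0, 0.9]),
    ("breton", [1.0, 0.8]),
    ("plain_tee", [0.0, 0.7]),
    ("polo", [0.0, 0.6]),
]
CLICKED = {"plain_tee"}


def skip_above_pairs(shown, clicked):
    """'Click > skip above': prefer a clicked result over unclicked results ranked above it."""
    pairs = []
    for i, (name, feats) in enumerate(shown):
        if name in clicked:
            for other, other_feats in shown[:i]:
                if other not in clicked:
                    pairs.append((feats, other_feats))  # (preferred, less preferred)
    return pairs


def train_pairwise(pairs, epochs=10, lr=0.1):
    """Pairwise perceptron: nudge weights until each preferred item outscores the other."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w


def score(w, feats):
    """Linear relevance score."""
    return sum(wi * f for wi, f in zip(w, feats))


pairs = skip_above_pairs(SHOWN, CLICKED)
weights = train_pairwise(pairs)
reranked = sorted(SHOWN, key=lambda item: score(weights, item[1]), reverse=True)
print([name for name, _ in reranked])
```

The striped results sink to the bottom, but with only this one invented session the weight on text match also ends up slightly negative, so the unclicked polo edges past the clicked plain tee. A real system would need far more clicks, and a correction for position bias, before trusting such weights.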

------
arkanciscan
BUT THEY HAD MILK!

------
kebman
shirt -stripes Thank me later. :D

------
aaronsnoswell
Brilliant :)

------
renewiltord
This is amusing but not a problem.

------
inopinatus
Tortoise: But we must be careful in combining sentences. For instance you’d
grant that “Politicians lie” is true, wouldn’t you?

Achilles: Who could deny it?

Tortoise: Good. Likewise, “Cast-iron sinks” is a valid utterance, isn’t it?

Achilles: Indubitably.

Tortoise: Then, putting them together, we get “Politicians lie in cast iron
sinks”. Now that’s not the case, is it?

\---- Douglas Hofstadter, _Gödel, Escher, Bach: An Eternal Golden Braid._
Basic Books, 1979

------
WilliamEdward
querying "shirt no stripes" yields slightly better results.

------
JabavuAdams
bag of words?

------
aerovistae
This comment would go from "unreadable" to "interesting" if you had phrased it
as:

"Vaguely similar to a joke from _the movie_ Ninotchka that _the Slovenian
philosopher_ Zizek often uses...."

Give people context. Don't assume people know what you know.

~~~
stuartaxelowen
This feels like a perfect example of how contextual natural language is!

~~~
ForHackernews
Someday someone will develop a hipster AI capable of deciphering lefty
Twitter.

------
yters
Human level NLP is the halting problem, so unsurprising that AI cannot do
simple expressions.

------
lostmsu
You got it backwards. This is the opposite of the problem I and many other
people tend to have with search engines lately. I do not want the damn thing
to combine words, exclude unpopular ones, and search for synonyms without me
explicitly telling it to. As little semantics as you can, please.

If Google wants to group words by semantics, they should have a semantic
grouping operator. For example, "shirts (without stripes)". What if I am
looking for song lyrics with these exact words in random positions?

If what the author wants were implemented, it would make my experience with
Google even worse, unless it could also think for me. But then why would it
need me in the first place?
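A sketch of the grouping operator proposed above, assuming plain parentheses as the (hypothetical) syntax: only text inside parentheses would be handed to semantic interpretation, and everything else stays a literal keyword. No current engine is assumed to support this.

```python
import re


def split_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into literal keywords and '(...)' groups meant for semantic parsing.

    The parenthesis syntax is hypothetical, not an existing search-engine feature.
    """
    groups = [g.strip() for g in re.findall(r"\(([^()]*)\)", query)]
    literal = re.sub(r"\([^()]*\)", " ", query).split()
    return literal, groups


print(split_query("shirts (without stripes)"))  # (['shirts'], ['without stripes'])
print(split_query("shirts without stripes"))    # (['shirts', 'without', 'stripes'], [])
```

Without parentheses, every word stays literal, which is what someone searching for exact lyrics would want.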

