
A Google bot scrapes pricing info by adding items to carts - psim1
https://www.wsj.com/articles/who-is-the-mystery-shopper-leaving-behind-all-those-online-shopping-carts-11593617464
======
whoisjuan
This bot is simply trying to get the final price (with tax and shipping) which
is ridiculous because e-commerce storefronts should do that in the first place
without going through the whole checkout process.

I always have found that kind of shady but it's probably known to increase
conversions.

What I found interesting is that this is an open attack vector for e-commerce sites.
Multiple bots can hit a website and start adding items and start the checkout
process. This basically creates an unprecedented cart behavior data influx
that ruins any possible usage for data coming from legit customers. Maybe
cleaning the data wouldn't be that hard but if someone knows what they are
doing they can really make it hard (separate IPs, emails and cart behavior)

I doubt Shopify or Magento have anything to prevent this.

~~~
mmcconnell1618
Not all shipping charges can be calculated ahead of time. For example, you may
offer free shipping on orders over $50. You may charge $9.99 for the first
item, $5.99 for each additional item. You may charge by weight of the whole
order. You may have oversized items or packages that can be combined to reduce
shipping charges. Some items may ship together as OTR Freight, while others
can go via the local postal service. Buying multiple items changes this
calculation.

So, yes, you can estimate shipping for a single item but you can't always
present the per-item shipping charge as it depends on the context of the whole
order.
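
To make that dependence on the whole order concrete, here is a minimal sketch using the example numbers above (free shipping over $50, $9.99 for the first item, $5.99 for each additional one). The function name and the way the two rules are combined are assumptions for illustration, not any real store's logic.

```python
from decimal import Decimal

# Example figures taken from the comment above; how they combine is assumed.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
FIRST_ITEM_RATE = Decimal("9.99")
ADDITIONAL_ITEM_RATE = Decimal("5.99")


def order_shipping(item_prices):
    """Return the shipping charge for a whole order (hypothetical rules)."""
    if not item_prices:
        return Decimal("0.00")
    subtotal = sum(item_prices, Decimal("0.00"))
    # Orders over the threshold ship free, regardless of item count.
    if subtotal > FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    # Otherwise: flat rate for the first item plus a rate per extra item.
    return FIRST_ITEM_RATE + ADDITIONAL_ITEM_RATE * (len(item_prices) - 1)


one = order_shipping([Decimal("20.00")])
two = order_shipping([Decimal("20.00"), Decimal("20.00")])
three = order_shipping([Decimal("20.00")] * 3)
```

Under these assumed rules the same $20 item carries $9.99, $7.99, or nothing in shipping depending on what else is in the cart, so no single per-item figure is correct.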

~~~
chrisan
How does that change by having the bot add items to the cart? You haven't
solved anything

You are still left with the same scenario as if the store listed the
individual shipping price on the front page

Google isn't going to know what other items you _might_ add to show you a
"real" shipping cost

~~~
hrktb
I’d assume parent’s point is regarding the “which is ridiculous because
e-commerce storefronts should do that in the first place without going through
the whole checkout process.” part.

There are a lot of legitimate cases where showing the shipping price upfront is just
not doable or valuable to the customer.

BTW there are a surprising amount of shops for specialized goods that won’t
even list the final price at the end. The customer places an order, and they
update it with a finalized price after a human looks at the content, and from
there the customer is free to pay the transaction or give up the order.

~~~
zoomablemind
Even the Y2K-style ecommerce stores usually had a separate S&H section for
some guidance. These days the H part (handling) seems less in vogue (perhaps
still common on ebay), while S part is pretty predictable if not free.

It's the T (taxes) part that may be still a tipping point these days, but it's
just between vendor and your state.

~~~
hrktb
We are in agreement that there needs to be explanation on what's going on, and
not just "we'll set some price you won't know why".

In my experience, the most fluctuations were on international shipping by
small vendors. Lego bricks for instance, where it makes a big difference if
you request 5 small pieces that weigh 20g total and can wait 3 months, or if
it's 500+g in a middle sized box and you want it in 2 days.

Even with average indication on what to expect, depending on the combination
you are requesting the vendor might use a different carrier, different
shipping method and so on. They could make it more simple with a range of
arbitrary standard fees, but then it costs a lot more to the customer, putting
the vendor at a disadvantage price wise. In particular people have visceral
reactions to overly high shipping prices.

------
soganess
For people saying this is to calculate the final price with shipping and tax,
it's not (or at least not entirely). It is for this new sales conversion dark
pattern where prices aren't listed until you add to cart.

Ebay sellers are particularly bad offenders:
[https://www.ebay.com/itm/Open-Box-Certified-Samsung-Galaxy-1...](https://www.ebay.com/itm/Open-Box-Certified-Samsung-Galaxy-13-3-4K-Ultra-HD-Touch-Screen-Chromeboo/203028862820?epid=21037915306&hash=item2f45769764:g:IM4AAOSwq4Nesuii)

~~~
abiogenesis
Google disagrees with you:

> When The Wall Street Journal contacted Google in June, a spokesman at the
> internet giant, after a few days of digging, provided an update: The mystery
> shopper is a bot of its own creation. The purpose: making sure the all-in
> price for the product, including tax and shipping, matches the listing on
> its Google Shopping platform or in advertisements.

~~~
leeoniya
this is what we've seen as well. it validates that whatever price, promo,
shipping and taxes you've put into your feed is what ends up in the final
checkout and there's no bait-and-switch going on between the feed and reality.
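
A rough sketch of that kind of check, assuming the feed and the checkout page each yield a price, shipping, and tax figure. The dictionary keys and function names are hypothetical, and this is not Google's actual verification logic.

```python
from decimal import Decimal


def all_in_price(quote):
    """Sum price, shipping, and tax for one quote (keys are assumed)."""
    return quote["price"] + quote["shipping"] + quote["tax"]


def feed_matches_checkout(feed, checkout, tolerance=Decimal("0.00")):
    """True if the advertised all-in price equals the checkout total."""
    return abs(all_in_price(feed) - all_in_price(checkout)) <= tolerance


feed = {"price": Decimal("19.99"), "shipping": Decimal("0.00"), "tax": Decimal("1.60")}
honest = {"price": Decimal("19.99"), "shipping": Decimal("0.00"), "tax": Decimal("1.60")}
bait = {"price": Decimal("19.99"), "shipping": Decimal("8.00"), "tax": Decimal("1.60")}

honest_ok = feed_matches_checkout(feed, honest)
bait_ok = feed_matches_checkout(feed, bait)
```

The checkout totals can only be observed by actually walking the cart, which is why the check leaves carts behind.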

it's rather annoying because it creates dozens of "abandoned" carts per day
which we have to continually clear out (based on Google's known ip address
ranges) so our reps can go through actual abandoned carts.
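
A minimal sketch of that cleanup, assuming each abandoned cart records the IP address that created it. The CIDR blocks below are reserved documentation ranges used as placeholders, not Google's real ranges, and the cart structure is hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real bot
# ranges; substitute whatever ranges the merchant has actually identified.
BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_bot_ip(ip):
    """True if the address falls inside any configured bot network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOT_NETWORKS)


def human_carts(carts):
    """Keep only abandoned carts whose creating IP is not in a bot range."""
    return [c for c in carts if not is_bot_ip(c["ip"])]


carts = [
    {"id": 1, "ip": "192.0.2.17"},
    {"id": 2, "ip": "203.0.113.5"},
    {"id": 3, "ip": "198.51.100.200"},
]
remaining = human_carts(carts)
```

Only cart 2 survives the filter here, which is the list the reps would then work through.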

------
vmception
That sparked a funny idea in my head, what if we tricked product managers
industry wide to follow KPIs and A/B tests that resulted in a better user
experience for consumers, instead of experiences that coincidentally slightly
upticked "engagement".

Because it seems like this mystery shopper is already doing that.

~~~
maltelandwehr
„Messing up your competitor's A/B test“ is not unheard of as a tactic in highly
competitive ecommerce settings.

~~~
withinboredom
Do software engineers actually implement that? That seems pretty immoral. I'd
rather let them run the a/b test and steal whatever solution they end up with.

~~~
st1ck
I can't find reasons why this would be immoral. I'd say it's rather aggressive
and won't earn you good reputation for sure. But it's sort of fair game.
Compared to many business practices (lobbying, forced arbitration, patent
trolling, DMCA, price dumping etc.) this is extremely mild one.

~~~
habosa
Generally active sabotage is frowned upon as opposed to winning in fair
competition.

------
advisedwang
[http://archive.is/YRkQe](http://archive.is/YRkQe)

~~~
maltelandwehr
Thanks! I was not aware you could use Web Archive for that. All the more
reason to Love that site!

~~~
kqr
I'm not sure archive.is and archive.org are the same site.

~~~
mobilio
They're not the same!

------
yongjik
robots.txt, man, if you don't want search engines to visit certain parts of
your page, use robots.txt!

Once heard a tale of an angry site owner calling Google (back when Google
itself was novel) - Google deleted his whole website! Turned out he had a
"DELETE" button on each page, which generated a plain GET request. So Googlebot
visited the site, followed links to every page, and then of course followed
every link that generated GET requests - because they are supposed to be safe.

Don't be like that site owner.
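
For illustration, a tiny sketch of the fix: destructive actions only respond to POST, so a crawler following links with GET changes nothing. The `handle` function and the `/delete/` path are hypothetical, not taken from any real framework.

```python
# Methods a crawler may reasonably assume are read-only.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def handle(method, path, pages):
    """Return an HTTP status code; `pages` maps page names to content."""
    if path.startswith("/delete/"):
        if method != "POST":
            return 405  # Method Not Allowed: a crawler's GET deletes nothing
        pages.pop(path[len("/delete/"):], None)
        return 204  # deleted, no content to return
    name = path.lstrip("/")
    if method in SAFE_METHODS:
        return 200 if name in pages else 404
    return 405


pages = {"home": "...", "about": "..."}
crawl_status = handle("GET", "/delete/home", pages)
still_there = "home" in pages
```

With this shape, a bot that follows every link ends up with a 405 on the delete URL and the page is left intact.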

~~~
YetAnotherNick
How do I use robots.txt to tell google to not add item to the shopping cart?

~~~
yongjik
Erm... hide the shopping cart page behind robots.txt?

~~~
kabacha
As someone who has seen way too many robots.txt files that's exactly how you
do it.
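
As a sketch of what that looks like, the snippet below parses a small robots.txt with Python's standard `urllib.robotparser` and checks which URLs a compliant crawler may fetch. The `/cart` and `/checkout` paths and the `shop.example` host are assumptions, and the thread does not confirm whether Google's price-checking bot honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a store whose cart lives under /cart.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Product pages stay crawlable; cart and checkout URLs do not.
can_view_product = parser.can_fetch(
    "Googlebot", "https://shop.example/products/widget"
)
can_add_to_cart = parser.can_fetch("Googlebot", "https://shop.example/cart/add")
```

This only helps if adding to the cart goes through a URL under a disallowed path, and only against crawlers that choose to respect the file.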

------
justinwp
Protip: You will often get a discount coupon if you go through most of the
checkout process (need to provide email), but wait a couple days. Many stores
automate abandoned checkout promotions.

~~~
bradlys
Yes! This is also something that is common with smaller online retailers.
Don't expect this with B&H, Adorama, or Newegg. Frequently these small
companies give one time codes you won't see or be able to gain elsewhere.

------
danimal88
It's just price data collection. In particular, MAP policies can be skirted by
not publishing a final price but having a price below MAP in the cart which is
a common tactic that online sellers utilize. By pretending to walk through the
cart, all sorts of data about pricing, taxes, etc. can be learned. It's not
entirely uncommon to see different prices at different times, for different
user agents, for different locations, etc. Used to work for a company that
built huge price collection systems and built many of them...

~~~
Drdrdrq
MAP == Minimum Advertised Price

------
Alupis
The real problem with this is from the merchant side of things.

This bot generates thousands of "Abandoned Carts" on one of our sites...
thousands...

We send cart reminders to Abandoned Carts after a few days, sometimes with a
coupon offer to complete checkout.

This bot is responsible for thousands of bounced emails each week, which
impacts our metrics with Mandrill among other things.

Maybe we shouldn't care, but it's sloppy and ruins all sorts of stats we keep
track of regarding cart abandonment rates, recapture rates and more.

~~~
SquareWheel
>We send cart reminders to Abandoned Carts after a few days, sometimes with a
coupon offer to complete checkout.

I consider this spammy behaviour, and mark the emails as such. I can only hope
this discourages such practices in the future.

~~~
Alupis
It doesn't. If you mark it as Spam through most email programs, it's reported
to the sender (Mandrill in our case) and Mandrill automatically black-lists
your email address so we don't continue to send to someone that doesn't want
the emails.

That's a win-win.

~~~
matchbok
Still an annoying and anti-consumer practice. Another "growth marketing"
tactic that doesn't take into account the number of people who never visit
that site again because of the spammy stuff.

~~~
Alupis
The overwhelming majority of folks aren't so principled as to black-ball a
website they like, selling products they like, from brands they like, and
prices they like all because they received a cart reminder email with a
special coupon inside.

Maybe you are? Just don't project that onto everyone else.

------
rkagerer
Are there legal implications to Google bots transacting with websites under
false pretenses?

I mean their normal web crawler identifies itself as such. Here, I feel like
they're committing (very) minor fraud by putting in fake shopper information
and actively hiding their identity. Not a big deal if it were just some Joe
Schmoe somewhere, but at their scale might it border on harassment? The robot
equivalent of a prank call?

~~~
the_pwner224
Probably a violation of the CFAA. Lots of people hate it because they think
it's overreaching, and lots of companies use it to legally threaten scrapers
and security research. But in this case Google is doing mass unauthorized use
of other people's computers.

~~~
shadowgovt
If I'm doing price comparison between online vendors, I will---as a human---
put some items in the cart and get right to the edge of checkout to determine
what my final bill would be. I may not close the sale if I'm looking at a
better option elsewhere.

How is what I'm doing materially different from what Google's doing? Is scale
a factor that matters for CFAA?

~~~
lmm
Maybe you _are_ violating the CFAA by doing that? It's a very broad law.

------
vmateixeira
Genuine question, is this not considered a DoS attack?

Let's imagine I have my online stock linked to limited physical items/assets,
ex tickets for a show, which will get reserved for a period of time. This will
be preventing genuine clients from buying them.
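
A small sketch of how that plays out, assuming stock is held per cart for a fixed time. The `TicketHolds` class and its parameters are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class TicketHolds:
    """Limited stock where each cart holds one unit until its hold expires."""

    def __init__(self, stock, hold_seconds):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.holds = {}  # cart_id -> expiry timestamp (seconds)

    def _expire(self, now):
        # Drop holds whose expiry time has passed.
        self.holds = {c: t for c, t in self.holds.items() if t > now}

    def available(self, now):
        self._expire(now)
        return self.stock - len(self.holds)

    def reserve(self, cart_id, now):
        """Try to hold one unit for cart_id; False if everything is held."""
        if self.available(now) <= 0:
            return False
        self.holds[cart_id] = now + self.hold_seconds
        return True


show = TicketHolds(stock=2, hold_seconds=600)
bot_a = show.reserve("bot-1", now=0)
bot_b = show.reserve("bot-2", now=0)
human_blocked = show.reserve("human", now=10)
human_later = show.reserve("human", now=601)
```

Two abandoned carts are enough to lock the genuine buyer out until the holds time out, which is the denial-of-service effect in question.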

~~~
Mizza
I'm thinking - if I forbid this in my site's Terms of Service, will DoJ go
after Google for CFAA violations like they did to Aaron?

~~~
vmateixeira
Yeah.. probably depend$ on how _loud_ you can make yourself heard..

RIP Aaron

------
tacon
Would it be too much for Google to program the bot to get the final price, and
then delete all the items from the cart? Seems rather rude, even for Google.

~~~
disposekinetics
Is abandoning a cart really rude behavior? I sometimes do it just to see if
they'll spam me as a test of if I want to do business with a site.

~~~
jawns
It's not rude at a consumer level, where (in general) you're at least
considering making the purchase. It's arguably rude at a bot level, depending
on the frequency, where there is 0% chance of conversion.

~~~
dragonwriter
The entire purpose of the bot is to provide listings to consumers who are
looking to buy.

If it was a consumer journalist doing it to get the price for a news article (in
a for-profit publication) about the product, would it be “rude”? If not, how
is it for Google bot?

~~~
jsnell
Because bots will do it at a much larger scale than individual humans. The
first law of web robotics applies here: the bot should not harm the website
it's crawling, or through inaction allow it to come to harm.

I didn't read the article due to the paywall, but I assume that the problem is
that these goods are reserved for that (non)-customer
until the shopping cart times out? That is directly costing the merchant
money, either in lost sales or having to maintain extra inventory.

So yeah, that bot really should have been programmed to end the session with
an empty basket one way or another.

------
leoh
Such a bot could be used to damage ad tracking

------
doe88
I wouldn't fault them for that, I've observed some sites most likely are
gaming the system by detecting and providing Google bots with artificially
lower prices so that they would appear in indexes summaries and then when you
access the product, its real price is always higher than the one reported in
the index.

~~~
dylz
yep, I see this type of behaviour constantly - faked prices for Gbot, fake
prices on Cache, significantly higher price for end user.

It's also infuriating to sort by price and get inflated fake shipping prices
to "make up the total"

------
madmax108
I used to work at a company that provided APIs used for
search/personalization/autosuggest for a whole bunch of huge e-commerce
companies. Since the entire integration with the customer site was API based,
we worked off of tracking pixels, API requests and cookies to determine
shopping behaviour. A lot of this went into determining things like ranking
(If someone searches "Tshirt" what shows up on the first page and in what
order etc.)

Since we were only running search and not payment processing, the tracking
pixel/API for "Add to Cart" was a big thing for us. The whole product ran on
revenue-share so we were paid per X ATCs

Interesting to see if any of the customers were affected by bots doing ATC and
how it was handled if it was.

------
aaron695
Digital shopping cart abandonment/Inventory Exhaustion/Hoarder bots is an
interesting type of DDOS.

There's a popular movement of people using it atm
[https://heavy.com/news/2020/06/shopping-card-abandonment-tik...](https://heavy.com/news/2020/06/shopping-card-abandonment-tiktok/)

------
amelius
It would be cool if Google could manage to become a storefront for the entire
web, thereby eliminating Amazon.

~~~
murgindrag
For Google (or anyone) to become a storefront for the entire web, they'd need
to handle scams (and errors) well.

eBay is a cesspool. Aliexpress is worse. Random web sites are bad. Amazon
isn't perfect, but it's better.

Amazon also has customer service; they've always made me whole. Random web
sites, I'm basically SOL. Aliexpress and eBay are random. Someone flips a
coin, heads seller wins, tails buyer wins, regardless of who the scammer is.

I mostly buy from Amazon since my odds of not having problems are that much
higher.

~~~
ihumanable
Exactly this, the customer service for the average consumer from Amazon is
very difficult to beat and is Google's biggest weakness.

Bought some cables from Amazon Basic, one ended up not working, another had
some cosmetic damage but works fine. They refunded both, sent out
replacements, and just told me to discard them, it wasn't worth it for Amazon
to pay to have it shipped back.

Of course if you abuse this too much Amazon will ban you. If you are an honest
consumer though, their customer service generally provides a great experience.

I still remember a time when everyone was afraid of purchasing stuff over the
internet, Amazon has so greatly reduced the friction and concern that
sometimes I find myself going from "hmm, I need something" to "it will be here
tomorrow" in the matter of a minute or two.

Although more competition in this space would ultimately benefit the consumer,
it seems unlikely that Google is going to be the source of that competition.
They've got shopping results integrated into their search engine, and it's a
feature I've maybe browsed from time to time, but I often just end up
searching and purchasing on amazon directly. I don't know if I would be super
comfortable purchasing from Google in the same way that I am with Amazon, too
many horror stories of App Developers / YouTube Creators / etc getting caught
in some sort of Machine Learning Customer Support system.

Curious if others use the Google Shopping thing in the search engine and what
their experiences are with it.

~~~
amelius
> Exactly this, the customer service for the average consumer from Amazon is
> very difficult to beat and is Google's biggest weakness.

Amazon's customer service is a robot, which switches to someone in a
callcenter in India, and then finally switches to a local person. I know
because I recently had to contact them.

Not sure how this is "difficult to beat".

~~~
murgindrag
It's difficult to beat because prices are a race to the bottom, and small
players have no effective way to build up and manage reputations.

If I need a widget, and Vendor A charges a buck, while Vendor B charges two
bucks, all else being equal, I'll buy from Vendor A. Bad customer service
helps both vendors compete with each other, but prevents small companies,
collectively, from competing with Amazon.

On eBay, small players do manage reputations, but only for a few weeks. If a
product fails (or is discovered to be a fake) after 60 days, the seller is all
good. Next sucker! There are things I'll buy there, but far more I won't.

Google itself has the problem that culturally, it relies on algorithms which
know better than you do, and is not a service company. It does great tech, but
holds human beings outside of Google in open contempt. That's fine for running
a search engine, adwords, or gmail, but it crashes-and-burns for ecommerce.

------
caser
This feels like a great way to get data on how all these different e-commerce
companies approach remarketing.

------
Keyframe
I think I've seen most Google's technologies dissected and/or explained in
detail over the years. Lots of their own papers too. If you look into how and
what they're doing regarding data collection, including scraping, there's
nothing.

------
baybal2
Funny, one quick gig I did in my college years was to write shopping bot
protection against "guaranteed lowest price" scrapers like tigerdirect, or RFD.

Back then, the goal was exactly the opposite.

------
Youden
When and why did news cease being news and start being short stories and
opinion? This entire article could have been cut down to the last few
paragraphs and nothing of value would have been lost.

Look at The New York Times in 1921 [0]. Generally the stories are factual and
to the point. The entire front page seems to be pure news. There's very little
storytelling here, at most there are a few timelines of events.

Look at The New York Times today [1]. There's a bunch of factual and useful
Coronavirus information but ~15% of the page is dedicated to "Opinion", the
second article appears to be pure speculation, the third article is a bunch of
storytime fluff around a little bit of news and the front page has a mix of
actual news and opinion pieces being passed off as news.

When did this happen? Why? Did people lose interest in actual news? Is there
less actual news to report?

Perhaps this is regional? Take for example the story about the San Quentin
prison. NYTimes [2] has the same drawn out nonsense as this Google story while
Aljazeera [3] adds a lot of background but sticks to factual reporting.

[0]:
[https://archive.org/details/NYTimes_jul16_31_1921](https://archive.org/details/NYTimes_jul16_31_1921)

[1]: [http://archive.is/oiiXU](http://archive.is/oiiXU)

[2]: [https://www.nytimes.com/2020/06/30/us/san-quentin-prison-cor...](https://www.nytimes.com/2020/06/30/us/san-quentin-prison-coronavirus.html)

[3]: [https://www.aljazeera.com/news/2020/07/san-quentin-prison-se...](https://www.aljazeera.com/news/2020/07/san-quentin-prison-sees-600-coronavirus-cases-5-days-200701192059040.html)

~~~
supernova87a
Maybe you don't know this, but the "A-hed" article of the WSJ is the humorous,
light-hearted take on some cultural phenomenon that appears every couple of
days. It's got a distinct separation (graphically) from the rest of the news,
and is written not to be taken too seriously. (It's not so apparent in the
online version, if you haven't read it before).

So you don't have to worry that it's some broad decline in journalistic
standards (at least based on this)... The WSJ is one of the few quite
reputable news rooms out there.

You can read about A-hed articles here:
[https://www.wsj.com/articles/SB10001424052702303362404575580...](https://www.wsj.com/articles/SB10001424052702303362404575580494180594982)

And there was even a book published a few years ago with collections of these
kinds of amusing stories:
[https://www.amazon.com/Floating-Off-Page-Stories-Journals/dp...](https://www.amazon.com/Floating-Off-Page-Stories-Journals/dp/074322664X)

~~~
harry8
> The WSJ is one of the few quite reputable news rooms out there.

The WSJ is owned by Rupert Murdoch. The credibility of their newsroom begins
being compromised by his owning it. He will destroy its credibility utterly by
selling it for political influence in news reporting. Just as he has
everywhere he has bought media. The particular example of compromised
credibility that comes to mind is the Times of London which is now Murdoch
propaganda (albeit vastly more polite than fox news) where it used to do
credible news reporting. Times reporting now can still be excellent but has a
"be cautious" flag on it that it used not to have in the days prior to Murdoch.
The man has become vastly worse in the past couple of decades as has
everything he touches.

~~~
amadeuspagel
Murdoch bought the WSJ in 2007. When is he going to start destroying its
credibility "utterly"?

~~~
SquishyPanda23
Uh have you read their commentary/opinions? Half the time they come off as if
they're trolling.

I'm sure at one point they were a thinking man's newspaper. At this point
they're just fan service for people who have drunk the koolaid but can't
stomach Fox's mass market approach.

~~~
harry8
To be fair the WSJ has always had some pretty outlandish opinion pieces. The
tradition was that these were separate to the news reporting and the news
reporting was untouched by them. But now it's in Murdoch's stable. Sad.

------
ycombonator
Google product Growth hack: Fake it Until you make it

------
hbarka
Didn’t some #tiktokteens do the same with some guy’s web store?

~~~
moneywoes
Sorry, what is the context here?

------
s1k3s
Is this supposed to intrigue me? Good bot

------
tudorw
Nice, I think it has my CC details )

------
ardy42
> When The Wall Street Journal contacted Google in June, a spokesman at the
> internet giant, after a few days of digging, provided an update: The mystery
> shopper is a bot of its own creation.

> The purpose: making sure the all-in price for the product, including tax and
> shipping, matches the listing on its Google Shopping platform or in
> advertisements. It wasn’t to cause angst to merchants due to thousands of
> abandoned carts.

> “We use automated systems to ensure consumers are getting accurate pricing
> information from our merchants,” a company spokesman said. “This sometimes
> leads to merchants seeing abandoned carts as a result of our system testing
> whether the price displayed matches the price at checkout.”

You'd think they could have better identified themselves in accounts they were
creating rather than creating this mysterious "John Smith" persona. Maybe
"GoogleBot PriceVerifier" would have been a better choice.

edit: remove my inaccurate confusion about something, and fix quotes that I'd
copied from a plagiarized version of the article.

~~~
bluGill
They need to be non traceable. If I'm doing something underhanded with pricing
information I want to detect Google and other such bots and give them
different information.

~~~
inetknght
You really think it's wise to lie to your customers?

~~~
its_dario
No, and that's not their point.

They're saying if they were to lie to their customers, they'd want to make
sure they're deceiving Google. In that case, having an easy way to detect that
it's Google would make that trivial.

------
Animats
Now even the WSJ has clickbait titles. Should have been "Google price-checking
system annoying merchants".

~~~
hyperrail
This is an A Hed, one of the Wall Street Journal's daily funny news stories on
the front page. Other recent ones include:

* Baseball Stadiums Are Closed to Fans - but This Guy's Balcony Is Open for Business

* Americans Craving Contact Ponder New Rules for Throwing a Party in Real Life

* When Your Best Friend in Quarantine Is a Squirrel, You May Be Going Nuts

* Beware of Falling Tofu: China Takes on High-Altitude Littering

* Did You Forget Things During Lockdown? So Did People With Superior Memories

In that context I don't have a problem with the title "Who Is the Mystery
Shopper Leaving Behind Thousands of Online Shopping Carts?".

~~~
agustif
Hahaha That's the NYT website headlines nowadays LOL

------
abofh
Google.

Saved you a click.

~~~
lawnchair_larry
Thanks. “A Google bot scrapes pricing info by adding items to carts” could
have replaced that whole fairy tale that they wanted us to pay for.

~~~
dang
Ok, we'll use yours. Thanks!

I kind of liked the mystery shopper angle, but since there's more than one
complaint in the thread, the guidelines win (" _Please use the original title,
unless it is misleading or linkbait_ ")

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
bravoetch
TL;DR - it's a google bot

~~~
whoisjuan
This is the problem that I have with HN editorializing titles. This comment
made perfect sense and was useful before they changed the title, but now that
it is changed it looks like the poster is an idiot who is just saying what the
title says and some people downvote it.

I know HN is not very keen on adding features, but this is one that is missing
for the sake of transparency (seeing if the original title was editorialized)

I understand that the original title was click-bait trash and this one makes
sense, but it would be nice to understand how it changed so certain comments
don’t get de-contextualized.

But I guess it is the same problem with editing comments.

