A Google bot scrapes pricing info by adding items to carts (wsj.com)
362 points by psim1 87 days ago | 277 comments



This bot is simply trying to get the final price (with tax and shipping) which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

I always have found that kind of shady but it's probably known to increase conversions.

What I found interesting is that this is an open attack vector for e-commerce sites. Multiple bots can hit a website, add items, and start the checkout process. This creates an influx of bogus cart-behavior data that ruins any possible use of the data coming from legitimate customers. Maybe cleaning the data wouldn't be that hard, but someone who knows what they are doing can make it really hard (separate IPs, emails, and cart behavior).

I doubt Shopify or Magento have anything to prevent this.


Not all shipping charges can be calculated ahead of time. For example, you may offer free shipping on orders over $50. You may charge $9.99 for the first item, $5.99 for each additional item. You may charge by weight of the whole order. You may have oversized items or packages that can be combined to reduce shipping charges. Some items may ship together as OTR Freight, while others can go via the local postal service. Buying multiple items changes this calculation.

So, yes, you can estimate shipping for a single item but you can't always present the per-item shipping charge as it depends on the context of the whole order.
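To make the order-level dependence concrete, here is a minimal sketch in Python. It combines two of the example rules above (free shipping on orders over $50, otherwise $9.99 for the first item and $5.99 for each additional one). Combining them this way, and the name `cart_shipping`, are assumptions for illustration, not any real store's policy.

```python
from decimal import Decimal

# Hypothetical policy values taken from the examples in the comment above.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
FIRST_ITEM_FEE = Decimal("9.99")
ADDITIONAL_ITEM_FEE = Decimal("5.99")


def cart_shipping(item_prices):
    """Return the shipping charge for a whole cart of item prices."""
    if not item_prices:
        return Decimal("0.00")
    subtotal = sum(item_prices, Decimal("0.00"))
    if subtotal > FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    # First item pays the full fee, each further item pays the reduced fee.
    return FIRST_ITEM_FEE + ADDITIONAL_ITEM_FEE * (len(item_prices) - 1)


one = cart_shipping([Decimal("20.00")])
two = cart_shipping([Decimal("20.00")] * 2)
three = cart_shipping([Decimal("20.00")] * 3)
print(one, two, three)
```

The same $20 item costs 9.99 to ship alone, contributes 5.99 as a second item, and ships free once the cart passes the threshold, so there is no single per-item shipping figure to show on the product page.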


What this poster said.

Yes, a lot of smaller e-commerce platforms could do this, but finalizing order value can be a very complex workflow for bigger merchants with more varied sku mixes.

I’ve worked in multi-billion dollar Ecom companies where the programs to refine the order checkout process get scoped as a multi-year effort accounting for a couple of decades of legacy cruft... even if you separate the “product/tax/shipping” calculations from the “customer/credit/rewards” dependencies. But it’s often not worth separating them because they’re very inter-dependent. More so when you involve drop shipping or made-to-order things.


How does that change by having the bot add items to the cart? You haven't solved anything

You are still left with the same scenario as if the store listed the individual shipping price on the front page

Google isn't going to know what other items you _might_ add to show you a "real" shipping cost


I’d assume parent’s point is regarding the “which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.” part.

There are a lot of legitimate cases where showing the shipping price upfront is just not doable or valuable to the customer.

BTW there are a surprising amount of shops for specialized goods that won’t even list the final price at the end. The customer places an order, and they update it with a finalized price after a human looks at the content, and from there the customer is free to pay the transaction or give up the order.


Even the Y2K-style ecommerce stores usually had a separate S&H section for some guidance. These days the H part (handling) seems less in vogue (perhaps still common on ebay), while S part is pretty predictable if not free.

It's the T (taxes) part that may still be a tipping point these days, but that's just between the vendor and your state.


We are in agreement that there needs to be an explanation of what's going on, and not just "we'll set some price you won't know why".

In my experience, the most fluctuations were on international shipping by small vendors. Lego bricks for instance, where it makes a big difference if you request 5 small pieces that weigh 20g total and can wait 3 months, or if it's 500+g in a middle-sized box and you want it in 2 days.

Even with average indication on what to expect, depending on the combination you are requesting the vendor might use a different carrier, different shipping method and so on. They could make it more simple with a range of arbitrary standard fees, but then it costs a lot more to the customer, putting the vendor at a disadvantage price wise. In particular people have visceral reactions to overly high shipping prices.


And what's even more interesting: a human would do exactly the same thing.

Add items to the cart, check the total price, and then decide whether to buy. (I remember trying to order some stuff from a Japanese plamo store that didn't provide exact prices before checkout. I went through the process, but even the cheapest delivery option was way too high, about 2x the price of the whole order.)


I've had that sort of experience with a Japanese store once--the prices were good but the shipping (the only thing they offered was international FedEx) killed it. The US is just as bad--we don't have slow international options.


It matters because if you purchase multiple things, your average shipping cost per item changes. If you only calculate shipping based on the first item's shipping cost, it will be inaccurate.


But Google is only showing a result for a single item.


No one said the bot is getting good data. I assume it's trying to get the best possible outcome by adding to the cart, but I doubt it's getting the real final price for many merchants even by doing that.


You could GeoIP the user's IP address and display an initial tax + shipping estimate.


Tax isn't always only region-based. You also need to account for VAT, which has a bunch of conditions around it that you can't assume.


Include the statistical average tax in the price, and then up it or down it slightly at check out like a tax return. The user will be happy that the advertised price is a good estimate of the final price. Same for shipping.


that's true that the calculations can get complicated pretty quickly in ecommerce, but google probably has all the data it needs (origin zip, destination zip, likely carrier(s), possibly even the weight/size of each item) to provide a pretty good estimate in most cases. they could even calculate a range for [1 item per box, all items in 1 box].

the important bit is to present it as a separate line item (with grand total) so that consumers can decide how much to trust the estimate.

that would be an even clearer shot across the bow of amazon, walmart, and the like, who provide comparison across their own platform merchants, but not across all merchants everywhere.
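A rough sketch of the range idea above, in Python: the low end assumes all items share one box, the high end assumes one box per item. The rate function `box_rate` and its numbers are made up for illustration; a real estimate would come from carrier rate data.

```python
def box_rate(weight_kg, base_fee=6.0, per_kg=1.5):
    """Hypothetical carrier rate: a fixed fee per box plus a per-kg charge."""
    return base_fee + per_kg * weight_kg


def shipping_range(item_weights_kg):
    """Return (low, high): all items in one box vs. one box per item."""
    low = box_rate(sum(item_weights_kg))
    high = sum(box_rate(w) for w in item_weights_kg)
    return low, high


def format_estimate(item_weights_kg):
    """Render the range as a separate, clearly labeled line item."""
    low, high = shipping_range(item_weights_kg)
    return f"Shipping (estimate): ${low:.2f} - ${high:.2f}"


print(format_estimate([1.0, 2.0, 1.0]))
```

Because each box carries a fixed fee in this toy model, consolidating always gives the lower bound; with dimensional-weight pricing that would not necessarily hold.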


Google guessing seems like a terrible idea. That will just confuse the consumer when they go to purchase and find a different value, possibly creating a customer service problem for the vendor through no fault of their own.


True, but you can at least show the total shipping cost for the shopping cart, given a zip code or a similar indication, without completing the whole account creation/checkout process.


Is Google indexing the shipping cost?

I don't see how this is relevant because ecommerce sites will change the price in the cart (or reveal it) before shipping is even calculated.


Even assuming you already know the customer's shipping address and ignore the multiple-items problem, this is still difficult to accomplish from a computational complexity perspective. Calculating shipping cost is likely at least an order of magnitude more expensive than simply looking up list prices in a database - you have to look up the customer's address, go through a bunch of tax rules, figure out the shipping cost (however that works, I honestly don't know but I assume it's non-trivial), etc. Now consider the fact that prices displayed at checkout make up a tiny fraction of prices that are requested by the site. Every time an item appears anywhere on the site you probably want to display a price with it. So now your infrastructure costs for handling pricing requests go up by an order of magnitude since all of them now require expensive pricing computation, whereas only a tiny fraction did before.

On top of all this, if all you're displaying is list price you can cache that very effectively and significantly reduce the load on your backend, probably by at least another order of magnitude. As with many things, items loaded on ecommerce sites tend to follow a Pareto distribution, for which caching is very effective. Adding a shipping address to the mix will destroy this caching ability, so not only are your requests 10x more expensive, 10x more of them now make it to the backend. There are various tricks you can do to try to have your cake and eat it too, but none of them are easy or simple. At the end of the day, while this is definitely a useful and desirable feature for customers, it has significant cost for both development time and hardware.

TL;DR this is actually a much more difficult technical problem to solve cost effectively at scale than it initially appears.
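A toy illustration of the caching point, in Python. The two price functions and their return values are placeholders; the only thing being shown is how the cache key changes the number of backend computations.

```python
from functools import lru_cache


def count_backend_calls(skus, postal_codes):
    """Count cache misses when every SKU is viewed from every postal code."""
    calls = {"list": 0, "landed": 0}

    @lru_cache(maxsize=None)
    def list_price(sku):
        # Cache key is the SKU alone: one miss per product.
        calls["list"] += 1
        return 1000  # placeholder price in cents

    @lru_cache(maxsize=None)
    def landed_price(sku, postal_code):
        # Cache key is (SKU, destination): one miss per combination.
        calls["landed"] += 1
        return 1000 + 599  # placeholder price plus placeholder shipping

    for sku in skus:
        for postal_code in postal_codes:
            list_price(sku)
            landed_price(sku, postal_code)
    return calls


print(count_backend_calls(["A", "B"], ["10001", "94105", "60601"]))
```

With 2 SKUs and 3 destinations the list-price cache misses twice while the destination-aware cache misses six times; with real catalogs and tens of thousands of postal codes, the hot set no longer fits the cache the way a Pareto-distributed SKU list does.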


In many cases, this is not only computationally expensive on the server, it also requires one or more requests to external APIs which further slows things down. Imagine needing to query the API of your tax vendor, then also your shipping provider of choice for every single item displayed on a page. Even if you did this client side, asynchronously, it would be a lot of extra requests for something that most shoppers won't even pay attention to.

I ran an ecommerce platform company for many years and you had merchants with very complex shipping and tax schemes, or you had merchants that made it super simple with a basic rate table. The complex merchants had margin on every order at the cost of processing external API calls. The simple rate table merchants had great margin on some, lost money on others but were happy with their average shipping margin.


Having worked with a major eCom platform as well, this is exactly the standard case. Both shipping and tax are complex problems which do not have a simple solution for scraping by a search engine.

Shipping is often highly dependent on the location of the buyer and often involves full estimate calls to each carrier's API (USPS, FedEx, UPS). The only major data point I would focus on is whether the shipping is free or flat rate.

Tax is even more complicated. Merchants often outsource tax calculations to a third-party service such as Avalara, which calculates unbelievably complex taxing schemes even down to the zip code, as tax laws are becoming increasingly more complex.

Because of these reasons taxes and shipping are not widely useful data points for search engines. That may change in the future, however. I could imagine it becoming another SEO topic to be accounted for, similar to meta tags on product pages.


Well we can change this by including shipping as the total price and not give deals on shipping. Deals on shipping are dark patterns.


I don't understand. So if the item is $5 and shipping is $5 regardless of the number of items being bought, then if I bought 2 I should pay $20 instead of $15?


What don't you understand about deals on shipping being dark patterns?


It seems that you are saying that properly charging for shipping is a dark pattern? Or do you mean a different kind of deal? Can you elaborate? As far as I know, the shipping charge doesn't scale linearly with the number of items, so to me including shipping within the item's price is going to overcharge the customer.
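The arithmetic from the $5 item / $5 shipping example, spelled out in Python. The function names are just for illustration.

```python
def total_flat_shipping(item_price, quantity, flat_shipping):
    """Shipping is charged once per order, regardless of quantity."""
    return item_price * quantity + flat_shipping


def total_baked_in(item_price, quantity, per_item_shipping):
    """Shipping is folded into each item's displayed price."""
    return (item_price + per_item_shipping) * quantity


print(total_flat_shipping(5, 2, 5))  # two $5 items plus one $5 shipping charge
print(total_baked_in(5, 2, 5))       # two items listed at $10 each
```

For one item both come to $10; at two items the baked-in price is $20 against $15, and the gap grows by the shipping amount with every extra item.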


> This bot is simply trying to get the final price (with tax and shipping) which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

It's usually not possible because you don't know how much the shipping + taxes are until the customer enters the billing information.


I just used a website that had a simple form at the bottom of the cart. It had one text input for the postal code, and a button to get the rates to that postal code based on what was in my cart. IMHO, this is how it's done right, since all you need to know is general location and weight.


Maybe I haven’t done enough online shopping recently, but as far back as I remember this used to be the norm: enter postcode to calculate shipping, get precise final price without even adding to cart. Is it not the case anymore?


On many websites it still is. But recently many smaller independent stores use the Shopify platform where shipping is the penultimate step, before billing. You have to give address, email(!), phone number(!, mandatory), etc. before getting the price. I normally just use a fake email and number to get shipping price, and then do the actual checkout in incognito. Pretty sure if you do enter your real info and don't continue with the purchase then you'll get email spam telling you to buy stuff.


Also increasing conversion rates via the sunk-cost fallacy. If I see up front that the air conditioner I'm ordering costs $30 to ship, I'll check another site. But if I already decided on this one and I just did all the work to go find my credit card, enter my billing info - maybe I'll just say "ehhh. Fine." and purchase it anyways.


Phone number is required or otherwise highly encouraged by some shippers, like FedEx.


I'm pretty sure most sites (e.g. Amazon) do the same thing. Probably for the reason you mentioned: so they can have your contact information to send you spam later.


My girlfriend uses this as a pretty effective tactic to get discounts - you just have to wait a couple of days and they'll send you an email with a lower price to keep down abandonment rates


Haven't received any "items in your cart" emails from Amazon personally, but I will end up seeing ads for the products later on.


Is that US only? You need country too.


For commercial scale products you need more than that. The full address. Is the address residential vs commercial with a loading dock? That and more factors impact the shipping price a lot! Logistics companies have people who have to research an address and look at Google Earth photos of the property to answer these questions.


I will bail from the purchasing process on sites that are unwilling to give me a final price before I enter payment information.

A postal code should suffice, and I'm not providing more personal information if the site is unwilling to say what I'll be charged upfront.


Isn't that most of them? Unless you assume they eat variable shipping by 'fronting' you a fixed price. Apologies if I'm misunderstanding what you're saying.


It is still typical (for which I am thankful), but that seems to be shifting a bit.

A relatively common example would be shops using the Shopify platform.


Yeah that's true. But from a UX perspective there are ways to make this less opaque. Perhaps a call to action at the top of the listing with an entry box to enter the Zip Code so an approximate final price can be calculated.

Any good UX designer can come up with a solution for this in a couple of hours or less. There's just no motivation to make it happen because this obfuscation of data is particularly optimized and useful for the sellers.


Amazon just lists it as a "subtotal," which I think is probably the best way to do it. I don't know about the "UX designer in a couple of hours" line: automatic shipping is a nightmarish bag of worms and isn't really a UX problem. What do you do if they order multiple SKU's that don't pack nicely into one box or are warehoused in different locations? Or if there is something weird about their shipping address? What do you use for box size and weight, and what approximation did you use for dunnage?

You can approximate UPS/FedEx costs by fitting a trend line; they are decently modeled by a linear (base charge + K*distance) function, but when you go to buy the label you might be way off. This puts you in the lose-lose-lose of either eating the difference of wrong estimates, overcharging for shipping and losing conversions, or just making people hate you by increasing the shipping over the estimate. Making a shipping API call is noticeably slow, so most people require an interaction after the address is entered.
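For reference, fitting that kind of trend line is a plain least-squares fit. A minimal sketch in Python, assuming you have past (distance, label cost) pairs; the sample numbers below are invented and lie exactly on a line, which real label costs will not.

```python
def fit_line(distances, costs):
    """Ordinary least squares for cost = base + k * distance."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, costs))
    k = sxy / sxx
    base = mean_y - k * mean_x
    return base, k


def estimate_cost(base, k, distance):
    """Predict a label cost from the fitted line."""
    return base + k * distance


# Hypothetical history: distance in miles, label cost in dollars.
history_miles = [100, 500, 1000, 2000]
history_cost = [8.4, 10.0, 12.0, 16.0]

base, k = fit_line(history_miles, history_cost)
print(round(base, 2), round(k, 4), round(estimate_cost(base, k, 1500), 2))
```

The fit only captures distance; weight, box size, residential surcharges and so on show up as the error you find out about when you buy the label.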

Tim Sweeney's hot take was that "the two hardest problems in computer science are cache invalidation and shopping carts!"


e-commerce sites seem to be asking for my location all the time anyways.

If they're doing that anyways, they should have everything they need to hazard a pretty good guess (and then they have an actual inducement for me to provide it).


Agree!


ZIP code box that doesn't oblige you to provide any more accurate data, and also without it, it should still be possible to display brackets. "Shipping: $6 - $24, [enter ZIP code for detailed quote]".


They could at least include the tax in the price. That's normally fixed depending on the item category e.g. low or high tax rate. Only the US is being weird with its taxes.


Tax depends on where the buyer is located.


Canada has provincial sales taxes.

It's really not that weird when you consider the US is not a unitary state.


I don't know if this is true, but I've been to many websites that actually claim that they have an extra deal they can't show you until you put the item in the cart. I used to see those a lot a few years ago, not sure if it was a real legal thing or just a trick to get people to add it, but that was definitely a thing and not related to tax+shipping.


It is a contractual obligation with the product manufacturer.

Something like: we will allow you to sell our product but only if you don't advertise discounts more than x%.


> It's usually not possible because you don't know how much the shipping + taxes are until the customer enters the billing information.

Sure, they show you different content depending on your IP address and lots of shady heuristics, but when it comes to estimating a shipping cost, it is absolutely impossible: you could be anywhere, who knows where you are. I say bullshit.


All EU prices must be tax included.

All hospitality prices must include cleaning and service fees.

It is only the US where hidden fees and charges may apply and the price regulation, if any, is more tilted towards corporations.

Only upfront shipping charge is tricky because it depends on so many factors.


I find not including tax better tbh. That way people are reminded of how much tax they are paying with every purchase.


Every EU bill should come with the price without VAT (value-added tax) and with the VAT applied, leading to the same result.

The only difference is that the VAT is a certain % depending on your country (mostly 15-25%), making it easy for a merchant to calculate it with a single data point. Percentage is also always a fixed round number, so you can calculate it in your head when you stumble upon "+ VAT" somewhere without providing that data point to the merchant for calculation purposes.
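As a small illustration of the single-data-point calculation, here is a sketch in Python. The 20% rate is just an example value, and rounding rules differ between jurisdictions, so treat the half-up rounding here as an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def vat_breakdown(net_price, vat_rate):
    """Return (net, vat, gross), each rounded to cents."""
    net = net_price.quantize(CENT, rounding=ROUND_HALF_UP)
    vat = (net * vat_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return net, vat, net + vat


net, vat, gross = vat_breakdown(Decimal("100"), Decimal("0.20"))
print(f"Net {net} + VAT {vat} = {gross}")
```

The merchant only needs the rate for the buyer's country to produce both lines of the bill.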


Yea, you can still get the info, but how many people really care? I am just saying people would be more aware of how much they are being taxed. IMO making it obvious like the US does would be more helpful even if it is not as convenient.

I used to live in Turkey, where taxes on some stuff are insane but most people won't ever know because it is not as obvious. In a similar vein, income tax is also quite hidden in Turkey, at least from the perspective of the employee.


Yeah. It's easy in the EU.

My favorite example of how absurd it can get in the US: one side of my friend's street has 9.5% sales tax, the other side has 7.25%. same state, city, and zip.


You are right, normally VAT is a fixed integer value. But that is not guaranteed; historically we had decimal values, too. (My father encountered this once at IBM in the time of punch cards - turns out, having to change this on machines with limited RAM is quite difficult, akin to the Y2K problem.)


The UK was 17.5% VAT for years.


We don't break out how much the purchase price was reduced by the use of public goods, so it seems like breaking out the tax would be more misleading than informative.


Not only that, but certain brands on certain sites won't show the price, with that message "add to cart to see the price!"

I've heard varying explanations as to why, but at the end of the day it doesn't matter. Adding to the cart is the only way to scrape the price.


How do you show a final price if you don't have all the information needed: tax locality, shipping preference, total cart value discounts, etc.?


If I'm remembering right, Best Buy used to have "deals" on items that they "couldn't show you" until item was in cart. They may still be doing this. Best Buy's justification for it was that its agreements with manufacturers prevented it from displaying items below certain prices on their site. I'd never seen this elsewhere to know how pervasive these agreements were (or if Best Buy was just taking losses on certain items).


> Best Buy used to have "deals" on items that they "couldn't show you" until item was in cart. They may still be doing this. Best Buy's justification for it was that its agreements with manufacturers prevented it from displaying items below certain prices on their site. I'd never seen this elsewhere to know how pervasive these agreements were

This was also pretty common on Amazon.


Newegg still does this. Here's an item from their 4th of July sale.

https://www.newegg.com/p/2AM-008Y-00003?Item=9SIAEG2BMZ4393


KitchenAid is famous for these kinds of agreements as a measure to ensure the public perception of their value doesn’t go down whenever there are sales.


I used to build and manage ecomm sites. We had several manufacturers/brands that we had to agree not to openly display a retail price below a certain amount.

Incidentally, for almost every brand in our industry, that number was 1.8X listed wholesale.

We could sell for less, but not list a price of less. Implementing "add to cart to see price" was good enough at the time to keep them happy.
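Roughly what that rule looks like in code, as a Python sketch. The 1.8x multiplier is the figure from the comment above; the function name and the exact label text are made up.

```python
from decimal import Decimal

# From the comment above: minimum advertised price was 1.8x listed wholesale.
MAP_MULTIPLIER = Decimal("1.8")


def listing_price_label(sale_price, wholesale_price):
    """Decide what the product page may show for a given sale price."""
    minimum_advertised = wholesale_price * MAP_MULTIPLIER
    if sale_price < minimum_advertised:
        # Selling below MAP is allowed, openly displaying the price is not.
        return "Add to cart to see price"
    return f"${sale_price:.2f}"


print(listing_price_label(Decimal("170.00"), Decimal("100.00")))
print(listing_price_label(Decimal("180.00"), Decimal("100.00")))
```

The cart itself would still show the real sale price in both cases; only the catalog listing is gated.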


They still do it.


Makes you wonder whether the smart thing to do is just make it convenient for the bots to get the info out so they won't ruin your data and waste your bandwidth. Can't fight them, join them. Can't really stop free information flow.


Or they could stop spying on their customers and trying to figure out how to add dark patterns to maximize "engagement" and "conversion" lol.

I mean yeah, I get that if you inadvertently make the checkout button hard to find, you'll lose potential sales, but I don't think you need intricate data about what your customers are doing to figure that out.


Not sure about dark patterns, but you're talking about magnitudes of tens if not hundreds of thousands of visitors here.

Increasing conversion rate by even a few percentage points has huge revenue implications.


They already do that, there’s all kinds of XML and JSON and whatnot standards to communicate product info, inventory, whatnot. The reason Google is doing this is because this information cannot be trusted all the time, there will always be bad actors.

The process may eventually evolve into a cat-and-mouse game, where malicious e-commerce sites try to detect these Google crawlers and serve different price info to them, but let’s hope it doesn’t get this far.


Given the possibility you get detected and it impacts your organic search ranking... I'm not sure any serious vendor would risk it. And if they do, let them burn.


Or quietly detect the bots and feed them junk data after they've gone through the hoops. Not saying it's the better option, but knowing the business maybe the more likely one.


> Or quietly detect the bots and feed them junk data

Just a few customers with alternative browsers being detected as "bots" will poison your reputation and income stream.


No, it definitely won’t. In fact our card payment system has been updated to think alternative os and browsers are high-risk and even decline payment. We didn’t have any revenue loss.


That's why bots now mimic users beyond user agents, even going so far as loading page assets and JavaScript. Unless you're using something like reCAPTCHA v3, it's going to be difficult to detect them, and even that requires some interactions first.


As someone who uses an alternative browser… I kinda doubt it. As a group we wouldn’t even move the analytics needle let alone revenue.


Nah. Google doesn't even let people log into their own email with alternative browsers, and they are doing fine.


Which is why it should be a chrome extension (like Honey); exfiltrate the data out while providing the end user financial benefits. Messing with the data breaks the user experience and impacts revenue of the target site.


There's a lot that goes into when and where you can show the final item price.

Assuming a simple product, you don't know where the user lives so you can't apply the correct taxes yet. In AUS this is easy because it's the same nationally. But in the US there are dozens of tax combinations that could be applied depending on the location.

Shipping obviously depends on where you live and what you're buying, and few places charge per item shipping these days so it doesn't even make sense to include shipping in a single item cost.

Then you have customer group discounts, some customers get different pricing when they are logged in, even item combinations in the cart can have different prices, you get the idea. It is usually not possible to calculate ahead of time.


>which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

On top of the other reasons mentioned, the seller may have a contract with the manufacturer that covers a "minimum advertised price"

"The FTC says that the price displayed in a secure or encrypted shopping cart isn’t subject to MAP because it’s technically not advertising."

(https://www.thebalancesmb.com/what-is-minimum-advertised-pri...)


It seems like it will also mess up item availability information. If an item has limited stock, bots adding it to carts could make it appear out of stock to real customers.


Most sites don't subtract it from inventory until it's on a completed order.

The main exceptions I can think of are venue and plane tickets, and hotel rooms. These might put a hold on a specific piece of inventory for a short time. They usually tell you when they do.


As a developer of automotive ecommerce sites, I can say it is very common for sites to list a higher price in the catalog and show the true (generally lower) price in the cart. This is because the business models are highly margin-sensitive, and competitive pricing can have a big impact, so it's a measure to try to mask real pricing.


Also, seems like a clever hack for automated scraping; after all, most carts are pretty uniform in their structure.


> e-commerce storefronts should do that in the first place without going through the whole checkout process

Yes but how would you verify this or hold them accountable?


There's a tiktok meme doing this to harass the Trump campaign's online store.


For people saying this is to calculate the final price with shipping and tax, it's not (or at least not entirely). It is for this new sales conversion dark pattern where prices aren't listed until you add to cart.

Ebay sellers are particularly bad offenders: https://www.ebay.com/itm/Open-Box-Certified-Samsung-Galaxy-1...


Google disagrees with you:

> When The Wall Street Journal contacted Google in June, a spokesman at the internet giant, after a few days of digging, provided an update: The mystery shopper is a bot of its own creation. The purpose: making sure the all-in price for the product, including tax and shipping, matches the listing on its Google Shopping platform or in advertisements.


this is what we've seen as well. it validates that whatever price, promo, shipping and taxes you've put into your feed is what ends up in the final checkout and there's no bait-and-switch going on between the feed and reality.

it's rather annoying because it creates dozens of "abandoned" carts per day which we have to continually clear out (based on Google's known ip address ranges) so our reps can go through actual abandoned carts.
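A sketch of that kind of cleanup in Python, using the standard `ipaddress` module. The networks below are reserved documentation ranges used as placeholders, not Google's actual ranges, and the cart record shape (`id`, `ip`) is assumed.

```python
import ipaddress

# Placeholder networks (reserved documentation blocks), NOT real crawler ranges.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_ip(ip):
    """True if the address falls inside any configured crawler network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in CRAWLER_NETWORKS)


def real_abandoned_carts(carts):
    """Keep only carts whose originating IP is outside the crawler networks."""
    return [cart for cart in carts if not is_crawler_ip(cart["ip"])]


carts = [
    {"id": 1, "ip": "192.0.2.17"},
    {"id": 2, "ip": "203.0.113.5"},
    {"id": 3, "ip": "198.51.100.200"},
]
print([cart["id"] for cart in real_abandoned_carts(carts)])
```

In practice the network list would have to be refreshed from whatever range list you trust, since a stale list silently lets bot carts back into the queue.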


This is more likely just a contractual MAP (Minimum Advertised Price) policy by the manufacturer, not a dark pattern that is of the retailer's choosing.

https://www.thebalancesmb.com/what-is-minimum-advertised-pri...


I personally believe that anything that can be automated by software should be automated by software. If it takes programmatically clicking exactly the same A, B, C, D sequence to display what the user wants, that clicking should be done by the machine, not the human.


What am I missing on here? That item has the price listed without having to Add To Cart.


The modal that pops up is not in the dom until you click the "See details" link, which has target="javascript:;". The "Add to cart" button is an actual link. I wouldn't be surprised if Google just doesn't want to run javascript to extract pricing information if it doesn't necessarily have to.


That clearly has a see details button that shows the price.


Most dark patterns have a non-intuitive way of circumventing them (the small-font faded-color "no, thank you" button comes to mind). That is Ebay's.

Other examples here: https://ux.stackexchange.com/questions/83050/price-too-low-t...

Amazon example from a few years ago: https://lh5.googleusercontent.com/ztyT6xTPaTr9TtP8LwlRJBE6RV...


That sparked a funny idea in my head, what if we tricked product managers industry wide to follow KPIs and A/B tests that resulted in a better user experience for consumers, instead of experiences that coincidentally slightly upticked "engagement".

Because it seems like this mystery shopper is already doing that.


"Messing up your competitor's A/B test" is not unheard of as a tactic in highly competitive ecommerce settings.


Do software engineers actually implement that? That seems pretty immoral. I'd rather let them run the a/b test and steal whatever solution they end up with.


I can't find reasons why this would be immoral. I'd say it's rather aggressive and won't earn you a good reputation for sure. But it's sort of fair game. Compared to many business practices (lobbying, forced arbitration, patent trolling, DMCA, price dumping etc.) this is an extremely mild one.


Generally active sabotage is frowned upon as opposed to winning in fair competition.


Eh, I am sure you could convince yourself it isn't immoral... everyone in HN seems to think things like google analytics are bad because of the privacy implications, and doesn't have a problem blocking them (which would also 'mess with a/b tests'). You could just argue that you are hindering their user spying.

Not a great argument, but good enough to allow a developer to sleep at night.


True, but this is not "messing up your competitors A/B test".


In some contexts you do that or you're fired. Some people can't afford to be fired, so they do it.


Consider companies like Uber....


Given that engagement metrics have been heavily interfered with for many years, as a result of bots and other activities, and yet PMs still rely on them it seems unlikely that they will be pulled away from that spectacle anytime soon. I like your idea though.


> "and yet PMs still rely on them it seems unlikely that they will be pulled away from that spectacle anytime soon."

I think they meant, since PMs will never stop using metrics, we should write bots that skew those metrics in favor of an experience for the consumer rather than the perceived increase in engagement.


What are some examples of good KPIs and A/B tests for better consumer user experiences? Engagement is obviously deeply flawed if a good consumer user experience is your goal, but it does have the nice property of being easily measured. Do you rely on users constantly rating their experience on a numeric scale?



Thanks! I was not aware you could use Web Archive for that. All the more reason to love that site!


I'm not sure archive.is and archive.org are the same site.


They're not the same!


robots.txt, man, if you don't want search engines to visit certain parts of your site, use robots.txt!

Once heard a tale of an angry site owner calling Google (back when Google itself was novel) - Google deleted his whole website! Turned out he had a "DELETE" button on each page, which generated a plain GET request. So Googlebot visited the site, followed links to every page, and then of course followed every link that generated GET requests - because they are supposed to be safe.

Don't be like that site owner.


That has nothing to do with robots.txt, the problem is doing things in response to GET requests. I've said it before and I'll say it again: you do not do things on GET requests.
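
To make that concrete, here's a minimal sketch of a handler that refuses to delete anything unless the request is a POST, so a crawler that only follows links can't trigger it. This isn't anyone's real code: the `handle` function, the `/delete/` path and the in-memory `pages` dict are all made up for illustration.

```python
# Methods that are supposed to be "safe": they must not change server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def handle(method: str, path: str, pages: dict) -> int:
    """Return an HTTP status code for a toy site stored in `pages` (hypothetical)."""
    if path.startswith("/delete/"):
        # Destructive action: only accept it on POST, never on a followed link.
        if method != "POST":
            return 405  # Method Not Allowed
        pages.pop(path[len("/delete/"):], None)
        return 303  # redirect after a successful POST
    if method in SAFE_METHODS:
        return 200 if path.lstrip("/") in pages else 404
    return 405
```

A real site would also need authentication in front of the delete action; this only shows the GET vs. POST part.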


I think you're thinking of this: https://thedailywtf.com/articles/The_Spider_of_Doom - which obviously has two issues: the auth, and the actions on GET.


I'm curious though, why didn't Google properly tag their robot in this case? Or did the reporter not know about user agents? It seems strange for them to crawl with a "mystery bot".


How do I use robots.txt to tell Google not to add items to the shopping cart?


Well, theoretically, your Add To Cart button could have an href with a path that’s banned in robots.txt, but overridden with JS.

But most online stores should be happy to have Google crawling their prices and showing up under the Shopping results.


Erm... hide the shopping cart page behind robots.txt?


As someone who has seen way too many robots.txt files that's exactly how you do it.
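
For anyone who wants to check what such a rule would actually cover, here's a rough sketch using Python's standard `urllib.robotparser`. The `/cart` and `/checkout` paths and the `shop.example` host are assumptions for the example, not anything from a real store, and robots.txt is only advisory: it helps with crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides the cart and checkout pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if a crawler honoring robots.txt may fetch `path` on the example host."""
    return parser.can_fetch(agent, "https://shop.example" + path)


print(allowed("/products/widget"))  # product pages stay crawlable
print(allowed("/cart/add"))         # cart URLs are disallowed
```

Note that `Disallow` is a prefix match, so `/cart` would also cover something like `/cartoons`.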


Protip: You will often get a discount coupon if you go through most of the checkout process (you need to provide an email) and then wait a couple of days. Many stores automate abandoned checkout promotions.


Yes! This is also something that is common with smaller online retailers. Don't expect this with B&H, Adorama, or Newegg. Frequently these small companies give one time codes you won't see or be able to gain elsewhere.


For a while there were registrars that gave a discount when you abandoned your cart.


A good example of "proof of work" used for price differentiation.


It's just price data collection. In particular, MAP policies can be skirted by not publishing a final price but having a price below MAP in the cart, which is a common tactic among online sellers. By pretending to walk through the cart, all sorts of data about pricing, taxes, etc. can be learned. It's not entirely uncommon to see different prices at different times, for different user agents, for different locations, etc. I used to work for a company that built huge price collection systems, and I built many of them...


MAP == Minimum Advertised Price


The real problem with this is from the merchant side of things.

This bot generates thousands of "Abandoned Carts" on one of our sites... thousands...

We send cart reminders to Abandoned Carts after a few days, sometimes with a coupon offer to complete checkout.

This bot is responsible for thousands of bounced emails each week, which impacts our metrics with Mandrill among other things.

Maybe we shouldn't care, but it's sloppy and ruins all sorts of stats we keep track of regarding cart abandonment rates, recapture rates and more.


>We send cart reminders to Abandoned Carts after a few days, sometimes with a coupon offer to complete checkout.

I consider this spammy behaviour, and mark the emails as such. I can only hope this discourages such practices in the future.


It doesn't. If you mark it as Spam through most email programs, it's reported to the sender (Mandrill in our case) and Mandrill automatically black-lists your email address so we don't continue to send to someone that doesn't want the emails.

That's a win-win.


Still an annoying and anti-consumer practice. Another "growth marketing" tactic that doesn't take into account the number of people who never visit that site again because of the spammy stuff.


The overwhelming majority of folks aren't so principled as to black-ball a website they like, selling products they like, from brands they like, and prices they like all because they received a cart reminder email with a special coupon inside.

Maybe you are? Just don't project that onto everyone else.


Do your users consent to contact before you send them reminders or coupons? If not, you've earned your bounce rate.


Of course we have consent. Not sure what kind of question that is?

Violate CAN-SPAM Act and risk a $16,000 fine per instance? No legit business is going to do that.

Just ask Papa Johns how painful those fines/settlements can be[1].

[1] https://topclassactions.com/lawsuit-settlements/lawsuit-news...


I personally find the 'email abandoned carts' behavior to be a dark pattern


Dark pattern or not, it's super effective - particularly when accompanied with a coupon ;)

Besides, the user is opting-in to receiving these emails. They don't have to provide an email address - so some are probably playing the game and seeing if they get a coupon or not.

As an aside - if the internet worked the way SV hipster brogrammers thought it should work, nobody would use it.

Yes, ads are crazy effective - you can ignore or block them, we don't care because enough people don't block them and are happy to click.

Yes, emails are crazy effective - you can ignore or opt-out or never opt-in, we don't care, you just cost us money if you're not engaged anyway so we'd rather you not be on our mailing list.


So you are saying these bot accounts have opted in to receiving emails? You don't validate the email when someone signs up?


The site terms and conditions are displayed very publicly and are accessible to anyone who cares to read them.

By entering your email address you are opting-in for transaction related emails, including new order confirmations, shipment notifications, and yes cart reminders. It's spelled out for you.

It's the same for almost every ecommerce site.

That's different than marketing emails, which require a separate explicit opt-in - ie. the user has to go and type their email address into another form and click "Sign up".

It doesn't get any more transparent than that.

Don't call something a dark pattern just because you can't be bothered to understand what you're consenting to when using someone else's website and start entering information like your email address or more. That's entirely on you.


I am not sure what your conception of a dark pattern is, but the idea that adding something to a TOS means it can't be a dark pattern is simply false.

The whole concept of a dark pattern is about UX choices that lead people to agree to things that they don't actually want; it isn't about whether you break your TOS or not.

I am saying, I think that if you asked people point blank "do you want websites to email you reminders if you leave something in the cart?", most people (myself included) would say, "no, I don't want to get that email"

You can put whatever you want in the TOS, but it doesn't mean users like it, and a user agreeing to something doesn't mean they like all the things they are agreeing to.

Whether it is "on me" or not, it is still a dark pattern.

Also, as a user entering my email, I am not promising you that I will always accept email to that address from you. If you try to send me email and I reject it, that is 'on you' to deal with what that rejection does to your spam scores.


That's a one-time rejection, and is rightfully treated like an unsubscribe request. No harm done to either party - and the merchant is actually happy since we don't want to bother you. Guess what? Angry customers don't buy your products. Seems intuitive.

You might think people don't want these emails just because you don't want these emails. That's pretty biased.

Ask any ecommerce company - these emails, both marketing and transactional, are wanted by most people shopping on the site. The statistics simply prove your argument is false.

This is a common theme among techies. You don't like something... say ads... so you assume nobody should like them and they should be done away with entirely. That's an absurdly short view of the world.


I never said no one wants them; I was very careful to say that I considered it a dark pattern, not that it was a certain dark pattern.

Also, I don't think any amount of statistics is going to be able to show you if people truly want emails or ads... just because they increase sales doesn't mean people want them. Ads can be both effective and unwanted.


> I never said no one wants them; I was very careful to say that I considered it a dark pattern, not that it was a certain dark pattern.

OK fine, perhaps I interpreted your OP incorrectly. That's your prerogative, and you can do whatever you please.

> Also, I don't think any amount of statistics is going to be able to show you if people truly want emails or ads... just because they increase sales doesn't mean people want them. Ads can be both effective and unwanted.

True, but we're not just talking about increased sales.

We, and most companies that are serious about this stuff, track open rates, number of times the same person opened the same email, click rates, text link vs image link clicks, page dwell time after clicking through, session length and bounce rate, which pages they browse, which products they view, were the products related to the email that initiated the session, did they add something to their cart, how frequently this individual engages with our content, how long they were dormant, order frequency, etc.

Basically, we're interested in how "Engaged" you are with the site, brand(s), products and content. People who open every email, click on a bunch of links, hang out on the site for 20 minutes and add stuff to their cart are highly engaged, and are doing actions that indicate they like what they are seeing/receiving.

Remember, the people signing up for marketing emails are the most likely to be engaged with your brand/product/website. They've actively said, yes, please send me content from your company. If they lose interest some day, no problem, either our stats will show this and we'll unsubscribe them automatically, or they'll actively unsubscribe themselves.

Ads might be a different beast - however, you'd be surprised how many people click ads, and then buy products. It's immense. Clearly, the value provided there was getting the right product in front of them, matching what they were looking for, and offering it at a price that's attractive to that customer. In this scenario, I'd say it's wanted too - they got what they were looking for quickly and effectively. Everyone is happy in that scenario.

Everyone else can just run an ad blocker and choose not to subscribe to marketing emails. I do both myself... but I'd never assert these things were unwanted by a lot of people or ineffective.


> By entering your email address you are opting-in for transaction related emails

You realize that you can't send marketing emails without double opt-in, right? "Hey, you forgot to buy these ..." is definitely a marketing email.


> You realize that you can't send marketing emails without double opt-in, right?

That's not true.

> "Hey, you forgot to buy these ..." is definitely a marketing email

Also untrue.


Can’t you useragent sniff the bot and cut it off?

If you want help coding this, or advice, I'm happy to help (for free).


If you cut it off, you get penalized in Google Merchant Tools, and possibly have your product feed suppressed, which will dramatically impact your search visibility for both text and product searches. It can also impact your Google Ads if you link that with your product feed, and more.

So, effectively, no you cannot cut this bot off.

To make it worse, the bot doesn't always follow the same pattern. Sometimes slightly different names, addresses, etc.

We initially thought it was fraud attempts, but none of them actually attempt a checkout. They just enter all their info on the checkout page, get the final quote, and bail.

It would have been nice if Google told people about this instead of it just happening. Or allowed you to schedule a time slot for it to do what it's going to do.


It sounds like you interpreted malux85's suggestion as cutting the bot off from putting things in the cart altogether. What I understood is that, given it's recognizable by user agent, those carts can be marked as created by a bot, so they are only excluded from statistics and reminder mails.
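
Something like this is what I have in mind, as a sketch only. It assumes the bot is recognizable by its user agent, which may not hold if it doesn't identify itself; the `Cart` fields and the `BOT_UA_MARKERS` tuple are invented for the example.

```python
from dataclasses import dataclass

# Assumption: lowercase substrings that identify a bot in the User-Agent header.
BOT_UA_MARKERS = ("googlebot",)


@dataclass
class Cart:
    cart_id: str
    email: str
    user_agent: str
    completed: bool = False


def is_bot(cart: Cart) -> bool:
    """Flag a cart whose user agent contains any known bot marker."""
    ua = cart.user_agent.lower()
    return any(marker in ua for marker in BOT_UA_MARKERS)


def reminder_candidates(carts):
    """Abandoned carts that should still get a reminder email (bots excluded)."""
    return [c for c in carts if not c.completed and not is_bot(c)]


def abandonment_rate(carts):
    """Share of abandoned carts among human carts only."""
    human = [c for c in carts if not is_bot(c)]
    if not human:
        return 0.0
    return sum(not c.completed for c in human) / len(human)
```

The bot can keep adding items to carts; its carts just never reach the mailer or the stats.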


Perhaps. Not everyone is lucky enough to have built their own ecommerce platform, so many people are at the whims of whatever tools Shopify, BigCommerce, 3dCart or others provide.

For this particular problem, none of those platforms can provide any assistance.


I wonder why Google didn't include a "cleanup" routine that empties the shopping cart after the data is collected. It seems like it would be a trivial thing to do, unless I am missing something. I guess the answer is because it would not benefit them in any way.


We'd still have an abandoned cart, since the session was created and held some sort of data - but it would be far less disruptive for sure.

Empty Abandoned Carts are useless anyway (for stats and other things - we track bounce rates in other ways), so that would be a large improvement from what is going on right now.


Good lord, that sucks, hopefully a google engineer is in this thread...


What kind of emails does the bot use?


johnsmithus95@gmail.com john.smithus74@gmail.com johnsmith.us43@gmail.com

and more variations...


Are there legal implications to Google bots transacting with websites under false pretenses?

I mean their normal web crawler identifies itself as such. Here, I feel like they're committing (very) minor fraud by putting in fake shopper information and actively hiding their identity. Not a big deal if it were just some Joe Schmoe somewhere, but at their scale might it border on harassment? The robot equivalent of a prank call?


Probably a violation of the CFAA. Lots of people hate it because they think it's overreaching, and lots of companies use it to legally threaten scrapers and security research. But in this case Google is doing mass unauthorized use of other people's computers.


I think that's outdated information. ToS violations aren't prosecutable under CFAA since April.[1]

1. https://www.eff.org/deeplinks/2020/04/federal-judge-rules-it...


If I'm doing price comparison between online vendors, I will---as a human---put some items in the cart and get right to the edge of checkout to determine what my final bill would be. I may not close the sale if I'm looking at a better option elsewhere.

How is what I'm doing materially different from what Google's doing? Is scale a factor that matters for CFAA?


Maybe you are violating the CFAA by doing that? It's a very broad law.


I think the FTC should introduce a law that says that shops should be more transparent about their prices. That would solve the entire problem in the first place.


You should worry more about sellers engaging in anti-competitive behavior like bait-and-switch or price fixing.


Genuine question, is this not considered a DoS attack?

Let's imagine I have my online stock linked to limited physical items/assets, ex tickets for a show, which will get reserved for a period of time. This will be preventing genuine clients from buying them.


I'm thinking - if I forbid this in my site's Terms of Service, will DoJ go after Google for CFAA violations like they did to Aaron?


Yeah.. probably depend$ on how loud you can make yourself heard..

RIP Aaron


You can always update your robots.txt or block the Googlebot UA. (lol)


Possibly it is lower traffic than a full-on DoS?


Yes, in regards to traffic. But it's still denying me from providing a service to real customers.


Would it be too much for Google to program the bot to get the final price, and then delete all the items from the cart? Seems rather rude, even for Google.


I abandon carts more often than not. Pretty much for the same reason as the bot: I wanna know how much I'm actually getting charged with taxes, shipping and coupons. I'll do similar orders on multiple stores, and only finalize the best deal if I'm satisfied with it. Sometimes I just "walk away" because nobody's selling at my pricepoint.

Is this rude? I really don't care.


Do you consider visiting a site and then leaving the site before looking at anything there rude? People can't change their mind?

Putting something in a cart and leaving it there should be inconsequential to the seller. The only thing it might affect is their analytics. By the way, I consider analyzing my site visits and other data about me to be much more rude than abandoning a cart, especially when most sites don't even tell me they're doing it.


Is abandoning a cart really rude behavior? I sometimes do it just to see if they'll spam me as a test of if I want to do business with a site.


It's not rude at a consumer level, where (in general) you're at least considering making the purchase. It's arguably rude at a bot level, depending on the frequency, where there is 0% chance of conversion.


The entire purpose of the bot is to provide listings to consumers who are looking to buy.

If it was a consumer journalist doing it to get the price for a news article (in a for-profit publication) about the product, would it be “rude”? If not, how is it for a Google bot?


Because bots will do it at a much larger scale than individual humans. The first law of web robotics applies here: the bot should not harm the website it's crawling, or through inaction allow it to come to harm.

I didn't read the article due to the paywall, but I assume that the problem is that these goods are reserved for that (non)-customer until the shopping cart times out? That is directly costing the merchant money, either in lost sales or having to maintain extra inventory.

So yeah, that bot really should have been programmed to end the session with an empty basket one way or another.
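
Here's a toy model of what I'm assuming happens, since I don't know how any particular store implements it. The `Inventory` class, the one-unit-per-cart rule and the `hold_seconds` timeout are all hypothetical.

```python
import time


class Inventory:
    """Toy stock counter where each cart holds one unit until its hold expires."""

    def __init__(self, stock: int, hold_seconds: float = 900):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.holds = {}  # cart_id -> expiry timestamp

    def _expire(self, now: float) -> None:
        # Drop holds whose timeout has passed.
        for cart_id, expires in list(self.holds.items()):
            if expires <= now:
                del self.holds[cart_id]

    def available(self, now=None) -> int:
        now = time.time() if now is None else now
        self._expire(now)
        return self.stock - len(self.holds)

    def reserve(self, cart_id: str, now=None) -> bool:
        """Hold one unit for `cart_id`; fails if nothing is available."""
        now = time.time() if now is None else now
        if self.available(now) <= 0:
            return False
        self.holds[cart_id] = now + self.hold_seconds
        return True

    def release(self, cart_id: str) -> None:
        """What a polite bot would trigger by emptying its cart."""
        self.holds.pop(cart_id, None)
```

With one item in stock, a cart that is never emptied blocks everyone else until the hold times out, while an explicit `release` frees the item straight away.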


An abandoned cart reminder email sent a few hours later has a ridiculously high conversion rate - around 15% in my experience. Online vendors aren’t going to stop that practice, especially when the big e-commerce platforms make it easy to do.


Reason enough that they should be illegal or at least strictly and specifically opt in, not as part of general marketing consent.

If a customer really wants or needs something they’ll go back and buy it. The world doesn’t need the excess consumption of people psychologically manipulated into buying stuff they weren’t going to.


Such a bot could be used to damage ad tracking


I wouldn't fault them for that. I've observed that some sites are most likely gaming the system by detecting Google bots and serving them artificially lower prices, so that they appear in index summaries. Then, when you access the product, its real price is always higher than the one reported in the index.


yep, I see this type of behaviour constantly - faked prices for Gbot, fake prices on Cache, significantly higher price for end user.

It's also infuriating to sort by price and get inflated fake shipping prices to "make up the total"


I used to work at a company that provided APIs used for search/personalization/autosuggest for a whole bunch of huge e-commerce companies. Since the entire integration with the customer site was API based, we worked off of tracking pixels, API requests and cookies to determine shopping behaviour. A lot of this went into determining things like ranking (If someone searches "Tshirt" what shows up on the first page and in what order etc.)

Since we were only running search and not payment processing, the tracking pixel/API for "Add to Cart" was a big thing for us. The whole product ran on revenue-share so we were paid per X ATCs

Interesting to see if any of the customers were affected by bots doing ATC and how it was handled if it was.


Digital shopping cart abandonment/Inventory Exhaustion/Hoarder bots is an interesting type of DDOS.

There's a popular moment of people using it atm https://heavy.com/news/2020/06/shopping-card-abandonment-tik...


It would be cool if Google could manage to become a storefront for the entire web, thereby eliminating Amazon.


Between the two, personally, I would rather an advertising company like Google be eliminated, or at least be regulated to protect user privacy. For me, it's easy enough to avoid Amazon, harder to avoid Google since every website uses them to spy on users.


For Google (or anyone) to become a storefront for the entire web, they'd need to handle scams (and errors) well.

eBay is a cesspool. Aliexpress is worse. Random web sites are bad. Amazon isn't perfect, but it's better.

Amazon also has customer service; they've always made me whole. Random web sites, I'm basically SOL. Aliexpress and eBay are random. Someone flips a coin, heads seller wins, tails buyer wins, regardless of who the scammer is.

I mostly buy from Amazon since my odds of not having problems are that much higher.


Exactly this, the customer service for the average consumer from Amazon is very difficult to beat and is Google's biggest weakness.

Bought some cables from Amazon Basic, one ended up not working, another had some cosmetic damage but works fine. They refunded both, sent out replacements, and just told me to discard them, it wasn't worth it for Amazon to pay to have it shipped back.

Of course if you abuse this too much Amazon will ban you. If you are an honest consumer though, their customer service generally provides a great experience.

I still remember a time when everyone was afraid of purchasing stuff over the internet, Amazon has so greatly reduced the friction and concern that sometimes I find myself going from "hmm, I need something" to "it will be here tomorrow" in the matter of a minute or two.

Although more competition in this space would ultimately benefit the consumer, it seems unlikely that Google is going to be the source of that competition. They've got shopping results integrated into their search engine, and it's a feature I've maybe browsed from time to time, but I often just end up searching and purchasing on amazon directly. I don't know if I would be super comfortable purchasing from Google in the same way that I am with Amazon, too many horror stories of App Developers / YouTube Creators / etc getting caught in some sort of Machine Learning Customer Support system.

Curious if others use the Google Shopping thing in the search engine and what their experiences are with it.


> Exactly this, the customer service for the average consumer from Amazon is very difficult to beat and is Google's biggest weakness.

Amazon's customer service is a robot, which switches to someone in a callcenter in India, and then finally switches to a local person. I know because I recently had to contact them.

Not sure how this is "difficult to beat".


It's difficult to beat because prices are a race to the bottom, and small players have no effective way to build up and manage reputations.

If I need a widget, and Vendor A charges a buck, while Vendor B charges two bucks, all else being equal, I'll buy from Vendor A. Bad customer service helps both vendors compete with each other, but prevents small companies, collectively, from competing with Amazon.

On eBay, small players do manage reputations, but only for a few weeks. If a product fails (or is discovered to be a fake) after 60 days, the seller is all good. Next sucker! There are things I'll buy there, but far more I won't.

Google itself has the problem that culturally, it relies on algorithms which know better than you do, and is not a service company. It does great tech, but holds human beings outside of Google in open contempt. That's fine for running a search engine, adwords, or gmail, but it crashes-and-burns for ecommerce.


It's difficult to beat because those people can issue refunds. Now you need to deal with buyers scamming the system, sellers scamming the system, customer service employees scamming the system, and idiots. That's a lot of complexity to balance that can cost you a lot of money if not done well. In an industry with very low margins.


You and I have a very different definition of "cool".


I sure would love for my ability to buy goods to be blocked by some badly coded ML algorithm with my only recourse being to yell about it on social media. Yeah, I'll take Amazon any day of the week.


This feels like a great way to get data on how all these different e-commerce companies approach remarketing.


I think I've seen most of Google's technologies dissected and/or explained in detail over the years. Lots of their own papers too. If you look into how and what they're doing regarding data collection, including scraping, there's nothing.


Funny, one quick gig I did in my college years was to write shopping bot protection against "guaranteed lowest price" scrapers like tigerdirect or RFD.

Back then, the goal was exactly the opposite.


When and why did news cease being news and start being short stories and opinion? This entire article could have been cut down to the last few paragraphs and nothing of value would have been lost.

Look at The New York Times in 1921 [0]. Generally the stories are factual and to the point. The entire front page seems to be pure news. There's very little storytelling here, at most there are a few timelines of events.

Look at The New York Times today [1]. There's a bunch of factual and useful Coronavirus information but ~15% of the page is dedicated to "Opinion", the second article appears to be pure speculation, the third article is a bunch of storytime fluff around a little bit of news and the front page has a mix of actual news and opinion pieces being passed off as news.

When did this happen? Why? Did people lose interest in actual news? Is there less actual news to report?

Perhaps this is regional? Take for example the story about the San Quentin prison. NYTimes [2] has the same drawn out nonsense as this Google story while Aljazeera [3] adds a lot of background but sticks to factual reporting.

[0]: https://archive.org/details/NYTimes_jul16_31_1921

[1]: http://archive.is/oiiXU

[2]: https://www.nytimes.com/2020/06/30/us/san-quentin-prison-cor...

[3]: https://www.aljazeera.com/news/2020/07/san-quentin-prison-se...


Maybe you don't know this, but the "A-hed" article of the WSJ is the humorous, light-hearted take on some cultural phenomenon that appears every couple of days. It's got a distinct separation (graphically) from the rest of the news, and is written not to be taken too seriously. (It's not so apparent in the online version, if you haven't read it before).

So you don't have to worry that it's some broad decline in journalistic standards (at least based on this)... The WSJ is one of the few quite reputable news rooms out there.

You can read about A-hed articles here: https://www.wsj.com/articles/SB10001424052702303362404575580...

And there was even a book published a few years ago with collections of these kinds of amusing stories: https://www.amazon.com/Floating-Off-Page-Stories-Journals/dp...


> It's not so apparent in the online version, if you haven't read it before.

I think this is the core issue that leads to sentiment like OP's. Real news still exists, but it's the highly editorialized and opinionated articles that are shared more widely. News agencies universally are terrible at obviously differentiating the two to users.

99% of people when they open an article scroll directly to the content. But any distinguishing features (in this case, a small font A-hed link) are tucked way at the top. In this case, the A-hed link takes you to the A-hed home page but still does not offer any context to what A-hed is.

> So you don't have to worry that it's some broad decline in journalistic standards (at least based on this).

As long as the WSJ does such a bad job at separating "WSJ the proper news room" and "A-hed the not proper news room", their brand will suffer. OP is proof of that, and I think it's safe to assume the average HN reader is more astute than the average citizen. A tiny link to nowhere useful is not enough of a UI change for us to blame the user. The onus is on the news agencies to do a better job giving context for articles.


Is it the news agency's fault, or the reader's fault? In this case, it really isn't clear that it's not "real news". But I see plenty of people on social media sharing articles from news sites where they're clearly marked as an editorial or opinion piece, and believing or treating them as if they were exactly the same as a news article that attempts to paint a neutral picture of the facts.

Do newspapers need to put an explanation of what an opinion piece is at the top of every opinion piece?


Perhaps newspapers should stop running opinions and editorials altogether, or move them to an entirely different brand.


I don't really think that's the answer. Actual physical newspapers ran editorials / opinions for decades without a problem. I think the solution is better UI/UX.


A separate brand would be a better UX.

It's clear that internet distribution is very different from physical newspapers and people don't pay much attention to the domain and styling.


IMO, it is the news agency's responsibility to make it absolutely clear to the user what they are reading. If a majority of people come away with the wrong understanding after visiting a website, that is the website's fault. It's a clear UX problem.

Old paper newspapers accomplished this by physically having a separate section for opinion pieces. It's a harder problem when people share direct links to an article.

That said, news agencies don't try very hard. I think it's easy to argue they intentionally make it hard to differentiate. If WSJ really cared about making it clear to the user what this article was, they would have done more than hiding the word "A-hed" in a small font in the top corner.


Amen. Learning how to read the news is a skill just like everything else. The direct links of the internet make it harder, but one still needs to try and understand the context of the publication and how the article is presented. A-heds are entertainment.


This is the "You are holding it wrong" opinion I guess.

If A-heds are entertainment then why does it say A-hed and not Entertainment? I am not at fault here if I don't know what the hell A-hed is supposed to mean and just think that this news site is kinda garbage.

What exactly about the presentation of this article was I supposed to pick up on to see that it is entertainment? It is the exact same presentation as their "U.S. Seeks to Seize Iranian Fuel Bound for Venezuela" article. Is that one news or is it also entertainment?


This is a very literal approach. I assume that when you watch network TV you assume everything is coming at you with the same purpose and intent (news, talking head show, some documentary like 60 minutes.) Just because you want everything you read online to be "hard news" doesn't mean it will be. It takes a bit of work to understand the context of what you are seeing. So yes, you are holding it wrong.


What are you talking about? NO, I do not assume everything on the TV has the same intent. I never said anything even remotely close to that. I do however assume that if I am watching the news that I get the news.

I also don't want everything online to be "hard news", but when I go to a newspaper site I expect it to be news if it isn't otherwise noted. No, the cryptic "A-hed" does not count. I am not a news insider and I think it is completely inappropriate to demand that the consumer learn such cryptic jargon.

You said this:

> one still needs to try and understand the context of the publication and how the article is presented.

The context of the publication is that it is an article on the Wall Street Journal which is a business focused Newspaper and the Presentation of the article is exactly the same as the rest.

How is it my error then that I think an article like this is just garbage? (Hint: If your customers think your product is garbage then you need to do something about it and not just say "You are holding it wrong".)


> The WSJ is one of the few quite reputable news rooms out there.

The WSJ is owned by Rupert Murdoch. The credibility of their newsroom begins being compromised by his owning it. He will destroy its credibility utterly by selling it for political influence in news reporting, just as he has everywhere he has bought media. The particular example of compromised credibility that comes to mind is the Times of London, which is now Murdoch propaganda (albeit vastly more polite than Fox News) where it used to do credible news reporting. Times reporting now can still be excellent but has a "be cautious" flag on it that it did not have in the days prior to Murdoch. The man has become vastly worse in the past couple of decades, as has everything he touches.


Murdoch bought the WSJ in 2007. When is he going to start destroying its credibility "utterly"?


Uh have you read their commentary/opinions? Half the time they come off as if they're trolling.

I'm sure at one point they were a thinking man's newspaper. At this point they're just fan service for people who have drunk the koolaid but can't stomach Fox's mass market approach.


To be fair, the WSJ has always had some pretty outlandish opinion pieces. The tradition was that these were separate from the news reporting and the news reporting was untouched by them. But now it's in the Murdoch stable. Sad.


He started in 2007. The date when you consider it utterly destroyed is up to the reader.


I feel like he hasn't taken any actions to compromise the credibility of the Journal. Remember their Theranos coverage? Wikipedia writes: "Elizabeth Holmes asked Rupert Murdoch —- who at the time was a major investor in Theranos and owner of the Journal — to "personally kill" an investigative piece being written about Theranos. Murdoch refused, instead stating that he "had confidence in editors to handle the truth - whatever it may be". Murdoch went on to lose approximately $100 million in his investments in Theranos."

The opinion columns slant very far right, and I'm sure Murdoch has played some role in that. I subscribe to the Journal and am outraged every time I see them. But, it's good to see what other people are talking about. It does you no harm to read an opinion column that you vehemently disagree with. I'm sure people feel the same way about opinion columns in The New York Times, which I also read and tend to agree with.

My TL;DR here is that if you are interested in the news, you should get it from a variety of reputable sources. The Journal has proved over the years that it is reputable. I haven't seen anything recently that would change my mind. I have friends that get all their news through social media and they are constantly outraged based on pure misinformation. If they read the Journal, they'd have a much better understanding of the facts. (Same goes for the New York Times, or Washington Post, or The Guardian.)


All of the links to the A-hed section in the WSJ article link to a 404. Is there an updated link?


https://www.wsj.com/news/types/a-hed works for me right now, as do all of these:

https://www.wsj.com/articles/baseball-stadiums-major-league-... "Baseball Stadiums Are Closed to Fans—but This Guy’s Balcony Is Open for Business"

https://www.wsj.com/articles/americans-ponder-how-to-throw-a... "Americans Ponder How to Throw a Party"

https://www.wsj.com/articles/how-do-doctors-treating-coronav... "How Do Doctors Treating Coronavirus Relax? By Playing the Game ‘Pandemic’"

https://www.wsj.com/articles/a-scientist-turned-the-coronavi... "A Scientist Turned the Coronavirus Into Music—Here’s What It Sounds Like"

https://www.wsj.com/articles/the-covid-15-lockdowns-are-lift... "The Covid 15: Lockdowns Are Lifting, and Our Clothes Don’t Fit"

https://www.wsj.com/articles/oversize-sneakers-are-hot-but-v... "Oversize Sneakers Are Hot but Very Hard to Wear. ‘I Just Can’t Balance.’"


[flagged]


All the news doesn't have to be serious. Hope you didn't miss the WSJ's reporting on Theranos which potentially saved lives by exposing Theranos' fraudulent lab tests.


They tried very hard to kill a guy’s reputation by calling him a Nazi and falsifying facts. That news was supposed to be taken seriously.

I’d even say that it was taken seriously, but in a sense that the WSJ didn’t expect.


If you want the "pure" news as in 1921, subscribe directly to AP/AFP/Reuters/EFE. The world has changed, and most of the added value of newspapers now comes from contextualization that you seem to disregard as having no value, despite it being the greater journalistic effort.

It's easy to report that a chicken died crossing the road, it's a lot harder to explain what chain of events led to a chicken dying crossing the road.


> The world has changed, and most of the added value of newspapers now comes from contextualization that you seem to disregard as having no value, despite it being the greater journalistic effort.

But that's not what most of them are doing anymore. It's not so much context as it is agenda. The stories are written to lead you to a particular conclusion. From right now:

NY Times: Republicans are abruptly pushing Americans to wear masks, despite President Trump’s resistance.

Fox News: 'Looked like the Lone Ranger': Trump jokes about wearing mask, supports it

And that's not even getting into how the stories are chosen or prioritized.

NY Times: Some States and Cities Halt Reopenings As U.S. Cases Surge

Fox News: Media narrative of peaceful Seattle CHOP zone turned upside down

They've chosen sides. Maybe they always had, but it seems more blatant now. They're not even pretending anymore.


Fox News isn't even a news source — of course they "inject agenda". As for the NY Times, I don't see any agendas in the titles. Am I missing something?


> Fox News isn't even a news source — of course they "inject agenda".

You could say that about a lot of places these days.

> As for the NY Times, I don't see any agendas in the titles. Am I missing something?

In the first case they're both covering the exact same story but putting the opposite spin on it. Trump recommends that people wear a mask but then he often doesn't wear one himself.

He says he doesn't wear it when he's only around people who have already been tested, which could be true enough but I suspect a big reason is that it impairs communication. Human communication is lossy and full of redundancy to compensate. If you don't quite hear something but you can see their lips moving you still know what they said. People say as much with facial expressions as with words. But not if you have a mask over your face.

Trump's primary job right now is communicating, so "get Trump to wear a mask whenever he's talking to the public" becomes a priority for anybody who doesn't like him, because they have a plausible-sounding line about him setting an example, so it's a win-win for them -- either they paint him as reckless for not wearing a mask or they get him to wear it which impairs his communication. The media writes a hundred stories about it a week. Fox is only even covering that story to try to defend him, the others are using it as a political weapon and they know it.

A mask also reminds everybody of the pandemic and a pandemic is always going to be bad for the incumbent. Which is the issue with the other story -- not the contents but the priority. Trump has been wanting to reopen the country, so reopening problems in some places are bad for Trump, so that's their top story. Important things happening in Hong Kong right now. I think there's some BLM stuff going on too.

It would be one thing if this was an anomaly, but it's consistent.

The Times literally has a thing on their site right now called "Meet the Supporters Trump has Lost."


I would say Fox News is a classic example of a Rupert Murdoch news outlet.

And since Murdoch likes to run his news outlets with an iron fist, these news outlets tend to report the news in a manner that strongly reflects Murdoch's thinking at the time.

Anything Murdoch has owned and run has always come with a level of Murdoch bias.


I agree. Before it was a more or less united America vs the commies and the commies were bad and we weren’t perfect and we had some housecleaning to do...

Now it’s all narrative and the focus is pannational. Not much has a local flavor. It’s not what’s good for Canadians or Ghanaians, it’s what can Canadians and Ghanaians do to make the world better (despite there being lots of things Ghanaians and Canadians could be doing in their own backyards to improve the lives of their own downtrodden).

It feels like a recast comintern.


Are you really complaining about 15% of their front page featuring their opinion section?

Your 1921 NYT example has "Broker is Slain at Bride's Door By Her Gardener", which the Times would never feature prominently today, because it's basically yellow press.

It also features some French pilot's unverified claims about a new record – something that would see HN storm the barricades today.

There's a story there, right at the top, about a group of visitors of the Capitol being disappointed. I'm not entirely sure if it's trying to make fun of the congresswoman it mentions. But it does sound a lot more like today's opinion pieces than a straight news story.


Because pure news is pretty much free today. So newspapers need to provide something else (analysis, opinions, etc.) to be able to sell something.


Not to mention we have metrics which tell us exactly what kind of reporting is most likely to have good reader engagement (and thus higher ad payouts). Most news sources are just giving us exactly what the majority of people want.


Even HN does this. It is the magic of the reply button. As a result, they pull more screen time and have a stickier crowd to show "we are hiring" ads.


I view this phenomenon as a sort of sociological entropy. Anything bad that can happen will happen if the people tolerate it. Same with shitty politicians, encroachment on rights, etc. It's all just a matter of time.


This theory doesn't hold up against the test of time.

We used to have tyrant monarchs - Genghis Khan would roam the steppe cutting people's heads off, and that would be normal everyday life.

There are still bad things today, but more people have more rights and a higher quality of life.


> Genghis Khan

Genghis Khan was neither a tyrant nor a monarch, nor was cutting off heads a part of his everyday life. He actually got more Mongol people more rights and a higher quality of life.

I would request you to read a detailed and accurate account of his life by a historian.

For example, Genghis Khan: His Conquests, His Empire, His Legacy by Frank McLynn https://amzn.com/B00X2ZW5ZI


Do you find the following inaccurate or biased, and if so, for the uninformed, briefly why?

"Campaigns initiated in his lifetime include those against the Qara Khitai, Khwarezmia, and the Western Xia and Jin dynasties, and raids into Medieval Georgia, the Kievan Rus', and Volga Bulgaria. These campaigns were often accompanied by large-scale massacres of the civilian populations, especially in the Khwarazmian- and Western Xia–controlled lands. Because of this brutality, which left millions dead, he is considered by many to have been a brutal ruler"


From what I've seen, Genghis Khan was brutal against non-surrendering nations, but beneficial for those who followed him. E.g. you wouldn't see him cutting off his subjects' heads, and his brutality was pretty much an intimidation tactic so that other armies would join him.


>brutal against non-surrendering nations

This is ambiguous about cities or nations that resisted and then surrendered. There seem to be a lot of references to how surrender was followed by everyone being killed or enslaved. I kind of think that's an area of contrast between their standards and today's. You can point to "civilized" countries destroying whole cities in the 20th century, but doing it after your enemy is defeated is much more frowned on.

Also, when you say "beneficial for those who followed him" it elides what it meant when they enslaved the women and children from a conquered population. I don't think they generally allowed individuals to choose whether to be among those massacred or not. And what happened next probably wasn't consistent with modern human rights standards, which again, I think goes back to the original point that started the subthread.


That's how authoritarian regimes usually work. Do everything they say and you'll be okay. The problem is a decent number of people don't want to be absolutely subservient.


>Do everything they say and you'll be okay

This is an extremely odd thing to say about authoritarians right now, although this may be my American perspective, since people have been pointing out recently how futile it can be when dealing with police in the US. Still, I kind of think it's universal - being polite to authority is utterly useless once you are a suspect, thought to be dangerous, or have something they want.

It made a lasting impression on me when I was stopped in my car as a teenager and an officer decided that I must be a drug smuggler because I was too quiet/nervous.

Edit: I guess if you interpret "everything" broadly enough, your statement is somewhat plausible, but "everything" includes many situations where an authority asks you something that is theoretically or ought to be voluntary and you have to be aware that it's not really a choice. And it's still not always true.


Oh it’s ok then


> considered by many to have been a brutal ruler

He raided these places in military campaigns; he did not rule them at the time of the raids. So he was not a brutal ruler. In fact, he was a benevolent ruler, distributing loot equally and promoting on merit rather than nepotism.

As another commentator has pointed out, the brutality was an intimidation tactic. It was very selectively applied. https://news.ycombinator.com/item?id=23708182

He also did much to bring modern human rights to his subjects. You can read about it in Genghis Khan and the Making of the Modern World by Jack Weatherford, an anthropologist who lives in Mongolia part-time. https://amzn.com/B000FCK206


Wasn't expecting to see Khan revisionism here, but I suppose for every mass murderer there are people who like defending them.


It's unclear what you're rebutting here. My position is that on a cyclic basis, there will always be societal elements that will trend towards the negative and nefarious as long as it is tolerated. The fact that the global minimum for such things manifested themselves much more severely (as in your example) in the past does not change the fact that it still occurs today.


Suggest you read my comment above, before you let this one story make you think the decline of civilization is imminent.


Nowhere did I say I think "the decline of civilization is imminent."


Speaking with my physics hat on I have to caution against this -- I wouldn't call it entropy, and I am very happy abusing that word in a variety of other contexts. (Like at church I will call Satan entropy.)

But the reason I'd balk here is that entropy as a phenomenon happens in absence of a driving force.

(1) In the present WSJ case, this is not mere indifference on the part of readers: storytelling is deeply a part of what makes us human and readers do actively turn away from flat descriptions of events to storied accounts—the WSJ is just satisfying the demand. Like, this part of the WSJ, the A-hed is actively edited to be a bit more fluff and less hard reporting to try and rope more of these folks in, as observed in comments above.

(2) In the case of shitty politicians, at least in the US, something even more remarkable is happening. The very construction of the US's voting system has a very non-obvious fixed point (as a transformation polity → polity), so that fixed-point theorems from mathematics seem to impose their own biasing force on the candidates. Or to use more economic language, the rules of elections in the USA actively incentivize its politicians to be shitty. Well, the parties are incentivized to be spineless and the politicians are incentivized to belong to one and thus inherit the shittiness—but it's the same thing. [note]

Entropy still has a part to play when you introduce a bias voltage like this, but it is usually the exact opposite. Entropy explains why air molecules are not falling to the floor in spite of the fact that there is a clear bias force—gravity—that makes everything else fall to the floor quite swiftly and the air molecules have certainly had enough time to start hitting the ground. Similarly, entropy may be more helpful in explaining why some news is still buried inside the Wall Street Journal articles or why the US political parties are able to work together on various operations to bomb remote lands.

[note] I can discourse at length about the downfall of the Whigs but the short and fast of it is that they had a spine or agenda of core issues, chiefly economic. They brokered compromises under one Whig president about the issue-of-the-day, slavery, to try and refocus attention on their agenda items. They nominated General Scott in 1852 to represent them. The spineless political party they were against, used the opportunity to nominate a candidate who nobody really knew and who was himself a compromise, a Southerner who was personally anti-slavery, and mirrored their opponents' agenda quite thoroughly. (Something, I should note, you can only do if your opponent has a spine and you don’t.) Voter turnout in the election of 1852 was incredibly low as a result, and that Southerner won the electoral college in a landslide, despite the fact that he did not actually do any campaigning. “I don't know what he is gonna do about the issue-of-the-day but he seems to have something in common with me” actively won out against “he thinks the issue-of-the-day is a distraction from the real problems and will broker compromises on it to try and address those real problems,” which, I don’t know, maybe was a fluke. But what wasn’t a fluke was the aftermath: any landslide loss leads to self-reflection, right, but if you have a spine it is worse: the internal tensions in the Whigs over that issue-of-the-day broke the party into two chunks and suddenly the party was no more, and a similarly spineless party ultimately took its place in the great two party system of the USA. That party’s spinelessness means that it has weathered similarly disastrous losses as the one which broke the Whigs with almost no alarm and has also led to it slowly reversing its geographic constituency from North to South as the opposite has happened to its spineless opponent, which is again much more of an interesting random-walk phenomenon that one could ascribe to entropy.


Damn. So why do air molecules not hit the ground?


So they do—they just don’t remain there, they get kicked back up.

What is at stake is that you carry a subtle contradiction in your head, assuming you got some pieces of wisdom from your physics classes but not others.

One side of this is the minimum energy principle. Like, this is common sense, you leave a basketball bouncing in your driveway and you expect it to stop somewhere, probably (if it's not a perfectly flat blacktop) downhill of wherever you started it. Heck if you've had a hoop in your driveway you probably have a reflex to run after the ball when it touches ground, otherwise it'll eventually find the road and roll very far away as it chases the downhill. That's the minimum energy principle, dynamic friction reduces kinetic energy in a system while forces tend to make potential energy into kinetic energy, so you would expect if you just leave the system alone it ends up at rest at some minimum of potential energy.

The other side of the contradiction is that we tell you that energy is conserved: it cannot be created or destroyed. If you are very lucky, we tell you that there is a way of phrasing the laws of physics such that energy conservation is the same as saying that the laws of physics are the same today as they are tomorrow. We call this “time translation symmetry”, and the theorem that connects continuous symmetries to conserved quantities is Noether’s Theorem, if you are looking for something to google here.

The only way to resolve the contradiction is to say that friction is actually dissipation—energy is getting more spread out among the universe but is not being destroyed. So you want to picture a big bucket of water and a thousand little glasses and we empty the bucket only by putting it into all of those glasses. And the idea is that eventually if random processes take over the moving of the water from any of these to any other of these, all of them will have the same water level. You can actually see this if you see demonstrations of the siphon effect, water will actually flow up and down a hose to equalize two water levels in two reservoirs. When energy does this, has the same average occupation in every degree of freedom of a system, we say that the system has thermalized and we can measure its absolute temperature as that energy level. Technically temperature is not uniquely defined in any other context—mostly, we find physical objects whose properties like volume or length or so vary approximately linearly with temperature in this sense, then we use them as thermometers to measure temperatures in other contexts.

Now there is an interesting result, which is that if your bucket is at the same level as the cups, in some sense your bucket never ends up empty. Like there can be a lot of cups and that water can be spread over everything and there is only a tiny film of water in the actual bucket left, but it’s not zero.

This is also a theorem, it is called the fluctuation-dissipation theorem. It says that I can't dissipate energy into some environment without feeling noise from that environment prevent me from dissipating all of my energy into it: I have to accept random fluctuations back from it. In other words, there are no one-way channels for energy.

To bring this back to your question, the basketball only comes to rest on the ground because it has so much more energy than the thermal fluctuation energy which it gets back from the ground. When it eventually settles, it turns out that it is not fully at rest but is moving imperceptibly due to these fluctuations. And those fluctuations are imperceptible because the mass of the basketball is very large, large enough that this disturbs the center of mass by a height way smaller than the size of atoms, which it turns out are way smaller than the light you can see. So like even with a microscope, visible light is too chunky to show you this on a basketball.

But repeat the calculation for how far those 25 meV of thermal energy will launch a 28-amu nitrogen molecule and you will find that the height is roughly nine kilometers [1] which is a pretty good rough estimate for the height of the Earth's atmosphere, that's about where the troposphere ends. Just to be clear, the ground doesn't kick any individual molecules that high, they collide with other air molecules way before they get anywhere near that high, but that energy and momentum does ultimately get communicated to the whole swarm of air molecules and stops the swarm collectively from falling lower than that distance on average, even though everything is one big colliding mess. The first order prediction is actually an exponential decrease in density as you go to those higher heights, and I think that 9km figure is a 1/e decay constant, but the truth gets a lot more complicated as the ultraviolet light coming into the Earth is getting preferentially scattered in the high atmosphere and contributing a second source of energy to the system.
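The estimate above can be reproduced in a few lines of Python. This is only a sketch under the same assumptions as the paragraph: 25 meV of thermal energy, a 28-amu nitrogen molecule, and a constant g of 9.81 m/s^2. The function names and the 0.62 kg basketball mass are illustrative choices made here, not part of the original calculation.

```python
import math

EV_IN_JOULES = 1.602176634e-19   # SI definition of the electronvolt
AMU_IN_KG = 1.66053906660e-27    # atomic mass constant
G = 9.81                         # m/s^2, assumed constant with height


def scale_height_m(thermal_energy_ev, mass_kg, g=G):
    """Height h at which m*g*h equals the given thermal energy."""
    return thermal_energy_ev * EV_IN_JOULES / (mass_kg * g)


def relative_density(height_m, scale_m):
    """First-order exponential falloff of density with height."""
    return math.exp(-height_m / scale_m)


n2_height = scale_height_m(0.025, 28 * AMU_IN_KG)
ball_height = scale_height_m(0.025, 0.62)  # 0.62 kg is an assumed mass

print(f"N2 molecule: {n2_height / 1000:.1f} km")
print(f"basketball:  {ball_height:.1e} m")
print(f"density at one scale height: "
      f"{relative_density(n2_height, n2_height):.3f}")
```

With these inputs the nitrogen figure comes out a little under nine kilometers, while the basketball figure is many orders of magnitude smaller than an atom, which is the contrast the argument relies on.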

But yeah, the air doesn't fall down to the ground because the sun is shining and keeping our planet warm, and that warmth is imperceptible in the motion of a basketball but several kilometers in terms of the height of air molecules.

[1] https://www.google.com/search?q=25+meV+%2F+%2828+amu+*+9.81+...


Great counterpoints!


>When did this happen? Why? Did people lose interest in actual news? Is there less actual news to report?

Social media happened.

People have been made to think that they have got what they needed from a news article from just the title, image, summary as the next shiny news article was just waiting below to scroll.

News media are under tremendous pressure to entertain the few who actually clicked the article and are reading it.

The world over, only those news media which started out as traditional newspapers have some journalistic integrity left and are serving news as it is. They are paying for that by going bankrupt, getting sued, journalists getting murdered, etc.

Moreover, people are getting news only from those sources which give the news 'they like', rather than going to a source which reports factual news.


When has journalism ever stuck to facts? When has the “news” been more than short stories and opinion?

Hint: Never

Read: A History of News by Mitchell Stephens https://openlibrary.org/works/OL1854941W/A_history_of_news

Read: https://en.m.wikipedia.org/wiki/Yellow_journalism

Read: https://en.m.wikipedia.org/wiki/History_of_French_journalism


> When did this happen? Why? Did people lose interest in actual news? Is there less actual news to report?

1. Search engines prioritized longer content.

2. People are less aware of longer-lasting events and most people land on news stories from social media as a one-off. The additional context is needed.


In 1999 the "Eric Breindel Award for Excellence in Opinion Journalism" [0] was instituted with an endowment from Rupert Murdoch's News Corporation.

Who is "Eric Breindel" ... well Wikipedia includes this, as his legacy:

"In 1988, Spy magazine ran a feature that depicted Breindel as a ruthlessly career-driven opportunist whose career was effectively ended by his drug bust"

And regardless of accuracy, I consider this to be an excellent definition for whatever "Opinion Journalism" must be.

[0]: https://en.wikipedia.org/wiki/Eric_Breindel_Award_for_Excell...


> When and why did news cease being news and start being short stories and opinion?

As radio and then TV (both of which favor emotion over information) displaced print as the major avenues for “news”, shaping expectations even of print media.


I'm sure that it predates Fox News Channel by a lot; after all, news businesses are businesses first and news second. But it would be misguided to ignore the fact that FNC has completely dominated viewership numbers in the US almost since its inception by doing exactly what you're complaining about. Opinion is a format that engages large numbers of people and therefore makes money extremely well.


> FNC has completely dominated viewership numbers in the US almost since its inception by doing exactly what you're complaining about.

To be fair, they mostly dominate viewership numbers by being the only right-leaning news network available, so they get half the viewing public to themselves while their competitors fight over the other half.

Sort of an indictment of the efficient market hypothesis that nobody has figured that out and gone into competition with them.


Not to take away from your point, which I think is valid, but the NYT in 1921 is _the_ Internet. We’ve gone from being starved for information to being overloaded. So really you’re comparing apples to oranges; they just happen to be named the same thing.


We rehash this every time a NYT, WaPo, WSJ, etc. submission gets popular.

It’s off-topic.


Opinion masquerading as news sells, see Fox News, MSNBC, et al.


This is an example of what’s called literary journalism. Some people (including myself) consider this a high art form. If you want to study up on the classics, read Joan Didion and John McPhee - and Tom Wolfe, the titan of creative nonfiction.

There is no crime in making a creative story that people enjoy reading out of a true event.

