The Bullshit Web (pxlnv.com)
1017 points by codesections 75 days ago | 550 comments



I've said this before, but it bears repeating:

Moby Dick is 1.2 MB uncompressed in plain text. That's quite a bit smaller than the "average" news website. I just loaded the New York Times front page: it was 6.6 MB. That's more than five copies of Moby Dick, solely for a gateway to the actual content that I want. A secondary reload was only 5 MB.

I then opened a random article. The article itself was about 1,400 words long, but the page was 5.9 MB. That's about 4 KB per word, not counting the gateway (which is required if you're not arriving via social media). Including the gateway, it's about 8 KB per word, meaning each word costs roughly as much bandwidth as the plain text of the entire article.

So all told, to read just one article from the New York Times, I had to download the equivalent of ten copies of Moby Dick. That's about 4,600 pages. That's approaching the entirety of George R.R. Martin's A Song of Ice and Fire, without appendices.

If I check the NY Times just four times a day and read three articles each time, I'm downloading about 100 MB worth of stuff (83 Moby-Dicks) to read 72 KB worth of plain text.
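For anyone who wants to check the arithmetic, it works out like this (the page sizes are just my own one-off measurements, so treat everything as ballpark):

```python
# Ballpark page-weight arithmetic from the measurements above.
# All sizes in MB; the figures are one-off observations, not averages.
MOBY_DICK_MB = 1.2    # Moby Dick, plain text, uncompressed
FRONT_PAGE_MB = 6.6   # NYT front page ("the gateway")
ARTICLE_MB = 5.9      # one ~1,400-word article

# Reading a single article via the front page:
one_article_total = FRONT_PAGE_MB + ARTICLE_MB
moby_dicks = one_article_total / MOBY_DICK_MB

# Four visits a day, three articles per visit:
daily_mb = 4 * (FRONT_PAGE_MB + 3 * ARTICLE_MB)

print(f"One article: {one_article_total:.1f} MB (~{moby_dicks:.0f} Moby Dicks)")
print(f"Daily habit: {daily_mb:.1f} MB")
```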

Even ignoring first-principles ecological conservatism, that's just insanely inefficient and wasteful, regardless of how inexpensive bandwidth and computing power are in the west.

EDIT: I wrote a longer write-up on this a while ago on a personal blog, but don't want it to be hugged to death:

http://txti.es/theneedforplaintext


I like this rant, you should go the next step:

All you need to 'fix' this is a fast-loading news website with enough paid subscribers that subscription margin alone pays for a news staff, an office, and various overheads.

That is a longish way of saying that 99.9% of the overhead in any modern web site can be traced almost entirely to the mechanisms by which that web site is attempting to extract value from you for visiting/reading.

If people will visit over a 56K modem and tolerate a 3-10 second page load, then that is the bar. Any spare bandwidth you have beyond that is available for the website to exploit in some way to generate revenue. The more bandwidth between you and them, the more ways they can come up with to exploit it for additional surveillance, ads, or analytics that will get them more money.

When you are the customer, which is to say your purchase of a subscription or of articles is the only revenue the site needs in order to survive, then the things that retain you as a customer have the highest priority (like fast page load times and minimal bandwidth usage).

But when you are a data cow, a random bit of insight into a picture much bigger than you can comprehend, a pixel in a much larger tapestry, or an action droplet in a much larger river of action? Well, then there isn't really any incentive to make your life better. As long as the machine milking you for data can get even a couple of molecules more of that precious data milk without scaring you out of the barn, they'll build right up to that limit.


Hilariously, the New York Times tries both: you get five or so free article reads per month (with shitloads of tracking), and then you have to pay to read more.

But if you're paying, the pages don't load any differently. You're paying to be mined.


In paper newspapers this is the norm. You can read the front page for free, have to pay to get the rest, and the rest is still filled with ads.

The difference is the tracking. I don’t think ads are really the problem. It’s the tracking that bloats pages and intrudes on privacy, and the tracking doesn’t need to be there because other media have ads without tracking and manage just fine.

There’s a race to the bottom here. Tracking earns more revenue, so to be competitive you have to do it. Most sites won’t stop tracking until forced to by either the basic infrastructure of the web or legal requirements. I hope GDPR will lead to the disappearance of tracking, but so far most sites seem to pretend tracking is compatible with GDPR.


Not really. You're paying for the additional content. The tracking is external to that deal.

I'm not saying I like it that way, but you're conflating two unrelated things.


This is why I've never subscribed to cable TV. I'm not going to pay for the privilege of watching 20 mins of commercials an hour.


That's not what you are paying for with basic cable; you get that with broadcast for free.

What you are paying for is a broader choice in the filler between the commercials.


I agree in theory. However, I haven't noticed the slowness in their website and the ads are well done and blend in with the webpage. They may have to redesign/rearchitect their whole website to get what you are asking for.

It should be noted that traditional newspapers include ads alongside news content and no one complains. In fact people used to sift through the Sunday NYT simply for the ads.


> the ads are well done and blend in with the webpage

That's called "native advertisement" and it's supposed to trick you into thinking you're reading a genuine article instead of an ad. I actively avoid sites that do this.

> In fact people used to sift through the Sunday NYT simply for the ads.

Back when people couldn't google for stuff, and the ads were useful because they mostly came from local businesses you actually needed once in a while.


There is no such thing as "the ads are well done and blend in with the webpage". Especially on a website where you pay for a subscription.


Moreover, there is a tipping point, beyond which the ad blends too well and becomes plain deception. This is even more of a problem for journalistic publications. See also: native advertising.


Indeed. In print, you had dedicated pages to advertisement. Not optimal, but much easier to ignore. Nowadays, you never know if an “article” is simply a marketing agenda.


Yes, because the signs people put around "native ads" literally stating that it's an "ad" or "sponsored content" are so easily ignorable. Get real. People pay for a newspaper with ads in it, and you still have to sift through them. Ignore them? You literally have to turn the page, or spend five minutes pulling the ads out of the newspaper if you don't want them. In some cases there isn't a way to escape the ad in print at all, because part of the article shares the page with it. Ad block won't save you then.


That's no different from paying for a real newspaper subscription though.

But I guess you could say that you are paying for delivery instead.


Tbh they could email me articles plaintext and I'd happily hand over my money.


Imagine that, a digital newspaper in your digital mailbox!


You mean sending you news in a digital letter?


How intriguing! I would like to subscribe to your... how do you say... letter of news.


But a letter of news may be too small. Maybe we can remove that size constraint and call it 'a paper of news' that would be delivered.


Maybe they could send multiple letters and I could have some kind of client that shows me the headlines and allows me to open the articles I'm interested in.


That would be amazing. How come Google hasn't invented something like that yet?


Over 20 years ago, the San Jose Mercury-News offered exactly such a service.

Called NewsHound, it let you set up to five sets of keywords (in the basic $5/month subscription), and it would email you the plain text of every article that matched the criteria, whether generated within the publisher network or from wire services.


It would be trivial to write a script to scrape text.npr.org and send it to you.

Or you can just visit it, I suppose.
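For what it's worth, the tag-stripping part really is small. Here's a rough stdlib-only sketch; the email step is left out, the sample HTML is made up, and text.npr.org's actual markup may of course differ:

```python
# Minimal sketch: turn a (text-oriented) web page into plain text.
# Skips the contents of <script> and <style> tags.
from html.parser import HTMLParser
from urllib.request import urlopen  # used in the network example below


class TextExtractor(HTMLParser):
    """Collects character data, skipping script/style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


# Network version (not run here):
# print(page_text(urlopen("https://text.npr.org/").read().decode()))
print(page_text("<html><body><h1>Headline</h1><p>Story text.</p></body></html>"))
```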


Firefox and Safari each have a 'Reader Mode' which does exactly what you want: presenting a web page in the absence of any web design.

It's really the ultimate condemnation of modern web design that this feature is so useful.

Edit: won't help the data-consumption though, as I believe it can only be enabled after the page has loaded


In reply to your edit: you can use something like uMatrix to block almost everything (even CSS), and Reader Mode will still work. I do this for most newspapers and it works quite well.


we used to have usenet...


And when I first got online in the early 90's, my ISP had an additional subscription option to get real newspaper-type articles delivered in a special usenet newsgroup hierarchy (can't remember the name of the news service itself though).


What do you mean "used to"? Usenet still exists.


And how good is it?


Like any community, that's defined by the people in it. Some groups are excellent, some groups are mostly dead, some display varying levels of toxicity. Overall my experience hasn't been a bad one.


What kinds of groups are there?

Given the lack of popularity and commercial support, combined with the complexity of connecting a client to an available server (compared with downloading an app from a store), I'd expect they're populated mostly by old tech people or passers-by from universities, and any special-interest group would have a small user base drawn from that demographic. Are my assumptions correct?


Reader Mode, my friend.


> That is a longish way of saying that 99.9% of the overhead in any modern web site can be traced almost entirely to the mechanisms by which that web site is attempting to extract value from you for visiting/reading.

Well, that and the fact that front-end developers just can't seem to exist without pulling in hundreds of kilobytes, or even megabytes, of JS libraries. You actually don't need all that crap to serve adverts, or really even to do tracking: people managed without it in the 90s. It's just that it's more work to get the same effect without a buttload of JS in this day and age, and most third party tracking services involve their own possibly bulky JS lib[1]. The thing is, given the slim margins on ad-serving - whilst I don't condone it - I can see why people don't bother to put in the extra effort to slim their payloads.

[1]And if you have a particularly idiotic marketing department they might want, say, tracking to be done in three or four different ways, requiring three or four different libraries/providers. This is not merely cynicism: I encountered exactly this situation at a gig a few years back.


Enjoy a recent Hacker News discussion of a 404 error page that employed a 2.4MiB JavaScript framework and consumed significant CPU time to display.

* https://news.ycombinator.com/item?id=17383464


Smart marketing treads the fine line between providing rich experiences for users (given average internet speeds) and annoying them. If the service is free, they should try to get 'some value' from serving the content.

As highlighted, some take it way too far by trying to extract 'maximum value' which ends up being counterproductive.


> "You actually don't need all that crap to serve adverts, or really even to do tracking: people managed without it in the 90s. It's just that it's more work to get the same effect without a buttload of JS in this day and age"

While I sympathize with this sentiment, this is also the entire history of computing in a nutshell. Moore's Law has driven us orders of magnitude beyond where we were when personal computers first came into existence; but Wirth's law[1] has kept pace. The laptop I'm typing this on right now has 8 GB of RAM, and that's already become pathetically tiny, pretty much the minimum viable for a consumer PC; I have to keep checking my memory usage or I'll spill over into swap (on a mechanical drive) and have to wait several minutes while my computer recovers.

Performance in computer applications fundamentally doesn't improve. Stuff gets prettier, sure, and applications do more. But things will still run about as slowly as they always have, sometimes a little worse. (There are exceptions - some things like loading programs from tape, or loading things from an HDD once SSDs were invented, were so painfully slow compared to their replacement that you'd have to actively try to write slow code to get anywhere near that performance.) It's ease of programming, flexibility, and freedom of design (in aesthetics and interface) that the advance of computing technology has always enabled. And all of those are extremely valuable in their own way, and can make applications genuinely better - even allowing qualitatively new things to come into existence that wouldn't have been feasible before - even if they don't run faster or take up less of your memory.

(To understand why, think about the development of - say - computer games since the 90s. For all that we mock poorly optimized games, how inaccessible would game development be if we required them to be coded as efficiently as Carmack built Doom? For all that mindlessly chasing "better graphics" has ballooned costs and led developers to compromise on gameplay, how many games simply couldn't be translated to 90s-era graphics without fatally compromising the experience? How many projects would never have been started if we set the skill floor for devs so high that a hobbyist couldn't just download Unity and start writing "shitty" code?)

(Or think about something like Python. Python is a perfect example of something that allows devs to massively sacrifice performance just to make programming less work. If we kept our once-higher-by-necessity standards for efficient usage of resources, something like Python's sluggish runtime would be laughable. But I think you, and I, and everyone else can agree that Python is a very good thing.)

[1] "What Intel giveth, Microsoft taketh away."


(All that being said, I'm also fairly salty about having 8 GB of RAM and a mechanical hard drive rendering my computer incredibly painful to use as technology has marched on. Discord - which I use almost exclusively as an IRC chatroom with persistent while-you-were-gone chat history, embedded media, and fun custom emotes - is an entire Electron app that eats over 100 MB minimum; Firefox is eating 750 MB just keeping this single tab open while I type this. Even with no other applications but those open, Windows 10 and assorted background processes already push me to 5.7 GB allocated. Various Windows background processes will randomly decide they'd like to peg my disk usage to 100% for ten to fifteen minutes at a time, which I imagine is because spinning rust disks are considered deprecated.

I saw a discussion on HN a few months back about a survey of computer hardware, and one dev in the comments was shocked - shocked! - to find out that the typical user didn't have 16 GB and a 4k screen. That definitely rustled my jimmies a bit.)


> I saw a discussion on HN a few months back about a survey of computer hardware, and one dev in the comments was shocked - shocked! - to find out that the typical user didn't have 16 GB and a 4k screen. That definitely rustled my jimmies a bit.)

This is extremely common in dev circles; it's an area where we're completely detached from average users. Just to make the point, here is the Mozilla hardware survey, which shows >50% of users having 4GB of RAM or less: https://hardware.metrics.mozilla.com/ .

If we look at the more technical users on steam (https://store.steampowered.com/hwsurvey) then only ~15% of users have 4GB or less, along with 40% having 8GB.

There's a good reason MacBooks top out at 16GB.


Oh, I recognize that Mozilla survey as actually the specific one that user was talking about! Let me see if I can track down the actual comment thread; it's probably less ridiculous than I actually remember it being.

Ah, found it. https://news.ycombinator.com/item?id=16735354


I'm really surprised at the reaction to the resolution. I like 1080p for movies on my (too) big TV but for coding I was more than satisfied once we got to 1024x768 and haven't thought of it since. My home coding machine is a cheap dell at 1366x768 and I've always been happy with it.


I agree with everything you said. I just have a somewhat different experience with Firefox on Windows 10:

>Firefox is eating 750 MB just keeping this single tab open while I type this.

I have 127 tabs open on Firefox Quantum 61.0.1 (64 bit). It uses ~ 1100 MB spread among 7 processes. I have 6 addons enabled (Decentraleyes, Firefox pioneer, I don't care about cookies, Tab counter, Tree style tab and uMatrix).


Why do you hate spinning disks so much? And no they are not considered "deprecated", they're really the only way to affordably store large volumes of data. It's an old, venerable, and still-very-useful tech


I'm also on a system with 8 GB of RAM at the moment. Firefox is using a hilarious 4.6 GB keeping a few dozen web pages open, but the entire rest of my Linux system, including Inkscape, QCAD, and SketchUp under Wine, is using only a combined 907 megabytes. So it's possible part of your problem is just Windows 10.


My i3wm environment doesn't randomly start anything I don't ask it to start, and it runs very comfortably with 8 GB of RAM; heck, it would run fine with 4 GB and no swap. Maybe you are running the wrong OS.


A tiling window manager won't save you when dealing with Electron apps.

I run Linux and StumpWM on my desktop, and recently I had to upgrade to 12 GB of RAM, because it turns out 8 GB is very easy to exhaust these days. I currently have 9.3 GB tied up, mostly by browser processes.


Yeah, this. I'm mostly using swaywm instead of GNOME in order to free up about one extra gigabyte of RAM for apps, but that equals about one Electron app. The only Electron app I haven't eliminated from my daily usage, though, is Patchwork, so it's not so bad.


I don't think esoteric Linuxes are really necessary. I'm running vanilla Ubuntu 16.04 with 16 GB of RAM, and my top three processes are only CrashPlan (~800 MB), Dropbox (~460 MB), and Chrome (327 MB resident, 1,380 MB shared, per htop, with 6 tabs running). My total usage at the moment is 4.04 GB out of 15.4 GB available, again per htop. Some of the other numbers in this thread are baffling to me.

But I don't run any Electron apps, so there's that.

But, yeah, I guess I wouldn't be able to run this same workload with 4GB RAM. That's what lubuntu is for.


Oh, no, I'm aware these are very much Win10 specific problems and I look at Linux people with a not insignificant degree of envy. Unfortunately, I do quite like PC gaming.


Modern GNOME is a pig; some of this can be traced back to gnome-shell's use of JavaScript and CSS styling, if you were so inclined to investigate. Most Linux users don't really notice because most x86 machines are crazy fast.

OTOH, try starting a modern full-blown distro on something like an RPi instead of Raspbian and you will quickly discover that you _NEED_ more than 4 GB of RAM and a lot of CPU just to start Firefox. It's even worse if you don't have hardware GL acceleration.

OTOH, the lightweight desktops (LXQt, LXDE, Xfce) really are lightweight.


Funny thing is, I use my i3wm environment on 16 GB of RAM and a 4K screen. I'm actually migrating to Ratpoison because it's so incredibly simple and basic that it has been making me drool. I mean, look at the source code. It doesn't get much simpler than that for a tiling WM.


If you are interested in Ratpoison, you may also enjoy xmonad. I've used both and much preferred xmonad.


Load the entire GHC garbage collected runtime just for my window manager? Isn't this the same philosophy that causes people to use Electron? And that results in unnecessarily large memory footprints and runtime performance penalties?


Xmonad is rock solid, lightning fast, and perfect for anyone who prefers to minimize their reliance on a mouse.


While it would seem convenient to simply switch operating systems from the popular, widespread options to...whatever i3wm is, practically speaking that is seldom possible.


I3wm is a window manager you can run on any Linux distribution


Tiling wm does not an OS make.


Can you run recent versions of Photoshop? How about Premiere, or Final Cut?

SolidWorks? CATIA? Matlab?


I find that the best way to do this is to just VNC to a dedicated Windows or Mac slave computer for graphic arts work. VNC is so good nowadays that I can use my QHD phone screen as a second monitor for all my Adobe apps, and the lag is almost non-existent thanks to super-fast wifi.


FreeRDP for connecting to Windows is a good option too.

I should say that KRDC is a good RD connection manager, but it's KDE-only unless you wanna install a third of it.


MATLAB works on Linux.

Premiere and Final Cut don't, but DaVinci Resolve and Lightworks do.

Obviously, if your workflow and thus your livelihood depends on a particular tool, you should run a platform that runs it, but I would question why someone with money would pick Windows over Mac.


Those are commercial apps for people with real money. OTOH, there are a number of unbeatable free apps that don't have Linux ports. Fusion 360 comes to mind; it's not SolidWorks-level, but it's light-years ahead of FreeCAD.


Hence the rant's mention of the Molochian economy under which we operate. That reference explains in depth why this is a very hard problem.

I like the rant too. Except maybe the bit about sending content at the speed of humans - I for one would like to take lightweight, bullshit-free content as fast as it can be sent, to pipe it to further processing on my end, in the never-ending quest to automate things in my life.


Background for anyone who hasn't read it: https://slatestarcodex.com/2014/07/30/meditations-on-moloch/


I highly recommend everyone give this a read if they haven't. It's probably the best post on Slate Star Codex.


I mean, if you're just reducing the content even further, just request that they make the reduction possible server-side and everybody wins.


I was more thinking about e.g. running a script to fetch 3 different lightweight sites, run some personal code on it and combine the data. If the script would spend 99.9% of its time waiting on IO because of "human speeds", I wouldn't be too happy.

That said, I would be willing to bite the bullet and accept speed limits across the board if it resulted in lean web.
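For the record, the IO-wait problem has an easy workaround even at human speeds: fetch the sites concurrently. A rough stdlib sketch (the URLs are placeholders, not real endpoints):

```python
# Fetch several lightweight pages in parallel so a slow (rate-limited)
# server doesn't serialize the whole script. URLs are placeholders.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]


def fetch(url: str) -> bytes:
    """Download one page (blocking)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_all(urls, fetcher=fetch):
    """Run the fetches concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        return list(pool.map(fetcher, urls))


# results = fetch_all(URLS)  # then combine the data however you like
```

Passing `fetcher` in as a parameter also makes the function easy to exercise without a network connection.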


I achieve this with an RSS reader, in my case Miniflux.

Runs on an RPi under my TV, and I stay well below my 300MB data cap while consuming dozens of news sources.
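If anyone wants the zero-dependency version of this, parsing RSS 2.0 only takes the stdlib. A minimal sketch; the feed is inlined here, but you'd swap in urlopen on a real feed URL:

```python
# Minimal stdlib RSS reader: parse an RSS 2.0 feed and list item
# titles and links. The feed below is a made-up example.
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First story</title><link>https://example.com/1</link></item>
  <item><title>Second story</title><link>https://example.com/2</link></item>
</channel></rss>"""


def headlines(feed_xml: str):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in headlines(FEED):
    print(f"{title} -> {link}")
```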


Which news sources? How did you find the ones that still provide RSS? How much of it do you actually read?


I read virtually all of them. Most of them provide RSS feeds; some are a bit hidden, but they're googlable.

I started with the basics: BBC, NYT, The Guardian, The Intercept for general news, The Conversation for science news without sensationalism, and some tech blogs. Then I just read most of it and follow links to find new sources. Most of the time, news stories start with "As reported by X", or just a link, so you can discover new sources that way.

You can also browse HN (and n-gate for the highlights) and Reddit to discover new sources.

If you add so much you can't keep up, remove some, or change the feeds into section feeds. Most online newspapers provide them. I miss Yahoo Pipes, and have yet to find a simple hosted alternative. There is also RSS Bridge for sites without feeds (Twitter, Facebook), but I still haven't found the time to set it up.

You can also add paywalled sources to read the headlines only. You can mark them as read from the index.


I think this is why batch jobs and crontabs exist. Just do what the BBSes used to do for syncing: wait until off-peak time and then let loose.
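E.g., a single crontab line does it; the script path here is hypothetical:

```shell
# Hypothetical crontab entry: sync feeds at 04:00 local time (off-peak),
# appending output to a log. $HOME/bin/fetch-feeds.sh is a placeholder
# for whatever fetch script you actually use.
0 4 * * * $HOME/bin/fetch-feeds.sh >> $HOME/feeds.log 2>&1
```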


I don't think 3-10 seconds should be the bar. I spent years using a 14.4k / 28k / 56k modem.

That was during the mid and late 90s.

Browsing the web where you need to wait 3-10 seconds for everything to load is not a good user experience. It's a colossal waste of time, and today we have so many more reasons to view more pages compared to back then.

We should strive for an improvement instead of trying to stick with limitations from 20 years ago.

The real problem is that people developing sites now give zero fucks about resource constraints. This is exactly like lottery winners who went from being poor to having 50 mil in their pocket but then ended up broke in 3 years because they had no idea how to deal with constraints.

It's also a completely different type of person who is running these sites today. Back in the day you made a site to share what you've learned. Now you have "marketing people" who want to invade your privacy, track every mouse movement and correlate you to a dollar amount instead of providing value.


>Browsing the web where you need to wait 3-10 seconds for everything to load is not a good user experience.

I'm the lone owner of a website. I have enough time to accomplish one of three things before January:

>Finish my Finance App

>update 200 pages to have pictures load based on screen type, so they load in 3 seconds instead of 6.

>Collect and compile data to create 20 more pages, all of which my 3000 subscribers actually come to my page for.

Very quickly you can see why an extra 2 seconds of loading time is not on our minds. It's important to allocate resources effectively; making my website load faster is limited value-add vs. creating content that my users actually want.


> Very quickly you can see why an extra 2 seconds of loading time is not on our minds. It's important to allocate resources effectively; making my website load faster is limited value-add vs. creating content that my users actually want.

I understand. I'm also a sole owner of a website where I'm selling a product (video courses for software developers).

My priorities are to give as much value as possible for free and also sell some courses if I can.

According to the network tab in Chrome's dev tools, DOMContentLoaded fires at about 250ms for any page on my site (the pages are typically 1,000-to-5,000-word blog posts with some images). From the user's POV, the page loads pretty much instantly. Then about a second later, Google Analytics and a social sharing widget pop up, but those happen after the content.

The interesting thing is I really didn't try hard to make this happen. I just stuck to server rendered templates and compressed my assets. I also made an effort to avoid heavy front end libraries and only add javascript / CSS when I needed to. I basically run a heavily modified theme based on Bootstrap with a couple of third party javascript libs (including jquery).

There's a lot of room for improvement but I haven't bothered because it seems good enough. It's very possible to get the perceived load speed of a page to be under 1 second without dedicating a lot of time to it.


So am I and I just don't buy it.

What on earth do people do to get over 1 second load time? Remember that you have to actively spend time to bloat a site.


The most profitable newspaper in my country didn't have a website with articles or news stories on it until earlier this year. Their page was something from the '90s (it was probably newer), and all it really offered was info about the paper and a way to buy it.

What they do that other Danish papers don't is write lengthy, meaty articles that take time to read because they actually teach you something new. There was a story on Trump's connections to Russia that ran three full pages, and we're talking old-school newspaper format, so that's what, 10 A4 pages worth of text?

The paper only comes out once a week, because it takes time to write, but also because it takes time to read.

I’m not sure where I am going with this; I just think it’s interesting how they’ve increased their subscription numbers while not really giving two shits about the bullshit web.

They may give two shits about the bullshit web now of course, having gotten a webpage with articles. I don’t know though, I’m a subscriber, but I haven’t visited their site yet.


What is the name of this newspaper?


Or what if we allowed tracking and ads, but at native speed?

There is the problem of the network, which won't/can't be fixed in any short term, and then the problems of rendering and reflow.

It will be a long time before everyone gets 1Gbps internet, and even if you have 1Gbps internet, pages will still be slow due to all the mini scripts.

What if the fonts were there in the first place?

What if the browser natively tracked every mouse movement, link click, etc., providing 80-90% of the data points that tracking scripts collect today, and sent the data back to the website on request?

No more 3-5MB of scripts downloaded per site, no more CPU burned running those scripts, no more 1MB of fonts. And they don't cause the page to jank. You get buttery-smooth web pages while still getting ads.

My biggest problem is with the idea of extending the web via JavaScript, and that everything should be JavaScript-only, rather than extending the browser's native functionality.

Unfortunately this is an idea that Apple may not like, even if the data are anonymised.


FWIW, you have effectively just described the 'app' solution.

In that solution, all of the tracking and analytics, fonts, and other 'baseline' content are part of the app, which then fetches the unique content (the few KB of story text and MB of images) and renders it all locally. There is even some ability to do A/B testing in that setup.

The "App" itself is basically a browser with none of the non-content UI controls that browsers normally have, that can only go to a specific URL (the content supplier).


Precisely, but we don't want to be bound by the App Store ecosystem. And we want to improve the UX of the web, which so far hasn't been great.

As a matter of fact, maybe these APIs for tracking and analytics should be the same across apps and browsers.

We tech nerds keep throwing out new terms (web app, web page, etc.), but to our users they are all the same. They want to consume information in text, video, and images, in a fast and smooth way without jank.


> All you need

You say that as if it's easy. "All you need" is to have a product that users want to pay for. But I think all the various tries and attempts have shown that users aren't really that keen to pay for content online, at scale.


They don't need paid subscribers. Physical papers and TV have been supported by ads without tracking for decades. Companies pay for space or time.

The internet allowed advertisers to track, so now we have this BS. They had many years to fix this, but tracking is the business of many companies like Google and Facebook. Now lots of people use ad blockers, and that number is increasing.


The price an advertiser pays for a full-page ad printed in the NYTimes is on the order of $150,000; the price an advertiser pays to obscure your entire screen with a web ad can be as little as $1.00 per thousand impressions.

What you're missing is that advertising rates for television and print are several orders of magnitude higher than the rates for internet advertising. Why that is is more complicated than you might guess, but the economics of "printing" a newspaper by sending you the text is a couple of orders of magnitude cheaper than running a printing press. Between those two realities, a lot of news websites are being crushed.


If 600,000 [0] people see that $150,000 ad, the advertiser has paid about $0.25 per impression, i.e. a $250 CPM. At $1 CPM, the same 600k impressions cost just $600, and even an ad that obscures your entire screen at $8 CPM or more [1] comes to only $4,800. Per impression, the newspaper ad costs orders of magnitude more.

[0] https://www.nytco.com/the-times-sees-circulation-growth-in-f...

[1] https://www.buysellads.com/buy/leaderboard/id/17/soldout/1/c...
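Redoing that arithmetic in a few lines (the $150,000 print price, 600,000 circulation, and $1/$8 web CPMs are the thread's figures, not independently verified):

```python
# CPM = cost per thousand impressions. Figures come from this thread
# (print ad cost, circulation, web CPM range), not verified here.
def cpm(total_cost: float, impressions: int) -> float:
    """Effective CPM given a flat price and an audience size."""
    return total_cost / impressions * 1000


def cost(cpm_rate: float, impressions: int) -> float:
    """Total spend to buy a given audience at a given CPM."""
    return cpm_rate * impressions / 1000


print_cpm = cpm(150_000, 600_000)        # full-page NYT print ad
web_cost_low = cost(1.0, 600_000)        # $1 CPM over the same audience
web_cost_high = cost(8.0, 600_000)       # $8 CPM full-screen takeover

print(f"print: ${print_cpm:.0f} CPM; "
      f"600k web impressions: ${web_cost_low:.0f}-${web_cost_high:.0f}")
```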


Consider the fate of Dr. Dobb's Journal, a print magazine (https://news.ycombinator.com/item?id=8758915)


NPR text?


data cow.

thank you


You're welcome, but it is oil89's invention of 10 months ago: https://news.ycombinator.com/item?id=15350778 I just love it, though.


I don't think that's a meaningful comparison. Moby Dick is a book, written by one guy and maybe an editor or two. The NYT employs 1,300 people.

When you read a book all you get is the text. NYT has text, images, related articles, analytics, etc. Moby Dick doesn't have to know what pages you read. NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

If Moby Dick were being rewritten and optimized every single day, it would be a few MB. It's not, so you can't compare the two.

Yes, the NYT should be lighter; no, your comparison is not meaningful. A better comparison would be Moby Dick to the physical NYT newspaper.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

No they don't. They really don't need to know any of that. They don't even get a pass on tracking because they're providing a free whatever - I pay for a subscription to the NYT. The business, or a meaningfully substantial core of it, is viable without tracking.

It would be nice if the things I pay for didn't start stuffing their content with bullshit. What and who do I have to pay to get single second page loads? It's not a given that advertising has to be so bloated and privacy-invasive. Various podcasts and blogs (like Daring Fireball) plug the same ad to their entire audience each post/episode for set periods of time. If you're going to cry about needing advertising then take your geographic and demographic based targeting. But no war of attrition will get me to concede you need user-by-user tracking.

You want me to pay for your content? Fine, I like it well enough. You want to present ads as well? Okay sure, the writing and perspectives are worth that too I suppose. But in addition to all of this you want to track my behavior and correlate it to my online activity that has nothing to do with your content? No, that's ridiculous.


> No they don't. They really don't need to know any of that. They don't even get a pass on tracking because they're providing a free whatever - I pay for a subscription to the NYT. The business, or a meaningfully substantial core of it, is viable without tracking.

Clearly they disagree. Or maybe you should let them know that they don't need that.

To say it without sarcasm: what you feel you are entitled to as a paying customer and what they feel they need/want in order to understand their customers are clearly at odds. Ultimately, what you think matters nothing in isolation, and what they think matters nothing in isolation. What you two agree upon is the only thing that matters. That is to say, if you think they shouldn't track you but you use their tracking product anyway, you've compromised and agreed to new terms.

I imagine you could come up with a subscription that would adequately compensate them for a truly no tracking experience. But I doubt you two would agree on a price to pay for said UX.


You're correct of course, but I don't really see how this isn't a vacuous observation. Yes clearly our perceptions are at odds, but that has nothing to do with the reality of whether or not they need to be doing that tracking. Obviously they think they need to, or they wouldn't do it. But I think I've laid out a pretty strong argument that they actually don't need to, which leads me to believe that they actually haven't considered it seriously enough to give it a shot.

Would they be as profitable? Maybe, maybe not. Would they become unprofitable? No, strictly speaking. I'm confident in that because the NYT weathered the decline of traditional news media before the rise of hyper-targeted ads, and because I've maintained a free website in the Alexa top 100,000 on my own, with well over 500,000 unique visitors per day. That doesn't come close to the online audience of a major newspaper, but it's illustrative. There is a phenomenal amount of advertising optimization you can do using basic analytics based on page requests and basic demographic data that still respects privacy and doesn't track individual users. I outlined a few methods, such as Daring Fireball's.

Maybe instead of this being a philosophical issue of perspective between a user and an organization, it's an issue of an organization that hasn't examined how else it can exist. Does the NYT need over 10,000 employees? Is there a long tail of unpopular and generally underperforming content that nevertheless sticks around, sucking up money and forcing ever more privacy-invasive targeting? If the NYT doesn't know its audience well enough to present demographic-targeted ads on particular articles and sections, what the hell is it doing tracking users individually? It's just taking the easy way out and giving advertising partners the enhanced tracking they want. But they don't need to do that, and whether or not they think they need to do it is orthogonal to the problem itself.


> You're correct of course, but I don't really see how this isn't a vacuous observation. Yes clearly our perceptions are at odds, but that has nothing to do with the reality of whether or not they need to be doing that tracking. Obviously they think they need to, or they wouldn't do it. But I think I've laid out a pretty strong argument that they actually don't need to, which leads me to believe that they actually haven't considered it seriously enough to give it a shot.

It most definitely is. But so is the word need, in this context. How would we define what they need to do, and what they don't need to do?

My argument is simply that, of course they don't need to (by my definition), but nothing will change unless they see a different, more lucrative offer. I.e., "oh hey, here are 2 million readers who will only read the page in plain HTML and will pay an extra $20/m". It just seems like a needless argument, as I don't believe there's anything that can change their behavior without us changing ours; without the market changing.

Rather, I think the solution lies not in them, but in you. In us. To use blockers and filters to such an extreme degree that it's made clear that UX wins here, and they need to provide the UX to retain the customers.

Thus far, we've not done enough to change their "need". If a day comes that they do need to stop tracking us, well, they'll either live or die. But the problem, and solution, lies in us. My 2c.


> What you two agree upon, is the only thing that matters.

That's precisely why many of us use (and promote the use of) adblockers and filtering extensions.


Classic narrowcasting mistake that dying companies make.

Statista claims 2.3 million digital subscribers. The NYT is trying to milk that 2.3M for everything they've got, squeezing the last drops of blood from the stone while they still can.

That's a great way to go out of business, when 99.97% of the world population is not your customer and your squeezing labors are not going to encourage them to sign up.

If you hyperoptimize to squeeze every drop out of a small customer base, eventually you end up with something like legacy TV networks where 99% of the population won't watch a show even for free, and the tighter the target focus on an ever shrinking legacy audience, the smaller the audience gets, until the whole house of cards collapses.

It's similar to the slice-of-pie argument: there are many business strategies that make a pie slice "better" at the price of shrinking it, and eventually the paper-thin slice disappears from the market because the enormous number of employees can't eat anymore. But that certainly will be the most hyperoptimized slice of pie ever made, right before it entirely disappears.

NYT is going to have a truly amazing spy product right before it closes.


Why is that doubtful? There are all kinds of examples of tiered subscriptions in the world. I think it would only be doubtful because the NYT wouldn't want to explicitly admit all the tracking they are doing.


> Why is that doubtful? There are all kinds of examples of tiered subscriptions in the world. I think it would only be doubtful because the NYT wouldn't want to explicitly admit all the tracking they are doing.

Many reasons, one of which you said. What would the price tag be for them to admit all they are tracking?


Currently the price is free, and comes bundled with uMatrix, and a cookie flush. I’d like to pay the NYT for their journalism, but only with money, not the ability to track me. As a result they get no money, and no tracking.


> Currently the price is free, and comes bundled with uMatrix, and a cookie flush. I’d like to pay the NYT for their journalism, but only with money, not the ability to track me. As a result they get no money, and no tracking.

You misunderstood me. I mean, what would they like you to pay them, for them to be 100% transparent about what they're doing for tracking, what their advertisers are doing and who they are, and possibly stopping all that entirely. Ie, what is it worth to them.


Interestingly if you pay them, and thus are logged in when you view an article, then they can better track you.

In contrast if you never sign up, disable JS, and periodically clear your cookies, then the entire site works fine and none of the third party trackers work. At best they can link your browser user agent and IP to a hit on the server side.


NYT needs to produce and recommend content that people find engaging to continue earning their subscription dollars.

The idea that tracking is purely or primarily there to support a business model of selling user data is a strawman invented by self-righteous HNers. You need to know what parts of your product are effective to make it competitive in today’s marketplace.


90% of that can be accomplished with server-side stats. Do you really need to track mouse movements and follow readers with super-cookies across the web to find out what articles people find engaging on your site?

> The idea that tracking is purely or primarily there to support a business model of selling user data

Purely, no. Primarily? You can bet your sweet ass.


I agree in general but there are some things which I don’t see going away any time soon that publishers need. Online advertisers want to know that their ads are being viewed by a human and not a bot, and that they were on screen for long enough and that the user didn’t just scroll past. Publishers want to know how far down you make it in their article, so they know where to put the ads in the body of the article.


I'm not accusing anyone of selling my data and I'm not trying to champion a crusade against the entire advertising industry. I'm asserting that the NYT can achieve the substantial majority of the advertising optimization and targeting it needs to do to be profitable 1) without doing user-specific tracking and 2) without making page loads extremely slow.

Like I said, serve me an ad. I'm not an idealist, I understand why advertising exists. But don't justify collecting data about which articles I read to serve to some inscrutable network of advertisers by saying that it has to be this way. We don't need this version of advertising.


> I'm asserting that the NYT can achieve the substantial majority of the advertising optimization and targeting it needs to do to be profitable

Majority, not all. Why should they leave money on the table, exactly?


Because it's disrespectful of user privacy, inefficient for performance, and computationally wasteful?

Most companies are not achieving the platonic maximalization of profit or shareholder value. They leave money on the table for a variety of reasons. It's not beyond the pale that this would be one of them. If you don't agree, then frankly it's probably an axiomatic disagreement and I don't think we can reason one another to a synthesis.


There's nothing axiomatic about our disagreement here, it's not like I'm unaware of the existence of inefficient businesses. Individual companies may choose to leave money on the table, but industries and markets as a whole do not (not intentionally, anyway).

You've just described the status quo, where businesses have to sacrifice their lifeblood to achieve your ideals. Those businesses tend to be beaten by more focused competitors, which results in the industry you see today, filled with winners that don't achieve your ideals.

But good luck trying to champion an efficient web industry by essentially moralizing.


I'm not trying to champion anything, I'm speaking my mind. I don't expect the NYT to change because I'm writing an HN comment. If market forces or legislation are insufficient to force companies to respect user privacy across unrelated domains, then I'll rely on my own setup: a Pi-Hole VPN for mobile devices, and uBlock Origin for desktop devices. I happily whitelist domains with non-intrusive ads and respect for Do-Not-Track.

But more to the point, you're presenting an argument which implies the NYT is a business which will be beaten by its competitors if it doesn't track users through their unrelated web history. I don't think that kind of tracking is an existential necessity for the NYT. It's not their core competency. Their core competency is journalism - if they are beaten by a competitor it won't be because the competitor has superior tracking, for several reasons:

1. Journalism is not a winner take all environment,

2. Newspapers were surviving in online media well before this tracking was around,

3. The NYT already has sufficiently many inefficiencies that if they actually cared about user privacy, they could trim the fat elsewhere so they wouldn't have to know to within 0.001% precision whether or not a user will read an entire article just to be profitable.

I really don't think this is too idealistic. It's not like I'm saying they need to abandon advertising altogether. I don't even have a problem with the majority of advertising. It's the poor quality control and data collection that I take issue with. All I'm saying is that they don't need to do what they're doing to be profitable.


> Journalism is not a winner take all environment

So? I'm not sure how this means that news orgs won't suffer from losing business to competitors with superior tracking.

> Newspapers were surviving in online media well before this tracking was around

Markets change. Advertisers have different expectations. Readers have more news to choose from. This is a silly argument.

> The NYT already has sufficiently many inefficiencies that if they actually cared about user privacy, they could trim the fat elsewhere

Sure, but why? Why would they do that? Why wouldn't they trim the fat elsewhere AND keep the tracking to make more money?

The point you make doesn't really make sense. Yeah, it's theoretically possible for news orgs to stop tracking in the same way that it's theoretically possible for me to take out a knife right now and cut off my legs. News orgs can make up their losses elsewhere and survive in the same way that I can still get around with a wheelchair.

But why on earth would I or the NYT do that?

I respond to you with these questions because it seems to me that both you and the OP speak out against these practices because you feel they are unnecessary. My point is that they are necessary. You just don't acknowledge the forces that make them so.


> Majority, not all. Why should they leave money on the table, exactly?

It may be that they are, in fact, driving users away. Tracking user behaviour can become a distraction.


NYT used to exist only in paper form which had no tracking abilities at all. They may benefit from this but I’m skeptical that they “need” it.


The marketplace for your attention was a lot less competitive then. Editors could even feed you true and nuanced reporting, out of a sense of professional obligation, and you had no choice but to sit there and take it.


To me that sounds like a case of acute metrics-itis: looking for things to measure as a yardstick while forgetting that you get what you measure for, instead of focusing on the core of the business.

While it may give some insight, does it mean anything to know that most people skim the first few paragraphs before clicking away? Does it improve the writing quality or fact checking? Is it worth the risk of alienating customers? To give a deliberately extreme and absurd example, Victoria's Secret could hire people to stalk their customers to find out more about product failure conditions in the field, but that would be a horrifying and stupid idea.

"Everybody is doing it." is a poor rationale for implementing a business practice.


Except that the content is directly related to user behavior. If they see no one reads the style section, they'll cut it and move resources to financial news. If they didn't have tracking, they'd never know that; they'd be wasting resources and have a comparatively inferior product.

They can't do UX analysis, nothing.


For the first, it's sufficient to just look at the number of page requests.

For the second, no one has ever explained to me how UX analysis really works for news sites. Isn't it enough to put two or three people in a room and show them a few variations of the UI? There isn't really much to publishing text, images, and a few graphs. Graphs are a very well explored field; I don't think you can learn more about them just by watching heat maps and click-through rates.


I suspect there's lot of bullshitting happening around "UX analysis", with third-party "experts" offering analyses which may, or may not, show something significant. As long as everyone in the chain can convince their customer/superior that their work is useful, the money will flow, whether or not the work is actually useful.


That's one of the fundamental problems in tech today, namely:

"It is difficult to get a man to understand something, when his salary depends on his not understanding it.”


It absolutely isn't sufficient to look at the number of page requests. How do you discern like-reads vs hate-reads? How do you determine whether someone clicked on an article, read the first line, and then bailed vs read the whole thing? There are a heap of metrics used to determine engagement which factor into the material decisions referred to in the grandparent.


Why do you need to track user behavior across unrelated domains to achieve any of that?


It's pretty simple: try a few different designs with A/B testing and you will see which one produces the most revenue.

However, the result will usually be a lot of dark patterns. For instance, that's why you get popups asking you to register.
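For what it's worth, the bucketing side of an A/B test doesn't inherently need a client-side tracker. A minimal server-side sketch (the experiment and variant names here are made up for illustration):

```python
# Deterministic server-side A/B assignment: hash a stable identifier
# (e.g. a subscriber ID) so the same user always gets the same variant.
# The chosen variant and any conversion event can then be written to
# ordinary server logs; no client-side script is required.
import hashlib

VARIANTS = ["control", "new_layout"]  # hypothetical variant names

def assign_variant(user_id: str, experiment: str) -> str:
    # Salting the hash with the experiment name decorrelates experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]
```

Of course, this says nothing about the dark-pattern problem; it only shows that the measurement itself can live server-side.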


Server logs just aren't sufficient, no matter how many times HN says so. You don't get enough data to make data-driven decisions. That's like giving a data scientist a third of the available data and saying "that's good enough".

You'd expect UX to be a small unit, but it includes everyone who works on revenue-generating ads. Moving one tiny thing has a direct impact on revenue, which affects every person employed by the NYT.


> That's like giving a data scientist a third of the available data and saying "that's good enough".

And it could easily be enough. Having 1/3 of a quantity of something doesn't mean you barely had enough before.


They could certainly track what you mention here (which pages are being accessed) via logging requests - without any use of additional front-end assets.


You can do all of those analytics server side, there's no reason to deliver it via JS and have the client do the computation. You're already sending all the required info to track that sort of thing via the request itself.
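As a rough illustration, here is a sketch that derives per-article views and unique visitors purely from an access log; the log format (nginx/Apache "combined" style) and the /article/ URL scheme are assumptions for the example, not the NYT's actual layout:

```python
# Count page views and unique client IPs per article from request logs
# alone, with no client-side JavaScript involved.
import re
from collections import defaultdict

LINE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "GET (\S+) HTTP')

def summarize(log_lines):
    views = defaultdict(int)
    visitors = defaultdict(set)
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue
        ip, path = m.groups()
        if path.startswith("/article/"):
            views[path] += 1        # total requests for this article
            visitors[path].add(ip)  # crude unique-visitor proxy
    return {p: (views[p], len(visitors[p])) for p in views}
```

IP-based uniqueness is crude (NAT, mobile carriers), which is part of why publishers reach for client-side beacons; but for "which sections do people read", it's plenty.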


It's amazing to me that no one out there seems to do server-local handling of ads, either... If you put ads directly into your page instead of relying on burdensome external systems, suddenly blocking isn't a thing anymore. ALL of the functionality supposedly needed for analytics and an ad-driven business model can happen server side, without the page becoming sentient and loading a billion scripts and scattered resources, with the one exception being filtering out headless browsers. If external systems need to be communicated with, most of that can happen before or after the fact of that page load. Advertising and analytics is implemented in the most lazy, user hostile way possible on the majority of sites.


I don't think it's very surprising. Advertisers won't let publishers serve ads directly because that requires trust in publishers to not misrepresent stats like impressions and real views. I don't know how you'd solve that trust problem when publishers are actually incentivized to cheat advertisers.


I think you may have identified the biggest issue, and it's a shame the pragmatic solution is an unpleasant technical solution.


Couldn't they, e.g., have some trusted proxy server that routes some requests to the real-content NYTimes server and some to the ad server?


That sounds like a viable solution to the trust issue. They don't need to respond to the requests, just see copies they can be sure are real requests.


For advertisers to trust this proxy server, the NYT cannot control this proxy server to preserve its integrity. So now you're asking the NYT to base their business on an advertiser-controlled server?

What happens when the proxy goes down? What happens when there are bugs? Do you think publishers can really trust advertisers to be good stewards of the publisher's business? Think for a moment about publishers that are not as big as the NYT.

Okay, maybe they do trust an advertiser-controlled proxy server. This means that both tracking scripts and NYT scripts are served from the same domain, meaning they no longer have cross origin security tampering protection. What's stopping the NYT from serving a script that tampers with an advertiser's tracking script?


Those are issues, but not insurmountable, especially when the benefit is "obviate any adblocker".

They can use a trusted third party to run the proxy and use industry standards/SLAs for site reliability/uptime. And they can still use different subdomains with no obvious pattern (web1.nytimes.com vs web2.nytimes.com -- which is the ad server?) or audit the scripts sent through the proxy for malice.
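The auditing core of such a proxy is tiny. This is a purely illustrative sketch (the function and the injected forward transport are invented for the example, not any real ad-tech component):

```python
# Append a verifiable copy of each request line to a log the auditor
# can read, then forward the untouched request to the origin. The
# forwarding transport is injected so the sketch stays testable.

def audit_and_forward(raw_request: bytes, audit_log: list, forward) -> bytes:
    request_line = raw_request.split(b"\r\n", 1)[0].decode(errors="replace")
    audit_log.append(request_line)  # auditor's copy: proof of a real request
    return forward(raw_request)     # response passes through unmodified
```

The hard part isn't the code, it's exactly the governance questions raised above: who runs it, who audits the audit log, and what happens when it goes down.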


The way it's implemented has several "benefits":

- It externalizes resource usage - the waste happens on users' machines. Who cares that it adds up to meaningful electricity consumption anyway?

- It makes it easier for marketing people and developers to independently work on the site. Developers can optimize, marketers can waste all that effort by including yet another third-party JS.

- It supports ad auctions and other services in the whole adtech economy.

- You don't have to do much analytics yourself, as the third party provides cute graphs ideal for presenting to management.


There used to be an open source self-hosted (php) ad application called openx. It worked well for quite a while. In its later years, it suffered a number of high-profile security vulnerabilities, and the open source version was poorly maintained since OpenX [the company] was focused more on their hosted solution which probably had migrated to a different codebase or at least was a major version past the open source codebase.

The open source version has been renamed "Revive Adserver", and it looks maintained, but I don't think it's used nearly as much as the openx [open source version] of old.

If you use Revive Adserver or you design a server-local ad system in-house, it won't be as sophisticated as gigantic ad-providers who can do all sorts of segmentation and analysis (producing pretty reports which execs and stakeholders love even if that knowledge adds no value to the business).


Funny that you mention that; in a former life I had to develop around and maintain an OpenX system.


It's because they use systems that identify the client via JS to deliver the most "expensive" ad possible. It's complete garbage, of course; Google/Facebook should be held liable for what they advertise, not run massive automated systems full of fraud. If Google delivers malware, they shouldn't be able to throw their hands up and go "well, Section 230!".


> They can't do UX anaylsis, nothing

They could, but that would require paying people and firms like Nielsen to gather data. Instead they engage in the same freeloading that the industry derides users for.


Reading without cookies or JavaScript enabled seems to fix every problem the NYT has.


FWIW: Ars Technica turns off tracking for paying customers (and provides full articles in the RSS feed if you pay for it).


I need to like this comment more than my single upvote allows.


A random archive of the New York Times frontpage in 2005 is 300kb. Articles were probably comparable in size.

Are you honestly saying that the landscape of the internet and/or the staffing needs of the NY Times have changed so drastically that they actually needed a 22x increase in size to deliver fundamentally text-based reporting?


I mean, if most of that is a few images, then those images could just be bigger today for nicer screens and faster internet.

Not that that is the case.


You’re right about the problem: web pages tend to scale with the size of the organization serving them, not the size of the content. But this is the failure, not a defense.

It’s a big problem on mobile and the reason I read HN comments before the article.

> NYT employs 1,300 people


Definitely a good point. You'd imagine they'd have at least 1 person who optimizes the site for page size / load speed.


And 100 other people whose job it is to cram more features in.


> Moby Dick is a book, written by 1 guy and maybe an editor or two. NYT employs 1,300 people.

Totally irrelevant. Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product

They may want this but they don’t need it. They successfully produced their product in the past without it.

> If Moby Dick was being rewritten and optimized every single day it would be a few mb.

Irrelevant and likely false. If anything, books and other text media tend to get smaller after subsequent editing and revising.

> A better comparison would by Moby Dick to the physical NYT newspaper.

Comparing a digital text product (Moby Dick) with a digital text product (a NYT article) is as close as it gets.


> Totally irrelevant. Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

If cost or size weren't constraints, Ford would surely build a car 5x as big or 5x as expensive.

The website size isn't a constraint here; if it were, they would work on it and make it smaller. It's only a constraint for highly technical people here. Currently at my job I'm optimizing some queries that take way too long. It has been like that for years, but we hit a wall recently: our SQL Server can't take it anymore. I always found it stupid that it took so long to optimize... but at the end of the day, the clients just didn't care that it took 3 seconds to load the page. I could be working on more features right now, something the clients actually care about.

What makes the number of employees relevant to the size? Well, if you were the only one building that website, you would know everything about it, right? You would always use the exact same components and reuse everything you can, because you already know every single part of the code. Add a second employee, and now you don't know exactly what he does; you know some of it, but sometimes you forget, and you may duplicate something or do it badly. At some point, the site is just too big to be understood by any single employee, and you get code badly reused, stuff that serves no direct purpose but makes maintenance easier, etc. You never decrease the size, simply because it's never worth it, but every single employee adds stuff to it.


>Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

This is a bad example. Tesla is a failing/failed car company that can't produce 200k cars per year; Ford's truck program sells that many in a few months.

>Comparing a digital text product (Moby Dick) with a digital text product (a NYT article) is as close as it gets.

Comparing a book to a timely article is unfair. The NYT produces content daily to encourage shares and clicks on various links across the site: links, images, videos, comments, etc. None of those are available in dumb text.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

Nonsense. I subscribe to the NYT so that I can read the news. Nothing about that necessitates tracking which users read which articles.

If the NYT uses page view data for anything other than statistics for their advertising partners, it's a shame. I don't want the NYT to tailor their articles to maximize page views, time spent, or any other vanity statistic; if I felt like reading rage bait fed to me by an algorithm personally tuned to all my rage buttons, there is plenty of that elsewhere.

NYT's differentiating factor is that they are one of the few businesses left that pays people to conduct actual journalism. If they give up on that, then I imagine their customers will just go to buzzfeed or wherever.


So, how does the NYT style section fit into your idea of 'actual journalism'?


It doesn't. Not every word printed in the NYT is journalism, but it doesn't change the fact that they are one of the few websites that have any journalism at all.

If the NYT cut their paper down to just the style section, horoscopes, and other garbage, they would be just another Buzzfeed and are probably not equipped to compete.

On the other hand, Info Wars, Mother Jones and friends offer publications with basically no journalism at all. That's the space the NYT, WSJ, Miami Herald, Chicago Tribune, and so on fill. They do Pulitzer-prize-worthy reporting. If these papers become run-of-the-mill click farms, I'm sure Silicon Valley will run them out of business, as rage bait is not really their core competency.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

This just seems like such an abuse of what the web was meant to be. I can imagine the horror people in the 90s would have experienced if they knew what JS was going to be used for when perusing news sites.

Sometimes I wonder if it would have been better to keep the web as a document platform without any scripting, and to create a separate one for apps.

Anyway, an alternative model news sites could use is to let users choose which content they want to pay for. That's a way to track which content users prefer.


> I can imagine the horror people in the 90s would have experienced if they knew what JS was going to be used for

We understood the horror no less than we do now. JavaScript in the 90s gave us infinite pop-ups, pop-unders, evasive controls, and drive-by downloads, and otherwise hijacked your browser and/or computer. There's a reason Proxomitron and other content blockers hit the scene by the early 2000s: the need to shut that shit off was clear.


Yeah, people under 33 seem to have a romanticized view of the internet. They believed there were no ads and it was flush with the kinds of content we enjoy today. Nope. Content existed but it was scarce/thin. Many internet users just stayed on AOL/Prodigy/Compuserve and never left to explore the WWW side of things. Those service providers were essentially national-level BBSs.

There was no youtube, wikipedia, itunes or reddit. No instagram, twitter or google earth. The internet was basically geocities where most webpages were fan pages or pages/forums about niche interests.

I think people want to believe that because they believe that if ads were to disappear off the internet tomorrow, nothing would change. They don't realize that ads subsidize the content they consume, whether it's a youtube video they're watching or a reddit thread, ads are paying for that content. Nothing is free.


The web was intended to be a free exchange of knowledge, not ad driven, regardless of how JS was abused in the late 90s and early 2000s. A scripting language was added to the web because of Netscape’s commercial interest in creating an alternative to MS products.


> The web was intended to be

I'm sorry, but this is ridiculous. You cannot say what the web was intended to be because you had no hand in inventing it. You do not know the mind of Tim Berners-Lee.

In fact, I argue the opposite - he had a vision of a global hyperlinked information system. While he wanted the protocol itself to be free (a move away from gopher), the information itself had no such protections. And that is precisely what we have today; it doesn't cost anything to use the WWW protocol. His vision has been fulfilled.

Now, the information (the content) itself is another matter. IP laws exist for a reason, people want to be paid for the content they create. They have ownership of that content. Whether it's the latest episode of game of thrones, a video game IP, or a book I wrote, the law protects my intellectual property. If I want to charge for access to that content, I'm more than within my legal right. Whether it's accessed over WWW, a cable box, or purchased from a book store, it makes no difference to my legal protections.


You write as if Tim Berners-Lee died in 1990.

I think it's quite possible to know what TBL thinks about privacy-invasive javascript, given that he's very much alive and writing about such things to this day.

https://webfoundation.org/2017/03/web-turns-28-letter/


> Sometimes I wonder if it would have been better keeping the web as a document platform without any scripting, and creating a separate one for apps.

The app platform would consume the document platform because it is easier to enforce DRM requirements in an app platform than in a document platform.

I'm just thankful that I can still Print to PDF nearly anything. I don't think it will last much longer though because "web" "designers" are driven to destroy anything and everything that was actually good about the web in their quest to monetize every pixel on my screen.


They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

I disagree. They need journalists, and they need to find some way to monetize. Your argument implies there is no other way than to add user tracking. Sure, images take up space, but I refuse to believe the current way of the web is the only viable option.


The New York Times existed for 145 years from its founding in 1851 to the creation of its website in 1996, and it got by just fine without tracking pixels in all those years.


This is a fallacy. Humans got by just fine without smartphones, Internet, electricity for thousands of years. While you could do just fine without those things today it is impractical. Times change (pun intended).


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

That's analytics. The marketing stuff that is causing the bloat isn't doing that much in comparison. Trackers are often coded very wastefully and are redundant by nature. You can easily have dozens to hundreds of them all doing the same stuff just with different APIs. It is insane and out of control and has absolutely nothing to do with gathering insights about your app and improving. It is pure external 3rd party marketing.


Newspapers did just fine for centuries without tracking. The business is viable without tracking.


> They need data to produce the product

[citation needed]

There's no reason they need to use Javascript to track user behavior down to "how long have they read this article".


> NYT employs 1,300 people.

See 2 in the article. A sample:

“As Graeber observed in his essay and book, bullshit jobs tend to spawn other bullshit jobs for which the sole function is a dependence on the existence of more senior bullshit jobs:“

I work for one of the vacation rentals

No reason private owners couldn’t be doing the work over email. But certain fetishized models of doing, in this case cloud and web apps, get the focus.

It’s all for eyeballs and buy-in at scale to justify the bullshit. “Look everyone is watching us talk up this shit! Better keep justifying it, bringing them into our flock!”

It’s turned us all into corporate sycophants. Religious conviction isn’t limited to belief in sky wizards

Anything sufficiently magical to the layman will instill blind allegiance

And despite all the smart people here, life as is seems magical and there’s a lot of blind buy-in


This is both an appeal to people's universal appreciation of efficiency, and a weak denunciation of the modern web. Your argument is

1) that a website's value is the number of words on the page, and

2) that raw text is the highest value data that can be transmitted over the internet, and

3) that inefficiency and wastefulness of bits is a bad thing

First off, you need to defend your first two assumptions. Don't websites do a lot more than display text? Does HTML/markup not have a magnitude more value than raw text? And how exactly is being inefficient with something that is abundant a tautologically bad thing?


#3 is an interesting thought. So when something is abundant, inefficiency and wastefulness are fine?

When it comes to bits, abundant to who? Those who can afford said abundance? Certainly not to those with bandwidth caps and slow internet access.

Reminds me of a cartoon I saw once: "What if climate change turns out to be a hoax and we end up making the world better for nothing!"

Since when is doing something efficiently and non-wastefully not a good idea for its own sake?


  Since when is doing something efficiently and non-wastefully not a good idea for its own sake?

I agree we should make things efficient for their own sake. But that's not what people are arguing about. They are arguing that the modern web is bad, and their reasons are weak. The modern web is terrific and we should not carelessly denigrate it, which is what I'm against.

Do not denigrate the modern web in the name of efficiency when your measure for evaluating websites is wrong and you incorrectly assume that the difference between a 1mb and 10mb payload matters to the actual operation of the site and visitor satisfaction.

Re: That comic you once saw.

Yeah, what if we shut down all coal power plants and make the earth better? Oh wait, now we have destroyed our critical infrastructure and our government/nation has absolutely no leverage to even make decisions regarding the environment. If people would stop dumbing down these complex problems, it would be a good thing.


If those coal power plants are replaced with alternative forms of energy, I don't see the problem... Talking about just "shutting them down" with nothing to replace them has absolutely nothing to do with that sentiment or the original cartoon... it's not remotely a reasonable reading of what I was saying.

Not to mention, my reply is to parent comment, not to the article in general. In no way did I say I completely agree with the author of the article- that's not what my comment was about.

However, if a news article with a load of JS and ads is considered to be this "modern web" then good riddance to it. I hope it goes away.

Yes, cool things can be done with the web, and some of those cool things take a lot of bandwidth or payload to accomplish- the examples (news sites, etc) are not these things.

What are your reasons for thinking the modern web is terrific? I agreed with that statement when I saw it, but as I think about it, I'm not so sure I do


Since when is doing something efficiently and non-wastefully not a good idea for its own sake?

When the effort to optimize something would be better spent elsewhere.

On the web there are two important points. First, bandwidth isn't always abundant so optimising is worthwhile, and secondly optimising is effectively a negative cost if you just don't include wasteful features in the first place - building optimally is less effort than building wastefully.


1 and 2 are your own extrapolations and have nothing to do with the author's arguments. Please point out where the author makes these arguments in case I missed something.

> Don't websites do a lot more than display text?

Yes, evidently, but the author doesn't disagree with this. In fact, the entire premise of the article is that websites do a bunch of useless stuff that have nothing to do with delivering the content you are browsing them for.

> Does HTML/markup not have a magnitude more value than raw text?

As a universal rule there is of course no answer to this. There is a lot of absolutely worthless text on the web. Given the example, though, would you say that the markup is worth more than the text on a website whose primary attraction is written articles? Do you frequently visit websites just to admire their markup?

> And how exactly is being inefficient with something that is abundant a tautologically bad thing?

My time is limited. RAM is limited, CPU time is limited. My mobile data plan is limited. No one is happy with a page loading for 10 seconds for something which should take 1/10000 of that. That's not to say that this is tautologically a bad thing. The author explains why he thinks it's bad and I think that people that value their time and resources should agree. If you think that an article loading for 10 seconds is a bad thing, it's bad that an article takes 10 seconds to load. It only becomes tautological once you apply your value system to it, if ever.


1. A website's value is the amount of information that it provides to the end user. I would argue that the entirety of the works of Shakespeare or the 1911 Brittanica Encyclopedia provides more information than a high-definition picture of Donald Trump grimacing at EU leaders or an autoplaying ad for Doritos Locos Tacos NEW AT TACO BELL. As far as supplemental uses, at the end of the day, people are using plaintext to communicate with other people. Images and videos are secondary. If that ever changes, then society is already doomed as literacy is fundamental to the maintenance of technology.

2. Raw text is the highest value data/information return that can be transmitted over the internet. There's a reason that Morse Code and APRS are still around: they're reliable, appropriate tech, and require little to no middlemen outside of the transceivers themselves.

3. If wantonly (namely, for no enduringly good reason) increasing the amount of entropy in the universe isn't tautologically bad to you, then I really doubt that any argument would sway you to the contrary.

Concerning the relative value of HTML and CSS, yes, you could argue that UX matters in that department, but even the most bloated static HTML/CSS page is going to pale dramatically in comparison to the size of what's considered acceptable throughput today.
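The ratio this thread keeps coming back to is easy to recompute. A quick sketch using the NYT figures measured upthread (a 5.9 MB article page carrying roughly 1,400 words; the function name is mine):

```python
# Payload-per-word for the article measured upthread:
# a 5.9 MB page delivering roughly 1,400 words of text.

def kb_per_word(page_mb: float, words: int) -> float:
    """KB of total payload transferred per word of actual content."""
    return page_mb * 1024 / words

ratio = kb_per_word(5.9, 1400)
print(f"{ratio:.1f} KB per word")  # ~4.3 KB/word, matching the ~4kb figure upthread
```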


1. Well how is the format of plaintext the best method of getting information to the end user? What if you added a thin indexing layer on top of the plaintext? That would allow people to jump through huge documents with ease, but it's no longer plaintext. Sounds more valuable to me. Where is the line? What's the ideal?

2. Fair enough

3. Referencing "increasing the entropy in the universe" isn't a good argument because the amount of entropy increase due to humans is much, much less than how much entropy is increased by particles being blasted out of all the stars in the universe (unless I fundamentally misunderstand what entropy is). I think that stars blasting out particles is a much larger contributor to entropy than humans not using computer bits effectively.

And also what does entropy as a concept have to do with anything, anyway? Why should human engineering tasks have such considerations? If being super efficient with an abundant resource has a large cost (of some sort), but low efficiency has no business- or environmental- downside, then why be efficient with it?

In your last bit you argue that it's acceptable to send a site that is bigger than even the most bloated HTML/CSS page; I don't think that's true for any site/app that wants to be fast. It slows you down and people notice and stop using your service unless it's required of them.

I think that in general things are not as bad as you make them out to be, and your arguments have some merit but are mostly revealed as nonsense when the rubber hits the road. Universe entropy is completely unrelated to modern software engineering and web sites that have a lot of devs and are not SalesForce are not _that_ bloated.


> but low efficiency has no business- or environmental- downside, then why be efficient with it?

But it has. Data transfer and processing aren't free. They run on electricity. You may think that the difference between 10KB (efficient) and 10MB (current web) is meaningless because resources are abundant, and it lets you save a couple hours of dev time[0] - but consider that this difference is per user, and you saved a couple hours for yourself by making thousands[1] of people waste three orders of magnitude more electricity than they would if you were a bit more caring.

Like plastic trash and inefficient cars, this stuff adds up.

--

[0] - Such savings on larger pages take obviously much more work, but then this time gets amortized over all the use the website has - so the argument still holds.

[1] - Millions, for large sites.
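Scaled across an audience, the per-user difference compounds. A back-of-envelope sketch (all figures here, including the one-million-user count, are illustrative assumptions, not measurements):

```python
# Back-of-envelope: total data moved for an efficient vs. a bloated page,
# multiplied across a site's audience. All numbers are assumptions.

def total_transfer_mb(page_kb: float, users: int) -> float:
    """Total data moved for one page view per user, in megabytes."""
    return page_kb * users / 1024

efficient = total_transfer_mb(10, 1_000_000)     # 10 KB page
bloated = total_transfer_mb(10_240, 1_000_000)   # 10 MB page

print(f"efficient: {efficient:,.0f} MB")         # ~9,766 MB (~9.5 GB)
print(f"bloated:   {bloated:,.0f} MB")           # 10,000,000 MB (~9.5 TB)
print(f"ratio:     {bloated / efficient:.0f}x")  # 1024x
```

The same three-orders-of-magnitude gap shows up in transfer, parsing, and rendering work, which is the commenter's point about per-user waste.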


I don't think people actually waste three orders of magnitude more electricity by loading 10MB vs 10KB - sure, that much more CPU time is used specifically on loading the extra data, but that would be a fraction of what's being used for all the other processing going on, and people don't just flip the power switch as soon as a page load finishes.


> I don't think people actually waste three orders of magnitude more electricity by loading 10MB vs 10KB (...)

Yeah, they actually waste more in this case. The base cost is that of processing of content, which is linear with size (on a category level; parsing JS may have a different constant factor than displaying an image). But in the typical makeup of a website, just how much stuff can be in the 10KB case? Content + some image + a bit of CSS and maybe a JS tracker. In the 10MB case, you have tons of JS code, a lot of which will keep running in the background. This incurs continuous CPU cost.

> and people don't just flip the power switch as soon as a page load finishes

CPUs have power-saving modes, power supplies can vary their power draw too.

Or, for those with whom such abstract terms as "wastefulness" don't resonate, let me put it in other words: if you ever wondered why your laptop and your smartphone drains its battery so quickly, this is why. All the accumulated software bloat, both web and desktop, is why.


I agree with your point, but the GP's has validity too: the infrastructure to get that page to where it is read, does draw power in proportion with the amount of data it's handling.


As far as #1 goes, I'm arguing that plaintext is the ideal to strive towards, not the living practical reality. As far as indexing and access, the Gopher protocol and Teletext are great options to look at.

As previously noted, if you don't find that waste is fundamentally wrong on a moral level, there's no point forward from here. I view myself on a planet of dwindling resources, vanishing biodiversity, and warming at increasing rates.

If you think the energy that goes into computation is free or lacking external environmental downstream effects, then at the root of it, you carelessly shit where you eat and I don't. That's a fundamental disagreement.

[1] https://en.wikipedia.org/wiki/Gopher_protocol

[2] https://en.wikipedia.org/wiki/Teletext


Well hold on, I don't disagree that we are on a planet of dwindling resources. It's the method of environmental improvement that will have the greatest positive effect that is the root source of disagreement. That's the crux of the problem - what is the process of solving these problems?

I would argue that arguing over how big our websites are is not the important factor. I submit to you that PC electricity usage is the most relevant quantity we need to discuss when it comes to consumption of bits. I argue that the increased load on the network of sending more bits is negligible compared to the many endusers and their PCs that consume our data.

If we agree that PC electricity consumption is the most important thing to address, then we must ask whether or not the electricity generation process is bad for the environment. Most likely, electricity is generated by hydroelectric dams or coal/combustibles power plants. Suppose we replace those two types of power generation with low-maintenance, 50-year-lifetime solar panels (for which the tech exists). Can you still argue that the increased amount of bits sent over the wire for heavy modern websites is an environmental negative that we should address? I would say, no, at this point we have reduced the environmental impact of most electricity-consuming devices, and we can ignore PCs for the time being.

Therefore it is not the personal computer and the quantity of bits it consumes that should be your focus. It should be electricity generation.

I would like to ask you to consider whether or not your compassion-based arguments contain any resentment. Are you acting and speaking entirely on the grounds of compassion? And if so, how can you be sure that your supposed actions are going to reduce suffering of people and the planet and not have the opposite effect? How can you suggest solutions, like decreasing the weight of websites, and know with a high degree of certainty that it will produce the desired outcome (environmental preservation)? Could it have an unintended consequence?


In all honesty--you're right about there being bigger problems.

But let me put it this way, I still reduce, reuse, and recycle even though I know that one unconscientous suburban family will essentially dwarf my lifelong efforts in a year of their average living.

I know that those efforts are futile for the end goal of environmental conservation. That doesn't mean that I'm going to stop doing them. Being dedicated to acting in accordance with an understanding of first principles is not a bad thing, even if those actions are relatively impotent or ineffectual in and of themselves in the current moment.

As far as changing out power sources to nominally sustainable forms, yes, I would still find issue with people wasting those resources, just as I would find issue with people running air conditioners with the windows open.

As far as compassion and unintended consequences, everyone might be here for a reason and maybe trashing the planet is part of that plan, but equally so I might be here to speak against trashing the planet as a part of said reason and said plan.

It boils down again to if you need to find a reason to justify minimizing unnecessary energy usage, we're not going to see eye to eye and I doubt any argument will sway either of us towards the other's camp. Chalk it up to different contexts.


Gopher- I remember setting up and using that as part of an intern-like job at my local high-school. Back in the days of trumpet winsock and other relics of the hand-crafted TCP stack. shudders Mind you- it does deliver text at low bandwidth. :) Minimalism in communication. I wonder if I can get my SO off facebook and onto Gopher in the interests of the environment... manic laughter fades into the distance


I have given this discussion a good looking over to see if anyone cares for the environment. Glad to find someone that does.

I believe that care for the environment and design that puts being green first is going to have its time in web design. I also believe that along with document structure, accessibility and 'don't make me think' UX that eco-friendliness is going to become a core design principle in a lot of the web. If you put this stuff first then you can have a website that is pretty close to the plaintext ideal. This can be layered on with progressive web app 'no network' functionality and other progressive enhancements, e.g. CSS then JS, with the content working without either of these add-ons.

We all know that you have to minimise your scripts and mash them all into some big ball of goo, we all know that images that are too big aren't going to download quickly. But the focus is on 'site speed' rather than being green. In fact no developer I have ever met has mentioned 'being green' as a reason to cut down on the bloat and existing thinking on 'going green' consists of having wind turbines rather than dinosaur farts powering the data centre. Cutting down the megabytes to go green is kind of crazy talk.

A lot of this thinking is a bit like compacting your rubbish before putting it out for the bin men. Really we would do best to skip the rigmarole of compacting the trash and just have less of it to start with, ideally with more of it re-used or put out for recycling.

We saw what cheap gasoline did to the U.S. auto industry. For decades the big three added on more and more inches to bonnets (hoods) and boots (trunks) with very big V8 engines a standard feature. Until 1973 came along there was no incentive to do otherwise. Who would have thought to have cut down on the fuel consumption?

Outside of America, in the land of the rising sun they did not have a lot of oil. Every gallon they bought had to be bought in U.S. Dollars and so those U.S. Dollars had to be earned first. Europe faced the same problem, so economy was of importance in a lot of the world outside America. The four cylinder engines powering cars in Europe and Japan became vastly more efficient than U.S. V8 monster engines. Not only that, but cars with a four cylinder engine did not have to weigh many metric tonnes.

Nowadays the big three can only really make trucks and truck-based SUVs that are protected by the Chicken Tax. Nobody in America is buying U.S. made small cars, U.S. made luxury sedans or even U.S. made 'exotic' sportscars. Economy and 'being green' is not a big deal to U.S. car buyers; nonetheless the lack of efficiency and economy as a core part of the design ethos has led to a domestic industry that has lost to the innovators in the rest of the world who did put these things at the centre of what they do.

We haven't had 1973 yet and the web pages of today are those hideous Cadillac things with the big fins on them and boots big enough to smuggle extended families across the border with. AMP pages are a bit like those early 'Datsun' efforts that fell short in many ways. But I think that the time of efficient web pages is coming.

The Japanese also developed The Toyota Way with things like Just in Time and a careful keeping tabs on waste. Quality circles also were part of this new world of manufacturing ethos.

The old ways of making stuff didn't really give the results the Japanese were getting but exchange rates, Japanese people willing to work for nothing and other non-sequiturs distracted people from what was going on and how the miracles were achieved. The Germans and the Japanese built great engineering 'platforms' and then got some great styling for the bodywork from the legendary Italian design studios to package it all together. Meanwhile, in the USA there were more fins, more chrome bullets on the grille and more velour in the interiors.

So with the web it isn't just the Lotus 'just add lightness' that is going to be coming along to kill the bloat. It is also ways of working. For a long time the industry has been doing design with lorem ipsum, static PDF mockups and then handing this to some developers with the client expecting not a pixel to differ from the mockups, regardless of whether any of it made any sense. So we have got stuck with the same carousels on the same homepages - the tailfins of the web.

Although it is 'industry standard' to work certain ways, e.g. the big up front design by someone who can't really read, the project manager who can't do HTML, the agile design process that means nobody knows what they are doing, something has to change. Content driven, iterative improvements and much else we forgot from the 'Toyota Way' will ultimately win out with things like being green actually being important.

As for the article, the 'bullshit web' and David Graeber's ideas as applied to web bloat are an excellent contribution to what web developers should be thinking about.


I think the car analogy is flawed; yes, mid-century US cars were not very economical/efficient, but they sacrificed that for comfort --- big roomy interiors, cushy seats, soft suspensions, automatic transmissions, A/C, etc. The automakers were simply obliging to please customers with these features. There's a reason "econobox" is mostly a pejorative.

On the other hand, bloated slow websites only serve the needs of their authors, while annoying all their users. Users aren't asking for more tracking, ads, or any of that other bullshit.

(Full disclosure: I'm a big fan of vintage "Detroit Iron". You really have to ride in one to understand the experience.)


> On the other hand, bloated slow websites only serve the needs of their authors, while annoying all their users. Users aren't asking for more tracking, ads, or any of that other bullshit.

The bloat actually serves a lot of people, directly or indirectly. It is like packaging, everyone complains about packaging and plastic but when you are in the supermarket do you take that box that is already opened or that tin with the dent?

There are lots of stakeholders behind the bloat, including the bullshit-jobs people of the online world, e.g. the SEO people, the people in marketing and the programmers. In my opinion the cookie-cutter way of churning out websites is done by a lot of people who are barking up the wrong tree on how to do it, having rarely gained any knowledge of modern web technologies. Buried in what you see is a bundle of reset scripts, IE6 polyfills and other stuff that nobody dares to touch, as it has been there since 2009 and nobody knows what it does, whether within the company that wrote the CMS or in the agency that adds the 'theme'. It is worrying, really, when the best people can do is add layers of ever more complex 'build' tools to mash this cruft into something they don't have to think about.

P.S. There is no way I would be in an econobox if travelling through the American West, give me one of your trucks, SUVs or even a sedan any day. In Europe though, tables are turned, a country lane or a city stranded in a U.S. vehicle would be a special kind of hell.


Awesome take on the situation.


In response to #2, there is also a reason that photojournalism exists--unless you believe in a world where everyone imagines what important figures and historic events look like based solely on textual descriptions. This is completely ignoring the fact that I, nor most of society, would never attempt to receive the day's news via Morse Code.


Hint: TCP/IP & UTF-8 are fundamentally along the same principles as APRS. So yes, really you are.

As far as photojournalism goes, I can think of thousands of reasons why the predominance of photojournalism has been pernicious to civil society. The strategic use of the identifiable victim effect in atrocity propaganda being the most obvious.

However, with that said, I'm not arguing that visual media is entirely unnecessary, but rather against the idea that every user needs to download a 1680x1050 image when a default 600px width image or even the horror of having the image as an external link would be more than adequate for the overwhelming majority of users, particularly those in rural and developing areas that can't afford to waste their total allotment of monthly mobile data on "What Chance the Rapper’s Purchase of Chicagoist Means".

[1] https://en.wikipedia.org/wiki/Atrocity_propaganda

[2] https://en.wikipedia.org/wiki/Identifiable_victim_effect

[3] https://www.nytimes.com/2018/07/31/opinion/culture/chance-th...


This dovetails into an idea I had [0]. Basically just client side scrape the web as it's used and deliver people this plain text and simple forms. It would have a maintained set of definitions and potentially even logic to put a better "front" on all this bullshit. It's like reverse ad block where you only whitelist some content instead of blacklisting it. You could argue sites will get good at fighting it, but if used enough by the common user, they'd just alienate them (e.g. my scrape/front for Google search makes it clear which results the app has a friendly scrape/front for).

0 - https://github.com/cretz/software-ideas/issues/82
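A minimal sketch of the core of that idea: reduce a fetched page to its visible text, skipping script/style payloads. The whitelisted "fronts"/"recipes" the proposal describes would sit on top of something like this (class and function names here are mine, using only the standard-library HTML parser):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript contents."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def to_plaintext(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

page = ("<html><head><script>track();</script></head>"
        "<body><h1>Headline</h1><p>The story.</p></body></html>")
print(to_plaintext(page))  # prints "Headline" then "The story." on two lines
```

Real pages would of course need per-site recipes to pick out the article body, which is where the curated definitions come in.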


I've often toyed with the idea of using multi-user systems over SSH running Gopher and Lynx to achieve something like this.

In the process, it would also decentralize communities and establish digital equivalents of coffee shops (i.e. places to work in public and meet strangers)--basically SDF, but deployable on Raspberry Pis with more modern userland toys (i.e. software actually designed to be multi-user on the same system).

[1] https://en.wikipedia.org/wiki/Gopher_(protocol)

[2] https://en.wikipedia.org/wiki/SDF_Public_Access_Unix_System


My main reason for client side is to skirt legal troubles that can result from running a web-filtering proxy (not whether it's legal or not, but whether you will be in legal fights). Either way, needs to be as transparent as possible and as usable by the less-tech-savvy as possible.

But that's really all it is, a web server (or an app, or an extension, or a combo) that serves you up the web looking like Craigslist. Would require strongly curated set of "fronts"/"recipes".


Sounds a bit like tedunangst's miniwebproxy[0]. I've been wondering about writing either something like it or a youtube-dl-like "article-dl" for my own use, but haven't quite been annoyed enough into doing it yet.

[0] https://www.tedunangst.com/flak/post/miniwebproxy (self-signed cert)



I don't know how many people here read usenet or were on old mailing lists. You could have removed some hard edges in usability and everything would have been connected to phones with monochrome text displays back in the nineties already. They didn't even provide a decent email experience.

But instead we got the technology developing through some ringtone stuff advertised on TV.

I guess it's something that you can instantly show to your friends.


That's one of the wildest things! We had the technological capacity to run Unix systems with several hundred users simultaneously and access at the speed of thought with 26kbps modems back in 1992, complete with instant messaging and personal directories! What happened?!


Another wild thing like this is what you'll notice when you read up on Lisp machines. We had development environments in the 70s/80s that would seem magical today.


I’m reading the book Valley of Genius, where the Xerox PARC people basically make the same argument. The Xerox Alto’s Smalltalk environment still isn’t matched today, and the PC experience is much weaker for it.

The problem with those kinds of environments (where everything is editable at runtime using highly expressive languages) is they assume everyone is a power user and there are no malevolent actors trying to mess up your machine. That’s not what the modern landscape is like.


kinda; so did reasonably modern ideas. take Active Desktop, for example! sure, it’s more high-level, but i believe the quote is that they wanted websites to do “cool things” with the desktop. cringeworthy by today’s standards...

things get more locked down as we develop abstractions that we have more control over


It's heartbreaking.


Shows we are not limited by technology but somehow get distracted by other things.


The "other things" are the short-termism and appealing to the lowest common denominator that go with the pursuit of profit before anything else.


In Usenet’s particular case it was its open, unmoderated nature that killed it - once it became 99% spam, warez and CP most ISPs dropped it.


There were moderated groups, but IIRC they updated more slowly because every message was reviewed.

Anyway, there was a certain barrier to entry, so there were fewer users and messages. But some really good experts posted there. And some really fun jokers.


I support your effort to make Moby-Dicks the football-field-like unit of measurement for text-focused data. It’s close enough to the 1.44 MB floppy disk to handle easy mental conversion of historical rants, and half of the people reading this have probably never held one of those. I still remember downloading a text version of a 0.9 Moby-Dick book from some FTP site and carrying it around on a floppy so I could read it on whatever computer was handy.

That aside, the most shocking part of your analysis is how inefficient the nytimes was at caching resources for your reload.


For a rather more technical comparison, 4,600 pages is more than the size of Intel's x86/64 Software Developer's Manual, which is ~3-4k pages.


"I just loaded the New York Times front page. It was 6.6mb."

   ftp -4o 1.htm https://www.nytimes.com

   du -h 1.htm

   206K
For the author, 206K somehow grew to 6.6M.

Could it have anything to do with the browser he is using?

Does it automatically load resources specified by someone other than the user, without any user input?

Above I specified www.nytimes.com. I did not specify any other sources. I got what I wanted: text/html. It came from the domain I specified. (I can use a client that does not do redirects.)

But what if I used a popular web browser to download the front page?

What would I get then? Maybe I would get more than just text, more than 206K and perhaps more from sources I did not specify.

If the user wants application/json instead of text/html, NYTimes has feeds for each section:

    curl  https://static01.nyt.com/services/json/sectionfronts/$1/index.jsonp
where $1 is the section name, e.g., "world".

The user can use the json to create the html page she wants, client-side. Or she can let a browser javascript engine use it to construct the page that someone else wants, probably constructing it in a way that benefits advertisers.
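For anyone wanting to try that client-side route, here's a minimal sketch of unwrapping such a JSONP feed in Python. The callback name, field names, and payload below are invented for illustration; the real feed's schema may differ, and this runs against a hardcoded sample rather than hitting the network.

```python
import json
import re

# Hypothetical JSONP payload of the kind a sectionfront feed might return.
# The callback name ("sectionfront") and fields ("items", "headline", "url")
# are assumptions for illustration, not the real NYT schema.
payload = ('sectionfront({"items": [{"headline": "Example story", '
           '"url": "https://example.com/a"}]});')

def unwrap_jsonp(text: str) -> dict:
    """Strip the callback wrapper `name(...)` and parse the inner JSON."""
    match = re.fullmatch(r'\s*\w+\((.*)\)\s*;?\s*', text, re.DOTALL)
    if not match:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))

data = unwrap_jsonp(payload)
for item in data["items"]:
    print(item["headline"], "-", item["url"])
```

From there, rendering whatever minimal HTML you want client-side is a few string templates, with no ad scripts involved.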


I don’t think there is anything wrong with user agents downloading resources (like images and stylesheets) linked to by an html document. It is the providers, not the user agents, who have violated the trust of users by including unnecessary scripts, fonts, spyware, advertisements, etc.


"I don't think there is anything wrong with user agents downloading resources (like images and stylesheets) linked to by an html document."

Neither do I. For some websites, this is both necessary and appropriate.

However, in cases where the user does not want/need these resources, or where she does not trust the provider, I do not think there is anything wrong with not downloading images, stylesheets, unnecessary scripts, fonts, spyware, advertisements, etc.


My pet comparisons for everything being too big nowadays are Mario 64 (8mb!), Super Mario World (512kb!), and Super Mario (32kb!!).


I first realised how heavy these pages are when I disabled javascript. Things load in the blink of an eye. *Most* pages work and the web remains largely usable.


Yes, this is my experience as well; JavaScript is often the key antagonist. Unfortunately, many websites require JavaScript to function.


bbc.co.uk will load just fine without JS and actually be more enjoyable (IMO) than JS version.

cnn.com fails miserably without JS.


IMHO you are confusing data with information with knowledge, and mixing mediums. You can't compare a novel - the plainest of plain-text mediums - with the online front page of a major news organization in 2018: of course it will be interactive content. It's an entirely different medium, a different market, and a different set of user expectations and competition.

https://www.quora.com/What%E2%80%99s-the-difference-between-...

DATA: a "given" or a fact; a number or a picture that represents something in the real world; the raw material in the production of information.

INFORMATION: data that has meaning in context; data that has been related or manipulated.

KNOWLEDGE: familiarity, awareness, and understanding of someone or something, acquired through experience or learning; a concept mainly for humans, unlike data and information.


> but don't want it to be hugged to death:

Incidentally, this is also another reason for keeping pages small --- bandwidth costs. I remember when free hosts with quite miniscule monthly bandwidth and disk space allotments were the norm, and kept my pages on those as small as possible.


I just loaded up a nytimes[1] article too - and it only weighed in at 1.0MB. For a 1000-word article. Subsequent reloads dropped it to ~1000KB. I don't think that's too bad, considering there are images in there as well.

Now of course, I'm running an ad blocker. I assume the remaining MB that you noticed had come from advertising sources. In which case, bloat isn't the issue, ads are.

[1] - https://www.nytimes.com/2018/07/31/us/politics/facebook-poli...


> weighed in at 1.0MB. For a 1000 word article. Subsequent reloads dropped it to ~1000KB.

You really can't beat savings like that.


That'll knock dollars... no, cents... no half-cents off his ISP bill!


That distinction is nonsense. The ads are part of the page and are no more or less bloat than the rest of the useless junk that gets embedded. It's deliberately put there by the NY Times; they don't end up there by accident.


Are you also running noscript? I'm running a DNS sinkhole and still get 3mb on reloads.


Comparing the raw text of a fiction novel to the code of a website is a pretty asinine comparison, honestly.


Maybe you'd prefer comparing the code of a website to the amount of useful content on the website, which OP also did. Taking "I'm downloading 100mb worth of stuff (83 Moby-Dicks) to read 72kb worth of plaintext" at face value, we could also say that 0.072% of the data transferred is useful, or, equivalently, that 99.928% of it is crap.
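Checking that figure, taking kB and MB in decimal units as the parent comment appears to:

```python
useful_kb = 72   # plain text actually read per day, from the parent's scenario
total_mb = 100   # total transfer per day, same scenario

# Decimal units: 1 MB = 1000 kB (matches the parent's round numbers)
useful_fraction = useful_kb / (total_mb * 1000)
print(f"{useful_fraction:.3%}")  # 0.072%
```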


You don't have to download the typography of a physical book but it still plays a huge role in the readability and enjoyment of it. So I guess the typography of websites is "crap" because it has to be downloaded?

It's a ridiculous apples-and-hammers comparison thinly veiled as an intelligent critique.


I've already downloaded everything I need for perfect typography. I can apply it to most websites with Firefox's Reader Mode. Websites cannot possibly improve on this, because the best and most legible typography is the typography you're most familiar with. I don't care about branding or image or whatever bullshit "designers" use to justify their jobs. Web fonts and CSS have negative value to me. I disable them as far as possible.


How much of the data downloaded is actually for typography?

Also, browsers have good enough typography by default, which can be controlled with CSS.


Everyone's favorite example of that kind of "brutalist" design:

http://bettermotherfuckingwebsite.com/



A++++++ would inspect elements again.


Aww! Thanks ;-)


Came looking for motherfuckingwebsite.com & find an even better site to reference now instead. Kudos!


Bad design because of the low contrast text. There used to be a superior version at https://bestmotherfucking.website/ , but it isn't loading correctly for me on Firefox.


I can put raw text on a kindle and read it. In fact, I frequently do. Comparing digital text to digital text is not apples-and-hammers.


Exactly, it’s not the NYT’s fault that plain text compresses well compared to jpgs.

Also, is a world where the NYT is subscriber-only really preferable?


This is a textbook definition of a false dichotomy. There are other distribution models for digital news services. There are other methods for transmitting digital content. It's not an either/or situation.


Fair enough, but these rants on HN about publishers never seem to contain any examples of publications that are both successful and delivering pages that weigh scarcely more than their plain-text equivalent.

I think you invite the “false dichotomy” by making the comparison between plain text Moby Dick and the front page of one of the most successful newspapers in the world in 2018. I agree with many of your points but find the way you make your argument to be full of comparisons of apples and oranges while avoiding proposing any kind of solution.


> Fair enough but these rants on HN about publishers never seem to contain any examples of publications that are both successful and delivering pages that weigh scarcely more than their plain text equivalent.

Examples would be the same publications 10 or 20 years ago. They weren't exactly plain text but they were a lot lighter and the content has not improved measurably in that time.

Here's one from the 70's that's still going: https://en.wikipedia.org/wiki/Minitel


I don't care about examples from 10 or 20 years ago, I want examples from the current market.

Minitel isn't a publication? I'm familiar with Minitel, I've read a book on it, but I don't know what you're trying to insinuate by linking it here in a discussion about publishing.

Also, "still going strong?" Minitel was discontinued, in 2012. Because France has the modern internet now.


Did you read the article I posted? I definitely put forward both an acknowledgment that media firms won't change and ideas towards reducing usage where possible.


Again with your post I see many more words about Moby Dick and nostalgia for a past that no longer exists than words about a possible solution, and while your solutions are better than the typical “just move to a subscription model,” I can’t see how we would generate the political will for anything other than cheap internet. We can’t even agree to tax carbon emissions yet. How are we going to tax bandwidth and convince everyone to accept a low-bandwidth internet?

I don’t have any great ideas myself, but part of me sees the Americans with Disabilities Act as a model for getting this done.

I really don’t like AMP because of the Google Cache and the potential for Google to bias their search results page to emphasize AMP pages over similarly performing non-AMP pages, but it’s a better-thought-out attempt to fix “the Bullshit Web” than anything else I’ve seen, and it has been extremely successful in decreasing payload size for many readers of sites like the NYT and other major publishers.


Completely agreed. I could continue by comparing it to the amount of bandwidth in a 30-minute CNN broadcast, all to deliver a few thousand words to me.


Broadcasts are great! You can send a lot of content out, and it’s essentially zero marginal cost per additional receiver. Plus, it’s very hard to track behavior of consumers without their explicit consent.

People are most familiar with realtime broadcast audio and video, however I could see something like newsgroups working via a broadcast medium.


I can't agree more with the points you make. I've spent a decent amount of time and effort reducing the overhead of my blog, for example - https://goose.us/thoughts/on-the-purpose-of-life/

That page includes images and "embedded" YouTube videos, but loads 454kb for 3,187 words, with 10 requests, in 450-600ms - if anyone has any suggestions on how to reduce that further, I'd love to hear them.

Going to bookmark that txti.es service for the future, definitely seems useful for publishing simple content without needing to fit it into a blog theme.


I'd start with the 100+KB PNG images. None of those pictures are complex, they can be compressed much more.


I put them all through ImageOptim, so not sure if those PNGs can compress anymore...

I agree on those complexity graphs since they're actually scanned from Sean Carroll's book and edited in Pixelmator - if I was better with graphics I could probably recreate them as SVG or in an editor and make it only a couple kb.

Same for the MinutePhysics screen grab, though I couldn't bring myself to attempt a poor recreation of their work and couldn't get rid of the weird pink gradient background which probably prevents better compression.

The Youtube cover images are still being pulled in from youtube - I debated downloading/compressing/hosting them myself, but figured I'd gain more from downloading them from the separate domain since I still think there is a default limit of connections the browser will open to a single domain at the same time.


That's because PNG is lossless; use JPEG and you'll dramatically cut down the size, at least for photographic images (flat-colour diagrams often compress better as PNG).


Have you tried opening the image(s) in Photoshop and try outputting them at different resolutions and compression settings?


Ditch the "embedded" videos. Use a screenshot linking to the videos directly.

Embedded videos still load a ton of shit and track visitors without their consent.


> mb

I think you didn't mean millibits, but megabytes (MB). 1.2 mb = 1.5e-10 MB.
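Spelled out, the unit pedantry goes like this (reading "mb" literally as SI millibits):

```python
millibits = 1.2e-3        # "1.2 mb" read literally: 1.2 thousandths of a bit
bytes_ = millibits / 8    # 8 bits per byte
megabytes = bytes_ / 1e6  # SI megabytes (decimal)
print(megabytes)          # ~1.5e-10, matching the figure above
```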


Your math does not add up. It is 1 * (6.6 + 3 * 5.9) + 3 * (5 + 3 * 5.9) == 92.4. Unless you are speaking in approximations, obviously.
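For what it's worth, that arithmetic works out as stated; a quick check using the figures from the original comment (6.6 MB first front-page load, 5 MB reloads, 5.9 MB per article, 4 visits of 3 articles a day):

```python
front_page_first = 6.6   # MB, first load of the front page
front_page_reload = 5.0  # MB, subsequent reloads
article = 5.9            # MB per article

# 4 visits per day, 3 articles per visit; the first visit pays the full
# front-page cost, the other 3 visits get the (barely) cheaper reload.
total = (front_page_first + 3 * article) + 3 * (front_page_reload + 3 * article)
print(round(total, 1))  # 92.4 MB, i.e. roughly the "100mb" claimed
```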


I must say, despite what I imagine is a good bit of traffic, the instant load times on your site were a joy to behold. I never get to experience that kind of speed on the "modern" web. Even HN loads orders of magnitude slower than that.


Not my website, but please send along thanks and awareness to @thebarrytone on Twitter!


> Moby Dick is 1.2mb uncompressed in plain-text

How many times did you read moby dick online.


> As my father was a news and politics junkie (as well as a collector of email addresses)

I feel like there's something I'm missing here: something like getting his hands on clever usernames?


NoScript cuts the bullshit down to 1.35 MB with all scripts blocked and it's still readable. I can barely tolerate the web without it.


You want to compare plain text to a newspaper, which, even in paper form, people expect to contain pictures and advertisements.


Bad comparison, Moby Dick never tracked your activity across the web and sold your data to advertisers.


Moby Dick doesn’t have any pictures or video.

Audiovisual media has value.


Audiovisual media of value has value.

Audiovisual media that I choose to donate my bandwidth towards downloading may have value.

Audiovisual media in and of itself has no value.

Audiovisual media that automatically loads, thus slowing down the loading of everything else, has negative value.


The 100mb pays the bills for the other 72kb of text.


> to read just one article from the New York Times, I had to download the equivalent of ten copies of Moby Dick.

But how many Mona Lisas is that?


Well said.
