
This Page Is Why The Internet Sucks - mikecane
http://mikecanex.wordpress.com/2013/04/15/this-page-is-why-the-internet-sucks/
======
Drakim
On this blog there are 11 "share this" buttons, a huge pointless banner of
mostly black space, and half a page's worth of text is followed by 5 pages
of black space to allow for all the sidebar links. Is he being ironic?

~~~
rjwebb
Does anyone even use those buttons?

~~~
pointyhatuk
What buttons?

(adblock)

~~~
obviouslygreen
Using Ghostery here... my initial perception of a site is always based
partially on how many entries are blocked on load. I've seen pages with 20+
calls to outside sharing services; at that point, a little extra cynicism
kicks in.

~~~
pointyhatuk
Agreed. The number of things Ghostery blocks seems inversely proportional to
the utility value of the site!

------
brudgers
"This" is the internet Google created. Not a catalog of all information, but a
catalog of all the information which has been wrapped in monetization schemes.

Use Google Search for "convert PDF to HTML." See page after page of links
which Google might monetize alongside the paid advertisements it already has.

Google makes it look like this is a difficult task, not a solved problem with
a FOSS solution - Pdftohtml that ships in many Linux distros.

Google obfuscates the most relevant information by burying it within links to
discussions on Linux forums. Even though there's a fucking SourceForge page
and it links to a Windows friendly version. The crap results increase the
odds that the average user will click through a revenue generating link either
directly on Google's page or on a site running their ads.

~~~
mseebach
No, Google did not create attention-grabbing-for-money. Look in a good old
yellow pages phone book: All those colourful, attention-grabbing ads diverting
your attention from the small-print actual listings? They're paid for.

Google merely made a very successful business out of making businesses willing
to pay up slightly easier to find, roughly like the yellow pages did.

Finally, on your "convert PDF to HTML" example: My first hit is PDFOnline.com,
that has a nice, green button labeled "upload". When I use that button to
upload a PDF document, the site generates a HTML document for me. It was
quick, easy and it didn't ask for money. Totally passable for the top result
for the query.

pdftohtml was the fourth link - hardly outrageous for a terse, technically
worded page. Oh and that Windows version (helpfully referred to as a "win32
GUI", because nobody calls it "windows" anyway) that it linked to? The link is
dead.

~~~
leephillips
I know what brudgers is talking about - I frequently encounter the same
frustration when attempting to use Google to find actual information in a
space where too many people are trying to make money. The links you see depend
on what Google knows about you. I tried the search in an incognito tab, and
the pdftohtml link, which would be the most useful result, was nowhere on the
first page. When I add the word "linux" to the search, it appears. Not bad at
all in this case, but I've been stymied in the past when no combination of
keywords and search operators could locate good, technical information that I
knew was out there, but instead returned page after page of spam and junk. I
know Matt Cutts still claims that there is a firewall between search and ads,
and there is no compromise on search quality for profit reasons, but does
anyone still believe this?

~~~
mseebach
This whole sub-thread hinges on the assumption that the "pdftohtml" result is
objectively the correct one for the query "convert PDF to HTML", and that
it's commercial corruption of Google that keeps that result off the front
page. I think that that is a false assumption. If a non-technical user needs to
convert a PDF to HTML, the pdfonline.com result (the one with the friendly
green button) is leaps and bounds better than the pdftohtml SourceForge one
with the broken Windows link. Google doesn't have any kind of obligation to
promote FOSS software at any cost, they have an obligation to answer the user's
question - and in this case it does just that.

Yes, I too have had to sort through pages of various SEO keyword-spam to find
what I'm looking for, but the connection from there to this poor example to
the number of extraneous elements on a website to accusing Google of having
forged a broken web is, well, weak.

~~~
brudgers
Your point about the difficulty in establishing one or another result as
objectively better is a good one - though somewhat weakened by then arguing
that the online service is better.

However, it is not as if pdftohtml is obscure. It is a common component of a
great number of GNU/Linux distributions and has been for many years. So
setting aside the merits of the top result, the lower results are still
problematic.

Unlike the top result, neither Pdftohtml nor the high ranking results are SaaS
- they're downloads for offline use. What differentiates them is their value
propositions. They are in opposition between Google and the lay person and the
results are _caveat emptor_ solely for Google's benefit.

Elsewhere, I have used a better example, "weather". The top results are
advertising revenue driven. The best result? In most cases, the National
Weather Service point data - advertising free, updated regularly, and
generated by PhD meteorologists, not pretty faces for TV.

In this case, Pdftohtml shows the way in which Google drives us toward online
rather than offline solutions. It drives us away from using our CPU cycles and
towards consuming bandwidth, and absent that toward tools which pay for
advertising.

------
kingmanaz
As an AmigaOS 3.1 stalwart unable to view HTML5 content, I’d love to see an
Internet partially based upon a TeX-like markup. Rather than “<!DOCTYPE
html>”, such a document would use “<!DOCTYPE typeset>”. An alternative browser
would parse and present the markup when either the doctype is encountered, or
a “.ts” extension is linked to. Typeset content would be displayed using the
traditional tricks of typesetting: kerning, proper justification, avoidance of
rivers, optimal reading line-length, etc. Here’s a sample of what the web
could be: <http://i.stack.imgur.com/W9uon.jpg> .

More often than not, today’s Web is gaudy and garish, as 99% of HTML, CSS, and
JavaScript’s functionality is unneeded for everyday reading. For plain reading
a Typeset browser and typesetting-friendly markup would be an improvement over
the status quo, wherein every man reinvents the art of typesetting at his
website, and often with tragic results.

~~~
ChrisLTD
This is why Instapaper and similar services have a market. We should be
thankful that enough of the web is parseable for these tools to work and not
stuck in something like a Flash file.

~~~
drdaeman
Flash file is parseable too, just harder to parse. (And it still is, even if
text is converted to curves.)

The real problem is, websites are not much of documents anymore. Their parts
are steadily moving up the Chomsky hierarchy, becoming programs.

------
jpswade
No. The Internet is fine. It's just some websites that suck.

What next?

- This Monitor Is Why Electricity Sucks.
- This Truck Is Why Roads Suck.
- This Toilet Is Why Water Sucks.
- This Man Is Why Earth Sucks.

~~~
bpatrianakos
This comment is why communication sucks.

Edit: (I'm kidding, not being mean)

------
notjustanymike
I worked at Newsweek and later The Daily Beast as a web developer in charge of
analytics, so I can provide some insight into this.

The product managers were under constant pressure to increase traffic, so
they'd add every new feature that appeared in front of them. Facebook,
comments, social crap, stock ticker (that -nearly- happened), it didn't
matter. Each one of these did add a marginal amount of measurable traffic.

Here's where the analytics part kicks in. The numbers did not reflect that our
readers hated all that noise, and the product managers wouldn't dare remove
something that generates traffic. So no matter how pointless something was, if
it generated traffic it stayed for good.

~~~
laumars
Did it generate more traffic, or were those extra hits just users tapping
"refresh" because of page timeouts when trying to grab the larger sized site
on a crappy 3G connection? (joke)

~~~
Kesty
It might be a joke, but it is real.

Adwise, the reason why you have more impressions doesn't really matter.

------
DanielBMarkham
I've started using "why the internet sucks" as an indicator that the person
writing may be more in love with a good rant than anything else.

I don't think you can weigh the number of bytes you receive that are the
message against the total weight of the page. Doesn't work like that. Facebook
gives me a stream of people I know saying useless things. I consume maybe 4-5K
of plaintext on it every day. Have any idea how much crap it pushes down the
pipe? How much real-estate all the things I don't care about consume? A lot.
Hell, the information isn't even that valuable. But still I consume it.

It's way more complicated than the author makes out. If I wanted to know
Abraham Lincoln's birthday and you put it in bold 48-point text in the center
of the page, I'm spending 3 seconds reading it and I'm gone. The rest of the
experience is just a waste of time and bytes on the part of the supplier. Who
cares?

ADD: It's not the crappy ads and social stuff that's the problem. It's sites
using more and more techniques to subtly make you stick around. The subtle
distractions, like a FB list of which friends also liked reading the
particular article, or offering badges for participation, or tracking you
across sites, that cause the most long-term trouble for folks. The big,
glaring stuff is easy. People are used to that crap by now.

------
trotsky
While stuck with the dreaded neighbor "hey, you're a computer guy" support
visit, I got a real eye-opener about the UX of the crapware-laden,
toolbar-filled IE9 / Win 7 experience. No wonder some people just hate computers.

What struck me the most was how she gets her email. FIOS set her up with their
webmail, so she clicks on the bookmark, it shows a progress bar for 5 seconds
while i assume it's pulling your customer info. Once that clears they showed a
full page interstitial about some verizon product and i think lacked any skip
button.

Once that cleared they showed what amounted to an old school portal - about a
third was verizon info/support/ok apps, but then the rest was filled with some
crap news feed inserts a few product upsell teasers, and i think a third party
ad or two.

Ok click on email, normal login page, open email app. So this thing has a
Verizon top banner that's huge, like 25% or 30% of her screen that is framed
and never scrolls off. It takes two clicks just to make it show her inbox
instead of a blank content area.

The app wasn't terrible, just maybe circa 2002 or so with only a few of the
controls disguising themselves as the background. Usable enough as long as you
don't mind the actual mail/composer only getting about 25% of the screen real
estate. The kicker was the timeout - "for her protection" it kills her idle
session after somewhere around 10-20 minutes at which point it pops up a modal
dialog about being logged out, then pops it up again after you hit ok. Then
it's back to stage 1 (5 second progress bar) and the experience begins anew
including the interstitial and submitting your email credentials again.

I know telcos do some of the worst software engineering on earth, but jesus.
No wonder some people consider the web as shitty and just want to get whatever
task they need to use it for over with.

And no, I didn't really do much to improve it aside from exiling the toolbars
and some shittier than average bestbuy run at login crap ware. I feel shitty
about not making it at least not terrible, but that's hours to set things up
right + a few hours of instruction + then the support calls come :/ And then
the neighbor she told the story to calls.

It sucks that things like geeksquad are so shitty and are prone to upselling
crap and exploiting technophobes. And I assume the it pro flier sector is at
least as bad on average. Because if you're a novice and don't have a relative
that's ok you're basically stuck.

And yet I made her swear twice that she won't tell any of her friends that I
helped her.

[1] Ok, yeah, I guess I got a bit off topic there. But fios portal webmail is
shitty.

~~~
cleis
Are geeksquad that bad? My dad uses them (although he's very far from being a
technophobe - despite spending a lot of time missing MS-DOS) and has had
nothing but a good experience as far as I know.

~~~
trotsky
Well, I only have one real datapoint, but I took a look at a laptop a
grandmother type had brought to them for help but probably didn't do a very
good job explaining what was wrong. It sounded like her email had stopped
going in or out, but I had to prompt her a bit. I can't remember precisely
what the services rendered were but it came to something close to 200, they
sold her a new AV despite her current subscription, av added a toolbar,
installed some kind of system cleaner that deleted temp files and browser
caches and such once a day, charged her for a "full tuneup" which was nearly
half the cost and seemed like maybe a defrag, and $25 to fix her outlook -
seems like she had managed to switch to a new blank profile so no mail, no
settings, so they switched her back.

I mean, I guess they did fix her problem and I doubt you'd have much trouble
with them if you have a basic clue, but the people who need that service the
most are the exact same ones that have no way to tell if they really do need
this thing they're telling you is critical.

Plus I've seen enough online discussion about what techs (employer unknown) find
on hard drives and how much their boss or 4chan enjoyed it to suspect that
line of business may not have the strongest of ethical cultures.

------
shawabawa3
The real problem with that page is that as far as I can tell, the entire
article is a lie. Some googling on "Barry Clams" only comes up with the daily
mash as sources, and the daily mash doesn't list any sources.

edit: No relevant results for

    
    
      "barry clams" -thedailymash
    
      barry clams -thedailymash
    
      bond clams -thedailymash
    
      fleming clams -thedailymash
    

edit2: oops, apparently the daily mash is satire

by the way, jasoncartwright, your reply is "dead"

~~~
danielsamuels
The Daily Mash is the UK's version of The Onion.

------
danso
It's not the "page" that's the problem, it's the CMS. The linked-to article by
the OP may only be 400 bytes, but that website's template is (presumably) meant
to scale for content of 40-400,000 bytes. Would it be nice if there was a way
to scale down extraneous files dependent on the actual content
size...sure...but then you'd have developers and designers complaining about
all the movable parts in the CMS (i.e. you'd basically be designing a site for
different article sizes...for each of the different browsers you already
design for...so multiply your template work by at least 2).

------
InclinedPlane
Ratios for this article: 713 characters of content (43 kb if you include the
screenshot), total page size: 898 kb.

Edit: adjusted sizes to correct for errors due to caching.

~~~
film42
I just did this myself: 765 characters including the title. With caching, it's
around 680kb.

------
claudius
But somehow the author has to monetise the content, because the huge amount of
bandwidth required to blurb out 1.6 MB on every access is not cheap. So he has
to include some advertisements and then also content-like pictures, because
people don’t like advertisements being the only pictures on a website.

------
richorama
Wait a minute, your page saves to 1.28MB, and you only have 133 words in your
article. This gives you 10kb of download per word. The article you’re pointing
to is only 4kb per word.

Pot calling kettle?

------
chewxy
I've just moved and while my ISP is still setting up my internet, I'm using my
phone's 3G for internet.

Having a quota on the internet bandwidth, I created a minimal browsing
profile: no images, no javascript, no flash. You'd be surprised how many
websites are broken.

Facebook, twitter, mashable, theverge all consistently consumed
hundreds of megabytes per page. When I used my minimal browsing profile, those
sites were so much faster.

Browsing reddit without being logged in shows how much junk is in reddit (no
point clicking on links that go to imgur afterall). I suddenly found myself to
be far more productive.

In fact I think when I get my ADSL up again, I'm going to keep using the
minimal profile

------
dreen
Apparently the joke is lost on HN readers: The Daily Mash is a _parody_ site,
it parodies news sites (specifically tabloid ones in the UK, it's kind of like The
Onion). You would not expect a clear form from them - In fact, whether
intentional or not, the overbloat would actually be a parody itself, and of
the very thing bitched about in the article.

------
drdaeman
"Given a choice between dancing pigs and security, users will pick dancing
pigs every time." — <https://en.wikipedia.org/wiki/Dancing_pigs>

I believe, one can replace "security" with "correctness", "compactness",
"simplicity", "low signal-to-noise ratio", "openness", "freedom" or many other
terms, and the statement will still remain true.

------
lifeformed
I wish I could pay for my web visits in real money instead of ad views. It'd
be nice to get the experience of AdBlock without having to deprive content
providers of revenue. It'd also be nice if people could design websites
without giving prime visual estate to viewer annoyances. Maybe you could pick
what percentage you distribute to your viewed sites, so you can support your
favorite content providers.

How much would it cost to offset my ad-less experience? It couldn't be that
much, could it? Let's say the average website that I visit has 3 ads per page,
with an average CPM of $2 per ad. That means every 1,000 times I visit those
sites, I would need to pay $6. According to RescueTime, I spend roughly 16
hours a week on websites. If I spend 1 minute per page (a really rough guess),
that's 960 views per week. Since I'm using really fuzzy numbers, the ballpark
range is looking like $10-50 per month, probably around $20ish.
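
As a rough check on that arithmetic, here is a minimal Python sketch. It uses only the figures given above (3 ads per page, a $2 CPM per ad, 16 hours a week at one page per minute). The function name `monthly_ad_offset` and the 52/12 weeks-per-month conversion are assumptions added for illustration, not anything the comment specifies.

```python
def monthly_ad_offset(ads_per_page, cpm_dollars, hours_per_week, minutes_per_page):
    """Estimate what one reader would pay per month to replace ad revenue."""
    # Page views per week, assuming a fixed time spent on each page.
    views_per_week = hours_per_week * 60 / minutes_per_page
    # CPM is the price per 1,000 impressions, so divide by 1,000 per view.
    cost_per_view = ads_per_page * cpm_dollars / 1000
    # Assumption: an average month is 52/12 weeks long.
    weeks_per_month = 52 / 12
    return views_per_week * cost_per_view * weeks_per_month


views = 16 * 60 / 1                      # 960 views per week
estimate = monthly_ad_offset(3, 2.0, 16, 1)
print(f"{views:.0f} views/week, roughly ${estimate:.2f} per month")
```

With those inputs the weekly cost is $5.76 and the monthly figure comes out at about $25, which sits inside the $10-50 ballpark.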

That's doable, but it's pretty steep considering the alternative. I'm sure
there are probably ways to reduce the cost that I'm overlooking.

~~~
hipsters_unite
I've thought this for years. Something like Flattr, except, y'know, that works
and people use.

------
nkozyra
This blurb was painful to read.

The entire page, sans Adblock-able advertisements is 900KB-1MB.

Granted, that includes:

The site's scripts - which can ostensibly increase usability

Images - which help tell the story, establish branding and increase usability

Stylesheets - which make a page visually appealing and increase usability.

If you want to complain about the state of the Web, at least take into the
account that the Web/Hypertext is about more than just text.

~~~
paranoiacblack
> If you want to complain about the state of the Web, at least take into the
> account that the Web/Hypertext is about more than just text.

Yeah, I was wondering about this. Is he advocating turning the entire web into
a text file? How does that even work with content rich websites like twitter
or facebook or JS-necessary sites like your favorite music/video streaming
site? Yes, encoding text with other technologies to increase the user's
experience is going to make your file size bigger, but who cares?

~~~
nkozyra
It works poorly. The author makes the mistake in saying that everything that
is not text is unnecessary and/or advertisement. There's a lot of cruft on
most Web pages, but there's also a lot of valuable non-text info.

------
lucb1e
This has been my point for years, and it has only gotten worse. Right now I'm
thinking of writing an addon that simply blocks all third-party content (not
just cookies), removes any divs that have a className or ID containing
"share", removes comments after the first 100 (5000 comments and no pagination
is not exceptional), etc. Perhaps just ignore output after
FirstH1TagOnPage.ParentNode.Endtag.PositionInDataStream.
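
One piece of that idea, dropping any div whose class or ID contains "share", can be sketched outside a browser with Python's standard `html.parser`. This is only an illustration under stated assumptions: the names `ShareStripper` and `strip_share_divs` are hypothetical, it filters a static HTML string rather than a live DOM, and it does not attempt the third-party blocking or comment trimming.

```python
from html.parser import HTMLParser


class ShareStripper(HTMLParser):
    """Re-emit HTML, skipping every <div> whose class or id contains 'share'."""

    def __init__(self):
        # Keep entities such as &amp; unconverted so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped div (counts nested divs)

    @staticmethod
    def _is_share(attrs):
        return any(
            name in ("class", "id") and value and "share" in value.lower()
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1
            return
        if tag == "div" and self._is_share(attrs):
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def strip_share_divs(html):
    parser = ShareStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = (
    '<div id="post"><p>Hello &amp; bye</p>'
    '<div class="share-buttons"><div><a href="#">Tweet</a></div></div></div>'
)
print(strip_share_divs(page))
```

Counting only nested divs keeps the sketch short. A real addon would work on the parsed DOM and would have to cope with malformed markup.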

I think my site does alright, shameless plug: <https://lucb1e.com/>. Note that
it's hosted on low-end hardware; if you want to view loading times, append
?debug to the URL. Or should I even get rid of the share buttons on articles
here? I don't think they're used much anyway.

~~~
ay
A small (offtopic) comment on your blog entry about IPv6... /64 is not twice
the IPv4 address space - it's twice the number of bits. This translates to 2^32
times bigger than the entire IPv4 address space.
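
The arithmetic can be checked with Python's standard `ipaddress` module. This is just a sketch: `2001:db8::/64` is the reserved documentation prefix, picked here as an arbitrary example of a /64, and the variable names are illustrative.

```python
import ipaddress

# Every possible IPv4 address: 32 bits.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses

# One IPv6 /64: 128 - 64 = 64 bits left for hosts.
slash64_total = ipaddress.ip_network("2001:db8::/64").num_addresses

ratio = slash64_total // ipv4_total
print(ipv4_total == 2 ** 32, slash64_total == 2 ** 64, ratio == 2 ** 32)
```

Twice the bits squares the size of the space rather than doubling it, so a single /64 holds 2^32 copies of the whole IPv4 Internet.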

~~~
lucb1e
That's what I meant. I'll edit for clarity, thanks :)

------
lyndonh
This is meant to be intentionally ironic, right ?

Or "This Page..." is actually a self reference ?

~~~
rejschaap
He did say he didn't have any solutions.

I'd recommend getting inspiration from other blogs, <https://svbtle.com/> for
instance.

------
lmm
Anyone have a count of how big a "complete" page of this article itself - all
748 characters of it - is? I make 64kb for the html alone, but apples-to-
apples means we should compare the rest of the assets too.

~~~
shawabawa3
I did right click -> save page as and it came to ~700kb

That gives him a ratio of ~1/1000 as opposed to the article he's complaining
about's ~1/4000
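
For anyone who wants to redo the division, a small Python sketch follows. It assumes one byte per character and 1 kb = 1,000 bytes, and it uses the 748-character count and ~700kb page size quoted in this sub-thread. The helper name `content_ratio` is made up for illustration.

```python
def content_ratio(content_bytes, page_bytes):
    """Return N such that content is roughly 1/N of the full page weight."""
    return round(page_bytes / content_bytes)


# Assumptions: 748 characters at one byte each, and a 700 kb saved page.
blog_ratio = content_ratio(748, 700 * 1000)
print(f"content is about 1/{blog_ratio} of the page")
```

That works out to about 1/936, in line with the ~1/1000 estimate.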

~~~
shared4you
Yea, it's 683 KB in all for ~800 bytes of text! Looks like TFA's author makes
the same mistake he's complaining about!

------
leephillips
He forgot the most important reason why "the internet" sucks, even though the
page he's complaining about flaunts a prominent example: stealing content.
Check out the unsourced photograph of the two actors. Actually, a second look
at all the fluff on the page shows that much of it is itself poking fun at
typical bloated websites with cheesy advertising, so this turns out to be a
complicated example.

------
belorn
My browser is looking like Windows did during XP. Back then, I had an
anti-virus, a couple of anti-adware, one anti-hacking and a firewall installed.
After XP, most of that got built in while the momentum of adware and
viruses/worms went down, and today I've only got an anti-virus, and even that one
is not even doing much (no warnings for the last few years).

My browser however got adblock + extension, noscript, ghostery, https-
everywhere, cookie handler, and I have also dabbled with privoxy. One can also
add tor to the list.

So I guess, the question will be if the browser manufacturers will do the same
as microsoft did and start having a bunch of that built in. Debian-live CD has
already started with having adblock pre-installed.

~~~
leephillips
If you wind up using Privoxy, it can take the place of _all_ the other add-ons
you mentioned, except https-everywhere. And it works transparently for all
browsers.

------
AndrewDucker
I have wondered about this. Sites are definitely growing much, much larger as
time goes on.

But then the front page of HN is just 24k, while the front page of the BBC is
112kb, without any of the images.

What's the easiest way of measuring the complete "load" of a page?

~~~
zokier
> What's the easiest way of measuring the complete "load" of a page?

Firebug's Net panel shows how much data the page pulls down, and how long it
took to do it.
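
A scripted approximation is also possible. The Python sketch below is an assumption-laden illustration, not a replacement for a browser's network panel: it only counts the HTML plus the images, scripts and stylesheets referenced directly in the markup, it ignores anything loaded later by scripts or CSS, and it measures decoded bytes rather than what actually crosses the wire. The names `AssetCollector`, `fetch_bytes` and `page_weight` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AssetCollector(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced in the HTML."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif (
            tag == "link"
            and attrs.get("href")
            and "stylesheet" in (attrs.get("rel") or "").lower()
        ):
            self.urls.append(attrs["href"])


def fetch_bytes(url):
    """Download a URL and return its body as bytes."""
    with urlopen(url) as response:
        return response.read()


def page_weight(url, fetch=fetch_bytes):
    """Sum the size of a page's HTML and its directly referenced assets."""
    html = fetch(url)
    collector = AssetCollector()
    collector.feed(html.decode("utf-8", errors="replace"))
    total = len(html)
    # A set avoids counting an asset twice when it is referenced repeatedly.
    for asset in set(collector.urls):
        total += len(fetch(urljoin(url, asset)))
    return total
```

Calling `page_weight("http://example.com/")` would return a byte count for that page. The `fetch` parameter exists so the download step can be swapped out, for example with a cache or a stub.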

------
Groxx

      jha: then stop selling ads on this very page
      Reply by mikecane: I don’t. WordPressdotcom does so I
        can use it for free. I don’t see any of that money.
    

Oh boy. Where to start...

~~~
ctdonath
Yup. He responds to the "pot, kettle, black" observations with a whining "it's
not my fault! it'd be better if I had my own server!" (someone point him to a
cheap server, please).

~~~
nickzoic
* Wordpress.com plus $30/year to remove ads?

* wordpress on a AWS microinstance (free for the first year, 'bout $100/yr after that)?

* Cheapie shared host like asmallorange.com (better than you'd expect for $35/year)

* Static blog tool (pelican etc) pushed into AWS S3 plus disqus for comments?

------
aaron695
Ummm I'm not sure a parody site is the best example to be using.

A 400 byte joke is kinda boring without all the pics and ads around it. Still
not really funny at all.....

------
philbarr
Also annoying is that the page jumps around whilst the adverts load, making it
hard to even read the text.

~~~
smcl
This is something I see on The Guardian and it infuriates me no end. I end up
randomly clicking on articles, even adverts (deliberate maybe?) accidentally
as they load and shove what I _wanted_ to click down the page...

~~~
oneeyedpigeon
Yes, The Guardian is particularly annoying in this regard. Such a shame
because, compared to other newspaper sites, they do an awful lot of things
very well indeed. It's particularly a problem with Apple devices which impose
a _horrible_ lag between input and action, and causes no end of problems -
what the hell is up with that?!

------
nvr219
This is why whenever I open an article or blog post, the first thing I do
before even trying to read the article is click the "readability" button (or
pocket/instapaper/safari reader/whichever one you like). I appreciate what the
person above said "take into the account that the Web/Hypertext is about more
than just text." That's fine. But unless I'm on my tumblr dashboard, I'm
visiting a blog probably to read text.

Branding is important, sure. But you can have great branding and still be
readable. You don't need huge style sheets, you don't need a ton of images,
you don't need a ton of scripts. Look at Svbtle for example, right? So what's
the problem?

"Monetizing content, especially written content, is extremely difficult. I
think Svbtle’s biggest innovation will be in this area, but I don’t know what
it is yet."[1]

So what's the solution? First of all I love that _most_ of the personal blogs
on HN lead by example. Secondly I think adblock and noscript should be
encouraged whenever possible (I know there's no way adblock would be rolled in
to browsers). Thirdly get native "reader" button in Chrome, FF and IE. Content
creators need to make money but until we figure that out, we can't expect
users to stare at disgusting web sites for much longer. Or maybe we can, if
the state of TV and FM radio advertising teaches us anything... Maybe that's
it, most people just deal with it so who cares. Still makes me sad though.

I googled for a list of "top blogs" and found this list[2], amazing how many
of them are visually DISGUSTING.

1: <http://techcrunch.com/2013/01/08/with-funding-for-svbtle-dustin-curtis-wants-to-build-a-business-in-long-form-online-content/>

2: <http://technorati.com/blogs/top100/>

------
Jabbles
To experience the full effect, disable your adblocker.

------
throwawayG9
This is how I see it: <http://i.imm.io/12Y11.jpeg>

------
scottcanoni
Even the website that is referenced to count the letters, lettercount.com, is
guilty of fluff and bloat.

------
evolve2k
To be honest that awful big google sponsored banner ad that shows on your site
ain't much better.

------
JacksonGariety
The internet doesn't suck. People's lives suck around the internet.

------
infoman
and this is why cellphone apps rock! they keep it minimalistic

~~~
lucb1e
Not for long man, not for long...

But yes, right now they're really great for usability. Including disabled
people; I travel with a completely blind person in the bus
every day. He uses a computer with a screen reader at his job and at home.
Mobile websites are by far the greatest to use. On a sidenote, HTML5 is hardly
better than Flash here.

~~~
icebraining
_HTML5 is hardly better than Flash here._

Do most screen readers support Flash?

~~~
lucb1e
To some extent they try, but it's rather hard. As is Javascript, though JS is
slightly better than Flash. I think JS will eventually become much better than
Flash, but not very soon.

------
chubbard
"Gentlemen start your bitchin!1!!1!!"

------
paparush
Somebody has to pay for the electricity to move all these bits and bytes
around the infrastructure that someone had to pay for.

------
njharman
> And no, I don’t have a solution.

AdBlock, duh!

