
The Web Is A Mess - gliese1337
http://mkronline.com/2012/10/30/you-arent-imagining-it-the-web-is-a-mess/
======
unimpressive
One innovative "solution" is millionshort:

<http://millionshort.com/>

Basically the idea is that you take sites that perform _too_ well on search
metrics and remove them. Of course, that only works as long as a majority uses
services like google.
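
A rough sketch of that filtering idea, for illustration only. How millionshort actually works is not described here; the `site_ranks` table, the example hosts, and the function name are all made up for this sketch.

```python
from urllib.parse import urlparse


def drop_top_sites(results, site_ranks, cutoff):
    """Return the results whose host is NOT among the `cutoff` most popular sites.

    results    -- list of result URLs, already in ranked order
    site_ranks -- hypothetical dict: host -> popularity rank (1 = most popular)
    cutoff     -- how many of the top-ranked sites to remove
    """
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        rank = site_ranks.get(host)
        # Hosts missing from the table are assumed to be obscure and are kept.
        if rank is None or rank > cutoff:
            kept.append(url)
    return kept


# Made-up data: one very popular site, one mid-ranked site, one unranked site.
results = [
    "http://www.bigsite.example/rockets",
    "http://hobbyist.example/my-rocket",
    "http://midsite.example/how-to",
]
site_ranks = {"bigsite.example": 3, "midsite.example": 50_000}

print(drop_top_sites(results, site_ranks, cutoff=10_000))
```

Note that the filter only removes entries; whatever survives keeps its original relative order.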

~~~
hkon
Cool idea. I tried it just now, I searched "how to make rocket", thought I'd
get some cool old style pages on people launching rockets in their spare time.
I did.

However the top result when I removed top 10k sites was the second result when
I did it without removing anything. That was kind of disappointing.

~~~
d23
Yeah, I just tried a sample search and it returned mostly the same results, no
matter how many sites I removed. They were just in a different order.

------
zwieback
_"You remember way back in the early ’00s when your favorite blogs posted a
few times a day at most, had a handful of great writers, and were a joy to
read."_

I fondly remember the golden days of Usenet. Eternal September was nothing
compared to the devastation caused by blogs and twitter.

~~~
Luyt
Yup, and your files you got via FTP. You searched for them using Archie. There
was no web; but there was Gopher, and Veronica was the search engine for
Gopherspace.

Instead of Twittering, everybody was chatting on IRC. And the internet wasn't
corrupted by commercials yet, as the Web is today.

~~~
eridius
"Instead of"? People still chat on IRC. I'm on IRC right now.

~~~
zalew
mostly those who 'still'.

------
spindritf
I completely don't see the problem he's writing about, my favourite blogs
don't post daily, some authors post 1-3 times a month[1]. Others are bringing
co-bloggers on board, and even then only come up with a few posts a week[2].

That's why RSS must live on. It takes over 200 subscriptions to get a decent
daily dose of reading out of blogs like that.

[1] <http://thelastpsychiatrist.com/> [2] <http://www.overcomingbias.com/>

------
javajosh
We are entering the era where successful blogging will be the domain of
already successful, high visibility figures like pg or Fred Wilson. The
attempts of the popular blogs to 'scale' (which basically meant delegating
content-creation to poorly-paid underlings who fight over ever-shrinking
scoops) have failed miserably. (Personally, I don't touch lifehacker or
techcrunch with a 10-foot pole these days, to name two I used to read).

The central problem, I believe, is that we don't have a good model to predict
a piece of information's _relevance_ to a given reader. So far in human
history the mechanisms that solve this problem have been ad hoc and error-
prone - only by luck do you ever 'fall in with the right crowd' and start to
get the information that you craved all along. Personally I feel strongly that
this is a solvable problem, and when it is solved it will change human society
forever.

------
Detrus
Why would banning content farms draw the eye of regulators when the content
farms are gaming the search engines?

Because search engines fell asleep at the wheel and now content farms are a
sizable chunk of the economy?

And they can always release some features like curated/customized search to
weasel around any regulation dangers. Maybe search engines just don't care
about the health of the web.

~~~
TeMPOraL
> Why would banning content farms draw the eye of regulators when the content
> farms are gaming the search engines?

I don't think OP meant content farms. I think he meant half of the sites that
have articles repeatedly appearing on HN. All those smaller and bigger "news"
services that write a lot of poor, shallow, lying or just plain wrong,
controversial articles. The problem is with incentives - they do it for ad
money, not to inform/educate people.

~~~
klibertp
Is it impossible to give people incentive to inform/educate people? I think
this even worked for a short while, but I'm no expert on this matter.

~~~
TeMPOraL
I'm not sure. From what I can tell, it's a difficult and delicate topic, as
intrinsic incentives tend to be overridden by extrinsic ones (e.g. pay someone
for doing what he loves, he might soon start doing a worse job at it). See the
RSA Animate talk about motivation[0].

But hey, we're a clever species, I do believe that someone will figure out how
to structure reality so that we get more of the things we really want without
using proxy incentives that backfire when overdone.

[0] - <http://www.youtube.com/watch?v=u6XAPnuFjJc>

~~~
klibertp
Wow. I get a feeling that I saw it before, but I'm happy I got to watch this
again. Now, doesn't the solution seem obvious? Shouldn't there be a cap on how
much you can earn from ads on your site? Just enough "to take the issue of
money off the table" and not a cent more?

Ok, this really is not my area of expertise. I just feel that what we have now
is both unfair to writers and disastrous for society :(

~~~
TeMPOraL
> Shouldn't there be a cap on how much you can earn from ads on your site?
> Just enough "to take the issue of money off the table" and not a cent more?

It could work for bloggers, but they're not the problem. The "news" services /
aggregators / whatever are, and it's hard to put a cap on what a company
should earn (I'm not even sure if it is a Right Thing to do).

> I just feel that what we have now is both unfair to writers and disastrous
> for society :(

Couldn't say it better myself.

~~~
ladzoppelin
I agree with the "disastrous for society" part. Information is becoming very
hard to verify and I have seen with my own eyes this year the news being
censored even from the "fire hose" aggregators like Google news.

~~~
TeMPOraL
What particularly annoys me are the article titles in popular news sites, that
often are plain lies (that articles themselves later correct), intended to
lure people into viewing full articles. The point is, people don't read all
the articles, but they do skim the list of them, and they remember lies from
the titles.

------
danboarder
Who uses google to find current news articles though? I think this is mostly a
non-issue as people use aggregators like HN here or Subreddits or other news
reader apps to filter top stories.

Google is used for what I would call "archive search" and research these days
(how to do X, etc) in my experience, not current news.

~~~
anigbrowl
I get my daily headlines from Google News as a matter of course. I find Reddit
unusable for day-to-day news, and only find aggregators such as HN valuable in
proportion to their relatively narrow focus. Google News used to be pretty
excellent for news discovery, but it's being SEO'ed into the ground. In recent
months I've started seeing letters to the editor presented as news stories.

~~~
bduerst
Even Reddit can be difficult for daily news.

You have to finely tune your subreddit subscriptions otherwise you get a meme
whiteout from some of the other subs.

Crafted news content matching what I want, with little overhead, is an area I
think you could see some disruption.

------
brilee
I went online to escape the mindless reporting and tabloid news that had taken
over television. But it looks like they chased me into the internet.

------
dreamdu5t
_"Most top blogs don’t deserve the top slot anymore."_

What "top slot?" Blogs aren't ranked by a committee. I can only assume OP
means traffic. The blogs with the most traffic have the most for a reason, and
OP still doesn't realize why after many years in the industry.

I'm sure the author thinks the things he wants to have the "top slot" deserve
the "top slot." The naivety.

 _"If I ran a search engine, I would ban these sites from the index."_

I wish someone like you was around to curate the Internet for me.

------
zalew
true blogs are dead (yes, scotsman). I sometimes miss the old internet. now
get off my lawn.

ironically, OP's blog's purpose is also to sell ads and ebooks. In the early
00s he's talking about, people blogged just because they felt like it and
didn't complain about monetizing strategies. techcrunch? what techcrunch? 2005
is not 2000.

~~~
klibertp
I think the author is concerned _exactly because_ his blog's purpose is to sell
ads. He even wrote it at the end of his post. I guess he just would like to
sell ads using quality content - his own content - and not by aggregating tons
of crap and producing even more of it. Maybe he even tries - I don't know, I
didn't read anything besides this post. But the fact is that this doesn't work
anymore - and that's just sad.

~~~
zalew
the whole premise of his post is completely flawed. he talks like he misses
'the old days', but in the old days nobody gave a crap about ads. he's
complaining that he's too weak to make a living off blogging while back then
nobody even thought of that. 'the blogosphere' meant linked blogs of friends
and foafs, not cross-posting every piece of turd to 10 social networks and
upvote sites so you get more likes and clicks.

------
dools
Did anyone else get that awesome ad at the bottom just below where he talks
about how crap ad money is? One of those classic "you have been selected to
win an ipad" ones. It was so juxtaposed with the content I thought for a
second perhaps the entire post was satire.

------
seanconaty
I actually think that this change is more from social media than it is from
SEO. Both are excellent sources of traffic but optimized a bit differently.
But the end result of both of these "optimizations" (including optimizing for
ads) is a terrible experience and almost completely worthless content. Yet,
sometimes, I cannot help but click...

I see a site like Mashable as totally optimized for social and ads. I remember
a point in time when the site was interesting--when APIs seemed like a new thing
and hackers were "mashing" sites together on a scale like had never been done
before. It chronicled the new web 2.0 trends (which I have to admit were
really powerful--no one can deny that the landscape of the Internet was
changing).

But now it's just a bunch of crap. Top ten lists (which, of course, make you
click through each item so you'll refresh ads). And infographics: they used to
be cool. Now they're just stupid charts with a graphical background and font.
Every time I click one I think to myself, this didn't have to be graphical and
it's not very informational. Like a movie trailer the headlines of these
traffic-hoarders are catchy just to get you to click on it and once you arrive
you're disappointed and 10 cookies have been dropped on you. Or you see some
modal covering the content asking you to do something that will help the site
proliferate itself. If you can stomach all this deception and read the meat of
the article most of the time you don't feel very satisfied. You click the back
button unless you're tricked into clicking another of their links.

I think the only real way to stop this is to significantly demote sites that
have more than one or two ad units on the site.

------
recursive
Blogs are not the web. I didn't have any favorite blogs in the early '00s, and
I don't have any now. I don't read TechCrunch, and I don't read LifeHacker. I
don't pay attention to Twitter. The web is fine.

------
state
One key difference between now and "the early '00s" is the number of people
using the web.

------
Groxx
Wait. Blogs posting a few times a day at most? I think their definition of
"blog" is now closer to a news website. Overall there I'd agree that quality
has gone down - it happens anywhere there's substantial money to be made. But
I have more _personal blogs_ than I can keep up with, and most of them are
what I'd call extremely high quality.

I don't think there was any "mind" that kept quality high and volume low.
Volume was just lower, and there were fewer interested parties, and they were
less motivated by money (because there was less money to be had). There are
ridiculous quantities of excellent lower-volume "blogs" though, you can hardly
call them dead because you're using the populist channels to try to find them.
Is that how you found them in the 00s?

------
guimarin
sounds like a plug for using blekko's slashtag system. You could always create
your own, and then never have to see techcrunch/other spammy blogs ever again.

------
ashray
Google is gamed like crazy. Especially by sites like Tripadvisor, Skyscanner,
etc. Whenever I make a search for something like 'flights from la paz to new
york' - all I get is some landing pages.

Tripadvisor's gaming is even worse. When I search for 'restaurants in place_x'
I get results from every tripadvisor site, like tripadvisor.com,
tripadvisor.es, tripadvisor.in and more.. the results are duplicates!

The quality of google results in certain niches is very very poor. I haven't
found that bing is better though. =/ It's just amazing that they haven't been
able to make considerable improvements to this with the amount of money they
have. I guess that's the problem with lack of competition.

~~~
nsns
Video search is even worse - see this search result, offering endless pages of
the exact same video:
<https://www.google.com/search?q=%22CHASTE+DANCING%22&hl=en&safe=off&channel=fflb&tbm=vid#q=%22CHASTE+DANCING%22&hl=en&safe=off&tbo=d&tbm=vid>

------
ako
I'm mostly using Zite to find and read relevant news. As I understand it, it takes
my RSS feed, combines it with tweets from people I follow, and finds the most
relevant news articles for me. Then I can up vote or down vote the articles to
teach it what I find interesting.

Zite seems to be on the right track to fix the web is a mess problem...

------
jamesmiller5
At least to me the author seems like someone who values and wants more content
that follows the "Slow Web" principles of interaction: timely, high-quality
content.

------
pattisapu
<http://art.teleportacia.org/observation/vernacular/>

------
sh_vipin
Just like the life may be. But both are important and beautiful too.

------
nirvana
This is the failure of google. I stopped using google about 6 months ago and
started using duckduckgo. But at the time I stopped using google, one of the
reasons I stopped was that the quality was so low.

Hell, the quality of Google is so low that Bing is actually running ads right
now with blind taste tests where people preferred bing. Of course this isn't
scientific at all, but my point is-- no big scandal has erupted about how
wrong this is. It's totally plausible for bing to do this because everyone
realizes that google has gotten to the point where _microsoft_ can plausibly
compete with them!

Page rank was really cutting edge, but that was 10 years ago, yet it is still
their primary mechanism. It's been gamed, but they seem uninterested in moving
to more sophisticated mechanisms (they use them but the influence of better
methods seems to be too low) ... meanwhile they've used their bully pulpit to
influence the web to conserve page juice which has backfired in such a way
that actual links to authoritative and useful sites are lower ranked than spam
links, making it easier to game. (When wikipedia is using no-follow on
relevant outbound links to pages that wikipedia is quoting or citing, things
are fundamentally broken- no site on the web has a more favored ranking
position than wikipedia. Not to mention hand curation of pages. You can't even
correct errors there without having them reverted by some know-nothing whose
sole accomplishment is rising in the ranks of wikipedia editors, so it's not
like they need this to prevent spam.)

This means the site that google unquestionably considers the most
authoritative, when it cites a page that it considers authoritative, google
gives that site no credibility. But let me create a web of sites that
construct text that passes grammar parsers as "good english" but whose purpose
is to spam keywords and link to each other and I can rank for those terms up
close to wikipedia. (This is essentially what techcrunch is doing only they
are having humans write low quality text instead of a computer.)
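
To make the no-follow point concrete, here is a toy power-iteration PageRank. It is a sketch under loud assumptions: the six-page graph is invented, no-follow links are assumed to pass no rank at all, and nothing here claims to match how Google actually scores pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    links -- dict: page -> list of (target, follow) pairs.
    Links with follow=False are ignored entirely (assumption of this sketch).
    """
    pages = set(links)
    for outs in links.values():
        pages.update(target for target, _ in outs)
    pages = sorted(pages)
    n = len(pages)
    followed = {p: [t for t, f in links.get(p, []) if f] for p in pages}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = followed[p]
            if outs:
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # A page with no followed links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


def build_graph(wiki_link_followed):
    """Hypothetical web: two blogs cite 'wiki', 'wiki' cites 'source',
    and two spam pages link only to each other."""
    return {
        "blog_a": [("wiki", True)],
        "blog_b": [("wiki", True)],
        "wiki": [("source", wiki_link_followed)],
        "source": [],
        "spam1": [("spam2", True)],
        "spam2": [("spam1", True)],
    }


with_follow = pagerank(build_graph(True))
with_nofollow = pagerank(build_graph(False))

print("source, wiki link followed:  ", round(with_follow["source"], 4))
print("source, wiki link no-follow: ", round(with_nofollow["source"], 4))
print("spam1,  wiki link no-follow: ", round(with_nofollow["spam1"], 4))
```

In this toy graph the cited page scores lower once the citing link is no-follow, while the two spam pages that only link to each other keep trading rank back and forth.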

It's broken, and google broke it.

~~~
magicalist
I disagree with a number of your points. I've noticed a huge reduction in the
amount of spammy content I see in google results (over the last year, maybe),
to the point that I actually make "productName reviews" searches again.

Your focus on pagerank is at least somewhat off base, considering that it's
only a factor in ranking as noted by someone below, and it's a ranking that
pretty much all the search engines use as well ("The Bing ranking algorithm
analyzes many factors, including but not limited to: ... the number,
relevance, and authoritative quality of websites that link to your webpages").

There is something to be said for no-follow links being a symptom of something
broken, but, OTOH, page rank is still a good indicator of what people out on
the web find to be useful and relevant content, allowing you to find popular
content, cluster it by subject, etc...essentially crowd sourcing (a portion
of) relevancy via something people do anyways. Gaming was inevitable, and no-
follow is really more of a way to disincentivize spammers...the fact that with
no-follow you get the spammers anyways (to get human eyeballs instead of
crawlers') demonstrates that the motivation is always there. If a search
engine trusts wikipedia's outbound links they don't _have_ to obey no-follow
in any case, but you still have the situation that everyone will have their
own favorite "impartial" external links to add, not to mention people with a
vested interest in the subject.

The possibility you forget in your "microsoft can plausibly compete with
[google]" point (leaving aside the fact that most people are just ignoring it)
is that bing has improved, and that has nothing to do with google breaking
anything.

