
Searching the Creative Internet - EdiX
https://crawshaw.io/blog/searching-the-creative-internet
======
rchaud
The type of site the author is discussing began to disappear around the time
"blogs" entered normal people's vocabulary, and the commercialization of
Internet content really began. I remember the 2004 US presidential election being all
about "the hottest blogs", which had Washington insider information the major
media outlets didn't have. Once "blogging" became a business (ad-supported),
quality collapsed as the KPIs were now pageviews and clickthroughs.

This caused a shift away from creating content for content's sake. 15 years
ago, someone may have posted an in-depth technical article for no other reason
than to share knowledge. Now, that article would be posted on Medium.com in a
gimped form, as a lead-in to their 10-hour video course on that topic.

The early period of Twitter briefly brought the "weird web" back, but once
brands and businesses descended upon it, it became all about retweets and
follower counts. Quality nosedived as a result.

~~~
eksemplar
The blogs of old are still there, I mean, this article even links to a few.
They are just really hard to find, and that’s the problem.

The internet didn’t stop being interesting, HN is a great example of this in
action. The blog article about rejected Disney princesses is likely the most
interesting piece anyone of us read on the internet today.

Only Google didn’t deliver it, an HN link to an article about the creative net
did. I say Google, but really, any of the gatekeepers is equally guilty.

Interestingly I think all the gatekeepers are at a place that Yahoo, AOL and
others were when Google disrupted them.

So maybe, just maybe, we’ll see the dawn of something better soon.

~~~
ticmasta
I immediately thought of HN-style sites too. What we need is some sort of gumbo
made from spanning all of these specialized, human-curated sites, stripping
out the Alexa top 100, 1000, top-whatever sites and then re-ordering the
results by some new measure that encapsulates "neatness", "quirkiness" or
"weirdness", or whatever we can call it.

This sounds really hard, and for me I don't know where I'd start, but I also
didn't create Google or PageRank which at its dawn also seemed like magic, so
I have faith this could be done.

It's the sort of mission to which I could dedicate my life, until I too was
corrupted by the success and lure of endless wealth, at which point today-me
sincerely hopes some other smart oblivious kids would come along and displace
me.

So I too hope to see something better in my lifetime.

------
mindslight
I've been meaning to try something out for some time -

Take the results of a big search engine, and programmatically filter out
everything that contains something from the Adblock filter list. Not just
getting rid of the ads, but ignore the entire page if it contains advertising.
Iterate through enough results until you actually get the first 50 or 100 hits
not containing any advertising, and return those as the basic search result.

There would be collateral damage of informative pages that attempted to good-
faith "monetize", and you'd miss out on the stackoverflow results etc, but I
would hope this would surface many of those super informative sites of yore.
That is assuming they're even still indexed.
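
A rough sketch of the filter described above, in Python. It assumes the filter list has been reduced to simple `||domain^` rules (real Adblock lists have many more rule types, which are ignored here), and it treats a page as "containing advertising" if its HTML mentions any blocked domain, which is a crude substring check. The `fetch` callable and the function names are hypothetical placeholders, not part of any existing tool.

```python
from itertools import islice
from typing import Callable, Iterable


def parse_blocked_domains(filter_lines: Iterable[str]) -> set[str]:
    """Collect domains from '||domain^' rules; every other rule type is skipped."""
    domains = set()
    for line in filter_lines:
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            domains.add(line[2:-1].lower())
    return domains


def contains_ads(html: str, blocked: set[str]) -> bool:
    """Crude check: does the page source mention any blocked domain?"""
    lowered = html.lower()
    return any(domain in lowered for domain in blocked)


def ad_free_results(
    urls: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical: returns the HTML for a URL
    blocked: set[str],
    limit: int = 50,
) -> list[str]:
    """Walk the ranked results lazily and keep the first `limit` ad-free pages."""
    clean = (url for url in urls if not contains_ads(fetch(url), blocked))
    return list(islice(clean, limit))
```

Because the generator is consumed lazily, pages are only fetched until `limit` clean hits are found, which matches the "iterate until you get the first 50 or 100" part of the idea.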

~~~
lubujackson
Yahoo (circa 2000) used to let you filter search results by no ads or no
monetization, I forget the exact phrasing. It was sometimes useful then but it
would be more interesting now. There is also
[https://millionshort.com/](https://millionshort.com/) which was a Show HN
product from 2012, stripping out X number of top ranked websites from Google
results.

These ideas are half the battle, I think the other half is the curation, which
is a little more abstract, but possibly machine-learnable to some degree?

------
probably_wrong
I think the author may like "Million Short" [1], a search engine that lets you
remove up to the top million most popular websites from your results.

At a more general level, I miss the days in which I could type a search query
like `intitle:"index of" mp3 mb` and get actual, unfiltered results. I've
toyed with the idea of indexing the web myself and use simple filters, but I
think I'll wait until someone here gets funding for it instead.

[1] [https://www.millionshort.com/](https://www.millionshort.com/)

------
fundamental
Every time I see an article lamenting the loss of the early internet I think
that it boils down to community quality vs. scale. As communities get too
large, then content and discussion will gravitate to whatever seems to get a
response from the biggest group. That can easily leave things feeling bland
and make efforts to start conversations feel too competitive. Smaller
specialized communities that are somewhat insulated do have the chance to
avoid this phenomenon, though oftentimes they fail under their own success or
are too insulated to pick up new individuals. It's a hard balance to be made.

I personally miss the prevalence of technical and personal web logs of the
late 90s. That's not to say that they aren't still around, there are simply more
alternatives to sift through and many of them do focus on marketing
themselves for visibility (which seems to take something away from the feeling
of the older net IMO). A large focus on centralization certainly seems to have
shifted the broader tone online, though there's still plenty of gems floating
out there.

~~~
_hemlock
Can you give some examples of those type of websites? I've been really
interested in the idea of stripped down personal static sites.

~~~
rchaud
The demise of Geocities (in 2009, much later than I imagined) has caused a
mass migration of these types of sites to places like Tumblr, whose search
functionality leaves a lot to be desired.

~~~
solarkraft
Now that Tumblr is dead there's some hope the alternative will offer better
features.

------
rchaud
Try Millionshort.com

By trimming the top 1000 results, I was able to find some truly random
websites that just happened to have a deep archive of interviews of a band I
was looking up. All interviews were from '80s magazines, pretty much all
defunct now.

Had I stuck with Google/Bing, every top search result would either be an
eCommerce store, Spotify artist page, Allmusic, or a snarky Pitchfork/Vice
piece about them from the 2000s.

For this search, Spotify, Pitchfork and Vice were at the top of the search
results because they are SEO-optimized. This means that Google/Bing search will
show the links from the domains that perform best under their page rank
algorithms. Since Pitchfork and Vice are domains with a high number of
backlinks and lots of active traffic, those were the ones that rank the best.

Given that so much of web content is just rehashing what was originally
reported/said somewhere else, finding niche content is going to be harder and
harder.

------
foobarbecue
"The second link is a NASA press release. (Why does NASA even have those?)"

Oo oo I know this one. They have those so that they can report discoveries
about the nature of the universe to the taxpayers who are paying them to make
those discoveries.

~~~
TeMPOraL
I think the implied full question was, "Why does NASA even have press releases
next to the actual research they publish, and why do those releases occupy higher
spots in search results than the research?".

~~~
sjf
The answer is still the same. Press releases are accessible to the general
public and have a wider appeal than research papers, so they appear higher in
the search ranking. I'm not sure why the author of the article thinks only
people capable of reading and understanding the consequences of a research
paper are interested in reports from NASA.

~~~
foobarbecue
Also, in case anyone didn't realize, press releases are what the press use to
write popular articles. Hence the term. Journalists generally don't read
scientific papers.

------
talkingtab
I completely agree that it is hard to find things - and it is getting worse.
If you search for a recipe for chicken soup, you now get a 20 page life
history, filled with ads (ha ha) and finally a recipe that can be as perfunctory as
"cook chicken in water". This is click-bait-world. In my opinion this comes
from intrusive advertising as the business model for the internet. (See Bruce
Schneier).

~~~
floren
I want a search engine that deprioritizes results based on the number of
trackers or ads on the page. Like, rank = (relevance * 1.0) + (trackers *
-0.5) + (ads * -0.5).
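
For illustration, the formula above could be written out like this. The `Page` fields and the default weights simply mirror the comment; how relevance is scaled against raw tracker and ad counts is left open there, so treat the numbers as placeholders rather than a tuned ranking.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float  # assumed to come from some existing relevance scorer
    trackers: int     # number of trackers detected on the page
    ads: int          # number of ads detected on the page


def rank(page: Page, w_rel: float = 1.0, w_track: float = -0.5, w_ads: float = -0.5) -> float:
    """rank = (relevance * 1.0) + (trackers * -0.5) + (ads * -0.5)"""
    return page.relevance * w_rel + page.trackers * w_track + page.ads * w_ads


def rerank(pages: list[Page]) -> list[Page]:
    """Order pages from highest to lowest penalized score."""
    return sorted(pages, key=rank, reverse=True)
```

With these weights a page with relevance 5.0, four trackers and four ads scores 1.0, so it drops below a clean page with relevance 3.0.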

~~~
headgasket
That's an awesome idea; which leads me to think, what would be needed to
experiment with alternative page ranks and levels of bubble-encapsulation?
What resources are out there from which one could experiment-- as crawling the
whole web is too large of an endeavour? How about a distributed database of
the text-internet that anyone could clone and build indexes from? A layered
approach? This must surely already exist?

~~~
ss2003
You don't need to crawl the web yourself.
[http://commoncrawl.org/](http://commoncrawl.org/) will give you the data for
free. It's a little out of date by the time you get it but that shouldn't
affect a project like this.

------
whistle650
Totally agree with this. Though I often want something similar but not
necessarily “creative”. I’d like to be able to ask “show me blogs or
discussions that are substantial about x”. x could be a scientific paper or
something.

It is a question of search / discovery mechanisms. Mostly this kind of query
is “satisfied” by things like Twitter. But I wish there were a good blog /
discussion search engine. Those died a long time ago. As you say the results I
am looking for only show up on lower pages in Google. Maybe there is a better
search engine for that I don’t know about.

~~~
TeMPOraL
It really saddens me that the best way to get to _actual research_ these days
is via Twitter. It always feels like I'm forced to wade through a sea of manure
because that's the only place you can find diamonds.

~~~
petra
>> the best way to get to actual research these days is via Twitter.

I'm usually disappointed by Twitter search (unless it's for finding smart
people, and endlessly browsing through their stream), how do you manage to get
so much out of it?

~~~
TeMPOraL
I don't, that's my point. But apparently I'm supposed to, since for some
reason everyone chooses Twitter as a platform for research discussion.

------
throwaway645389
Wouldn't be too hard to build a simple MVP - just search, but block/omit any
domain owned by an organization worth more than $XX million.

Specific threshold/blacklist TBD/subjective, and you'd get false positives
(people posting truly original content on their Facebook page or Blogspot
blog). But by and large, a _lot_ of the truly "labor of love" content out
there is done by folks both savvy and invested enough to set up their own
domain name, and would pop out if you just filter out all the corporate
domains.
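
A minimal sketch of that MVP's filtering step, assuming the "worth more than $XX million" lookup has already been boiled down to a hand-maintained set of domains. The blocklist entries below are just the two examples named above; a real list, and the valuation data behind it, would have to come from somewhere else.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist: domains judged to belong to large organizations.
BLOCKED_DOMAINS = {"facebook.com", "blogspot.com"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def independent_results(urls: list[str], blocked: set[str] = BLOCKED_DOMAINS) -> list[str]:
    """Keep only results hosted outside the blocked domains, preserving order."""
    return [url for url in urls if not is_blocked(url, blocked)]
```

Matching on the host suffix (with a leading dot) catches subdomains such as `someone.blogspot.com` without also dropping unrelated hosts that merely end in the same letters.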

------
bobowzki
"a site filled with excellent original stories based on historical figures.
Some Disney executive should buy them."

And turn it into the mainstream content the author doesn't like?

I mostly don't get this reminiscing about the past state of a technology. When
I see it I always suspect someone is remembering their youth in a candid way.

------
mattbierner
Everyone's a content creator these days. Even your uncle is posting pictures
of his meals and writing about how #blessed he is. And you know, while that's
not my cup of tea, there's nothing inherently wrong about that. It's great
even, so long as it's authentic. But it usually doesn't strike me that way; it
usually feels more like they've become a social media coordinator, sharing (and
selling) a fake version of themselves to you. He's become a brand.

If you were to stumble across Uncle Brand's online presence, you'd likely be
amazed with what an interesting and humble and cultured and well traveled
person Uncle Brand is. And is that so wrong? He's only creating what he's seen
other people do after all. Only giving you what you want. Why not put your
best self out there?

And maybe Uncle Brand has a little something unique he does, that one emoji he
always uses or that obsession with ramen. And maybe he starts attracting an
audience. He's reliable! He's relatable! He's authentic! He's safe!

But an audience is something you have to maintain, something you have to grow.
The audience didn't come for Uncle Brand the man; they came for Uncle Brand
the brand. So he starts refining his brand, churning out more content, gets a
better camera for his photos. He's got more resources now and can ape what big
brands do.

In some ways, the internet became too real, too tied to the real world. You
can even make real money on the ol' www! But when this happened, rather than
the internet liberating us from the old, we just recreated the old incentives
and shallowness and commercialism. But shittier and more random. Youtube
celebrities are mostly just shittier celebrities. Instagram is mostly just
shittier magazine and food and travel photography. Internet journalism is
mostly just shittier journalism. So much online content is pre-internet
content just pushed on a new channel.

And what's so wrong with getting real? Uncle Brand is all in. He's a souper
star! The Martha Stewart of ramen. By now, Uncle Brand is using his brand to
hawk stuff too. He's using his brand to hawk other people's brands. Promote,
promote, promote. Sell, sell, sell.

The internet defies generalization. There are certainly great communities and
forums and subcultures and people creating amazing stuff out there today. More
than ever even. But the internet as many people experience it today is indeed
quite different from what I originally loved. It feels like all the incentives
are wrong; platform incentives resold to creators and users as their own.

I don't want to be a brand. I don't want what the old world is selling: the
celebrities, the popularity contests, the consumption, the fear of judgement.
I just want to create awesome stuff and have fun and try something new. And I
want to connect with people who are doing the same!

------
gavinpc
A fine article, but since the author singles out NPR and Wikipedia, I would
just like to say a word in their defense. And then some other words.

Thank God for Wikipedia, it is a miracle. Long may it stay antifragile!

And long may NPR use its supporters' money to produce consistently archive-
worthy content!

Donations to the Wikimedia Foundation or your local public radio station make
great "solstice" gifts, if you're into that sort of thing.

So yeah, I just turned this comment into an ad, because let's not throw out
the baby...

And yes, I'm old enough to remember the "good old days." There is every bit as
much signal now. And every bit as much more noise.

And yes, Google's hegemony is a threat to the capital-I Internet's
antifragility. (I just got Taleb's book, can you tell?) Guess who powers the
analytics for crawshaw.io? It's all of a piece, people. Walk the talk.

And yes, I have a little Shakespeare site that's "better" than the top-ranked
ones in many ways, but I accept that if I wanted those top spots---like any
other top spots---I would have to sweat and hustle and fight for them. I
don't, and no contrarian anti-Google is going to hand them to me.

I'm all for the better thing. Blue sky, every day! Step away from the machine!
But c'mon, it's 2018, let's not dis NPR and Wikipedia! We're fighting the good
fight!

 _EDIT_ Also, to eksemplar's point that "The blog article about rejected
Disney princesses is likely the most interesting piece anyone of us read on
the internet today." I picked this piece at random and it was so amazing. I
emailed it to my wife who's an artist. Man. Thanks for that alone, ye OP!

[https://www.rejectedprincesses.com/princesses/sarah-biffin](https://www.rejectedprincesses.com/princesses/sarah-biffin)

------
sloum
The weird internet is still there. It just isn't on the web. Join a pubnix, use
gopher, find a good bbs, get back on IRC. Those are just a few great spaces.
The bit of technical knowhow to get to or use them reduces their viability as
capitalist enterprises, which keeps them at least slightly more creative/old
school. Generally it is easier to find a community with the right balance of
size, content, and participation in these places than among things hosted on the
web.

