
A platform for discussing a new web directory with a focus on human editing - sebst
https://curlz.org
======
rgbrenner
DMOZ was great in the 90s (I was there.. and I remember using DMOZ and I
appreciated it at the time). Search engines weren't that great, and there
weren't that many websites.. 250k in the mid-90s, and a few million at the end of
the 90s. That's few enough that a group of people could review each one and
add it to a directory.

But now there are perhaps 250 million active websites[0]. Is it possible for a
small group to review all of these?

And if it is, how many are directory worthy? I would guess there are enough
high quality websites to overwhelm a directory...

My question is, with the amount of high quality content out there.. is a
directory the way to find it today? And what about the near future.. how would
a directory keep up with the rapid growth of the web?

The obvious answer here is to simply leave out websites.. which is exactly
what DMOZ did.. it's been very difficult to impossible to get your website on
DMOZ for many years.. but IMO, this is exactly what led to the irrelevance of
DMOZ, since it forced users to go elsewhere to find the content they wanted.

IDK.. I really like the idea.. but part of me also thinks the time for a
directory has passed. (edit: at least in the form of dmoz.. maybe you can
create a dmoz 2.0 that addresses some of those challenges though.)

[0] [http://www.internetlivestats.com/total-number-of-websites/](http://www.internetlivestats.com/total-number-of-websites/) (this
says 1B websites, but 75% are inactive)

------
sebst
I asked former editors of DMOZ if they would like to continue working on a
human edited directory.

100 of them replied with mixed emotions, but nearly all of them are willing to
contribute to the development of a DMOZ successor.

So, I built a platform (just a forum and a wiki at the moment) to bring together
all these like-minded people and start building something great together.

Tell me what you think about human-curated web directories in 2017, or about your
ambitions to be a user or contributor of a decent successor to the ODP.

Thank you for your feedback!

~~~
emilyfm
Starting from the other end, who would use a DMOZ successor?

Clearly DMOZ hasn't been getting a lot of real user traffic, which I think is
because newer users (that's most users) have only ever used search, never been
introduced to the concept of a directory.

So, the focus should be on who it's being built for, and how they would find
and use it, rather than just replicating something from a different era.

For example, it could be a blog where guest bloggers (or blogger teams) each
submit a carefully edited post describing the top sites (and perhaps apps) in
a niche, and what's good about them. And then keep the posts up to date.

Quality could be maintained by a moderating team that only accepts the best
posts, and ensures that they are maintained (otherwise deleted).

The resulting post when found in a search engine would be more readable, and
probably more useful, than a directory listing page.

~~~
sebst
In my opinion you're 100% right!

Those challenges are real and a new version cannot be just a clone of dmoz.

AI/recommender systems could play a role in such a project; the relationship
between discovering new things and finding resources on already-known topics
has to be discussed, and so on...

I invite you to bring in your ideas here or in the curlz.org forum.

------
Animats
DMOZ is a useful data source for what's on the web. Are there any
machine-processable alternatives?

I use DMOZ to drive a list of major domains currently being used by phishing
scams.[1] That's a list of who's in both PhishTank and DMOZ, ordered by the
number of hits. (216 today. I used to try to get it down by nagging companies
on the list, and got it down to 47 once.)

[1]
[http://sitetruth.com/reports/phishes.html](http://sitetruth.com/reports/phishes.html)
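The "who's in both PhishTank and DMOZ, ordered by hits" list can be sketched roughly like this. It is a minimal illustration, not SiteTruth's actual code: it assumes the PhishTank side is already available as a plain list of reported phishing URLs and the DMOZ side as a list of listed URLs, treats "hits" as the number of reported phishing URLs per hostname, and uses a naive hostname normalization. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_host(url: str) -> str:
    """Lower-case hostname with a leading 'www.' stripped (a simplifying assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def phishing_domains_in_directory(phish_urls, directory_urls):
    """Return (domain, hits) pairs for domains present in both lists, most hits first.

    'hits' here is assumed to mean the number of phishing URLs reported on that domain.
    """
    directory_hosts = {normalize_host(u) for u in directory_urls}
    hits = Counter(normalize_host(u) for u in phish_urls)
    both = [(domain, n) for domain, n in hits.items() if domain in directory_hosts]
    # Sort by hit count descending, then by domain name so ties have a stable order.
    return sorted(both, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    phish = [
        "http://www.example.com/login/verify.php",
        "http://example.com/secure/update",
        "http://shop.example.org/paypal/",
        "http://random-phish.test/account",
    ]
    dmoz = ["http://www.example.com/", "http://shop.example.org/"]
    print(phishing_domains_in_directory(phish, dmoz))
```

A real version would need smarter domain matching (for example, grouping subdomains under their registered domain), which this sketch deliberately skips.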

~~~
sebst
Someone mentioned the 4-year-old blekko slashtag collection
([https://github.com/blekko/slashtag-data](https://github.com/blekko/slashtag-data))

I'm not aware of any other collection, but I hope curlz.org will gain some
importance soon ;)

~~~
Animats
I just signed up for "curlz.org". All they have right now is a forum, with a
total of five posts. They don't even have a snapshot of the DMOZ directories.
They need to at least host the DMOZ dump.

The entire content of DMOZ, in XML, is at

    http://rdf.dmoz.org/rdf/content.rdf.u8.gz

It's about 250MB. Save a copy, just in case. I just downloaded a copy.

Archive.org has a copy at

    http://web.archive.org/web/*/http://rdf.dmoz.org/rdf/content.rdf.u8.gz
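Since the dump is a large gzipped XML file, it is worth streaming it rather than loading it into memory all at once. The sketch below uses Python's standard `gzip` and `xml.etree.ElementTree.iterparse`. It assumes (check against the actual file) that each listing is an `ExternalPage` element carrying its URL in an `about` attribute with a `Title` child; the tiny `SAMPLE` document and the function names are hypothetical stand-ins, not the real dump.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so elements can be matched by bare name."""
    return tag.rsplit("}", 1)[-1]


def iter_external_pages(fileobj):
    """Stream (url, title) pairs from a DMOZ-style RDF dump.

    Assumes <ExternalPage about="URL"> elements with a Title child; this layout
    is an assumption about the dump, not a documented guarantee.
    """
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local_name(elem.tag)
        if name == "ExternalPage":
            url = next((v for k, v in elem.attrib.items() if local_name(k) == "about"), None)
            title = next((c.text for c in elem if local_name(c.tag) == "Title"), None)
            yield url, title
            elem.clear()  # free the subtree we've already handled
        elif name == "Topic":
            elem.clear()  # topic records aren't needed for this listing


# Hypothetical miniature of the dump's structure, for demonstration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Arts">
    <link r:resource="http://www.example.com/"/>
  </Topic>
  <ExternalPage about="http://www.example.com/">
    <d:Title>Example Site</d:Title>
    <d:Description>An illustrative entry.</d:Description>
    <topic>Top/Arts</topic>
  </ExternalPage>
</RDF>
"""

if __name__ == "__main__":
    with gzip.open(io.BytesIO(gzip.compress(SAMPLE))) as f:
        print(list(iter_external_pages(f)))
```

On a downloaded copy you would open the file directly instead, e.g. `with gzip.open("content.rdf.u8.gz") as f:`.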

~~~
sebst
I will add the dump in the next few days, of course.

------
captainbenises
The DMOZ data is licensed under Creative Commons and there is an RDF export. I
downloaded the data and registered a domain
([http://www.zedurl.com/](http://www.zedurl.com/)), and I'll try to knock up a
mirror of DMOZ tonight as a Rails app. I'll update this post if things go
well.

~~~
jacquesm
How about offering to collaborate with the OP?

------
kowdermeister
There are endless link directories out there. Why do you think you can make a
difference? Starting fresh, nobody will care to submit links because it's
not really how SEO works anymore.

I doubt anyone just casually browsed DMOZ. A general link dir is probably a
dead idea. A specialized niche one, not so much.

~~~
jacquesm
> There are endless link directories out there. Why do you think you can make
> a difference? Starting from fresh nobody will care to submit links because
> it's not really how SEO works anymore.

It _never_ was how SEO worked, ever. The fact that DMOZ was a collection of
high quality links made SEO jerks spam it to death but that was a side effect.

DMOZ was the polar opposite of search engines, though many of them took DMOZ
as their starting point _because_ of the high quality of the links.

------
jacquesm
DMOZ is how just about every search engine currently alive got seeded. The
quality of the links was good enough that you could be reasonably sure to only
hit spam after the second or third link traversal, which gives a good test set
to train a classifier with.
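The idea of using link distance from directory-listed pages as a rough quality signal can be illustrated with a breadth-first search. This is only a sketch of the general idea: the link graph is a hypothetical in-memory dict, and the one-hop cutoff for "likely clean" is a guess, not a value used by any real search engine. Pages further out are left "unverified" rather than labeled spam, since the claim is only that spam becomes likely after a few hops.

```python
from collections import deque


def hop_distances(seeds, links):
    """Breadth-first search from directory seed pages.

    `links` maps a page to the pages it links to (a hypothetical link graph).
    Returns the minimum number of link traversals needed to reach each page.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:  # first visit is the shortest path in BFS
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def weak_labels(dist, trusted_max=1):
    """Label pages within `trusted_max` hops as likely clean (the threshold is an assumption)."""
    return {page: ("likely_clean" if d <= trusted_max else "unverified") for page, d in dist.items()}


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
    dist = hop_distances(["a"], links)
    print(dist)
    print(weak_labels(dist))
```

The "likely_clean" pages could then serve as positive examples when training a spam classifier.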

~~~
sebst
I've heard that point quite frequently.

So, one option for DMOZ to continue could be a movielens.org for bookmarks ;)
But I'm sure there are further possibilities...
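MovieLens is a research recommender site built on collaborative filtering, so a "MovieLens for bookmarks" might rank sites by how often the same people bookmark them. The sketch below is a basic item-based collaborative filtering example over hypothetical bookmark data (users mapped to sets of sites, cosine similarity between the sets of users who bookmarked each site). It is an illustration of the general technique, not how MovieLens or any planned curlz.org feature actually works.

```python
from math import sqrt


def item_similarity(bookmarks, a, b):
    """Cosine similarity between two sites, based on which users bookmarked each."""
    users_a = {u for u, sites in bookmarks.items() if a in sites}
    users_b = {u for u, sites in bookmarks.items() if b in sites}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def recommend(bookmarks, user, k=3):
    """Rank sites the user hasn't bookmarked by summed similarity to the ones they have."""
    mine = bookmarks[user]
    all_sites = set().union(*bookmarks.values())
    scores = {
        candidate: sum(item_similarity(bookmarks, candidate, s) for s in mine)
        for candidate in all_sites - mine
    }
    # Highest score first; site name breaks ties so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [site for site, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    bookmarks = {
        "ann": {"a", "b"},
        "bob": {"a", "b", "c"},
        "cat": {"b", "c"},
        "dan": {"d"},
    }
    print(recommend(bookmarks, "ann"))
```

This brute-force version recomputes user sets for every pair, which is fine for a toy example but would need precomputed indexes at directory scale.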

------
mwfj
I know you're probably just looking to replace DMOZ, and build something vast
in scope/size. But please keep the idea I posted yesterday in mind:

[https://news.ycombinator.com/item?id=13802555](https://news.ycombinator.com/item?id=13802555)

Somewhat ironically (or fittingly), one of the triggers for this idea was my dislike
for the amount of corporate/institutional "spam" on dmoz.org. Browsing dmoz
today it doesn't feel like browsing a treasure chest of information; it feels
like browsing one of those vanity contact books that people are scammed into
paying for placement in.

Just because 50 different museums from around the world are super reputable
(to someone), that doesn't mean they should have a guaranteed place in the
"art" category, for instance. They should need to have a kickass art
collection online with great UX and super-high-res images, easily downloadable
if it's stuff that's out of copyright, and so on. Just list the best ten
(twenty? thirty? No idea what the right number is; it probably depends on the
category), not every single art institution on earth that someone has bothered
to submit.

I guess I would summarize this as opinionated, content-quality-based
selection.

