
DIRT Protocol raises $3M to build a Wikipedia for structured data - iamwil
https://medium.com/dirt-protocol/dirt-protocol-raises-3m-from-general-catalyst-greylock-lightspeed-and-pantera-to-build-a-a76c3295f227
======
avichal
It's unfortunate so many people come out of the woodwork to tell people their
ideas are terrible or won't work.

I think it's far more interesting to ask how a thing might work, which use
cases might be dramatically underserved today and serve as a beachhead, or what
tradeoffs are being made, rather than just say something is a "bad idea."

Dropbox launch:
[https://news.ycombinator.com/item?id=8863](https://news.ycombinator.com/item?id=8863)

Coinbase launch:
[https://news.ycombinator.com/item?id=4703443](https://news.ycombinator.com/item?id=4703443)

A 2012 thread discussing comment negativity where, coincidentally, the top
comment is from @iamwil who posted this link and is on the DIRT team:
[https://news.ycombinator.com/item?id=4363717](https://news.ycombinator.com/item?id=4363717)

A classic thread from 2012 where PG talks about negative comments:
[https://news.ycombinator.com/item?id=4396747](https://news.ycombinator.com/item?id=4396747)

To me, the most interesting ideas in the world are the ones that at first
blush look like they can't possibly work. But upon thinking through how they
might, you learn something.

Props to everyone in the thread who is asking genuine questions and actually
trying to understand what the team is building.

------
p1necone
This seems like a hilariously bad idea. You're basically building a system
where whichever group has the most money to burn gets to decide the "truth". I
would love you to change my mind though.

~~~
TazeTSchnitzel
Climate change will quickly be proven as completely fake

…by big oil's dollars.

~~~
yinyinwu
Climate change is an interesting example. A DIRT registry would not address
the question "is climate change real"; it's unclear what defines "real".
Rather, a registry could be a list of the average air temperature in a city
over time.

~~~
heyitsguay
Average air temperature according to whom? How do you verify the values,
recorded ostensibly from sensors, were ever measured? How do you verify
historical data for everything pre-2018?

~~~
redavni
Ground truth can be constrained statistically.

Don't get me wrong. I don't really see the value in inventing a truth
bureaucracy that rewards participants in fake money. This DIRT system is just
Silicon Valley, in its typical naivety, trying to reinvent propaganda.

------
ram_rar
Firstly, congratulations on launching DIRT.

>If the data is incorrect, anyone can challenge the data and earn tokens for
identifying these inaccurate facts.

How do you moderate censorship or conflicting information? If someone uploads
my personal info without my consent, how do I get it purged? From the current
model, it seems like I'll have to pay money to "request" purging my own data.

~~~
yinyinwu
Thanks! DIRT is a fundamentally different approach because it removes the
middleman and there is no central moderator. Our goal is to move the trust
from a central party to a system of rules that anyone can participate in.

I think verified identity could be a registry on DIRT. Adding reputation on
top of voting is something we're exploring.

~~~
NegativeLatency
I don't understand your answer. Can you explain more clearly what the process
would be for removing personal info?

~~~
Sir_Substance
I can't see any obvious place to download this dirt protocol so I can check
(the website seems to be 100% marketing, 0% documentation), but based on a
combination of pattern recognition and the evasiveness of the above answer, I
would wager the answer is "there is no process, we didn't think about data
deletion".

~~~
Semirhage
You’re being generous... this reeks more of scam than carelessness. Either
way, it’s never actually going to be delivered, won’t be GDPR compliant, and,
based on the answers to questions raised here, will be full of obvious holes. If it
isn’t just a scam, it smells like someone went from “here’s a neat idea I’ve
spent a whole ten minutes on,” to fundraising without a beat in between.

No resistance to censorship.

No resistance to conflicting information.

More money = more “truth” with no recourse, because removing info costs.

Incorrect info loses you money, but there is no functional systemic way to
determine correct/incorrect.

Waaaaay too much marketing and nothing else extant.

~~~
paulie_a
For most of the planet GDPR is irrelevant. I personally will never give it any
thought when building something.

~~~
Semirhage
Failing to give thought to something doesn’t make it go away, but of course
it’s your right not to think. As long as you understand that ideology and
opinion aside, it can greatly impact your ability to do business in Europe,
then you go be you.

 _Most companies don't do business in Europe._

As defined under GDPR? I’d love citations for that claim.

~~~
paulie_a
Most companies don't do business in Europe.

~~~
Aeolun
That's an interesting argument.

But unfortunately, it was beaten out by 2%.

[https://www.inc.com/anne-gherini/fifty-two-percent-of-us-bus...](https://www.inc.com/anne-gherini/fifty-two-percent-of-us-businesses-are-affected-by-this-new-regulation.html)

------
eindiran
Can someone explain how exactly DIRT protocol will do moderation? From their
site [1] it seems like they do some sort of moderating of crowd-sourced
structured data. But this is where I am a bit confused: "DIRT maintains
accuracy because every contributor needs to deposit tokens to write data. If
the data is correct, it is freely shared. If the data is incorrect, anyone can
challenge the data and earn tokens for identifying these inaccurate facts."
How is the data flagged as incorrect? Who decides that the original data is
wrong and the new data is correct?

Also, given that contributors have to put money up to add information, what
incentive do they have to add information in the first place?

[1] [https://dirtprotocol.com/](https://dirtprotocol.com/)

~~~
yinyinwu
Hi! Thanks for the question. DIRT works well for objective information. In the
cryptocurrency use case, this could be the ERC-20 smart contract address for a
token or a list of investors for a project.

To flag information as incorrect, you need to take tokens and challenge the
data. A challenge starts a vote and anyone in the DIRT network can vote with
their tokens on what information is correct. The vote winner and majority
voters earn tokens. The vote loser and minority voters are penalized.

We are planning to publish our protocol design in a few weeks with more
details.
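
The challenge-and-vote flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DIRT design, which has not been published: the `Challenge` class, the matched deposit, the 50/50 split between the winner and the voter pool, the 10% minority penalty, and the tie rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """One dispute over a registry entry. All names here are hypothetical."""

    writer: str
    challenger: str
    deposit: int                    # staked by the writer, matched by the challenger
    minority_penalty_pct: int = 10  # assumed penalty on losing voters
    votes: dict = field(default_factory=dict)  # voter -> (side, tokens)

    def vote(self, voter: str, side: str, tokens: int) -> None:
        if side not in ("keep", "remove"):
            raise ValueError("side must be 'keep' or 'remove'")
        if tokens <= 0:
            raise ValueError("tokens must be positive")
        self.votes[voter] = (side, tokens)

    def resolve(self) -> dict:
        """Return each participant's net token change after the vote."""
        tally = {"keep": 0, "remove": 0}
        for side, tokens in self.votes.values():
            tally[side] += tokens
        # Assumption: a tie leaves the existing entry in place.
        winner = "keep" if tally["keep"] >= tally["remove"] else "remove"

        if winner == "keep":
            won, lost = self.writer, self.challenger
        else:
            won, lost = self.challenger, self.writer

        net = defaultdict(int)
        net[lost] -= self.deposit                # loser forfeits the deposit
        net[won] += self.deposit // 2            # winner earns half of it
        pool = self.deposit - self.deposit // 2  # the rest rewards voters

        # Minority voters pay a penalty into the reward pool.
        for voter, (side, tokens) in self.votes.items():
            if side != winner:
                penalty = tokens * self.minority_penalty_pct // 100
                net[voter] -= penalty
                pool += penalty

        # Majority voters share the pool pro rata; rounding dust is dropped.
        for voter, (side, tokens) in self.votes.items():
            if side == winner:
                net[voter] += pool * tokens // tally[winner]
        return dict(net)
```

Note that nothing in this sketch checks the data itself: the outcome depends only on which side commits more tokens, which is the concern raised elsewhere in this thread.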

~~~
fake-name
This doesn't answer the more important question. The issue isn't how do you
handle "incorrect" information, it's how do you _REMOVE_ information.

Basically, when someone doxxes someone, and dumps the result onto your system,
how do you remove it?

If you don't have a facility for rendering intentionally malicious information
_impossible to access_ , your system is fundamentally broken.

~~~
yinyinwu
I think the tough part is deciding who should be the arbiter of whether
information is intentionally malicious. Censorship can be a real issue.

For the DIRT protocol, if you want to remove information, you also need to
deposit tokens and put forth evidence to convince voters in the network to side
with you. If you lose the vote, then you would also lose tokens. We put forth an
economic penalty for being inaccurate.

~~~
mcny
Firstly, congratulations on launching!

> For the DIRT protocol, if you want to remove information, you also need to
> deposit tokens and put forth evidence to convince voters in the network to
> side with you. If you lose the vote, then you would also lose tokens. We put
> forth an economic penalty for being malicious.

Sorry if this is a stupid question but if I lose a vote, can I bring it up for
a vote again? I think that should be ok, right? Even if I get some fact
removed by some kind of trickery where I sneak in a vote it should be OK
because if the fact belongs there, maybe someone else can add it again?

I think this works well for facts like Robert Kennedy as a matter of fact did
NOT kill John F Kennedy. However, some things are not objectively clear. What
happens when people keep adding "fake news" that is not obviously/patently
false? Does someone need to pay to have it removed? How often do I do that?
Every time someone adds it?

Now again the flip side is troublesome. We can't require identification for
anyone to post facts, right? I mean I think that would be unthinkable, right?
As such, how do we "rate limit" "fake news"?

Sorry if all of these have obvious answers. I just couldn't think of it...

~~~
yinyinwu
Thanks! Great question.

1\. Repeat votes - If you lose the vote, you can vote again. DIRT works not
only for correcting intentionally misleading data, but also for fixing
out-of-date information. It's similar to a bug bounty for data.

2\. Cycle of challenges - Each time misinformation re-surfaces, you would have
to challenge and vote down the data. However, every time you are successful in
your challenge, you earn tokens for your efforts.

3\. Fake news - Subjective information can be harder to adjudicate with DIRT
and that will not be our initial market. As an engineer, I would love to
believe that the blockchain can solve fake news. However, a lot of research
shows it is not the lack of data that is the issue, but rather perspective.
People believe what they want to believe.

------
detaro
"Wikipedia for structured data" seems like an odd tagline when Wikipedia
already has a Wikipedia for structured data.

~~~
yinyinwu
Thanks for the feedback! One of the differences between DIRT and Wikipedia is
that we place a higher value on information accuracy. For example, in the
cryptocurrency market project teams often list investors and advisors as
affiliated with their project when they are not involved. A Wikipedia list of
investors would not be sufficient. You want these project teams to have skin
in the game and something to lose if they are spreading false information.

That said - the explanation didn't fit as well into a one liner :)

~~~
cdibona
So wikidata but for crypto? That's pretty thin. Why didn't you try to 'update'
wikidata?

~~~
yinyinwu
With Wikipedia or projects like OpenStreetMap, vandalism, spam, and poor
data quality are problems at scale. In particular, crowdsourcing does not work
for curating commercial data or information where you can profit by spreading
misinformation.

For example, in the cryptocurrency space, projects raise funding through
initial coin offerings (ICOs). In an ICO, you can contribute ETH to a smart
contract for the promise of tokens in the future. Having an openly editable
list of ICOs and their contribution address would not work. The list would be
quickly spammed because malicious actors have a really high incentive to put
their personal wallet address as the contribution address.

------
sometimesijust
Seems like a pretty great way to remove the middleman from populist
infotainment consumption. Can't be worse than what we already have and by
making it explicitly richest party wins it makes the process more transparent.
It might not be the Truth but at least it is Honest.

~~~
yinyinwu
Thanks for the comment!

Transparency would be the third benefit. With the blockchain, you can see the
entire history of votes. Every transaction is recorded. Today, if a website
accepts bribes for reviews, visitors to the site do not know that this
happened. With DIRT, if a wealthy token holder has a lot of tokens and tries
to throw a vote, you can see the attack happening.

~~~
procedural_love
How do you associate a token holder with an actual person or organization?

Is there a method for doing this built into the protocol, or would that be a
responsibility for the implementer?

I agree that transparency could be a great benefit of this technology, but if
a "wealthy token holder" can create several puppet accounts with their own
tokens, throwing a vote can be made to look "organic". Does DIRT do anything
to prevent this?

(Thanks btw, it's great to see you active in the comments.)
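
The puppet-account concern above can be made concrete with a small sketch. It assumes a purely token-weighted tally with no identity layer, which is an assumption about the protocol rather than something the team has confirmed; `tally` and `split_stake` are hypothetical helpers.

```python
from collections import defaultdict


def tally(votes):
    """Sum tokens per side for a list of (account, side, tokens) votes."""
    totals = defaultdict(int)
    for _account, side, tokens in votes:
        totals[side] += tokens
    return dict(totals)


def split_stake(owner, side, tokens, n_accounts):
    """Spread one holder's tokens over n puppet accounts (hypothetical)."""
    base, extra = divmod(tokens, n_accounts)
    return [
        (f"{owner}-puppet-{i}", side, base + (1 if i < extra else 0))
        for i in range(n_accounts)
    ]


honest = [("alice", "keep", 40), ("bob", "keep", 35), ("carol", "keep", 25)]
whale = [("whale", "remove", 150)]
puppets = split_stake("whale", "remove", 150, 12)

# Token-weighted totals are identical either way...
one_account = tally(honest + whale)
many_accounts = tally(honest + puppets)

# ...but the number of accounts on the "remove" side differs.
remove_accounts_before = sum(1 for _, side, _ in whale if side == "remove")
remove_accounts_after = sum(1 for _, side, _ in puppets if side == "remove")
```

Under these assumptions, splitting the stake does not change who wins, but it turns one visible large vote into a dozen small ones, so a public vote history alone would not reveal that a single holder is behind them.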

------
TekMol
My feeling is that this is either an illusion ("it will work out somehow") or
a sham ("let's milk the crypto craze").

No answer to the question why selling votes should result in more accuracy.
Buzzword Bingo. An overly broad approach. Lots of social proof but thin on
content. This reminds me of all those ICOs we see these days.

Looking forward to reading the whitepaper. But somehow I have the feeling it will
either never come or it will be just another marketing brochure without
technical details.

~~~
yinyinwu
Before Wikipedia, the idea that an openly edited online encyclopedia would be
a better and more accurate alternative to Encyclopaedia Britannica would have
been surprising to most people.

There are two parts of the design that lead to more accuracy for DIRT:

1\. Skin in the game - a token deposit to write encourages accuracy because
you can lose the deposit if you are incorrect.

2\. Encouraging moderation - moderators can earn tokens. If you vote and
challenge correctly, you can earn tokens. This creates an economic reward for
moderators that can protect data accuracy in the long term.

We're posting the whitepaper and more importantly, launching the protocol with
a first application in the coming months. Stay tuned!
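
One way to read the "skin in the game" argument is as an expected-value check. The sketch below is a toy model and not part of any published DIRT design: `gain` (what a writer earns from a false entry surviving), `deposit`, and `p_caught` (the chance the entry is challenged and voted down) are all hypothetical parameters.

```python
def expected_profit(gain: float, deposit: float, p_caught: float) -> float:
    """Expected profit from posting a false entry under a simple model.

    With probability p_caught the entry is challenged and voted down and the
    writer loses the deposit; otherwise the writer keeps the gain.
    """
    if not 0.0 <= p_caught <= 1.0:
        raise ValueError("p_caught must be a probability")
    return (1.0 - p_caught) * gain - p_caught * deposit


def min_deterring_deposit(gain: float, p_caught: float) -> float:
    """Smallest deposit for which lying has non-positive expected profit."""
    if not 0.0 < p_caught <= 1.0:
        raise ValueError("p_caught must be in (0, 1]")
    return (1.0 - p_caught) * gain / p_caught


# Example: a false entry worth 1000 tokens, caught half the time.
profit_small_stake = expected_profit(gain=1000, deposit=100, p_caught=0.5)
break_even_stake = min_deterring_deposit(gain=1000, p_caught=0.5)
```

In this model the deposit only deters misinformation if `p_caught` is high enough, and `p_caught` in turn depends on voters reliably siding with the correct data.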

~~~
TekMol

        you can lose the deposit if you are incorrect
    

How would the system know which side is correct and which side is incorrect?
From what you wrote so far, it just counts which side put in more money.

~~~
VectorLock
Either that's the secret sauce that got them $3m with zero openly published
work, or the "who is right" is completely hand-waved over or relies on some
oracle.

------
microcolonel
Kinda like WikiData? [0]

[0]:
[https://www.wikidata.org/wiki/Wikidata:Main_Page](https://www.wikidata.org/wiki/Wikidata:Main_Page)

~~~
o_____________o
I recently began using WikiData for a large project and have to say that it's
surprisingly great.

~~~
waffle_ss
Can you say more about your project? I was thinking of using it for a project
where I'd have to categorize a lot of data and I found the
ontologies/modelling and program setup a bit daunting so shelved it for now.
Would be curious what others are doing with it.

------
forgottenpass
I don't see how the business model causes data to trend towards truth. I can
see how it would trend towards whatever the people with financial incentive to
change it want it to say. While DIRT pockets a tax on the edit war, ofc.

------
lurker456
Interesting idea. How will this handle legal take-down requests ? Backed by
DMCA, GDPR, and so on.

~~~
lm_nop
Take-down requests indeed. Especially for individuals, and especially with
GDPR (right to be forgotten), CA privacy law (don't sell my data), etc. If
someone else writes on DIRT that I'm an alien from an alien planet bringing a
virus to earth (when in fact I'm a human from SF with no viruses), do I need
to 1) PAY to get a token to challenge this and also 2) correct inaccurate
information with correct information with no option to totally remove the
entry (however "ridiculous" the entry)?

Applied to the case of getting accurate VC listings, DIRT has a ploy to get
VCs to PAY to get tokens to challenge incorrect entries. Consumers also have
an interest in the quality of information, but a primary concern lies with the
subject of an entry.

DIRT - if I may, my request to you is to document the heck out of your policies
and expected behaviors. The grey line of "ridiculous" that I point out is
something that you've mentioned in another response, that you're not in the
business of fake news. At some point, you'll need to be making decisions and
providing ethical guidelines.

------
wslh
This sounds like the GraphPath/GraphOS
initiative: [https://www.graphpath.ai](https://www.graphpath.ai)

------
jameslk
If it costs tokens to submit information, what incentive is there to submit
information?

~~~
iamwil
It can depend on the contents of the registry and who depends on that
information. For example, with a list of top 100 colleges, readers might use
it to decide which colleges to go to. And hence, writers would be incentivized
to submit their own college to the list.

A better, but less mainstream-relatable example is a list of ERC-20 smart
contract addresses.

------
rgbrgb
very cool! anyone have a list of useful TCRs? i haven't found one yet, but
love the idea.

------
dtran
Inspired by
[https://news.ycombinator.com/item?id=17512045](https://news.ycombinator.com/item?id=17512045),
I'm trying to figure out what gets me most excited about what DIRT and TCRs
could enable.

For me, it's actually not data easily verifiable as true or false, but more
for "wisdom of the crowds" type of knowledge—things that you couldn't put up
on a source like Wikipedia. These tend to be lists or recommendations that
contain some subjectivity, but also tend to coalesce around a mostly-agreed
upon set of answers from a trusted set of sources.

In the centralized world, we usually rely upon institutions like the Michelin
Guide to develop a fair set of criteria, but we ultimately as end users trust
that institution's "objectivity" and judge whether we think that list is
valuable. Sometimes when I research, I informally end up creating lists of
lists and combining them ad-hoc if I can't tell which of them is more trusted.
These lists also tend to end up being static or only updated once or twice a
year and can fall horribly out of date.

I think TCR incentives could potentially be really interesting as an
alternative to these lists which rely on the institution's brand. For example,
I'm thinking of Quora Answer Wikis (like this one:
[https://www.quora.com/What-are-the-best-independent-coffee-s...](https://www.quora.com/What-are-the-best-independent-coffee-shops-in-San-Francisco))
and the general consensus for recommendations in forums for questions like
"Which cities should I visit in Thailand if I'm looking for nightlife and
places to hike?" or "Which REST framework library should I use for a Django
project?" It'd be amazing if DIRT
could balance the incentives for community members to contribute to this type
of data and keep them as living lists, with all changes and updates maintained
through a community with the right checks and balances and incentives.

From the Medium post: >If the data is correct, it is freely shared. If the
data is incorrect, anyone can challenge the data and earn tokens for
identifying these inaccurate facts. Our protocol and platform makes it
economically irrational for misinformation to persist in a data set.

I think the more interesting data would be data that's on a gray scale, e.g.
using the above coffee shop in San Francisco example, obviously if John Doe
tries to get his burger joint on the list as a growth hack even though they
don't serve coffee, that should easily be verified as misinformation. But what
if a coffee shop just closed for business, or moved to Mill Valley but thinks
they should still be on the list, or just switched beans and raised the prices
so that everyone agrees that it no longer deserves to be on the list?

Disclaimer: I know most of the team working on DIRT, and I don't know very
much about TCRs.

~~~
iamwil
You're right that people often make lists that end up outdated. Often in
these cases, the incentives for reading the list are greater than the
incentives for maintaining it.

People in the earlier days of the internet imagined a better world brought
about by immediate and unfettered access to information. Many have tried to
make freely available information on the internet. Wikipedia, IMDB, and
Freebase are direct products of this school of thought. However, we can only
count these on one hand. In fact, most free data projects languish and have a
hard time getting off the ground.

What we all discovered as we built out the web is that only some kinds of data
can be maintained for free sustainably. Sure, if it's something that engages
fandom, like all the different types of starships in Star Trek, people are
intrinsically motivated to update that list. But if it's something that's
considered dry but useful, like the tax rates in every county in the US, or
points of interest on a map, there won't be enough people with intrinsic
motivation to keep that updated.

As builders and users of the web, we've compensated by subsidizing that
dry/useful data, typically with a company selling advertising or subscriptions
in adjacent services. The implicit deal we make as users is if the company
provides the data for free, we're ok with the company accruing profits off the
data we help curate. Recently, the sentiment has been growing that this may
have been a raw deal for users of the web as a company's profits accrue to the
point of immense power over our lives.

What I think the builders of the early web got wrong was that certain types
of data needed to involve other incentives besides intrinsic ones. While we've
found other ways to incentivize users in the 2.5 decades of the internet,
cryptocurrencies now give us one more tool in our toolbox to use economic
incentives to design systems that converge on the curated lists that are
regularly updated.

With this new toolkit, we may be able to find another way to provide freely
curated data without using subsidization. Instead of the value capture
accruing in a single company, we may find a way to sustainably distribute it
amongst the curators.

We're not as sure subjective data is a good first fit for TCRs. With any
startup, it's better to find a niche application that's a great fit, and we
think we've found one in objective data for the crypto space.

I think another aspect that might be exciting for you to think about is if
you're able to link the data between registries. It's a non-obvious aspect
that almost no one asks about.

------
progval
So it's Wikidata but with "blockchain" stamped on it to raise more money?

~~~
tribler
The Truth is determined by the rich! Majority voting by token owners
determines the facts. You just can't make up stuff like this.

Curious how this will play out. Threshold for contributing is non-trivial, so
the wasteland scenario is my academic guess. The contribution scoring and
ranking will help, but can you be anonymous?

~~~
yinyinwu
In many ways, a system where the wealthy control information is what we have
today. You can influence elections with donations.

Today, if a publication or data source posts information about you or your
business, there is no means to correct this data. You can post about it on
Twitter or email the service, and wait for a response. You can't just fix this
information. With DIRT, we create a way for people to at least be part of the
curation process.

Different types of data require different governance models. Some datasets are
not as critical to protect and could have a low token stake. For other pieces
of information, you want every writer to have more skin in the game and thus
have a high stake for writing. Our bet with building a protocol is to test
what incentives will curate the best data.

------
asdsa5325
Wikidata + cryptoscams? Yikes.

