Hacker News
Golden: Mapping human knowledge (golden.com)
397 points by yarapavan on Apr 30, 2019 | 167 comments

Cynical self: They raised a $5mil round just to redo Wikipedia with more self-referential VC pats on the back.

Intrigued self: If there is anything worth raising money for, it is building out a collective knowledge base for all of humanity.

Angry self: Wikipedia is quite literally one of the most amazing things that the internet and millions of strangers have ever produced together, and it stands as a vital proof of concept of collaboration on a scale not seen before. If you are interested in making Wikipedia more useful, with more linking to in-depth knowledge, more multimedia, and a richer editing UI, why not build something on top of Wikipedia instead of building something entirely new?

Academic self: You are throwing out ideas like "cover all topics that exist" and "neutral objectivity" as if entire fields of human inquiry into what those things even mean didn't exist. You seem to dislike Wikipedia's current editorial standards, but it's not at all clear to me where or how Golden will draw its lines.

Also, what are your thoughts on the morality of essentially crowdsourcing all the input that you will feed to your query engines and profit off people using? Aren't you essentially making your contributors all unpaid workers? If you are serious about building a for-profit version of Wikipedia, why not figure out ways to distribute micropayments based on, for example, how useful others find the information someone took the time to contribute?

Jude from Golden here. Agreed that WP is one of the most amazing things ever built and interesting to see your various lenses on our mission.

To the cynical self: see the Dropbox launch on HN back in the day. PS: I'm in no way claiming we are Dropbox :>

To the angry self: There are various constraints we want to free ourselves from by starting fresh on this problem. We believe the constraints are too great not to build something new here. There are still things that can be reused to build on what has already been done (linking out to WP, WP linking back to us when appropriate, a similar/forked namespace, various policies being built on, forked, or rewritten, lessons learned, content summarization with AI, fact cross-checking, etc.).

To the academic self: we want to cover 10bn+ topics; Google's Knowledge Graph is around 3bn+ entities. We are not attempting to map all the lamp posts in San Francisco, which would make a useful data set for a self-driving car company, but we do want to map all businesses, concepts, science topics, people of interest, species, products, services, etc. Instead of notability, we are aiming more at a validation model, i.e. ‘does this entity exist?’. There is also a difference between a WP ‘article’ and a Golden ‘entity’ in our model, so I believe there is space for positive coexistence between Golden and WP. We will still want discussion around validation and the ‘what next after 10bn entities are done’ debate.

In terms of the morality part, we wish to be at a more open standard than WP. The trade-off for the common user: we release all pages under CC BY-SA 4.0, hopefully reach 1000x more entity cardinality, and open-source useful queries, in exchange for less work per topic than alternatives (thanks to the leverage of automation). So I think we are on strong moral ground here; otherwise I would not work on it. We also have paid helpers to fill in gaps on our side, and we are using part of our revenue to grow content ever faster. We reviewed the micropayments model and don't think it will work (see the Lunyr failure and others on that front).

Thanks for replying! I agree micropayments are incredibly difficult to make work, as is evidenced by the race to the bottom advertising model that seems to be everywhere all-the-time.

I hear that you want to cover 10 billion topics and essentially validate that they exist, but that says nothing about validating the content of what someone says about them, nor about organizing it, etc.

I hear lots of AI buzzwords, but essentially I don't see any staff that would leverage all the thinking humanity has done on organizing, validating, and cataloging information. Where are the information scientists? The librarians? The archivists? The journalists? The philosophers? Etc etc etc.

Essentially you are talking about a profoundly /human/ endeavor that requires input (IMHO) from many corners of human knowledge to do in any way that begins to approach wikipedia (or even an encyclopedia, much less a library) in terms of quality and scale.

I hear buzzwords, and see an alarming lack of acknowledgement of how difficult these questions are (or even that they exist).

However, you have the $$ and the people, and I'm just here hiding behind a keyboard criticizing. Clearly you've convinced more people of your ideas than I have of mine, and by all means it's a noble goal, so I wish you the best of luck, and it will be interesting to see where you are and what Golden looks like in a decade or so!

Thanks so much. In terms of the hard questions: once today's launch madness calms down a little, I'll tackle the hardest comments/questions here. In the coming months we will publish some technical blog posts explaining how we will tackle the problem space. Many of the problems we have not figured out yet, and we welcome the community to contact us with new ideas. I 100% agree some of the problems are very hard. For a glimpse of some of the problems we have solved so far, please try out the AI-assisted editor, the magic table cells in the editor for auto-filling tables, the citation tool (paste an academic paper into the citation UI), the event detection in the timeline UI, and the AI suggestions, to see some of our early results on automating the problem. Topic prediction, taxonomic detection, claim validation, structured data extraction, auto field detection/suggestions, crosslinking, spelling/grammar checking, sentiment checking, event detection, tense detection, quality feedback on human edits and, ultimately, prose writing (see recent OpenAI auto-writing research) [non-exhaustive list]: some we have solved and some not yet, but we will keep working on them. Generally speaking, I'm keen to work on something difficult for the next 10 years...

Hi Jude,

On your page you said "If an extremely niche topic is valuable to just a handful of people and positively contributes to society, it will have a home on Golden."

Who will make that judgement call of what "contributes to society", and who will be paying their salary?

You also said "We believe this advanced query tool is extremely useful for investment funds, large consultancies and large companies, so please get in touch if you want to experience one of the best query tools out there."

That sounds great, but it's a far cry from "human knowledge". There wasn't much about advanced query tools for academics, nonprofits, activists, or government employees.

Sorry to be so cynical but one can only hear so much of "making the world a better place", to quote Silicon Valley.

> To the academic self: we want to cover 10bn+ topics, google knowledge graph is around 3bn+ entities. We are not attempting to map all lamp posts in san francisco which would make a useful data set for a self driving car company but we do want to map all businesses, concepts, science topics, people of interest, species, products, services, etc etc. Instead of notability, we are aiming more at a validation model ie ‘does this entity exist’. There is also a difference between ‘article’ of WP and ‘entity’ of Golden for our model. So I believe there is space for positive coexistence between Golden and WP. We will still want discussion around the validation and ‘what next after 10bn entities are done’ debate.

How does this differ from wikidata.org?

It will have to be profitable to pay back investors, for one difference.

> we wish to be at a more open standard than WP

As far as I can see, the CC-BY-SA license only applies to the text and not to the knowledge graph that users contribute.

Hey judegomila,

You're using my images - hundreds of them, it seems - in breach of the licence. Where do I send my invoice?

I'm sure as you wish to be "on strong moral ground", you won't want to deny me what you owe me.

So, you're aiming for an exit as per Metaweb/Freebase?

JG here. No, just a damn good website for you all.

If you've taken VC, then surely there's an exit in about 5 years?

JG: I hope not :>

Like a lot of others here I really wish you good luck, this can become amazing!

Like a lot of others here, I'm also afraid of what will happen when the VCs start to demand profit.

For the record, I'm in no way against successful companies being wildly profitable, quite the contrary: I see that as a guard against being forced to do dumb things.

What I am wary of, however, is companies being forced by VCs to do all kinds of crazy stuff, like when Quora decided to publish everything one looked at (I left there and then, never to return), or when, shortly after WhatsApp joined Facebook, talk started about "integration" and I immediately started moving my account and all my groups elsewhere.

What I could hope for[0] - especially with companies that are hoping to crowdsource a lot of data - would be some kind of effective guarantee and/or escrow to prevent short sighted plays by VCs or hostile takeover by Google, Facebook or similar companies, i.e. that companies would "tie themselves to the mast" to escape the siren songs.

[0]: but don't really expect in most cases as it would limit a number of profitable exits. An upside I could see would be that it would be easier to get crowdsourced data, from both companies as well as from individual contributors if one could believe that the data would stay accessible and not be abused.

Don't let your VC see this because they're definitely expecting to see a return on their investment reasonably soon.

All the best.

It seems like Wikipedia might have some inherent incompatibilities with what they're trying to do, such as its editorial rules and its definition of notability, which can't easily be "fixed" in whatever sense they need. I appreciate a more inclusive view of notability that lets you take it all in and surface what's relevant more selectively, although that is open to insane amounts of debate, as we've seen on Wikipedia.

Wikipedia is a great resource, but it's also one of those success stories that owes as much to the incompetence of its competitors as it does to its own merit. I've often thought it would be interesting to create a "Wrapipedia" of sorts, basically a copy of the Wiki codebase that could be used to circumvent Wikipedia's often-nonsensical notability and deletion rules.

Basically, instead of going to wikipedia.org to look up something, you'd go to wrapipedia.org, which would issue the query on your behalf and supply its own page if the Wikipedia lookup fails. Wikipedia pages that were created by users but subsequently deleted by editors would essentially be treated as pure virtual functions to be implemented by wrapper classes. As with Wikipedia itself, users could create new entries and edit existing ones.

That way the original Wikipedia project could remain true to its charter without leaving its users frustrated or unsatisfied. Its editors would also appreciate an alternative to being insulted and browbeaten by annoyed users, I'm sure.
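A minimal Python sketch of the lookup-and-fallback idea described above, assuming a hypothetical `fetch_wikipedia` callable (returning article text, or `None` for a missing or deleted page) and a hypothetical `local_pages` store kept by the wrapper site. Neither is a real Wikipedia API; they only stand in for "issue the query on your behalf" and "supply its own page".

```python
from typing import Callable, Dict, Optional

# Hypothetical signature: takes an article title, returns its text, or None if the
# article does not exist (or was deleted). Not a real Wikipedia client.
Fetcher = Callable[[str], Optional[str]]


def wrapipedia_lookup(title: str, fetch_wikipedia: Fetcher,
                      local_pages: Dict[str, str]) -> Optional[str]:
    """Serve the Wikipedia article when it exists; otherwise fall back to the
    wrapper's own page for that title (or None if neither has one)."""
    page = fetch_wikipedia(title)
    if page is not None:
        return page  # indistinguishable from reading wikipedia.org directly
    return local_pages.get(title)  # supplementary page kept by the wrapper site
```

With a plain dict standing in for Wikipedia, `wrapipedia_lookup("Some_deleted_page", {"Python": "..."}.get, {"Some_deleted_page": "kept text"})` returns the wrapper's own copy, while any title Wikipedia still has is served unchanged.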

True, but that sounds more reactionary and less constructive than what I had in mind. I'd envision it more as an alternative access URL for Wikipedia, one that acts as a supplementary source when necessary but otherwise is indistinguishable from using wikipedia.org directly.

Should we be thinking of Wikipedia alone, or Wikipedia plus all of the fan and other wikis that have sprung up around it? When you consider the collection of all of them there isn't much of a notability restriction and cardinality takes a jump forward too. Of course the collection of them doesn't get you an ontology carefully elaborated by a single mastermind, you just get whatever emerges from this decentralised web.

And perhaps it's ontologies like these that you get if you do go with emergence


Thank you for the comment! I think it sums up some important critical questions very nicely.

While I am happy to see talented people working on important topics like this, I am really sad to see that this seems to be a for profit effort.

If you have the right team and vision, it should be easy enough to get five million from public funders like the NSF without the leash of having to maximize profits, (likely) at the expense of maximizing the overall benefit to society. For an example of what can go wrong, just look at the dumpster fire that is FB.

Although I do believe that companies like Golden start out with the best of intentions, experience shows that the reality of financial expectations will likely win out in the end (e.g., Google, FB, Twitter).

So, why not go with an institutional structure that doesn’t have as many strings attached? IANAL, but maybe something like a public benefit corp [1]?

1: https://en.m.wikipedia.org/wiki/Public-benefit_corporation

> Also, what are your thoughts on the morality of essentially crowdsourcing all the input that you will feed to your query engines and profit off people using? Aren't you essentially making your contributors all unpaid workers?

My mind went in a different direction, although your point is very valuable and the question needs to be asked.

My concern is this: Wikipedia editors work hard to maintain a level of quality high enough that the kooks and cranks and racists of the world feel like Wikipedia is biased against them. Well, guess what: It is. Wikipedia gives no platform to people who think vaccines cause autism. NPOV is biased towards the scientific consensus.

So, will Golden even attempt that level of quality? Will it keep the constant tide of insanity out or will it be inundated by the loudest few who believe the craziest shit? Crowdsourcing is dangerous when some people are endlessly motivated to game the system, and I don't just mean making Moot win Man Of The Year competitions.

Jude from Golden here. Yes we will work very hard to keep to scientific consensus and reach that level of quality or higher. Please bear in mind this is not only crowdsourcing but there is automation in the collection of the information, that too will have its fair share of issues as well no doubt :> There is likely opportunity to auto detect cranky/racist/gaming behavior and we have some plans in that area. Feel free to shoot over ideas as well if you have further thoughts.

"there is likely opportunity to auto detect cranky/racist/gaming behavior and we have some plans in that area."

Ask facebook and twitter how easy automated content moderation is...

True, very hard problem. But the combinatorial phase space of free-form opinion comments is far larger than the phase space of canonical knowledge, so the number of patterns of things that can go wrong is much larger for them. Still, we're keeping our shields up and not discounting this issue.

As well, think of the kind of impact 5mil would have on archive.org.

To the angry self: I don't entirely trust Wikipedia to give an objective/apolitical view of things all the time; there have been times when articles have been noticeably influenced by a de facto, anachronistic political (progressive) position. I wouldn't mind having a redundant source of information as a hedge against current politics corrupting its articles' integrity.

> articles have been noticably influenced by a defacto, anachronistic political (progressive) position

I hate it when the right pretends that everyone is biasing everything in terms of a progressive agenda, when that's just not the case, even in Wikipedia's case.

There was the case of Cross making blatantly biased entries and edits in favour of right-wing Israeli governments, and being protected by Wikipedia editors while doing so.

I think what often gets conflated is that much of Western society and its organizations are socially progressive, sure, but very much economically right-leaning, something that rarely gets acknowledged and that makes it difficult for me to have an honest conversation with the other side.

I'd also note that Wikipedia's founder is very much right-leaning, regularly attacking the likes of Corbyn and certainly not holding progressive economics dear.

Ok, my point still stands then; it's good to have multiple sources of information

Sure, I agree with that.

Many people have tried this, and they invariably end up creating propaganda cesspools like Conservapedia. Progressivism fundamentally values objectivity, so I don't think you can easily exclude the former without also losing the latter.


Jude, CEO and founder of Golden, here. Super excited to take this live. We are out to build the next place for canonical knowledge on the Internet. It has been a long-term mission of mine to open up knowledge coverage of billions of niche topics, companies, technologies and new concepts. Our aim is to cover in excess of 10bn topics in high detail over time. Although we all love Wikipedia, there have been various issues over the last 18 years, from constant deletion of data (Product Hunt was almost removed a few months back) to fact validation, automation of processes/work, and UI/ease of use. We also believe there are many more features that users want, like a knowledge feed, keyboard commands, AI-assisted feedback on editor contributions, and tables that update automatically.

We have set out to:

1. Cover all topics that exist over time rather than just notable topics.

2. Go into greater depth around a topic, from its timeline to videos and other useful resources surrounding the topic (eg learning videos, further reading, blog posts, Q&A, podcasts etc).

3. Support a larger population of people trying to learn about topics.

4. Make knowledge more accessible, richer and fun to read about.

5. Allow you to track topics of interest and be updated when new information is available on the subject.

6. Save time making the knowledge in the first place by using design, UI and AI to aid construction of the information, especially by automating repetitive tasks and bringing smart editor features.

Initially we have kicked off with various areas from cell and plant based meat to synthetic biology to cryptocurrency consensus mechanisms to artificial intelligence, microbiome, stem cell technology and startup topics. We expect these areas to increase in scope over time covering space, medical food, clean technology, robotics and many more exciting fields.

We are still early on our journey to delivering our vision and very much looking forward to product feedback and help with building up the content. Our team is hard at work making the product easier to use; we are up for taking every flow to its simplest form and removing every bug. If there is a feature you have been dying for on Wikipedia but could not get, please let us know. We look forward to seeing you in our community and to covering topics, especially those underrepresented elsewhere.

How do you intend to account for merit and credibility across data sources?

The way I see it, we've scaled our communication of information far beyond our ability to scale our assessment of its credibility and merit. This problem of merit is where I see the big gap in our tools. Are you planning on doing some kind of per-user credibility assessment based on content written/consumed?

Golden's feature set is similar in many ways to apps I've prototyped for this problem; I'd love to hear your thoughts on the subject.

The missing pages you speak of are mostly VC/company names, and you just gloss over useful knowledge as "others." As I look into DNA sequencing[0] for example, there is little information while the bulk of it seems to be about companies. Care to explain if this is the direction Golden is taking?

[0] https://golden.com/wiki/DNA_sequencing#Companies

Hi. I see that you're using some of my photos that are licensed under a Creative Commons-Attribution-ShareAlike license, but I cannot find where you have attributed me as the author. Could you elucidate how to, in general, go from the title picture of an article to find out the license information?

How will you be handling "soft" topics like history?

What about the handling/records of/provenance of artifacts?

Who is the arbiter of Truth? Will this wind up like Snopes or Wikipedia with an entrenched viewpoint that is dogma and defended at all costs? How will you avoid this?

Will alternative views/paths of inquiry be considered seriously? For example; vaccination efficacy research? I understand this is a hot-button issue, but that's exactly why I picked it. How you handle it will be a litmus test of how other issues will be handled.

How will you avoid the apathy and dogmatic approaches of many scientific journals?

What about scientific verification (or lack thereof)?

Sad to see the most important set of questions in this thread not only go unanswered, but downvoted too.

Speaking as someone who asked a similar question: I doubt they're going to answer this, because it would tip their hand, and no possible answer would benefit this little publicity junket/advertorial PR dump.

If they say they're going to moderate the site such that vaccines don't cause autism, in their view, that's taking the "party line dogma all alternative views are crushed by moderator fiat" approach, as per the comment you're replying to.

If they say it's going to be a free-for-all, they're implicitly saying that, unless they get a team of users onboard to stamp out nonsense quickly, their site is going to be Yahoo Answers with more Ajax, and a clearinghouse for "Big Pharma Chemtrails Cause Morgellons Vaccines To Sterilize White Babies" type nonsense, because the extremists always seem to have time to spread their idiocy into any unmoderated or lightly-moderated forum.

However, the questions still have value, if only because the answers resonate in their nonexistence.

You say that you'll be doing more media- and learning-oriented content than Wikipedia: how are you planning to support this, given how many contributors Wikipedia has? (And if you don't mind, maybe you could go deeper into how you would compare to Wikipedia :)

Good luck!

Jude from Golden. Beyond the blog post: in terms of more learning content, video and extensive further-reading links can help in this area, but at the core it's much deeper pages, e.g. https://golden.com/wiki/Cryobacterium vs https://en.wikipedia.org/wiki/Cryobacterium

In terms of numbers of contributors we believe Golden is at the stage where 1 hour editing on Golden produces more content/data than alternatives and the friction to edit is much lower. Are you looking for more depth on comparisons beyond the blog post?

> In terms of numbers of contributors we believe Golden is at the stage where 1 hour editing on Golden produces more content/data than alternatives and the friction to edit is much lower.

In my experience as a Wikipedia editor, 80% of the time I spend writing an article goes to searching for (good) sources; 10% is editing friction, like looking for the right infobox or the right template for what I want (although experience reduces that time); and 10% is actually writing the article. The discoverability of Wikipedia’s contributor documentation is quite bad.

Your “high resolution” citations are a really great idea that I wish WP had. Does it support multiple citations for the same part of the content? Overlapping parts?

If the goal is to build a learning plan, I find LearnAwesome's approach far simpler & better: No UGC except collecting links to existing resources on the Web, a simple markdown file and connections to other topics for discovery: https://github.com/learn-awesome/learn-awesome

How do you earn money?

Are you going to collect user data and track them?

Just from reading the article, I am extremely interested in the Advanced Query search. Bravo on allowing the user to decide how they search (unlike some products). It's something I have been waiting on for years. What I would like to see next is the ability to mark items that should be excluded from the search. Usually, when I am doing a long, complicated and unpredictable search for information, I go down rabbit holes and, from experience, exclude all the results that I know for sure I do not want. Unfortunately, I have to keep track of those items mentally, which limits how much relevant info I can find.

Hi, Jed from Golden here. I'm obviously biased, but our query tool is really powerful. :) You can both include specific things in your search and exclude specific things. For example, show me companies that went through an accelerator and are in FinTech, but are not based in the US or UK. Happy to talk through specific use cases; feel free to reach out directly: jed@golden.com
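The example query above boils down to an include/exclude filter. Here is a minimal Python sketch over a hypothetical `Company` record; the field names and the `find_companies` helper are illustrative assumptions, not Golden's query tool, syntax, or schema.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass
class Company:
    # Hypothetical record shape for illustration; not Golden's actual schema.
    name: str
    industry: str
    country: str
    accelerator: Optional[str] = None  # None if the company never went through one


def find_companies(companies: Iterable[Company], industry: str,
                   exclude_countries: AbstractSet[str],
                   exclude_names: AbstractSet[str] = frozenset()) -> List[str]:
    """Include companies that went through an accelerator and are in `industry`;
    exclude any based in `exclude_countries` or explicitly marked by name."""
    return [
        c.name
        for c in companies
        if c.accelerator is not None
        and c.industry == industry
        and c.country not in exclude_countries
        and c.name not in exclude_names
    ]
```

The optional `exclude_names` set is one way to model the "mark items to exclude" wish from the comment above, so the user doesn't have to track those exclusions mentally.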

The real limit is the quality of the data. Wikidata’s (free) query tool is quite powerful but if the knowledge subject you’re interested in is poorly defined the query engine can’t help.

How is your vision different from Freebase and other ontology projects?

Is there a tension in designing a knowledge database for amateurs vs experts, and how do you mitigate or address it? Wikipedia has such a tension for a lot of topics.

This is absolutely fantastic. I'd been distraught by Wikipedia's policy of "notability" and its deletion of valuable articles that volunteers created after pouring in several hours. I firmly believe that no knowledge, and no human, is too small to be "notable". I also like the increased focus on tooling, which Wikipedia has failed to deliver in all these years. I think Wikipedia was a good start at emulating the encyclopedias of the 18th century, but in the new age we need to move on to an AI-first knowledge graph that can have billions of nodes, where each node could be anything from a person to an object in my backyard to an entire textbook.

There is obviously the question of how you prevent misinformation and falsehood. If you want to scale to billions of nodes, moderators aren't going to cut it. One possibility is leveraging the community and what I'd call a chain of trust. For example, the community can flag, upvote and downvote. This doesn't result in deletion; it is simply a signal to the reader about how trustworthy the content may be. The chain-of-trust mechanism can improve this further by inferring trust from contributors that users have trusted previously. StackOverflow-like gamification for contributors can work wonders here. In addition, you could allow users to build their own social network so they can build a personal chain of trust. Another possibility is to put untrusted articles in a draft namespace and move them to the main namespace as their trust level increases. The key is to avoid deleting content and to retain it somehow so it can be improved and evolved.
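The flag/vote, chain-of-trust and draft-namespace ideas above could be sketched roughly as follows. This is a minimal Python sketch under loudly stated assumptions: the `Vote` record, the one-hop trust propagation, the 0.1 default trust for unknown voters and the 1.0 promotion threshold are all hypothetical choices for illustration, not anything Golden or StackOverflow actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vote:
    voter: str
    value: int  # +1 for an upvote, -1 for a downvote or a flag


def personal_trust(my_trust: Dict[str, float],
                   their_trust: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """One hop of a chain of trust: I partly trust whoever the people I trust
    trust, scaled by how much I trust them. Direct trust is never lowered."""
    inferred = dict(my_trust)
    for friend, friend_weight in my_trust.items():
        for contributor, weight in their_trust.get(friend, {}).items():
            inferred[contributor] = max(inferred.get(contributor, 0.0),
                                        friend_weight * weight)
    return inferred


def content_score(votes: List[Vote], trust: Dict[str, float],
                  default_trust: float = 0.1) -> float:
    """Weight each vote by the voter's trust; unknown voters count a little."""
    return sum(v.value * trust.get(v.voter, default_trust) for v in votes)


def namespace(score: float, promote_at: float = 1.0) -> str:
    """Nothing is deleted: low-trust content just stays in a draft namespace."""
    return "main" if score >= promote_at else "draft"
```

The design choice mirrors the comment: votes and flags only move a visible score, and a low score keeps a page in "draft" rather than removing it, so it can still be improved later.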

Now the things I don't like about Golden:

When signing up, it forces bio to 140 chars. Why? Why not collect more knowledge about authors? Not artificially limiting information should be the point here, right?

I also find the current interface very cluttered and unfriendly. After signing up, I was greeted with the topics of blockchain and cell-based meat occupying most of my screen real estate. I don't care about either, and the half-visible conversations under each topic don't help. How about asking me what I'm an expert in? What my interests are? Add some algo magic to recommend topics for contribution?

I also don't like the UX at all. For example, this is the page on Bitcoin: https://golden.com/wiki/Bitcoin. The menu that suddenly breaks in after the quick intro hurts my eyes. The typography is straining. The left menu is just hard to grasp. On the page for a cluster, you get a giant list of contributors on the right, which I care little about. Say whatever you like about Wikipedia, but they got all this stuff right.

this is a great idea. First URL I faved in the past 12 months. Good luck Jude!

Looking at https://golden.com/wiki/Heyzap it seems like the summary has information that is not in the article.

If you start reading from the introduction, you won't know that Heyzap is a mobile advertising company.

What about poetry, art, music, social movements, and, you know, culture?

This is so awesome!

One reason why I like contributing to Wikipedia is because it feels like I'm contributing to the knowledge-base for all of humanity. Why should I spend my time working on a VC-funded version instead of at a non-profit? VC companies go bust all the time. I would have little share in the governance. Plus, it seems like you all are using public labor for private profit.

That all said, more knowledge is better than less. Good work on creating this.

Jude from Golden here. This knowledge base is for all of humanity as well. I think the important part here is that we are putting out the content on CC-BY-SA-4.0.

There is also a risk factor in a donations model like WP's, and in not having enough revenue to invest in the tools needed to go 1000x on topics plus the other features that are important to build. If it means anything, I worked on my last company (Heyzap) for 9 years and it's still running today. I can see the worry, but there are many options for creating backups and dumps of the text. So we are in agreement that more knowledge is better, and thanks for the support.

Thank you for your response. I'm greatly encouraged that the license is CC-BY-SA. That's definitely in the public good.

I do believe there is space for non-Wikipedia knowledge repositories. Look at Wikia, for example. People do want to store all their fandom knowledge somewhere. It's just that Wikipedia might not be the place for it.

I also agree that the underinvestment in tools can limit contributions. Wikipedia's new visual editor is the biggest step they've taken in making editing easily accessible. They also have new translation and analysis tools. Not to mention all the specialized wikis under the umbrella - Wiktionary, Wikidata, etc. I do wonder if it's enough though.

Thanks for your work. I hope you can find a balanced business model.

Thanks, glad you like it.

If the knowledge is for all of humanity will you be incorporating as a B-Corp or some other form of social benefit corporation?

Same here. Why should I contribute to Golden for free, where people have to pay to make queries, when I can do the same with Wikidata, which is queryable by anyone?

Jude from Golden here. Good question. We are opening up useful queries to users over time, e.g. https://golden.com/y-combinator-w19-companies/ Future paid/business tools will include our AI-assisted editor for private companies and private storage of their knowledge, so that we can continue to open up more queries to the public. Our north star is to get to a more open knowledge base than what is currently available. Also, here are all the cryptocurrency projects that have whitepapers: https://golden.com/cryptocurrency-whitepapers/

Given that Wikidata is published under a CC0 licence, there’s nothing preventing Golden from using it as a knowledge source. Unless Golden provides a better editor experience, I see no reason for anyone to contribute to Golden rather than Wikidata.

Just FYI, most of the content on Wikipedia is actually dual licensed under CC-BY-SA and GFDL. Both licenses are copyleft licenses; CC0 is more permissive.


Yes, but I’m talking about Wikidata (CC0), not Wikipedia (CC-BY-SA).

Whoops, I misread your comment! Sorry about that.

Parts of Wikipedia articles (e.g. some infoboxes [1]) are actually sourced from Wikidata, which is licensed under CC0 [2]

[1] https://en.wikipedia.org/wiki/Template:Infobox_person/Wikida...

[2] https://www.wikidata.org/wiki/Wikidata:Data_access#Basic_imp...

Because of the politics (both internal and external) around Wikipedia, I gave up as an editor years ago. To me it sounds like Golden actually wants to be a repository of all information, rather than a byzantine bureaucracy that pretends to play encyclopedia.

> To me it sounds like Golden actually wants to be a repository of all information

And based on what information did you draw that conclusion? A well-articulated self-description from a VC-funded startup? Or does the fact that some competitor failed in some way automatically make it favorable and trustworthy?

Jude from Golden here. I'd love to get your feedback on the editor, fix any bugs you come across, and hear how it compares with your previous experience and why you gave up. You can email me at jude [at] golden [dot] com or post it here...

> The arbitrary threshold of what is notable and what is not doesn't cut it in the Knowledge Age. There are currently 5.8 million English language articles in Wikipedia, and Google had 1 billion objects (200x Wikipedia's size) in its Knowledge Graph when it launched in 2015. We estimate internally that there are 1000x the entities to cover than what Wikipedia has today. It’s an exciting challenge!

This is a very weird comparison. Why not helping Wikipedia and contributing to this huge open knowledge base? The rules set by the WP community for acceptability clearly do not imply that “actual technologies, projects, products, theoretical electrical components and academic ideas” might be removed if written there, as the author suggests.

That articles, or even drafts, for specific American VC-funded projects did not reach WP's acceptability threshold does not mean that there is something fundamentally wrong in Wikimedia's approach. I'd love to see a better comparison.

Jude from Golden here. Unfortunately, many topics other than US companies are being removed; for example, https://en.wikipedia.org/wiki/Soda_Popinski is up for removal right now. This notability issue spans almost all entity types. Additionally, many topics simply don't make it onto Wikipedia soon enough, e.g. https://golden.com/wiki/Morphogenetic_Engineering. Other examples of removed pages: Lisk (https://golden.com/wiki/Lisk) and SV Angel (https://golden.com/wiki/SV_Angel).

As far as I can see, for issues like the morphogenetic engineering page you linked, the notability rule exists to prevent people from gaming SEO or getting their personal company/theory/whatever listed. Which is to say, maybe it's not a good idea for anybody to be able to list their business, thesis, or book (for commercial or discovery purposes) alongside actually notable topics. It seems ripe for marketing or other gaming.

> That articles, or even drafts, for specific American VC-funded projects did not reach WP's acceptability threshold does not mean that there is something fundamentally wrong in Wikimedia's approach.

Unless your goals include preserving knowledge that falls outside WP's guidelines. WP drew the line for what is worth preserving in one place; there is room enough in the world for a service that draws the line somewhere else.

More interesting questions:

1. What does not work in WP's model that you will address? Moderation? Transparency? Funding?

2. How do you plan to have 1000x the number of articles on WP? How do you plan to moderate these 6,000,000,000 articles and keep track of their accuracy?

3. Is it an English-only project or do you value international contributions?

>Why not helping Wikipedia and contributing to this huge open knowledge base? The rules set by the WP community for acceptability clearly do not imply that “actual technologies, projects, products, theoretical electrical components and academic ideas” might be removed if written there, as the author suggests.

There are rules as they are written, and there are rules as they are actually enforced. Wikipedia has long had a great divide here, especially thanks to rules-lawyering deletionists and political cliques amongst editors.

> Why not helping Wikipedia and contributing to this huge open knowledge base?

Wikipedia is incredibly selective about what they allow to have its own page. They simply aren't interested in being a huge open universal knowledge base - it's aggressively curated in an awful way that isn't immediately obvious. I've seen notable/important/interesting people be removed simply because they didn't meet some arbitrary level of notability, even if they had high impact in their respective communities. That kind of selective enforcement and control of information is what we should be fighting against, not encouraging.

Wikipedia notability is actually quite strict. Relatively few businesses have significant coverage by multiple independent reliable sources, for example.

While noble in appearance (is that what you're going for with the Golden name?), I'm skeptical of a VC-backed company going after Wikipedia, which is, IMO, one of the treasures of the internet.

Without a VERY transparent and provable business model, I would never support this, as it seems like exactly the wrong direction to take "the internet". Especially since from the CEO's language, it seems to be competing directly with Wikipedia, rather than trying to work with them.

This is a great example of a product that has 10 little reasons why it's better than pre-existing alternatives, but seems to lack 1 killer reason why it's better. I don't think it would have passed my personal sanity check for startup ideas [1].

I predict that the team will eventually discover one or more niches where they can leverage their platform to build a differentiated value prop specific to those niches. For example, the niche of "investor databases" with the differentiated value prop of "letting entrepreneurs filter down the list of all investors to find the top 100 best ones that they specifically should try to pitch".

Each time a specific niche and value prop is identified, it will motivate building cool new functionality into the platform. The platform can evolve as a union of these niche-motivated features.

I suspect that a lot of the features that exist in today's horizontal-minded v1 won't be needed because they were conceived of without reference to any specific differentiated value prop.

[1] https://medium.com/@lironshapira/how-to-sanity-check-your-st...

How will you handle people who want to add knowledge that isn't accurate, such as flat-Earthers, anti-vaxxers, etc.? It seems like only a matter of time until you're overrun with contentious and vocal minorities that have nothing better to do than undermine our species' long-term survival.

Jude from Golden here. Great question. We are actively monitoring all changes right now, building up the community with a scientific/industrial-focused seed, and building out UI and AI to track the flat-earth-type changes that might come up in the future. I think if we can get transparency on their best arguments/evidence and see the best counters, it will become clear, in that example, that the earth is round. Let us get overrun with people who want the best known information on the topics.

This approach may work with natural sciences, but what about political topics? People who spent most time studying an ideology X, are in some sense the best available experts (they remember thousands of details), but are far from impartial. And of course, ideologies may try to call themselves "science", making it seem like people who disagree are simply uneducated.

Who gets to decide who is a scientist and who is not a scientist? Scientists?

There's a pile of bias in this question, but it's such an important question that it needs to be answered.

Nazi policies were the de facto "science" for a decade-and-a-half. @Jude, how will you prevent similar dogmas from taking hold? Especially if they are the popular dogmas?

True I am heavily biased toward the truth.

So for one of your "showcase" pages, the "Golden AI" plagiarized Wikipedia content and now you are in violation of the CC BY clause Wikipedia uses: https://golden.com/wiki/Product_Hunt/activity/user/golden-ai

How could that happen?

What will happen now?

How will you prevent such copyright violations in the future?

Jude from Golden here.

We attribute Wikipedia in general, which is in line with their TOS. Did we miss a place? When needed, we say "Text adapted from the Wikipedia page 'Product Hunt': https://en.wikipedia.org/wiki/Produc..."

FYI, copyright doesn't apply to the facts themselves, only to the text, and in those cases we give attribution as per their TOS.

I do not see attribution on https://golden.com/wiki/Product_Hunt

The vast majority of the text on that page is a verbatim copy from Wikipedia. You have to provide appropriate attribution.

For a tool that has the lofty goal of "mapping human knowledge" and uses A.I., I'm just not very impressed by the lack of APIs and the absence of any mention of a data model.

I feel that to map human knowledge and make it accessible and understandable for machines, the data model should be put front and center. For me that means a knowledge graph (preferably with SPARQL/OWL capabilities).

Where's the metadata? The wiki pages don't even have schema.org data! I'd be more interested if Golden provided UI improvements for querying and presenting data from Wikidata and/or DBpedia, or if Golden's editor made it easier to annotate the content of the wiki pages with RDF data.
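To make the schema.org point concrete, here is a minimal Python sketch of the JSON-LD a topic page could embed in a `<script type="application/ld+json">` element. The helper name `topic_jsonld` and the choice of properties are assumptions for illustration; `@context`, `@type`, `name`, `url`, and `sameAs` are standard schema.org/JSON-LD keys.

```python
import json
from typing import Dict, List


def topic_jsonld(name: str, page_url: str, same_as: List[str], kind: str = "Thing") -> str:
    """Serialize a topic page's identity as schema.org JSON-LD.

    `kind` should be a schema.org type such as "Organization" or "Person";
    how topics would map to types is an assumption here.
    """
    doc: Dict[str, object] = {
        "@context": "https://schema.org",
        "@type": kind,
        "name": name,
        "url": page_url,
        "sameAs": same_as,  # the same entity elsewhere, e.g. Wikipedia or Wikidata
    }
    return json.dumps(doc, indent=2)


print(topic_jsonld(
    "Product Hunt",
    "https://golden.com/wiki/Product_Hunt",
    ["https://en.wikipedia.org/wiki/Product_Hunt"],
    kind="Organization",
))
```

The `sameAs` links are what would let crawlers and knowledge graphs reconcile a page with the corresponding Wikidata/DBpedia entity.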

> Today, we have a great opportunity to use new technologies to solve the problems: […] real identity […]

> Join Golden

> First Name Last Name

Yeah, no. A “real names” policy is a really terrible idea, for reasons outlined here:


(former wikimedia employee here)

A real names policy also means it'll have essentially no Japanese content. Online pseudonyms or anonymous editing are the norm in Japan.

That’s listed in the linked text:

> in some countries, such as Japan, online pseudonyms are the norm in all circumstances

I'd like to see a breakdown of the benefits next to the costs.

As it stands, the article you link is a judgement in need of non-anecdotal data.

Additionally, splitting names into "first" and "last" is a terrible idea for a service that aims to be international. Not all names can be split into "first" and "last". Some people have two "last" names. Some languages (e.g. Hungarian) put the family name before the given name.


What are the benefits of a real name policy?

It's easy to create bot accounts that have believable names, and a whatever vetting process put in place may just be a false sense of security that you're interacting with who they say they are.

It certainly doesn't make anyone less of a jerk. There are plenty of racists, sexists, etc. on Facebook arguing with everyone else in the world.

How does it prevent impersonating someone who isn't on the platform? Twitter is notoriously bad at this... you can just call yourself whatever you want as long as your target isn't a user of the platform and doesn't notice...

"Really terrible" is a bit harsh imo but your comment is very insightful.

Perhaps there is a middle ground? Some kind of novel solution that accounts for data coming from "real names" vs pseudonym differently?

If the problem is one of credibility, maybe some manual process that verifies the user is part of some prominent organization i.e. a professor in a given field or something could work. Give the user "flair" (similar to how reddit does it for r/askscience). They keep their pseudonymous username, but they're also recognized as an important figure.

> If the problem is one of credibility, maybe some manual process that verifies the user is part of some prominent organization i.e. a professor in a given field or something could work.

Only accepting users from prominent organization would compromise the goal of having 1000x more content than Wikipedia (see the failure of Nupedia).

I hope it will not be registration-walled like Quora, but more open like the StackExchange websites or HN.

I refuse to participate in a registration Ponzi scheme.

What will the content licence be?

Jude from Golden here. We won't do the reg wall - hold me to it :>. We want the information to be open and EASY to access, thus no reg screen. CC4.0

>We won't do the reg wall - hold me to it :>.

I'm sure it's possible to make this an actual required element of the company, isn't it, rather than just the say so of someone who might'n't be at the company in the future?

How about API access and textual dumps for the other kinds of analysis?

Thomas here, programmer at Golden. We don't have any specific plans to announce at the moment, but we've been thinking about how best to provide API access, including looking into GraphQL. Let me know if you have any specific needs or ideas! thomas@golden.com

Hey Thomas, love what you are building.

If they offer API access I think it's fair to charge for it. At least for a certain scale.

It's hard to offer something for free if the server costs and maintenance are non-trivial.

Thanks for this answer. Good luck with the project!

The blog post says, "Public topic pages will be free to access and the text available on CC 4.0."

What does "CC 4.0" mean? There are many version 4 Creative Commons licenses. I looked at an actual Golden article and the bottom of the page says, "Text is available under the Creative Commons Attribution-ShareAlike 4.0; additional terms apply. By using this site, you agree to our Terms & Conditions."

Okay, so CC-BY-SA-4.0... similar to Wikipedia.

CC-BY-SA-4.0 correct, we are correcting the blog post.

Hi Jude.

1) What makes Golden so different than Wikipedia that we should stop using Wikipedia and use Golden instead?

2) I randomly clicked into the Beyond Meat page on Golden [0] and I am comparing it to Wikipedia's page for the same company [1]. I can see you personally made 10 contributions to the article, more than anyone else. Why do you think I should read Golden's article instead of Wikipedia's article?

3) You want to stop deletion of data. When is content not worth keeping? For example, would you want an article written about every street in the world? How would you write an article on Golden for this [2]? My first click on Random Article on Wikipedia returned this [3]: how would you write the article for H. Day on Golden?

4) How will you implement fact validation?

5) > We also believe there are many more features that users want, like a knowledge feed, keyboard commands, AI assisted feedback on editor contributions and tables that can automatically update.

As a long time Wikipedia user, I can tell you I have never wanted any of those features, maybe with the exception of the auto-updating tables pending more information about what that actually means. How do you know that people want these features?

[0] https://golden.com/wiki/Beyond_Meat

[1] https://en.wikipedia.org/wiki/Beyond_Meat

[2] https://en.wikipedia.org/wiki/V%C3%A4sterbroplan

[3] https://en.wikipedia.org/wiki/H._Day

Yup, Wikipedia already maps all human knowledge. In fact, you don't need to map everything, just the central concepts in every field. I think there's a law of diminishing returns: summaries of knowledge are extremely useful, but as you add more and more detail, there's less and less benefit.

I created 'Wikipedia-Prime', a series of 27 wiki-books mapping all central concepts in all fields of human knowledge. Approx. 16,700 articles were enough to capture all the central concepts used by experts in all fields.

Wikipedia-Prime Index: http://www.zarzuelazen.com/CoreKnowledgeDomains2.html

> Golden’s mission is to collect, organize and express 10+ billion topics in an accessible way, presented in neutrally-written and comprehensive topic pages.

How do you plan to uphold the neutrally-written part? Wikipedia does not really achieve this; many editors try, but it is very difficult, especially since everyone carries personal biases. How will you make sure opinions that do not agree with the majority of your users, editors, etc. are allowed?

I agree this is the most difficult problem for a global platform -- even Google Maps has to redraw the borders depending on who's asking. I would prefer a wiki where I can read different branches of the 'main' article. Instead of deleting paragraphs that don't agree with my sense of what's true, I can say 'please relegate this to a different branch' and let it be upvoted or downvoted or geo-fenced, but still have it available.

I do really appreciate their priority to keep open logs -- make sure editors and admins can be held accountable for making changes to the tone or frame of an article.

In my opinion, you should include "POWERFUL QUERY TOOL" into the community edition, and then sell the ability to have private spaces including support in the priced plan.

Right now, it doesn't seem very fair or legit given the headline "The intelligent, open knowledge base". If the raw data is accessible but the tool to query it is not accessible to most web users, it doesn't make it "open".

On reading this announcement, I can't help but think of Quora, and what an unmitigated disaster it is, as well as a cautionary tale of what happens when you accept VC funding for a human knowledge project. It's a few small steps from "well, we'll have to start making money" to "let's block the Internet Archive completely and hoard the world's knowledge for our own gain".

But worse than just being a cautionary tale, I guess I'm unhappy to see this announcement because Quora could seriously have been what we got instead of Wikipedia, you know? Who wins in a platform race seems like a function of factors like first mover advantage and luck, rather than any dispassionate analysis of what would be better for the world.

So I wouldn't have the guts to start a project like this with VC funding. You have to model the probability that you'll be pushed out and replaced with someone who doesn't share your ideals and is tasked with finding a path to profit. How could anyone have the stomach to create a potential Wikipedia replacement with that kind of liability attached?

It looks good from a visual and usability standpoint. But it's kind of odd how there's already a cluster of articles on synthetic biology, but nothing written about Escherichia coli.

If the idea is to catch up and eventually overtake Wikipedia in content, that seems extremely ambitious.

Luckily the name Golden is pretty generic so maybe they plan to pivot towards whichever industries adopt it more?

The cynic in me sees an attempt to crowdsource a knowledge base for an IMDb-style business model...

What is IMDb's business model?

Sorry, meant CDDB (now Gracenote). So many acronyms.

Is it Wikipedia - but with a modern editor?

And "AI". And all about blockchain and VC.

And a “Real Names” policy.

And "socialized" like Github with followers & favs, but without the forks or personalized pages.

I like the concept.

I'd like to know how they plan on addressing auditability of content - where it comes from, how it changes over time, etc.

Jude from Golden here. Here is a TL;DR answer to this important question (I’m also interested in the community's ideas on this problem set):

1. High visual transparency and open edit logs, e.g. golden.com/wiki/Morphogenetic_Engineering/activity

2. Using real profiles so we can prevent bots / multi accounts etc.

3. Cross-checking SPO/fact triples in the prose against our structured data to validate information.

4. Cross-checking against multiple sources.

5. Using high-resolution citations where we actually highlight the claim (please test our highlighting of a claim and citation tool to see this in action).

6. Ranking citation URLs by source trustworthiness.

7. Opening up primary sources, e.g. articles of incorporation, as evidence for claims.

8. Having a strong audit log of where information comes from.

9. Using GitHub-style ‘issues’ rather than wiki talk pages to discuss content issues.

10. Giving UI affordances to argue out points and give evidence to claims made in these arguments.

We are still working on this UI / AI and general community to really dig into this core challenge.
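Point 3 in the list above (cross-checking subject-predicate-object triples in the prose against structured data) can be illustrated with a minimal Python sketch. It assumes the triples have already been extracted from the prose; the `Triple` type, the `check_triples` helper, the dictionary layout, and the example data are all hypothetical, not Golden's actual implementation.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A hypothetical subject-predicate-object fact extracted from prose."""
    subject: str
    predicate: str
    obj: str


def check_triples(
    prose_triples: List[Triple],
    structured: Dict[Tuple[str, str], str],
) -> Dict[str, List[Triple]]:
    """Sort prose claims into confirmed, conflicting, and unverified buckets.

    `structured` maps (subject, predicate) to the value held in the
    structured data; this layout is an assumption for illustration.
    """
    report: Dict[str, List[Triple]] = {"confirmed": [], "conflict": [], "unverified": []}
    for t in prose_triples:
        stored = structured.get((t.subject, t.predicate))
        if stored is None:
            report["unverified"].append(t)  # nothing to compare against
        elif stored == t.obj:
            report["confirmed"].append(t)   # prose agrees with structured data
        else:
            report["conflict"].append(t)    # flag for an editor to review
    return report


# Hypothetical example data, not taken from any real page.
structured_data = {
    ("Acme Corp", "founded_in"): "2012",
    ("Acme Corp", "headquarters"): "Berlin",
}
claims = [
    Triple("Acme Corp", "founded_in", "2012"),
    Triple("Acme Corp", "headquarters", "Paris"),
    Triple("Acme Corp", "ceo", "J. Doe"),
]
result = check_triples(claims, structured_data)
```

Exact string equality is a simplifying assumption; real claims would need dates, units, and entity aliases normalized before comparison.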

Please reconsider the real names policy, it is perfectly appropriate to keep my legal name separate from what I write online, and I don't even live in a country where I can be arrested for my opinions.

I would much rather see a 'durable identity' process, whereby when someone sees my username as an author, they know it was authored by me. This goes beyond allowing or not allowing duplicate screennames (where you swap an I for l and impersonate a politician etc) -- consider authenticating edits and transactions with public key cryptography so I can assert my identity -- just not necessarily the one I can't change.

keybase.io is, of course, an innovator in this space, and you should consider adopting their strategy for asserting identity online!

You can still make it a pain in the ass to create multiple accounts without asking for a government ID (which can be faked too, by the way!). Ban multiple log-ins from a single IP, add a phone number challenge, put a waiting period on new accounts. Nothing is perfect, but all of this is better than having to use my real name.

Re 9: GitHub-style issues feel much more modern. As a semi-regular wiki editor, I find talk pages one of my least favorite (and hardest to audit) parts of Wikipedia.
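Two of the friction measures suggested above, limiting how many accounts one IP can create and imposing a waiting period, can be sketched in a few lines of Python. This is a hypothetical in-memory illustration: the `SignupGate` class, its limits, and its time values are assumptions, and a real service would need persistent storage and would have to handle shared IPs (NAT, universities), which a strict per-IP limit penalizes.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SignupGate:
    """Hypothetical in-memory friction layer against mass account creation.

    - At most `max_per_ip` signups from one IP within a sliding `window` (seconds).
    - New accounts cannot edit until `waiting_period` seconds after signup.
    """

    def __init__(self, max_per_ip: int = 2, window: float = 86400.0,
                 waiting_period: float = 3 * 86400.0) -> None:
        self.max_per_ip = max_per_ip
        self.window = window
        self.waiting_period = waiting_period
        self._signups: Dict[str, Deque[float]] = defaultdict(deque)
        self._created_at: Dict[str, float] = {}

    def try_signup(self, username: str, ip: str, now: Optional[float] = None) -> bool:
        """Register `username` from `ip` unless the name is taken or the IP is over its limit."""
        now = time.time() if now is None else now
        if username in self._created_at:
            return False
        recent = self._signups[ip]
        # Forget signups from this IP that are older than the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_per_ip:
            return False
        recent.append(now)
        self._created_at[username] = now
        return True

    def can_edit(self, username: str, now: Optional[float] = None) -> bool:
        """True once the account exists and its waiting period has elapsed."""
        now = time.time() if now is None else now
        created = self._created_at.get(username)
        return created is not None and now - created >= self.waiting_period
```

Passing `now` explicitly keeps the logic testable; in production it defaults to the current time.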

It is similar to ideaflow but a lot better :) https://www.ideaflow.io/

Is there any support for traceability in Golden? Over the past years, I have been documenting the works, exhibitions, and documents of a Dutch artist. While doing this, I have often come across conflicting information. I have discovered that what I actually need is a reasoning system about statements, with a mechanism for traceability for every fact represented.

You seem to want classic AI in the style of Cyc:


No, not something as rigid as Cyc, but more a system where you can trace the truth of a certain fact. Wikipedia requires you to reference sources, but that is just one layer. Some forms of knowledge are based on multiple layers of references. Making a statement based on a source always involves some form of interpretation. It would be nice to know who made the statement (and who supports the interpretation). Of course, Wikipedia keeps the full history of an article, and you could go back through all the revisions and see who edited what, but it requires a lot of work. It would be nice if Golden would at least acknowledge that traceability is an issue and have some support for it.

Sorry, but I see nothing compelling here. If at this point your "customer stories" section includes a quote from a VC, you don't have a real product. Good luck, but I'm afraid you are going to have a hard time finding real paying customers. And attracting actual human editors? Numerous people on this thread are already offended by your proposal. Next.

I don't get it, in what scenario would someone pay $99 per user per month?

Could someone please give an example where this makes sense?

Love the modern take and long tail focus!

Do you think the paid features for enterprise use is enough to support a VC model? (How do you avoid this becoming Quora (which also had an enterprise proposed use case)?)

What was the decision process around taking this a VC route vs not?

Jude from Golden here. Yes, I believe we can have the top 50k companies of the world as paying clients and help power open knowledge for the world. We have paying customers today on that front.

The decision for VC includes the following reasons:

1. Making this happen at scale / quicker.

2. Having people like Marc from a16z, FF and Gigafund adding valuable insight in order to pull this off.

3. Derisking the mission with $$$

> neutrally-written

I'd like to see opinionated stuff too. In fact, the lack of content that could be labelled opinionated, non-credible or non-significant is what I dislike about Wikipedia. I acknowledge the value of neutral and credible information, but I believe too much of what could still be useful food for thought gets discarded along with the rest.

From quickly skimming through this post, it seems like it's basically just a Wikipedia that supports people posting their products as advertisements.

If something is not notable, you're just going to end up with those with vested interests deciding what content goes up. I suspect this is the main reason for Wikipedia's requirement of notability.

Only slightly related. I just went through the founder's website and they are a very fascinating person: www.judegomila.com

Peter Drucker once predicted that the knowledge economy and knowledge workers would be the next big evolution [1]. Are we there yet?

[1] https://en.m.wikipedia.org/wiki/The_Landmarks_of_Tomorrow

why not join efforts with wikipedia?

or at least borrow heavily from?

What will the monetisation be?

So far looks like tools for companies & pro research, like their query builder.

Will be interesting to see if it scales up...

Being able to provide a query engine that can combine general knowledge with trade secret / internal knowledge will be a huge step for chatbot engines or other document retrieval. Cool project.

You can already do that with Wikidata.

Edit: and before that, with Freebase (which was later merged into Wikidata).

So Wikipedia with better UI and a sprinkle of AI "magic"?

Freebase is back! (until google buys them again)

just kidding (I hope), good luck!

My 2c on design: while scrolling an article, the sidebar highlights the current paragraph, and that jumping is very annoying in the current color. It should be much lighter.

JG here. Will look into this.

This is very disturbing — the idea of all human knowledge being a for-profit venture owned by a founder and his investors is wrong.

"Knowledge is Golden" -- Cute name.

Small UI complaint: I can't quickly open lots of links that are near each other due to the little popups.

So this is a for-profit company that intends to use free labor contributions to exploit search engine SEO rankings, subsequently attract useless millions of daily visits, and then exit for hundreds of millions a few years from now. It worked before, so why wouldn't it work now? There are always enough dumb people who will help you make a fortune for free if you understand tech capitalism and manage to fool them; good luck anyway.

Why not actually work on capturing knowledge in a machine-readable and usable way?

To the creators of Golden: how do you view Golden in comparison to Everipedia?

I just took a look at Everipedia, and it looks like they are just cloning content from Wikipedia. At least that's what it seems like based on the article on E. coli.

Will you have an API, or will the information only be available through the heavy Web interface?

How will you handle inherently contentious topics, like vaccine safety, the existence of Morgellons as a disease distinct from delusional parasitosis, and the existence and current territorial extent of Israel and Palestine?

How many topics do you cover now?

This is my idea, happy to see someone had the same one and decided to implement it!

There are two major things I run into: figuring out the minimum knowledge needed to understand a certain thing, and deciding what to work on to further our knowledge and help everyone.

- My knowledge is quite specialized (I didn't have broad foundations), so sometimes it's hard to get into something. I know that I'm missing knowledge, but it takes forever to try and get from "I know how if statements and databases work" to making sense of a symbolic formula that might depend on set theory. If something could tell me "read these two pages and you've bridged the gap" (because the paper is about topic X and I already know topics Y and Z, so I only need those two to bridge my gap), that would be so much faster than googling "what's that sideways chandelier" and figuring out that it's set theory that I'm missing. There could even be circular branches between topics: someone who knows set theory can use that to understand databases more quickly than someone who does not, and vice versa. Posts explaining set theory for database users and vice versa could exist. But that's a lot of manual work to create each possible explanation, not sure that's viable, even if it has to be done only once.

- Related to the previous point, bridging knowledge gaps is currently done by asking questions. If you can ask another human a question, they can give an answer that perfectly fits your knowledge gap. That way of learning is much more efficient (for the student) than trying to give a workshop on the topic for a group of people. If we can do it as described in the previous point, we can free up a lot of resources that are now spent on having people tell the same things in slightly different form over and over again. Things need to be taught only once.

- When hiring, how can you tell if someone has the right knowledge for the job? Even a full day of testing can't tell you whether someone really understands all aspects of software engineering that are relevant for your organisation. If you can tick boxes on all the topics you understand, job matching might become much easier. Employers can request people with certain knowledge, and job seekers might get the jobs most in reach, e.g. "if you learn about these two topics, you are fit for this job" (on a technical level, at least). You'll still have to test that someone actually understands a topic (or have some trusted testing third party), but verifying a claim by asking randomly about a few topics is much easier than asking about everything.

- Our society demands ever more educated people. If you didn't have a lot of schooling, it can be hard to know what you are missing that companies are looking for. A message like "learn these three things and unlock this profession" might help a lot of people. (In the optimistic case, that is. It'll also cause people to reject candidates that are otherwise great because they're missing a topic. But at the start of a project like this, I'm just exploring the potential, not thinking about all the ways it could fail.)

- You can find who has certain knowledge within (or outside) your organisation to help with a project.

- At the leaves of the tree, you can find a topic to work on next, to further our understanding of the world. People can ask for certain issues to be solved or results to be confirmed and pay people to do that. Right now, our knowledge as a species is quite scattered and often hard to find.

I'm excited to see an attempt at starting that knowledge tree and am very curious to see which of the above points can really be done with it!
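The "read these two pages and you've bridged the gap" idea in the first bullet above is essentially a traversal of a prerequisite graph that stops at topics the reader already knows. A minimal Python sketch, assuming a hypothetical `prereqs` mapping from each topic to the topics it directly builds on (no such data structure is described for Golden), with made-up topic names:

```python
from typing import Dict, List, Set


def topics_to_read(target: str, prereqs: Dict[str, List[str]], known: Set[str]) -> List[str]:
    """Return the unknown topics needed for `target`, prerequisites first.

    Depth-first traversal that stops at topics the reader already knows.
    """
    order: List[str] = []
    visited: Set[str] = set()

    def visit(topic: str) -> None:
        if topic in known or topic in visited:
            return
        visited.add(topic)
        for dep in prereqs.get(topic, []):
            visit(dep)
        order.append(topic)  # post-order: a topic is listed after its prerequisites

    visit(target)
    return order


# Hypothetical prerequisite graph.
prereqs = {
    "relational algebra paper": ["set theory", "databases"],
    "set theory": ["logic basics"],
    "databases": ["if statements"],
}
plan = topics_to_read("relational algebra paper", prereqs, known={"if statements", "databases"})
# plan == ["logic basics", "set theory", "relational algebra paper"]
```

On an acyclic graph the post-order puts every prerequisite before the topics that depend on it. The `visited` set keeps the circular branches mentioned above from looping forever, though the reading order inside a cycle is then arbitrary.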

"navigating the cacophony of formats, designs, sources and standards is challenging. Google, Wikipedia, Bing, DuckDuckGo, Quora, StackExchange, Github, etc"

Makes me think of standards, all we need is one universal one. https://xkcd.com/927/

Thought the same! Haha

So... a wiki?

a press release of a Wikipedia for profit, with no information.

just flag it.


The content is on CC4.0 and not going to be paywalled.

It seems like you're putting the text out under CC-BY-SA-4.0 [1] (Attribution, Share-Alike), which is great (it's the updated version of the same license used for Wikipedia content). However, you're also collecting a TON of structured and relational data, which seems to be the value generated by your editor. Are you planning on keeping that locked up?

[1]: https://creativecommons.org/licenses/by-sa/4.0

My hope is that all the content, not just prose, will be CC licensed. Jude, will it be so?
