It looks like this doesn't check whether the article itself cites a source for the claim. Is that why it's called "Citation Needed"? Because it doesn't actually cite anything?
That's really cool. Do you think this might be the basis for natural-language navigation? (When going over a document, instead of having to search by keyword or regex, one could search for more complicated concepts using English.)
If not, what extra work is needed to bring it to that level?
I think you could get a pretty good solution for that using RAG and some tricks with prompt engineering and semantic chunking. With Google's very-long-context models (Gemini) you may also get good results with prompt engineering alone. Preprocessing steps like asking the LLM to summarise the themes of each section can help too (in RAG, this info would go in the 'metadata' stored with each chunk and be presented to the LLM alongside the chunk).
A key engineering challenge will be speed: when you're navigating a document you want a fast response time.
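To make the metadata-per-chunk idea concrete, here's a minimal sketch. Everything in it is illustrative: `embed` and `summarise` stand in for whatever embedding model and LLM you'd actually call, and the chunking is deliberately naive.

```typescript
// Minimal sketch of a RAG index with per-chunk section metadata (illustrative only).
type Embed = (text: string) => Promise<number[]>;
type Summarise = (text: string) => Promise<string>;

interface Chunk {
  text: string;                  // raw passage from the document
  metadata: {
    sectionTitle: string;
    sectionSummary: string;      // LLM-written summary, shown to the model with the chunk
  };
  embedding: number[];           // vector used for semantic search
}

async function buildIndex(
  sections: { title: string; text: string }[],
  embed: Embed,
  summarise: Summarise
): Promise<Chunk[]> {
  const chunks: Chunk[] = [];
  for (const section of sections) {
    const sectionSummary = await summarise(section.text);  // one-off preprocessing step
    for (const text of section.text.split(/\n\n+/)) {      // naive split; semantic chunking goes here
      chunks.push({
        text,
        metadata: { sectionTitle: section.title, sectionSummary },
        embedding: await embed(text),
      });
    }
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Query time: embed the plain-English question, rank chunks by similarity,
// and hand the top few (text plus metadata) to the LLM as context.
async function navigate(query: string, index: Chunk[], embed: Embed, topK = 5): Promise<Chunk[]> {
  const q = await embed(query);
  return [...index]
    .sort((a, b) => cosine(b.embedding, q) - cosine(a.embedding, q))
    .slice(0, topK);
}
```

Most of the latency sits in the model calls, so precomputing the index (summaries and embeddings) ahead of time is what keeps the query-time response interactive.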
If the Wikimedia Foundation didn't get paid for basing this project on an aggressively embrace-and-extend browser, they should look into whether they're leaving money on the table, since projects like this help extinguish the browser's competition.
Many of those sources are not available online (books), point at paywalls (research papers), or are dead links. Unless you have an API that can bypass these issues reliably, you are stuck with a tool that has already landed several lawyers in hot water for making up citations on the fly.
The Wikipedia Library project [0] grants active editors access to a wide range of otherwise paywalled sources. I wonder if it could not be extended to this sort of bot.
What other, more honest endeavors are you talking about?
It's very easy to attack others when you don't need to defend yourself. I want to see you say these things when you have an alternative with its own share of problems.
With similar topics like that it is very obvious, but if you compare different topics, some can't get enough articles while others must be merged and reduced to a stub. The best example I know of is an entire discography where each album and each song gets its own article, versus anything (historically significant) covered in skeptic wikis, where no amount of sources is enough for a single article.
More interestingly, there must be literal miles of senseless discussion about this if what you describe came out of it.
Sure, but what happens when the article is updated at a later date, or rescinded, etc.? Should the LLMs be trained to repeat the article verbatim, or to say "according to this article[0], blah blah blah" with links to the sources?
Wikipedia works because we can update it in real time in response to changes. Expecting LLMs to constantly recrawl every time a page on the internet is updated, and to properly contextualize the content of that page, is a huge ask. At that point, it stops being an LLM and starts being a very energy-hungry search engine.
Well, it's just a bot, so no need for it to instantly react to any and every update.
I also have my doubts about whether it is possible to implement efficiently (or at all). I suspect that just yanking in the article and all the sources is infeasible, and any smaller chunking would be missing too much context. Plus, LLM logical capabilities are questionable too, so I don't know how well the comparison would work...
Edit: the README build instructions are incomplete and I don't think hot reload works. Use `npm run build-dev` to get a working build.
It's not obvious to me what prevents this from being a Firefox extension as well; it might be the sidebar/sidePanel API differences, but I haven't played with those much.
Couple this with a search-based article generator so you have an article generator and an article checker, and then off you go generating 1 trillion pages. Could be useful as training content for LLMs, but also used by people.
> A chrome extension for finding citations in Wikipedia
In academia, Wikipedia citations are generally a no-no. One reason is their unreliability (the author is citing a source that they themselves can edit). More importantly, Wikipedia may be a good place to find primary sources, but in itself it is a secondary source.
That justification seems a bit behind the times honestly. We've now seen actual academic fraud on a massive scale with extremely little in the way of a correction to fix this, while at the same time we've seen Wikipedia handle abuse extremely well. The academic fraud is a threat to Wikipedia, more than using wikipedia links is a threat to academia.
The point is to find the information in Wikipedia which often then has a citation to another source. If you search Google you often find repetitions of the information but most sites don't cite sources.
This seems no different than it’s always been. Even before Wikipedia you would not cite secondary sources. But you sure would use them to get a foothold on a topic and find some of those sources.
Wikipedia prefers to be a tertiary source. They want the references to be to secondary sources, rather than primary sources.
"Wikipedia articles should be based on reliable, published secondary sources, and to a lesser extent, on tertiary sources and primary sources. Secondary or tertiary sources are needed to establish the topic's notability and avoid novel interpretations of primary sources."
I wonder how this will go over politically with the Wikipedia community. AI is such a hot-button issue, and the risk of hallucinating and saying something wrong seems more pressing in this application than in most.
I haven't used this, but reading the description, it sounds like it's primarily a search engine for wiki articles related to selected text. If so, I imagine it wouldn't be super susceptible to hallucinations.
It uses AI to parse the selected text to choose search keywords, and to parse the related Wikipedia article to decide if it agrees with the selected text. It obviously can bullshit in both cases.
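Roughly, I'd imagine the two calls look something like this (prompts and function names are my own guesses, not the extension's actual code):

```typescript
// Illustrative sketch of the two LLM calls: keyword extraction, then a
// constrained agreement check. Not taken from the extension's source.
type LlmCall = (prompt: string) => Promise<string>;

// Step 1: pick Wikipedia search keywords from the selected text.
async function extractKeywords(llm: LlmCall, selectedText: string): Promise<string[]> {
  const answer = await llm(
    "Extract 3-5 Wikipedia search keywords from this text. " +
    "Return them as a comma-separated list.\n\n" + selectedText
  );
  return answer.split(",").map((k) => k.trim()).filter(Boolean);
}

// Step 2: judge whether a retrieved Wikipedia passage supports the claim.
async function checkAgreement(
  llm: LlmCall,
  selectedText: string,
  wikiPassage: string
): Promise<"supports" | "contradicts" | "unclear"> {
  const answer = await llm(
    "Does the passage support the claim? Answer exactly one of: " +
    "supports, contradicts, unclear.\n\nClaim: " + selectedText +
    "\n\nPassage: " + wikiPassage
  );
  const verdict = answer.trim().toLowerCase();
  if (verdict === "supports" || verdict === "contradicts") return verdict;
  // Anything else (including a hallucinated essay) gets treated as "unclear".
  return "unclear";
}
```

The first call can only hurt retrieval if it goes wrong; the second is where a bad answer is actually visible to the user.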
Searching for keywords shouldn't be likely to hallucinate. And I would assume they have a subsequent step that quickly checks the keywords are really in the text. And if there is some issue, I suppose we can always fall back to something like TF-IDF.
The second part does seem more problematic, but still, as essentially a yes/no question, it should be significantly less likely to hallucinate/confabulate than for other tasks.
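That sanity check is cheap to bolt on; something along these lines (just my guess at an approach, not anything from the project):

```typescript
// Keep only keywords that literally appear in the selected text.
function verifyKeywords(keywords: string[], selectedText: string): string[] {
  const haystack = selectedText.toLowerCase();
  return keywords.filter((k) => haystack.includes(k.toLowerCase()));
}

// Crude term-frequency fallback (real TF-IDF also needs document frequencies from a corpus).
function fallbackKeywords(selectedText: string, topK = 5): string[] {
  const counts = new Map<string, number>();
  for (const word of selectedText.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([word]) => word);
}
```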
Also, if you look at their "wrong" example more closely, it is a bit misleading, as both sources are correct: Joe Biden was 29 on election day but 30 when he was sworn in. Understanding this requires more context than the LLM was apparently provided.
Honest question: do you expect there to be hallucinations in this case? From my extensive experience with LLMs, and from talking to peers with similar experience, it is uncontroversial to say that given grounding like this, LLMs won't hallucinate.
I am not sure whether people who say this just don't have experience building with LLMs, or whether they do and that experience would make for a very popular and interesting research paper.
"it is uncontroversial to say that given grounding like this LLMs won't hallucinate"
I disagree. LLMs that have been "grounded" in text that has been injected into their context (RAG style) can absolutely still hallucinate.
They are less likely to, but that's not the same as saying they "won't hallucinate" at all.
I've spotted this myself. It's not uncommon, for example, for Google Gemini to include a citation link which, when followed, doesn't support the "facts" it reported.
Furthermore, if you think about how most RAG implementations work you'll spot plenty of potential for hallucination. What if the text that was pulled into the context was part of a longer paragraph that started "The following is a common misconception:" - but that prefix was omitted from the extract?
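A toy example of how that framing gets lost, assuming a retriever that works on sentence-level fragments:

```typescript
// Toy illustration: a fragment-level chunk loses the framing that precedes the colon.
const source =
  "The following is a common misconception: " +
  "Napoleon was unusually short for his time.";

// A retriever matching a query like "Napoleon height" against fragments might
// surface only the clause after the colon.
const chunk = source.split(": ")[1];

console.log(chunk);
// "Napoleon was unusually short for his time." -> handed to the LLM as if it
// were an asserted fact; the disclaiming prefix is silently dropped.
```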
Well wait a sec, you’d be just as guilty of confident bullshit if your claims above don’t pan out, and they didn’t even come from an LLM so it’s worse.
You can read more about it at https://meta.wikimedia.org/wiki/Future_Audiences/Experiment:...
Disclaimer: I worked on this.