Show HN: DocAsker – Use LLMs to ask documentation questions (docasker.com)
97 points by Ankly on Feb 1, 2023 | hide | past | favorite | 24 comments
We've built this over the last few weeks, leveraging vector search and LLMs (it's backed by GPT-3.5, though we're also testing Flan-T5), to answer questions over large sets of documents with references. Currently, we've ingested the documentation for React and some key adjacent libraries (Redux, React-Redux, React-Router, MUI). This lets you ask various natural language questions, and the output is hopefully a relevant answer with code examples where applicable, sourcing the original docs whenever possible.

We're working on adding more documentation sets and supporting more "general" questions (e.g., querying your own Notion documentation). Any feedback is appreciated at this stage; let us know what you think and whether there are any libs you'd like to see added!
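In spirit, the loop is "embed the question, find the nearest doc chunks, answer from those chunks with citations". A toy sketch of that (made-up chunks and a stand-in for the embedding model, not our actual pipeline):

```python
import math

def cosine(a, b):
    # Similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_vec, index, k=2):
    """index: list of (chunk_text, vector). Returns the k nearest chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    # The retrieved chunks become the grounding context for the LLM call.
    context = "\n---\n".join(chunks)
    return (
        "Answer the question using ONLY the documentation excerpts below, "
        "and cite the excerpt you used.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In the real thing, `cosine` runs over vectors from an embedding model and the resulting prompt is sent to the completion API.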




Hi HN! I'm the other person working on this with Ankly.

This started as a pet project, as we were eager to put some LLM work into production. We're currently piggy-backing off APIs, though the results with certain self-hosted models could be worse (or could be better); as he mentioned, we've run some experiments with Flan-T5.

We have a bunch of plans for where to keep building on this, beyond improving the model/engine/prompting/etc. We're very keen to integrate more libs that we use (FastAPI and Pandas come to mind, in terms of sprawling docs), and to add a "Search through StackOverflow answers" button, though we're still testing how well our similarity look-up works on that front.

On the non-code/technical side, everything we've tried has been pretty encouraging, but each area has its own challenges. For documentation questions, we're trying to "ground" the model's knowledge -- it probably knows a lot about React, but it can't quite reference the exact bit of the documentation, or it will use something outdated. So we're trying to re-center the original model's knowledge and improve the way it serves it.

When we test the approach on in-house documentation (such as Notion), the problem is a bit different: in a lot of cases, _all_ the relevant information should be in the context, and we very much don't want the model to rely on whatever latent knowledge it has for a question like "What was agreed with Joe about the framework swap?". We're not seeing much of it anymore, but even with synthetic data we had some interesting situations where the relevant context wasn't found and some safeguards failed, so an entirely made-up Joe was cited as having approved a swap to Angular.

Happy to answer questions and discuss more about this -- LLM as the "logic" layer of document retrieval is definitely fascinating.


I've noticed when prompting ChatGPT for code that it occasionally invents libraries that don't exist at all, or adds its own input if something about a prompt is strongly connected to its latent knowledge. For example, I asked it to write a program that would select one of my favourite dinner ideas at random, providing a list of options in my prompt. It added 4 more recipes I'd never heard of and some playful commentary about Ruby in the comments (and a working program).

As I'm a complete novice when it comes to LLMs, I'm really curious how you go about building these safeguards/constraints around knowledge you want the model to prioritize?

Is it simply a matter of fine-tuning the model with explicit instructions on how to handle certain topics? Can you simply train it with assertions like "Ignore everything you know about comparing the Angular and React javascript frameworks. Read this document we wrote to compare them instead."?


It's kind of like that. You always have the option of fine-tuning, although that quickly gets pricey if you aren't self-hosting (e.g. OpenAI bills an order of magnitude higher for serving fine-tuned models).

The constraints can be put in place through a bunch of different things. Prompt engineering is a big one: instruction-tuned models can be pretty good at following very restrictive instructions. You do sacrifice some creativity in your answers by adding a lot of restrictions, but it generally works quite well as a safeguard layer; a lot of the cool LLM applications are, first and foremost, proper prompting. Setting a low temperature is also key, as the higher-likelihood outputs are _generally_ (but not always) less made-up. ChatGPT makes this harder: you have no control over the model parameters (the temperature is OpenAI-set) and you cannot control the original prompt, so you can't fully be in charge of the instructions it gets, and any mitigation against hallucinations will have its limits.

After that, yeah, the context documents you provide are pretty important in grounding it. It ties back into the prompt, but you can more or less drill into a low-temperature, instruction-fine-tuned model that if it can't find the answer within the set of documents you provide, it should simply not answer. Again, you lose out in some contexts (it's a bad feeling on the user's end to not get an answer), but you also ensure that your model isn't freewheeling live about a new framework called Reagular...
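As a toy sketch, that guardrail layer boils down to something like this (the wording, sentinel string, and parameter values here are illustrative, not any product's actual prompt):

```python
REFUSAL = "NOT_FOUND"

def grounded_prompt(question, context_chunks):
    """Build a prompt that forbids answering from latent knowledge."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using ONLY the context below. "
        f"If the context does not contain the answer, reply exactly '{REFUSAL}'.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# A low temperature keeps the model on high-likelihood text, which is
# generally (but not always) less made-up.
request_params = {"temperature": 0.1, "max_tokens": 256}
```

The application then checks the completion for the sentinel and shows a "couldn't find this in the docs" message instead of a hallucinated answer.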


https://platform.openai.com/docs/guides/fine-tuning

You create a series of prompts and their responses, and the tuned model is then used with that knowledge implicitly stored in it.

For example, there's a notebook series for "let's train GPT on information about the Olympics" (https://github.com/openai/openai-cookbook/blob/main/examples... and https://github.com/openai/openai-cookbook/blob/main/examples... and https://github.com/openai/openai-cookbook/blob/main/examples... ).
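The training data itself is just JSONL of prompt/completion pairs, one JSON object per line. Something like this (made-up pairs; the real notebooks build them from an Olympics dataset):

```python
import json

# Hypothetical training examples in the prompt/completion shape the
# fine-tuning endpoint ingests.
pairs = [
    {"prompt": "Q: Which city hosted the 2020 Summer Olympics?\nA:",
     "completion": " Tokyo."},
    {"prompt": "Q: Which city hosted the 2016 Summer Olympics?\nA:",
     "completion": " Rio de Janeiro."},
]

# One JSON object per line makes up the JSONL file you upload.
jsonl_lines = [json.dumps(pair) for pair in pairs]
```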

The gotcha is that while regular Davinci is $0.02/1k tokens, training is $0.03/1k tokens and usage is $0.12/1k tokens.

The other thing to consider is that ChatGPT keeps a session and a history for that session. You can use GPT statelessly instead, which avoids the "it gets confused about what you were talking about before" problem:

    curl https://api.openai.com/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
      "model": "text-davinci-003",
      "prompt": "Write a recipe based on these ingredients and instructions:\n\nFrito Pie\n\nIngredients:\nFritos\nChili\nShredded cheddar cheese\nSweet white or red onions, diced small\nSour cream\n\nInstructions:",
      "temperature": 0.3,
      "max_tokens": 120,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }'
And thus you're asking it about one and only one thing, with no additional chat context around it.


How do you decide where to break up the chunks for embedding? On mine I'm currently just doing something like X words per chunk. It seems like ideally I could parse out all the source code and avoid breaking up functions, but I'm not sure how to do that for arbitrary languages.


I've experimented with a few approaches and, to be honest, kind of gone with what "felt best", as we're quite artisanal with our testing approach at the moment.

We try to always cut at logical breakpoints (e.g. never in the middle of a sentence or explanation). Some docs are cut into smaller chunks because the way they're written works well for segmentation, and smaller chunks have the advantage of allowing more of them to be looked up, so your semantic search is allowed to miss as long as it finds 1-2 relevant context elements. For others, we felt that cutting into chunks lost too much information, so we've added them as quite large chunks. That feels suboptimal in some ways, especially for performance and modularity, but we've also found that the model is very good at parsing a ±2k-token sample and pulling the right info from it in most cases.

Ultimately there's no right answer and it's a case-by-case tradeoff.
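As a rough sketch of the "logical breakpoints" idea: split on paragraph boundaries, then pack paragraphs into chunks under a size budget so nothing is cut mid-thought. (Character counts stand in for token counts here; this is an illustration, not our actual pipeline.)

```python
def chunk_paragraphs(text, max_chars=500):
    """Pack whole paragraphs into chunks no larger than max_chars
    (a single oversized paragraph still becomes its own chunk)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # +2 accounts for the "\n\n" separator re-inserted between paragraphs.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

For code-heavy docs you'd want an extra rule treating fenced code blocks as unsplittable units, which is roughly the arbitrary-languages problem the parent comment raises.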


I just started on a project with about 5000 pages worth of government supplied documentation in PDF form. I wish I could just throw your tool at it.


Parsing PDFs (and PowerPoints) and breaking them into "askable" chunks is definitely something we've been looking into and are keen to roll out. If you'd like to talk more about your use case, feel free to chuck us an email at the "reach out" address on the page!


Since you're working with raw text, it shouldn't take too much effort. There are a bunch of open-source tools to extract text from PDFs.

The hard part is parsing tables and other layout-dependent semantics. You usually start with text coordinates (like HTML elements with absolute positioning) and have to work backwards from there. I worked for some years on a project for a client that was full of edge cases: whenever the input PDF (from a government agency) had a slight layout change, the parser would break. It took multiple iterations to make it robust enough.
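A toy version of that "work backwards from coordinates" step: group extracted text spans into rows by their y position, then sort each row left to right by x. (Real PDFs need fuzzier baseline matching than the fixed bucketing used here.)

```python
def rebuild_lines(spans, row_tolerance=2.0):
    """spans: list of (x, y, text) tuples, with y increasing downward.
    Returns text lines in reading order."""
    rows = {}
    for x, y, text in spans:
        key = round(y / row_tolerance)   # bucket nearby baselines into one row
        rows.setdefault(key, []).append((x, text))
    lines = []
    for key in sorted(rows):             # top-to-bottom
        lines.append(" ".join(t for _, t in sorted(rows[key])))  # left-to-right
    return lines
```

Tables make this much worse, since cell boundaries and column alignment also have to be inferred from coordinates alone.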


Don't want to jump on your thread, but at AnyQuestions.ai we specialise in quality PDF and transcript processing for AI-answer purposes (supporting AI answers with citations, just like your tool does for documentation). This comes from three years of working on tech to parse lecture slides correctly, identifying semantic areas (e.g. what is a title, how bullet points are connected...), which turns out to be useful for semantic search and other embedding purposes. You can verify this somewhat by viewing how transcripts get grouped if you upload a YouTube video, or if you search for PDF results (bullet points will be resolved to what they refer to, etc., as appropriate).

Would love to chat with you if you're up for it; you can test-run our demo and contact us through the interface.


Former attorney here. I dreamt about having access to a tool like this for complex contracts and would’ve paid through the nose for it. I suspect I’m not the only one. Exciting times.


This is something I've thought about a lot, as I worked in legaltech for a few years. The main issue, however, would be the (lack of) networking and breaking into the legal market; it takes a lot to get the ball rolling even if the tech is good, and I feel more established actors are likely to do it first.


Would be awesome to have something like this that could ingest a whole GitHub repo, including issues and PRs, to ask when things got decided or why they are a certain way.


I think this is supposed to do that https://www.gptduck.com/


This makes me think of Confluence search. Maybe you could ingest Confluence docs and make this a way for admins and users to interact with corporate docs.


We're working on something similar using the Notion API; so far synthetic tests are really encouraging, but we haven't yet tested it on a massive/sprawling corporate doc set. Very excited about the prospect though!


I tried a couple of queries and it works really well! Great job.


Thanks so much for checking us out!


This is a great idea. I've already toyed with copy-pasting pages of documentation into ChatGPT with varying degrees of success.


For sure, there are a lot of applications for this kind of platform. We're very happy with the initial feedback we've received, definitely validates our concept. Thanks for checking us out!


When I ask a question, it flashes "we are currently thinking of an answer for you", then abruptly cancels and goes back to the initial state with no answer given. Is some error going on in the backend? I can't see anything in the console.


Sorry about that, I'll dig into the logs -- we had a lot more traffic than we expected overnight so it could be that the backend didn't scale quickly enough and ended up timing out some requests.


Very nice, and congratulations! Is botco.ai a competitor to DocAsker?


I'm not too familiar with botco.ai, but from what I can see, we're tackling fairly different problems. DocAsker, once deployed on internal documentation (for example), would allow you to query it for things like "What was agreed upon in the meeting about X?" or "What's the deployment procedure for the forecast tool?".

We haven't thought about marketing use-cases like botco seems to be focused on, and we're probably a bit too tight resource-wise to target this usage as of right now.



