Show HN: Document Q&A with GPT: web, .pdf, .docx, etc. (klavier.ai)
86 points by dsubburam on March 28, 2023 | 76 comments
Hello fellow hackers, we made a site that gets GPT to answer your questions using the info on a webpage you specify or a document you upload (e.g., a large textbook .pdf file).

Background: When ChatGPT came out, I had the idea of having it pull answers from my stereo receiver's annoyingly dense 32-page manual. My weekend project prototype proceeded to surprise me with great answers—just like what we've all experienced by now. My co-founder thought we should productize it and make it easy to use online. So here we are with a very early beta! (Try it on a HN thread...)




Interesting ... It responded to a question with "I don't know".

Here's the full text in reverse-chronological order:

------

> By what measure is it popular? >> I don't know.

> Is neverthrow a popular npm library? >> Yes, neverthrow is a popular npm library. It is a fully-tested and type-checked npm package that provides type-safe error handling in TypeScript. It is designed to help users encode failability into the typesystem and provide a friendly API that lets them deal with failable values in a safe way.

> How do I install the neverthrow package? >> To install the neverthrow package, you can use the npm package manager. In your terminal, run the command `npm install neverthrow` and the package will be installed.

> What is typescript? >> TypeScript is a typed superset of JavaScript that compiles to plain JavaScript. It offers classes, modules, and interfaces to help you build robust components. TypeScript is designed for development of large applications and transcompiles to JavaScript.

> What is this blog post about? >> This blog post is about type-safe error handling in TypeScript, and how to encode failability into the typesystem using the Result type. It explains how to use the neverthrow package to create a Result type, and how to use the map and mapErr methods to handle success and failure cases.

> Enter a URL to read: >> https://gdelgado.ca/type-safe-error-handling-in-typescript


Thanks for the feedback. It, unlike ChatGPT, doesn't retain question history--so it didn't know what "it" referred to in your final question.

Given your comment we are going to consider retaining question history (or offer an option to do so)!
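
Roughly, retaining history could be as simple as prepending recent turns to each prompt. A minimal sketch (illustrative names only, not the actual implementation):

```python
# Illustrative sketch of retaining question history: keep recent Q&A turns
# and prepend them to the prompt so follow-up questions like "By what
# measure is it popular?" have something to resolve "it" against.
# Function and parameter names here are hypothetical.

def build_prompt(context: str, history: list, question: str,
                 max_turns: int = 3) -> str:
    """Assemble a prompt from document context, recent turns, and the new question."""
    turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in history[-max_turns:])
    return (
        f"Answer using only this document:\n{context}\n\n"
        f"Previous exchanges:\n{turns}\n\n"
        f"Q: {question}\nA:"
    )
```

The trade-off is that every retained turn eats into the token budget available for document context, which is why it might ship as an option rather than a default.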


Thanks, I uploaded the terms and conditions of some insurance I'm considering and am able to ask it questions without having read it all myself. (Obviously I'm double-checking for accuracy in case of hallucination, but I suspect it won't be long before I have the confidence not to do so.)

The crazy thing is I fell asleep a few hours ago wondering how to feed the insurance terms and conditions to ChatGPT so that I could query them, and then I woke up and saw this.

All this AI progress opens up so many possibilities that it's almost anxiety-inducing


Appreciate you writing in, and glad you're finding it useful. Would it help if the service showed where in the document the answer was fetched from? (We could work on adding it.)

(And yes, I feel the anxiety too--to keep up with what people are doing with the tech!)


Yes, that would be ace.


Very cool. I gave it https://www.gutenberg.org/cache/epub/3300/pg3300-images.html and it was able to pick up on some niche references in my questions. The text is quite long but it ingested it pretty quickly, which surprised me. Nice work. Does it cost you a lot to provide a service like this?


Yes, the limit is ~2,000 pages; so have at it! As for costs, it's manageable so far but yes, we'd need to figure out a business model. We will likely keep a basic version such as the current one free.


How are you stuffing long text into GPT context? Are you doing some form of summarization?


2000 pages is great. I could ask questions of a book or three. That would be very useful; I'd be willing to pay for that.

What about adding my ebook library and asking questions against it? Would there be some way to do that?


Neat. How did you land on the 2000 figure?


Generous enough for most use cases, without hogging our compute and storage resources (currently not at scale).


We built something similar but used a "studio" UI because we also can link up with Slack, an embeddable widget, etc.

We are using a scraper for HTML and parsing docs in Node.js.

The tough part for us is normalizing all of the data and working with various file types and encodings.

You can try it out for free here.

https://chatterdocs.ai/studio


Do you have any tips on how to effectively parse website content? I tested it on one of my websites and it was able to answer questions based on content that was located in separate div/p containers. Do you divide content into different sections and use embeddings to find the relevant text, or do you feed the entire page content into the API?


BeautifulSoup seems to work well for parsing. For your other question: something like that!
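
A minimal sketch of that kind of extraction (the real pipeline presumably does more cleanup):

```python
from bs4 import BeautifulSoup

def page_text(html: str) -> str:
    """Extract readable text from an HTML page, dropping non-content blocks."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # remove scripts and styles entirely
    # separator/strip collapse the markup into clean whitespace-delimited text
    return soup.get_text(separator=" ", strip=True)
```

Text from separate div/p containers comes out as one whitespace-joined string, which is then ready to be chunked and embedded.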


Very cool. I uploaded a project kick-off doc and it did a fantastic job of summarizing the key points. I'm really stoked about the ChatGPT-meets-all-your-docs tooling that will come to help employees onboard quicker without having to talk to another human!


Bing can do this (with GPT-4) but the problem is it has an unacknowledged limit to the amount it's able to read, and so it mishandles summarizing a large document, seeming to read the start and a few random pages. How does yours handle large sources?

Edit: I tried this one with an 800kb .txt and after digesting it, it got two answers wrong (but at least related to the text) and then started spitting out "I don't know". I asked "what is this document?" because I saw with my previous test that it can get blocked and be working with a 404 page ("this document is a page that says suspicious activity request denied") but this time it just said "I don't know."


Re: your Edit, it's possible that your questions were follow-up questions, which are difficult to make sense of on their own--the service at the moment starts from scratch for each question (it has no memory of previous questions and answers).

We'll look into adding memory (either as a default or as an option).


I am on the waitlist for Bing and can't check directly--would it also answer specific questions about the doc? (Rather than summarize.)

Our site is meant for Q&A, and has a layer of tech that first finds the sections in the large document that are relevant to the question. This will not work well in general for summarization of unstructured content. But most content tends to be structured, and in practice we are finding that the approach still works on, e.g., news articles, Wikipedia articles, and blog posts. It's almost as if where it doesn't work, a human would have trouble too (e.g., on a long rambling HN thread).
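
A toy illustration of that retrieve-relevant-sections-first layer, using word-overlap cosine similarity as a stand-in for a real embedding model (chunk sizes and scoring here are illustrative only):

```python
import math
from collections import Counter

def chunk(text: str, size: int = 500, overlap: int = 100) -> list:
    """Split text into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def vectorize(s: str) -> Counter:
    """Bag-of-words vector; a real system would use dense embeddings."""
    return Counter(s.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def top_chunks(question: str, chunks: list, k: int = 3) -> list:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]
```

In a real pipeline the Counter vectors would be dense embeddings (e.g. from an embeddings API), but the shape of the layer is the same: chunk the document, score each chunk against the question, and send only the top chunks to GPT.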


There is no waitlist anymore, and this functionality is delivered through the Bing sidebar.

Internally, it has the same privileges as a third party extension with webpage content access, so it cannot access the pdf viewer contents. I believe Edge is getting a new PDF viewer in Canary that might solve this. For now, you can select text 2000 characters at a time and send to chat or give it a URL (assuming that Bing can see it in the search index).

Bing chat is already good at handling recursive queries (with internet access) and processing poorly formatted PDFs from the indexer webcache, so I assume it would do well given the right conditions. It does Q&A really well on GitHub repositories, for example.


>would it also answer specific questions about the doc?

Yes.


> the problem is it has an unstated limit to the amount it’s able to read

Not sure which version Bing uses (I’d guess the smaller), but GPT-4 has either an 8K token or 32K token context (prompt + response) space, some of which is taken up by the hidden system prompt, but that limit is also there using the model through the API.


The 'creative tone' mode of Bing was upgraded to 'large context size', presumably 32K. By unstated (I've changed my post now) I meant that it doesn't report back "that's too long", it just says "sure here's a summary", which is not representative of the document. I saw another "pdf reading" service (not OP one) that suffered this too.


Yeah, GPT itself will do this; you could probably do a separate token count in the app wrapping it to figure out if it was likely to have occurred (or limit the input to a size sufficiently less than the limit to assure that a summary wouldn’t exceed the limit.)
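
A sketch of that pre-check, using the rough ~4-characters-per-token heuristic for English text (an exact count would use a tokenizer such as OpenAI's tiktoken; the limits below are assumptions for the 8K-context model):

```python
# Rough guard against exceeding the model's context window before calling
# the API. The ~4 chars/token ratio is an approximation for English prose;
# all the budget numbers here are illustrative assumptions.

MAX_CONTEXT_TOKENS = 8192      # e.g. the GPT-4 8K variant
RESERVED_FOR_RESPONSE = 1024   # leave room for the summary itself

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits_in_context(document: str, prompt_overhead: int = 200) -> bool:
    """True if the document plausibly fits alongside prompt and response."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_RESPONSE - prompt_overhead
    return estimate_tokens(document) <= budget
```

If `fits_in_context` returns False, the wrapper can report "that's too long" instead of silently summarizing a truncated document.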


I mean, if you wanted to add no value to GPT, yes. The value of any 'read a pdf' app should be that it traverses the full document as needed for a query, copying into its internal context what may be relevant, re-iterating until the true answer is found regardless of whether it's scattered through the large document. Bing and the OP app both have, as the author says, a "layer" that should do that, but currently:

- Bing isn't using the layer right - it can search the whole internet to orchestrate context for itself but doesn't seem set up to search one doc that way

- OP app seems to crash after a while

- the third service, whose name I forget, which has been shown on here (something to do with 'pdf'), similarly to Bing doesn't 'search around' in long documents appropriately


The crash was unrelated to the tech; it was a provider billing issue. Fixed. We do traverse the full document, all 2,000+ pages that we support. Give it a go!


OP here. We now have a Chrome extension[1] to post whichever webpage you're on to klavier.ai for subsequent Q&A. Avoids the hassle of having to copy and paste the URL.

We are slowly working through the issues reported in this thread. Thanks for the kind and constructive feedback!

[1] https://chrome.google.com/webstore/detail/qa-with-klavier/jb...


Kagi has something similar for search results. You can pick a result and ask questions about it. Although, having played with both for a bit, the results from Klavier seem superior.


Any concrete examples?

For reference, Kagi's tool is called "Ask Questions about Document" [1]

[1] https://blog.kagi.com/kagi-ai-search#ask


No, totally anecdotal and unrepresentative. I played with both for a while and liked the Klavier chats more, so take my opinion with a huge pinch of salt.


Seems quite similar to https://github.com/whitead/paper-qa with a few more document types added


Really amazing!

First question: are there other tools that do this with ChatGPT? carrolldunham says that Bing can do it. But still?

Second question: what are your plans for the future? Could we integrate your tool into a website, with a pre-made list of documents and web pages, and our members, or customers, can use the tool to get answers? And if I look further, could we imagine a similar tool to answer questions received by email?

Third question: the tool works with French documents, do you intend to translate the interface?

Thank you!


1: We are finding out. Someone else mentioned: https://github.com/whitead/paper-qa We're hoping to keep our service accessible and easy to use, and add features, such as from your other questions...

2: We are thinking of the website integration. Do you think OpenAI may release this too? Questions received by email is a new idea that sounds interesting!

3: Thanks for the suggestion – we will look into it.


Maybe OpenAI will produce such a product, but seeing the amount of tools like yours that are coming out every day, I have more of a feeling that they will make their money from the tokens used. My 2 cents.

For the past few months I have been looking for a way to free up the staff time spent answering the same questions over and over, because our members and customers don't read our documents and website. There is no added value in having employees answer questions like these.

Your tool has huge added value: an instant response for the member or client, and staff time I can reinvest in project development.

Fourth question: do you plan to be able to process an entire website, for example with its sitemap? Or is it page by page?

Thank you!


4: We would process the entire website, and be able to cite the page/section from where the answer came from, so the customer can look up the original documentation to confirm.

And given your explanation of your usecase, this feature looks more compelling to build out. Would you consider messaging me (email in my profile)? We'd love to chat, and maybe roll out a solution for you as a pilot customer.
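
For reference, enumerating a site's pages from its sitemap is the natural starting point; a minimal sketch of pulling the URL list from a fetched sitemap.xml (stdlib only; fetching and chunking are left out):

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace (assumed; most sitemaps use it).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text: str) -> list:
    """Collect the page URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]
```

Each URL would then be fetched, parsed, and chunked like any single page, with the source URL kept alongside each chunk so answers can cite the page they came from.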


Will do tomorrow! Thank you!


Awesome! Love it. How does it compare to chatpdf.com ?


Thanks. Just hearing about chatpdf.com. Trying them out. Thanks for the tip!


Can we point it to a product documentation website -- with multiple child pages and sections -- and ask questions across that corpus?

I am actually interested in creating such a chat dedicated to one product as an example / hobby project -- any pointers appreciated on where I can start, and what already-implemented open-source solutions I can leverage for quick results. Thanks!


I built this repo to do this for your own website content, that should get you a good starting point:

https://github.com/mpaepper/content-chatbot


Hi. I'm one of the people on this project with dsubburam (who started this thread). Thx for the question. Can you email one of us about this (emails in bios)? We're building the feature you are asking for here.


I plugged in https://www.wikiwand.com/en/Enzyme_inhibitor. Then asked it if computers can be used to identify whether a compound is an enzyme inhibitor. What a surprise.


This looks promising. For me, it successfully scraped a webpage that Bing Chat would not. A couple comments: (a) preserve newlines in the output (I'm trying to get a CSV output), (b) the output length appears to be truncated - too short for my application.


This is fantastic and a great application of GPT. I uploaded the AWS Lambda developer guide in PDF format (1300ish pages, sorry!) and it's working wonderfully. I hope you keep your website around and expand on it!


Thanks for the positivity! We're going to continue to try to make it better.


React/SPA websites aren't read properly.

So I uploaded a dessert recipe and noticed there is no formatting to the answer text. It is one big paragraph.

Other than that, the idea is great, but I got more "I don't know" than anything.


Thanks for the feedback, Joe. We'll look into what's wrong with parsing the React/SPA websites. And maybe do some post-processing for readability. Improved model is in the works too--so hope you give us a try again in a week or so.


I also experienced problems asking it to parse a pdf from researchgate.net, but parsing the same pdf from arxiv.org worked fine. I think it would be a good idea to provide some kind of feedback to the user about what exactly was parsed and used as a prompt. (I was ready to discard the whole thing as stupid after the first couple of "I don't know" answers.)


Noted!


By the way, after entering the URL that didn't work, I asked klavier a few questions, and then I asked "what is the current prompt?"

I didn't save the exact answer, but it was along the lines of: "The prompt provided only says: document not loaded, please enable JavaScript and cookies."


Do you call OpenAI's API or do you have a GPT model of your own?


Awesome product!

Is this a company? Or just a cool thing you hacked on the weekend? I ask it because there are tons of companies trying to do the same. How are you planning to monetize or standout?


I imagine at least 10 different teams are all building this. It's a no brainer and pretty easy to implement. I implemented the same thing in Elixir/Phoenix.


Many companies are trying to do the same? This is a fairly straightforward project with many online tutorials available. I have built a small app for my personal use in Python.


This is a great idea! Do you plan to productize a self-hosted binary, possibly allowing other, locally-hosted LLMs to be used in place of OpenAI APIs?

There's definitely an enterprise use-case for this.


Have you evaluated the quality of the responses? In my experience retrieval augmentation is hit and miss, it might collate unrelated information from different passages.


Not exhaustively, but anecdotally, it's good. We find that few questions end up having to retrieve relevant information from multiple document locations and combine it. Maybe a good test here is giving it a HN thread and asking it to summarize the positive and negative reactions from commenters?


> Sorry, our system encountered an issue. Please try again.


Back up. Try again. Sorry!


Cool.

My query was: How many red numbers are shown in this document?

It said: "I don't know."

...I'm going to keep playing with it =]


Looking into it...


This is fantastic. In many ways I like it more than the traditional ChatGPT. How are you able to get it to ingest such long documents?


This is very good. I gave it this HN discussion and asked it to list all links posted so far with description of each and it did.


Great. I uploaded a paper I was reading. You can easily summarise it and ask other questions.


Can you give some details about the concept of transforming a pdf into a GPT input?


I think this application of gpt is way more useful than the chat interface. (Add an option for users to pay sooner). Here are a few suggestions:

1) Allow it to take a search term, do a web search and allow the user to select from those results.

2) Allow it to look at more than one document.

3) Detect if the output contains math formulas/graphs and render them. (or allow me to write a javascript post processor so I can add that logic myself)

4) When a user question can't be answered, prompt the user to allow your system to web search and then include those documents.

5) Create a version that can be run locally for those of us with private data. You should charge a lot for that version (~$100+k if the customer provides the hardware, and $1m+ if you have to provide the hardware (blackbox)).

6) Detect research papers and read the citations. You may have to ask the user for a SSO key to get the citations from paywalled sites.

7) Abstract responses need to be made more concrete. See if you can train the model to provide an example or describe the purpose or intuition when it responds.


I entered a wrong URL on the site by mistake, and it doesn't work now


May just be generally down. Be patient; this is possibly the first time they're doing any kind of load testing.


DANmode below was right. Now back up!


I chatted with it about a story I am writing. Inspiring!


Whoa. Maybe see your story in Show HN someday, if you'd share!


Which version of GPT are you using?


How can I train a model on a bunch of PDFs and run it locally?


Fine-tuning a model like LLaMA is super easy. You can fine-tune models up to 33B parameters on a single consumer GPU at around 3 epochs per minute (less than 6 hours for 1000 epochs).

You can even fine tune on a free Google Colab GPU.

Start here: https://github.com/lxe/simple-llama-finetuner


Running it fully locally isn't easy because you need a complex setup with GPUs. Can you give some more color on exactly what you are trying to do (number of PDFs, types of queries, etc.)? With some more info we can give you some guidance.


oobabooga's textgen webui has a tab for fine tuning now. You only need a single consumer GPU to fine tune up to 33B parameter models at a rate of about 200 epochs per hour, per GPU.

There are also one-click finetuning projects which run on free Google Colab GPUs like https://github.com/lxe/simple-llama-finetuner

It's easy and not complex at all.


Used it to interrogate LangChain's docs. How very meta.



