Does this take advantage of the new OpenAI functions api? From a quick look, I can't find any indication that it does. Although I find it tricky to disentangle the langchain abstractions, so I might be missing it. Kor's last release predates the announcement of OpenAI functions, so probably not.
Seems like this is now best done via functions, if you're using OpenAI's models? They call out "extracting structured data from text" as a key use case in their announcement.
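For the curious, a function-calling extraction request boils down to attaching a JSON Schema and forcing the model to "call" it. A minimal sketch of the payload shape (the `extract_person` schema and model name are made up for illustration; no actual API call happens here):

```python
import json

# Hypothetical schema for the fields you want extracted.
extraction_schema = {
    "name": "extract_person",
    "description": "Extract structured data about a person from text.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name"],
    },
}

def build_request(text: str) -> dict:
    """Build a chat-completion payload that forces the model to call
    extract_person; the model then replies with JSON matching the schema."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": text}],
        "functions": [extraction_schema],
        "function_call": {"name": "extract_person"},
    }

payload = build_request("Alice is 34 and lives in Berlin.")
print(json.dumps(payload, indent=2))
```

Because the output is constrained by the schema rather than by prompt wording, you skip most of the "please respond only with JSON" fragility.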
FYI, the upcoming version of GPT-4 does considerably worse at emulating function calls / generating code-like strings, but gets better again if you switch to the function call API: https://twitter.com/reissbaker/status/1671361372092010497
(My guess is the same is true of gpt-3.5, although I haven't tested it.)
Are there any useful alternative models though? Most I've found weren't particularly good at following instructions or using tools in the way langchain provides them.
Another tool like this is Marvin. My experience is that these tools work pretty well, but the world of prompt "engineering" is a very squishy one, and getting the exact output format you want is not guaranteed.
Neat, I was just looking for something like this today, I think I'll give it a spin.
Does anybody here have experience with metadata extraction using LLMs? I've been thinking about it recently, and wonder if just making a big prompt and putting that into OpenGPT or even ChatGPT is really the way to go, or if there is a "cleverer" way. Maybe you could train specifically for certain fields, or use the LLM in a different way (like using the embeddings directly to do similarity search)?
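The embeddings idea above amounts to ranking candidate spans by cosine similarity to a query vector. A toy sketch, where the tiny hand-written vectors stand in for real embeddings you'd get from an embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins: the query embeds the field you want ("who wrote this?"),
# candidates embed spans pulled from the document.
query = [1.0, 0.0, 0.0]
candidates = {
    "author": [0.9, 0.1, 0.0],
    "publish_date": [0.0, 1.0, 0.0],
}
best = max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))
print(best)
```

This is much cheaper than a completion call per document, at the cost of only matching fields whose values already appear as spans in the text.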
Another idea was, if you have a lot of similar HTML documents, to not ask the LLM for the metadata, but to ask it for CSS selectors that contain the metadata fields - assuming it can deal with HTML and the data is verbatim in there. Then you should be able to get much more consistent results.
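Once the LLM has returned selectors, applying them is a plain, deterministic step. A sketch of that second half, using stdlib `xml.etree.ElementTree` with its limited XPath as a stand-in (in practice you'd likely use lxml or Beautiful Soup with real CSS selectors; the selectors and page below are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors an LLM might return once for a family of similar pages.
SELECTORS = {
    "title": ".//h1",
    "author": ".//span[@class='author']",
}

def extract(html: str) -> dict:
    """Apply the stored selectors to one page; no LLM call needed."""
    root = ET.fromstring(html)
    return {field: root.find(path).text for field, path in SELECTORS.items()}

page = """<html><body>
  <h1>Kor 0.5 released</h1>
  <span class='author'>Jane Doe</span>
</body></html>"""
print(extract(page))
```

The selectors get generated once and reused across every similar document, which is where the consistency (and cost) win comes from.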
We're using LLMs to generate web scrapers and data processing steps on the fly that adapt to website changes. Using an LLM for every data extraction, as most comparable tools do, is expensive and slow, but using LLMs to generate the scraper code and subsequently adapt it to website modifications is highly efficient.
I gave it some CSS paths extracted from devtools and some sample elements with data that needed extraction, and had it write a Beautiful Soup + regex routine to do the extraction. Worked fine, and also thousands of times faster.
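The routines generated this way tend to be small and boring, which is exactly the point. A sketch of the kind of thing the LLM might hand back (the class name and pattern are illustrative):

```python
import re

# Illustrative one-off routine of the sort an LLM might generate from a few
# sample elements: find every price inside <span class="price"> tags.
PRICE_RE = re.compile(r'<span class="price">\s*\$([\d.,]+)\s*</span>')

def extract_prices(html: str) -> list:
    """Pull price strings out of product markup, no LLM call per page."""
    return PRICE_RE.findall(html)

sample = '<span class="price">$19.99</span><span class="price">$5.00</span>'
print(extract_prices(sample))
```

Once it exists, it runs in microseconds per page, and you only go back to the LLM when the site's markup changes and the routine starts returning nothing.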
https://openai.com/blog/function-calling-and-other-api-updat...