The most frustrating thing about the many, many clones of this exact type of idea is that pretty much all of them require OpenAI.
Stop doing that.
You will have way more users if you make OpenAI (or anything that requires cloud) the 'technically possible but pretty difficult set of hoops to make it happen' option, instead of the other way around.
The best way to make these apps IMO is to make them work entirely locally, with an easy string that's swappable in a .toml file to any huggingface model. Then if you really want OpenAI crap, you can make it happen with some other docker secret or `pass` chain or something with a key, while changing up the config.
The default should be local first, do as much as possible locally, and then, if the user /really/ wants to, have the collated prompt send only a small number of tokens to OpenAI.
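Something along these lines, as a sketch (the file name, keys, and model id are hypothetical; assumes Python 3.11+ for tomllib and the older openai client):

    # config.toml (hypothetical file and keys):
    #
    #   [model]
    #   backend  = "local"                       # or "openai"
    #   hf_model = "tiiuae/falcon-7b-instruct"   # any huggingface model id

    import tomllib                               # stdlib in Python 3.11+
    from transformers import pipeline

    with open("config.toml", "rb") as f:
        cfg = tomllib.load(f)["model"]

    if cfg["backend"] == "local":
        # default path: everything stays on the machine
        generate = lambda prompt: pipeline("text-generation", model=cfg["hf_model"])(prompt)[0]["generated_text"]
    else:
        # opt-in cloud path: the key comes from a docker secret / `pass`,
        # never from the config file itself
        import os, openai
        openai.api_key = os.environ["OPENAI_API_KEY"]
        generate = lambda prompt: openai.ChatCompletion.create(   # pre-1.0 openai client
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )["choices"][0]["message"]["content"]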
It depends heavily on the use case, not org size. I consult for a ~70-person org that needs to process ~1M tokens per day. That costs $30K per day on the OpenAI ChatGPT API. I'm sure this is not an extraordinary case.
Each person in the org needs 1M GPT-4 tokens and semantic search can’t be used to trim queries? I would be super curious to know more about this use case.
The data doesn't scale according to employee size. If they manage to cut the headcount in half, they'd still need to process the same amount of info.
The use case is based on public information on the internet. News articles, PRs, social media posts, etc.
LLMs are used to extract info from text in a structured format. The pipeline used to require several classification and NLP models to do the job, but now a single LLM can do it faster and with better accuracy.
> Maybe one of you startup inclined people can make an openllama startup that charges by request
I'm currently building www.lalamon.us specifically to provide a fully hosted open source model experience. One slight difference is that I'm providing a private chat instance for each user, so charging based on hours of active chat usage seemed to make more sense. Per-request charging seems more unpredictable for users, but I'd be interested in hearing the case either way.
Feel free to reach out with more questions if interested; my email is in my profile.
Doing this. We soft launched yesterday with a paid Falcon-40B playground - 3 models for now: Falcon-40B instruct, uncensored, and base. Adding API and per-token pricing this week.
Vector storage isn’t on the roadmap (what stops using a separate vector store from working well? Could add it to the roadmap but want to understand more first), and we could add fine tuning if it’s a common request.
Lots of people using LLMs to make chat bots from their existing datasets: customer service troubleshooting, FAQs, billing, scheduling. Being able to upload their own pdfs, spreadsheets, docx, crawl their home page, lets the chat bot become personalized to their use case. While you could locally query your own vectordb before prompting, people buy paid service so they won't have to manage any of the technical details.
If people can drag and drop some files from their NAS and you parse them with Apache Tika or similar (https://tika.apache.org/), they can start using personalized, branded bots. It also lets you do things like refusing to answer if the vector database returns nothing and the use case requires a specific answer from the docs only (not the LLM making stuff up).
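A rough sketch of that flow (assumes the tika Python bindings and a sentence-transformers model; the chunking and the similarity threshold are arbitrary placeholders):

    from tika import parser                      # pip install tika (needs a Java runtime)
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # 1. Parse whatever the user dropped in (pdf, docx, xlsx, ...)
    text = parser.from_file("uploaded/manual.pdf")["content"] or ""
    chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
    doc_vecs = model.encode(chunks, convert_to_tensor=True)

    def answer(query, threshold=0.35):
        q_vec = model.encode(query, convert_to_tensor=True)
        scores = util.cos_sim(q_vec, doc_vecs)[0]
        best = scores.argmax().item()
        if scores[best] < threshold:
            # nothing relevant in the docs: refuse instead of letting the LLM make stuff up
            return "Sorry, I can only answer from the uploaded documents."
        return chunks[best]   # would normally be fed to the LLM as context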
FastChat-T5 can work for such a use case and it runs on (beefy) CPUs. With a $700/month instance, it can do 4 conversations simultaneously, without needing GPUs.
The instant a company has sensitive data, this becomes very viable.
What do you (or anyone else, feel free to chime in) do with other LLMs that makes them useable for anything that is not strictly tinkering?
Here is my premise: We are past the wonder stage. I want to actually get stuff done efficiently. From what I have tested so far, the only model that allows me to do that halfway reliably is GPT-4.
Am I incompetent or are we really just wishfully thinking in HN spirit that other LLMs are a lot better at being applied to actual tasks that require a certain level of quality, consistency and reliability?
I still wonder what makes GPT-4 so much better than its contemporaries. That's why I find the tons of people trying to explain how GPT-4 works starting from a simple neural network distasteful: tons of people already know and do that, but none of them is anywhere near GPT-4.
> I still wonder what makes GPT-4 so much better than its contemporaries.
OpenAI have had many years to craft their dataset down from the noisy public datasets, and GPT-4 is (supposedly) a mixture of 8 "expert models", each of which is 220B (5x+ larger than Falcon 40B), for a total of 1.7T parameters (3x+ Google's huge 540B PaLM). The hardware and software to train networks of that scale are also a deep moat. Relatively speaking, the model architecture ("gpt from scratch") is the easiest piece.
From my understanding, GPT-4 is the biggest, or one of the biggest. It was trained on low-quality internet datasets, like the others. What makes it different is post-training on custom data with human supervision; we know they even outsourced that work to Africa. Second, they integrated it with external tools, like a Python interpreter and an internet browser. But the first point is the most important. Also, most likely they have experimented and found some tricks which make it a bit better.
This line of thinking only works if it's impossible to imagine a world where OpenAI isn't the leader. In 2 years if the non OpenAI models are better then it will serve us much better to allow these tools to work with other models as well.
Since OpenAI is all just APIs with simple interfaces, I don't think that plugging a different, capable model in whatever tool you are building is going to be an issue.
You are correct in this assessment. A majority of individuals and startups playing around with turning LLMs into products aim to be prepared for the arrival of the subsequent generation of models. When that occurs, they'll already have a product or company in place and can simply integrate the new models.
Models are getting commoditized, well executed ideas are not.
They're not here to release an actual product. They're here to release part of a CV proving they have "OpenAI" experience. I'm assuming this is the result of OpenAI not actually having any homegrown certification program of their own.
> OpenAI not actually having any homegrown certification program
A bit off topic but where are certifications (e.g. Cisco, Microsoft) useful? I am sure they are useful (both to candidates and companies) because people go to the effort to get these certs, and if they were useless everyone would have stopped long ago. I don't assume people do it for ego satisfaction.
But I've never worked anywhere where it has come up as an interview criterion (nobody has ever pointed it out when we are looking at a resume, for example). Is it a big business thing? Is it just an HR thing?
Years ago, companies could get discounts if they were a “certified gold partner” or whatever.
To be a partner, the company would need a certain number of certifications among their employees, so there was tangible value to companies who either used a ton of Microsoft licensing or Cisco/Dell hardware, or resold those to their own clients (better discount equating to higher margin).
In some cases, getting the higher level certifications like Cisco CCIE was a virtual guarantee of a good job.
I feel like this has become less of a thing in recent years, but I’m not involved in that space anymore.
It is a consulting / business partner thing. Different levels in the business partner programs require minimum number of certified employees in your consulting firm. So if you work in that slice of the industry, certifications matter. Outside of that... not so much.
Mainly when applying for a corporate job where you have 0 referrals. It's a guidepost that you at least have some idea what you're doing and are worth interviewing when people can't find someone who knows you and your previous work.
The only OpenAI 'crap' being used here is to generate the embeddings. Right now, OpenAI has some of the best and cheapest embeddings possible, especially for personal projects.
Once the vectors are created tho, you're completely off the cloud if you so choose.
You can always swap out the embedding generator too, because LangChain abstracts that for your exact gripes.
Everything else is already using huggingface here and can be swapped out for any other model besides GPT2 which supports the prompts.
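For example, the embedding swap might look something like this (a sketch against the pre-1.0 LangChain API; the model name is just illustrative):

    from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings

    use_openai = False  # flip to pay OpenAI for text-embedding-ada-002 instead

    if use_openai:
        embeddings = OpenAIEmbeddings()          # needs OPENAI_API_KEY
    else:
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # the rest of the pipeline (vector store, retriever, chain) is unchanged
    vectors = embeddings.embed_documents(["some chunk of text", "another chunk"])
    query_vector = embeddings.embed_query("what does the doc say about X?")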
> Once the vectors are created tho, you're completely off the cloud if you so choose.
Ehr, no? You'll also need to create an embedding of your query, which makes you totally dependent on OpenAI. If you swap out the embedding algorithm you will have to regenerate all the embeddings as well; they might not even be the same size.
Ah, I see what the top comment was implying. I was a bit short sighted on that side. Yes, you'd be tied to OpenAI for any new queries you need to generate. There could be some ways to offload that (vector of a vector) but it is a cloud dependency. I'd argue not a cost dependency based on how cheap these are.
The only embeddings I currently see listed on https://openai.com/pricing are Ada v2, at $0.1/million tokens.
Even if the alternative is free, how much do you value your time, how long will it take to set up an alternative, and how much use will you get out of it? If you're getting less than a million tokens and it takes half an hour longer to set up, you'd better be a student with literally zero income, because that cost of time matches the UN abject poverty level. This is also why it's never been the year of Linux on the desktop, and why most businesses still don't use LibreOffice and GIMP.
I can't speak for quality; even if I used that API directly, this whole area is changing too fast to really keep up.
If you look at a embeddings leaderboard [1], one of the top competitors called InstructorXL [2] is just a pip install away. It's neck and neck with Ada v2 except for a shorter input length and half the dimensions, with the added benefit that you'll always have the model available.
Most of the other options just work with the transformers library.
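Rough usage of Instructor, from memory (a sketch; check the model card for the exact instruction strings it was trained with):

    # pip install InstructorEmbedding sentence-transformers
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR("hkunlp/instructor-xl")

    # Instructor pairs every text with a task instruction
    doc_vecs = model.encode([
        ["Represent the document for retrieval:", "FastChat-T5 runs on beefy CPUs."],
        ["Represent the document for retrieval:", "Falcon-40B needs serious GPUs."],
    ])
    query_vec = model.encode([
        ["Represent the question for retrieving supporting documents:", "What runs on CPU?"],
    ])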
Setting up an embedding alternative out of huggingface sentence transformers is fairly easy. The magical thing OpenAI does is create embeddings of 8192 tokens at a time, while most other embedding models force you to chunk your documents into 512-token sequences, losing a lot of context and multiplying your query results, search times, etc.
If you've never coded or used Python before, yeah, go with OpenAI. Otherwise, generating embeddings with SentenceBERT takes 5 minutes.
And from my personal experience Ada embeddings are not the best. They are large (which makes approximate searching harder), are distributed weirdly, and simply put, other embeddings give better results for retrieval.
Another advantage is that you are not at OpenAI's whim: they just announced the deprecation of some previous models. What are you going to do when they deprecate Ada v2 and you've built a huge system on top of it? You'll have to regenerate embeddings and hope everything still works just as well.
Yes, exactly this. I also want to say I'm not someone who generally thinks open models are better; I think embeddings just haven't been a focus for OpenAI and it shows. Maybe in the future they will focus on it.
Yes, when new technology comes you may need to upgrade. OpenAI aren't heroes, but they are covering the cost to move people from old to new embedding models.
Of course, that's far from saying that they're the worst, or even headed that way. Just not the best (those would be a couple of fully opensource models, including those of the Instructor family, which we use at my workplace).
I don't like closedAI either, but this seems like the first tech I've played with in a long time that was great on day 1 and gets progressively less great over time.
Neither does the github "System requirements" section, which I find disappointing. Ideally, it should give minimum memory requirements and rudimentary performance benchmark table for a sample data set, across a handful of setups (eg, Intel CPU, Apple M1, AMD CPU, with/without a bunch of common GPUs). With that information I would know whether or not it's worth my time even trying it out on my laptop.
Edit: lol, I went through 2 pages of the issues on the github page and most of them could be avoided by putting this basic information into the system requirements:
... And lots more! Some of these people have 128gb memory and 32 cores, and still find it "very slow". Others having memory pool errors. Some of the answers hand-waving at needing "a more better computer"
I reckon a lot of these issues could be closed and linked to a single ticket for proper hardware requirements in the readme.
#Chaxor:
I fully agree with this. I am not keen on this being a one-horse race, and for privacy reasons would like to deploy these models locally. However, it seems for many programmers it is somewhat easy to build something that queries OpenAI so they can put it on their resume.
Do you know of any FAISS / open source / one-click-install Windows app where I can search my PDFs via vectors? I can see Secondbrain.sh will have the function in the future, but currently it does not.
I have around 500 documents I want to be able to search in.
When I was creating this tool, I made sure to abstract out the reliance on just one LLM or vector db. Instead, I focused on using langchain/huggingface for tokenization, embedding, and conversational modeling. This was done purposely so that it would be simple to replace the OpenAI dependencies with any other LLMs if needed.
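For instance, the LLM side of such an abstraction might be swapped like this (a sketch assuming LangChain's wrappers; the model id is only an example and not necessarily what the tool ships with):

    from langchain.llms import OpenAI, HuggingFacePipeline

    local = True

    if local:
        # any huggingface model that copes with the prompts; flan-t5 is just an example
        llm = HuggingFacePipeline.from_model_id(
            model_id="google/flan-t5-large",
            task="text2text-generation",
        )
    else:
        llm = OpenAI(temperature=0)   # the OpenAI dependency this would replace

    print(llm("Summarize: the readme explains how the embeddings are generated."))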
It’s significantly worse than OpenAI's offerings, and I’m tired of people pretending as though these models are totally interchangeable yet. They are not.
Is this robust enough to feed all your emails and chat logs into it and have convos with it? Will it be able to extract context to figure out questions to recent logs, etc?
How does this run on an Intel Mac? I have a 6-core i9. Haven't been able to get an M series yet, so I'm wondering if it would be more worth it to run it in a cloud computing environment with a GPU.
>Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into clang: error: the clang compiler does not support '-march=native' during pip install.
If so, set your ARCHFLAGS during pip install, e.g.: ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt
Oh, not this again. No, it's not a net negative in ALL forms. And if you really were concerned about downsides, AI has a ton more potential downsides than Web3 ever did, including human extinction, as many of its top proponents have come out and publicly said. Nothing even remotely close to that is the case for Web3 at all:
" many of its top proponents have come out and publicly said."
You don't have to uncritically accept that; it's far more likely that they're just self-aggrandizing in a "wow I'm so smart my inventions can destroy the world, better give me more money and write articles about me to make sure that doesn't happen" kind of way.
I see, so when it comes to the top experts in AI screaming “we made a mistake, please be careful” we should use nuance and actually conclude the opposite — that we should press ahead and they’re wrong.
But with Web3, we should just listen to a bunch of random no-name haters say “there are NO GOOD APPLICATIONS, trust us, period, stop talking about it”, use no nuance or critical thinking of our own, and simply stop building on Web3.
Do you happen to see the extreme double standard here you’re employing, while trying to get people to see things your way?
The crypto group had a lot of time and even more money to make a compelling product that took off and so far they've failed. We've watched fraud after fraud as they've shown themselves to just be ignorant and arrogant ideologues who don't understand how the "modern" finance system came to be, what the average user wants out of financial or social products, or just outright scammers. We can keep sinking money into a bottomless pit or we can move on and learn from their mistakes.
I didn't say to dismiss any concerns out of hand, but the whole idea of "x-risk" or "human extinction" from ai is laughable and isn't taken seriously by most people. Again if you think critically about the whole idea of "human extinction" from any of the technology being talked about you should see it as nonsense.
The AI crowd has been working for multiple decades and only now has made progress that people care about. Also I’m pretty sure the “eye-watering amounts of money” Sam Altman referred to exceed what developers of even the big crypto projects had when they built e.g. Bitcoin or Ethereum. That’s what it took to make AI turn heads. Until AlphaGo you could also have yelled that AI has no real applications.
The personal computer crowd had hobbyists like Wozniak coming to meetups for decades, and computers were the province of nerds; now everyone is addicted to them. That took decades.
You are like a person yelling at the video game industry: “pong and space invaders are a stupid waste of time with ugly graphics!! Don’t play or make video games!!” Until a few decades later we have Halo, Call of Duty etc.
I dunno, on the crypto side stablecoins are pretty compelling for hassle-free cross-border transfers—there’s $125bn in circulation, which to me means it’s taken off.
On the AI side, I mean for example it’s not laughable to think anybody on the planet could just feed a bunch of synthetic biology papers to a model and start designing bioweapons. It’s not hard to get your hands on secondhand lab equipment…
100% private? Hmm. With the amount of paranoia that the folks in power have about local LLMs, I wouldn’t be in the slightest surprised if Windows telemetry ends up reporting back what people are doing with them. And anyone who thinks otherwise is, in my view, just absolutely naive beyond hope.
Is it going to send my personal data to OpenAI? Isn't that a serious problem? Does not sound like a wise thing to do, at least not without redacting all sensitive personal data first. Am I missing something?
A few weeks ago GitHub made a strong statement about code in repos not being viewed by humans, that was very liberating.
If OpenAI could offer similar privacy statements it would immediately be much more useful. E.g. if they simply add a 'private' option, I'd pay double or triple for it.
OpenAI's tools are incredibly good and so easy to use, it's just that I simply cannot use them for most the things I want to do with them because of the privacy considerations, and that sucks.
I suspect OpenAI value the insights they get from looking at the data more than they do the extra revenue they'd receive if they could ensure privacy.
This is my question as well. Is there a more nuanced way to tell how personal data is used other than confirming that an OpenAI key is or is not needed?
This readme is very confusing. It says we're going to use the GPT-2 tokenizer, and use GPT-2 as an embedding model. But looking at the code, it seems to use the default LangChain OpenAIEmbeddings and OpenAI LLM. Aren't those text-embedding-ada-002 and text-davinci-003, respectively?
I don't understand how GPT-2 enters into this at all.
The embedding model used is the default OpenAI API embedding which is text-embedding-ada-002. GPT2 is only used during the tokenization process to efficiently calculate token lengths.
I don't get it, GPT-2 is (one of the few) open models from OpenAI, you can just run it locally, why would you use their API for this?
https://github.com/openai/gpt-2
It's using "from langchain.embeddings import OpenAIEmbeddings" - which is the OpenAI embeddings API, text-embedding-ada-002
The only aspect of GPT-2 this is using is GPT2TokenizerFast.from_pretrained("gpt2") - which it uses as the length function to count tokens for the RecursiveCharacterTextSplitter() langchain utility.
Which doesn't really make sense - why use the GPT-2 tokenizer for that? May as well just count characters or even count words based on .split(), it's not particularly important how the counting works here.
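For reference, this is roughly what the repo is doing, next to the simpler alternative suggested above (a sketch; chunk sizes are arbitrary):

    from transformers import GPT2TokenizerFast
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    # what the repo does: measure chunk length in GPT-2 tokens
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=400, chunk_overlap=40,
        length_function=lambda text: len(tokenizer.encode(text)),
    )

    # the simpler alternative: just count words
    naive_splitter = RecursiveCharacterTextSplitter(
        chunk_size=300, chunk_overlap=30,
        length_function=lambda text: len(text.split()),
    )

    chunks = splitter.split_text(open("notes.txt").read())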
The embedding model used is the default OpenAI API embedding, text-embedding-ada-002. GPT2 is only used during the tokenization process to calculate token lengths efficiently. I have updated my readme to reflect this information correctly.
We have a group at work that meets and discusses various investment topics. The guy organizing it is fairly well connected and every week he tries to get an external speaker to come and present. Very educational.
I have raw notes for each of these presentations. My goal has always been to go through those notes, and properly organize the knowledge in there into a wiki of sorts. It's been 3 years since this all started and I still haven't found the time to do it. If I want to be realistic, I should accept that it'll never happen.
How do I go about finding information that I have in those notes? I could use text search but it's too sensitive to my search string - I'll often fail to find what I need. Also, the information may be scattered across several files, and I'd have to open all the hits and scan to find what I need.
With technology like this, I can put all my notes into some vector DB, and then use AI to ask in plain English what I need. Locally the system interprets my query and finds the most relevant documents in the DB. It then sends my query and those hits to OpenAI to interpret my question, and find the answer amongst my notes. A while ago I used Langchain to set it up and I got it working as a proof of concept. An Aha moment was when I asked it something and it gave me a response with information that was scattered over two different presentations. My challenge is that there are so many parameters I could play with, and I haven't yet thought of a way/metric to assess the performance of my system (any pointers would be appreciated!)
There's nothing personal in these notes, so no privacy concerns. I did want to set a similar thing up with over 20 years of emails, but didn't due to privacy. Also, I use a mail indexer (notmuch) which is fairly good so the need to use AI is not as strong.
But for other (non-personal) notes? If I can get this system working fairly well, it'd be a life saver. I've made so many notes on so many topics over the years, and it's worth real money to me not to have to organize them well. Just let me write my notes, and use an AI to retrieve what I need.
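For anyone curious, the core loop is pretty small with LangChain; something like this sketch (paths, models, and chunk sizes are placeholders, and the LLM step could just as well be a local model instead of OpenAI):

    from langchain.document_loaders import DirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA
    from langchain.llms import OpenAI

    # 1. load and chunk the raw notes
    docs = DirectoryLoader("notes/", glob="**/*.md").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80).split_documents(docs)

    # 2. embed locally and index
    store = FAISS.from_documents(chunks, HuggingFaceEmbeddings())

    # 3. only the question plus the few retrieved chunks go to the LLM
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        retriever=store.as_retriever(search_kwargs={"k": 4}),
    )
    print(qa.run("What did the speaker say about small-cap valuations?"))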
Because that requires retraining the model every time you take new notes. And this way you also still have the raw notes as similarity matches from the vector db, rather than them "disappearing" into the LLM model.
Sometimes I have the data, but I'm not sure where it is.
Sometimes I know where the data is, but there's a lot of it and all I'm looking for is a quick explanation of something.
Sometimes I have a lot of data from a lot of sources, but what I want in the end is a summary based on what most/all of them agree on, or possibly a summary of how they differ.
There's a lot of use-cases here, many of which I think people don't get a "lightbulb moment" about their usefulness until they've dug in and seen what is possible, because we are so used to how we approach these tasks normally.
But the range of uses is quite broad. A project I'm working on for myself is a variation of this, where I've ingested years and years of my own notes and journals, and make queries for the purposes of my own introspection and personal growth. (I think there's a lot of potential in this arena in general.)
Anyone know how Milvus, Quickwit, and Pinecone compare?
I've been thinking about seeing if there's consulting opportunities for local businesses for LLMs, finetuning/vector search, chat bots. Also making tools to make it easier to drag and drop files and get personalized inference. Recently I saw this one pop into my linkedin feed, https://gpt-trainer.com/ . There's been a few others for documents I've found
Nope nope, wouldn't want to compete with that on pricing. Local open source LLMs on a 3090 would also be a cool service, but wouldn't have any scalability.
Are there any other finetuning or vector search context startups you've seen?
Pinecone and Milvus would be alternatives to the FAISS they use for the vector store and search component. I think differences would show up more in what’s used for creating the embeddings (e.g. the ones here https://news.ycombinator.com/item?id=36649579 instead of the OpenAI embeddings API they used) than in the embedding store/search alternatives, where I can’t think of what the difference would be other than performance at large scale, cost, and personal preference / developer experience.
Hadn’t heard of Quickwit but from a quick glance at their site it doesn’t look like a vector store, seems perhaps unrelated.
I’m working for a company that works as a security layer between any sensitive enterprise data and the LLMs. Regardless of the model (HF, ChatGPT, Bard), and regardless of the medium - conversational data, pdf, knowledge bases like Notion etc. It hides the sensitive data, preventing risky use and fact checking at the same time. Happy to make an intro if that’s what you’re looking for! tothepoint.tech
Don't build a personal ChatGPT, and don't let OpenAI, Microsoft and their business partners (and probably the US government) have a bunch of your personal and private information.
Please provide this reference in your readme / blog as it is the original source for your work... and provides the background for the tradeoff between the 2 approaches: 1) fine-tuning vs 2) Search-ask
I respect OpenAI for creating a comprehensive cookbook, and my tooling uses OpenAI for embeddings and chat completion which I have mentioned in the Readme. However, it was not built using a single reference or code example, and rather it is a combination of ideas from huggingface, openAI and langchain documentation.