
Prior to LLaMA 2, I would have agreed with you, but LLaMA 2 is a game changer. The 70B model's performance is probably somewhere between GPT-3.5 and GPT-4. But running it personally isn't cheap: the cheapest I found is about $4/hr to run the whole thing, while I only spend around $3 a month on average on the GPT-3.5 API for my personal stuff.



For what tasks do you consider 70B beyond GPT-3.5 performance? There are some I’m aware of, but they are very much the exception and not the rule, even with the best 70B fine-tunes currently available.


I mainly use 70B for "text QA" on files I consider sensitive, like personal documents. The answers have been very close to what I get from GPT-3.5 (LangChain makes it easy to switch between them). Do you use the quantized version? If so, try running the full one on an A100.
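
For anyone curious, the switch is basically one constructor argument. A minimal sketch against the mid-2023 LangChain API; the local endpoint URL, model name, and file are placeholders, not my actual setup:

    from langchain.chat_models import ChatOpenAI
    from langchain.chains.question_answering import load_qa_chain
    from langchain.document_loaders import TextLoader

    # Hosted: GPT-3.5 through the OpenAI API
    gpt35 = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    # Local: Llama-2-70B behind any OpenAI-compatible server
    # (URL and model name are placeholders)
    llama70b = ChatOpenAI(
        model_name="llama-2-70b-chat",
        openai_api_base="http://localhost:5001/v1",
        openai_api_key="unused-locally",
    )

    # The QA chain is identical either way; just swap the llm argument
    docs = TextLoader("personal_doc.txt").load()  # hypothetical file
    chain = load_qa_chain(llama70b, chain_type="stuff")
    print(chain.run(input_documents=docs, question="What is the renewal date?"))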


I run 70B very cheaply using serverless GPUs. I've had the best experience with Runpod, but there are a few other options out there for it as well.


Out of curiosity and if you are happy to share, what is your 'personal stuff'?


I use it a lot for personal coding projects, grammar correction/sentence rewording, and translation (it works better than Google Translate for longer text). I explicitly call out personal stuff since my job provides an in-house front end that uses the GPT API (I'm actually not sure which version it is, but judging from the response quality, it's probably GPT-4). My work one has made me noticeably more productive. It helps me with a lot of the "boring" work that I tend to procrastinate on, which gets my momentum going and lets me focus on the complex stuff. I'm not sure how much money I use since there is no limit at work, but if I had to guess, it's probably north of $100 a month in credits.


Can you talk about how you integrate the GPT API at work, and why not just use ChatGPT-4?


The server is provided by my employer, so I can't go into implementation details. But overall, most companies provide access to the API endpoint instead of using ChatGPT itself, since OpenAI uses your ChatGPT conversations for training (hence why 3.5 is free). The API endpoint supposedly doesn't use your data for training, which is why I use it for my personal stuff as well.


As a counter-reference: for my work I use it to code (GPT-4), and it has cost between $70 and $200 per month depending on how heavily I use it.


GPT-4 is significantly more expensive, so I can definitely see you spending that amount. For really complex stuff, I switch over to GPT-4, and it can cost me almost $3 a "question" (as in going from the beginning of a problem to solving it); at GPT-4's launch pricing of $0.03/1K prompt and $0.06/1K completion tokens for the 8K model, a long multi-turn session gets there fast. Honestly it's worth it since it solves my problem, but it adds up quickly, so I try to stick with 3.5 when I can.


Can’t you get by with ChatGPT-4 for these personal-assistant-type questions? That’s what I do, and my $20 a month goes a long way. I’d be interested to know if I am missing out on anything by using GPT this way, in contrast to the API.


I actually used to use ChatGPT but switched to the API once I had GPT-4 access. Mainly it’s because I simply didn’t use the $20 worth of GPT-4 at the time: it was extremely slow, and the questions-per-hour limit was annoying and stressful. I would always worry I would need it for something unexpected, so I never used more than 15 questions at a time (though this has probably changed over the past couple of months). In addition, the API has better privacy implications, since its terms for how they handle your data are better. I also like how I can tie GPT in anywhere. For example, I run a Matrix bridge, so I can give access to people like my parents, who are not tech-literate enough to sign up for and get used to the ChatGPT interface; they talk to it as a bot through the WhatsApp bridge.
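
The bot side of that is simple: the relay pattern looks roughly like this (a minimal sketch assuming matrix-nio and the 2023-era openai client, not necessarily my actual stack; homeserver, user, and credentials are all placeholders):

    import asyncio
    import openai
    from nio import AsyncClient, MatrixRoom, RoomMessageText

    openai.api_key = "sk-..."  # placeholder

    async def main():
        client = AsyncClient("https://matrix.example.org", "@gptbot:example.org")
        await client.login("bot-password")  # placeholder credentials

        async def on_message(room: MatrixRoom, event: RoomMessageText):
            if event.sender == client.user_id:
                return  # don't reply to our own messages
            # Blocking call; fine for a sketch. A real bot would also
            # skip the backlog replayed by the initial sync.
            reply = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": event.body}],
            )["choices"][0]["message"]["content"]
            await client.room_send(
                room.room_id,
                message_type="m.room.message",
                content={"msgtype": "m.text", "body": reply},
            )

        client.add_event_callback(on_message, RoomMessageText)
        await client.sync_forever(timeout=30000)

    asyncio.run(main())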


I use it with a tool that is wired into my terminal and changes my files for me [1]. That alone makes me several times more productive compared to copy-pasting back and forth with the chat window. If the chat window makes me twice as productive, the command-line tool probably makes me 5x as productive. At that kind of output, on a developer salary, $70-200 a month is absolute peanuts compared to what you get in return.

1: https://github.com/paul-gauthier/aider


This tool looks splendid. Personally, it evokes memories of MUDding back in the early 90s. What a concept it would be to MUD to build apps via an LLM -- or even to MUD to build the MUD itself in real time, outside of the OLC and scripting. That sounds like a passion project for when I can find the time.


Is your code subject to code review? If so, have you done anything to improve that bottleneck, or was it never an issue at your previous level of productivity?


It is subject to code review, but I typically spend much more time writing code than having it reviewed (I am very methodical and slow at writing code).


How are you currently hosting your LLaMA 2? Any tips, tricks or advice?


It depends on your needs. For instance, do you want to host an API, or do you want a front end like ChatGPT? Chances are, text-generation-webui [1] will get you pretty close to hosting it yourself. You simply clone the repo, download the model from Hugging Face using the included helper (download-model.py), and fire up the server with server.py. You can then connect to it by SSH port tunneling on port 7860 (there are other ways, like ngrok, but SSH tunneling is the easiest and most secure).
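
Concretely, the tunnel is a one-liner along the lines of ssh -L 7860:localhost:7860 user@your-pod-ip (the host is a placeholder for wherever your GPU lives), after which the web UI is reachable at http://localhost:7860 on your local machine.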

As for hosting, I found that Runpod [2] has been the cheapest (not affiliated, just a user). All the other services tend to add up to more once you include bandwidth and storage. There are some tutorials online [3], but a lot of them use the quantized version. You should be able to fit the original 70B with "load_in_8bit" on one A100 80GB.

[1] https://github.com/oobabooga/text-generation-webui [2] https://www.runpod.io/ [3] https://gpus.llm-utils.org/running-llama-2-on-runpod-with-oo...
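
For reference, loading the full model in 8-bit with transformers + bitsandbytes looks roughly like this (a minimal sketch; assumes you've been granted access to the official weights on Hugging Face, and the prompt is just an example):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-70b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # load_in_8bit quantizes on the fly via bitsandbytes; the 70B weights
    # come to roughly 70GB, which fits on a single A100 80GB
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,
        device_map="auto",
    )

    inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))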


If you want to query the Llama-2 models, you can use Anyscale Endpoints [1]. Note: I work on this :)

Llama-2-70B is $1 / million tokens, which is the most cost-efficient on the market that I'm aware of.

[1] https://app.endpoints.anyscale.com/
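
The API is OpenAI-compatible, so querying it looks roughly like this (a sketch with the 2023-era openai Python client; double-check the exact model name and base URL against the docs):

    import openai

    openai.api_base = "https://api.endpoints.anyscale.com/v1"
    openai.api_key = "..."  # your Anyscale Endpoints key (placeholder)

    resp = openai.ChatCompletion.create(
        model="meta-llama/Llama-2-70b-chat-hf",
        messages=[{"role": "user", "content": "Summarize the Llama 2 paper in two sentences."}],
    )
    print(resp["choices"][0]["message"]["content"])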


Can we supply our own fine-tuned models?

Edit: I'm sure it's answered on your site, but sometimes it's better to include it right here! :)


Tried to plug it into my favorite chat frontend (TypingMind), but bounced off CORS. Is this something you can do something about?


How do you keep the cost down?



