S-LoRA: Serving Concurrent LoRA Adapters (github.com/s-lora)
73 points by Labo333 8 months ago | 20 comments



Tricked by acronym confusion, I thought this was about LoRa (Long Range) radio

https://en.wikipedia.org/wiki/LoRa

Instead it's about LoRA, note the capitalized last A, or Low-Rank Adaptation, a method for tuning LLMs.


Ha! I was tricked by this too. Was in the middle of linking to the wiki page when I noticed your comment.

As a further reason to click the link: it's a consumer-grade, long-range, spread-spectrum wireless communication technology that's been gaining prominence in recent years.


+1

Multiple LoRa adapters? Different bands, different channels, different spreading factors? Like an ultimate Meshtastic relay covering both 433 and 868 MHz, but with some simpler logic?

Nope... disappointment.


Also came here thinking this was about LoRa radio


UUGGHH! I've no interest in LLMs, but long-range radio, yes.


Add me to the list of the betricked!


Super cool. Not sure if there's already a popular project for this, but I've seen so many people asking for exactly this capability.

‘Conventional’ (if that means anything in a field 10 minutes old) wisdom is “fine tune to add knowledge, LoRA to adjust presentation” - could you comment on your experiences with this?


LoRA stands for Low-Rank Adaptation - it's a very clever way to fine-tune large models with a fraction of the compute; the idea is that the fine-tuning update needs far less capacity than the full weight matrices, so it can be approximated with a pair of small low-rank matrices.
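
Roughly, a LoRA layer looks like the sketch below (toy PyTorch, not any library's actual implementation; shapes and defaults are just illustrative): the base weight stays frozen and you only train the small low-rank pair on top of it.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base linear layer plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # the original weights stay frozen
            # A: (r, in_features), B: (out_features, r); B starts at zero so
            # training begins from the unmodified base model.
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            # y = base(x) + scale * x A^T B^T  -- only A and B are trained
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    # For a 4096x4096 layer with r=8, that's ~65K trainable params vs ~16.8M.
    layer = LoRALinear(nn.Linear(4096, 4096), r=8)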


Thanks, yeah, I know what LoRA is, and I didn't think it was that effective on... are we OK with calling it the 'cognitive' side yet? Everything I've read from the LocalLLaMA fine-tuning community is that you can fine-tune presentation, but it's not really an effective way to teach models new knowledge.


My understanding was that RAG is generally a better plan when trying to inject fresh knowledge.

Both fine-tuning and LoRA are more about how the model responds than about what it knows.


RAG and LoRA are two separate things - in RAG, you don't do any fine-tuning; rather, you use in-context learning.
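
For anyone new to the distinction, a toy sketch of the RAG side (the corpus and the keyword "retriever" here are made up just to show the shape of it): you fetch relevant text at query time and put it in the prompt, and no weights change.

    # Toy retrieval-augmented prompt. In practice you'd use an embedding
    # model and a vector store; this keyword-overlap scorer is a stand-in.
    corpus = [
        "S-LoRA serves many LoRA adapters on top of one shared base model.",
        "LoRa is a long-range, low-power radio modulation scheme.",
        "RAG retrieves documents at query time and adds them to the prompt.",
    ]

    def retrieve(query, docs, k=2):
        overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
        return sorted(docs, key=overlap, reverse=True)[:k]

    def build_prompt(query, docs):
        context = "\n".join(retrieve(query, docs))
        return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    # The prompt, not the model weights, carries the injected knowledge.
    print(build_prompt("What does S-LoRA do?", corpus))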


Fine-tuning can also make RAG perform better. You can eliminate the need to waste prompt tokens on n-shot examples, and you can get a much more consistent output for your domain.


Looks kind of inactive.

https://github.com/predibase/lorax

Does a similar thing and seems more active.


Probably an ignorant question: I know LoRAs are being used all the time, but where do you get them? All I see on Hugging Face is full models.


There are plenty of LoRAs on Hugging Face.

e.g.,
https://huggingface.co/tloen/alpaca-lora-7b
https://huggingface.co/winddude/wizardLM-LlaMA-LoRA-7B
https://huggingface.co/MBZUAI/bactrian-x-llama-7b-lora

HF is a great place to download stuff, but it doesn't really offer much for discoverability. How you find LoRAs you want to use is a better question, and I don't have an answer (for LLM LoRAs, at least).
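
If it helps, attaching one of those adapters to its base model is roughly this with the peft library (a sketch; the base checkpoint name below is a placeholder, check the adapter's model card for the right one):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "huggyllama/llama-7b"  # placeholder; use the base the adapter was trained on
    base = AutoModelForCausalLM.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)

    # Pull the LoRA weights from the Hub and attach them to the base model,
    # e.g. the alpaca-lora adapter linked above.
    model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")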


civitai.com has a lot, if you're thinking Stable Diffusion.


I appreciate that. As it happens, I was thinking more for LLMs, but that is a useful link and made it into my bookmarks!


For LLMs I think most people create their own LoRAs and rarely share them, as they are the result of fine-tuning on proprietary data that they don't want to leak. Separately, the open-source model space is so fragmented and LoRAs are not transferable between models, so few people go through the effort of publishing and documenting their LoRAs for such a small audience.


It's true that they're tightly tied to the base model, and it takes a lot of work for a LoRA author to target multiple base models, so they generally just target one.

I plan on releasing the next iteration of my model (trained on generating knowledge graphs, among other things) as both a LoRA and a pre-merged model, for multiple base models (Mistral 7B, Llama 2 13B, Phi-2 2.7B, Yi 34B, and possibly Mixtral).

It will be interesting to compare the results between them, considering last time I just did Llama 2 13B. Can't wait to see the improvements in the base models since then.
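
(For reference, the "pre-merged" variants are roughly this with peft - a sketch with placeholder names that folds the low-rank update back into the base weights, so the result ships as an ordinary checkpoint.)

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    model = PeftModel.from_pretrained(base, "path/to/knowledge-graph-lora")  # placeholder path

    # merge_and_unload() applies W <- W + scale * B @ A and drops the adapter
    # modules, leaving a plain model you can save and share as-is.
    merged = model.merge_and_unload()
    merged.save_pretrained("knowledge-graph-model-merged")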


You want to check out TheBloke's Discord server in that case. There is a lot happening there, and sadly not on the indexable open web.



