Ask HN: Are you training and running custom LLMs and how are you doing it?
15 points by kordlessagain on Aug 14, 2023 | 2 comments
I have been researching methods and projects built around training and running LLMs locally. I'm interested in what others have been using on this front (including straight-up PyTorch/Transformers). Here's what I've gathered so far:

Engines/APIs:

  - vLLM: Inference and serving engine for LLMs (non-quantized models only?) [1]
  - ollama: Go project to run, create and share LLMs [2]
  - llama.cpp: Inference of LLaMA models in C/C++ w/UI (including quantized models) [3]
  - llama-cpp-python: Python bindings for llama.cpp with an OpenAI-compatible API server (see the sketch after this list) [4]
  - llm-engine: engine for fine-tuning and serving LLMs [5]
  - Lamini: hosted (closed-source?) solution for training LLMs [6]
  - GPT4All: free, locally run chatbot [7]
  - SkyPilot: framework for running LLMs, AI, and batch jobs [8]
  - HuggingFace Transformers: APIs and tools to download and train models (via PyTorch) [9]
  - RAGStack: Deploy a private ChatGPT alternative hosted within your VPC [14]
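
To make the "run locally" part concrete, here's a minimal sketch of local inference with llama-cpp-python [4]. The model path is a placeholder for whatever quantized llama.cpp model file you have on disk, and the parameters are only illustrative:

  # Minimal sketch: local inference with llama-cpp-python.
  # The model path is a placeholder, not a real download.
  from llama_cpp import Llama

  llm = Llama(model_path="./models/llama-2-7b-chat.q4_0.bin", n_ctx=2048)

  out = llm(
      "Q: Name three open-source LLM inference engines. A:",
      max_tokens=128,
      stop=["Q:"],
  )
  print(out["choices"][0]["text"])

  # The same package can also serve an OpenAI-compatible HTTP API:
  #   python -m llama_cpp.server --model ./models/llama-2-7b-chat.q4_0.bin
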
UI/Interface:

  - Simon Willison's `llm` CLI tool (see the sketch below) [15]
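
For the `llm` tool, a minimal sketch of its Python API; this assumes a local-model plugin (e.g. llm-gpt4all) is installed, and the model id is a placeholder:

  # Minimal sketch of the `llm` package's Python API.
  # Assumes a local-model plugin is installed; the model id is a placeholder.
  import llm

  model = llm.get_model("orca-mini-3b")
  response = model.prompt("Summarize what QLoRA does in one sentence.")
  print(response.text())
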
Quantization Bits:

  - AutoGPTQ: quantization package based on the GPTQ algorithm [10]
  - QLoRA: finetuning of quantized LLMs (see the sketch after this list) [11]
  - bitsandbytes: CUDA functions for PyTorch [12]
  - SkyPilot QLoRA [13]
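
On the quantization side, here's a minimal sketch of the QLoRA-style recipe [11][12]: load the base model in 4-bit with bitsandbytes via Transformers, then attach LoRA adapters with Hugging Face PEFT (which the QLoRA repo builds on). The model name and hyperparameters are illustrative, not recommendations:

  # Minimal QLoRA-style setup: 4-bit base model (bitsandbytes) + LoRA adapters (PEFT).
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
  from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

  model_name = "meta-llama/Llama-2-7b-hf"  # any causal LM on the Hub

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,                    # NF4 quantization from the QLoRA paper
      bnb_4bit_quant_type="nf4",
      bnb_4bit_use_double_quant=True,
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      quantization_config=bnb_config,
      device_map="auto",
  )

  # Freeze the quantized base weights and train only the small LoRA matrices.
  model = prepare_model_for_kbit_training(model)
  lora_config = LoraConfig(
      r=16,
      lora_alpha=32,
      lora_dropout=0.05,
      target_modules=["q_proj", "v_proj"],
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, lora_config)
  model.print_trainable_parameters()
  # From here, train with the usual Transformers Trainer (or TRL's SFTTrainer) loop.
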
Video Guides:

  - https://www.youtube.com/watch?v=eeM6V5aPjhk
  - https://www.youtube.com/watch?v=TYgtG2Th6fI (jawerty's example)
Reference:

  - [1] https://github.com/vllm-project/vllm
  - [2] https://github.com/jmorganca/ollama
  - [3] https://github.com/ggerganov/llama.cpp
  - [4] https://github.com/abetlen/llama-cpp-python
  - [5] https://github.com/scaleapi/llm-engine
  - [6] https://www.lamini.ai/
  - [7] https://github.com/nomic-ai/gpt4all
  - [8] https://github.com/skypilot-org/skypilot
  - [9] https://github.com/huggingface/transformers/releases
  - [10] https://github.com/PanQiWei/AutoGPTQ
  - [11] https://github.com/artidoro/qlora
  - [12] https://github.com/TimDettmers/bitsandbytes
  - [13] https://github.com/artidoro/qlora/pull/132
  - [14] https://github.com/psychic-api/rag-stack
  - [15] https://simonwillison.net/2023/Aug/1/llama-2-mac/



https://gradient.ai/: an API for inference and fine-tuning open-source LLMs