
Can the diagram be part of the code itself (e.g., a special comment string on top of a certain piece, or in a special metafile)?

It'd be really nice if the diagram itself could be version-tracked together with the code.


Hi, thank you for trying the tool. I plan to store diagrams locally, e.g., each diagram as a `.codediagram` file.

Do you think this would help your case?


Nice implementation! It should serve as a great reference for a minimal version of Tabby's backend API. Thank you for sharing it!

Yeah - ultimately it won't be as performant or feature-rich as https://github.com/TabbyML/tabby, but it's still perfect for educational purposes!


Thank you Meng for building Tabby and providing us with a self-hosted alternative to Copilot! I absolutely love it! Keep up the amazing work.

You're definitely right about the feature richness, but the truth is I just want completions :D

Performance is a funny thing; it mostly scales with the slowest part of the system. Since both servers use the same inference library (llama.cpp), which does all the heavy lifting, there's essentially no difference in completion performance in single-user mode, according to my tests. And because I use a smaller quantization by default (Q5_K_M instead of Tabby's Q8, a ~30% difference in size) and LLM inference is essentially memory-bandwidth bound, my deployment is around 30% faster on identical hardware, with no noticeable quality difference.
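To make the memory-bandwidth intuition concrete, here's a rough back-of-the-envelope sketch; the bandwidth and model-size figures below are illustrative assumptions, not measurements:

    # Back-of-the-envelope: during decoding, every generated token streams all
    # model weights from memory once, so tokens/s ~= bandwidth / model size.
    # Both figures below are hypothetical, chosen only to illustrate the ratio.

    def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
        """Upper bound on decode speed when inference is memory-bandwidth bound."""
        return bandwidth_gb_s / model_size_gb

    BANDWIDTH_GB_S = 100.0         # assumed effective memory bandwidth
    Q8_SIZE_GB = 1.0               # assumed Q8 model size
    Q5_SIZE_GB = Q8_SIZE_GB / 1.3  # Q8 is ~30% larger than Q5_K_M

    q8 = decode_tokens_per_sec(BANDWIDTH_GB_S, Q8_SIZE_GB)
    q5 = decode_tokens_per_sec(BANDWIDTH_GB_S, Q5_SIZE_GB)
    print(f"Q8: {q8:.0f} tok/s, Q5_K_M: {q5:.0f} tok/s ({q5 / q8 - 1:.0%} faster)")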

p.s. I'd highly recommend providing additional quantization methods in your model repository to make it easier for novice users.

Thank you


Wow, this is a great topic. I don't really have specific suggestions, but I'd like to contribute some thoughts on the matter.

Monetizing anything isn't inherently problematic; the challenge lies in defining what should be paid for and what should be offered for free.

In the realm of open-source products and SaaS, the common practice is to provide free self-hosting options while charging for cloud hosting or enterprise-specific features, such as access control and authentication integrations.

However, the landscape becomes significantly more challenging for LLMOps (assuming you are still focusing on training as a major aspect of your business, which can be categorized as LLMOps).

Historically, there haven't been many success stories in this area (with exceptions like wandb.ai, which focuses on experiment tracking). I believe the difficulty arises from the largely ad-hoc nature of training and fine-tuning processes, which makes standardization a challenge, coupled with the infrequency of these tasks.

That being said, training/fine-tuning is a valuable technique. However, transforming it into a product company is really challenging; successful examples in this space typically rely heavily on solution customization or consulting-oriented business models.


Thanks for the points! I agree monetization in the LLMOps space is hard and complex, and I fully agree on customized solutions and consulting.

Yep, self-hosting plays like Red Hat's, DBs like MongoDB, or GitLab's dashboard-style approach could work - the issue, as you mentioned, is that we currently offer training and fine-tuning.

We do plan to offer inference as well, plus the data-gathering process and the final prompt-engineering side - but we thought, why not have a shot?

It's possibly best to build a training-and-inference platform - maybe some sort of personal ChatGPT training for the public. Everyone could train their own personal ChatGPT, not via ChatGPT's in-context learning or RAG, but coupled with actually fast 30x fine-tuning; a personal bot could truly be possible.

Thanks for the suggestions!


There are companies already spending good money on fine-tuning, and more will start spending soon. It seems like it would almost be easier to go directly to these companies by looking at their blog posts - they're telling you, in one way or another, that they're doing it. I know Plaid and friends are doing it.

It's costing them x; you can shave y off, and you can get their improvements to market faster and cheaper.


Interesting points! I shall try this with my bro!!

I was thinking along the lines of, say, the cost of A100s or H100s times electricity cost plus engineering costs, then how much we save, with some discounting factor.
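To sketch that pricing intuition in code (every figure below is a made-up assumption, not a real quote):

    # Hypothetical savings estimate for a customer's fine-tuning workload.
    # All numbers are assumptions for illustration only.

    GPU_HOURLY_COST = 2.50  # assumed $/hr per A100, hardware + electricity
    SPEEDUP = 30.0          # the fine-tuning speedup discussed above
    DISCOUNT = 0.8          # discounting factor applied to the headline savings

    def gpu_savings(baseline_gpu_hours: float) -> float:
        """GPU-cost savings if the same job runs SPEEDUP times faster."""
        baseline = baseline_gpu_hours * GPU_HOURLY_COST
        accelerated = baseline / SPEEDUP
        return (baseline - accelerated) * DISCOUNT

    # e.g. a job that used to take 1,000 A100-hours:
    print(f"~${gpu_savings(1000):,.0f} saved")  # ~$1,933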


I think the time savings will be more appealing.

It allows for fast iteration and shorter go-to-market, which can generate virtually infinite value, as opposed to saving electricity, which is a limited game.


Fair point - I forgot to mention the time savings LOL!!


You may want to look sideways at companies such as hedge funds. They have DNN teams and experiment with LLMs; you may find interesting optimisation opportunities with such teams. Charge according to the opportunity you open up, not the electricity saved!


Interesting! Hedge funds - very interesting.

Oh yes, you're right on the time saved and what opportunities it gives them, not just the electricity and capital costs :))

You can now experiment with 30 different models instead of 1 - if you have 100 GPUs, we've magically made it 3,000!


TabbyML | Software Engineer (Rust) | Open Source | REMOTE

Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.

Project: https://github.com/TabbyML/tabby

TabbyML is seeking a Software Engineer proficient in Rust to join our core engineering team. In this role, you will be responsible for developing the following features:

- Source code indexing and searching
- LLM model serving and optimization

We have an ambitious roadmap ahead, and we need your help! If you are interested in being an early member of a fast-growing startup team (spoiler: 100% remote, open source, and transparent compensation), please reach out and apply at https://www.notion.so/tabbyml/Careers-35b1a77f3d1743d9bae06b...


Previously, Tabby ran exclusively on CUDA devices, which posed a significant barrier for developers looking to make effective use of LLMs in their day-to-day coding.

The Tabby team contributed enhanced support for the StarCoder series models (1B/3B/7B) to llama.cpp. This enhancement allows these models to run on Metal with performance comparable to an NVIDIA GPU's.


For the 1B version of the model, decoding runs at approximately 100 tokens per second with Metal on an Apple M2 Max:

    llama_print_timings:        load time =   114.00 ms
    llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
    llama_print_timings: prompt eval time =   107.79 ms /    22 tokens (    4.90 ms per token,   204.11 tokens per second)
    llama_print_timings:        eval time =  1315.10 ms /   127 runs   (   10.36 ms per token,    96.57 tokens per second)
    llama_print_timings:       total time =  1427.08 ms
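As a quick sanity check on those numbers, tokens per second can be recovered from the eval line (this parser is a small hypothetical helper, not part of llama.cpp):

    import re

    # Recover tokens/second from the llama.cpp "eval time" line above.
    EVAL_LINE = "llama_print_timings: eval time = 1315.10 ms / 127 runs"

    ms, runs = map(float, re.search(r"([\d.]+) ms /\s*(\d+) runs", EVAL_LINE).groups())
    print(f"{runs / (ms / 1000.0):.2f} tokens per second")  # -> 96.57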

(Disclaimer: I submitted the PR)


Paper: https://arxiv.org/abs/2308.07124

Highlights: This paper demonstrates instruction fine-tuning of code Large Language Models (LLMs) using public GitHub commit data. The proposed method achieves the highest HumanEval pass@1 accuracy among permissively licensed LLMs.
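As a minimal sketch of the general idea (the field names and the instruction/input/output framing are illustrative assumptions, not the paper's exact data format), a commit can be turned into an instruction-tuning example like this:

    from dataclasses import dataclass

    # Sketch: use the commit message as the instruction, and the code before
    # and after the change as input/output. Field names are illustrative.

    @dataclass
    class Commit:
        message: str      # e.g. "Fix off-by-one error in pagination"
        code_before: str  # file contents before the commit
        code_after: str   # file contents after the commit

    def to_instruction_example(commit: Commit) -> dict:
        return {
            "instruction": commit.message,
            "input": commit.code_before,
            "output": commit.code_after,
        }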


Check out https://github.com/TabbyML/tabby, which is fully self-hostable and comes with some nice features.

On M1/M2, it offers convenient single-binary deployment, thanks to Rust. You can find the latest release at https://github.com/TabbyML/tabby/releases/tag/latest

(Disclaimer: I am the author)


A less-hyped inference engine with INT8/FP16 support on both CPU and GPU (CUDA).

Supported models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, LLaMA, T5, Whisper

(Found this library during my research on alternatives to Triton/FasterTransformer for Tabby: https://github.com/TabbyML/tabby)
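The model list and INT8/FP16 CPU/GPU support match CTranslate2; assuming that's the library in question, a minimal generation call looks roughly like this (the model path and prompt are placeholders, and the model must first be converted with its `ct2-transformers-converter` tool):

    import ctranslate2
    import transformers

    # Assumes a prior conversion step, e.g.:
    #   ct2-transformers-converter --model gpt2 --output_dir gpt2_ct2 --quantization int8
    generator = ctranslate2.Generator("gpt2_ct2", device="cpu")  # or device="cuda"
    tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")

    # CTranslate2 consumes token strings rather than token ids.
    prompt = tokenizer.convert_ids_to_tokens(tokenizer.encode("def fib(n):"))
    results = generator.generate_batch([prompt], max_length=64, sampling_topk=10)

    print(tokenizer.decode(results[0].sequences_ids[0]))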


TabbyML | Full Stack Engineer / Machine Learning Engineer | REMOTE

Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.

Project: https://github.com/TabbyML/tabby

Show HN post: https://news.ycombinator.com/item?id=35470915

We have an ambitious roadmap ahead, and we need your help! If you are interested in being an early member of a fast-growing startup team (spoiler: 100% remote, open source, and transparent compensation), please reach out and apply at https://tabbyml.notion.site/Careers-35b1a77f3d1743d9bae06b7d...

