k2so · 2026-01-21T05:36:27 1768973787

Interesting work. In the examples, I can see that quite a few of them use the terracotta/warm-cream colour palette. Was it an explicit choice to keep that in the prompts?

With the official frontend-design skill, I have also received the same warm-cream tones, unprompted, on multiple occasions and for different projects. I wonder if it's a new latent direction the model takes to avoid safe/generic patterns.

k2so · 2025-04-03T07:18:56 1743664736

I make an argument for the Model Context Protocol (MCP) and how it can shape the entire ecosystem.

k2so · 2025-03-09T06:49:38 1741502978

Beyond the developer, the user benefits massively from MCP. Like you said, using any other SDK to build is a very valid approach, but then you are tied down to the single client that you have built on that SDK.

If you would like to switch clients, you have to build it yourself. MCP solves this very well, since any MCP-supported client can use the same tools/resources that you have built.

k2so · 2025-02-25T08:59:16 1740473956

A neat trick in the Vespa (a vector DB, among other things) documentation is to use a hex representation of vectors after converting them to binary.

This trick can reduce your payload sizes. Vespa supports this format, which is particularly useful when the same vectors are referenced multiple times in a document. For ColBERT- or ColPali-like cases (where you have many embedding vectors), it can massively reduce the size of the vectors stored on disk.
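To make the idea concrete, here is a minimal sketch of the encoding step in plain Python. The function names are my own, and thresholding at zero is an assumption about how the vectors get binarized. It only shows the bit packing and hex encoding, not how Vespa itself consumes the string.

```python
import json


def binarize_to_hex(vector):
    """Threshold each value at 0, pack 8 bits per byte (MSB first), hex-encode."""
    bits = [1 if x > 0 else 0 for x in vector]
    # Pad with zero bits so the length is a multiple of 8.
    bits += [0] * (-len(bits) % 8)
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed.hex()


def hex_to_bits(hex_string):
    """Inverse of the packing step: recover the bit list (including padding)."""
    return [
        (byte >> shift) & 1
        for byte in bytes.fromhex(hex_string)
        for shift in range(7, -1, -1)
    ]


embedding = [0.5, -0.1, 0.3, 0.9, -0.7, -0.2, 0.1, 0.4]
hex_payload = binarize_to_hex(embedding)
print(hex_payload)  # b3
print(len(json.dumps(embedding)), "chars as a float list vs", len(hex_payload), "as hex")
```

Eight floats collapse into two hex characters here, which is where the payload savings come from when a document carries hundreds of token-level vectors.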

https://docs.vespa.ai/en/reference/document-json-format.html...

Not sure why this is not more commonly adopted, though.

k2so · 2025-01-28T05:24:53 1738041893

LLMs = latency? That's how most of us perceive it. When you examine the timing breakdown of a request to Claude, you'll notice that the majority of the time is spent in Content Download, which is essentially the decoding of output tokens.

In the blog, I discuss how partial JSON validation can help in workflow-driven LLM products.
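As a rough illustration of the idea (not the exact method from the blog), the sketch below checks whether the text streamed so far could still grow into valid JSON, so a workflow can abort a bad generation early instead of waiting for the full response. The function names are hypothetical, and the check only covers bracket nesting and string boundaries, not the full JSON grammar.

```python
def is_plausible_json_prefix(text):
    """Return False as soon as the bracket structure can no longer be valid JSON."""
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Inside a string, only an unescaped quote ends it.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            if not stack:
                return False
            opener = stack.pop()
            if (opener, ch) not in (("{", "}"), ("[", "]")):
                return False
    return True


def first_invalid_chunk(chunks):
    """Feed streamed chunks one by one; return the index where the prefix breaks."""
    received = ""
    for index, chunk in enumerate(chunks):
        received += chunk
        if not is_plausible_json_prefix(received):
            return index
    return None


stream = ['{"steps": [', '{"tool": "search"}', '}', ']']
print(first_invalid_chunk(stream))  # 2
```

In this made-up stream the third chunk closes the object before the array, so the request could be cancelled right there rather than after the last token arrives.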

Would love feedback on how I can improve, thanks!

k2so · on Dec 24, 2024

In one of my earlier jobs a few years back, we were training deep learning models on VMs with GPUs. Back then, the tooling was not as extensive as it is now (VS Code did not have Remote SSH then).

So, we would SSH into the VM and do our work. This also involved a lot of debugging code in vim, since it's quicker to make in-place edits and re-run experiments. This taught me a lot about effective debugging and writing code for the VM.

jebarker · on Dec 25, 2024

I'm interested in any tips you figured out for debugging in that environment. I find a GUI debugger to be an essential tool for this kind of work. It's the thing that keeps me using vscode remote vs just vim on the server (which I'd prefer if all I needed was editing).

k2so · on Sept 27, 2024

Libraries that are easier to use have a significant advantage over highly complicated (supposedly performant) ones in driving adoption.

Recently, I was trying to generate text embeddings from a Hugging Face model. NVIDIA Triton and text-embeddings-inference (built by Hugging Face) were my two options.

> why large companies are generally incapable of delivering great developer experience.

I wanted to curl up and cry while trying to make NVIDIA Triton spit out embeddings. The error messages are cryptic, and you need Jedi-like intuition to get it to work. I finally managed to get it working after about 2 days of wrangling with the extremely verbose and long-winded documentation (thanks in part to Claude, which helped me understand it with better examples).

Triton's documentation starts off with core principles, and throughout the documentation there are hyperlinks to other badly written documentation to ensure you know the core concepts. The only reason I endured this was the supposed performance gains, which Triton promised but underdelivered on (most likely because I had missed some config/core concept and did not get all the juice).

On the other hand, text-embeddings-inference has a two-line, front-and-centre command to pull the Docker image and get running. The only delay before it started serving embeddings was due to my internet speed. Deploying it on our k8s infra was then a breeze: minor modifications to the Dockerfile and we were running. And on top of that, it's more performant than Triton!
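For a sense of how little client code this needs, here is a small sketch of calling a running text-embeddings-inference container from Python using only the standard library. The base URL and port are assumptions about a local setup, the helper names are my own, and I am assuming the server's `/embed` route with an `inputs` field in the JSON body.

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:8080"):
    """Build the POST request for the /embed route (base_url is an assumed local address)."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url="http://localhost:8080"):
    """Send the request and return the parsed JSON response (one vector per input)."""
    with urllib.request.urlopen(build_embed_request(texts, base_url)) as response:
        return json.loads(response.read())


request = build_embed_request(["hello world"])
print(request.get_method(), request.full_url)
```

Calling `embed(["hello world"])` only works once the container is actually up and listening on that address.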

k2so · on Sept 19, 2024

This is awesome, are you contributing this to candle or is it a standalone package?

zackangelo · on Sept 19, 2024

Just trying to stay focused on launching first (https://docs.mixlayer.com) and keeping early customers happy, but would love to open source some of this work.

It'd probably be a separate crate from candle. If you haven't checked it out yet, mistral.rs implements some of these things (https://github.com/EricLBuehler/mistral.rs). Eric hasn't done multi-GPU inference yet, but I know it's on his roadmap. Not sure if it helped, but I shared an early version of my llama 3.1 implementation with him.

J_Shelby_J · on Sept 19, 2024

Hey, mixlayer is really cool.

I also have a Rust LLM inference project. The overlap is very high between what mixlayer is doing and what my project is doing. It's actually crazy how we basically have the same features. [1] Right now I'm still using llama.cpp on the backend, but eventually want to move to candle via mistral.rs.

[1] https://github.com/ShelbyJenkins/llm_client

k2so · on Sept 3, 2024

I very strongly relate to this. It's been close to 3 months since I started working on a blog built on Quarto, and all I have so far is an elaborate design and a half-complete post on an LLM tool I had built.

But like the comments say, I had way too much fun on the journey of a side project, just doing other things like configuring the website and playing around with design elements like the font and how the overall website looks (I'm a data scientist and rarely get to play with design as much as I want to at work). And recently, with Claude's help, I built some cool React elements to push my story further.

Hopefully, after this, I ship at least one post and iterate on the design elements.

k2so · on July 10, 2024

This was my first thought too, after reading through their blog. This feels like a no-frills website made by an engineer, who makes things that just work.

The documentation is great, I really appreciate them putting the roadmap front and centre.