GrayShade's comments

1.82.7 doesn't have litellm_init.pth in the archive. You can download it from PyPI to check.

EDIT: no, it's compromised, see proxy/proxy_server.py.
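
To make "download it from PyPI to check" concrete, here's a rough sketch. The PyPI JSON API and sdist layout are standard, but I'm assuming the 1.82.7 release is still downloadable, and the "suspicious" markers at the end are only illustrative, not a real detection rule:

    # Sketch: pull the litellm 1.82.7 sdist from PyPI and look at
    # litellm/proxy/proxy_server.py for anything that would run at import time.
    # We only read the source text; nothing from the package is imported.
    import io
    import json
    import tarfile
    import urllib.request

    meta = json.load(urllib.request.urlopen(
        "https://pypi.org/pypi/litellm/1.82.7/json"))
    sdist_url = next(f["url"] for f in meta["urls"]
                     if f["packagetype"] == "sdist")

    data = urllib.request.urlopen(sdist_url).read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        member = next(m for m in tar.getmembers()
                      if m.name.endswith("litellm/proxy/proxy_server.py"))
        source = tar.extractfile(member).read().decode("utf-8", "replace")

    # Example markers only; eyeball the file yourself rather than trusting this list.
    for marker in ("exec(", "eval(", "base64", "urlopen", "requests.post"):
        if marker in source:
            print("found:", marker)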


1.82.7 has the payload in `litellm/proxy/proxy_server.py` which executes on import.


I just woke up this morning and I am amazed. I'm taking all my nasty words back: I starred the project and followed the author, who reacted so fast to my dull negative feedback. That reaction shows how much he cares about the project.

Thanks for pointing this out; it seems like quite a good response to me.

I wouldn't mind opt-in telemetry, but the participation rate would probably be too low to make it useful.


My issue with telemetry is that 99% of software ends up not using it, so why have it? And definitely don't have it on by default. Your users will come and tell you what they want, which makes telemetry redundant, especially when it's an OSS project you're mostly building for yourself.

Except that telemetry can give you more complete (and foolproof) information than what users report. But yeah, that could also be solved with debug info that users can attach to their report; the app doesn't have to "call home" for that...
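
By "debug info" I mean something like the sketch below: a local dump the user can attach to a bug report instead of the app calling home. Everything here (file name, fields, version string) is made up for illustration:

    # Hypothetical sketch: write a local diagnostics file the user can attach
    # to a bug report. No network access involved.
    import json
    import platform
    import sys
    from pathlib import Path

    def write_debug_bundle(path: str = "debug-info.json") -> Path:
        info = {
            "app_version": "1.2.3",          # placeholder
            "python": sys.version,
            "platform": platform.platform(),
            "argv": sys.argv,
        }
        out = Path(path)
        out.write_text(json.dumps(info, indent=2))
        return out

    if __name__ == "__main__":
        print("wrote", write_debug_bundle())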

I agree, but it's a cost/benefit thing. Most OSS projects aren't big enough to do anything with the telemetry, so you're just paying in goodwill for no reason.

Opt-in via extension, fine. Opt-in via flag, unreliable. The spyware code should never be anywhere near the main codebase.

Yay! Also, I hadn't noticed an entire section about building from source. Sorry about that. Good work!

Woo, good on them

Don't forget location.

It doesn't, and it's optional.

That's not a shell, it's a Python interpreter compiled to WASM and running in the browser.

Whatever you call it, it plainly makes repeated calls to the server once it's loaded. You couldn't just throw it on GitHub or Cloudflare as-is.

This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.

At what quantization and with what size context window?

Looks like it's a bit slower today. Running llama.cpp b8192 with the Vulkan backend.

$ ./llama-cli -m unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -c 65536 -p "Hello"

[snip 73 lines]

[ Prompt: 86.6 t/s | Generation: 34.8 t/s ]

$ ./llama-cli -m unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -c 262144 -p "Hello"

[snip 128 lines]

[ Prompt: 78.3 t/s | Generation: 30.9 t/s ]

I suspect the ROCm build will be faster, but it doesn't work out of the box for me.
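
As a very rough sanity check on why I think there's headroom: every number below is an assumption on my part (~3B active parameters from the "A3B" naming, ~0.56 bytes/weight for a Q4_K-style quant, ~512 GB/s peak bandwidth for a 6800 XT), and it ignores the KV cache entirely:

    # Back-of-envelope estimate, not a benchmark: the effective memory
    # bandwidth implied by the generation speeds above, vs. the card's peak.
    ACTIVE_PARAMS = 3e9        # assumed from the "A3B" naming
    BYTES_PER_WEIGHT = 0.56    # rough figure for a Q4_K-style quant
    PEAK_GBPS = 512            # nominal 6800 XT memory bandwidth

    def effective_bandwidth(tokens_per_s: float) -> float:
        """GB/s of weight reads implied by a decode speed (KV cache ignored)."""
        return tokens_per_s * ACTIVE_PARAMS * BYTES_PER_WEIGHT / 1e9

    for tps in (34.8, 30.9):
        gbps = effective_bandwidth(tps)
        print(f"{tps} t/s -> ~{gbps:.0f} GB/s ({gbps / PEAK_GBPS:.0%} of peak)")

That comes out around 50-60 GB/s, roughly a tenth of peak, which is why I'd expect a working ROCm build to do better.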


And footnote 3 is unreferenced.


Good catch, thank you both -- fixed!


It did; there are two incompatible approaches, zram and zswap.
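
If you want to check which one a given system is actually using, something like this works on typical modern kernels (a sketch, not a robust detector; the paths are the usual sysfs/proc locations):

    # Quick check for which swap-compression approach is in use.
    from pathlib import Path

    zswap = Path("/sys/module/zswap/parameters/enabled")
    if zswap.exists():
        print("zswap enabled:", zswap.read_text().strip())   # 'Y' or 'N'

    swaps = Path("/proc/swaps").read_text()
    print("zram swap device present:", "/dev/zram" in swaps)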


The Signal device linking feature is just as fast. It's partly a trick -- it will look for QR codes even outside the central area, so under good conditions it can get a read before you even get a rough orientation.


Maybe that's just the only API-visible change, which says nothing about the actual capabilities of the model?

