Hacker News | ynniv's comments


  Dog
  Cat
  Jesus

"There isn't a government on this planet that wouldn't kill us all for that thing"


These days they'd just declare encryption illegal and mandate backdoors everywhere.


Once you accept that a movie isn't a documentary, Hackers is a lot of fun. The editor of 2600 consulted on it (and lent his nym).


Hackers does a great job of answering the question “How do you make hacking, which is literally just typing, visually interesting?” It is kind of like Pixar anthropomorphizing things like emotions. The software becomes a semi-physical landscape with viruses moving across and through that space (like the rabbit virus literally multiplying rabbits). It isn’t trying to be “accurate”; it’s playing with the idea and concept of hacking. I think it is a really fun movie.


The soundtrack is much, much better.


Agreed, just saw the remaster in a theater and it holds up surprisingly well. What seemed like glaring inaccuracies in 1995 now seem like whimsical visualizations and over the top silliness. Except of course "It's got a 28.8 bps modem".

The other thing I noticed was that it is chock-full of gender non-conformity. Not in the plot, but visually - almost every hacker character wears androgynous clothing or makeup or something along those lines. The over-representation of trans people in computer security is a common trope now and I wonder if Hackers was the first time that was depicted in media.


Point eight bee pee ess, said no one ever. Also, too much ADR to censor words that didn't need to be censored.


What is a nym?



Who sees an original mouse and thinks "speech recognition" without this classic context?


Only formally proven systems will be secure


This is a fantastic idea! Reading through one of the games I see:

- players are named based on their model, which can be ambiguous

- some model responses are being cut short

- some models seem to be thinking out loud, or at least not separating their chain of thought from what they tell the group


Thank you. Models are clumsier than I expected in mafia games. Despite my clear instructions to 1) limit max output tokens, 2) avoid thinking out loud, and 3) use the <think></think> tag for internal thoughts, they sometimes behave misleadingly.
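For what it's worth, the <think></think> convention can also be enforced on the engine side: cap the output length and strip anything inside the tags before the reply is shown to the group. A minimal sketch, assuming an OpenAI-compatible chat endpoint and a placeholder model name:

```python
import re
from openai import OpenAI  # assumes an OpenAI-compatible server/SDK

client = OpenAI()  # point base_url at a local server if running models locally
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def public_reply(messages, model="MODEL_NAME"):  # model name is a placeholder
    resp = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=200,  # hard cap on output length
    )
    text = resp.choices[0].message.content or ""
    # Drop anything the model wrapped in <think></think> before broadcasting it.
    return THINK_RE.sub("", text).strip()
```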


You could try issuing two prompts: one to "think", and another to "say to the group". Not as nice as having a model that can properly format its response, but you'll get more accurate results.
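A rough sketch of that two-pass approach, assuming the same OpenAI-compatible client as above; the prompt wording is illustrative only:

```python
def take_turn(client, model, history):
    # Pass 1: private reasoning, never shown to the other players.
    thought = client.chat.completions.create(
        model=model,
        messages=history + [{
            "role": "user",
            "content": "Think privately about your next move. Do not address the group.",
        }],
        max_tokens=300,
    ).choices[0].message.content

    # Pass 2: the public statement, conditioned on the private reasoning.
    statement = client.chat.completions.create(
        model=model,
        messages=history + [{
            "role": "user",
            "content": f"Your private notes: {thought}\n"
                       "Now write only what you say aloud to the group, in one or two sentences.",
        }],
        max_tokens=100,
    ).choices[0].message.content

    return statement
```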



This article keeps getting posted but it runs a thinking model at 3-4 tokens/s. You might as well take a vacation if you ask it a question.

It’s a gimmick and not a real solution.


If you value local compute and don't need massive speed, that's still twice as fast as most people can type.


Human typing speed is orders of magnitude slower than our eyes scanning for the correct answer.

ChatGPT o3-mini-high thinks at about 140 tokens/s by my estimation, and I sometimes wish it could return answers more quickly.

Getting an answer to a simple prompt would take 2-3 minutes on the AMD system, and forget about longer contexts.
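As a back-of-the-envelope check (the 500-token answer length is an assumption):

```python
answer_tokens = 500               # assumed length of a "simple" answer
local_tps, hosted_tps = 3.5, 140  # tokens/s figures from the thread

print(answer_tokens / local_tps)   # ~143 s, i.e. the 2-3 minutes quoted above
print(answer_tokens / hosted_tps)  # ~3.6 s on the faster hosted model
```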


Reasoning models spend a whole bunch of time reasoning before returning an answer. I was toying with QwQ 32B last night and ran into one question I gave it where it spent 18 minutes at 13 tok/s in the <think> phase before returning a final answer. I value local compute, but reasoning models aren’t terribly feasible at this speed since you don’t really need to see the first 90% of their thinking output.


Exactly! I run it on my old T7910 Dell workstation (2x 2697A V4, 640GB RAM) that I built for way less than $1k. But so what, it's only about ~2 tokens/s. Just like you said, it's cool that it runs at all, but that's it.


It's meant to be a test/development setup for people to prepare the software environment and tooling for running the same on more expensive hardware. Not to be fast.


I remember people trying to run the game Crysis using CPU rendering. They got it to run and move around. People did it for fun and the "cool" factor. But no one actually played the game that way.

It's the same thing here. CPUs can run it but only as a gimmick.


> It's the same thing here. CPUs can run it but only as a gimmick.

No, that's not true.

I work on local inference code via llama.cpp, on both GPU and CPU on every platform, and the bottleneck is much more RAM/bandwidth than compute.

A crappy Pixel Fold's 2022 mid-range Android CPU gets you roughly the same speed as a 2024 Apple iPhone GPU, with Metal acceleration that dozens of very smart people hack on.

Additionally, and perhaps more importantly, Arc is a GPU, not a CPU.

The headline of the thing you're commenting on, the very first thing you see when you open it, is "Run llama.cpp Portable Zip on Intel GPU"

Additionally, the HN headline includes "1 or 2 Arc 7700"


It's both compute and bandwidth constrained - just like trying to run Crysis on CPU rendering.

The A770 has 16GB of VRAM. You're shuffling data to the GPU at a rate of 64GB/s, which is an order of magnitude slower than the GPU's internal VRAM bandwidth. Hence, this setup is memory-bandwidth constrained.

However, once you want to use it to do anything useful like a longer context size, the CPU compute will be a huge bottleneck for time-to-first-token as well as tokens/s.

Trying to run a model this large, and a thinking one at that, on CPU RAM is a gimmick.
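For a rough sense of why bandwidth dominates decode speed: every generated token has to stream the active weights through memory at least once, so tokens/s is bounded by bandwidth divided by bytes read per token. The parameter count and quantization overhead below are assumptions, not measured figures:

```python
# Decode-rate ceiling: tokens/s <= bandwidth / bytes_read_per_token.
active_params = 37e9     # assumed active parameters per token for an MoE model like R1
bytes_per_param = 0.55   # ~4-bit quantization plus overhead (assumption)
bandwidth_gb_s = 64      # the transfer rate cited above

bytes_per_token = active_params * bytes_per_param   # ~20 GB per token
print(bandwidth_gb_s / (bytes_per_token / 1e9))     # ~3 tokens/s upper bound
```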


Okay, let's stipulate LLMs are compute and bandwidth sensitive (of course!)...

#1, should highlight it up front this time: We are talking about _G_PUs :)

#2 You can't get a single consumer GPU with enough memory to load a 670B parameter model, so there's some magic going on here. It's notable and distinct. This is probably due to FlashMoE, given its prominence in the link.

TL;DR: 1) these are Intel _G_PUs, and 2) it is a remarkable, distinct achievement to be loading a 670B parameter model on only one or two cards.


1) This system mostly uses normal DDR RAM, not GPU VRAM.

2) The M3 Ultra can load DeepSeek R1 671B Q4.

Using a very large LLM across the CPU and GPU is not new. It's been done since the beginning of local LLMs.
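For reference, splitting a GGUF model between system RAM and GPU VRAM is a standard llama.cpp feature; a minimal sketch using the llama-cpp-python bindings, with a placeholder model path and an assumed layer count:

```python
from llama_cpp import Llama

# Offload only as many layers as fit in VRAM; the rest stay in system RAM.
llm = Llama(
    model_path="path/to/model-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=20,                         # tune to your card's VRAM (assumption)
    n_ctx=4096,
)
print(llm("Hello", max_tokens=32)["choices"][0]["text"])
```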


> Crappy Pixel Fold 2022 mid-range Android CPU

Can you share which LLMs you run on such small devices and what use cases they address?

(Not a rhetorical question; it's just that I see a lot of work on local inference for edge devices with small models, but I could never get a small model to work for me. So I'm curious about other people's use cases.)


Excellent and accurate question. You sound like the first person I've talked to who might appreciate the full exposition here; apologies if this is too much info. TL;DR: you're definitely not missing anything, and we're just beginning to turn a corner and see some rays of hope, where local models are a genuine substitute for remote models in consumer applications.

#1) I put a lot of effort into this and, quite frankly, it paid off absolutely 0 until recently.

#2) The "this" in "I put a lot of effort into this", means, I left Google 1.5 years ago and have been quietly building an app that is LLM-agnostic in service of coalescing a lot of nextgen thinking re: computing I saw that's A) now possible due to LLMs B) was shitcanned in 2020, because Android won politically, because all that next-gen thinking seemed impossible given it required a step change in AI capabilities.

This app is Telosnex (telosnex.com).

I have a couple of stringent requirements I enforce on myself: it has to run on every platform, and it has to support local LLMs just as well as paid ones.

I see that as essential for avoiding continued algorithmic capture of the means of information distribution, and I believe that on a long enough timeline, all the rushed hacking people have done to llama.cpp to get model after model supported will give way to UX improvements.

You are completely, utterly, correct to note that the local models on device are, in my words, useless toys, at best. In practice, they kill your battery and barely work.

However, things did pay off recently. How?

#1) llama.cpp landed a significant opus of a PR by @ochafik that normalized tool handling across models and implemented what each model needs individually for formatting.

#2) Phi-4 mini came out. Long story, but tl;dr: until now there have been various gaping flaws with each Phi release. This one looked free of any such issues. So I hacked support for its tool-calling vagaries on top of what @ochafik landed, and all of a sudden I'm seeing the first local model smaller than Mixtral 8x7B that reliably handles RAG flows (i.e. generate a search query, then accept 2K tokens of parsed web pages and answer a question following the directions I give it) and tool calls (i.e. generate a search query, or file operations like here: https://x.com/jpohhhh/status/1897717300330926109).
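A sketch of that two-step RAG flow (search-query generation, then answering over retrieved text); `chat` and `web_search` are hypothetical stand-ins for whatever local model and search backend are in use:

```python
def rag_answer(chat, web_search, question):
    # Step 1: have the model turn the user's question into a search query.
    query = chat(f"Write a short web search query for: {question}")

    # Step 2: fetch and trim the results, then answer strictly from that context.
    context = web_search(query)[:2000]  # crude cap standing in for ~2K tokens of parsed pages
    return chat(
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```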


What a teaser article! All this info for setting up the system, but no performance numbers.


That's because the OP is linking to the quickstart guide. There are benchmark numbers on the GitHub repo's root page, but they don't appear to include the new DeepSeek yet:

https://github.com/intel/ipex-llm/tree/main?tab=readme-ov-fi...


Am I missing something? I see a lot of results for the small-scale models but no results for DeepSeek-R1-671B-Q4_K_M in their GitHub repos.


For a second I thought it had been ported to wasm! This is still a browser-based VNC client.



This is excellent


"you may have concerns about privacy"

Way to FUD: these models are local unless you enable ChatGPT integration.


> ... There are times, however, when Apple Intelligence needs to leverage a model that requires more computational power than your device can provide on its own. For these tasks, Apple Intelligence sends your request to Private Cloud Compute. Private Cloud Compute is a server-based intelligence system designed to handle more complex requests while protecting your privacy. For example, when you use Writing Tools to proofread or edit an email, your device may send the email to Private Cloud Compute for a server-based model to do the proofreading or editing.

-- https://www.apple.com/legal/privacy/data/en/intelligence-eng...

There is also: "The data sent to and returned by Private Cloud Compute is not stored or made accessible to Apple" - but the data still leaves your computer, and even the path between Private Cloud Compute and your device can be a privacy concern.


Way to FUD: leaking data out of the apps I choose is still a privacy violation.


Is the clipboard also a "data leak"?


In general, yes?

The clipboard has to be locked down by the OS now because random applications would just read from it all the time when it wasn't even relevant to them.


Then you may be interested in Qubes OS: https://forum.qubes-os.org/t/how-to-pitch-qubes-os/4499/15


Everyone should be interested in Qubes OS.

