I think the idea is that you network them together if you need more, and most models can be split nicely.


For that you'd probably be better off removing one of the GPUs, and replacing it with a networking card.

The problem of the form factor will remain: the tinybox takes up 15U for compute that you'd normally expect to find in a 4U form factor.


I don't think they're intended for rack usage like that. More like for people to put under their desks... there would be no reason to build the giant case with fancy silent-ish cooling if you're going to put them next to your other jet engines.


Fully agree, and I think the tinybox is great if you put only one of them somewhere in your local office.

I just don't think it makes sense to connect multiple of them into a "cluster" to work with bigger models, as the networking bandwidth isn't good enough and you'd have to fit multiple of these big boxes into your local space. Then I might as well put up a rack in a separate room.


There's an OCP 3.0 mezzanine, so no need to remove a card, and you'd get 200 Gbps, unless I've missed something about needing to remove a card to access it. But yeah, stacking these or racking them seems less than ideal.


3kW under your desk... no need to turn on the heat in the winter!


Most models actually can't be split nicely by 6. There's a reason Nvidia builds nodes with 4 and 8 GPUs.


I don't see why 6 is inherently worse than 4 or 8; not all of the layers are exactly equal or a power of 2 in count. 2^2 or 2^3 vs. 2^1*3^1 might give you more options.

The main issue I run into is FLOPS vs. RAM in any given card/model.


Usually you want to split each layer to run with tensor parallelism, which works optimally if you can assign each KV head to a specific GPU. All currently popular models have a power-of-2 number of KV heads.
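
For a concrete illustration (a toy Python sketch, not any particular framework's API): tensor parallelism shards the KV heads across GPUs, so the head count has to divide evenly by the GPU count, which a power of 2 does for 2, 4, or 8 GPUs but not for 6.

    # Toy sketch: shard a hypothetical model's 8 KV heads across different
    # tensor-parallel sizes and see which splits come out even.
    num_kv_heads = 8  # power of 2, typical of current open models

    for tp_size in (2, 4, 6, 8):
        heads_per_gpu, remainder = divmod(num_kv_heads, tp_size)
        if remainder == 0:
            print(f"tp={tp_size}: {heads_per_gpu} KV head(s) per GPU")
        else:
            print(f"tp={tp_size}: uneven ({num_kv_heads} heads / {tp_size} GPUs), "
                  "needs head replication or padding")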


Interesting, thank you for the pointers.


They need support in not attracting the scrutiny of Oracle auditors.

They’re basically all based on OpenJDK source, with different VMs and GC tech added in or supported to different levels.


In reality, they need support in order to point liability towards Oracle when something goes wrong. Enterprises like having the ability to shift blame: "Yes, we know it's down, we have a support ticket in with Oracle."


I ultimately gave up playing this game after spending hours testing and tuning. Each core is slightly different (on AMD; I’m sure Intel has something similar), and every tweak meant a reboot and a 1-24h stress test. Then some new workload comes along and it's a kernel panic/BSOD.

Stock settings just work. Maybe I lost the silicon lottery, but I'm too tired to check anymore.


Goodreads is FAANG?

The problem isn’t manufactured or conspiratorial; it’s just baked into sorting so much content on so few metrics, and needing to account for what the user is currently in the mood for: something specific or something generic.


My point is that GoodReads isn't popular enough for it to be profitable to sabotage (yet). And there's still a threat of something more relevant coming along. If they actually wanted to improve discovery for something like prime video/shopping, then they could/would copy what works from GoodReads.


Goodreads is a subsidiary of Amazon.

Edit: I realize I misread your comment. Disregard!


When it’s all in memory you get to amortize the cost of the initial load. Or just pay it when it’s not part of the hot path. When it’s segmented, you’re doing that because memory is full and you need to read in all the segments you don’t have. That’ll completely overwhelm the log n of the search you still get.


I was trying to make the point that the dominant factor becomes linear instead of logarithmic, but more accurately it's O(S log N) = O(N log N) because S (number of segments) is proportional to N (number of vectors).
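
A back-of-the-envelope sketch of that point in Python, with made-up numbers (no real index, just counting comparisons):

    import math

    N = 1_000_000          # total vectors
    segment_size = 10_000  # vectors per segment
    S = N // segment_size  # number of segments, grows roughly linearly with N

    print(f"single in-memory index: ~{math.log2(N):.0f} comparisons per query")
    print(f"{S} segments: ~{S * math.log2(segment_size):.0f} comparisons per query, "
          f"plus a read to load each segment")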


Ah yeah, that’s what I wanted to write, but I didn’t want to put words in your mouth, so I stuck to what I could be certain about happening. We do all this work to throw away the unneeded bits in one situation, and then when comparing it to a slightly different situation we go, "huh, some of that garbage would be kinda nice here."



It’s not pure chance that the above calculus shakes out, but it doesn’t have to be that way. If you are embedding on a word-by-word level then it can happen; if it’s a little smaller or larger than word by word, it’s not immediately clear what the calculation is doing.

But the main difference here is you get one embedding for the document in question, not an embedding per word like word2vec. So it’s something more like "document about OS/2 Warp" - "wiki page for IBM" + "wiki page for Microsoft" = "document on Windows 3.1".
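
A toy numpy illustration of that arithmetic, with invented 3-d vectors standing in for real document embeddings (any real model would produce something much higher-dimensional):

    import numpy as np

    docs = {
        "document about OS/2 Warp": np.array([0.9, 0.8, 0.1]),
        "wiki page for IBM":        np.array([0.9, 0.1, 0.1]),
        "wiki page for Microsoft":  np.array([0.1, 0.1, 0.9]),
        "document on Windows 3.1":  np.array([0.1, 0.8, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = (docs["document about OS/2 Warp"]
             - docs["wiki page for IBM"]
             + docs["wiki page for Microsoft"])

    # Rank every document against the analogy query; with real embeddings the
    # hope is that the Windows 3.1 document lands nearest.
    for name, vec in sorted(docs.items(), key=lambda kv: -cosine(query, kv[1])):
        print(f"{cosine(query, vec):.3f}  {name}")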


Any more info about the watermarking? Only Meta can make the determination?

Edit: I can’t find the weights, but if I’m reading the paper right, anyone could train their own detector.


Hey! An RS from the Meta Seamless team here.

Yes, we chose not to release the watermark detector to safeguard against adversarial attacks. This decision helps prevent attempts by malicious users to erase the watermark.

The watermark generator and detector are trained together. One can use the information in our paper to train your own generator and detector model; however, in that case the watermark signature created will be distinct from the one we use to protect our Seamless translation models. This approach ensures each model maintains its unique security features.
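
A minimal toy sketch of that joint-training idea in PyTorch (invented modules and loss weights to show the shape of the setup, not our actual architecture): the generator adds a small perturbation to the audio and the detector learns to spot it, with both optimized together so the watermark stays inaudible yet detectable.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                                     nn.Conv1d(16, 1, 9, padding=4), nn.Tanh())
        def forward(self, audio):
            return audio + 0.01 * self.net(audio)  # small additive watermark

    class Detector(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                                     nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                     nn.Linear(16, 1))
        def forward(self, audio):
            return self.net(audio)  # logit: watermarked or not

    gen, det = Generator(), Detector()
    opt = torch.optim.Adam(list(gen.parameters()) + list(det.parameters()), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    clean = torch.randn(8, 1, 16000)  # stand-in for a batch of real audio
    marked = gen(clean)

    # The detector should fire on watermarked audio and stay quiet on clean
    # audio, while the perturbation is kept small (a crude stand-in for a
    # perceptual loss); one optimizer step updates both networks.
    loss = (bce(det(marked), torch.ones(8, 1)) +
            bce(det(clean), torch.zeros(8, 1)) +
            10.0 * (marked - clean).pow(2).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()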


Thanks for clarifying; seems like a completely reasonable approach. Thanks for the great work.


These are interesting questions but why should I pay you specifically?

It’s running on my hardware, OpenAI made the model, and whisper.cpp keeps it updated. You legitimately solved a problem by making it easier to use and building a UI, but that’s not something I’m willing to pay monthly for.

You are an unknown and I have no idea if this will be kept up to date. I'd pay for an update if an OS update broke something or you came out with a new set of features. But again, you haven't added monthly value.


I think it means this? This PR was difficult to follow.

https://github.com/topics/ai-starter-kit

