I don't think they're intended for rack usage like that. More like for people to put under their desks... there would be no reason to build the giant case with fancy silent-ish cooling if you're going to put them next to your other jet engines.
Fully agree, and I think the tinybox is great if you only put one of them in your local office.
I just don't think it makes sense to connect multiple of them into a "cluster" to work with bigger models: the networking bandwidth isn't good enough, and you'd have to fit several of these big boxes into your local space. At that point I might as well put up a rack in a separate room.
There's an OCP 3.0 mezzanine, so there's no need to remove a card and you'd get 200 Gbps, unless I've missed something about needing to remove a card to access it. But yeah, stacking or racking these seems less than ideal.
I don't see why 6 is inherently worse than 4 or 8; not all of the layers are exactly equal in size or a power of 2 in count. 2^2 and 2^3 vs. 2^1 * 3^1 might give you more options for factoring.
The main issue I run into is FLOPS vs. RAM in any given card/model.
Usually you want to split each layer to run with tensor parallelism, which works optimally if you can assign each KV head to a specific GPU. All currently popular models have a power-of-2 number of KV heads.
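A minimal sketch of that constraint (the head count of 8 here is illustrative, not from any particular spec sheet):

    # Why the GPU count should divide the KV head count evenly.
    num_kv_heads = 8  # illustrative; common for grouped-query attention models

    for gpus in (4, 6, 8):
        if num_kv_heads % gpus == 0:
            print(f"{gpus} GPUs: {num_kv_heads // gpus} KV head(s) per GPU")
        else:
            print(f"{gpus} GPUs: {num_kv_heads} heads don't split evenly, "
                  f"so attention has to be sharded unevenly or padded")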
In reality, they need support in order to point liability towards Oracle when something goes wrong. Enterprises like having the ability to shift blame. "Yes, we know it's down; we have a support ticket in with Oracle."
I ultimately gave up playing this game after spending hours testing and tuning. Each core is slightly different (on AMD; I'm sure Intel has something similar), and every tweak means a reboot followed by a 1-24h stress test. Then some new workload comes along and it's a kernel panic/BSOD.
Stock settings just work. Maybe I lost the silicon lottery, but I'm too tired to keep checking.
The problem isn't manufactured or conspiratorial; it's just baked into sorting so much content on so few metrics. And into needing to account for what the user is currently in the mood for: something specific, or something generic.
My point is that GoodReads isn't popular enough for it to be profitable to sabotage (yet), and there's still the threat of something more relevant coming along. If they actually wanted to improve discovery for something like Prime Video/shopping, they could/would copy what works from GoodReads.
When it's all in memory, you get to amortize the cost of the initial load, or just pay it when it's not part of the hot path. When it's segmented, you're doing that because memory is full and you need to read in all the segments you don't have. That will completely overwhelm the log n of the search you still get.
I was trying to make the point that the dominant factor becomes linear instead of logarithmic, but more accurately it's O(S log N) = O(N log N) because S (number of segments) is proportional to N (number of vectors).
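A toy cost model of that, assuming fixed-size segments (the 100k segment size is made up) so S grows linearly with N:

    import math

    SEG_SIZE = 100_000  # assumed fixed segment size, so S = N / SEG_SIZE

    def single_index_comparisons(n):
        return math.log2(n)  # one log-time search over everything in memory

    def segmented_comparisons(n):
        segments = n / SEG_SIZE
        return segments * math.log2(SEG_SIZE)  # a log-time search per segment

    # The per-segment log is bounded, so the total grows linearly with N
    # and dwarfs the single log2(N), before even counting the disk reads.
    for n in (10**6, 10**7, 10**8):
        print(n, round(single_index_comparisons(n), 1),
              round(segmented_comparisons(n)))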
Ah yeah, that's what I wanted to write, but I guess I didn't want to put words in your mouth, so I stuck to what I could be certain about happening. We do all this work to throw away the unneeded bits in one situation, and then when comparing it to a slightly different situation we go, "huh, some of that garbage would be kinda nice here."
It's not pure chance that the above calculus shakes out, but it doesn't have to be that way. If you're embedding on a word-by-word level, then it can happen; if the granularity is a little smaller or larger than word by word, it's not immediately clear what the calculation is doing.
But the main difference here is you get one embedding for the document in question, not an embedding per word like word2vec. So it's something more like "document about OS/2 Warp" - "wiki page for IBM" + "wiki page for Microsoft" = "document on Windows 3.1".
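A sketch of that arithmetic; embed() here is a stand-in that returns stable random vectors, since the analogy only actually holds with a trained document-embedding model:

    import numpy as np

    rng = np.random.default_rng(0)
    _cache = {}

    def embed(text):
        # Stand-in for a real document-embedding model: one stable random
        # vector per text. Demonstrates the arithmetic, not the semantics.
        if text not in _cache:
            _cache[text] = rng.normal(size=128)
        return _cache[text]

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    target = (embed("document about OS/2 Warp")
              - embed("wiki page for IBM")
              + embed("wiki page for Microsoft"))

    # With a real model, the nearest document to `target` would ideally
    # come back as something like "document on Windows 3.1".
    docs = ["document on Windows 3.1", "wiki page for Linux"]
    print(max(docs, key=lambda d: cosine(target, embed(d))))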
Yes, we chose not to release the watermark detector to safeguard against adversarial attacks. This decision helps prevent malicious users from attempting to erase the watermark.
The watermark generator and detector are trained together. You can use the information in our paper to train your own generator and detector models; however, in that case the watermark signature will be distinct from the one we use to protect our Seamless translation models. This approach ensures each model maintains its own security features.
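For intuition, a minimal sketch of jointly training a generator/detector pair; this is not the paper's actual architecture or losses, and every layer and coefficient here is made up:

    import torch
    import torch.nn as nn

    gen = nn.Conv1d(1, 1, kernel_size=9, padding=4)    # toy watermark embedder
    det = nn.Sequential(nn.Conv1d(1, 1, 9, padding=4), # toy detector
                        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                        nn.Linear(1, 1))
    opt = torch.optim.Adam(list(gen.parameters()) + list(det.parameters()),
                           lr=1e-3)

    for _ in range(100):
        audio = torch.randn(8, 1, 16000)        # stand-in audio batch
        marked = audio + 0.01 * gen(audio)      # keep the perturbation small
        logits = det(torch.cat([audio, marked]))
        labels = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
        # One loss trains both nets: detectability plus imperceptibility.
        loss = (nn.functional.binary_cross_entropy_with_logits(logits, labels)
                + 10.0 * (marked - audio).pow(2).mean())
        opt.zero_grad(); loss.backward(); opt.step()

Because the signature lives in whatever the pair converges on together, a pair you train yourself ends up with a different signature than theirs.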
These are interesting questions, but why should I pay you specifically?
It's running on my hardware, OpenAI made the model, and whisper.cpp keeps it updated. You legitimately solved a problem by making it easier to use, and you made a UI, but that's not something I'm willing to pay monthly for.
You're an unknown, and I have no idea if this will be kept up to date. I'd pay for an update if an OS release broke something or you came out with a new set of features. But again, you haven't added monthly value.