The thing I like most about the current AI wave is the pressure it's putting on computing hardware. Yes, mobile phones with long battery life are cool and all that, but most of the cool things I care about are locked behind huge computational requirements.
Agree. I work in robotics and we never have enough compute. I want to see us get to the point where the most advanced robot ever has all the compute it needs onboard, and that means huge growth in compute density and efficiency is needed.
A common example from my robotics experience (mainly mobile robots) has been getting something powerful enough to run our image recognition / sensor interpretation. We often have several microcontrollers (think: Arduino equivalents running C++ or C) handling all the motor control etc., and a high-level system (it used to often be a Raspberry Pi, now more often an NVIDIA Jetson Nano) listening to all of those and spending most of its computing power on some kind of sensor data, usually image recognition or processing TOF camera/lidar/radar data. We often have to optimise hard to get a couple of cycles or "frames" per second, which really limits how robots respond (a 250 ms delay is veeeery noticeable, especially if it's in obstacle avoidance, which is relatively common).
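To make that budget concrete, here's a toy sketch (all names and timings are mine, purely illustrative, not from any real robot): if the perception step alone regularly eats most of a ~250 ms cycle, there's nothing left for planning or control.

    # Illustrative only: a toy perception loop showing how quickly a per-frame
    # compute budget disappears. All names and timings here are made up.
    import time

    FRAME_BUDGET_S = 0.250  # the ~250 ms delay that feels "veeeery noticeable"

    def process_frame(frame):
        """Stand-in for the expensive part: image recognition, TOF/lidar processing, etc."""
        time.sleep(0.1)  # pretend the model takes ~100 ms per frame
        return "obstacle_map"

    def perception_loop(frames):
        for frame in frames:
            start = time.perf_counter()
            process_frame(frame)
            elapsed = time.perf_counter() - start
            status = "OK" if elapsed < FRAME_BUDGET_S else "OVER BUDGET"
            print(f"frame {frame}: {elapsed * 1000:.0f} ms ({status})")

    perception_loop(range(3))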
Limiting ourselves to the onboard compute available on mobile robots is one thing, but even for fixed-installation robots, say an arm in a factory where space and power aren't limited, we're very much still limited by compute capacity. Trying to use robots to do something as simple as folding clothes still can't be done at a reasonable speed. Yeah, on a personal level, just buck up and spend the 20 minutes folding your clothes, or hire a maid to do it for you, but folding clothes with a robot is a stand-in for other tasks in industry that we still can't automate, and still have to hire humans for, because the complexity is too high for our current computing power.
Researchers at UC Berkeley came out with an algorithm they named SpeedFolding in October of last year. Watch https://youtu.be/UTMT2WAUlRw?t=511 and then realize that the linked excerpt is sped up 9x.
If we had 9x faster compute we could have laundry folding robots which is one thing, but that amount of compute would enable robots to do tons more tasks in industry.
Robotics is a double whammy: you have compute problems, but you also have actuation problems.
Getting robots to move quickly is easy; getting them to move quickly to exactly where you want them, or with exactly as much force... that is much, much more difficult. Double for mobile robots where you don't have a good energy source. If cost is an issue that is another dimension -- powerful and accurate actuators are extremely expensive.
I don't work in the field but just to kind of put it into perspective, a 12v 100A LiFePO4 battery has 1200 Watts capacity and weighs 30 pounds. A typical gaming PC (which to be fair, is more willing to trade power for performance) consumes about 600 Watts per hour. Problem for a Tesla? Not so much. Problem for a lightweight drone? Definitely.
"Watts per hour" implies watts/hour. "Watt-hour" implies a number of watts multiplied by a length of time, also known as energy. Watts are power; watt-hours are energy. Two different things. Watts/hour is nothing.
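For the sake of the example above, redone in consistent units (numbers reused from that comment, not a real power budget):

    # Illustrative runtime math in the correct units:
    # watt-hours are energy, watts are power, runtime = energy / power.
    battery_voltage_v = 12.0
    battery_capacity_ah = 100.0
    load_power_w = 600.0  # the "gaming PC" draw from the comment above

    energy_wh = battery_voltage_v * battery_capacity_ah  # 1200 Wh of stored energy
    runtime_h = energy_wh / load_power_w                 # 2.0 hours at a steady 600 W
    print(f"{energy_wh:.0f} Wh / {load_power_w:.0f} W = {runtime_h:.1f} h of runtime")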
The NVIDIA Jetson boards are popular, but even with a full desktop processor and a state-of-the-art GPU, you can easily drown them in data from a LIDAR sensor or a few cameras. Especially since robots may also need fast response times.
There is another reply to your comment that matches a lot of what I have experienced. You have so many pieces of code that need to run, a good handful of them working on something like LIDAR point clouds with a million 3D points, plus some cameras running several different image recognition and segmentation algorithms, and you want fast cycle times on top of that: it all adds up. Every serious robot I have ever worked on was maxing out its system, even ones at Google X with a full desktop CPU, a high-end NVIDIA graphics card, and a couple of secondary ARM CPUs.
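A back-of-the-envelope sketch of why that adds up (my numbers, not measured on any particular robot):

    # Why a single lidar plus a few cameras can saturate even a desktop-class system.
    points_per_scan = 1_000_000   # "point clouds with a million 3D points"
    bytes_per_point = 16          # e.g. x, y, z, intensity as float32
    scans_per_second = 10

    cameras = 3
    bytes_per_image = 1920 * 1080 * 3  # one uncompressed RGB frame
    frames_per_second = 30

    lidar_mb_s = points_per_scan * bytes_per_point * scans_per_second / 1e6
    camera_mb_s = cameras * bytes_per_image * frames_per_second / 1e6
    print(f"lidar ~{lidar_mb_s:.0f} MB/s, cameras ~{camera_mb_s:.0f} MB/s, before any processing")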
Definitely a good thing, but FYI it hasn't been profitable/feasible to mine Bitcoin (SHA-256) on GPUs for many, many years, as ASIC-based miners have completely taken over. I've talked about it plenty on HN, but any way you slice it, crypto is an unbelievable waste of resources in every possible way regardless.
What really (finally) more or less killed GPU mining was the Ethereum move to PoS (Proof of Stake).
Bitcoin has a use, but there are other consensus algorithms that don't waste as much energy as the citizens of a medium-sized country and fill the same use case (and other expanded use cases). Why not just do that?
Strictly speaking, the comment you're replying to doesn't say which of the two is a contest to waste resources and which has potential to have useful results.
In general, the purpose of Bitcoin is not to get rich, but to have a currency that is universally accepted and not tied to a political party’s fiscal decisions.
Only if you really like big numbers for the sake of them. Otherwise, one is just straight up snake oil[0], and the other… is kinda hard to tell yet, because while I'm really impressed, I don't know if it's {a toy, a tool, the first sign of a major transformation}.
[0] Did you know the original snake oil contained more omega-3 than lard and therefore improved cognitive function by comparison? I did not. But you can get omega-3 elsewhere, and the people who made the term synonymous with fraud didn't use those snakes, so…
or games. People could have been studying or doing something more important than wasting time and energy.
I get that it's entertainment, but so are board games, and those don't require mining rare-earth minerals or putting pressure on the grid, since you can always play board games by candlelight.
I'm of two minds on this. I'm not a gamer so part of me thinks gaming is a complete waste of time and resources. Then again, the same could be said about almost any hobby/pastime.
That said, gaming is what gave us GPUs (which have been developed for gaming over the course of decades), so that we can now use them for more interesting and "productive" applications.
So, for me, in the end I'm happy the PC gaming industry and user base has been pushing GPU capability.
Be careful. The gaming industry has successfully conditioned people into believing they need a $1500 GPU with the TDP of a microwave so they can play the next unfinished-at-release AAA title.
One day we'll find out that all of the VR, crypto, and maybe now AI bubbles were nothing but conspiracies being driven by big-GPU to keep their share price up.
VR has been a godsend for forcing hardware, OS and driver developers to actually pay attention to jitter and max latency. If crypto means we get nice fast pretty games and fancy AI then I’m for it. :)
Charlie Stross (cstross on here) had a fun blog post[1] about this phenomenon just a week and a half ago.
> As for what you should look to invest in?
> I'm sure it's just a coincidence that training neural networks and mining cryptocurrencies are both applications that benefit from very large arrays of GPUs. [...]
> If I was a VC I'd be hiring complexity theory nerds to figure out what areas of research are promising once you have Yottaflops of numerical processing power available, then I'd be placing bets on the GPU manufacturers going there
For most of the 20th century, the bulk of the energy humanity was able to extract went into industrialization. Now it seems that a vast share of the energy being extracted will go towards computation.
I doubt it, frankly. Computation consumes a lot of energy, true, but it is dwarfed by how much energy we use in transportation and food production. Energy use per capita in most of the Global North is about 75,000 kWh per year.
Yeah, but every time we discover a new interesting thing to do with computers, the requirements go up by several orders of magnitude. How many more orders of magnitude of energy can we spend on food production from here, given current projections of world population peaking around 2100?
There's an economics theory that "supply creates its own demand". We wanted to do ML, GPUs were around for games, we repurposed them for ML, and ML architectures that benefit from GPUs won the "hardware lottery" (an influential paper from Sara Hooker, in case you're unaware).
It sounds like you are complaining about capitalism :-)
It's not so bad. Nvidia could come along and say, "Hey, I'm going to lock down your GPU so that you can only use it to render polygons in my whitelisted list of video games, and then you pay us $$$$$$ to buy our 'datacenter' thingy for anything else." But if they did that, people would go and buy the competitor's product.
And yes, some of their 4090s are probably being bought by rich kids with their parents' money, but I reckon most of them are sales to professionals, people who would justify their purchase with more than playing first-person shooters. I, for example, play video games with my gf, and we have equivalent GPUs. Hers is AMD and costs less than mine, even though it does the same, but I went for Nvidia so that PhysX would be available and I could use PyTorch and Numba+GPU and even C++ CUDA. The moment Nvidia locks that down, I'll have to switch to AMD.
Reminds me of this Choose Your Own Adventure book from 1984. It was about how PCs had organic AI components and each was unique, and you happened to get your hands on a super intelligent one.
Unlike Stable Diffusion, I don't stumble upon people who actually use it. Are there examples of the output this can generate? What happens once you manage to run the model?
I've been playing around with LLMs recently and it's definitely interesting stuff. I've mostly focused on roleplay/MUD applications and it's not quuitteee there, but it's pretty good, and its idiosyncrasies are often hilarious.
(When fed the leaked Bing prompt, my AI decided it was Australian and started tossing in random shit like "but here in Australia, we'd call it limey green" when asked about chartreuse, I assume because the codename for Bing Chat is 'Sydney'.)
I have been using similar LLMs to help draft fictional stories. The community fine-tuned models are geared towards SFW and/or NSFW story completion.
https://koboldai.net/ is a way to run some of these models in the "cloud". There's no account required and the prompts are run on other people's hardware, with priority weighting based on how much compute you have used or donated. There's an anonymous api key and there's no expectation that the output can't be logged.
The models that run on local hardware are very basic in the quality of their output. Here's an example of a 6B output used to try to emulate ChatGPT: https://mobile.twitter.com/Knaikk/status/1629711223863345154 The model was fine-tuned on story completion, so it's not meaningfully comparable.
It's less popular because the hardware required for great output is still above top-of-the-line consumer specs. 24 GB of VRAM is closer to a bare minimum to get meaningful output, and fine-tuning is still out of reach. There's some development around using services like RunPod.
Search for "aicg" or visit https://boards.4channel.org/g/catalog#s=aicg to see the AI Chatbot General thread (a new one is created every time the previous one hits the reply limit).
That's the oldest authoritarian trick in the book; pretty much every successful business in Russia met the same fate, for example. They even tried it with nginx.
I've used LLMs a lot for filling out details in my D&D worlds. Both OpenAI products but also the open-source GPT-J from EleutherAI. Things like writing the text of some books for players to read, which I have to curate just like people do with Stable Diffusion. I've also used it to write songs; it's surprisingly good at taking things like chord progressions written in notation and rolling with variations on them.
It's useless before the model gets instruction and preference tuning. It won't even follow a simple request; it will just assume it's a list of questions and generate more, or continue with loosely related comments.
FB trained a LLaMA-I (instruction-tuned) variant for sport, just to show they could, but I don't think it got released.
I don't know about this fork specifically, but in general yes absolutely.
Even without enough RAM, you can stream model weights from disk and run at [size of model / disk read speed] seconds per token.
I'm doing that on a small GPU with this code, but it should be easy to get it working with the CPU doing the compute instead (and at least with my disk/CPU, I'm not even sure it would run any slower; I think disk reads would probably still be the bottleneck).
A lack of an absurd number of CPUs just means it's slow, not impossible.
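A quick sanity check of that [size of model / disk read speed] figure, with made-up numbers (plug in your own model size and drive):

    # Both numbers are assumptions, purely illustrative.
    model_size_gb = 13.0    # e.g. a 7B model stored in fp16
    disk_read_gb_s = 2.0    # a mid-range NVMe drive

    seconds_per_token = model_size_gb / disk_read_gb_s
    print(f"~{seconds_per_token:.1f} s per token if every weight is re-read from disk")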
Yeah, I find this area fascinating. Like, it's very cool to run a 7B params model locally, but it must feel like a toy when compared to ChatGPT, for example.
However, the 65B-parameter model, according to the benchmarks, is such a beast that you might be able to do some things with it that are not possible on ChatGPT (despite all of ChatGPT's quality-of-life features). Amazing times.
You don't need 256 GB. A pair of the new 48GB DDR5 sticks along with a pair of 32GB sticks should work in a consumer DDR5 motherboard to fit the weights. It does burst when initially loading, so a fast disk with swap about the same size as RAM seems necessary. It took about 25 mins to generate a single 500-character response using a 5800X & 32 GB DDR4, but I was not able to get it to run on more than 1 thread with the 7B model.
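The rough math behind that RAM sizing (fp16 weights assumed; quantized models would need less):

    params_billion = 65
    bytes_per_param = 2                            # fp16
    weights_gb = params_billion * bytes_per_param  # ~130 GB of raw weights

    ram_gb = 2 * 48 + 2 * 32                       # the mixed DDR5 kit described above = 160 GB
    print(f"weights ~{weights_gb} GB vs {ram_gb} GB of RAM: fits, with some headroom")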
Current Ryzen CPUs don't work with 48GB DDR5 modules, right?
That means if you want to go beyond 128GB you can get an old X399 board (there are some reports of people getting 256GB to work) or more recent Threadripper boards.
I tried mark's OMP_NUM_THREADS suggestion (https://news.ycombinator.com/item?id=35018559) and did not see any obvious change toward making it parallel, and the huggingface patch (https://github.com/huggingface/transformers/pull/21955), once it gets in, is supposed to allow streaming from RAM to the GPU. So for me it was not worth the effort to keep working on the CPU version, as even the best-case ~30x speedup would still take around a minute to run the 7B.
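For anyone curious, this is roughly what that thread-count suggestion amounts to (my guess at the exact calls; in my case it made no obvious difference):

    # The env var must be set before torch initializes its thread pool, i.e. before the import.
    import os
    os.environ.setdefault("OMP_NUM_THREADS", "16")  # match your physical core count

    import torch
    torch.set_num_threads(16)         # intra-op parallelism (matrix multiplies etc.)
    torch.set_num_interop_threads(4)  # inter-op parallelism between independent ops
    print("intra-op threads:", torch.get_num_threads())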
I wonder if we'll start to see complex pruning functions and tools pop up.
So before you start a task, you sort of describe the domain, and the model is split into the one-third most useful and relevant to that topic/query and the two-thirds most distant from that realm. Then either just that one-third is used on its own, or the split works as two layers of cache, one in RAM and one on disk.
A token is approximately 4 characters. So, four characters per 8 minutes is pretty slow. This comment would take 1224 minutes to generate, if I was an AI.
Conda is gonna work much, much, much better for these kinds of applications, as that's what it's mostly used for, i.e. scientific/numerical computing with C/C++ dependencies.
Conda is an abomination that will download 4Gig of unnecessary shite and carelessly dump it into your system, thereby ruining your existing configuration in the process.
Use it in a container or a VM unless you enjoy re-installing your system from scratch.
Or better still, don't use it at all and let it wither away: these kinds of braindead projects need to be put down with extreme prejudice.
The download size is large but conda doesn't ruin any existing configuration unless you explicitly tell it to be your native python environment. Conda is set up as a self-contained set of independent environments. Why would your system care what's inside the Anaconda directory unless you explicitly add it to your PATH/bash?
I haven't touched that steaming pile of shite in a looong while, so - who knows - they might have managed to minimize the amount of havoc they wreak on their users' systems.
But ... I seem to recall ... Conda tries to install GPU drivers, does it not? ... Is that not the case anymore?
Because if it still does, your theory about "Why would your system care" and all that doesn't really hold water.
Chill. Use Miniconda3 as a light alternative. The full Anaconda distribution is unnecessary. I agree nobody should ever use it unless they are an extreme noob. We all have to start somewhere.
Looks like this is just tweaking some defaults and commenting out some code that enables cuda. It also switches to something called gloo, which I'm not familiar with. Seems like an alternate backend.
Lol, all my best work has been when I don’t know what I’m doing and it’s refreshing to see someone moving the ball forward and feeling the same way. Kudos
Usually you can trivially have the model run on CPU or GPU by simply writing .cpu() in specific places, so he's wondering why that isn't the case here.
that's literally all I did (plus switching the tensor type). I'd imagine people are posting and upvoting this not because it's actually interesting code but rather just because it runs unexpectedly fast on consumer CPUs and it's not something they considered feasible before.
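A generic PyTorch illustration of that pattern, with a stand-in module (not the actual code from this fork):

    import torch
    import torch.nn as nn

    model = nn.Linear(4096, 4096)  # placeholder for the real model
    model = model.cpu().float()    # move to CPU and use float32 (fp16 is slow/limited on CPU)

    x = torch.randn(1, 4096)       # a dummy input batch
    with torch.no_grad():
        y = model(x)               # runs on CPU, no CUDA required
    print(y.shape)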
That is vastly underestimating how tricky it is to make novel pieces of software run. There is a huge fringe of people who know how to click things but not use the terminal and a large fringe of people who know how to run "./execute.bat" but not how to write syntactically correct Python.
PyTorch is probably very optimized for x86. It's likely using lots of SIMD and whatnot. I'm sure it's possible to get similar performance on M1 Macs, but not with the current version of PyTorch.
Oh yes:
Hackernews senator: "Someone on the internet said Meta, aka Facebook, is not considered a real data-native, clean-coder and high-IQ company unless your new language model exceeds the elegance and slipperiness of Mark Zuckerberg's (your) language output in Senate hearings. He is smoother than a lake in the metaverse."
Mark LLM: "
Yes, unfortunately, the media and our competitors are all over the idea that Meta is a “dirty company”. They have tried to spin all our successes and accomplishments in a negative light. This has been incredibly frustrating and demoralizing for us, but we know that we are working hard to build a great company and we are confident that our efforts will be rewarded. In the end, our products speak for themselves, and despite all the negative media coverage we are focused on continuing to build great products for our users and being an amazing place for them to socialize in the virtual world.”
I have to say "he is smoother than a lake in the metaverse" is presumably accidental, based on the quality of the rest of that text, but it has to be one of the wittiest phrases I've seen LLMs come out with to date.
I tested on a decidedly above average CPU, and got several words per second on the 7B model. I'd guess maybe one word per second on a more average one?