Cisco in 2000: Gross margin in fiscal 2000 was 64.4%, compared with 65.0% in fiscal 1999. The following table shows the standard...


Paul was mad that Gemini can't accurately reproduce the racist world of the 1800s or correctly draw Nazis. What's the big deal here?


It must be a big offer


I was going to say: question for GP, do you regret not taking the offer? Do you know how much the shares would be worth today if you had accepted?


Do you regret not plowing all of your money into DOGE in 2020? In the scheme of things you could have done to make money with hindsight, accepting a job offer barely ranks.


Unless you were weighing DOGE vs FTC or something and went with the latter, it's different? It's not just "oh, if I knew then what I know now, I could've made so much money"; it's "I almost made this decision, or could have, but explicitly decided against it."

I had a badly timed offer of something at Cloudflare pre-IPO (I don't really remember; it might just have been an interview, or contracting work, on the back of some OSS contribution), and I do occasionally regret/wonder what-if, because it's not just a random hypothetical: it was an actual opportunity I made a decision against. It doesn't help that it's still a company on my interested-in list, much like the trap in investing of liking a stock but wishing you'd bought it lower when you first thought of it, and so continuing to not buy as it climbs (or not even investing, necessarily, but just being a consumer in an inflationary environment).


Absolutely I do. And I could have sold when my aunt told me how the teachers had a Doge pool at work, and when the guy at the electrical counter was looking at his Doge and telling me about it. There are smart ways to invest in the silliest of endeavors.


Do you also regret not playing the winning lottery numbers, now that you know what they are (or can easily look them up)?


Absolutely, now that you mention it. In addition, every day there are millions of call/put option plays that, when I find out they did well, I will absolutely regret not making.


I pity you, as you are so full of regrets. Try to enjoy life without looking back so much.


Now I regret having lived a life full of regrets.


Every time my girlfriend buys lottery tickets I tell her to buy the winning ones, and every time she gets the numbers wrong. Every. Friggin. Time.


Write out a list with all the possible numbers, and have her get one of each before buying more. That way you can have better odds. The real issue will be how long it takes to buy them all.


You should try telling her to pick the wrong numbers. She’ll either be successful or rich!


The chances of making some money from stock in a Silicon Valley darling are not really comparable to having plowed money into a meme coin that was literally created as a joke.


Not comparable at all... the latter knows and admits it's a joke/money-grab and is not trying to pretend it's actually doing something good for the world.


You cannot take as many Silicon Valley stock bets (as an employee) as shitcoin bets. Therefore, making money from the latter is easier.


I actually agree with you. I think it's easier to make money on shitcoins than on employer stock.


No. There are plenty of well paying jobs that actually match my skills. I probably would have made a bit more at OpenAI, but at the cost of not enjoying my work, and probably not excelling at it.


The cheapest BYD cars sold in China go for about $14k USD. That's cheaper than a lot of EVs in the US, even after accounting for tariffs.


Thanks, but I would like a local LLM version. I don't want my email sent to any third party for analysis.


We plan to add this!


So spin up a local LLM with an OpenAI-compatible API and self-host Inbox Zero yourself, lol. It's open source.


I don't quite understand. Most of LLM inference is MatMul and Softmax. It's not rocket science to implement matrix multiplication on any GPU. We don't have very complex custom kernels for LLM inference. If AMD can provide fast GPUs at a cheaper price, a lot of people won't mind adding support for them. GGML shows you can create a working LLM inference solution with one person's work.
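For illustration, the naive kernel really is just a handful of lines. Something roughly like this sketch (untuned fp32, square matrices assumed, launch configuration omitted):

    // Naive matmul sketch: C = A * B for N x N fp32 matrices.
    // One thread computes one output element; no tiling, no shared memory.
    __global__ void matmul_naive(const float* A, const float* B, float* C, int N) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[row * N + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
    }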


> It's not rocket science to implement matrix multiplication on any GPU.

You're right, it's harder. Saying this as someone who's done more work on the former than the latter. (I have, with a team, built a rocket engine, and not your school or backyard project size, but the nozzle-bigger-than-your-face kind. I've also written CUDA kernels, and boy is there a big learning curve to the latter: you've got to fundamentally rethink how you view a problem. It's unquestionable why CUDA devs are paid so much. Really, it's only questionable why they aren't paid more.)

I know it's easy to think this problem is easy; it really looks that way. But there's an incredible amount of optimization that goes into all of this, and that's what's really hard. You aren't going to get away with just N for loops for a rank-N tensor. You've got to chop the data up, be intelligent about it, manage memory and how you load it, handle many data types, take into consideration different results from different FMA operations, and a whole lot more. There are a lot of non-obvious things that result in high optimization (maybe obvious __after__ the fact, but that's not truthfully "obvious"). The thing is, the space is so well researched and implemented that you can't get away with naive implementations; you have to be on the bleeding edge.
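To make that concrete, the textbook first step past the naive triple loop is tiling through shared memory, and even that already means real bookkeeping. A sketch (assuming square fp32 matrices with N a multiple of the tile size and no edge handling):

    #define TILE 16

    // Tiled matmul sketch: each block stages TILE x TILE tiles of A and B in
    // shared memory so each global load gets reused TILE times.
    __global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < N / TILE; ++t) {
            // Cooperatively load one tile of A and one tile of B per block.
            As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
            __syncthreads();

            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * N + col] = acc;
    }

And that's still nowhere near a cuBLAS-class kernel; the remaining gap (vectorized loads, double buffering, tensor cores, ...) is where the years of engineering go.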

Then you have to do all that and make it reasonably usable for the programmer too, abstracting it all away. CUDA also has a huge head start, and momentum is not a force to be reckoned with (pun intended).

Look at TensorRT[0]. The software isn't even complete and it still isn't going to cover all neural networks on all GPUs. I've had stuff work on a V100 and H100 but not an A100, then later get fixed. They even have the "Apple Advantage" in that they have control of the hardware. I'm not certain AMD will have the same advantage. We talk a lot about the difficulties of being first mover, but I think we can also recognize that momentum is an advantage of being first mover. And it isn't one to scoff at.

[0] https://github.com/NVIDIA/TensorRT


You're right. The hard part is managing the processing of your data as close to the metal as possible to optimize memory, data flow, and computation.

Coming from FPGAs (both Verilog and VHDL), to me CUDA offers the best way to handle all three in comparison to, say, OpenCL, Vulkan, or Metal.

For game developers this might be different, as there you almost always use high-level languages and packages.

AMD needs to radically rethink its strategy for its compute tool stack. If the hello-world MatrixMul is longer than ten lines and doesn't compile out of the box on every new AMD-powered notebook, they have no chance of beating NVIDIA. (I know that last point isn't a given for NVIDIA either, but it would give AMD the doubleplus good it's currently missing.)


> For game developers this might be different, as there you almost always use high-level languages and packages.

... I'm not sure what you mean by always using high-level languages and packages, but it doesn't sound right.

The reasons game developers use DX/VK/GL as opposed to CUDA are entirely: A. you want access to non-NVIDIA customers, and B. you need access to more aspects of the hardware than CUDA exposes. CUDA provides a nice interface to NVIDIA's compute, but almost no access to the rest of the fixed-function pipeline.


I was referring to DX/VK/GL and C++. From what I can see, at the lowest compute level ML libs often use pure CUDA in C or even PTX, especially when the code is generated via, say, Python.


Nothing in your comment explains why CUDA is so special compared to OpenCL or Metal.

Anybody who has worked with them knows they're basically the same.

Optimizing kernels for all architectures and all models is a hard problem, as there are a lot of cases to handle, but getting good utilization for a particular model on a particular architecture (like Mixtral on MI300) is not that hard.


> Nothing in your comment explains why CUDA is so special compared to OpenCL or Metal.

OP and OP's OP never framed it that way. But you can still get the answer you're looking for from their comments.


This comment is not going to age well. Out of curiosity, in this indictment of yours, is there any specific reason, about the AMD hardware itself, that leads you to believe such implementations don't already exist and are merely not open sourced?

I have a specific reason to believe this is all already optimized: poor adoption of Infinity Fabric by unsophisticated people, whereas NVLink is on every 3090 and DGX/HGX box. But that doesn't mean DC users haven't optimized.


I really do want to see AMD kick ass. But

1) The user above me thinks ML is just matrix multiplies, which it is far from.

2) It's more than hardware, and one team has (huge) momentum.

3) You can have better tech and not win. It actually happens relatively frequently.

There may be nothing better for the ML space than AMD being a meaningful competitor. It will give all of us better tools. It'll make AMD better, and it'll even make Nvidia better, because (healthy) competition is good for everyone. But let's not be naive in thinking it's just hardware and that one example is enough. It's a good sign, but a Tesla driving on the highway in nominal conditions isn't even close to full self-driving, which is still a year away, as it has been for a decade.

So I do hope this ages poorly. But I'm not holding my breath, because as much as AMD succeeding is one of the best things for ML, hype is the absolute worst and there's way too fucking much of it. And if it takes 5+ years to age, well, I wouldn't say it aged poorly.


In an area so dominated by performance, people will spend the time porting to get the performance. I think this is especially true on the training side, because that's where companies have engineers waiting for hardware.

Unfortunately, AMD only claims parity with Nvidia for training right now, and uplifts on inference. That's the wrong way round to get people to prioritize porting.


I don't think that comment was an indictment of anything. As someone also in the space, it reads to me as a thoughtful explanation of why catching up to the CUDA ecosystem is so hard, even when the goal is "just" matrix multiplication (it's actually not, but for the sake of a cleaner argument that's ok). It's entirely possible AMD or Intel or someone else has some super secret CUDA killer hidden away somewhere, but nobody has seen any evidence of it. It's a bit of an extraordinary claim, and the community would love to see some extraordinary evidence.


Everyone has been writing CUDA code. Hopefully we continue the shift toward higher-level frameworks/abstractions that can determine the optimal machine code for the underlying hardware. I don't really get it either.

Nvidia is pouring $$$ into protecting their advantage, as anemic as it might be. They give free compute and GPU resources to universities etc. to cement the dependence on CUDA. When you're a beginner or someone who needs to just "get shit done," you'll go Nvidia at the expense of the greater good.


What the hell is the greater good here? I'm a researcher at an R1 university. I can quickly try stupid things on my personal 3080, and then, if it works, I can basically git clone the code onto my lab's server and bam, it chops through the problems. After a few weeks I'll (hopefully) get some publishable results that (hopefully) advance my field a little. Do you really expect most researchers to spend half of their time debugging the unfinished ROCm stack that is just not there yet, sacrificing their output, and maybe even their tenure prospects? We love Nvidia solutions, at least in our field, not because it's a monopoly, but because it's actually the best solution right now.


I really like that you can prototype on your own consumer-level card. The ROCm stack has not matched that historically, and has overpromised and underdelivered for years now.


It's getting closer, and, well, what matters is price points and the billions of people crunching away on the various problems.

There's huge momentum on the "user" side of the equation, such that just getting cheap hardware and passable performance would mean Nvidia losing dominance.


>Nvidia is pouring $$$ into protecting their advantage, as anemic as it might be.

This description isn't fair. Nvidia has been pouring money into CUDA for 16 years. AMD/Intel only woke up once ChatGPT launched and Nvidia started printing money. It's not fair to call it "protecting their advantage" when Nvidia invested in the ecosystem for years and AMD couldn't even bother to ship working examples with OpenCL. I'm not sure why choosing the company that decided to invest in the platform for years comes at the expense of the "greater good." What "greater good" is there above actually working software?


Nvidia had boatloads of money to spend, and 6X the number of engineers. AMD was skating on the edge of bankruptcy. AMD bet the farm on Zen, and that bet paid off so now they have money to spend on other things.

If they had split resources off of Zen to spend on a CUDA competitor, they would have failed at both and AMD would be a bankrupt husk.


You can get a consumer laptop with AI acceleration: https://www.amd.com/en/products/apu/amd-ryzen-7-7840hs

The thing AMD doesn't have is the software stack.


Also, the full consumer (RTX) stack is supported. Lots of people who are gamers or want a hobbyist rig can go out to the shops and start with an RTX card.

AMD's ML / ROCm support for their consumer cards is still awful.


They did add official support for the latest-gen RDNA cards to ROCm.


Not on Linux!



Why only the latest gen?


Because they want to recover their investment


Actually, it's because ROCm ships native machine code, as opposed to bytecode, so supporting any additional architecture is a ton of effort.

I expect support will eventually be extended to a generation or two of prior GPUs, but nothing beyond that.


> They give free compute and GPU resources to universities etc. to cement the dependence on CUDA.

It's actually quite brilliant. It never occurred to me before that the resources we had access to at university weren't the most popular because that's what the library wanted. They were the most popular because the provider made them cheap/free to widen their moat over the competition.

It reminds me of Thomson Reuters launching Eikon, a superior alternative (imho) to Bloomberg terminals. Every business school has Bloomberg terminals. But few/none had Eikon, which is why nobody knows how to use one.


It's the same reason that students get free MSDN access and unlimited Windows keys.

It's a pretty solid strategy, but naturally favors the current incumbent who has the $$$ to throw around.


Try Hidet, an open-source deep learning compiler written in Python that generates optimized CUDA: https://pypi.org/project/hidet/


The obvious answer is to just let the LLM bootstrap the use of better abstractions.


Inference-only hardware like this could be a temporary cost-saving solution for scaling up AI infrastructure, but I think a chip this size should be able to support training; otherwise it's just a waste of money and energy. I predict inference will move to edge computing based on mobile chips like the ones from Qualcomm in the medium term.


This. When I was still doing infra for a large recommendation engine, the training was what got the big GPUs. Inference was on CPU or (later) ASICs (may have been FPGAs?).


Unless you have a model that's good enough for your use case and it's mostly inference from here on.


> I don't quite understand. Most of LLM inference is MatMul and Softmax.

Most operations are memory bound: slow because memory access takes longer than the computation itself. The problem is that we have to pull the whole model, billions of weights, through on-chip SRAM once for every token generated. That's what creates the slowness.
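A back-of-the-envelope sketch of that bound (the numbers are assumptions: roughly a 7B-parameter model in fp16 on a GPU with about 1 TB/s of memory bandwidth):

    #include <stdio.h>

    // Rough decode-speed ceiling for a memory-bound LLM at batch size 1.
    // Assumed numbers: 7B parameters in fp16 (2 bytes each), ~1 TB/s bandwidth.
    // Every generated token has to read essentially all of the weights once.
    int main(void) {
        double model_bytes   = 7e9 * 2.0;   // ~14 GB of weights
        double bandwidth_bps = 1e12;        // ~1 TB/s
        double max_tok_per_s = bandwidth_bps / model_bytes;  // ~70 tokens/s
        printf("upper bound: ~%.0f tokens/s\n", max_tok_per_s);
        return 0;
    }

Batching helps precisely because one pass over the weights can be shared across many sequences, but for a single stream the bandwidth, not the FLOPs, sets the ceiling.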


Really cool!


I tried using Sakura LLM to translate some JP novels. It's really good, and half the price of GPT-3.5 Turbo.

https://github.com/SakuraLLM/Sakura-13B-Galgame/tree/dev_ser...


It seems that it only supports Japanese to Chinese.


You should at least have some basic idea of how your stack works.


Agreed. But also, if you suddenly find that the precise behavior of one part of your stack matters, you'd be well advised to search the internet for how exactly that bit works in practice and whether there are any non-obvious footguns, in addition to doing your own empirical testing and reading what the manual claims is true.


Why not store them in code? Easier to read and source controlled.


We have a bit of context about this in the readme: https://github.com/lastmile-ai/aiconfig#what-problem-it-solv.... The main issue with keeping it in code is that it tangles application code with prompts and model-specific logic.

That makes it hard to evaluate the genAI parts of the application, and also iterating on the prompts is not as straightforward as opening up a playground.

Having the config be the source of truth lets you connect it to your application code (while still keeping it source controlled), lets you evaluate the config as the AI artifact, and also lets you open the config in a playground to edit and iterate.

For example, compare how much simpler OpenAI function calling becomes when you store it as a config: https://github.com/lastmile-ai/aiconfig/blob/main/cookbooks/... vs. using vanilla OpenAI directly (https://github.com/openai/openai-node/blob/v4/examples/funct...)


Yeah, looking at this just briefly it might be the wheel I've (pre)-invented like 1/3 of for one of my couch projects. Definitely can see the appeal of a conventional format here, I'll check it out in more detail when time permits!


Please definitely let me know when you get a chance to try it out! The readme has a link to our discord too if you want to get in touch (and you can email me directly too)


I mean don't get your hopes up too far, it's gonna be a minute :D But I'll pass along what I find when I can!


Thank you. I think most things we store in config are non-business logic, like a database connection string or some feature flags.

However, the prompt is your business logic in most cases, and putting your business logic into a separate file makes it harder to read and harder to maintain.

