AMD Alveo V70 AI inference accelerator card

kombine · on Jan 10, 2023

This is inference only. AMD should invest into the full AI stack starting from training. For this they need a product comparable to NVIDIA 4090, so that entry level researchers could use their hardware. Honestly, I don't know why AMD aren't doing that already, they are best positioned to do that in the industry landscape.

modeless · on Jan 10, 2023

The hardware is not their major problem. They have been failing super hard at the software side of machine learning for a solid decade now.

It seems like pure management incompetence to me. They need to invest a whole lot more in software, integrating their stuff directly into pytorch/TF/XLA/etc and making sure it works on consumer cards too. The investment would be paid back tenfold. The market is crying out for more competition for Nvidia and there's huge money to be made on the datacenter side but it all needs to work on the consumer side too.

gymbeaux · on Jan 10, 2023

AMD has finite resources, like any company, and they’ve been focusing on CPU/datacenter dominance, which to me is both the safer bet and the more-lucrative bet. It wasn’t that long ago that AMD was on the brink of bankruptcy (~2016), so I appreciate that they’re not trying to divide their attention.

Their attempts at entering the ML space so far have been failures, and they are wise to hold off on really competing with Nvidia until they have the bandwidth to go “all in”. Consciously NOT trying to compete with Nvidia is the reason they didn’t go bankrupt. Their Radeon division minted from 2016-2020 because they focused on a niche Nvidia was neglecting- low-end/eSports (also leveraging their APU expertise to win PS4/Xbox contracts).

I think Nvidia will eventually lose its monopoly on ML/AI stuff as AMD, Apple, Qualcomm, Amazon and Google chip away at their “moat” with their own accelerators/NPUs. As mentioned though, the Nvidia Edge really comes from CUDA and other software, not the hardware. I doubt that Apple, Qualcomm, Amazon or Google will be interested in selling hardware direct to consumers. They want that sweet, sweet cloud money and/or competitive advantages in their phones (like photo processing). I don’t want to be paying AWS $100/mo for a GPU I could pay $600 once for. I do think AMD/RTG will go hard on Nvidia eventually, and it will not matter whether you have an AMD or Nvidia GPU for Tensorflow or spaCy or whatever else.

cptskippy · on Jan 10, 2023

> AMD has finite resources, like any company, and they’ve been focusing on CPU/datacenter dominance, which to me is both the safer bet and the more-lucrative bet.

Cloud/datacenter based ML is a huge growth market. Having the same software work on a consumer GPUs and enterprise ML cards is one of Nvidia's competitive advantages.

lostmsu · on Jan 10, 2023

> AMD has finite resources, like any company, and they’ve been focusing on CPU/datacenter dominance

Why can't large companies tap the investment market? E.g. they could sell bonds to fund it, borrow, etc.

vosper · on Jan 10, 2023

Well they definitely can, and do. They have made the decision that the reward isn't worth the risk.

The idea (not aiming at you here, you didn't say this) that senior leadership at AMD is unaware of NVIDIA's lead in this space, and haven't repeatedly considered whether to invest in competing, is absurd. Likewise the idea that anyone outside of AMD understands better than AMD does what it would take in terms of investment _and opportunity cost_, is also absurd.

Senior leadership at AMD isn't dumb. The fact that they're not doing something we want doesn't make them dumb, either. Again, not aiming at you with this little rant :)

lostmsu · on Jan 10, 2023

They may not be dumb, but they may have narrow vision (which I suppose is still dumb). Bad decisions in top tiers of the largest world companies are not unheard of.

gymbeaux · on Jan 11, 2023

Yeah but no one can deny Lisa Su is an MVP and knows what she’s doing. She’s borderline overqualified for CEO of AMD IMO

lostmsu · on Jan 11, 2023

> but no one can deny Lisa Su is an MVP and knows what she’s doing

In her area. We have no real info on how proficient she is in other areas like ML.

gymbeaux · on Jan 13, 2023

I’m not sure that’s really a separate “area”. She understands ML as it pertains to hardware and software sales better than anyone in this thread.

ClumsyPilot · on Jan 10, 2023

> The idea that senior leadership at AMD is unaware.. Senior leadership at AMD isn't dumb.

Lets try this with another company:

The idea that leadership at Lehmon Brothers is unaware of the fact that they are trading subprime loans is absurd! The leadership isnt dumb

The Idea that leadership at Being is unaware of safety issues with 737 Max is absurd! How could you suggest that anyone outside boesing understands better than they do the risks involved?

vosper · on Jan 11, 2023

The fact that a couple of other companies have had dumb leadership in no way proves that AMD's is dumb. You're essentially claiming that because Lehman and Boeing had dumb leadership, all senior leadership at all companies is dumb (because there's nothing linking AMD to these other companies except that they're all companies). And that is an absurd claim.

ClumsyPilot · on Jan 11, 2023

I am not sure if they are dumb, the jury is still out, but your entire post is the textboom example of logical fallacy, appeal to authority.

You are claiming that AMD leadership is infalliable based on no evidence whatsoever. Thats whats absurd

modeless · on Jan 10, 2023

The idea that AMD management can't possibly have made any bad decisions is the absurd thing here. It's entirely possible that AMD carefully considered Nvidia's position, carefully considered their strategy, and confidently made the wrong decision. It happens all the time in all sorts of companies.

I think it's very clear with the benefit of hindsight that not investing enough into the software side of deep learning early on was a bad decision. But it was obvious to me even at the time and I said as much to anyone who would listen (e.g. seven years ago https://news.ycombinator.com/item?id=12258027)

_carbyau_ · on Jan 11, 2023

Maybe they are not dumb. But my next GPU will be nVidia because of their decision. Hence I am disappointed at not having a competitive rival.

I may not know better. But I know what I like.

BizarroLand · on Jan 12, 2023

Can you guess why they would be putting hardware into the ML workspace field but not backing that up with equivalent software integration?

hedgehog · on Jan 10, 2023

Last I checked they see deep learning training as a niche market, their strategy is to try to win big contracts (HPC etc) and then supply software specifically for that. Then "the community" will supply software. Having spent a bunch of time beating my head on this and related walls it's not clear to me that they're entirely wrong from an economic standpoint. Remember that 2/3 public cloud providers have their own chips as well as NVIDIA's so it would be tough to negotiate a good deal. As a user it's super irritating to be stuck on NVIDIA especially when Jensen gets up on stage to say "haha, Moore's law is over, stop expecting our products to get cheaper."

marcyb5st · on Jan 10, 2023

I hope they change their minds. At least now that generative models are becoming somewhat popular. I'd love to be able to get an AMD card to run generative models, but to the best of my knowledge, they only run on Nvidia hardware

nl · on Jan 10, 2023

No personal experience, but you can actually get Stable Diffusion to run on AMD cards.

It uses DirectML on Windows: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d... This is thanks to Microsoft, not AMD.

On Linux you can use ROCm: https://www.videogames.ai/2022/11/06/Stable-Diffusion-AMD-GP...

The horrible install processes and what a mess this is is all down to AMD.

hedgehog · on Jan 10, 2023

I don't have any experience with DirectML but it sounds promising.

hedgehog · on Jan 10, 2023

I wouldn't hold my breath, and anyway at this point NVIDIA has faster chips and more supported software all the way down the stack. My previous startup tried to solve some of these problems and we built what is as far as I know still the only reasonably complete device-portable deep learning framework. Today something like an RTX 3070 is a good budget option for small experiments and you can always lean on a cloud provider if you need more compute temporarily. Hard to beat a TPU pod when you're in a hurry.

dathinab · on Jan 10, 2023

> product comparable to NVIDIA 4090

no, they need a product good at training and gpu compute at a reasonable price

that product doesn't need to be good at rendering, ray tracing and similar

sure students and some independent contractors probably love getting both a good graphic card and a CUDA card in one and it makes it easier for people to experiment with it but company PCs normally ban playing games on company PCs and the overlap of "needing max GPU compute" and "needing complicated 3D rendering tasks" is limited.

through having 1 product instead of two does make supply chain and pricing easier

but then 4090 is by now in a price range where students are unlikely to afford it and people will think twice about buying it just to play around with GPU compute.

So e.g. the 7900XTX having somewhat comparable GPU compute usability then a 4080 would have been good enough for the non company use case, where a dedicated compute-per-money cheaper GPU compute only card would be preferable for the company use case I think.

nl · on Jan 10, 2023

Long time ML worker here. People work in one of 3 ways:

1) Consumer Nvidia GPU cards on custom PCs

2) Self hosted shared server

3) Cloud infrastructure.

There is no "GPU compute only card" that is widely used outside servers.

> company PCs normally ban playing games on company PCs and the overlap of "needing max GPU compute" and "needing complicated 3D rendering tasks" is limited.

The "don't play games thing" isn't a factor. Most companies just buy a 4090 or whatever, and if they have to tell staff not to play games, they say "don't play games". Fortnight runs just fine on pretty much anything anyway.

dathinab · on Jan 11, 2023

my point is a card bought for doing GPU computer not being able to work as a normal graphic card is not a problem

I'm aware that currently GPU compute only cards are not widly used outside of the server space.

But that's not because people need the consumer GPU features (bedsides video decode) but because the economics of availability and cost lead to consumer GPUs being the best option (and this economic effects don't just apply to customers but also Nvidia itself).

hedgehog · on Jan 10, 2023

Big question is why? Would competing for entry level researchers buy them much?

kombine · on Jan 10, 2023

Because entry level researchers shape the industry in the long term. I'm in academia, I worked at two universities and I have not seen a lab that uses non-NVIDIA hardware for research. Majority of graduates go to work in the industry.

hedgehog · on Jan 10, 2023

I used to believe this idea that the tools in academia would carry over to industry. Now I think it's only weakly true, that is there's not really a big barrier to switching to other options like TPU or maybe Trainium (haven't tried it). Supporting independent researchers gives you the Heroku problem, they may like your product but as they get more sophisticated and go to production they'll accept significant pain to save money, scale better, etc. You basically have to re-win that business from scratch and the technical tradeoffs are very different at that point.

wmf · on Jan 10, 2023

MI200, MI250, and MI300 should work for training. They don't have an exact equivalent to the 4090 but that may be setting the bar too high. Nobody can deliver everything that Nvidia has but better.

kombine · on Jan 10, 2023

It will be harder for a small academic lab to afford MI300, whereas everyone can purchase a few cards costing $1500. And even if I had money to buy MI300, I wouldn't - it is a too risky investment, because we have no idea how suitable they are for common AI research workflows. They need to lower the entry bar, so that people can try out their hardware. Even 80% of the performance of 4090 would be enough at an appropriate price point.

Hooray_Darakian · on Jan 10, 2023

> AMD should invest into the full AI stack starting from training.

https://www.amd.com/en/graphics/servers-solutions-rocm-ml

> For this they need a product comparable to NVIDIA 4090, so that entry level researchers could use their hardware.

Why is a high end product a requirement for entry level research?

kombine · on Jan 10, 2023

4090 (or 3090, 1080Ti and so on) is a high-end consumer GPU, but at the same time it is an entry level GPU for AI researchers. Don't forget that workstation cards (RTX 8000) let alone server-grade GPUs such as A100 are an order of magnitude more expensive.

gymbeaux · on Jan 10, 2023

I was doing ML stuff on a GTX 1060 a few years ago. As with everything, it depends on what you’re doing.

p1esk · on Jan 10, 2023

Chances of you publishing something in ML improve proportionally to the amount of hardware you have access to. Or to put it another way, the less hardware you have, the smarter you have to be to publish something in ML.

DeepYogurt · on Jan 10, 2023

That does not sound like "entry level" research to me.

UncleEntity · on Jan 11, 2023

IDK, I want to train up some AI and have the choice of using google colab or buying a GPU that can do it within a reasonable timeframe.

Not going to be spending $10k like the tortoise-tts guy (was looking into that project last night) but $2k might be doable for a hobby project. Plus I’d have a computer at the end.

marcyb5st · on Jan 10, 2023

Because of VRAM. Even a simple model for language can easily max out a 4090.

Also, ROC-M is a bit of a mess to setup. With Nvidia i just need to install cuda, cudnn and then pip install tensorflow/pytorch.

jzwinck · on Jan 10, 2023

Because high end research uses a fleet of them, not just one.

wyldfire · on Jan 10, 2023

AMD XDNA – Versal AI Core / 2nd-gen AIE-ML tiles

Are these programmable by the end-user? The "software programmability" section describes "Vitis AI" frameworks supported. But can we write our own software on these?

Is this card FPGA-based?

EDIT: [1] more info on the AI-engine tiles: scalar cores + "adaptable hardware (FPGA?)" + {AI+DSP}.

[1] https://www.xilinx.com/products/technology/ai-engine.html

derefr · on Jan 10, 2023

It's very likely FPGA-based; Xilinx is an FPGA company. This is being pitched as an "AI accelerator", but "Alveo" as a product line existed before AMD's acquisition of Xilinx, and other "Alveo" products exist (https://www.xilinx.com/products/boards-and-kits/alveo.html) that are marketed for other purposes, while really just being Xilinx FPGAs pre-programmed to perform specific other tasks, with some domain-specific DSPs + interconnects around the edges.

It's possible that AMD could have reworked an existing Xilinx design to incorporate RDNA chiplets in place of some of the FPGA-gate-grid chiplets, creating a heterogeneous mesh; but I find it just as likely that AMD just took their VLSI for an RDNA core and loaded it onto the existing FPGA.

typon · on Jan 10, 2023

It's not a traditional FPGA chip (lots of luts and flip flops). The "AI Engine" is basically hardened chiplets that are working alongside soft logic chiplets and I/O. This is how they're able to get their performance/power numbers

YakBizzarro · on Jan 10, 2023

I suspect that it still has some fpga fabric attacched to the ai engines. The two parts are separate, but according to Xilinx docs (talking about Versal Soc), they are supposed to work togheter

YakBizzarro · on Jan 10, 2023

Interesting. It seems then that the xdna architecture in the Ryzen 4070 is nothing more than a port of the existing Xilinx Versal cores (fpga+ai engine)

wmf · on Jan 10, 2023

That's what AMD said.

djmips · on Jan 10, 2023

Says RDNA based which is AMD's GPU tech.

mastax · on Jan 10, 2023

No, it says XDNA.

djmips · on Jan 12, 2023

My mind accidentally translated that wrong. Thanks!

Roark66 · on Jan 11, 2023

If this is based on fpga tech (xilinx) I don't think it will have a cost/benefit edge over asics. Why not do their own TPU like Google did? Nowadays even embedded MCUs come with AI accelerators (last I heard was 5TOPS in a banana pi-cm4 board - that is sufficient for object detection stuff and perhaps even more).

westmeal · on Jan 10, 2023

No price is listed on their site so I'm assuming its gonna be stupid expensive, but if anyone knows would you mind posting?

dragontamer · on Jan 10, 2023

Previous AMD/Xilinx Alveo are in the $5000 to $20,000 range USD. I'd assume somewhere around there, or maybe even a bit higher.

EDIT: 75W is a smaller card than I expected. "Inference" also usually means "cheaper". so maybe we can be optimistic with $5000-ish ??

Anyone shocked by the price, remember that this is an FPGA-line from Xilinx. Not a GPU from Radeon. Expect very high prices.

dhruvdh · on Jan 10, 2023

It’s 1,995$ - I tried to order when it was announced.

dhruvdh · on Jan 10, 2023

I went through the checkout flow earlier, it was 1,995$ pre tax.

wyldfire · on Jan 10, 2023

The price is shown as $1,995.00 + tax&shipping for A-V70-P16G-ES3-G.

ChuckNorris89 · on Jan 10, 2023

Probably because it's a product aimed at datacenters and cloud providers who work directly with Xilinx/AMD to develop it, so they already know the price.

mcilie · on Jan 10, 2023

It costs 1995 if you look at the "order now" section

messe · on Jan 10, 2023

> High-Density Video Decoder**: 96 channels of 1920x1080p

> [...]

> **: @10 fps, H.264/H.265

Is 10 fps a standard measure for this kind of thing?

novaRom · on Jan 10, 2023

10 fps should be fast enough to provide input tensors for real time inference with small scale transformers / convolutional nets.

scottlamb · on Jan 10, 2023

Maybe. Running inference at 10 fps is probably plenty. But that doesn't mean you only have to do 10 fps of H.264/H.265 decoding. I think the most common scenario is for the input video to be e.g. 30 fps with mostly P frames that each depend on the prior frame in a chain. In that case, you need to decode almost [1] 30 fps to get 10 fps of evenly spaced frames to process.

[1] You could skip the last P frame before an IDR frame, but that doesn't buy you much.

zamadatix · on Jan 10, 2023

If your source is 96 YouTube videos sure, if it's 96 CCTV cameras it's different.

scottlamb · on Jan 10, 2023

Still depends. As it happens, I'm developing my own open source NVR software, [1] so I know a bit about this. Some cameras are fairly good about this, supporting the following features:

* "Temporal SVC", in which the frame dependencies are structured so you can discard down to 1/2 or 1/4th of the nominal frame rate and still decode the remainder.

* Three output streams, which you could configure for say forensics (high-bandwidth/high-resolution/high-fps), inference (mid-bandwidth/mid-resolution/low-fps), and viewing multiple streams / over mobile networks (low-bandwidth/low-resolution/mid-fps).

* On-camera ML tasks too. (Although I haven't seen one that lets you upload your own model.)

But other cameras are less good. E.g. some Reolinks [2] only support two streams, and the "sub" stream is fixed at 640x352, which is uncomfortably low. Your inference network may not take more resolution than that, but even if not, you might want to crop down to the area of interest (where there's motion and/or where the user has configured an alert) to improve quality. (You probably wouldn't pair that cheap Reolink camera with this expensive inference card, but the point stands in general.)

Even the "better" cameras' timestamp handling is awful, so it's hard to reliably match up the main stream, sub stream, analytics output, and wall clock time. Given that limitation it'd be desirable to just use the main stream for everything but the on-NVR transcoding's likely unaffordable.

[1] https://github.com/scottlamb/moonfire-nvr

[2] https://github.com/scottlamb/moonfire-nvr/wiki/Cameras:-Reol...

zamadatix · on Jan 11, 2023

Price wise if this card is $5,000 that's $52 per where you don't need any onboard smarts handled by the camera in a space where commercial cameras are hundreds of dollars to buy or replace to have the particular smarts you're looking for that day. I've done a few PoCs in the smart city/smart retail space they are advertising here and they pretty much end up falling into the "everything must be pre-processed as much as possible and sent to the cloud" or "everything must be dumb and sent to the central recorder" buckets as anything in the middle creates a bad cost balance where you're neither optimising hardware+simplicity costs or data+cloud costs. I'll admit though I don't normally go out to sell cameras all day it's just something we've added as clients in part of a larger connectivity rework (CBRS/LTE/Wi-Fi/GPON/traditional wired) and we typically partner up with some specialized company on the video processing use case. The onboard camera processing is usually about justifying a cloud pitch ("we use data to send video when something interesting happens" or "we send only the best picture of the face in HD to save bandwidth but still be able to ID them later") not so much letting you go in and solve your own problem. One exception I ran into was license plates at a car wash outfit where they were able to send the plate numbers back to their main app but that probably came from being a pre-baked solution for road tolls.

I also have a sneaking suspicion using lower channel counts let you raise the FPS but the max of 96 channels is the hard limit, tuned to allow up to use cases like recognition from unprocessed feeds but the documentation access seems to be a manual approval process so I can't verify for sure.

scottlamb · on Jan 11, 2023

> Price wise if this card is $5,000 that's $52 per where you don't need any onboard smarts handled by the camera in a space where commercial cameras are hundreds of dollars to buy or replace to have the particular smarts you're looking for that day.

Good point. At that scale, the price might make sense. (I'd still hesitate to buy this card, though. Based on experience with Amazon VT1 instances, I don't have any faith in Xilinx's software quality.)

There are much lower-cost solutions if you don't need that many cameras, e.g.:

* The Coral TPU is nice and cheap. I keep hoping to see a new version and/or someone making M.2/PCIe cards with several of these chips on it. It doesn't do the video decoding, though, so you need other hardware for that.

* There was an Axelera card just announced. [1] I'm curious to read the reviews when it actually ships to folks.

* The newer Rockchip SoCs advertise decent video decoding and some ML acceleration. I have one and will be trying it out sooner or later.

> The onboard camera processing is usually about justifying a cloud pitch ("we use data to send video when something interesting happens" or "we send only the best picture of the face in HD to save bandwidth but still be able to ID them later") not so much letting you go in and solve your own problem.

My software's more aimed at the home/hobbyist side of things. There some folks go with the canned/cloud stuff (Ring/Nest/whatever) similar to what you're saying. Some do everything at home with e.g. BlueIris and use the on-camera ML stuff as it is. The lack of flexibility (mostly due to closed-source, low-quality software IMHO) is a real problem though. Some folks use something like Frigate that does on-NVR analytics, and I'll eventually add that feature to my own software.

> I also have a sneaking suspicion using lower channel counts let you raise the FPS but the max of 96 channels is the hard limit, tuned to allow up to use cases like recognition from unprocessed feeds but the documentation access seems to be a manual approval process so I can't verify for sure.

I bet you're right.

[1] https://www.cnx-software.com/2023/01/02/150-axelera-m2-ai-ac...

kmeisthax · on Jan 10, 2023

Every time I hear about an AI accelerator, I get really excited, then it turns out to be inference only.

psychphysic · on Jan 10, 2023

Did I miss something did AMD buy Xilinx? Makes sense I suppose after Intel bought Altera.

Who owns lattice?

AnonMO · on Jan 10, 2023

In the time it took you to write this you could have searched your two question on google clicked the first wiki link and got your answer

psychphysic · on Jan 11, 2023

Bah! I'll go back to voicing all my thoughts to chagpt

p1esk · on Jan 10, 2023

How much memory does it have?

novaRom · on Jan 10, 2023

What TOPS means exactly in "... TOPS*|(INT8) 404 ..." ?

capableweb · on Jan 10, 2023

TOPS - Trillions of Operations Per Second, used as a benchmark to figure out the performance of the accelerator.

In my experience, mostly a marketing number, higher TOPS doesn't actually mean it'll be faster than something with a lower TOPS.

As always, you need to do your own benchmarks with your use case in mind.

novaRom · on Jan 10, 2023

What kind of operations is not clear. Wether it's a simple logic operation or a FMA is big difference.

calaphos · on Jan 10, 2023

I assume it's int8 operations (FMA would count as 2). At least that's the case with FLOPs, TOPs for AI accelerators is basically the same measure, with the changed acronym reflecting the non float data format.

tgtweak · on Jan 10, 2023

Sad to see amds ROCm efforts essentially abandoned. They were close to universal interop for cudnn and cuda on amd (and other!) Architectures.

Hopefully Intel takes a stab at it with their ARC line out now.

dhruvdh · on Jan 10, 2023

Abandoned? What?

rubatuga · on Jan 11, 2023

In Their consumer card line

h2odragon · on Jan 10, 2023

Douglas Adams said we'd have robots to watch TV for us. That seems to be the designed use case for this.

16gb RAM / 96 video channels ... I haven't done any of that work but it feels like they expect that "96" not to be fully used in practice.

inetsee · on Jan 10, 2023

I have no problem imagining a security camera application needing to monitor quite a few video channels.

phkahler · on Jan 10, 2023

>> I have no problem imagining a security camera application needing to monitor quite a few video channels.

As a joke I sometimes tell people the automatic flushing toilets in public bathrooms work by having a little camera monitored by someone in a 3rd world country who remotely flushes as needed, while monitoring a whole lot of video feeds. They usually don't buy it, but will often acknowledge that our world is uncomfortably close to having stuff like become reality.

h2odragon · on Jan 10, 2023

Certainly. I'm suspecting that doing much of anything with all 96 channels would really need more RAM, for most users.

scottlamb · on Jan 10, 2023

On the inference accelerator? IIUC, the RAM is just to hold the model and whatever state it needs during a particular inference operation. I'm not an expert on ML but AFAIK 16 GiB is plenty. I suppose it'd also need to hold onto reference frames for the video decoding, but at 1080p with e.g. YUV420 (12 bits per pixel), you can hold a lot of those in 16 GiB. edit: e.g., 4 references for each of the 96 streams would take ~1 GiB.

Even on the host, 16 GiB is fine for say an NVR. They don't need to keep a lot of state in RAM (or for that matter to do a lot of on-CPU computation either). I can run an 8-stream NVR on a Raspberry Pi 2 without on-NVR analytics. That's about its limit because the network and disk are on the same USB2 bus, but there's spare CPU and RAM.

andy_ppp · on Jan 10, 2023

/camera/state/g

hhh · on Jan 10, 2023

I have models in production that currently monitor ~400 cameras with an addition of 2-3 cameras/month. If it were cheap enough, it would be useful for our use case (Quality Control). We generally pull from cameras roughly 6400 pixels per region of interest, of which one instance may have 4-30 RoIs across N cameras.

71a54xd · on Jan 10, 2023

Curious where I could learn more about models like this / potentially see some open code outlining tooling / infra required as well?

hhh · on Jan 10, 2023

Not sure about open information about the models, but from tooling/infra we are running k8s w/ in-house API for image acquisition. Features are defined as x,y coordinates denoting center of a feature, with a pixel count denoting size of rectangle in each direction from center.

hyuen · on Jan 10, 2023

I won't even take a look at the numbers unless they show a PyTorch model running on it, the problem is the big disconnect between HW and SW, realistically, have you ever seen any off-the shelf model running on something other than NVidia?

Narew · on Jan 10, 2023

It's for inference only not training. In this use case, there is lots of device that's not Nvidia. For server you have Google tpu, for more close to public there is the Apple Neural Engine for example.

hhh · on Jan 10, 2023

I've ran models on Apple HW, Raspberry Pi, random CPUs, NVIDIA GPUs, TPUs. Waiting to get my hands on Tenstorrent gear.

kombine · on Jan 10, 2023

That. I work on the research side and I am still waiting for non-NVIDIA hardware for training deep models.

frozenport · on Jan 10, 2023

Google's TPU comes to mind