Five Learnings from 15 Years in Perception (tangramvision.com)
67 points by reteltech 29 days ago | 26 comments



It's a fine balance between Not Invented Here syndrome and trying to hammer the square peg of off-the-shelf OSS into the round hole of the actual problem you're trying to solve.

For example, they suggest ROS as robust, industry-ready software, which absolutely hasn't been my experience: you hire a bunch of domain experts to solve the various [hardware, controls, perception, imaging, systems] problems, but once you use ROS as your middleware you end up needing a bunch of ROS experts instead. This is due to the horrible build system, odd choice of defaults, instability under constrained resources, and how it inserts itself into everything. You end up needing more fine-grained control than ROS gives you to make an actually robust system, but by the time you discover this you'll be so invested in ROS that switching away will involve a full rewrite.

The same goes for further downstream: OpenCV images are basically a void* with a bunch of helper functions. (4.x tried to help with this but got sideswiped by DNN before anything concrete could happen.)
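For anyone who hasn't hit this: a cv::Mat's element type is runtime metadata, and at<T>() only checks it in debug builds (via CV_DbgAssert), so a mismatched read compiles fine and silently reinterprets bytes in release. A minimal sketch of that failure mode, using only OpenCV core:

    #include <opencv2/core.hpp>
    #include <iostream>

    int main() {
        // An 8-bit, single-channel image; CV_8UC1 is just a runtime tag.
        cv::Mat img(4, 4, CV_8UC1, cv::Scalar(42));

        // Compiles without complaint. In release builds there is no type
        // check, so this reinterprets the byte buffer as doubles --
        // effectively the void* cast described above.
        double wrong = img.at<double>(0, 0);
        unsigned char right = img.at<unsigned char>(0, 0);

        std::cout << "as double: " << wrong
                  << ", as uchar: " << int(right) << "\n";
    }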

I guess it's the same rant the FreeBSD people have about the Linux ecosystem and its reliability. However I'd hope we raise our standards when it comes to mobile robotics that have the potential to accidentally seriously hurt people. And who knows, maybe one day OpenCV and ROS will pleasantly surprise me the way Linux has with its progress.


> they suggest ROS as a robust industry-ready software, which absolutely hasn't been my experience

Also BY FAR not my experience.

Personally, I see ROS as a cool thing for prototyping and research, but certainly not as a serious solution for industry.

> This is due to the horrible build system, odd choice of defaults, instability under constrained resources, and how it inserts itself into everything.

This is an excellent short description of the main pain points. I would add that the need to work with five different languages (YAML, XML, MSG, CMake, and C and/or Python) from scratch makes it difficult for SMEs who are not software people to become productive in a short time.
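To make the "five languages" point concrete, here's the file zoo that even a trivial ROS 1 package drags in (abridged from memory; names like my_robot, Num.msg, and driver_node are made up for illustration):

    ---- Num.msg (message IDL) ----
    int64 num

    ---- params.yaml (runtime parameters) ----
    wheel_radius: 0.05

    ---- package.xml (dependency manifest, abridged) ----
    <depend>roscpp</depend>

    ---- CMakeLists.txt (catkin build glue, abridged) ----
    find_package(catkin REQUIRED COMPONENTS roscpp std_msgs message_generation)
    add_message_files(FILES Num.msg)
    generate_messages(DEPENDENCIES std_msgs)
    catkin_package()

    ---- driver.launch (startup tree) ----
    <launch>
      <node pkg="my_robot" type="driver_node" name="driver"/>
    </launch>

That's four config dialects before a domain expert has written a single line of actual C or Python.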


>This is due to the horrible build system, odd choice of defaults, instability under constrained resources, and how it inserts itself into everything. You end up needing more fine-grained control than ROS gives you to make an actually robust system, but by the time you discover this you'll be so invested into ROS that switching away will involve a full rewrite.

So true. The worst part is that many companies will end up writing a ROS clone that does what they need, instead of getting rid of this awful programming paradigm altogether.


Tbf, ROS is a pretty old project. ROS 1 dates back to 2009, iirc. A lot of technologies didn't exist back then; even ROS 2 is now relatively old. The way I see it, we have since learned a lot about software engineering. I don't think using CMake with catkin and relying on Debian packages makes much sense nowadays, but back in 2009 I can see why we would have had to do it. Heck, using catkin to share C++ code by copying it into a local workspace was so much easier than having to figure out which set of system-wide packages and which CMake incantations would work correctly. Today, however, we know how to do package management much better via tools like cargo. Still, some core ideas like pub/sub, microservices, message definitions, and message recording and playback will remain part of your stack.

With regards to startups in general, though: having worked at a few, I've noticed that at the earliest stages the goal is for a few individuals to build quickly. Often this means certain framework choices that may not be suitable for scaling. As one scales, one then has to evolve the architecture to ensure developer velocity. This may mean rewriting everything. I'm not surprised that people are rewriting ROS internally. At the end of the day there are a few good ideas in there, but at some point one has to acknowledge that the implementations were lacking.

Personally, if one were to write a middleware framework in 2024, I'd go with Rust, MCAP, Zenoh, and Rerun, and possibly use an ECS instead of topics.


Even when ROS started, we already knew a lot of better ways to build software.

The problem is that the primary ROS contributor was/is a hybrid robotics engineer + software developer, which meant their colleagues were more likely to come from a culture where closed, vendor firmware toolchains are the norm, which in turn meant their community hadn't internalized a lot of those solutions.

And this isn't to bash different types of devs -- god knows even leading-edge software practices have all kinds of gaps -- but to say that few people can be 100% on top of multiple hard things, all at once.


In the end, what ROS provides for that cost is IPC, logging, configuration, and a startup tree (through the launch system). Waaaay too expensive.

- For IPC, we have moved to standard sockets, or protocol buffers.
- For logging, just trivial printing.
- For configuration, libconfig.
- For the launch system, simple shell scripts.
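For context, the sockets-plus-protobuf path really is just a few dozen lines. A minimal sketch of the sending side (frame.pb.h and the sensor::Frame message are hypothetical, generated from a made-up frame.proto; field names like set_seq/set_stamp_ns are assumptions; the socket calls are plain POSIX):

    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include "frame.pb.h"  // hypothetical generated protobuf header

    int main() {
        sensor::Frame frame;             // hypothetical message type
        frame.set_seq(1);
        frame.set_stamp_ns(123456789);   // assumed field names

        std::string wire;
        frame.SerializeToString(&wire);  // standard protobuf C++ API

        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        sockaddr_un addr{};
        addr.sun_family = AF_UNIX;
        std::strncpy(addr.sun_path, "/tmp/perception.sock",
                     sizeof(addr.sun_path) - 1);
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            // Length-prefix the payload so the reader can split messages.
            uint32_t len = static_cast<uint32_t>(wire.size());
            write(fd, &len, sizeof(len));
            write(fd, wire.data(), wire.size());
        }
        close(fd);
    }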


The beauty of using protobuf, libconfig, or whatever is that they're libraries. If you find a better RPC/messaging library for your needs, it's easy to replace one library; if you find better config management, you replace one library; if you want better logging, you replace one library.

I despise the framework approach: it ossifies bad decisions, and then it becomes a monumental task to move away from the framework, because now you need to replace everything.
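A minimal sketch of what that looks like in practice: keep each concern behind a seam you own, so a swap touches one file instead of everything (the Logger interface here is made up for illustration):

    #include <cstdio>
    #include <memory>
    #include <string>

    // The only logging surface the rest of the codebase sees.
    struct Logger {
        virtual void log(const std::string& msg) = 0;
        virtual ~Logger() = default;
    };

    // Today's implementation: trivial printing, as suggested upthread.
    struct StdoutLogger : Logger {
        void log(const std::string& msg) override {
            std::printf("%s\n", msg.c_str());
        }
    };

    int main() {
        std::unique_ptr<Logger> log = std::make_unique<StdoutLogger>();
        log->log("sensor pipeline up");
        // Swapping to spdlog, syslog, or anything else is one new
        // subclass; no call site changes.
    }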


Absolutely! The system is completely modular, and single pieces can be changed.


Linguistic tangent: when did "learnings" oust "lessons" as the standard word for "things I have learned"?


Trends in the corpus of books don't always match trends in the corpus of web blogs, but the graph of "learnings" popularity seems to match my perception of its increasing usage: https://books.google.com/ngrams/graph?content=learnings&year...

Another phrase I see spreading rapidly is "double-click that" replacing "drill into that". Seems like every podcast instantly adopted "double-click" lingo in the last 2 years.


Buzzwords are pretty viral via consultants, and so they eventually leech into corporate / common speech.

Because if you're not using the newest lingo, you might not be using the newest approaches, and might be falling behind! (gasp!)


"learnings" is a Briticism, innit ?


Not to this Brit. 'Lessons' all the way, both in corporate speak and in my multi-decade involvement in building training and education systems. We did use to identify 'key learning points', which would summarise the main things a student was expected to take away from a course, but I never saw this abbreviated to 'learnings'.

In my corporate experience, the main thing about lessons was whether they were 'identified' or actually 'learned', e.g. following a corporate post-mortem of some cockup/fiasco/disaster.


Is it? As an Englishman, my instinctive tendency is to blame any corporate buzzword innovations on the US: but perhaps I'm wrong in this instance.


This article struck a personal note with me, because around the same time (2008-2012) I was really getting into vision, and even got published as an undergrad for imaging sensor fusion work (...my first, only, and likely last meaningful contribution to my species). While the wider MV/CV community was making incremental gains every few years (anyone else remember Histogram of Oriented Gradients?), that's what they were: incremental. (I also remember my research supervisor recounting how the patent on SIFT probably held back the entire field by a decade or two, so yes - things were slow-moving...

...until a few years ago when:

> Computer vision has been consumed by AI.

...but "AI" is an unsatisfying reduction. What does it even mean? (and c'mon, plenty of non-NN CV techniques going back decades can be called "AI" today with a straight-face (for example, an adaptive pixel+contour histogram model for classifying very specific things).

My point is that computer-vision, as a field, *is* (an) artificial-intelligence: it has not been "consumed by AI". I don't want ephemeral fad terminology (y'know... buzzwords) getting in the way of what could have been a much better article.


I expect he just means deep and foundational models for vision, which is true: they dominate.


You are right: Computer Vision was always one of the original fields of AI research. The International Journal of Computer Vision was established in 1987 and it remains a premier outlet.

Today the word "AI" has itself been hijacked by marketers of ANN-based techniques, so when the article uses that term, it confuses people who don't know any better.


>> computer-vision, as a field, is (an) artificial-intelligence

A lot of our visual perception happens on the retina and in the 'processing pipeline' before reaching the brain.

Margaret Livingstone provides an excellent overview in her book "Vision and Art" and she takes a view similar to yours.


A lot of CV that used to be analytical (and benchmarked for metrics on synthetic data) is being replaced by "train a model on synthetic data to give the answer."


Why is it that people working on spy tech never have an ethics section in their "what I've learnt" ramblings?


In a world of total surveillance, the difference between a police state and a free society is who has access to the data and what laws are enforced to prevent them from abusing it.

The benefits of all this spy tech are great - if we manage it right. I mean, telephone tapping is an example.


A valid ethical question is: will this technology do more good than bad? Or put differently: given the bad implications this technology will likely have, is it worth it?

Example: "I have made this technology that makes me (and only me) earn 1 cent per day when managed right. Bad actors can use it to hack into hospitals and ransom them".

Wouldn't you agree that this hypothetical technology is probably not worth it?

In my opinion, it's not acceptable to say "I created a technology that helps bad guys be even worse, but I don't have any responsibility at all because I don't personally ask them to be bad guys".

Of course, sometimes it's harder than one may think a posteriori: sometimes it was not clear while developing the technology that it would have bad applications.


So the cost-benefit analysis approach is quite correct - and my personal take is that "surveillance technology" is another way of saying "accurate external memory": everything I say or do, everyone I talk to and interact with.

We like having dash-cams because they make other people's bad driving clear to the judge / insurance agent. Something like that for all aspects of your life seems good, and that's just the personal angle. Then there's aggregating medical data and providing it to researchers - some doctor has come up with stick-on Bluetooth ECG monitors: stick one on and record for 48 hours.

The problems arise when the data is used unfairly: "look, this guy is (social underdog) - we can use the recorded data to prosecute him." That only gets solved when the judge's recorded data shows a clear bias ... and we live in a fair, open, democratic society.

If we don’t live in such a society then yea, this technology will Almost always have more cost than benefit - but that’s not the technology.


> and we live in a fair open democratic society

I don't reach the same conclusion when I look at all the documented privacy abuse that is happening, though.


The problem is incentives. There are monetary incentives to create whatever technology you can, because you can, and that's the way you earn money. There is zero incentive to consider how your technology might be abused.


> There is zero incentive to consider how your technology might be abused.

Ethics is that incentive.

Would you be fine getting a good salary for legally building bombs that are sold legally, knowing that they are being used to purposely kill many innocent children? If the answer is "no", then it means that there is an incentive. You "just" need to find your line.

And I strongly believe that sometimes the only way is to agree together on a line (through regulations), because individually we may not be strong enough. For instance, I think that sending people to the ISS/Moon/Mars is an artistic performance that the world does not need. Yet we invest a ton of public money into doing it, which then enables private companies (like SpaceX) to commoditize space, which in turn is bad for the environment. So I would vote against spending public money on this. But if you offered me the chance to become an astronaut in a space program, I would most definitely go. I find it interesting: I would vote against it, but given the opportunity I would do it :-).




