Hacker News
Show HN: bbox-visualizer – Make drawing and labeling bounding boxes easy as cake (github.com/shoumikchow)
67 points by shoumikchow on Sept 27, 2020 | 23 comments

Drawing these bounding boxes can be fiddly to get right, so this will probably be worthwhile for a fair number of people.

With that said, the dependency on OpenCV seems entirely overkill. Surely Pillow would be a much more sensible dependency (with ImageDraw), as all you're doing with this package is drawing rectangles and labels, rather than handling any CV tasks?

I would guess the library uses OpenCV primitives to do things like figure out the size of the label before drawing the background fill underneath. That's normally where the complexity comes in for drawing labels. I'm not sure if you can do that easily with ImageDraw?

For users of this library, OpenCV is pretty likely to already be installed anyway.

Another couple of reasons: in my experience, OpenCV handles nonstandard images much better than Pillow (e.g., many channels, >8 bit). It also has a uniform Python/C++ interface, so this could easily be ported if desired. I maintain a C++/Qt desktop annotation app and this would definitely be something to crib from for visualisation. Although in this case image loading could be done somewhere else and the library could just take a NumPy array.

Edit: turns out it's possible with a few extra steps, but it's not trivial. You need to handle ascender/descenders, etc: https://stackoverflow.com/questions/43060479/how-to-get-the-....

OpenCV will tell you the bbox of what it just drew.

Personally, I've always used OpenCV with all object detection and object recognition tasks I've worked on to take advantage of the many CV functions it provides. So it was only natural for me to use OpenCV since it wouldn't require an additional dependency like Pillow.

I could've definitely used Pillow, but is there any tangible advantage to using Pillow over OpenCV?

Looking over the library quickly, I really like it. That being said, Pillow is a much smaller and more self-contained library than OpenCV. Some projects try to avoid OpenCV when it isn't necessary, and that would extend to this project as well.

Yeah, this is the main thing. There are approaches to computer vision problems other than OpenCV, so using a smaller, more specific library will make the package more portable and better suited to a wider range of projects.

Having OpenCV as a dependency means that if a project isn't already using OpenCV, this fairly minor (in terms of scale of functionality) utility library has a lot of baggage.

Thanks for the constructive comment! I'll try and see if it is at all possible to implement it in Pillow and maybe re-implement it in a major release, unless it is terribly convoluted.

Sorry for my slow response, 3vidence has covered my main thoughts, but I've added a bit to his comment.

As for how convoluted the implementation would be, there may be a little more effort in figuring out some offsets (e.g., path widths may extend bounds), but I don't think it should be anything excessive.

Thanks for the comment! I'll look into it and aim for the next major point release.

How does this compare to https://github.com/tensorflow/models/blob/master/research/ob... ? (TensorFlow is not needed for some of those functions)

Hi! This exists as a standalone library, which means you don't have to go through the trouble of cloning the tensorflow/models repository and using the specific functions you need. Moreover, I'd argue this code is easier to follow (ymmv) than TensorFlow's and it would be easier to debug if the user needs to make modifications themselves. Additionally, this library also gives you some extra visualizations you can use on your bounding boxes.

Nice one.

More useful to me would be something similar that operates on tensors on the GPU.

Doing image annotations on host/CPU often becomes a bottleneck.

Out of curiosity, what throughput do you have that would require GPU labelling?

The most resource heavy bit is text rendering I guess, but that could be cached per class-name and reduced to a memcopy. Otherwise drawing rectangles is pretty quick on a CPU to the point where I'd imagine the memory transfer to the GPU is probably comparable to the draw ops?

I've got OpenCV down to around 10ms per image (single thread, python) without the caching idea I mentioned above.

It could be nice to avoid the overhead of moving the GPU box tensors to the CPU, potentially.

Interesting idea! I'll have to do some studying as to how to make that happen, so I doubt I'll be adding that option anytime soon but I'll be adding it to the icebox.

Thank you for the comment!

Can someone explain to me why I would need a library to draw bounding boxes with a label? I don't understand why this is a hard problem that warrants a library. There isn't even any 'fiddly math' in there.

I seriously don't get it. Have we become so incapable that we can't draw rectangles and labels by ourselves anymore?

You are correct that you do not need this library to draw bounding boxes with a label. You can use OpenCV or Pillow to draw it.

However, positioning the label to be exactly above the bounding box can be a little finicky. This just takes care of the math that you'd have to do to place it right above the box. I agree that I am not doing something revolutionary with the math here, but these functions are something that I've had to use over and over again and thought it would be nice to package the whole thing. This library abstracts everything behind two main functions.

There are also a few different visualizations that you can use.
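To make the "place it right above the box" math concrete, here is a rough sketch in plain Python. It is not the library's code; `label_rect_above` and its fallback rules (drop inside the box when there is no room above, clamp to the image width) are assumptions chosen for illustration.

```python
def label_rect_above(box, text_size, image_size, pad=2):
    """Return (x1, y1, x2, y2) for a label background sitting on top of `box`.

    box        -- (x_min, y_min, x_max, y_max) of the bounding box
    text_size  -- (width, height) of the rendered text, measured beforehand
    image_size -- (width, height) of the image
    """
    x_min, y_min, x_max, y_max = box
    text_w, text_h = text_size
    img_w, img_h = image_size

    label_w = text_w + 2 * pad
    label_h = text_h + 2 * pad

    # Default: left-aligned with the box, bottom edge on the box's top edge.
    x1 = x_min
    y1 = y_min - label_h

    # No room above the box: move the label just inside its top edge.
    if y1 < 0:
        y1 = y_min

    # Keep the label from running off the left or right edge of the image.
    x1 = max(0, min(x1, img_w - label_w))

    return x1, y1, x1 + label_w, y1 + label_h


normal = label_rect_above((50, 40, 150, 120), (60, 12), (640, 480))
at_top = label_rect_above((50, 5, 150, 120), (60, 12), (640, 480))
at_right = label_rect_above((600, 40, 639, 120), (60, 12), (640, 480))
```

The edge cases (box touching the top or the right side of the image) are where hand-rolled versions usually go wrong, which is most of what a helper like this saves.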

Thanks for your thoughtful reply. I guess I have more of a problem with this hitting the front page of HN than with you publishing it. If you use it a lot and abstract it into a library, that's fine, we all do that. But it looks like people are interested in it and I don't understand why.

Don't get me wrong, I like that you published it and I encourage it as much as I can. But if someone is capable of running complex object detection algorithms, they surely can position a label correctly without the need for another library?! This is just a couple of lines of code you can write without even thinking much about it.

Maybe I'm just out of touch, but it's so weird to me that people out there might find this useful.

Same reason I might use tippy.js for popup boxes. I know I could make it myself, and make it just as well as the tippy authors have designed it, but why waste my time doing that when I know they've already thought through all of the problems that I can't even expect until I'm already in the thick of it?

Sure, it's not like I haven't used libraries in the 20 years I've been coding. If it solves the complex problem I am facing, I will use it. But have you looked at the code in question? It is not complex. It is code you write in 20 minutes.

I don't understand the hate. I don't have to solve P=NP to post on HN. If I did solve something complicated, I'd be publishing a paper, not posting here. I thought this is something that might help the community. If people are upvoting it, it's because they think it might help them - save 20 minutes if nothing else.

Either way, your comments might be against HN guidelines [1].

[1] https://news.ycombinator.com/showhn.html

I find it interesting, nom, that you could be adept at using object detection algorithms and still not detect the reason why people are interested in reading this article.

It's not about capability, but efficiency. This seems handy enough for me to adopt into custom labeling tools I wrote using cv2 and PIL, etc.

No doubt that this is handy as a reference. But it's on the front page of HN and I don't understand why. If you look at the code, it evokes memories of left-pad. Rendering a label and a box is not complicated; this library only serves the creator.

Instead of downvotes, I hope someone can answer my questions.
