Segment Anything Model (Sam) Visualized (simple.ink)
78 points by nimsy on Dec 1, 2023 | 17 comments



Hi everyone, we have created this visualization of the SAM model that lets you explore the architecture interactively, alongside the code. We made it while trying to implement SAM ourselves, to understand it better. I thought I'd share it in case some ML folks find it useful. Please let me know whether it did or did not help you! How do you usually go about understanding model architectures?

https://flowforward.simple.ink/


I'm working on fine-tuning the SAM encoder with LoRA at the moment, so thanks for this. I must say, though, that the Segment Anything code on its own is perhaps the most readable and easily navigated DL code I've encountered. Or maybe it's just me coming from a mostly TensorFlow background. I remember struggling to understand even my own networks when viewing them in TensorBoard. Your code peek feature is great.


Thanks for your comment! Glad it was a (small) help. Yes, Meta research did a great job documenting their code! Quick question: why not use Hugging Face's PEFT library to fine-tune SAM with LoRA?


The work is based on the SAMed paper and repo, so I'm not re-inventing the wheel, still leveraging best practices. Generally I see a point in keeping things minimal though, anticipating getting gritty with it.
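For readers who have not met LoRA, here is a minimal pure-Python sketch of the general idea: the pretrained weight W stays frozen and only a low-rank update B·A, scaled by alpha/r, is trained. This is the textbook formulation, not code from the SAMed repo or from PEFT; the helper names (`matmul`, `lora_weight`) and the toy values are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, with W frozen and only A and B trainable."""
    r = len(a)  # rank r = number of rows in A
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x3 frozen weight with a rank-1 update (illustrative values only).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]      # r x d_in
b_zero = [[0.0], [0.0]]    # d_out x r, zero-initialised so training starts from W
b = [[0.5], [0.0]]         # B after some hypothetical training steps

start = lora_weight(w, a, b_zero, alpha=2.0)   # identical to w
merged = lora_weight(w, a, b, alpha=2.0)       # w plus the scaled low-rank update
```

With B initialised to zero the adapted layer starts out identical to the pretrained one, which is why only the small A and B matrices need gradients.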


I like the idea a lot, but I had two main UX problems:

* It is hard to know which of the green blocks can be expanded by clicking; maybe use a different color or border for the expandable ones.

* I kept accidentally clicking the text, which takes you to GitHub, but I did realize that aiming for the edge of a block works a lot more reliably.


Hit detection for hovering and clicking is pretty off in Firefox.


Sorry to hear that. Will definitely try to address it in the future. Other than the terrible UX, did this help you in any way?


This is a really neat idea. Would love to have a similar view for Llama-like models. I've been working with Mistral 7B lately and it's annoying how many small changes there are between it and Llama. Having a view like this would be a good time saver.


Thanks for your comment. Will definitely try to work on that too! Quick question: why do the differences between Mistral and Llama matter to you? Would a view like this save you time when reading the paper, or during the actual coding/building process?


The coding process. I've been experimenting with fine-tuning methods where I freeze various layers or use different loss functions for attention vs. feed-forward. It's random little things like the names of layers that trip me up. For example, the attribute that holds the name of the activation function in Mistral is called hidden_act, whereas in Llama it's called activation_function.
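One way to cope with naming differences like these is a small lookup that tries several attribute names, plus pattern-based freezing so layer names are written only once. The sketch below is pure Python and makes several assumptions: the two attribute names are taken from the comment above and have not been checked against any library, and the config objects and parameter names are hypothetical stand-ins rather than real model classes.

```python
from fnmatch import fnmatch
from types import SimpleNamespace

# Candidate attribute names, copied from the comment above (not verified against any library).
ACTIVATION_ATTRS = ("hidden_act", "activation_function")


def activation_name(config):
    """Return the activation name stored under whichever candidate attribute exists."""
    for attr in ACTIVATION_ATTRS:
        value = getattr(config, attr, None)
        if value is not None:
            return value
    raise AttributeError(f"none of {ACTIVATION_ATTRS} found on config")


def trainable_flags(param_names, frozen_patterns):
    """Map each parameter name to True (trainable) or False (frozen by a glob pattern)."""
    return {
        name: not any(fnmatch(name, pattern) for pattern in frozen_patterns)
        for name in param_names
    }


# Hypothetical stand-ins for two configs that name the same field differently.
config_a = SimpleNamespace(hidden_act="silu")
config_b = SimpleNamespace(activation_function="silu")

# Hypothetical parameter names; real models will differ.
names = [
    "layers.0.self_attn.q_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.1.self_attn.q_proj.weight",
]

# Freeze every attention parameter and leave the feed-forward ones trainable.
flags = trainable_flags(names, ["*.self_attn.*"])
```

In a real training loop the resulting flags would be applied to each parameter (for example by toggling its gradient flag), so the model-specific names live in one list of patterns instead of being scattered through the code.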


[flagged]


Sounds like that's something for you to fix on your side. Why are you sharing this with the rest of us?


Presumably the creator of the site would like to know that a content blocker is blocking them so they can understand why/rectify it with the company.


Not much you can do when some scammy anti-malware company decides to block your website because "Newly Seen Domains".

That's basically the anti-malware company saying: "Fuck any startups or anyone who isn't already a bigcorp."


I totally agree on the scammy part. Unfortunately, that's a big university in Germany (RWTH Aachen University), and I think many other universities in Germany apply similar filters. And potentially some other people who use similar filters would get blocked in the same way.

I just thought that the creator might want to know about this.


maybe he is CEO or someone rich.


Maybe your company is using Cisco Umbrella for content filtering


Umbrella protects from the terrible secret of space.



