Show HN: Papermusic (draw an instrument, then play it) (github.com/askmeegs)
13 points by pizzarat 5 months ago | 1 comment
This was a fun experiment to try out PaliGemma (an open vision-language model). I found that PaliGemma performed better than Gemini Flash for this kind of specific image task, especially on latency: roughly 0.9 seconds per PaliGemma inference on a VM, versus 3-4 seconds for Gemini Flash. I would love feedback on ways to improve this setup.



banger



