Hacker News
Apple Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (arxiv.org)
53 points by tosh on April 9, 2024 | 7 comments


This seems to be in the same vein as the Rabbit R1 [0], but as software rather than hardware. I'm very excited to see what Apple comes up with this year and going forward. They are uniquely positioned (possibly along with Google, though I'm not as aware of the OS hooks Android provides here) to expose "functions" to their models that apps are already exposing to them for Shortcuts or Siri intents. It doesn't take much imagination to see them offering "Model Intents" (or some better name) where apps expose functions and descriptions of how/when to use them. Giddy doesn't begin to describe how I feel about this.
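
To make the "Model Intents" idea a bit more concrete, here is a rough Python sketch of an app registering a function plus a description that a model could read. Everything in it (`ModelIntent`, `IntentRegistry`, `extend_sleep_timer`) is a made-up name for illustration, not an Apple API, and the model's choice of function is hard-coded rather than coming from a real LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelIntent:
    """Hypothetical: one app-exposed function plus a description for the model."""
    name: str
    description: str
    handler: Callable[..., str]


class IntentRegistry:
    """Hypothetical OS-side registry of the intents apps have exposed."""

    def __init__(self) -> None:
        self._intents: Dict[str, ModelIntent] = {}

    def register(self, intent: ModelIntent) -> None:
        self._intents[intent.name] = intent

    def describe(self) -> str:
        # Text a model could be shown so it knows what it is allowed to call.
        return "\n".join(
            f"{intent.name}: {intent.description}"
            for intent in self._intents.values()
        )

    def invoke(self, name: str, **kwargs) -> str:
        # Only functions an app has explicitly registered can be called.
        if name not in self._intents:
            raise KeyError(f"no intent named {name!r}")
        return self._intents[name].handler(**kwargs)


def extend_sleep_timer(minutes: int) -> str:
    # Stand-in for whatever the audiobook app would actually do.
    return f"Sleep timer extended by {minutes} minutes"


registry = IntentRegistry()
registry.register(
    ModelIntent(
        name="extend_sleep_timer",
        description="Add minutes to the audiobook sleep timer.",
        handler=extend_sleep_timer,
    )
)

# A real model would pick the intent name and arguments; here they are fixed.
print(registry.describe())
print(registry.invoke("extend_sleep_timer", minutes=15))
```

The point of the registry is that the model only ever sees names and descriptions, and can only trigger what an app has chosen to expose.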

Apple could also go for the more aggressive approach and let their model interact with apps that haven't exposed specific functions. That might be useful for slow-moving apps or apps that don't want to be automated. I'm of two minds on that last part. On one hand, as a developer, I can see the potential for abuse, but as a user I don't want to wait on an app developer, or wait for an interaction to be "blessed", before I can use it.

For example, I love Prologue [1] (an audiobook app that uses my Plex server), and the developer is quite good about exposing Shortcut actions, but there are times when I might want to use Audible for things like Whispersync. The issue is that Audible doesn't expose the same things I use daily (a widget on the home screen to start playing my book where I left off, and the ability to set a sleep timer via Siri/Shortcuts; I regularly use that to extend a sleep timer while lying in bed if I haven't fallen asleep yet). I'd be interested in being able to use an LLM to add the features I want to the Audible app in those cases.

[0] https://www.rabbit.tech/

[1] https://prologue.audio/


There’s a lot of functionality attached to just associating NSUserActivities[0] with screens (e.g. asking Siri to “remind me about this” will create a reminder with a deep link to that screen attached), which is something that a lot of apps do. I wouldn’t be surprised if this new LLM functionality takes advantage of those in some basic way too, with new APIs being added to allow developers to fine-tune and extend integration as needed.

[0]: https://developer.apple.com/documentation/foundation/nsusera...


This might also have applications in screen readers and other accessibility applications, where not having a usable model of UI elements can have consequences considerably beyond convenience.


Apple is a dead man walking as a company. They are a decade behind the others in AI. Apple couldn't even manage to build a good enough autocorrect in the iOS keyboard until 2023. What makes anyone believe that this company has any chance of keeping up with the advancements of AI now that the field is iterating on a week-to-week basis? It's not looking pretty for this company. The stock price will crash in a year.


> The stock price will crash in a year.

Sounds like you have a /sure-fire/ way to make a lot of money then. Unless, of course, your bloviating isn't based in reality... Let's just ignore the fact that Apple has been using ML for many years now and that they almost always take a slower approach to new tech. This isn't odd for them at all. WWDC will be telling one way or the other, but from what I can see now, I don't see why Apple couldn't catch up.


This is a weird take, as if Apple not capitalizing on the AI craze means somehow their core businesses won't be sustainable. I doubt anyone using a Mac will be scrambling for the Windows Copilot Bloat experience, and that's to say nothing of the iPhone. If canceling the car project this far in didn't severely injure them, I don't think sitting on their hands with the AI craze is much of a risk. And I say this as someone who detests Apple.


Still waiting to see a cutting-edge AI company turn 19.9 billion USD of profit in a quarter. I guess if you count Google, there is a close competitor with more AI stuff?



