Hacker News
Seeing AI for iOS (microsoft.com)
835 points by kmather73 on July 14, 2017 | 126 comments

Yes, YES, this is what I'm talking about Microsoft. I'm surprised how muted the reaction is from HN here.

On the technical side, this is a perfect example of how AI can be used effectively, and is a (very obvious in hindsight) application of the cutting edge in scene understanding and HCI. There are quite a few recent and advanced techniques rolled into one product here, and although I haven't tried it out yet it seems fairly polished from the video. A whitepaper of how this works from the technical side would be fascinating, because even though I'm familiar with the relevant papers it's a long jump between the papers and this product.

On the social side, I think this is a commendable effort, and a fairly low hanging fruit to demonstrate the positive power of new ML techniques. On a site where every other AI article is full of comments (somewhat rightfully) despairing about the negative social aspects of AI and the associated large scale data collection, we should be excited about a tool that exists to improve lives most of us don't even realize need improving. This is the kind of thing I hope can inspire more developers to take the reins on beneficial AI applications.

Also on the technical side, the fact that Microsoft (Microsoft!) is releasing an iOS-only app, even if it comes from its research department, is maybe more shocking than its open-source releases of late. Who are you, Microsoft?!

It makes sense to start with iOS anyway, it's known for having great accessibility. A lot of people with accessibility needs use iOS for that reason.

Yep. The iPhone has close to 100% market share among visually impaired users, because the accessibility features are so far in advance of Android. Google have been playing catch-up for years, but they just don't have the obsessive single-minded focus on UI that Apple are famous/notorious for.

When the iPhone originally launched, blindness charities were protesting that the touch screen interface would exclude visually impaired users. Less than a year later, the iPhone was being hailed as one of the greatest advances for visually impaired people since braille. Say what you like about Jobs and Ive, but they set the bar for accessibility.

I agree that Google did not focus on accessibility at first, but what is missing from the latest version of Android compared to iOS?

At this stage it's a long list of small details rather than any specific feature. Apple place an extremely high value on UI consistency, which is disproportionately important for visually impaired users. Accessibility is an intrinsic part of their design culture, rather than something they retrofit on later. Subtle things like menu items being in a logical order make a really big difference when you rely on a screenreader. Apple also work far harder to promote and support accessibility for third-party developers. Accessibility is always a major focus at WWDC and is deeply woven into the Human Interface Guidelines. UIKit is highly accessible by default.

By way of metaphor: Visually impaired people need to be extremely neat and tidy, because they can't easily navigate a messy environment. If you don't put things back on the shelves in exactly the same place every time, you're in deep trouble. Google have a really poor track record of putting things in the same place every time.

For users with other accessibility needs, the size and consistency of the iOS ecosystem is often crucial. iOS gets all the good apps and peripherals first, because it's a much easier development target.

> because it's a much easier development target.


No it is not

A device that doesn't have to be replaced every 18 months since that's the only period of official upgrade support?

(Said as a former nexus 5 owner)

In addition to the brilliant accessibility features, there's a limited set of devices running iOS, all with known camera specs and picture quality, which can definitely help in maintaining some consistency in results by reducing variation in input.

Also, rumour has it that Apple punishes you in the App Store if you launch on Android first.

I just used my best google-fu and can find no evidence for this. If you're going to make controversial claims like that in the future, include a link to the evidence or you will mostly just end up spreading FUD.

Upvoted you. Couldn't find it either. Think it was in a checklist I read once.

It's missing because Apple took it down wherever it was, obviously.

/conspiracyfallacysarcasm :)

I think @rustyshelf has removed the saltiest posts, but the bulk of the story is still somewhere in their blog:


Sounds like FUD.

It's not. The developers of a famous podcast app have a number of long posts about this subject.

Also, Apple will ban your app if it has the word "android" anywhere in it.

So a single app developer experienced unknown delays and attributed that to the fact that they launched on Android first... How's that anecdotal evidence going?

I'm sure the part about Apple punishing you for launching on Android first is rubbish, but the second claim is true - App Store rules are that you can't mention the word 'Android'.

Rule 2.3.1:

"don’t include names, icons, or imagery of other mobile platforms in your app or metadata"

And here is a case where they actually enforced it:


OK, so one data point! But not for the most controversial claim.

> Developers of a famous podcast app have a number of long posts about this subject.

Do you have a link?

Nope, but the company is called Shifty Jelly. They've still got other iOS apps, though, so it wouldn't surprise me if the posts were taken down.

Ah the company behind Pocket Casts (love that app). Well I searched through their blog and could only find Amazon App Store drama.

I'm going to assume the original statement is FUD as accused. It does seem incredibly unlikely a dev could draw such a link even if Apple were doing that kind of penalisation (which makes literally no sense to me).

Could you point to where in the episode the 'Apple punishes if you're in the Android app store first' idea is covered?

(I'll probably listen anyway since I like those guys, but still ...)

My headphones are out of juice, so you'll have to settle for this summary:

> Coincidentally, Russell Ivanovic is a case in point for what could happen if you defy Apple and launch on Android first. Ivanovic had initially been very lucky to have been assigned an Apple Developer Relations representative who gave him exclusive promotional opportunities. Few developers get assigned these representatives. Among the benefits Ivanovic received was the privilege to have Shifty Jelly's apps preloaded on iPads in Apple stores in Australia, a major marketing boost.

> Things went south in 2012, when Ivanovic launched a new version of the Pocket Casts app on the Android Play Store first, rather than Apple's App Store. The launch was a real success, and he publicly shared the good news. Before he knew it, his Apple Developer Relations representative stopped all contact. The representative would not even answer his emails. Ivanovic had been completely shut out.

Source: http://www.elischiff.com/blog/2015/3/24/fear-of-apple

Ah okay, so at most it's a withdrawal of special treatment, NOT punishing a regular app's position in the app store because they launched in the Android store first, which was how the original accusation sounded.

But thanks for the info, I can see it certainly would stink to be treated that way, losing your benefits and being ignored by your contact.

These guys were on the featured list and showcased in Apple commercials before they were suddenly dropped.

I am not sure this counts as "withdrawal of special treatment". These guys made great iOS apps, both before and after they started making Android apps.

The sections you quoted specifically mention special privileges (and that the Dev Rel representative thing is rare).

And it was those things which they lost right, rather than some kind of search position punishment?

Didn't they release an iOS-only photo app years ago that was also very clever? IIRC it collated thousands of user photos of a single piece of architecture into a 3D model or something

Edit: Just checked App Store. Microsoft has literally hundreds of apps on there. You should probably go look. But I can't find the app I'm talking about lol

EDIT 2: Found a story on it! 11 years ago!! https://arstechnica.com/uncategorized/2006/07/7381/

It was called Photosynth and it seems to have disappeared from the Internet :(

EDIT 3: crap they just shut it down THIS YEAR! https://blogs.msdn.microsoft.com/photosynth/2017/02/06/micro... It was cool as hell!

Sometimes the good bits of their research projects get rolled into others. https://www.microsoft.com/en-us/research/product/computation...


Found a 10 year old demo of PhotoSynth here: https://www.ted.com/talks/blaise_aguera_y_arcas_demos_photos...

With Windows Phone out of the picture, it's either that or Android. iOS is probably an easier first target (less hardware and software variety).

It also helps that iOS is more welcoming to C++ devs than Android, although they might be using something else like Xamarin.

To be honest, I'd guess it's most likely a combination of their Bing speech and vision APIs, and since there are Android SDKs for all of them, you could easily reproduce this as an Android app.
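For what it's worth, the hosted vision APIs of that era worked over plain HTTPS, so the call itself is short. This is a hypothetical wrapper in the spirit of the Cognitive Services "describe image" endpoint; the URL path and header name follow Microsoft's public docs of the time, but treat the exact values (and `build_describe_request` itself) as assumptions, not anything confirmed about this app:

```python
import json

# Hypothetical sketch: assemble a "describe this image" request for a cloud
# vision API without sending it. Endpoint and header name are assumptions
# based on the Cognitive Services docs of the time.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe"

def build_describe_request(image_url: str, api_key: str) -> dict:
    """Return the HTTP method, URL, headers, and JSON body for the call."""
    return {
        "method": "POST",
        "url": ENDPOINT,
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,  # per-account auth key
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": image_url}),
    }

req = build_describe_request("https://example.com/room.jpg", "YOUR_KEY")
print(req["method"], req["url"])
```

Any plain HTTP client (NSURLSession on iOS, OkHttp on Android) can send a request like this, which is why the same backend would port to Android easily.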

Low hanging fruit or not, this is the kind of effort that will sway public opinion.

This is the direction that needs to be undertaken with as much or more fervour than business automation applications.

I'm inspired!

These are all definitely applications of ML we've seen around in different places, but this is the first package I've seen to put them together in an integrated way on a smartphone.

I'm also impressed with Microsoft's effort here. It looks pretty nice, especially the different 'channels' metaphor for the various ML processing tasks (Scene, barcode, people, short text etc) for the visually impaired.

I'd rather say that machine learning is an application of mathematical statistics.

> it seems fairly polished from the video

Reminder: This is a Microsoft marketing video, they are never like the actual product. See Kinect and Hololens teasers for more.

I have a technical question here: can this be called true AI? As far as I can tell, most companies just use AI and ML interchangeably. I guess these are only applied AI systems tailored to do a plethora of tasks, but at the end of the day, they're not conscious or aware of their own existence.

Part of the techniques seem to come from Microsoft's "Office Lens" app, which takes pictures of documents and straightens them.

Just as an extra piece of information for people, the person presenting the videos in that page is Saqib Shaikh, who is a developer at Microsoft. Earlier on HN, there was a really interesting video of him giving a talk about how he can use the accessibility features in visual studio to help him code. https://www.youtube.com/watch?v=iWXebEeGwn0

US App Store only at this stage it seems. Pity, I'd like to try this.

edit: I'm wrong. it's in other stores as well, but not in the Australian app store, which is the one that I tried.

For the launch, it's available in the USA, Canada, India, New Zealand, Singapore, and Hong Kong. Gradually, it should become available in more countries.

Anirudh from Seeing AI team

Just curious, what is the reasoning behind such rollouts? Localization?

I really hope not. I'm not saying localization isn't important, but holding back an English app because of this is something I don't understand. A huge part of the world speaks English just fine. Release first, localize later if it's not possible right away. This is probably not the case here, but I hate it when devs forget that the US isn't the entire world that they're releasing their app to.

It's not about just language, of course. Otherwise "localization" would be called "translation". But a machine vision AI app isn't very useful if it cannot recognize local signs, brands, whatever, or pronounce them correctly. It would be simply bad publicity.

You're absolutely right, I dismissed this far too early and didn't think far enough for an app like this. Thanks for clearing that up.

I think that must be the case - it isn't available in the UK :)

Also some of us don't even like to have the software localized. English is not my mother tongue and it's not an official language where I live either but I always set the system language to American English on my computers and devices.

Same here, I couldn't care less about a German translation. I'm not going to see it if it's not the only language the app comes with.

I do however prefer a German locale as in number, time, units, etc.

Localization != translation. Especially when it's a machine vision app that's supposed to be useful locally.

You are right. Internationalization might be the word for what I'm talking about, though some might argue that's not correct either, since l10n is arguably a subset of i18n. Not quite sure. But anyway: yes, l10n in the sense that the app works in my part of the world is desirable. I18n, if taken only to mean translation (and things like right-to-left support, which arise from supporting certain languages), is what doesn't matter to me.

Because if you don't release a localized app, you receive bad ratings that people never update.

Not a bad point, I hadn't thought about that, but you do still have the option of flushing all reviews with an update if they're primarily negative.

Also flushing people's memories? Seriously, marketing and PR is a human problem, stop treating it as an engineering one.

Heck it could even be a great teaching aid for English!

Yes...translating to Australian is a slow and tedious process.

Hopefully this can be expedited by translating from the New Zealand version which is released. Just lower the prior probability of sheep and you're done!

It's unlikely to be localisation, because you'd expect to see it in the UK store as well.

Some companies only release iOS applications in certain countries to begin with because they want a gradual rollout rather than doing a global launch all at once. The App Store doesn't give you many options in this respect.

Devs often limit regions to slowly ramp up usage, so the servers don't melt down on day one, see Pokemon Go for example.

Presumably licensing and legal coverage.

I work with a school for the visually impaired in the UK and would definitely like to trial this - also with a team that supports VI children in mainstream schools. Is there any way to get an alert when it appears in the UK appstore?

Android release?

Make it available in other countries as well, in a machine language-learning mode. The sighted crowd could teach it the local terms, pointing at the fridge and dictating Kühlschrank. This would be a cool project to help others. I would enjoy doing that; maybe gamify it.


You can still download it from the US App Store if you create an account and link it to your phone. Switching between accounts is painful (a lot of password typing is involved), but it allows apps from multiple countries on the same iPhone.

The process in images: https://www.imore.com/how-download-pokemon-go-canada-uk-and-... I did that to get some Japanese apps.

I was able to download from the Canadian App Store

Sorry, I should have finished my coffee before typing. What I should have said was it's not available in the Australian App Store.

It correctly identified my refrigerator and bookshelf. Color me impressed.

Things like this make articles like this one seem silly: https://www.madebymany.com/stories/what-if-ai-is-a-failed-dr...

It was very unjudgey about my living room too.

If I have a friend that is visually impaired and is using this, I have to consent to their phone recording me and analyzing me and sending all of that data off to who knows where.

And this is just from my perspective - someone who is not visually impaired. For the person who is, every single thing they look at and read is going to be recorded and used.

It's an unfortunate situation for people to be put in, and I'm sure everyone will choose using improvements like this over not using them. As much as I would love to see a focus on privacy for projects like this, I don't imagine it happening any time soon, given how powerful the data involved is.

I imagine a future where AI assistants like this are commonplace, and there is no escaping them.

In the future, most AI will happen locally, not remotely, because of privacy, latency, and cost. There is a lot of research into adapting models to fit on phones (such as reducing the size of the NN by 20x-100x while keeping almost all of the accuracy).
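As a toy illustration of one such trick: post-training weight quantization stores float32 weights as int8 plus a per-tensor scale. Quantization alone gives about 4x; it's pruning and weight sharing on top of it that push total compression toward the 20x-100x cited above. The array here is random stand-in data, not a real model:

```python
import numpy as np

# Quantize a float32 weight tensor to int8 with a single scale factor,
# then dequantize and check how much information was lost.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)

scale = np.abs(weights).max() / 127.0           # map the dynamic range onto int8
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale

print(weights.nbytes // quantized.nbytes)       # 4x smaller storage
print(float(np.abs(weights - restored).max()) < scale)  # error within one step
```

The reconstruction error is bounded by one quantization step (the scale), which is why accuracy usually barely moves.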

Yes, you are right, but things like face recognition in this app run locally already. I tried it in airplane mode.

Also, check this blog from a blind user and some interesting comments: http://chrishofstader.com/seeing-ai-first-impressions/

As far as public use goes, it's always been considered legal / ethical to photograph people in public places. One does not know where the "information" in a photograph will end up, but since they are already technically in public view, it's something of a moot point.

Where do you want to escape?

If I remember correctly, this came out of a OneWeek project - Microsoft's company-wide weeklong hackathon. Very cool to see a final published version of this!

You are right: this project evolved from the week-long 2015 company-wide hackathon, in which close to 16,000 employees participated and people are allowed to take a week off to build anything of interest. Many accessibility projects have come out of it; off the top of my head:

(1) Eye Controlled Wheelchair for people with ALS

(2) Color Binoculars - App for people who are colorblind

(3) Hearing AI - App including live speech recognition and sounds recognition for people with profound hearing loss

(4) Learning Tools for OneNote - https://www.onenote.com/learningtools

(5) Dictate - Speech-based keyboard control to type emails/documents (http://dictate.ms), built originally for people with hand dexterity issues

A few of my peers now work full time on projects originated at the hackathon.

- Anirudh from Seeing AI team

Looks like this is only for iOS, any plans on coming to Android?

Wow. You can imagine a near future where this, a small wearable camera, and an earphone could really make a big difference to a person's daily life.

Screw Siri, that's a real AI assistant :)

Apparently there is some research and development taking place into a cane that has a sonar or some similar sensor embedded into its tip, and transmits haptic feedback to the user via the handle.

These assistive technologies are fantastic, but I wonder whether a vision-impaired person who adapted to life before they were available would be wary of adopting them, on the basis that if one breaks or becomes unavailable, they may have lost the compensating skills and sharpness of other senses.

For years I've wanted a pair of glasses that overlays the world with captions, putting names and "you know them from" next to people, address numbers next to buildings, etc. Google glass made me hopeful, but it fizzled out.

You mean in the future Siri might be able to do this stuff?

The text recognition from documents is amazingly primitive. It doesn't use any type of spell checking to make a best guess at what a word is. It's straight text recognition.

On the other hand, the "short text" feature works amazingly well to read text it sees from the camera. It's fast and accurate when reading text, even at some non-optimal angles.

How do you get it to try to recognize items that the camera sees?


Oops. I guess it would help if I swiped right....

Why on earth would you want spell check on text-to-speech for the blind? Spell check is for people writing messages, not for people reading messages.

This is a terrible idea.

I'm not talking about the feature that reads text aloud. I'm referring to the OCR "document" feature that makes transcription errors that could easily be fixed if it had any level of text correction.

Regardless, text correction is a tool for the writer, not the reader. In general, text correction does one of two things: it makes an error better, or it makes an error worse. Unless you can guarantee it only does the first, you have no idea what the total value of the operation will be.

Additionally, its capacity to make an error better is presumably no better than a human's. Even over very small timescales (tens of seconds), a computer confronted with a typo will never outperform a human confronted with the same typo.

A simple text correction could tell the difference between the words "rnethod" and "method" and know that the former is more likely an OCR error.
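A minimal sketch of that idea: snap each OCR token to the nearest word in a dictionary by edit distance, so "rn" misread for "m" (two single-character edits) still resolves to the intended word. The tiny dictionary here is obviously a stand-in for a real lexicon:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

DICTIONARY = {"method", "modern", "reader", "recognition"}

def correct(token: str, max_dist: int = 2) -> str:
    """Snap a token to the closest dictionary word, if it is close enough."""
    if token in DICTIONARY:
        return token
    best = min(DICTIONARY, key=lambda w: levenshtein(token, w))
    return best if levenshtein(token, best) <= max_dist else token

print(correct("rnethod"))  # resolves to "method"
```

A real system would also weight common OCR confusions (rn/m, cl/d, 0/O) more cheaply than arbitrary edits.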

What if it were OCR applied to this conversation?

I am recommending this for my elderly family members with poor eyesight. This could greatly increase their quality of life.

One of my friends from college has limited vision and the feature to read text aloud will be a game changing convenience.

He has a magnifier in his home, but it isn't portable and is limited to working only with documents and images that can lie flat.

edit: After speaking with my friend, he already uses a popular app called KNFB Reader that works very well on short text and documents, but costs $100. On the plus side, it works on Android or iOS.

I'm pretty blown away by this. I took a picture of myself in the mirror with the scene description feature, and it said "probably a man standing in front of a mirror posing for the camera". I took a picture of the room in front of me and it said "probably a living room". Think I'll be experimenting with this for days.

"probably a man standing in front of a mirror posing for the camera"

I would hope it could get this one right, there's a massive amount of training data to recognize it.

Well, it needs some work, but pretty cool nonetheless, can see where it was going with this :)


The use of AI and ML for applications is starting to get to a point where it can really be used for problem solving. We made a demo app with a similar use of this technology: https://zyanya.tech/hashtagger-android

I am going to give Seeing AI a try as well, and I totally understand why a research department would want to have a demo available to the public as an application.

Does it do hot dog or not?

This is quite an amazing technology. With new products like HoloLens and this, I think Microsoft is finally coming around.

I love the low-vision pitch, as there is a dearth of low-vision resources, particularly for those hit with age-related macular degeneration. I wonder if there are any censored items in the backend that may limit functionality -- Seeing AI won't be seeing any sex toys...

This is amazing. I don't know if a lot of people here realize this, but it is really hard to pull off this level of integration of different computer vision components (believe me, I've tried). Microsoft has really outdone themselves this time.

Not available in the Australian store. Ah, I forgot my entire country doesn't exist.

They can't recognize things upside down.

Funny that you are only upset that it is not in your country.

Does this mean RIP OrCam (the other startup, from the Mobileye creator, which basically does this as a full hardware/software solution)?


I'm impressed. Good job MS

Awesome technology; and today's SMBC [1] seems to be related.

[1]: https://www.smbc-comics.com/comic/the-real-me

Microsoft's "Office Lens" is the only app I use for capturing documents on Android. I see part of that tech is used in this app as well.

Love it

I think this is also cool for learning English. An English learner who'd like to express what they see can verify it against the AI's response.

The videos explaining it are really nice https://youtu.be/dqE1EWsEyx4

Looks amazing, I really need to dive into machine learning more this year... Waiting impatiently for UK release to give it a try!

Remember this? http://i.dailymail.co.uk/i/pix/2013/05/13/article-2323625-19... when it was a really big deal that Microsoft agreed to help Apple and release some software for their platform?

There was a demo of this on the series Bill Nye Saves The World

Unimpressed. Took a picture of a ceiling fan and it said, "probably a chair sitting in front of a mirror." Took a pic of a dresser, and it said "a bedroom with a wooden floor." Tried the ceiling fan again and got an equally absurd answer.

Deleted app.


I guess you also live stream using Periscope and not any other service.

It's supposed to be a joke.

Whenever I take a picture with the camera button on the left, it shows a loading indicator and the app crashes. Not a great first impression. Coming from a company the size of Microsoft, such trivial crashes should have been caught.

Hi Leo,

I am guessing you are using iOS 11 Beta. We are currently working on support for iOS 11 beta, and will shortly be updating the app. Stay tuned!

Anirudh from Seeing AI team

Hi Anirudh,

It's a really commendable effort from MS. An immense benefit for people who have vision challenges.

Are you guys hiring? If yes then what are the skills you are looking for?

Anirudh and team: Excellent product release. Thanks for doing such important work for marginalized people! We need more tech like this and less like that.

Nice job Anirudh! You and the team did some awesome work!

Which iOS device do you have?

7 Plus

I am also on a 7 Plus and I do not have this problem.

6S here, I also have this crashing problem. On iOS 11 Beta.

Don't run OS betas if you don't want crashes.

I figured that was the issue, which is specifically why I included that information. I am guessing that the commenter I replied to may have also been running a beta OS. And it turns out I was correct! But thank you for your snark anyway.

Not really snark. It is, as evidenced here, apparently an issue some people are confused about.
