IBM Watson API (ibm.com)
311 points by miket on Sept 24, 2014 | 84 comments

IBM is about to make these APIs (and many others) much more accessible as part of BlueMix (https://ace.ng.bluemix.net/ - the IBM PaaS/Heroku). I lead the team in charge of developing the Watson platform. Ask me questions!

I fully admit you may not be the right person for this question, but IBM has made a lot of hay about Watson in healthcare: lots of booths and talks at industry conferences, trotting out partner medical centers, PR pieces in the Wall Street Journal, etc... Despite all the noise, I have yet to see any sort of peer-reviewed clinical study that demonstrates the application of Watson in a real healthcare setting to improve patient outcomes. Has any study like this been conducted?

We do have many actual projects with several healthcare providers. We have functioning systems (see one here https://www.youtube.com/watch?v=8lGJ0h_jAp8) and we are starting to deploy them. But the deployments are made very cautiously and in stages (and not yet broad enough for full study) because that's the nature of healthcare.

While this is not peer-reviewed, it's interesting:


Hey there. The startup I co-founded has been accepted as an IBM partner for Watson. A few weeks into it, we are still doing the "form-filling" marathon and have not had access to an instance. Seems like the API, when available to the public, will be a faster way to access it. Punekale at google's mail.

Not completely. The first set of APIs won't let you provide your own content - it will let you play with the Q&A API (and also do a lot of other fun things not directly related to Q&A). The instance you will get access to through the application will let you provide your own content. We are working hard to make that available as self-service but that's going to take a little more time.

Thank you.

I'm curious about the "copyright" field. Do you return the original sources from where Watson learnt the information he is presenting? What are the major sources? Have you faced copyright or legal restrictions to access information and has this affected Watson's ability to answer questions in a certain area?

When you create a Watson instance, you basically load it up with what they call a "corpus" of information. They are supposed to check the source of all your info to confirm that you hold the copyright to it. I don't know what it will be like once it is opened up more, but as of now they keep a pretty tight grip on Watson and will only accept projects with monetization and revshare agreements...

Copyright is a tricky issue indeed and we are still figuring it out (working with publishers).

1. How does Watson know to share only public data and not confidential data if it comes across it? How does it gather data?

2. Does it cache answers or does it try to "learn" and modify/improve the response each time?

1. Right now data comes from our customers (not shared) or public sources (shared). Watson does not decide on its own what is shared or not.

2. Watson can learn from feedback (i.e., a user grading the quality of an answer).
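To make "learning from feedback" a bit more concrete, here is a minimal Python sketch of one way graded answers could feed back into ranking. It is only an illustration under assumptions: `FeedbackRanker`, the 0-to-1 grading scale, and the neutral default of 0.5 are all hypothetical and say nothing about how Watson actually does it.

```python
from collections import defaultdict
from typing import Dict, List


class FeedbackRanker:
    """Toy ranker: orders candidate answers by their mean user grade.

    Hypothetical helper for illustration only; not Watson's mechanism.
    """

    def __init__(self) -> None:
        self._totals: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def grade(self, answer: str, score: float) -> None:
        """Record one user grade (assumed to be between 0.0 and 1.0)."""
        self._totals[answer] += score
        self._counts[answer] += 1

    def mean_grade(self, answer: str, default: float = 0.5) -> float:
        """Average grade so far; ungraded answers get a neutral default."""
        count = self._counts.get(answer, 0)
        if count == 0:
            return default
        return self._totals[answer] / count

    def rank(self, candidates: List[str]) -> List[str]:
        """Return candidates ordered from best to worst mean grade."""
        return sorted(candidates, key=self.mean_grade, reverse=True)


ranker = FeedbackRanker()
ranker.grade("answer A", 0.2)  # a user marked this answer as poor
ranker.grade("answer B", 0.9)  # a user marked this answer as good
print(ranker.rank(["answer A", "answer B", "answer C"]))
```

Ungraded answers sit in the middle here, so a badly graded answer drops below candidates nobody has rated yet.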

1. Watson has to be trained on a specific corpus of information. Each instance is its own Watson, and needs to be trained. They say that each Watson starts off as a kindergartner.

2. Watson needs to be trained by an expert using sample questions and it learns from that.

Bluemix has an impressive list of IBM technologies available: https://ace.ng.bluemix.net/#/pricing/cloudOEPaneId=pricing

Until Watson is available in Bluemix, what is the best way to get access to the Q&A API for exploration?

Right now you need to apply here: http://www.ibm.com/smarterplanet/us/en/ibmwatson/form_ecosys... but many Watson APIs will be available on BlueMix very very shortly (think weeks not months).

That's amazing news - thanks a tonne. Any chance there's someone at the mothership (maybe you!) we can reach out to, for support? Specifically I'd like to toss out some questions about data sources and whatnot - to understand if Watson is worth playing with for us.

Best would be to ask the questions on the forum: https://developer.ibm.com//answers/?community=watson&cmp=usb...

Is BlueMix a Heroku competitor or complement? I want to develop on the Watson API but I also want to use a stack I'm familiar with. Also, do you have range estimates for the price of accessing the Watson API?

I went for an info session and a few events for Watson a couple of months ago at the IBM office in NYC. They are doing pricing with revenue shares. A lot of devs, including myself, were not happy about this. They basically were not open to projects that weren't directly making money off of Watson...

Yes BlueMix is a Heroku competitor. It is the easiest way to consume Watson APIs but won't be the only way. I can't disclose pricing yet.

BlueMix is based on Cloud Foundry and is very easy to use. Pushed an app there yesterday as a test.

While this technology looks interesting, the marketing effort is terrible.

There is no link on the webpage pointing to the access request. I had to find it from the comments.

There are no example questions other than that Michael Jackson example, and that example is not a question. Why does it reply "Michael Jackson" as an answer? There is no explanation.

There should be a web access point for potential developers to play with the API. I still have no idea what questions I can ask and what answers I can expect.


Yes we know and are working on it. The soon-to-come next version should fix all these issues.

Basic dev marketing.

The Q&A is not particularly interesting... are there other APIs we can experiment with? Also, is there any precedent for getting Clojure up on Bluemix?

(Another) IBMer here. As far as I am aware, Bluemix doesn't have built-in support for Clojure but a few of us have played with it and there are a few ways that you can run Clojure apps:

I wrote a blog post about my experiences running a Hello World app using the Heroku buildpack for Clojure which you can find here:


In addition, since Bluemix natively supports Java apps, you can export a Clojure app as either an uberjar or an uberwar and run it directly on Bluemix that way. So for example, if you create an uberwar of your Clojure webapp then you can push it to Bluemix by doing:

  cf push your-app-name -p target/my-app-uberwar.war
... and Bluemix will run your app on a WebSphere Liberty Profile app server.

I'm happy to help if you have any questions - contact details in my HN profile.

Yes many others - think NLP, multi-lingual, social media, speech, vision, etc. We'll start with a small set and keep expanding.

I don't know about Clojure; I would ask that question on their forum: https://developer.ibm.com/answers?community=bluemix

>Data: at least 50 percent of content is unstructured, and sufficient volume exists

Not quite what I expected. Does this mean the developers provide the data?

A given instance of Watson is unique based on its corpus (documents that have been uploaded to it) and its level of training and fine tuning. So far each instance has had to be trained from scratch, although they are making some Watson instances available that will come pre-trained in specific fields (like medicine).

The soon to come Watson platform will also come with a set of curated content. So it will be a mix of the content you provide and existing (free and paid) content.

When will BlueMix have Watson access?

Do you mind emailing me? My email is in my profile.

When will Watson instances be open for access?

If you want access to the API, you have to fill out a form, here: http://www.ibm.com/smarterplanet/us/en/ibmwatson/form_ecosys...

This is buried in the docs as a comment on this page: https://developer.ibm.com/watson/docs/developing-watson-apis...


No real support for 'playing around' with the API. Bummer.


Just went through the application process linked above. Be prepared to give info about yourself and your company and an explanation of why you want access to the Watson API, as well as what type of information you'll be working with. I stated 'just want to play around with the API'. We'll see how they react to that.

Watson APIs will be accessible - without the need for an application - as part of BlueMix (https://ace.ng.bluemix.net/ - the IBM PaaS/Heroku) in a couple of weeks. Stay posted!

That is great! I just signed up with my existing IBM ID. I am enjoying using IBM Watson right now on a customer project, and being able to experiment and learn on my own will both help my work for that customer and perhaps let me use IBM Watson for personal projects. BTW, I like your dev page, with starter kits for multiple languages and frameworks.

BlueMix is actually pretty exciting to me; I think IBM has finally stepped up as a competitor in the "cloud" ecosystem where Microsoft has really failed.

"Really failed"? According to Forbes in July of this year[0], Microsoft is second only to Amazon in the cloud market, and gaining. I'm not sure how that counts as a failure, except in the sense that Microsoft makes a popular whipping boy.

[0] http://www.forbes.com/sites/georgeanders/2014/07/28/amazon-i...

I suspect (but don't have confirmation) that this includes Microsoft's SaaS products like Exchange Online/O365, which are very successful.

I get the impression that Azure specifically as PaaS/IaaS, OTOH, is quite a bit less so. At least compared to AWS.

uhh, failed according to what?

I personally have seen very few success stories that involved Azure. The messaging coming through to the public is that Azure is buggy and unreliable. I can recall seeing a couple of major downtime incidents ([1] at least) in the last six months, and a significant number of major ones over the last few years (certificate renewal issues, leap day issues, etc [2][3][4]).

I wouldn't choose Azure myself, and I would actively recommend against choosing it to others from what I have seen (unless you are building a solution on .NET/Windows, perhaps). I can't imagine that I'm alone.

[1] http://www.zdnet.com/microsofts-azure-virtual-machine-cloud-...

[2] http://azure.microsoft.com/blog/2013/02/24/windows-azure-ser...

[3] https://news.ycombinator.com/item?id=5266947

[4] http://www.thewhir.com/web-hosting-news/leap-day-glitch-caus...

It's strange to see FUD deployed against Microsoft for once!

That sounds great. I'll stay tuned.

Out of curiosity I googled the same request.


I think this might be useful if Watson was being fed with a medical database. Otherwise I don't see any need for it; is there any?

edit: Watson as a legal consultant would be great. There might be a product in that, not as a replacement for a lawyer but more as a guide/search tool.

Here are some examples of Watson implementations: https://developer.ibm.com/watson/2014/03/27/industries-can-b...

But it would be interesting to see a real application working. An online retailer, for example.

I actually built my own version of Watson based on this idea - Jeopardy questions are often googleable/searchable on Wikipedia. It was pretty easy to build out - but it only gets 80% of the way there.

The last 20% is the hardest - and it's why Watson is so impressive (even though even Watson is probably only at 90%)

Did you ever see that HN post about QUANTA [1]? Do you think it would cover the last 20%?

[1] https://news.ycombinator.com/item?id=8298887

I would think Watson can replace most lawyers (and MDs, and ...) of this world. Most of them don't think and just rehash stuff they learnt, just like Watson does. Sure for the exceptions you need actual people, but that is the same time when you would go from your corner lawyer to a more prominent one and when your doctor would forward you to an expert anyway.


Watson could replace lawyers or doctors for people who equate Google searches with legal or medical advice. Think LegalZoom and WebMD... It absolutely seems like it could be an entertaining way for a non-lawyer or non-doctor to explore a law or medical library. The majority of my time spent with lawyers has been discussing my issue until it could be distilled down to a couple of concise legal questions; I bought a short-sale house and the seller demanded that I put a clause in the contract that said his bank couldn't issue him an i9... I have no authority over tax laws, but I also didn't want any liability or an invalid contract, nor to willingly build a bogus one. There was some real language subtlety to it all and I didn't even know the questions to ask.

Same with doctors: pain is relative, strong pains turn lesser pains into mild discomfort, and people are insanely good at ignoring and normalizing pains away. Do most patients even know what to ask or describe?

Don't get me wrong, I'd love to have a lawyer and a doctor on my smartphone all day every day, but it still seems like a ways off. Watson really seems like a tool that cuts your legal fees because your lawyer's research time drops 90% or something. (Or rather, he makes 90% more profit from you..)

Watson is already better at diagnosing cancer than human doctors.


Human doctors need sleep. They get tired. They get old. You'll still need innovation in the medical field, but let's not kid ourselves that we need a gourmet chef in every McDonald's.

Conversation you will never hear in a doctor's office:

  me: i have pain in my side
  dr: 25%: something you ate, 22%: appendicitis, 18%: kidney stone...
Conversation you will never hear in a lawyer's office:

  me: am i allowed to...
  lawyer: 42%: Maybe, 38%: Yes, 20%: No

The medical example is called a differential diagnosis. Medical school teaches you to make this. Communicating only the highest-ranked one or two items, unless explaining why you're ordering tests that rule out lower-ranked but actionable diagnoses, is not difficult.

Disclaimer: IANAD.
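As a rough illustration of the point above, a full ranked differential can be kept internally while only the top one or two items are communicated. This is a minimal Python sketch: the numbers are the made-up figures from the joke conversation, and `top_diagnoses` is a hypothetical helper, not anything from Watson or clinical practice.

```python
from typing import Dict, List, Tuple


def top_diagnoses(differential: Dict[str, float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k highest-ranked candidates from a differential."""
    ranked = sorted(differential.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Illustrative numbers from the example conversation, not clinical data.
differential = {
    "something you ate": 0.25,
    "appendicitis": 0.22,
    "kidney stone": 0.18,
}

# Only the top two items are reported; the rest stay in the ranked list.
for name, probability in top_diagnoses(differential):
    print(f"{name}: {probability:.0%}")
```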

This is only true when there isn't enough information supplied. You can't expect the doctor to figure out what the problem is just by telling the doctor you have a pain in your side, same goes for the lawyer's scenario.

Exactly. It's not that the doctors know anything more here, they just don't (can't) quantify their confidence (and later reliably update on new evidence).

No offense to doctors/lawyers meant here; all human brains suck at that.
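For what "quantify confidence and update on new evidence" could look like, here is a minimal Bayes' rule sketch in Python. Everything numeric is an assumption for illustration: the priors reuse the joke percentages from the conversation above (with a hypothetical "other" bucket so they sum to 1), and the likelihoods of observing a fever under each hypothesis are invented.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior is proportional to prior * likelihood, renormalized to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: value / total for h, value in unnormalized.items()}


# Priors: joke figures from the thread plus an assumed "other" bucket.
prior = {
    "something you ate": 0.25,
    "appendicitis": 0.22,
    "kidney stone": 0.18,
    "other": 0.35,
}

# Hypothetical P(fever | hypothesis); invented numbers, not medical data.
likelihood_of_fever = {
    "something you ate": 0.10,
    "appendicitis": 0.60,
    "kidney stone": 0.20,
    "other": 0.10,
}

posterior = bayes_update(prior, likelihood_of_fever)
for hypothesis, probability in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{hypothesis}: {probability:.1%}")
```

With these invented likelihoods, the new evidence moves appendicitis from second place to first, which is the kind of update that is hard to do reliably in one's head.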

Did you consider that that's not the point I'm making?

The confidences here provide misinformation. This is more harmful than no information.

How is it misinformation? The human doctor would not tell us his diagnosis in terms of percentages, because we as humans have a hard time grasping probabilities intuitively. That doesn't mean that a probabilistic diagnosis would not be more accurate.

The doctor's job is to provide me with as much information about the objective criteria of my physical condition as possible. However when it comes to making choices about my treatment, say in the case of accepting/rejecting an experimental drug with some potentially nasty side effects, it should be entirely my own value judgement on what to do with said information.

I've learned a bit about Watson from internal IBM information and this is something they understand and are working on. There are serious ethical concerns about what to tell someone, even if the diagnosis is quite compelling, IOW, "you have 6 months to live" needs to come from a human. Obviously, the approach is to have it work as a tool for a doctor, not as a WebMD type self-diagnosis service. There are all kinds of follow-up questions, which you'd need to be a doctor to even answer, because they'd be couched in medical lingo e.g. systolic/diastolic blood pressure.

They are only misinformation if it is misunderstood what they represent.

Watson can't replace most physicians. Physicians have to physically interact with a patient to gather a relevant medical history as well as subjective and objective observations about the patient's symptoms. Robots and machine vision systems are nowhere near being able to fill that role.

Watson might be able to partially replace some specialists that primary care physicians use for consultations. When PCPs are unsure about a diagnosis or proper plan of care they will often consult with a specialist for advice via phone or e-mail. So in that case the PCP has already gathered at least some preliminary data and could feed it to a computer. But even for that use case Watson won't be able to provide the same level of back-and-forth interaction that's often necessary to achieve the correct result.

The volume of medical data being created and published is rapidly increasing, and a doctor would need to read something like 160 hours per week to stay on top of their field. There are only 168 hours in a week, so there is really no way a doctor today can keep up with what is going on. The idea of Watson is to be able to take large amounts of unstructured data (e.g. an entire patient history and current symptoms) and be able to find a solution. I don't think Watson will replace a doctor or physical interaction anytime soon, but a device connected to Watson or a similar solution will most definitely be used to augment the doctor's diagnosis and treatment solution.

The point is, that (hopefully) your doctor knows where his expertise ends and when to refer you to an expert. I'm not sure Watson could reliably do the same.

Well, MDs just send you away when stuff doesn't go away or they think it's serious; I think Watson can easily be instructed to do the same. If it belongs in the category 'probably needs expert' he should pass his conclusions about what it is and so on to a human expert. Similarly, if the cough he prescribed medicine for is not gone after 1 week: expert. This is what human MDs do as well, and I had MDs actually tell me to 'be a man, suck it up', so not sure if Watson could do worse.

Fed with enough medical knowledge, Watson would be the expert.


No. You're making assumptions.

Read the entire comment.

Has anyone at HN used either IBM Watson or Wolfram Alpha to build a real (commercial) app? It feels like there should be a whole wave of apps built on either of these technologies but it doesn't seem to be materialising.

What is holding back the killer apps for answer/computation engines?

Google / Wolfram Alpha / Watson are great at answering broad questions.

1. "Who was the 12th president?" - Zachary Taylor

2. "What color wine is cabernet sauvignon?" - Red

3. "Is a ferret a rodent?" - The ferret is the domesticated member of the Order Carnivora, Family Mustelidae and Genus Mustela. A common misconception is that ferrets are rodents.

The real challenge is answering niche questions:

1. What size are the OEM rear wheels of a Honda S2000?

2. How can I fix MySQL error 1064?

3. How do I remove wine from a macbook?

These types of questions aren't answerable by a simple mining of Wikipedia or Encyclopedic knowledge. They represent niches within our society (S2000 owners, programmers, people who spilled wine on their macbooks). Google provides excellent links to pages that contain answers to these questions, but it cannot deduce a single answer or common response. This is why sites like Answers.com, Yahoo! Answers, StackExchange, etc. can flourish, but it's also why an NLP question and answer system is very difficult.

I've been working on a system to mine existing responses to questions - http://gotoanswer.stanford.edu - I only have a small subset of programming-related questions (~10M), but you can get an idea of what I'm trying to do by searching for "How do I remove wine from a macbook?" You'll see that there are results for removing wine the liquid and WINE the Windows non-emulator.

You should put your contact details in your profile (In the "About" section - your email address isn't publicly visible).

Anyway - I'm really interested in this area. I have built a Q/A system that can answer (some of) the broad-type questions you mention.

I think grouping Google/Wolfram/Watson together misses that each has their strengths and weaknesses, and that they take dramatically different approaches.

Google traditionally relies on ranking information it finds to answer questions (though the whole knowledge graph thing is moving it closer to what Watson does).

Wolfram relies on manual curation of facts and probably has the best "calculation" engine of the three.

Watson relies on manual curation of sources, and automatic extraction of facts and ranking of them.

I think it's quite interesting that Google is moving to a model more similar to Watson.

Anyway - I'd love to hear about your approach and what you are doing. My contact is in my profile.

You bring up a good point, but it seems as if Watson was designed with this in mind. If you notice in the JSON response, it lists this query as a factoid class.

It may handle queries with different attributes differently, such as focusing on certain portions of its corpus or changing which aspects of its search results are more heavily weighted.

A query identified as a factoid might be researched and judged very differently than something a bit more nebulous, such as a comparison, or something with more specificity like the examples you listed.

Admittedly, I am basing quite a bit off of one example response given in their documentation, but it is an intriguing clue as to how Watson will handle that aspect of understanding which info to discern.
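To illustrate the kind of routing I'm speculating about, here is a toy Python sketch that guesses a question class and picks a source-weighting profile for it. This is purely hypothetical: the class names beyond "factoid", the regex heuristics, the source categories, and the weights are all made up and are not taken from Watson's documentation.

```python
import re
from typing import Dict

# Hypothetical weighting profiles: how much to trust each kind of source
# for each question class. Names and numbers are invented for illustration.
PROFILES: Dict[str, Dict[str, float]] = {
    "factoid": {"encyclopedic": 0.8, "forum": 0.2},
    "comparison": {"encyclopedic": 0.5, "forum": 0.5},
    "how-to": {"encyclopedic": 0.2, "forum": 0.8},
}

COMPARISON_PATTERN = re.compile(r"\b(vs\.?|versus|compared? (to|with)|better than)\b")
HOW_TO_PATTERN = re.compile(r"how (do|can|to)\b")


def classify(question: str) -> str:
    """Crude keyword-based guess at the question class."""
    q = question.lower().strip()
    if COMPARISON_PATTERN.search(q):
        return "comparison"
    if HOW_TO_PATTERN.match(q):
        return "how-to"
    # Fallback: treat everything else as a short fact lookup.
    return "factoid"


def weights_for(question: str) -> Dict[str, float]:
    """Pick the source weights for a question based on its guessed class."""
    return PROFILES[classify(question)]


for question in [
    "Who was the 12th president?",
    "How can I fix MySQL error 1064?",
    "Is Azure better than AWS?",
]:
    print(question, "->", classify(question), weights_for(question))
```

A real system would presumably use a trained classifier rather than regexes; the point is only that the class label can select a different search and scoring strategy.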

Factoid is a word coined by Norman Mailer for "an item of unreliable information that is repeated so often that it becomes accepted as fact". http://en.wikipedia.org/wiki/Factoid

In QA literature it is often used in the "unverified fact" sense (which is also mentioned in Wikipedia article).

The Wikipedia article says it well:

A factoid is a questionable or spurious (unverified, false, or fabricated) statement presented as a fact, but without supporting evidence, although the term can have conflicting meanings.

I've heard the turnaround for a Watson implementation is about a year. IBM is just too slow, that's why the flood of apps isn't yet materializing.

Apps are definitely coming. I can't speak for other companies, but for our app we're starting a private beta very soon and the launch/announcement is set for early next year.

We are looking to use Wolfram Alpha for getting info on stock values and machine learning.

It seems very useful for this kind of business.

I am helping a customer integrate Watson into their system so I am very happy to see the news about BlueMix (https://ace.ng.bluemix.net/) that apparently will allow me to keep experimenting with Watson after my consulting engagement is complete.

If you read the documentation, you will see that preparing training data and questions is fairly straightforward.

So you just ask it any random question and it knows everything? Or only things that come up on Jeopardy?

I don't see an API for feeding it information.

I would assume that they are working more closely with companies that they trust to feed in information. Having that on what will eventually be a public API is a recipe for disaster.

Each instance of Watson is unique and has to be trained as such based on its "corpus" (set of data) and actual feedback on the quality of its answers by experts. The public API sounds like it will allow access to specific flavors of pre-trained and data-filled Watsons, like the food-recipe one or some basic medical ones.

I found the link above to be a bit useless as it jumps right into getting answers with evidence. Here's a better overview link: https://developer.ibm.com/watson/docs/developing-watson-apis...

I sent an e-mail to my co-workers containing "...natural English to ask Watson..." and somehow people read it as "You can ask Emma Watson, who is English, a question and she will respond".

And I thought, "...close enough - Watson could answer questions about Emma".

Does this also give access to their cooking and recipe data?

Edit: http://www.ibm.com/smarterplanet/us/en/cognitivecooking/

I've been looking into Watson's new application to analytics etc. How would that compare to, say, Mathematica or the Wolfram Language/Data Science Platform?

Looking at that example, I wonder why that Porcaro quote is listed as evidence. It doesn't relate to Jackson's album at all.

"Questions" in the documentation without a question mark (?) seem somehow wrong to me.

Waiting for my coffee to brew, I read that as "Emma Watson API".
