Computable Document Format (wolfram.com)
129 points by franze on July 21, 2011 | 79 comments


This is a recurring problem with Wolfram: he honestly doesn't seem to be aware of what is going on in the world at large. When he came out with 'A New Kind of Science', everyone in academia was aghast that he had spent 10 years of his life reproducing research that was readily available in existing academic papers (not that NKOS wasn't a great book, but it was hardly new science). Now he's pushing a technology that indeed looks as if it belongs in 1999, as though he's completely oblivious to the ongoing evolution of HTML5 and Javascript.


Let me disabuse you of that notion: Stephen is more aware of what is going on in the wider world than anyone else I know or have ever met. Like him or hate him, he's a true polymath. Our meetings will often wander off to the topic of famous Silicon Valley implosions, or Feynman stories, or tales from the Institute for Advanced Study, or the future of augmented reality.

So, with some knowledge of the man, I can say honestly that the charge that he's oblivious of the evolution of HTML is completely laughable. A random illustration: Wolfram Research was one of the first companies to go online in the early 90s (as it happens, Tim Berners-Lee is a long-time Mathematica user). An amusing story: it was also one of the only companies to survive the original Morris worm unscathed, owing to deliberate use of obscure Japanese computers for WRI's gateway.

While I can't talk about unannounced technologies, I can say that HTML5 plays a pretty crucial role in our future technologies. In fact, CDF will eventually have a server-side incarnation that relies on HTML5 for client-side interactivity.


Wolfram seems like a brilliant guy, and Mathematica is certainly a brilliant achievement. But it nonetheless seems backward-looking to introduce a dynamic document format that requires a proprietary authoring system and a proprietary player at this point in history.


Taliesinb, you really should be disclosing the fact that you work for Wolfram Alpha, in this comment thread. It is not at all obvious from glancing at your profile.


I thought his comment implied it. In addition his profile has his website which makes it quite clear he works for Wolfram (about and github): http://taliesinb.net/

If someone posts a link in their profile and you don't follow it, they can't be held accountable.


"he's completely oblivious to the ongoing evolution of HTML5 and Javascript"

I sincerely hope he's also dead wrong on the acceptability of a single company owning your communication in 2011.


> Now he's pushing a technology that indeed looks as if it belongs in 1999, as though he's completely oblivious to the ongoing evolution of HTML5 and Javascript

I don't think you looked very closely at CDF. It looks like it has a very large fraction of Mathematica in the player. You aren't going to get anywhere close to what you can do in CDF via HTML5 and JavaScript without a truly ungodly amount of work, and it would still be unusable for many things due to performance issues.


You aren't going to get anywhere close to what you can do in CDF via HTML5 and JavaScript without a truly ungodly amount of work...

The same argument was being made in favor of Flash a few years ago. Eventually it turns out that someone does that 'ungodly amount of work.' Why shouldn't it be Wolfram? He would have a leg up on everyone if he embraced the new standards and opened up the Mathematica walled garden a little.


ongoing evolution of HTML5 and Javascript

I don't think so. I think it's more likely that he's fully aware of said evolution but appreciates that most people are not developers and want a nice self-contained live document format that they can email as a single-file attachment if need be.


most people are not developers

Most people don't have a copy of Mathematica lying around either, and it appears to be a requirement for authoring CDFs: http://www.wolfram.com/cdf/faq/ .


Most people want a self-contained document format in order to read and share content rather than to write or modify it. That's why people have most of their music in mp3 format rather than as a collection of multitrack clips and mixdown parameters, why most people swap pictures as jpg files rather than as photoshop documents or a collection of TIFF and LUT files, why most people watch movies on DVD or in some single-file digital format instead of as a collection of film clips that need to be rendered overnight before watching, and why most people like printed books or single-file ebooks instead of printer's galleys and versioned markup documents. Indeed, this is why most people like sitting at a table in a restaurant instead of going into the kitchen and making the meal themselves.

Look, I write and create multimedia content, I like powerful composition and editing tools. But managing all the structural information and assets for a large compound document or media project is a lot of work, the kind of work for which I prefer to be paid or rewarded in kind. When I'm just consuming and sharing the work of others, then I don't want to do all that work and I prefer a nice self-contained package that doesn't impose any administrative overhead. When it's as easy to store or share a HTML+CSS+JS document, online or off, and have it appear exactly the same way in a completely device-independent manner or even as the output of a printer, then I'll sing its praises. What I do not want is a bunch of extra files to keep track of for a single document that I just wish to add to my library and might not open again for a year. I am often much more interested in the content of a document than in the ability to edit, deconstruct, or radically reformat it.

When I was younger, I cared more about having control over things like typesetting, page flow, and other design issues, and also cared more about everything being as editable as possible. Now that I'm older I'm far more concerned with what a document is about than with how it looks. So I tend to open Picasa ten times for every time I launch Photoshop or my Camera RAW editing software, and I tend to read PDFs in the browser or in Acrobat reader far more often than I run up the full Acrobat environment to make a PDF file.

I suggest that you focus less on how you would do things differently if you were Stephen Wolfram, and more on whether his CDF format might open up some new economic opportunities, the way that PDF files have leveraged simplicity into ubiquity.


...whether his CDF format might open up some new economic opportunities, the way that PDF files have leveraged simplicity into ubiquity

I'm suggesting that Wolfram is trying to reproduce the success of the PDF and Flash business model at the very moment that that model is in decline.


I'm not sure that PDFs are in decline at all. They're so ubiquitous that embedded renderers are now built into modern browsers.

Similarly, Flash has hardly gone away. Of course, 1) Flash has had persistent stability and performance problems, 2) Apple is waging a very public battle against it via iOS devices, and 3) its only real killer app, streaming video, is being folded into several (competing) standards. But it still has massive penetration.

The thing is, Mathematica is too big and too rich to ever achieve standardization. If you want the ability to easily inject graph theory, computer vision, symbolic statistics, non-linear optimization, discrete math, control theory, etc etc into your documents, CDF is unmatchable.

Now, without seeing with your own eyes the kinds of crazy stuff you can do with Mathematica, you might well be skeptical, but the next few months will change your mind, I promise.


> The thing is, Mathematica is too big and too rich to ever achieve standardization.

You need to disclose the fact you work for the company that makes Mathematica.

I also disagree with this whole notion: There's no upper limit on what can be standardized. Unless you're implying a patent portfolio or some other anti-competitive practice, of course, which is another notion entirely.


This is not an HTML replacement, but a PDF replacement. HTML has never been, and probably never will be, a popular format for static documents.


One thing that is missing from HTML is a "jar"-like format so that an entire document can be contained in a single file (without data: hacks).

It seems that all we need is an extension, I propose ".htz", and a MIME type, say "application/x-html-bundle". Browsers would just download it and open the contained "index.html" (at the top level or inside a unique top-level directory) and open other assets relative to that.
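A rough sketch of how such a bundle could work, assuming it is nothing more than a zip archive. The ".htz" idea is only the proposal above, not an existing format, and the function names and sample files here are made up for illustration:

```python
import io
import zipfile


def build_bundle(files: dict) -> bytes:
    """Pack a document's files (name -> bytes) into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def find_index(names: list) -> str:
    """Find index.html at the top level or inside a unique top-level directory."""
    if "index.html" in names:
        return "index.html"
    # Collect the first path component of every entry in the archive.
    top_level = {name.split("/", 1)[0] for name in names}
    if len(top_level) == 1:
        candidate = f"{top_level.pop()}/index.html"
        if candidate in names:
            return candidate
    raise ValueError("bundle has no index.html entry point")


def open_bundle(bundle: bytes) -> tuple:
    """Return the entry-point name and a dict of every file in the bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        names = archive.namelist()
        files = {name: archive.read(name) for name in names}
    return find_index(names), files


# Hypothetical two-file document inside a single top-level directory.
bundle = build_bundle({
    "paper/index.html": b'<link rel="stylesheet" href="style.css"><h1>Hi</h1>',
    "paper/style.css": b"h1 { color: navy; }",
})
entry, files = open_bundle(bundle)
```

A browser (or any viewer) would then render `entry` and resolve relative links such as `style.css` against the other names in `files`.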


Perhaps you're thinking of the ".mhtml" file-format, supported by IE since before IE6? It saves a web-page and associated files as one MIME multipart/related stream, much the same way that images are included in emails.
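For what it's worth, the multipart/related layout described above can be sketched with Python's standard email package. This only illustrates the container structure; it is not a claim about exactly what IE writes, and the example.com URLs and the placeholder image bytes are made up:

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, images: dict) -> bytes:
    """Wrap an HTML page and its PNG images in one multipart/related message."""
    archive = MIMEMultipart("related", type="text/html")
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)
    for url, png_bytes in images.items():
        # The subtype is given explicitly, so the bytes are not inspected.
        part = MIMEImage(png_bytes, _subtype="png")
        part["Content-Location"] = url
        archive.attach(part)
    return archive.as_bytes()


# Hypothetical page and a placeholder payload standing in for a real PNG.
PAGE_URL = "http://example.com/index.html"
LOGO_URL = "http://example.com/logo.png"
FAKE_PNG = b"\x89PNG\r\n\x1a\n placeholder, not a real image"

raw = build_mhtml(PAGE_URL, '<img src="logo.png">', {LOGO_URL: FAKE_PNG})

# Reading it back: every non-container part carries its original location.
parsed = message_from_bytes(raw)
leaves = [part for part in parsed.walk() if not part.is_multipart()]
locations = [part["Content-Location"] for part in leaves]
```

The Content-Location header on each part is what lets a reader map the saved parts back to the URLs the page refers to.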


I wouldn't be so sure that HTML isn't becoming a PDF replacement. And it seems odd to pitch a dynamic document format as a better type of static document. The whole point of CDF is that it isn't static.


PDF has static contents and a static interface. CDF has static contents and a dynamic interface. HTML has dynamic contents and a dynamic interface.


Not only that, the only interesting thing from ANKOS was plagiarized by Wolfram from one of his research assistants: http://en.wikipedia.org/wiki/Matthew_Cook#Work_with_Stephen_...


This also happens to be untrue, to put it generously.

As far as I understand the story, Stephen told Matthew: here, this CA seems rich enough for universality, can you prove this for the book? Matthew duly proved it -- a tour-de-force proof, to be sure. Then Matthew broke his NDA by publishing the result early. Stephen litigated to avoid it becoming public prematurely (it was, as you say, the center-piece of the book).

I'm not privy to all the ins and outs and what-have-yous, but that seems fair. If you tried to publish something behind your advisor's back in an academic setting, you'd have a lot to answer for.

Matthew eventually did publish the Rule 110 proof under his own name in Wolfram's own journal.

I speak for myself, not WRI, here.


    If you tried to publish something behind your advisor's 
    back in an academic setting, you'd have a lot to answer for.
Hmm, but similarly if I made a contribution to a breakthrough, and my adviser was going to publish it without my name being associated with it at all, I would feel cheated. I think that's probably a better analogy.

It is a very strange setup where Stephen is claiming he created the work done by others. Legal, certainly, but not very nice. I guess I wouldn't care to work for one of the most egotistical people on earth, however smart he is.


Do you have any evidence that Stephen planned to avoid crediting Cook? I've never heard, seen, or read Stephen claiming the proof as his own. In fact, Stephen tells quite a detailed history of the proof in the notes to NKS, which you can read here http://www.wolframscience.com/nksonline/page-1115c-text. A short excerpt:

"His initial results were encouraging, but after a few months he became increasingly convinced that rule 110 would never in fact be proved universal. I insisted, however, that he keep on trying, and over the next several years he developed a systematic computer-aided design system for working with structures in rule 110. Using this he was then in 1994 successfully able to find the main elements of the proof. Many details were filled in over the next year, some mistakes were corrected in 1998, and the specific version in the note below was constructed in 2001."


I certainly don't know anything about this from firsthand knowledge. The credit you cite in ANKOS happened long after their lawsuits and falling out, so there's no way of knowing how things would have transpired if Cook just went along with things.

I had read this well known review:

http://cscs.umich.edu/~crshalizi/reviews/wolfram/

"The real problem with this result, however, is that it is not Wolfram's. He didn't invent cyclic tag systems, and he didn't come up with the incredibly intricate construction needed to implement them in Rule 110. This was done rather by one Matthew Cook, while working in Wolfram's employ under a contract with some truly remarkable provisions about intellectual property. In short, Wolfram got to control not only when and how the result was made public, but to claim it for himself. In fact, his position was that the existence of the result was a trade secret. Cook, after a messy falling-out with Wolfram, made the result, and the proof, public at a 1998 conference on CAs. (I attended, and was lucky enough to read the paper where Cook goes through the construction, supplying the details missing from A New Kind of Science.) Wolfram, for his part, responded by suing or threatening to sue Cook (now a penniless graduate student in neuroscience), the conference organizers, the publishers of the proceedings, etc. (The threat of legal action from Wolfram that I mentioned at the beginning of this review arose because we cited Cook as the person responsible for this result.)"

That doesn't put things in a very good light. I have no idea if it is accurate or not. I just found your initial attempt to paint Cook as the bad guy unfortunate. Why would Cook have tried to publish on his own? That would be very unusual, and the simplest explanation is that it must have been some sort of disagreement over academic credit. Why else would he do that?


I'm not saying Cook was a bad guy. I don't think it is necessary for there to be a bad guy, despite Cosma's rather bilious essay. Would Wolfram have tried to claim the result as his own? Maybe, maybe not, I don't know, but it doesn't seem consistent with what I do know about him. I'll ask Stephen the next time I talk to him.


There is something like this for HTML + CSS. It's called tangle.js and was created by Bret Victor: http://worrydream.com/Tangle/


Very interesting. I really like the concept of draggable digits. They are more compact than sliders, easier to make changes to and give better feedback. Now I wish this was a standard UI element / HTML input type.


Totally off topic, but that is an amazing Javascript library. Thank you.


That is absolutely incredible. Thank you!


This is a fantastic project. Why has it been kept secret? :-)


That's very nice. Worth a separate HN post, I think.



Sure, it says "publicly available, openly documented", but I can't actually find a file-format spec anywhere.

Given that CDF files seem to be able to do most things Mathematica can do, I'm guessing that might mean "publicly available to anyone who pays us for a Mathematica SDK licence".


>> Wolfram currently provides the CDF specification as a public format, meaning it is publicly available, openly documented, and natively unencrypted.

Sure, but where is it?!

Good luck finding it: http://www.google.com/search?q=Computable+Document+Format+sp...


And they don't state whether they hold any patents that cover CDF.


I would imagine any company with a decent size patent portfolio has one that covers this.


Interesting notion that you have to pay a license fee if you want to charge for the document. This means that if you want to do a consultancy report using CDF you have to pay a separate fee even if you already paid for the authoring environment (Mathematica). Maybe Microsoft should pursue this licensing model?


Example CDF files can be opened in any text editor.


I'm really not for anything that can't readily be done in HTML5+CSS3+JS atm. Especially if I need to make people _download_ a 'player'. What is this, 1999 all over again? (Some might say yes, ha.)

I can see big corp. getting excited over this just like 99% of the crap they buy and don't use.

That said, this doesn't exactly advance global progress on the problem or its solution (nor should we expect it to; Wolfram is a corporation, after all, in the business of making money).


I don't think the average person on HN is the target audience for this in the first place. If you know HTML5 and Javascript and work in a web-based environment most of the time, then this probably won't help you much.

On the other hand, if you are, like me, one of the many people in academia who would love to have a way to embed interactive graphs and tables into your papers to make them more understandable for the reader, then this is potentially very useful. Especially since very few people in academia (outside CS departments anyway) know or have any interest in learning HTML5, and many already use or have easy access to Mathematica to begin with.

Also, Wolfram being a corporation is irrelevant. Some problems are best solved by large corporations trying to make money. That may or may not be the case here, but they definitely have an incentive to keep the format open (in some sense) and promulgate the free CDF reader (much like Adobe has done in the past).


That's exactly right. If I could elaborate a bit:

For sure, web developers and programmers can do most of what CDF can do, on their own, in HTML5. It might take them a lot longer, but they could certainly end up with a nice finished product.

This doesn't solve the problem. The problem is neatly illustrated by the fact that news organizations, which have a huge incentive to make compelling, sticky interactivity that wraps their news properties, haven't gone for it. I've only seen two non-trivial uses, the NYT and BBC News, and it's clear these were bespoke jobs that cost them a lot of money.

The same goes for textbook publishers, scientists, NGOs, etc., anywhere where technical communication could be significantly improved with interactive documents. This problem remains unsolved.

CDF aims to make it possible for someone to crank out an interactive figure or document in a matter of hours, not weeks, with very little code.

A side comment: I say this without any real proof, but WRI specializes in doing interesting things that are economically self-sustaining, rather than things that make a lot of money. Mathematica is far from a cash cow, and WRI is a small company (~500 people), but it has lasted 25 years, and it regularly adds cutting edge technology to its portfolio. Obviously, it gets to balance profit with "interestingness" mainly because it is privately owned, and Stephen likes collecting interesting people and interesting projects for them to do.


Wow, taliesinb is making a lot more sense than you guys.


I think a better solution is to develop a standard HTML5, CSS and JavaScript combo that researchers can use to publish articles online. The requirements are that using it should be no more difficult than writing papers in LaTeX, and the layout results should be equally good.

I've thought about this for a while now. It seems terribly silly for us to spend so much time formatting our papers as pdfs when people mostly read them on a computer anyway. I've been to conferences that - smartly - don't even print the proceedings. They just hand you a USB key with all of the papers and a table of contents in HTML that has authors and titles that links to the correct file.


What we really need is a LaTeX compiling tool like pdflatex, except that the output is a series of HTML files. And while I know that some versions exist out there, we really need one that doesn't suck, along with internet-appropriate style files, that just spits out html+javascript+css.


But as an end-user, I don't want a series of files. Source code is a major pain in the ass if you're not actively working on it. The whole win of PDF for most people, including me most of the time, is that I don't want to edit the document, but everything is neatly contained within a single file that is ready for printing or viewing with no additional libraries, dependencies, or anything else but a PDF viewer.

I have a few thousand pdf files on my hard disk and already find it hard to manage the collection. If every single one of them had a source tree it would make me cry.


If you're willing to go through a build step, it's fairly easy to get everything into a single document. Sproutcore does it today with no special configuration. There are tricks to getting large amounts of content to load in a performant manner and you'd have to do history API to get transparent support for an entire site but neither hurdle is particularly high. You don't even have to invent a new format for specifying what makes up the site since the web archive standard already exists.
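To make the "build step" idea concrete, here is a minimal sketch that folds external assets into one HTML file as data: URIs. This is not how Sproutcore's build works; it is only one simple way to get a single file, and the naive string replacement and the sample asset names are assumptions for illustration:

```python
import base64
import mimetypes


def to_data_uri(name: str, data: bytes) -> str:
    """Encode one asset as a data: URI, guessing its MIME type from the name."""
    mime, _ = mimetypes.guess_type(name)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_assets(html: str, assets: dict) -> str:
    """Replace src/href references to known assets with inline data: URIs."""
    for name, data in assets.items():
        uri = to_data_uri(name, data)
        # Naive textual substitution; a real tool would parse the HTML.
        html = html.replace(f'src="{name}"', f'src="{uri}"')
        html = html.replace(f'href="{name}"', f'href="{uri}"')
    return html


# Hypothetical page with one stylesheet and one script.
page = '<link rel="stylesheet" href="style.css"><script src="app.js"></script>'
single_file = inline_assets(page, {
    "style.css": b"body { margin: 0; }",
    "app.js": b"console.log('hello');",
})
```

The result is larger than the separate files (base64 adds roughly a third), which is part of why this is usually treated as a workaround rather than a real bundle format.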


Multiple files is an implementation detail. When navigating through Finder in OSX, "applications" appear to be monolithic, and double-clicking on them will launch the application. But, they are actually just directories with the .app extension. You can navigate through them and inspect the files and subdirectories they contain. But you don't have to.


I think the .mht file format does that for HTML/JS/*, minus all the graphs.


I know. I rather miss the option to just save a web page as a self-contained file in Chrome, because I used to use that all the time. It's not that you can't do the same things with web technologies, but that it's so much less convenient to do so. I think this is why Instapaper and Readability have become such big hits, because they offer simplicity and one-step document management. PDF does the same thing for more complex documents that need to be presented in a consistent fashion. This CDF format looks like it could be ideal for textbooks, instruction manuals and the like.

There's another comment downthread where I talk about why I don't actually want all the possible editing capabilities to be available most of the time. Sometimes you want to maximize convenience rather than control.


Maybe that's what you really need. CDF goes way beyond LaTeX: the idea is not to try to replicate ordinary paper papers in a digital format, but to have programs that explain things through interactivity.


I appreciate your thoughtful reply!

One of the (many, I'm sure) problems you defined here is: "...who would love to have a way to embed interactive graphs and tables into your papers to make them more understandable for the reader..."

The solution however may not necessarily be a private 'reader', 'player', or 'binary'. Though the 'format' may be 'open' this single implementation isn't.

You bring up Adobe, which isn't a very good business case to follow, as the very reasons I brought up are pressing the industry to move toward open implementations[1][2], away from 'free' binary distributions and plugins (for a variety of reasons, from security[2] to access on other systems/apps).

This really isn't 1999, and CDF isn't yet a popular or de facto standard[1] the way PDF was. Besides, you and I are much more capable these days with newer and _open_ technologies, are we not (browser, Linux, open documents, etc...)?

Speaking of documents, other examples of 'perceived' open standard files are Microsoft's ill-fated OSP[4] promise, which spawned traction for ODF[3] and others.

These reasons are why HTML5/CSS3/JS are so powerful as a basis for creating an open two-way street for 'documents' and/or formats. It is not enough to provide only an 'open' format; an open implementation is needed too. That way both the use and the implementation of such a product benefit academic progress. Why wouldn't that alone be worth it?

Thus the solution I am proposing to your problem, and likely a more popular one, could be a service that uses HTML5/CSS3/JS for delivery and solves your problem in a WYSIWYG manner suited to general users. Especially so as the very tools (your browser and a million libraries; must I really list them all?), UI/UX experience, and entrepreneurs (HN! Y-comb!) already exist!

To your quote "Some problems are best solved by large corporations trying to make money" I would say the same to "Some problems are best solved by small groups of entrepreneurs or open source developers trying to make money and/or looking for peer fame."

I would go further and say that, for 'open' standards and implementations, small groups of entrepreneurs or open source developers are the ones carrying the torch of openness [5][6][7][8][and on and on...] and innovation.

[1] "was originally a proprietary format controlled by Adobe", http://en.wikipedia.org/wiki/Portable_Document_Format

[2] http://andreasgal.com/2011/06/15/pdf-js/

[3] http://en.wikipedia.org/wiki/OpenDocument

[4] http://en.wikipedia.org/wiki/Microsoft_Open_Specification_Pr...

[5] http://blog.documentfoundation.org/2011/06/01/statement-abou...

[6] http://www.infoworld.com/d/application-development/oracle-ha...

[8] http://tirania.org/blog/archive/2011/Jul-18.html


RWW wrote: "...I doubt other companies will want to or be able to catch up to Wolfram in the sophistication of the tools they offer..."

If anyone would like to join up and create an alternative I'm game. Let's do this.

My full info is in my profile.


Additionally, the CDF player desktop app for Linux is a 200 MB behemoth. When I saw this, I immediately canceled the download. (For comparison: Adobe Reader is about 70 MB.)


Meta comment, when I read it I thought 'you've invented a spreadsheet interchange format?'

Having looked at the technology in depth I can appreciate the notion of building into the document the computation that went into it. This could be a killer way to distribute component data sheets where all the graphs were 'live'.

That being said, it scares the crap out of me. Why? Because I've got Frame documents which are unusable (there is no available version of Frame which will read them, and no legal way to obtain said version). This is a particularly insidious form of bit rot. I save PDF documents on CD with a self-contained C language implementation of a PDF reader that can read them and a set of fonts that work with that reader.

Without the equivalent for CDF I worry about having critical information (or simply relevant information) that cannot be viewed or used. At least with PDF if you print it out the paper version is still usable. Not so if you don't have an open source version of Mathematica to interpret what you read.


I'm not sure I'm entirely thrilled about a new document format that requires a 106 MB download to view.


231MB for Linux. This is 7 times larger than my browser.


This is very similar to org-mode with babel and latex output. http://orgmode.org/worg/org-contrib/babel/intro.html


Absolutely, although I guess that the advantage of this is if you want to play with someone else's analysis you don't have to install the same analysis software.

I guess I'll be sticking with org mode though and trying Tangle (http://worrydream.com/Tangle/) as suggested elsewhere on this page.


That was my first thought, too.


what does this do that couldn't be done with html5?


HTML5 needs to be geared towards things that websites can do in general, where CDF appears to be geared towards a very specific set of tasks.

In other words, possibly nothing, but your original question may or may not be the right question to ask.


Sell more copies of Mathematica? There is this comparison chart, but it seems to confuse the document itself with the editing framework: http://www.wolfram.com/cdf/compare-cdf/how-cdf-compares.html


> it seems to confuse the document itself with the editing framework

Is it me, or is this a sign the format isn't going to be open in any meaningful sense?


My first thought exactly. HTML is the document format of the future. It seems almost everything can read HTML these days. Here we have an excellent, feature packed, well tested, well documented format that appears to offer everything CDF offers and more. All we need now are applications geared towards delivering the right tools to perform the necessary task.

One of my side projects is a HTML word processor, and every time I look at a common Word processor feature, HTML offers a ton of free and easy to implement ways to make that feature better.


Well, here's something that HTML5 can't do (at least easily).

Dynamic @ Graphics @ Point @ ImageKeypoints @ CurrentImage[]

What does this do? It takes a video stream from your webcam, extracts SURF keypoints ("salient" parts of the image, like corners), and plots them live. Note that all those functions are online at reference.wolfram.com.

THAT'S A CDF! You have access to some heavy duty algorithms covering a vast number of fields. In fact it's kind of surprising it fits in a few hundred megs. You're essentially getting the whole of Mathematica, for free.


That's cool, and compared to writing the code to do that manually, it's compact, but I'm having trouble imagining a worse syntax and naming scheme for achieving that task.

Give me this:

    stream = OpenVideoStream(device)
    render1 = stream.surf()
    render2 = stream.sift()
    etc.
    display(outDevice, stream)


The infix @ operator is just right-associative application, so a @ b @ c is synonymous with a[b[c]]. It plays the same syntactic role as $ does in Haskell for writing pipeline-style code.
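For readers who don't know either language, the reading order can be mimicked in Python. This is only an illustration of right-associative application; `apply_right` and the toy functions `a` and `b` are invented for the example, and Python's own `@` operator (matrix multiplication) is unrelated:

```python
from functools import reduce


def apply_right(*steps):
    """Evaluate `f @ g @ ... @ x` the Mathematica way, i.e. as f(g(...(x)))."""
    *functions, argument = steps
    # Apply the innermost (rightmost) function first, then work outwards.
    return reduce(lambda value, function: function(value),
                  reversed(functions), argument)


def a(x):
    return f"a[{x}]"


def b(x):
    return f"b[{x}]"


# Mirrors a @ b @ c from the comment above.
result = apply_right(a, b, "c")
```

Here `result` is the same value as `a(b("c"))`, which is why the one-liner reads left to right as "make Dynamic a Graphics of the Points of the ImageKeypoints of the CurrentImage".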


And how many of those CDF files will still be readable in ten years?


So Wolfram Research has been around for 25 years, which is geological time for a software company. It has a niche (several, in fact), and its technology is so far ahead of its competitors that I think the answer to your question is: all of them.


I don't believe you.

If this doesn't take off, Wolfram dumps the project, removes the software from its website and other channels, and the documents are worthless.

If I invest in this, I'm betting Wolfram is going to boil the ocean or is going to be cost-insensitive regarding how much it takes to keep this available. Neither of those sound like good bets.


HTML is the document format of the future.

Yes, I've been saying that since about 1992. I'm not disagreeing with you, mind; I really like the idea of a lightweight WYSIWYG tool that lets me create complex documents without having to fiddle about with code.


This isn't a useful question. With Canvas, HTML5 can do "anything". With Turing Equivalence, any programming language can do "anything". The question isn't what the global space of possibilities is, it's what is easier or harder.


Be authored by regular non-programming shmoes, if you take Wolfram's word for it. It looks like they've discovered an interactivity model that is unusually simple while still being relatively powerful.


From skimming it, it seems to be a shared file without need of a browser. It's like viewing an interactive PDF/Flash in a [CDF] player.


You need a CDF player, which is essentially a browser for CDF documents.


Here is the comparison with other formats (PDF, JS, Flash, etc).

There are many marketing lies in it! For one, it claims that PDF is not embedded in the browser while CDF is? "Full page within a browser" is yes but not for JS? "Dynamically interactive charts and diagrams" is partially supported??

Could anyone please explain to me what I am missing here?

http://www.wolfram.com/cdf/compare-cdf/how-cdf-compares.html


Fantastic idea, horrendous implementation, unfortunately. It needs over half a gigabyte of drive space to install the reader, which pretty much kills this for widespread adoption. Great tech insights, though!


I guess they are trying to go in this direction: http://www.executablepapers.com/


Reminds me of PDF.



