

WinFS, Integrated/Unified Storage, and Microsoft - wolfgke
http://hal2020.com/tag/winfs/

======
amaks
Few data points from someone who actually worked in the latest iteration of
unified storage, WinFS (2004-2006).

The first problem was the lack of a clear vision and contradictory requirements
(AKA solving all the world's problems). For example, the data model had to be
changed very late in the game. The previous data model was incredibly complex.
It had entities of different types, links between them (of different types
IIRC), and it allowed multiple entities to be a parent of another entity; from
a security standpoint this was not solvable in any meaningful way. It's a shame
that hundreds of people wrote tons of code for the flawed model without even
realizing that it wouldn't work, despite all the high-level architects involved
in the project. In the end they realized that they couldn't possibly ship that,
and decided to do a micro reset of WinFS' data model (around mid-2005).

The second problem was SQL Server (actually a fork of what became SQL Server 2005)
which was optimized to work as a dedicated service on a server machine and
required lots of effort to massage it to work relatively well with other
resource consumers on an average consumer grade machine.

Third, WinFS APIs were all managed and Longhorn Explorer was written in
managed code as well. The whole thing was slow as hell and extremely unstable.

By the end the teams working on WinFS and related projects were fairly
motivated though, we could see the light and the decision to kill WinFS (and
do Longhorn reset) was simply a matter of time.

Ironically, the news of the project's cancellation came the day after the
team declared WinFS Beta 2 ready. Now I think it was the right decision.

~~~
yuhong
>Longhorn Explorer was written in managed code as well.

So can support for managed shell extensions (which is currently not supported)
be added without porting the entire Explorer to managed code?

~~~
ygra
Visual Studio, SQL Server, and Office support that already and I doubt they
are completely managed code.

------
twoodfin
Great stuff. I don't know how I missed this when it was first posted. I hope
Hal gets around to Part 5 at some point.

From this history, ISTM that WinFS was doomed as soon as they decided to build
it on top of the "Mighty Mouse" embedded SQL Server. SQL Server is, by all
accounts, a well-designed piece of software, but something doesn't become
"platform layer" just because you say it is. I don't blame Microsoft's other
BU's for not wanting to put what is essentially another application layer
between their code and its persistent data.

On another note, the proposed client use cases 10-15 years later read as
almost quaint. The vast majority of photo management these days happens on the
web and in HTTP-using client applications, where sharing, not editing, is the
killer feature. The right client bet was not to help users manage data on
their own PCs, but to take over that responsibility for them. Through SkyDrive
and other "cloud" capabilities, Microsoft can do that today, but imagine if
they'd started 12 years ago. Facebook only exists because they got photo
sharing "good enough"; there's no reason Microsoft couldn't have beaten them
to it.

~~~
csmuk
_> The vast majority of photo management these days happens on the web and in
HTTP-using client applications, where sharing, not editing, is the killer
feature._

You'd be surprised. There are many of us who don't do _social photography_ and
use desktop photographic software and Windows' built in image management
stuff.

And Microsoft make more money out of us with Windows licenses than Facebook do
with _social photography_.

The right client bet was the _choice_ of doing it where you please, which they
offer(ed).

However, they're pushing everything onto SkyDrive now in an "everything
everywhere" approach which is a bad move.

Surprisingly Apple and DIY (Linux or own software) are the only people
offering choice still, or at least not nagging you about moving to the cloud.

~~~
twoodfin
I know there are a lot of you, and there always will be. But how many more
Windows Phone licenses would Microsoft be selling if they were _the_ online
photo sharing platform instead of a distant also-ran?

WinFS was the wrong strategic move because cross-device availability and
network effects dwarf whatever value you can get out of a local store. It's
exactly the reason web mail won for consumers.

~~~
csmuk
Note: I own a Windows Phone. It syncs photos with SkyDrive, allows you to post
to Facebook etc.

It allows choice. Surprisingly, Windows has always allowed choice.

Web mail won for consumers for a bit. Now it's heading towards apps and sync
which are omnipresent on all platforms now.

WinFS was replaced with metadata indexing which didn't incur the system-level
performance hit. Metadata indexing is there on iOS and Windows at least.

Edit: I don't actually take photos with my phone unless it's a quick shot of
my car parked to make sure I don't get a dodgy ticket.

------
kalleboo
It's interesting to contrast this to what Apple ended up doing in OS X - HFS
on the bottom with some indexed, extensible metadata bolted on, then the Core
Data API for structured storage backed by SQLite, but still represented in
"files". I wonder how far Core Data could be stretched - could they replace
all of HFS/a traditional file hierarchy if they were willing to kill (or hack
in backcompat) support for legacy formats?

~~~
csmuk
Actually they did the same on Windows Vista and above.

------
yajoe
I love this read, and I would love to read entire books of people from the
trenches making software. However, most of the specific details Hal cites
are... well... dated, coming from someone who learned the craft in the 1980s.

This line stuck out to me as especially myopic:

> _Most of the world of commerce we are used to was made possible by the
> creation and growth of the concept of Structured Storage. The modern world
> of Credit Cards and ATMs is 100% predicated on this work. Amazon.com was in
> the realm of science fiction in the 1940s. By the 1970s the conceptual basis
> for everything you needed to create it was in place. It took until the 1990s
> for those concepts to mature sufficiently to let Amazon happen._

I can't put my finger on why this seems wrong, and it could be such a strong
contrarian view I need a moment to accept it. I feel like the 1990s .com boom
AND the modern social boom may depend on structured storage, but structured
storage certainly isn't the sufficient condition.

Reading through Hal's prose, it struck me that HTTP is the very thing Hal set
out to build, and it seems that since he was so focused on file systems he
missed another obvious technology choice from which to draw. Would it have
been dog slow for every application to make local HTTP requests to read files?
Heck Yeah. Is it insane? Yeah. Would it have been slower than what they built?
I don't know...

And HTTP was so well understood and supported it would have allowed
applications to start to mix content from local and remote sources, what we
effectively have with pure JavaScript apps today. But HTTP was a standard, and
Microsoft from those days was allergic to standards. Alas, someone will
eventually build a JavaScript-based OS that treats all files as HTTP
endpoints.

And then we'll get photos to sync with metadata. Just saying.

~~~
MichaelGG
Can you explain a bit of how HTTP would actually help here? HTTP doesn't
magically fix the issues of mixing content local and remote (latency,
availability, etc.) This sounds like "throw more XML at it" except with HTTP
instead.

~~~
yajoe
This isn't as serious a proposal as some of the W3C documents on how XML
and WS-* work that have my name on them (among many, many other smarter people)...
and I have a lot of sympathy for someone who had to deal with Win32 APIs (they
are locally optimized but globally bad).

I have no love for XML, but the details tend to matter. And you are right that
saying "use HTTP" is a bit hand-wavy. XML is great at serializing _nouns_ when
you want to enforce the schema of those nouns. It makes interop of nouns,
verifying, quantifying, and some types of searches much faster and more consistent.
XML was a reaction to widespread RPC and endless bit-order compat that wasted
so many lines of code. It comes from the same mindset as the people who made
SQL -- "conforming to schemas is good and what most people want."

However, in the last 10 years we've seen that it isn't possible to conform to
a single schema as requirements change, and that is why XML has generally lost
favor to JSON. This is a similar reason why NoSQL wins in many cases over SQL.

HTTP, in contrast to XML, is a set of _verbs_ (called methods) and
_identifiers_ (typically urls), which is similar to what a file system is. It
leaves the _nouns_ (the body in HTTP) to the application, but it does promote
some properties (headers in HTTP) and have conventions for common properties
(content-type). The big difference between HTTP and most file systems is that
HTTP is stateless, whereas many file operations are stateful (get a handle to
a stream, write to a stream, close the stream).
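
To make the verbs-and-identifiers comparison concrete, here is a minimal Python sketch that maps three HTTP methods onto whole-file operations under one directory. The `handle` function and its status-code choices are illustrative assumptions, not anything from Hal's articles or from a real server. Each call carries everything it needs and no handle survives between calls, which is the stateless property described above.

```python
import tempfile
from pathlib import Path


def handle(root: Path, method: str, url_path: str, body: bytes = b"") -> tuple[int, bytes]:
    """Map one stateless HTTP-style request onto a whole-file operation.

    Hypothetical helper: returns (status code, response body).
    """
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse identifiers that escape (or name) the served directory itself.
    if root.resolve() not in target.parents:
        return 403, b""
    if method == "PUT":
        # PUT replaces the whole resource; there is no open/write/close sequence.
        target.parent.mkdir(parents=True, exist_ok=True)
        created = not target.exists()
        target.write_bytes(body)
        return (201 if created else 204), b""
    if method == "GET":
        if not target.is_file():
            return 404, b""
        return 200, target.read_bytes()
    if method == "DELETE":
        if not target.is_file():
            return 404, b""
        target.unlink()
        return 204, b""
    return 405, b""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(handle(root, "PUT", "/photos/beach.txt", b"sand"))
        print(handle(root, "GET", "/photos/beach.txt"))
        print(handle(root, "DELETE", "/photos/beach.txt"))
        print(handle(root, "GET", "/photos/beach.txt"))
```

Note what the sketch leaves out: partial writes, locking, and anything resembling a seekable stream, which is where the stateful file APIs and the stateless verbs stop lining up.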

HTTP would work as the API for a file system because it provides pretty good
addresses for both local and remote and relatively low-level operations.

HTTP also has the benefit of being widely adopted, even during the Cairo
development, which would have solved the chicken-and-the-egg problem from the
first essay.

Using HTTP as a file system has key drawbacks: Applications would have to be
re-written to use both the new APIs and new mindset of possibly high-latency
operations. You can't always assume that a particular endpoint will be
available, unlike many assumptions about inodes.

So, do I think they _should_ have done this? Maybe, there were a lot of
variables at play. But, I don't think it is a crazy idea to use HTTP as the
file system, and I predict we will see a popular -- nay, credible -- operating
system use it within the next 5 years.

_And I also think it's telling that even in the post-mortem hindsight, the
author fails to see alternatives that were widely available in the industry
because they weren't invented at Microsoft._

~~~
MichaelGG
I must be missing something. You could trivially wrap HTTP around NTFS files,
and stick whatever header metadata in an alternative stream. I don't see how
this solves anything at all about what Hal talks about regarding integrated
storage. How does HTTP solve even the simplest of problems that the articles
talked about? Such as a photo being in multiple collections?
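
For illustration, the "photo in multiple collections" case is a many-to-many relation, which a single path or URL per file does not express on its own. The sketch below uses Python's built-in `sqlite3`; the table and function names are hypothetical and this is not WinFS's actual data model, just the smallest relational shape that stores one photo once while listing it in several collections.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with items, collections, and a link table."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE membership (
            item_id INTEGER NOT NULL REFERENCES item(id),
            collection_id INTEGER NOT NULL REFERENCES collection(id),
            PRIMARY KEY (item_id, collection_id)
        );
        """
    )
    return db


def _id_for(db: sqlite3.Connection, table: str, name: str) -> int:
    # `table` is only ever one of our own two table names, never user input.
    db.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def add_to_collection(db: sqlite3.Connection, item: str, collection: str) -> None:
    """Link an item to a collection without copying the item."""
    db.execute(
        "INSERT OR IGNORE INTO membership VALUES (?, ?)",
        (_id_for(db, "item", item), _id_for(db, "collection", collection)),
    )


def collections_of(db: sqlite3.Connection, item: str) -> list[str]:
    """Return the names of every collection that contains the item."""
    rows = db.execute(
        """
        SELECT c.name FROM collection c
        JOIN membership m ON m.collection_id = c.id
        JOIN item i ON i.id = m.item_id
        WHERE i.name = ? ORDER BY c.name
        """,
        (item,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_store()
    add_to_collection(db, "beach.jpg", "Vacation 2004")
    add_to_collection(db, "beach.jpg", "Family")
    print(collections_of(db, "beach.jpg"))
```

Putting an HTTP front end on this would change how the rows are addressed, not whether the link table has to exist somewhere.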

As an aside: Your statement on XML versus JSON seems confused. XML and JSON
don't require schemas. XML allows it, and JSON has multiple (IIRC) contenders
to specify a schema. Because as it turns out, people dislike having to write
boring boilerplate code by hand and would prefer a system to specify common
things. (And eventually JSON'll come full circle with something similar to
WSDL.) JSON's popularity is half JS, and half because XML foolishly requires
the tag name in the closing tag, bloating the size, and half due to overly-
complicated uses of XML, especially namespaces. Sane use of XML is identical
to JSON, except requires more space.
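
A small sketch of that last point, using only the Python standard library: the same flat record serialized both ways, with no schema involved on either side. The record and the helper names are made up for illustration, and a single tiny record says nothing about namespaces, attributes, or compression.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record: dict[str, str]) -> str:
    """Serialize a flat record as compact JSON."""
    return json.dumps(record, separators=(",", ":"))


def to_xml(record: dict[str, str], root_tag: str = "item") -> str:
    """Serialize the same record as one element per key.

    Assumes every key is a valid XML tag name.
    """
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict[str, str]:
    """Parse the element-per-key form back into a flat record."""
    return {child.tag: child.text or "" for child in ET.fromstring(text)}


if __name__ == "__main__":
    record = {"title": "Beach", "rating": "5"}
    as_json, as_xml = to_json(record), to_xml(record)
    print(as_json, len(as_json))
    print(as_xml, len(as_xml))
    # Both forms carry the same data; the XML form repeats each name in its closing tag.
    print(json.loads(as_json) == from_xml(as_xml))
```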

------
kazagistar
I was hoping for more analysis of why WinFS failed from a technical
perspective, but unfortunately it ended with a bunch of boring internal
political maneuvering in Microsoft.

------
alkonaut
Sad to see how close we were to a Windows release with the CLR being a first
class citizen and a revolutionary storage system as well.

Fast forward a decade, a couple of major Windows releases, and we are still
doing C++, NTFS and COM.

------
yuhong
Personally my favorite is the Jet 4.0 story and how the Access team ended up
having to fork it for Access 2007. I think for a year or so, _any_ development
on Jet 4.0 was stopped until it was eventually resumed due to security.

~~~
i386
Do you have a link to that story?

~~~
yuhong
I don't have a single source, but it is not hard to figure out. Some of it is
in MichKa's posts on Google Groups, and notice for example the >1 year long
gap between Jet 4.0 SP5 and SP6.

------
andor
_“I know I saw a spreadsheet a couple of weeks ago; when I want to find it
again do I look in my file system or do I look in my email?”. Bill was trying
to make multiple points with this simple example, but the primary one was not
that there should be a way to search across disparate stores._ (from part 1 of
the series)

My first thought was: put your mail in a Maildir instead of that pst file and
you have "unified" storage ;-) It seems they wanted to build something like
the opposite of Unix: "everything is a (structured) database" instead of
"everything is a file".

~~~
viraptor
Why would Maildir help here? Unless the file itself was saved separately and
linked from the email in an app-specific way you still wouldn't be able to
open it directly in your spreadsheet app. (excluding fusefs-style magic)

Unless you have well-defined structures stored somewhere, you just get
extensions on top of extensions... see things as simple as 'ar' files for
example. Also Maildirs have lots of different formats and ways of indexing -
each app treating the files as "just files" and doing it in a very specific
way :( Sometimes I really appreciate the "here's an API, you will never know
where/how it's stored and you shouldn't care" approach.

~~~
andor
_Unless the file itself was saved separately and linked from the email in an
app-specific way you still wouldn't be able to open it directly in your
spreadsheet app._

That's true. You _might_ be able to find the right mail file using text-based
tools, if the mail isn't MIME encoded, but then you still don't have the file.
I just found it fun to think of it as the opposite of Unix.
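
A rough sketch of that limitation, using Python's standard `mailbox` and `email` modules: once the attachment is base64-encoded inside the message, a byte-level search over the Maildir files does not see its contents, and getting the spreadsheet back means parsing MIME. The helper names, addresses, and the sample attachment are made up for illustration.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def make_demo_maildir(path: str, payload: bytes) -> None:
    """Create a Maildir at `path` (which must not exist yet) holding one mail."""
    box = mailbox.Maildir(path, create=True)
    msg = EmailMessage()
    msg["From"] = "bill@example.com"
    msg["To"] = "me@example.com"
    msg["Subject"] = "Q3 numbers"
    msg.set_content("Spreadsheet attached.")
    # Bytes attachments are base64-encoded by default.
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename="q3.csv")
    box.add(msg)
    box.close()


def find_attachments(path: str, suffix: str) -> list[tuple[str, str, bytes]]:
    """Return (subject, filename, decoded bytes) for matching attachments."""
    box = mailbox.Maildir(path, create=False)
    hits = []
    for message in box:
        for part in message.walk():
            name = part.get_filename()
            if name and name.lower().endswith(suffix):
                hits.append((message["subject"], name, part.get_payload(decode=True)))
    box.close()
    return hits


if __name__ == "__main__":
    maildir = str(Path(tempfile.mkdtemp()) / "Maildir")
    payload = b"region,sales\nwest,42\n"
    make_demo_maildir(maildir, payload)
    raw = b"".join(p.read_bytes() for p in (Path(maildir) / "new").iterdir())
    print("grep-style match on raw file:", payload in raw)
    print("MIME-aware search:", find_attachments(maildir, ".csv"))
```

Even with the MIME-aware search, the result is a copy of the bytes pulled out of a mail, not a file the spreadsheet app can open in place, which is viraptor's objection.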

The Nepomuk framework in KDE seems to have similar goals by the way. They kept
their database separate from the filesystem, but integrated it in quite a few
KDE applications. Nepomuk uses a graph database, so the schema is easily
extendable. But apparently they don't even use all features of the current
schema, only rating, tagging and comments appear to be consistently
implemented in applications.

[http://userbase.kde.org/Nepomuk](http://userbase.kde.org/Nepomuk)

------
philsnow
Reading these articles, I'm struck by the refusal to admit influence from
other systems (or even point out how Windows' features have inspired the same
in other systems).

> For example, Windows had implemented the TransmitFile function for
> optimizing transmission of files from a web server by doing all the work in
> kernel mode.

Isn't that just sendfile(2) from BSD ?
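
For readers who have not used either call, here is a minimal sketch of the idea from user space in Python. `socket.sendfile` hands the file to `os.sendfile` where the platform provides it, so the copy happens in the kernel, and falls back to ordinary `send` calls otherwise. The `send_file_over_socket` helper and the socket-pair demo are illustrative assumptions, not a web server and not a wrapper around TransmitFile.

```python
import os
import socket
import tempfile


def send_file_over_socket(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:  # socket.sendfile requires binary mode
        return sock.sendfile(f)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello" * 1000)
    os.close(fd)

    # A connected pair of sockets stands in for client and server.
    sender, receiver = socket.socketpair()
    sent = send_file_over_socket(path, sender)
    sender.close()

    received = b""
    while True:
        chunk = receiver.recv(65536)
        if not chunk:
            break
        received += chunk
    receiver.close()
    os.remove(path)
    print(sent, len(received))
```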

> An application could attach another stream to the file to hold whatever it
> wanted. The file system guys really didn’t care, and they didn’t interpret
> the stream. So this was a very natural extension.

Isn't that just forks from HFS/+ ?

I actually don't know the history behind all of these things, so the influence
could have been the other way around (or the features could have been
developed completely independently and contemporaneously).

_Edit: just saw the nod to DEC, so it's not that they're completely absent,
just (IMHO conspicuously) missing in some places._

------
pak6
is this helpful for wedding photography like this
[http://www.shakkyastudio.com/](http://www.shakkyastudio.com/)

