

Stephen Wolfram on a .data TLD - LisaG
http://blog.stephenwolfram.com/2012/01/a-data-top-level-internet-domain/

======
sp332
Wolfram's proposal sounds completely backward to me. You'd have to consider:
does google.data apply to google.org or google.com? Should we have
google.org.data and google.com.data?

I think the right way is to put things under the domain: data.google.com,
google.com/data, or even a META tag on a web page that tells the browser the
URL for the data relevant to a particular page.

~~~
mrlase
> Wolfram's proposal sounds completely backward to me. You'd have to consider:
> does google.data apply to google.org or google.com? Should we have
> google.org.data and google.com.data?

As for that notion, maybe we should switch to naming things the way we do Java
packages :D

Google.com would be com.google.search, com.google.mail, etc :P

~~~
donut
It has crossed TBL's mind:

"I have to say that now I regret that the syntax is so clumsy. I would like
<http://www.example.com/foo/bar/baz> to be just written
http:com/example/foo/bar/baz where the client would figure out that
www.example.com existed and was the server to contact. But it is too late
now."

<http://www.w3.org/People/Berners-Lee/FAQ.html#etc>

------
ken
"If a human went to wolfram.data, there’d be a structured summary of what data
the organization behind it wanted to expose. And if a computational system
went there, it’d find just what it needs to ingest the data, and begin
computing with it."

This sounds to me like a high-level description of how the web is _supposed_
to work today, only implemented using a new TLD instead of HTTP headers.

It sounds odd to me, coming from someone whose major web service sends all
results -- even text and tables -- as GIF.
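
To be concrete, that existing mechanism is HTTP content negotiation: the same
URL serves HTML to a browser and structured data to a program, keyed off the
Accept header. A minimal Python sketch (the URL is just a placeholder):

    import urllib.request

    # Ask the same URL for machine-readable data instead of HTML. The URL
    # is a placeholder; any server that honors content negotiation could
    # answer with JSON when asked for it.
    req = urllib.request.Request(
        "http://www.example.com/population/france",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.headers.get("Content-Type"))  # ideally application/json
        print(resp.read()[:200])                 # structured data, not GIFs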

~~~
Tobu
Good point re Accept: headers, though I think discoverable formats (such as
<link rel="alternate" type="application/rdf+xml" href="data.rdf" />) are even
better.
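
A rough sketch of how a client might discover such an alternate, using only
the Python standard library (the page URL is a placeholder):

    import urllib.request
    from html.parser import HTMLParser

    class AlternateFinder(HTMLParser):
        """Collect <link rel="alternate"> targets and their MIME types."""
        def __init__(self):
            super().__init__()
            self.alternates = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "link" and "alternate" in (a.get("rel") or "").split():
                self.alternates.append((a.get("type"), a.get("href")))

    page = urllib.request.urlopen("http://www.example.com/").read()
    finder = AlternateFinder()
    finder.feed(page.decode("utf-8", "replace"))
    print(finder.alternates)  # e.g. [('application/rdf+xml', 'data.rdf')]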

------
chintan
The problem with this idea is that .data will encourage "data servers" but not
create a "data web".

Allow me to explain.

RDF[1] was created to solve the "data web" problem. However, the challenge has
been representing and modeling "things" such that we can cross-link "data" on
the "web". The language for creating such shared representations (the Web
Ontology Language[2]) is difficult to use and standardize. Nevertheless, this
approach has been hugely successful in knowledge-intensive domains such as
biology and health care.
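
To make the cross-linking idea concrete, here is a toy sketch using the Python
rdflib library (the URIs and the worksFor property are invented): both
statements hang off the same URI, so any other dataset that reuses that URI is
automatically linked to this one.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF

    # Invented example vocabulary and URIs.
    EX = Namespace("http://example.org/")
    alice = URIRef("http://example.org/people/alice")

    g = Graph()
    g.add((alice, FOAF.name, Literal("Alice")))
    g.add((alice, EX.worksFor, URIRef("http://example.org/orgs/acme")))

    print(g.serialize(format="turtle"))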

On the Wild Wild Web, microformats[3] have gained wide support from search
engines and web publishers.

1\. <http://www.w3.org/RDF/>

2\. <http://www.w3.org/TR/owl-features/>

3\. <http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146897>

~~~
dantheman
Don't forget the linking open data initiative: <http://linkeddata.org/>
They've been building a huge distributed data set.

------
Kilimanjaro
Are we going to transfer hypertext? No? Then use data://google.com and define
a protocol to GET, PUT, POST, and DELETE data over the wire using standard
data formats (how about INSERT, SELECT, UPDATE, and DELETE?).

The index page would give you all the discoverability, and from there you
could go to google.com/employees or bestbuy.com/products, etc., showing
whatever data is public (or private, behind OAuth mechanisms) and what can be
created, modified, and deleted according to roles and security levels.
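
To spell out the verb mapping I have in mind, a purely hypothetical sketch (no
such protocol exists):

    # Purely hypothetical sketch of a data:// verb mapping -- not a spec.
    VERB_MAP = {
        "POST":   "INSERT",  # create new records
        "GET":    "SELECT",  # read whatever data is public (or OAuth-gated)
        "PUT":    "UPDATE",  # modify existing records
        "DELETE": "DELETE",  # remove records, subject to roles/security levels
    }

    def plan(verb, resource):
        """Translate a wire verb into the data operation it implies."""
        return f"{VERB_MAP[verb]} against {resource}"

    print(plan("GET", "data://google.com/employees"))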

This has been tried before but the well was poisoned when they dropped SOAP in
it.

~~~
icebraining
_Are we going to transfer hypertext?_

I certainly hope so; linked data (of which RDF is the main implementation
nowadays) is much more useful than having disconnected silos. Of course, we
won't transfer _HTML_, but that's just one implementation of hypertext.

Besides, even if we weren't, why would we replace HTTP with something that
accomplishes the same thing? That doesn't make much sense to me.

------
romaniv
TLDs should help identify the type of organization that controls the domain,
not some arbitrary thing about the (hypothetical) website. Why are people
making this so difficult?

------
dougbarrett
So is .data just another way of pointing to an API? If Hacker News had an API,
would you call up news.ycombinator.data all of the time? It would be awesome
if this were the case, but even better if people who had .data domains came to
a consensus on how to document their data, e.g. www.domain.data/docs, with a
common layout across websites to make it easier for programmers and scholars
alike to figure out how to access the information they need.

~~~
majmun
API is more general term for programming interface whose function is not
nessecarilly obtaining data. (it could also be updating or deleting data).
this .data TLD as i understand would be just for obtaining data from website
in structured way. for example if google had .data domain it would be
something like this : you enter google.data?search=lady+gagga and it will
return you page in json or some other format for results of that google
search.
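
Something like this imaginary client, in other words (the endpoint and the
response shape are both invented):

    import json
    import urllib.parse
    import urllib.request

    # Entirely imaginary: no .data TLD exists, and the JSON layout is made up.
    query = urllib.parse.urlencode({"search": "lady gaga"})
    with urllib.request.urlopen("http://google.data/?" + query) as resp:
        results = json.load(resp)  # structured results instead of an HTML page
    for hit in results.get("results", []):
        print(hit["url"], "-", hit["title"])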

~~~
programnature
Yes. Also, almost all APIs are designed around small bits of user or query
data, just like your example.

This seems more intended for bulk data, which is likely going to be some
pregenerated chunk in the MB, GB, or TB range, and so less suited to the JSON
API call paradigm and more likely to involve a simple lookup from disk rather
than being computed on the fly from some database.

------
mongol
Isn't this what the semantic web strove for? I don't know if a new top-level
domain would create enough momentum for it.

------
emmapersky
> I think introducing a new .data top-level domain would give much more
> prominence to the creation of the data web—and would provide the kind of
> momentum that’d be needed to get good, widespread, standards for the various
> kinds of data.

I'd say that's a pretty good reason for using the new TLD, technicalities
aside.

------
SarahSmiles
"And my concept for the .data domain is to provide a uniform
mechanism—accessible to any organization, of any size—for exposing the
underlying data."

Who would be the standards body for defining and regulating such a uniform
mechanism?

~~~
mjwalshe
And why would the people with valuable data, e.g. the FT, Bankers, or
LexisNexis, do this?

~~~
lurker14
They could put a paywall in front of it.

------
MatthewPhillips
Why did OData never catch on?

~~~
vyrotek
Microsoft made a pretty big push for OData with their WCF Data Services. I
feel like there's a pretty decent community around it too.

<http://msdn.microsoft.com/en-us/data/bb931106>

------
mwsherman
This is less a technical discussion than speculation on human factors. Will a
special TLD inspire people to offer their services differently?

It’s just a namespace, one of many possible choices. But I wouldn’t discount
its importance as a protocol, or an expectation. “.com” has a very important
non-technical meaning.

------
programnature
Bringing everyone’s data as close to “computable” as possible is an all-round
win so I hope this takes off.

A big problem is how to ETL these datasets between organizations, and I think
Hadoop is a key technology there. It provides the integration point for both
slurping the data out of internal databases, and transforming it into
consumable form. It also allows for bringing the computations to the data,
which is the only practical thing to do with truly big data.

Currently there are no solutions for transferring data between different
organizations’ Hadoop installations. So a publishing technology that connected
Hadoop’s HDFS to the .data domain would be a powerful way for forward-thinking
organizations to participate.

Another path towards making things easier is to focus on the cloud aspect.
Transferring terabytes of data is non-trivial. But if the data is published to
a cloud provider, others can access it without having to create their own
copy, and it can be computed upon within the high-speed internal network of
the provider. Again, bringing the computation to the data.

~~~
pork
I read your comment several times, but I still don't understand why you think
Hadoop is the key technology for data interchange between organizations. I
don't mean to be harsh, but your comment is a bit like buzzword soup (Hadoop,
ETL, cloud, bring the computation to the data).

> [Hadoop] provides the integration point for both slurping the data out of
> internal databases, and transforming it into consumable form

Hadoop does no such thing. It doesn't "slurp data out of internal databases".
It's just a DFS coupled with a MapReduce implementation. Perhaps you're
thinking of Hive?

> Currently there are no solutions for transferring data between different
> organizations’ hadoop installations.

Not all data is "big data". By being myopically Hadoop-focused, you're
ignoring the real problem, which is data interchange. XML was supposed to be
the gold standard; it's debatable how far it has achieved its initial goal.

> So some publishing technology that would connect hadoop’s HDFS to the .data
> domain

So basically, forsake all internal business logic, access control, and just
pipe your database to the net? When you have a hammer...

> Transferring terabytes of data is non-trivial. But if the data is published
> to a cloud provider, others can access it without having to create their own
> copy, and it can be computed upon within the high-speed internal network of
> the provider

See AWS public datasets for exactly this, but it's still a long shot. It also
ignores the problem of data freshness (i.e., once a provider uploads a
dataset, they also need to keep updating it).
<http://aws.amazon.com/publicdatasets/>

~~~
programnature
Let me unpack it for you then.

There is a reason XML, the semantic web, and linked data failed to really
change the data world, whereas Hadoop did. The reason is computation.

The problem isn't data interchange formats and ideal representations, the
problem is being able to compute with data. Distributed computation can then
be used to solve all the other problems.

Case in point: Slurping data out of databases. Apache Sqoop leverages the
primitives provided by Hadoop, in terms of partitioning and fault tolerance,
to make it easier to do massive data transfers out of existing databases.

Another example of a solution coming from the Hadoop perspective: Avro. It
beats the pants off of XML as a data interchange format, precisely because it
makes computing with the data (which is the ultimate point) easier.
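
For a taste of the difference, a minimal sketch with the Python fastavro
library (the schema is made up): the schema is explicit, and the records are
compact binary that a Hadoop job can split and process directly.

    from fastavro import parse_schema, writer

    # A made-up schema; the point is explicit types and compact records.
    schema = parse_schema({
        "name": "Measurement",
        "type": "record",
        "fields": [
            {"name": "station", "type": "string"},
            {"name": "temp_c", "type": "double"},
        ],
    })

    records = [
        {"station": "KSFO", "temp_c": 14.2},
        {"station": "KJFK", "temp_c": 7.8},
    ]

    # Write a binary Avro file that downstream jobs can consume as-is.
    with open("measurements.avro", "wb") as out:
        writer(out, schema, records)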

Now, there is a reason I called Hadoop the integration point. It is becoming a
general-purpose computation system which, at the same time, is also the data
warehouse for organizational data. So rather than dealing with the details of
proprietary commercial systems, programmers can target applications to the
open-source Hadoop ecosystem and have those solutions be reusable and
customizable on a large scale.

The "publishing solution" would of course deal with access control, business
logic, freshness, etc. That is exactly what I'm advocating be built.

Individual pieces of data may not be big data, but the aggregate problem still
is. In fact this is exactly the Wolfram Alpha case: tons and tons of little
datasets that add up to a lot of headache.

~~~
th0ma5
I think this is unfair to linked data. Linked data could be hidden behind
layers upon layers of distributed SPARQL queries, much like how the human-
readable web works today, with each entity playing its part. With Hadoop, you
have to have something like 15 different ports opened between each box in a
whole setup before you can even begin.
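
For example, consuming linked data takes a single plain HTTP request. A sketch
with the Python SPARQLWrapper library (DBpedia's endpoint is real; the query
is only an illustration):

    from SPARQLWrapper import SPARQLWrapper, JSON

    # One HTTP request to a public SPARQL endpoint -- no cluster, no ports
    # opened between boxes.
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        SELECT ?label WHERE {
            <http://dbpedia.org/resource/Apache_Hadoop> rdfs:label ?label .
        } LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["label"]["value"])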

~~~
programnature
Hadoop is an ops and usability disaster. Yet companies large and small are
adopting it because it does "something people want".

RDF and ontologies are just more data. Without computation, that data is not
useful, and all the things one "could do" with it will not come to pass
without a credible computational platform that people actually want to use.

So, IMHO, I would like to see that community focus less on standards and
ontologies and RDF-as-panacea, and more on the infrastructure needed to put
the data to work.

------
im3w1l
How do you embed Flash ads in structured data?

Are people willing to make micropayments for access?

------
therandomguy
Wolfram.com/data, data.wolfram.com, etc.

