

Show HN: Business.txt - Standard Proposal - fesja
https://github.com/fesja/businesstxt

======
buro9
The idea of a text file at the root isn't a bad one, but the format of the
document should be far more descriptive.

Have you looked at <http://schema.org/docs/schemas.html> and the examples
there?

Everything in the example given is encapsulated by schema.org and that would
describe it in a way that was unambiguous.

I know that schema.org has been dismissed because "The problem is that it's
too complicated for a non-developer", but I would say that this file is also
too complicated for a non-developer. Most non-developer small business owners
can barely use FTP or the WordPress admin. These people don't have a
robots.txt and won't create a business.txt.

I would argue that making business.txt schema.org-formatted and then using a
simple generator wizard to produce it would be more accessible to the small
business owner than giving them a text file to edit.

~~~
dangayle
That format is perfectly valid YAML, easily parsed, and just as easily
created. It needn't be any more complicated than it is. I could make a WP
plugin to support it within a few hours.
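
A quick sketch of how little code such a flat file needs. The field names
below are hypothetical (the actual spec isn't shown here), and this
hand-rolls the parsing with the standard library rather than pulling in a
YAML parser:

```python
# Minimal sketch: parse a flat "key: value" business.txt.
# Field names are hypothetical, not from the actual spec.

def parse_business_txt(text):
    """Parse flat 'key: value' lines into a dict, skipping blanks and comments."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        data[key.strip().lower()] = value.strip()
    return data

example = """\
name: Frank's Pizza Place
phone: +1 555 0100
hours: Mon-Sat 11:00-22:00
"""

info = parse_business_txt(example)
print(info["name"])   # Frank's Pizza Place
```

A WP plugin would mostly be this plus a settings screen that writes the file.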

~~~
fesja
Let's work on that WP plugin, Dan! Ping me if you want.

------
pilif
So more senseless errors in my error log and more traffic caused by robots
requesting a file that doesn't exist?

While I applaud the idea, can we please, please have a meta-tag or header that
points to the location of this file if it's available?

We don't need another favicon.ico or robots.txt.

~~~
quadhome
You'd rather a robot hit your frontpage (and, by association, your
application) than hit your static 404 page?

Instead of generating errors from 404s (a lot of noise), try generating errors
from the same repeated 404s.
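
That suggestion (aggregate before alerting) can be sketched in a few lines;
the log format and threshold here are made up for illustration:

```python
from collections import Counter

# Sketch: instead of alerting on every 404, alert only on paths that
# 404 repeatedly. Threshold and (path, status) log format are illustrative.
def repeated_404s(log_entries, threshold=5):
    """Return {path: count} for paths that hit 404 at least `threshold` times."""
    counts = Counter(path for path, status in log_entries if status == 404)
    return {path: n for path, n in counts.items() if n >= threshold}

log = [("/business.txt", 404)] * 7 + [("/favicon.ico", 404)] * 2 + [("/", 200)]
print(repeated_404s(log))  # {'/business.txt': 7}
```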

~~~
pilif
The robot will very likely hit my frontpage anyway, so I'd rather have it
look there for a meta tag (and not find one) than produce an additional 404
error which will, depending on the content of said page, waste a
considerable amount of bandwidth.

Besides, my frontpage is either heavily fragment-cached or just page-cached
anyway, especially for anonymous users, so for all intents and purposes it
can be served directly from RAM; a hit to the frontpage is really, really
cheap.

~~~
egypturnash
> an additional 404 error which will, depending on the content of said page,
> waste a considerable amount of bandwidth.

Because your 404 should be one of the heaviest pages on your site, full of
graphics and surprises. Why not make a full-featured game specifically for
your 404s?

~~~
icebraining
Crawlers don't (usually) download graphics, much less from 404 pages. They
aren't regular browsers.

------
Terretta
hostname.com/business.txt is wrong.

You are supposed to use the /.well-known/ subfolder.

<http://tools.ietf.org/html/rfc5785>

If you plan to implement a "standard" please try to review the RFCs that have
covered this ground before. There's probably already a standard which may fit.
If not, there's probably one that's close you could propose a change to. And
if you're trying something genuinely new, you'll at least be on the right
foundation.
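
A crawler honoring RFC 5785 would probe the registered well-known path
before falling back to the root. A sketch of the lookup order, noting that
a `business.txt` well-known suffix is hypothetical here and would need
IANA registration:

```python
from urllib.parse import urljoin

# Sketch: candidate locations a crawler might try, per RFC 5785.
# "business.txt" as a well-known suffix is hypothetical; real suffixes
# must be registered with IANA.
def candidate_urls(origin):
    return [
        urljoin(origin, "/.well-known/business.txt"),  # RFC 5785 location
        urljoin(origin, "/business.txt"),              # legacy root fallback
    ]

print(candidate_urls("https://example.com"))
```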

~~~
aw3c2
Interesting RFC. Do you know of any implementations of it?

~~~
icebraining
There's a registry hosted by IANA: [http://www.iana.org/assignments/well-
known-uris/well-known-u...](http://www.iana.org/assignments/well-known-
uris/well-known-uris.xml)

------
Zarkonnen
I like the idea a lot, but as it stands the format is pretty US-centric. I've
written up some suggestions on how to fix this at
<https://github.com/fesja/businesstxt/issues/2>

The core problem is definitely that many small business sites, especially
restaurant ones, are really terrible and outdated, and really not run by
someone who would understand the concept of "upload text file to server".

So perversely, storing the information centrally would be easier, but who
would you trust with it? The temptation to create a walled garden and
"monetize" all that juicy local business data would be very strong for the
maintainers. And then everything falls apart into small localized non-
interoperating fiefdoms again, and we're back where we started...

------
ig1
Why not just use Microdata (which Google already supports):

[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146861)

------
tommoor
I think this usecase is already covered very well by microformats and the
various metadata standards that already exist and are supported by Google,
Facebook et al.

I'm not sure the argument that these are too complicated for non-developers
really works here, after all uploading a file to the root of a web directory
is likely also too complicated...

~~~
fesja
Apart from being really complicated for non-developers, another thing I
don't like about schema.org is that it seems targeted at websites like Yelp
or Foursquare, not at the original local business website.

The owner of the local business is the one who knows the correct address,
phone, opening hours, menu, etc. All the other websites (Yelp, Foursquare,
Google Places) can be wrong. We just need the local business to tell the
rest of the world what the correct data is.

~~~
tommoor
I'm not sure I follow your logic here. Microformats are part of the HTML of
the local business's website; search engines and other crawlers can collect
and parse this data. This is exactly the local business owner telling the
rest of the world what the correct data is.

If updating the HTML is too complicated for the website owner (fair enough!)
then this should be taken care of in the respective CMS.

I don't mean to sound negative, I know it takes some thought and time to put
out a proposal like this :-) Just in this case, this seems to be a problem
that was solved a long time ago...

[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1211158)

~~~
fesja
go to <http://news.ycombinator.com/item?id=4438320>

------
delinka
"Without business.txt he would have to go to all the websites like Yelp and
Foursquare and..."

No. No no no. This is not how the Internet is supposed to work. I search for a
restaurant online hoping they have a website with this information on it. If
it's a chain, I can find the local location and know the information is
correct. If it's a local place with a website, the information is probably
outdated anyway because they don't edit the site when their menu and hours
change ... which means they're not going to edit business.txt to reflect the
changes. So I'm really trying to find a phone number to speak to a human or
listen to the answering thingy so that I can verify their hours.

This proposal is to help automate updates on _other_ sites when the restaurant
changes their menu or their hours. The only way this is going to work is if
the computers that help manage the restaurant also provide information to the
website. Need to change the menu? Great, the changes are also pushed to the
website. Changing the hours employees can clock in? That comes with a
requisite change to operating hours and is reflected on the website.

~~~
icebraining
The only way the Internet is supposed to work is by routing packets to the
machines with the right IP address.

~~~
mmahemoff
There are many RFCs by the Internet Engineering Task Force which go beyond
this definition.

------
troels
This is already a solved problem. Microformats, while only moderately
successful, can embed all relevant information in a standardised way.

------
maayank
I like this idea very much and its simplicity, but it seems inevitable to me
that going down this path will just recreate RDF[0] and RDF Schema. A sort of
a semantic web version of Greenspun's tenth rule[1].

For those of you who want to quickly get up to speed on RDF/Schema, "A
Semantic Web Primer for Object-Oriented Software Developers"[2] was to me a
very good introduction.

[0] From the W3C primer on RDF: "The Resource Description Framework (RDF) is a
language for representing information about resources in the World Wide Web."

[1] <http://en.wikipedia.org/wiki/Greenspuns_tenth_rule>

[2] <http://www.w3.org/2001/sw/BestPractices/SE/ODSD/>

~~~
icebraining
I'm an RDF kool-aid drinker, but removing friction for adding somewhat
structured content is always OK for me. I'd rather have a standard way of
converting from the business.txt format to RDF than not have the data at all.

~~~
maayank
Agreed! With the caveat that it should be both to and from (using some
standard schema). Replied elsewhere to OP and addressed this as well.

------
facorreia
The schema is rather US-centric. For instance, many countries don't have
"states". They may have other divisions, in the 0..N level range, with other
names. It would be better to research and use a current, established format
for international addresses.

~~~
andos
_Falsehoods programmers believe about addresses_ [1][2] is long overdue.
“There is a current, established format for international addresses” is
probably one of them.

[1] [http://www.kalzumeus.com/2010/06/17/falsehoods-
programmers-b...](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-
believe-about-names/)

[2] [http://infiniteundo.com/post/25326999628/falsehoods-
programm...](http://infiniteundo.com/post/25326999628/falsehoods-programmers-
believe-about-time)

~~~
TillE
Line 1, Line 2, Line 3, Country should cover just about all cases.

Ask yourself if you really need to break out the specific components of the
address. In this case you don't. As long as the user knows the correct local
format, it's fine.

~~~
px1999
The txt file isn't just about the user though, it's to aid in indexing useful
information from the site, so having some sort of breakdown into nested
administrative divisions is something that makes sense (after all, I don't
just say "I'm looking for a steakhouse in the USA" when I'm trying to decide
where to have dinner). Of course, administrative divisions introduce their own
problems and work against the whole human readable / human writeable nature of
what they're trying to achieve.

On the flipside, even Line 1, 2, 3, country isn't sufficient for all
addresses. If you have an addressee, additional delivery information (eg a
department), need to include a rural route identifier (eg for Canada), or need
to store/use bilingual addresses (again for Canada) then you need more than 3
lines. If you want to talk edge cases, having a country code means that places
like the Haskell Free Library and Opera House
([http://en.wikipedia.org/wiki/Haskell_Free_Library_and_Opera_...](http://en.wikipedia.org/wiki/Haskell_Free_Library_and_Opera_House))
can't be correctly addressed.

Hell, even _Falsehoods programmers believe about time_ doesn't come close to
capturing the intricacies of lunar/lunisolar calendars (eg those with 13
months in a year, or a variable number of months in a year, etc).

I guess that my point is that you need to find the balance of utility and
complexity. If you need to be able to store every format of everything you
wind up with either a hugely complex schema, or a single field that contains
everything (and may even not capture everything completely), but that's not
useful for anything except end-user display when the user is able to parse (or
make a good guess at) the data.

Unfortunately there isn't a single winning approach - so unless you draw an
arbitrary line your specification can't encompass every edge case while
maintaining simplicity and achieving what it's set out to.

------
gpcz
How well does the business.txt standard hold up against malicious behavior?
For example, what happens if I want to defame Restaurant X, so I make
restaurantXsucks.com and put a business.txt file in my root directory with the
same address and contact information? Currently, Google Places (the service
that puts stuff on Google Maps) mails a PIN to the address and requires
verification before listing to mitigate this problem -- how would business.txt
mitigate the problem?

~~~
makmanalp
More simply, you're saying that this solves the updating issue but not the
trust problem. That's true, but the current situation is no better than what
is being proposed.

Google Places is good, but I doubt Yelp does anything, for example, nor do a
ton of other sites. And most people don't even know to trust Google Places
more anyway; they'll just trust the top result on Google.

------
reinhardt
So just skimming through the comments here, the "standard" alternatives to the
proposed business.txt include (but are probably not limited to):

• HTTP Headers

• Meta-tags

• <link rel="business"> tags

• RFC 5785 (/.well-known/ folder)

• Microdata

• Microformats

• RDF Schema

No wonder we are in <http://xkcd.com/927/> territory...

------
thmzlt
Add links between these files and you have web pages!

------
m_eiman
Regardless of the format of the content, I think that it'd be nice if files
like this would be placed in /.well-known/[standard] in accordance with
<http://tools.ietf.org/html/rfc5785>

There are already too many magical files cluttering up the root.

~~~
fesja
Is it being implemented? I mean, I can find <http://www.ietf.org/robots.txt>
but not <http://www.ietf.org/.well-known/robots.txt>

~~~
icebraining
It's supposed to be used by newer specifications, not to change existing ones
like robots.txt. That would create too many problems with existing
applications.

------
robotman42
Why do business people try to push business standards as technical solutions?
That's not what standards are for; they are for technical problems. It looks
like DRM to me: a technical solution to a social problem or a broken business
model.

TL;DR: there are existing solutions, micro-formats for example.

~~~
RKearney
But then how will fesja be able to tell everyone "Hey, I created a web
standard! /flex"

You're right. No research was done. The author just threw information that he
thought was important into a text file and called it a day. RFC 5785 says to
put the file into the .well-known folder, the author only thought about
United States addresses when making this, and, as you stated, the problem has
already been addressed.

~~~
fesja
I don't know what the correct word for this is, or whether it will go
anywhere.

The only thing I know is that there is a problem that local businesses and
website providers have, and there isn't an efficient solution yet. I've
proposed a solution so we can discuss it and see if it makes sense. That's
where we are now.

About going international: I'm from Spain, so of course I will be the first
one interested in having an international "standard". People are already
giving suggestions on GitHub!

~~~
pbhjpbhj
How are microformats not an efficient solution?

Looks like you made a worse version of hCard,
<http://microformats.org/wiki/hcard>.

Adding a frontend to keep an hCard updated would be what makes this more
accessible for business workers/managers.

I've used this on one site for several (six??) years or so, examples at
<http://microformats.org/wiki/hcard-examples-in-wild-reviewed>.

~~~
fesja
If I have to make a plugin out of the vCard RFC
(<http://www.ietf.org/rfc/rfc6350.txt>), I'd kill myself first ;)

------
daemon13
How about internationalization?

Shouldn't this be self-describing, like indicating the language code (use
ISO...) of the narrative/description?

Also, how about country codes? I mean, some people use U.K., some UK, some
England, some United Kingdom, etc.

------
tomkin
This (in some form) is probably a good idea. Recently, I've worked on a few
Business Improvement Area projects, and one of the hassles for BIAs is keeping
up-to-date information for each business (i.e., hours of operation, services,
description, etc). So this type of implementation would be great.

I think what I really get from this is that each business needs some form of
public API.

------
icebraining
Now we need a reviews.txt so that we don't need to give control of the reviews
to less than trustworthy parties.

I propose:

    
    
      [company/product name] (URL)
      score: X/Y
      [review text]
      --
      [next review...]
    

E.g.:

    
    
      Frank's Pizza Place (http://franks.geocities.com)
      score: 8/10
      Good service and food. Doesn't accept credit cards.
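
A sketch of a parser for the format proposed above (it's a comment-thread
proposal, not a spec, so the details here are guesses at the intent):

```python
import re

# Sketch parser for the proposed reviews.txt format: records separated
# by "--", first line "name (URL)", then "score: X/Y", then review text.
def parse_reviews(text):
    reviews = []
    for block in text.split("\n--\n"):
        lines = [l for l in block.strip().splitlines() if l.strip()]
        if not lines:
            continue
        m = re.match(r"(.+?)\s*\((\S+)\)\s*$", lines[0].strip())
        s = re.match(r"score:\s*(\d+)/(\d+)", lines[1].strip())
        reviews.append({
            "name": m.group(1),
            "url": m.group(2),
            "score": (int(s.group(1)), int(s.group(2))),
            "text": "\n".join(l.strip() for l in lines[2:]),
        })
    return reviews

sample = """Frank's Pizza Place (http://franks.geocities.com)
score: 8/10
Good service and food. Doesn't accept credit cards."""

print(parse_reviews(sample)[0]["score"])  # (8, 10)
```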

~~~
simonbrown
You want to trust the businesses being reviewed instead? This also wouldn't
stop these sites choosing which reviews to show. The best solution is to find
a site you trust.

~~~
e1ven
I think he's suggesting you would host this on your personal domain, reviewing
other services. Yelp/etc would then act as an aggregation of these reviews,
rather than the hosting company for them.

~~~
icebraining
Yes, that's it, though it probably shouldn't require a particular position in
the URL, so that you could e.g. put it on Dropbox, make it public and link it
from somewhere.

As long as it was called reviews.txt and had that particular format, it'd be
valid.

------
fesja
Does anyone have a contact on the "data harmonization" team at Google,
Facebook, Foursquare, Yelp, etc? Could you share this idea with them to see
if we can discuss it further?

my email is javier at touristeye.com

~~~
DannyBee
Google already supports the schema.org formats for this exact info. (e.g.
[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1211158)
and surrounding info)

Please don't take this the wrong way, since I think it's great that you are
thinking about these problems, but :

It's not clear what advantage your format offers. On the other hand, it has
some pretty clear disadvantages, including generating massive amounts of
possibly useless web traffic, not just on the server side, but on the crawling
side, since now getting business info takes two requests, instead of one (when
it is embedded in a schema.org format on the page).

Additionally, without some tag that tells you whether business.txt would
exist, you get to check for _every website_. This will slow down crawlers.

Given that most of the companies on the crawling side of this want to support
(or already support) the schema.org markup version, ISTM you would be better
off spending your time making simple generators for it or adding support for
it to WordPress et al.

FWIW: I have no comment on whether text formats are better than schema.org or
anything like that, but to a large degree it's irrelevant, because getting a
large number of folks to support something they believe is already a solved
problem is very, very difficult.

------
0ptr
Is it missing fields for e-mail and logo? Or have you actually specified this
somewhere (i.e., not just in examples)?

I think this is a great idea; however, Google already has something like this
(microformats-based): <http://maps.google.com/help/maps/richsnippetslocal/>
Maybe adopting that would be easier?

~~~
fesja
About the fields, check out the docs:
[https://github.com/fesja/businesstxt/blob/master/structure.m...](https://github.com/fesja/businesstxt/blob/master/structure.md)

About rich snippets, go to <http://news.ycombinator.com/item?id=4438320>

------
Avalaxy
Wouldn't it be easier if this were XML instead of plain text? I assume the
goal is that software can easily interpret the data.

~~~
fesja
We want it to be a human-friendly file. If we had a JSON or XML file, it
would be too complicated for a non-developer to write or read. This way, I
think it doesn't matter if the website is done in WordPress, Drupal, static
files, Flash, etc. It's just a simple file.

Also, we are following the same pattern as the robots.txt file.

~~~
oellegaard
I would strongly suggest e.g. JSON or XML; if you can upload a website,
chances are that you could also create something like this. There would of
course also be a template where absolute noobs could fill in the details and
get correct output. With our current tools, implementing a plain-text data
format seems like a pretty stupid idea: everyone would have to implement
their own parsing. With e.g. JSON, there are libraries for every language and
it has validation.
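
For comparison, the same (hypothetical) fields as JSON cost little extra
syntax, and the parser rejects malformed input for free:

```python
import json

# Sketch: the same hypothetical business fields as JSON instead of plain
# text. json.loads gives validation for free: bad input raises ValueError.
business = json.loads("""
{
  "name": "Frank's Pizza Place",
  "phone": "+1 555 0100",
  "hours": "Mon-Sat 11:00-22:00"
}
""")
print(business["name"])  # Frank's Pizza Place

try:
    json.loads("{ name: oops }")  # unquoted keys are invalid JSON
except ValueError as e:
    print("rejected:", e)
```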

~~~
FuzzyDunlop
YAML, or the INI-style format used by Python's ConfigParser or PHP's ini
files, might be easiest. Less syntax and formatting.

I'm not so sure about the proposal itself, though. For local, small
businesses, how much can we assume about technical ability? If they have to
pay the people who did their website to keep it up to date, how can we be sure
they'll use it?

------
nicolethenerd
Yext (<http://www.yext.com/>) offers a paid solution for exactly this issue -
they sync local business info across 35+ different sites (Bing, Yahoo, Yelp,
etc.)

Disclaimer: My significant other works there. But I wouldn't recommend it if
it weren't useful/relevant/awesome.

------
its2010already
Shouldn't we be using XML for this???

~~~
dangayle
Perfectly valid YAML

------
abishekk92
Tried visiting <http://www.touristeye.com/business.txt>; it was throwing a
404 error. Is it just me?

~~~
fesja
1. We are not a local business ;) 2. This is just a proposal for now. If many
websites support it, we will implement it!

~~~
abishekk92
My bad, didn't notice that the listing is also for people that are planning
to support it.

------
crasshopper
I would just change my Google Business owner profile and my own website if I
have one. Who cares about the spammy copy sites?

------
wheelerwj
This is a really cool idea. As a small business guy, this is definitely
worthwhile and usable. Please keep working on this.

------
flexterra
Why not just use <http://ogp.me> ? A lot of sites are already using it.

------
nicolasbn
Use a micro-format. Problem solved.

------
lowglow
If google adopted this, it would save so much time and trouble. Very good
idea.

~~~
fesja
Thanks! Let's see what the big ones think about it.

------
oleganza
Great idea. Added one for our company. It took just a couple of minutes.

<http://pierlis.com/business.txt>

~~~
unwind
My browser is failing to render the address properly; it reads as
"PoissonniÃ¨re", so the diaeresis is breaking. You probably need to
double-check how you serve the file, and make sure the charset and encoding
match.

~~~
icebraining
Yeah, '; charset=utf-8' has to be appended to the Content-Type header.
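
For what it's worth, that garbling is exactly what UTF-8 bytes look like
when the browser decodes them as Latin-1; a quick sketch reproducing it and
the header value that fixes it:

```python
# The mojibake above is UTF-8 bytes being decoded as Latin-1 by the browser:
garbled = "Poissonnière".encode("utf-8").decode("latin-1")
print(garbled)  # PoissonniÃ¨re

# The fix is declaring the encoding in the response header, e.g.:
headers = {"Content-Type": "text/plain; charset=utf-8"}
print(headers["Content-Type"])
```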

------
will_critchlow
I like the idea of having a standard place to find this information. It's
still _for robots_ though isn't it? Why not include this information in
robots.txt?

------
suyash
Good idea but tough to scale!

------
oleganza
fesja: add a list of businesses with the file to the README. This will show
others how many have adopted the format and motivate them to do the same.

~~~
fesja
done!

