
Show HN: DNS-based alternative to the web for structured data - elliottinvent
https://www.num.uk/blog/announcing-num
======
1vuio0pswjnm7
There has never been any limit to what one can put in a DNS RR.

From the year 2000, 42 ways to distribute arbitrary data over the internet,
including DNS:

[http://web.archive.org/web/20001207014600/http://decss.zoy.o...](http://web.archive.org/web/20001207014600/http://decss.zoy.org:80/)

djbdns has always allowed arbitrary characters in TXT records. I have put HTML
in TXT records to serve mini web pages over UDP. People have put speex files
in RRs. Not sure "BIND" or other popular DNS servers of the day allowed such
flexibility. Nowadays it is easy to find such software.

Someone will probably comment on tunneling over DNS or Wikipedia over DNS.

This is why ICANN DNS and its use of DNSSEC always seemed wrong to me. If it's
someone else's data, ICANN should not have the power to instantly "invalidate"
it. Resurgence of DNSSEC was in part a response to the idea that shared caches
are vulnerable, but DNS is more than just shared caches; and the way DNSSEC is
used in practice relies on the strange assumption that ICANN is the self-
appointed arbiter of "official" domain names. But as you can see DNS, the
protocol, can be used for more than simply "official" ICANN-approved domain
names.

~~~
elliottinvent
> From the year 2000, 42 ways to distribute arbitrary data over the internet,
> including DNS

This is very cool, thanks for the link.

Would be interested to hear more about your experience in storing HTML in DNS.

~~~
1vuio0pswjnm7
People have now wrapped DNS RR in TLS, then in HTTP and they are thinking
about putting DNS RR into HTTP headers or into HTML tags.

[https://www.potaroo.net/ispcol/2020-06/row.html](https://www.potaroo.net/ispcol/2020-06/row.html)

In the late 2000's I took the opposite approach, because back then, DNS, i.e.,
traditional 512K UDP packets, was much faster for data retrieval than
TCP/SSL/HTTP. TLS has gotten quite fast but using UDP is still faster, that's
why we see UDP-based "reliable transport" protocols like (lesser known)
CurveCP and (better known) QUIC. I put the content into the DNS RR instead of
putting the DNS RR, e.g., IP address, into the content.

Retrieving content, e.g., a web page, has become a two-step process, thanks to
DNS. It could be a one step process if we used IP addresses instead of names;
I still do this where possible. However to use names, first we have to get the
IP address (step 1), and only then can we retrieve the content (step 2). If we
combine the IP address with the content then we can eliminate a step.

Putting everything into DNS instead of into web pages gives more control to
the user, IMO. Web browsers are controlled by companies that rely on the
survival of the online ad industry. DNS software does not have this problem.

~~~
1vuio0pswjnm7
s/512K/512/

------
caymanjim
What are the implications of this for things like DNS caching, replication,
root servers, negative lookup, etc? If you run your own DNS and want to create
gigabytes of NUM records, that's great. But I wouldn't expect Cloudflare,
Google, etc. to want to cache that data, so that could create a situation
where most of your DNS is cached somewhere, so it's distributed, fast,
reliable, and local, but your NUM records aren't cached, so now you've got
servers that are out of sync, or are falsely returning negative results, or
you have to implement a mechanism that treats NUM records differently in
clients.

I don't know if any of these concerns are applicable, but I'm curious.

~~~
eat_veggies
If recursive resolvers like Cloudflare, Google, or your ISP stopped caching
_num.* records, the servers wouldn't go out of sync or falsely return negative
results, since DNS caching and updates are more of a "pull" model than "push"
with consensus (so it's just like your computer's L1, L2, L3, memory, and disk
caches, but with eviction based on TTL).

So _num records would end up forwarding all the way to the authoritative DNS
server for the zone you're trying to access. You'd just have a slower
experience than with your normal DNS queries, but you wouldn't observe
anything weird.
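
A minimal sketch of the "pull" model described above, assuming a toy resolver: the class name `TtlCache`, the `fetch_from_authoritative` callback and the `cacheable` flag are all hypothetical, and real recursive resolvers are far more involved. It only illustrates that a resolver which declines to cache a name still returns correct answers; it just asks upstream every time.

```python
import time


class TtlCache:
    """Toy pull-through cache: entries are fetched on demand and expire by TTL."""

    def __init__(self, fetch_from_authoritative, clock=time.monotonic):
        # fetch_from_authoritative(name) -> (value, ttl_seconds); hypothetical upstream lookup
        self._fetch = fetch_from_authoritative
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def resolve(self, name, cacheable=True):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh cached answer, no upstream query
        value, ttl = self._fetch(name)  # miss or expired: pull from upstream
        if cacheable:
            self._entries[name] = (value, now + ttl)
        return value
```

Calling `resolve(name, cacheable=False)` models a resolver that refuses to cache _num records: every lookup goes upstream, which is slower but never stale or falsely negative.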

------
rohan1024
Introduction: [https://www.num.uk/](https://www.num.uk/)

NUM Record viewer: [https://tools.num.uk/](https://tools.num.uk/)

NUM Record creator:
[https://app.numserver.com/tools/editor/add](https://app.numserver.com/tools/editor/add)

~~~
pastage
Seems like the TXT record creator fails to escape `. I can not find anything
about money/licensing on there.

I love DNS, and terse text formats, but both are pretty arcane for most
people. You might as well have people post compressed base85 encoded messages
or something similar. I will use this, but will the Thai place on the corner?
Probably not.

~~~
maaarghk
I feel the same way, the record editor is pretty esoteric and reminds me of
schema.org really. For any hope of adoption by small business I think it needs
to be something registrars buy into with a more user-focused interface which
is more like "describe your business to us" than "populate our data format".
Large corps, DNS is surprisingly often managed by marketing departments, and
you're going to find them asking questions like "so we can't advertise our
complementary services to someone looking for our phone number? we can't style
it? Why would we want that? We can't track conversion?" All of these are
probably positives to the user, but in direct conflict with the interests of
the people actually in control of the domain.

edit/ of course the only way around this is for apple and google to make it
mandatory to appear in their mobile dialer app or something, but google at
least will never do that because in the "way things are" example we can see
many, many user-unfriendly situations that are great for google's metrics and
revenue

~~~
elliottinvent
> the record editor is pretty esoteric and reminds me of schema.org really.

This is our first version of the record editor and it'll get much more user-
friendly over time. I agree that our module system has some similarities with
schema.org, I think what's lacking with structured data formats for the web is
a simple way that a small business can adopt the technology. That's what we're
trying to offer with the NUM Server – fill in a simple form and we take care
of publishing the data.

> For any hope of adoption by small business I think it needs to be something
> registrars buy into with a more user-focused interface which is more like
> "describe your business to us" than "populate our data format"

We're about to build in an integration with GoDaddy and 1&1 (IONOS) – this
will make it easy for domain registrants to delegate their independent NUM
zone (_num.example.com) to the NUM Server. Longer term, registrars might want
registrants to build NUM records using tools offered by the registrars. In my
opinion, registrars have historically done a pretty bad job of making tools
user-friendly and I'm sure we can do a better job.

> Large corps, DNS is surprisingly often managed by marketing departments

This is an interesting point. I've never known of a reliable DNS zone for a
large corporation being managed by a marketing department but with more and
more services requiring DNS verification records this is becoming more and
more common. We actually have a module to address that point – the Custodians
module [1]

> ... you're going to find them asking questions like "so we can't advertise
> our complementary services to someone looking for our phone number? we can't
> style it? Why would we want that? We can't track conversion?" All of these
> are probably positives to the user, but in direct conflict with the
> interests of the people actually in control of the domain.

With the contacts module a company can advertise a range of methods (e.g.
social media) alongside their telephone number but you're right they're not in
control of how developers use the data or display it. I think this is
something companies have already gotten used to – they're not in charge of how
Facebook, Google or Yelp display their data. At least with NUM they're in
control of the data itself.

We see user anonymity and the absence of tracking (from the resolver to the
authoritative server at least) as a big plus point for NUM and a step in the
right direction.

I think Twitter is a great example of a technology which would seem to be at
odds with a company's goals: (i) complaints for the world to see; (ii) dealing
with customer service by Tweet with restricted characters!; (iii) anonymity of
users. But businesses use it because users love it.

I really appreciate the feedback.

1: [https://www.numprotocol.com/specification#example-modules](https://www.numprotocol.com/specification#example-modules)

------
elliottinvent
A small team and I built this and we're excited to hear feedback – good or
bad. Thanks for taking a look.

------
sybercecurity
From the spec: "NUM imposes no limit on the number of records in a set but
some DNS server and client implementations may have difficulties processing
very large record sets."

There are still some places where packet size is an issue in DNS - mostly over
IPv6. This was a big issue when DNSSEC first started being deployed. Not so
much now, but I've still seen instances where large DNS responses get dropped
over IPv6 for MTU issues. 1260 is the safe limit for MTU size most of the time
even if they advertise a higher limit.

~~~
fanf2
You are right that it is wise to keep DNS response sizes less than 1280 bytes
if you want to fit in UDP. But the DNS can support responses of up to 65535
bytes over TCP, which servers are required to support.
[https://tools.ietf.org/html/rfc7766](https://tools.ietf.org/html/rfc7766)
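
The 65535-byte ceiling comes from the 2-byte length prefix that DNS over TCP puts in front of every message. A rough sketch of that framing, where the helper names and the use of 1280 as a UDP budget are assumptions taken from this comment rather than protocol constants:

```python
import struct

UDP_SAFE_LIMIT = 1280    # conservative budget from the comment above (assumption)
TCP_MAX_MESSAGE = 65535  # largest value a 2-byte length prefix can carry


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length for TCP transport."""
    if len(message) > TCP_MAX_MESSAGE:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message


def pick_transport(response_size: int) -> str:
    """Rule of thumb only: small answers fit in UDP, larger ones need TCP."""
    if response_size < UDP_SAFE_LIMIT:
        return "udp"
    if response_size <= TCP_MAX_MESSAGE:
        return "tcp"
    return "too large"
```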

~~~
elliottinvent
Interestingly, Google Cloud DNS has a fixed limit of 1012 characters per
resource record (counting the quote marks that separate fixed length 255 char
TXT strings). They claim you can have 10,000 resource records in a resource
record set, so over 10 MB of data.
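
For anyone wondering where the 255-char strings come from: on the wire, TXT RDATA is a sequence of character-strings, each with a one-byte length prefix. A small sketch of that encoding follows; the function name `txt_rdata` is made up for illustration, and it does not try to model Google Cloud DNS's 1012-character limit, which is a provider rule rather than part of the wire format.

```python
MAX_STRING_LEN = 255  # one length byte per character-string caps it at 255 bytes


def txt_rdata(payload: bytes) -> bytes:
    """Encode payload as TXT RDATA: a sequence of length-prefixed strings."""
    chunks = [payload[i:i + MAX_STRING_LEN]
              for i in range(0, len(payload), MAX_STRING_LEN)] or [b""]
    return b"".join(bytes([len(chunk)]) + chunk for chunk in chunks)


# Arithmetic behind the "over 10 MB" remark, using the figures quoted above.
total_chars = 1012 * 10_000  # 10,120,000 characters
```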

~~~
fanf2
Golly! Well, the minimum practical RR size is 2 bytes for the name (a
compression pointer), 2x2 bytes for class and type, 4 bytes for TTL, 2 bytes
for length, and some data, e.g. 4 bytes for an A record: total, 16 bytes. So
you can't fit more than about 4k records in a DNS response.
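
The arithmetic above, spelled out as a quick check. It assumes a 65535-byte message over TCP and the fixed 12-byte DNS header, and it ignores the question section, which would cost a few more bytes:

```python
NAME = 2            # compression pointer back to the owner name
TYPE_AND_CLASS = 2 * 2
TTL = 4
RDLENGTH = 2
A_RDATA = 4         # an IPv4 address

rr_size = NAME + TYPE_AND_CLASS + TTL + RDLENGTH + A_RDATA

MAX_MESSAGE = 65535  # limit imposed by the 2-byte TCP length prefix
HEADER = 12          # fixed DNS header size

# Upper bound on minimal A records per response (question section ignored).
max_records = (MAX_MESSAGE - HEADER) // rr_size
```

That gives a 16-byte record and room for 4095 of them, i.e. "about 4k".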

It is actually possible to have a zone with an RRset much bigger than that,
and it will transfer successfully with AXFR or IXFR, and you can make the
RRset bigger or smaller with UPDATE - but you won't be able to query for it...
unless the server has some nonstandard shenanigans for returning partial data,
as many of them do.

------
butz
So in theory I could host a minimal website on DNS?

~~~
pastage
In the same way you can already do that on DNS, and no browser supports DNS
lookups in any sane way except DNS over HTTPS.

So no.

~~~
elliottinvent
I think DoH is an important development here – our "NUM Record Viewer" makes
all queries over DoH but of course you need to load a web app to run the
queries.

I'm hoping browsers will support custom DNS queries at some point. There was
some talk of it being supported in Chromium a couple of years ago but it got
shelved I think. Hopefully creative uses of the DNS like NUM will encourage
them to consider it again.
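
For reference, a DoH GET request is just a DNS query in wire format, base64url-encoded without padding and passed as the `dns` parameter. The sketch below builds such a URL offline. The endpoint `https://doh.example/dns-query` is a placeholder, the name queried is only the zone example from this thread, and this is not necessarily how the NUM Record Viewer issues its queries.

```python
import base64
import struct

TYPE_TXT = 16
CLASS_IN = 1


def build_query(name: str, qtype: int = TYPE_TXT) -> bytes:
    """Build a minimal DNS query message in wire format (no label-length checks)."""
    # Header: ID 0, flags with only RD set, one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def doh_get_url(endpoint: str, name: str) -> str:
    """Encode the query as the unpadded base64url `dns` parameter of a DoH GET."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


# Placeholder endpoint and name, for illustration only.
url = doh_get_url("https://doh.example/dns-query", "_num.example.com")
```

The resulting URL would be fetched with an `Accept: application/dns-message` header, and the response body is again a DNS message in wire format.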

------
thrownaway954
so basically stuffing DNS with a lot of TXT records.

~~~
elliottinvent
Hopefully people will use NUM to add lots of useful structured data in TXT
records but importantly these records are at their own DNS names and not
polluting the main zone

e.g: dig target.com TXT

(and countless others)

~~~
edsemail123
NUM looks to me like a great improvement over the de facto 'status quo' of DNS,
Search Engines, and 'Site Sifting' for useful info.

I do have some concerns about the plan to make the owners of various domains
that much easier to locate and/or name in lawsuits, as at least here in the
US, I could see that info being rather easily abused, along with the initial
focus on 'contacts' (see my further comments/concerns below).

That said, given you asked for feedback/suggestions, and given what looks to me
like the focus and high level of usefulness of NUM, especially in streamlining
the overall process for 'inter-entity transactions' (whether personal,
commercial, or whatever), I believe a rather useful 'module' (or, likely better
yet, some number of modules) would be Services, Products, and/or Solutions.

Each of those can be seen as either Standard or Custom or perhaps even involve
both (ie, a standard Solution for xyz market typically includes abc standard
products as well as def custom services or whatever)

This could easily include info about various products, as well as entire
'product lines', along with direct links to marketing/sales materials and/or
contacts, list/actual pricing, specific support resources (whether contacts
and/or documentation such as manuals), and/or even the ways or sites that the
organization prefers for handling certain interactions: phone calls, texts,
chat, or even direct (and perhaps non-disclosed) 'click to connect' methods,
whereby entering a 'client id' (or having some security certificate) creates a
direct connection, or maybe provides a custom 'menu' of directly available
options, or whatever.

Also, given that many companies, groups, governments could also likely use
something like this Internally as well, perhaps create the ability to
'federate' the NUM info (both up and down).

Taking that to the next logical step, there could be NUM data/records flagged
for different 'audiences'

These 'audience' entries then could be used to auto-magically publish
'internal', 'external', 'vendor', 'client', 'employee', or whatever type
records in appropriate places and ways, in NUM, thus helping to maintain
appropriate access, security, permissions, etc.

I do really like the option to include public keys as well, as that opens up
avenues to directly and easily establish programmatic methods for fully
encrypted communications, transactions, file transfers, and whatever else.

In fact, using an organizational public key, along with an employee-designated
key (plus whatever other factors) could then be used to instantly create say a
Wireguard connection to whichever resources (perhaps including additional
NUM/DNS records, data, etc) that that individual has been provided with access
to, thus creating a fairly easy way to establish 'Zero Trust', yet fully
functional [net]work environments, allowing equal access, no matter where one
might happen to be located

That could simultaneously allow for a reduced, if not single, set of security
protocols/parameters per organization, and given that simplification
effectively tends to increase overall organizational security, similar to how
Wireguard is seen as so revolutionary, due to its simplicity when compared to
legacy VPN technologies

That said, I do believe that, additionally, especially for personal
contacts/sites/details, and/or organizational units, there really ought to be
methods (put) in place to allow for some level of anonymous yet authenticated
access, such that NUM doesn't inadvertently disclose info that ends up
creating yet more 'attack surface' for 'bad actors'

A simple example might be what happens by 'scraping' sites, winnowing down
that info, and then publishing it (in clear text).

That would of course be done in an effort to 'help', though I could see that
rather easily causing inadvertent complexities, or even outright disasters,
especially given how much 'less than skilled' disclosure of info, whether at
the individual/family level or at various organizational entities/levels, I
have seen happening time and again on Many web-sites world-wide.

Those bits of info Currently tend to be obscured by exactly the nature of how
the web has developed (and that NUM seems to be well positioned to address and
effectively resolve moving forward) and Yet, at the same time, taking all
those juicy bits of info, boiling them all down, and 'canning' them, such that
Any script kiddie could then (far more easily And programmatically) utilize
all that 'condensed goodness' to then target Anyone or Any group just about
Anywhere, simply using NUM's (assuming publicly accessible) data, could well
cause some unintended backlash, if not handled with care.

I do realize that this last one could be an area where there is no simple
answer, at least not yet, and I believe I would be remiss if I didn't mention
my concerns here as well

~~~
elliottinvent
I really appreciate the detailed feedback here, I somehow missed it.

I don't think site ownership data is something to be concerned about since
we'll only publish that if we find it on the website. So if a company doesn't
have a website or has no company details on their website then we won't
populate a record for it. So it's unlikely NUM would make it any easier to
name a domain registrant in a lawsuit than the website would.

I think a module that lists a company's products or services could have some
really interesting applications.

NUM is of course compatible with all DNS implementations, so a local DNS zone
mycompany.local could hold its NUM records in _num.mycompany.local – I think
this has got a lot of potential for large companies and public sector
organisations.

You're right that great care needs to be taken when scraping website data and
publishing it to the DNS to prevent inadvertently publishing data that was
intended to be private or was published to the web before spam was such an
issue, also for GDPR reasons. It's unavoidable that making machine-readable
data open and freely available will result in it being consumed through
automated means and it's likely that some of this data will be used in ways
which are undesirable.

