Show HN: DNS-powered website with no back end (companydirectory.uk)
83 points by elliottinvent 33 days ago | 89 comments



All of the contact data on this website is fetched from DNS TXT records. It's an example of how NUM [0] data can be used.

A small team and I have created NUM because we believe that paid, restricted, rate-limited APIs provided by the giants of the web are holding back developers and we want to change that.

NUM provides: a simple way for businesses to provide machine-readable data direct to their users, bypassing data hoarding web giants; a forever free, unlimited and unrestricted source of data to developers.

The general-purpose semantic web has failed because the standards (microdata, JSON-LD, Open Graph) are too complicated for small businesses to adopt, and in large businesses they're often forgotten by marketing teams because the data is too difficult to edit.

NUM is different because it can be adopted by any business using a simple online form. The protocol involves up to two DNS queries: the first to the domain's authoritative DNS zone; the second, a failover query to our NUM Server. Any business can claim their domain on NUM Server using a simple online process.

The Company Directory website linked to is just one example of how NUM data can be used. The website fetches all data from DNS using client-side Cloudflare DNS-over-HTTPS calls (see the Network tab in Dev Tools). Barclays, or any other company listed, can update the data on this site by adopting the NUM protocol [1] independently in their own DNS – for example, to replace the data for the Barclays listing, they can create a TXT record at 1._num.barclays.co.uk or claim the record shown by visiting numserver.com/claim/barclays.co.uk
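For the curious, here's a minimal sketch of what one of those client-side lookups looks like against Cloudflare's DoH JSON endpoint (illustrative only, not the site's actual code):

    // TypeScript sketch: fetch a NUM record via Cloudflare DNS-over-HTTPS.
    // Assumes module 1 (contact data) lives at <module>._num.<domain>, as above.
    async function lookupNumTxt(domain: string, module = 1): Promise<string[]> {
      const name = `${module}._num.${domain}`;
      const res = await fetch(
        `https://cloudflare-dns.com/dns-query?name=${name}&type=TXT`,
        { headers: { accept: "application/dns-json" } },
      );
      const json = await res.json();
      // TXT answer data arrives wrapped in quotes; strip them.
      return (json.Answer ?? []).map((a: { data: string }) =>
        a.data.replace(/^"|"$/g, ""),
      );
    }

    lookupNumTxt("barclays.co.uk").then(console.log);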

DNS-based protocols like NUM (other examples are SPF and DMARC) typically suffer from the chicken-and-egg problem where no-one looks for this data in DNS because it's not there, and no one stores this data in DNS because no-one is looking for it.

To help overcome this, we've pre-populated the DNS with contact data for large UK companies and are in the process of automatically gathering contact data for all *.uk domain names (we expect to complete this in the next 10 days) before moving on to all other UK companies, then companies worldwide (the US, Canada, Australia next).

NUM can be adopted / edited with a simple online form. Anyone can create a record for their domain, either independently in their own DNS or using our service at https://app.numserver.com/tools/editor/add

To ensure that NUM is as efficient as possible, we store all data in DNS using MODL [2] – a compact, DNS-friendly data serialisation format; we make compact DNS objects developer friendly using Unpacker [3] – we have developed both in-house. I have written a very basic Notion document to explain how all of these technologies fit together [4].

To simplify all of this for front-end developers (who by and large don't care about DNS), we've packaged it up in a TypeScript library called company-api which you can query for a domain and get a beautiful object back containing company contact data [5].

We are very keen for feedback, good or bad.

0. https://www.num.uk

1. https://www.numprotocol.com

2. https://www.modl.uk

3. https://www.unpacker.uk

4. https://www.notion.so/num/NUM-MODL-and-Unpacker-67d7cd59548d...

5. https://www.npmjs.com/package/company-api

Edit: to make it clearer what problem we're trying to solve.


I guess I don't understand how DNS records work around the problem of metered APIs, since DNS providers are not going to let you store e.g. every item of your inventory in separate TXT records. Also: DNS is much, much slower than an HTTP API, which can fetch arbitrary, specific collections of records for a single query.

I'm not trying to shoot this down so much as figure out what I'm missing here.


Thanks for your comment.

> DNS providers are not going to let you store e.g. every item of your inventory in separate TXT records

There are some industry-standard limits in Cloud DNS (10k zones per customer, 10k resource records [e.g. TXT] per zone). These limits would make it difficult to store a large inventory in DNS.

NUM could be used for inventory but as you say, an API would be more effective. NUM is designed for standardised use cases like contact data, where all companies store their data in the same way – making it easy for developers to consume.

We're trying to standardise this in DNS, instead of an API, for two reasons:

1. Any domain registrant can adopt it, relatively simply (although some providers have atrocious TXT record management) – they can delegate their NUM zone to us for easy management.

2. By using DNS, we can pre-populate huge amounts of this data (we do this from num.net), which makes the protocol useful from launch and helps overcome the chicken-and-egg problem. DNS caching helps us answer these queries: we can answer a billion DNS queries for $200 using standard Cloud DNS pricing, and once caching is factored in it's incredibly efficient.
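To make the arithmetic explicit (a rough sketch; the per-million price is just what the $200-per-billion figure implies, and the cache-hit ratio is an assumption for illustration):

    // Rough cost model implied by the figures above.
    const PRICE_PER_MILLION_USD = 0.20; // implied by $200 per billion queries
    const billedQueries = 1_000_000_000;
    const costUsd = (billedQueries / 1_000_000) * PRICE_PER_MILLION_USD; // 200

    // With resolver caching, each billed (uncached) query serves many users:
    // total answers = billed / (1 - hitRatio), e.g. 90% hits → 10x leverage.
    const answersServed = (billed: number, hitRatio: number) =>
      billed / (1 - hitRatio);
    answersServed(billedQueries, 0.9); // 10 billion answers for the same $200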

It's possible DNS resolvers might refuse to cache (or even answer) NUM queries but we hope that doesn't happen.

> I'm not trying to shoot this down so much as figure out what I'm missing here.

I appreciate that, hopefully my answers above have helped


Have any of your test users been behind corporate firewalls yet? I ask because there are some corporate firewalls (Fortigate, PAN to name a couple) that will see excessive DNS requests as a denial of service attack. The response will depend on the security admins that set up the firewalls. This could lead to false positive DDoS blocks, alerts to security operations centers, discussions with employees about their DNS traffic and possibly some folks losing internet at work temporarily or active directory server outages. What are the max queries per second this implementation can trigger per client? If you expect any of your user-base to be behind corporate firewalls, I would suggest reaching out to Palo Alto Networks and show them your implementation and ask them to test it in their lab.


Thanks for your comment.

> Have any of your test users been behind corporate firewalls yet? I ask because there are some corporate firewalls (Fortigate, PAN to name a couple) that will see excessive DNS requests as a denial of service attack.

We've done some limited testing with this at a protocol level but there's lots more to do. This example site uses DoH, which some firewalls just straight-up block; in that case this site and the associated client-side libraries wouldn't function at all.

If a connection to Cloudflare DoH is made successfully, then it's unlikely any of this traffic will be troubled by the firewall since it's just over HTTPS.

> What are the max queries per second this implementation can trigger per client?

This example implementation is only limited by the browser and Cloudflare. In the Barclays example link provided it's sending 56 DNS queries in around a second.

This is a pretty heavy implementation; a NUM lookup for a URI like num://numexample.com:1 only requires up to 2 DNS queries. The reason this implementation is so heavy is that we're combining a whole bunch of records.
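For reference, a sketch of that up-to-two-query flow, reusing the lookupNumTxt helper sketched in the post (the failover zone layout below is a placeholder assumption, not the documented NUM Server location):

    // Up to 2 DNS queries: the independent zone first, then the failover zone.
    async function resolveNum(domain: string, module = 1): Promise<string[]> {
      const independent = await lookupNumTxt(domain, module);
      if (independent.length > 0) return independent;
      // Placeholder failover layout – the real NUM Server location may differ.
      return lookupNumTxt(`${domain}.num.net`, module);
    }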


> since DNS providers are not going to let you store e.g. every item of your inventory in separate TXT records

They aren't?


I can't imagine dns providers would be cool with you storing a couple gb of records. You could of course host your own dns server... but you could also just host your own website.


There are pretty standard industry limits in Cloud DNS (AWS, Microsoft, Google):

10,000 DNS zones per customer

10,000 resource records per zone

Both can be increased by request

At scale, query costs work out at $200 USD to answer 1 billion uncached queries; depending on TTLs, caching means those queries might actually deliver tens or hundreds of billions of answers to users.

It's possible DNS providers and DNS resolvers (like Cloudflare, Google etc) might put limits in place. We're hoping not.

Since TXT records are best kept below 5 KB, it's pretty tough to store a couple of GB of data in your domain using NUM, but certainly possible. If NUM is a big success, DNS providers might decide to charge for DNS data transfer rather than individual queries.
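For context on sizes: TXT record data is carried as character-strings of at most 255 bytes each (RFC 1035), so anything larger has to be chunked and re-joined by the consumer. A sketch, assuming an ASCII payload:

    // Split a payload into the <=255-byte character-strings a TXT record
    // actually stores (assumes 1 byte per character, i.e. ASCII).
    function toTxtChunks(payload: string, max = 255): string[] {
      const chunks: string[] = [];
      for (let i = 0; i < payload.length; i += max) {
        chunks.push(payload.slice(i, i + max));
      }
      return chunks;
    }

    // Consumers re-join the chunks: toTxtChunks(data).join("") === data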



This won't work.

If companies cannot be bothered to make their websites user friendly in the first place, why would they do it in this new and obscure way?

It's like the episode on Dragon's Den where someone tried pitching a service that would call companies on your behalf, and would only call you to connect both ends when the music stops and a call center agent actually picks up the phone.

It's the job of the company to fix their long wait times, not of some 3rd party service, which may or may not work with a specific customer service number.


Thanks for your comment.

Companies don’t need to adopt the standard for their data to be in DNS. We’re populating DNS with public website data already. All *.uk domains will be populated in the next 10 days. Then we’ll roll it out further.

In the example provided it’s mainly for big companies with lots of phone numbers. But NUM is for any machine readable data for any company - small traders, pubs, restaurants and more.


FYI, it's trying to load a resource and hanging (in ff):

> Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://sentry.numops.com/api/28/envelope/?sentry_key=${key_.... (Reason: CORS header ‘Access-Control-Allow-Origin’ missing).


Thanks for the heads up here. This is for error reporting, which is kind of ironic – we'll take a look.


Sentry is an error collection service, so that's probably nothing.


But umm, why? Seems like you're in essence just using an https api to fetch dns data. So why not just use an https api and cut out the extra step?


The HTTPS API is Cloudflare DNS-over-HTTPS (it could be any other provider) – it's just the best method to use in browsers; back-end services can use standard DNS.

Storing this stuff in DNS is what enables:

- individual domain owners to adopt the standard

- us to offer data free, unrestricted and unlimited forever

Since DNS is massively scalable and cached, it's well suited to this IMO – much more so than a standard API.
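To show what that looks like server-side, a minimal Node.js sketch over standard DNS (assuming the 1._num.<domain> contact location described above):

    // Back-end NUM lookup over standard DNS (Node.js built-in resolver).
    import { promises as dns } from "node:dns";

    async function numContactRecord(domain: string): Promise<string | null> {
      try {
        // resolveTxt returns each record as an array of character-strings;
        // join the chunks to recover the full payload.
        const records = await dns.resolveTxt(`1._num.${domain}`);
        return records.length ? records[0].join("") : null;
      } catch {
        return null; // NXDOMAIN etc. – caller could try the failover zone next
      }
    }

    numContactRecord("numexample.com").then(console.log);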


I mean, it might make sense on the back end. After all, a decentralized, highly scalable key-value store is what DNS fundamentally is, and it has a long history of working well for that use case.

I just kind of assumed the thing that's supposed to be cool here is using it in the web browser as a data storage backend for your entirely client-side app (since using DNS simply as a key-value store isn't particularly unique). And I don't think that use case makes sense, since it's all routed through a centralized HTTPS API translation layer anyway.

> us to offer data free, unrestricted and unlimited forever

Say what now? None of this is free. Half the bill is being footed by cloudflare, the other half by the domain owner. Using an api provided by someone else is not the same as offering something for free. You're not the party doing the offering.


> None of this is free.

The data that powers this website is offered free to developers.

> You're not the party doing the offering.

We're offering it through our DNS zone num.net.

> Half the bill is being footed by cloudflare

It's correct that Cloudflare are doing a fair bit of work here, but we're a mosquito on the arse of a bull.

If NUM gets broad adoption DNS resolvers might refuse to answer (or more likely refuse to cache) our answers but there's plenty of resolvers out there and if people are using NUM to fetch useful data there'll be a competitive edge to being the fastest DNS resolver to answer those queries.



Incidentally, it looks like the Semantic Web is not dead but actually accelerating its growth, thanks to current SEO practices and Google showing semantic data within search results [1].

As for being complicated, yeah, but business web platforms like Wix support those standards as part of their SEO capabilities [2]. So would you need the SMB web-dev platforms to support NUM to make it a reality?

1. http://webdatacommons.org/structureddata/#toc3

2. https://support.wix.com/en/article/adding-structured-data-to...


> it looks like Semantic Web is not dead but actually accelerating growth

IMO the general-purpose semantic web (not the one used in academia) is only useful for huge companies like Google, with the engineering capacity to understand all the different types of data that might be found.

At times, it seems it's almost complex by design to keep smaller developers from being able to consume the data and build anything useful on top of it.

> So you would you need the smb web dev platforms to support NUM to make it a reality?

That would be cool but it's not required for NUM to be broadly adopted. We provide a tool, NUM Server [0], where you can publish and manage data using simple online forms after proving authority for your domain name.

The best angle for us on this point is actually through domain name registrars that offer add-ons to domain name registration, and through agencies (web design, social, etc.).

0. https://app.numserver.com/tools/editor/add


> the general-purpose semantic web is only useful for huge companies like Google, with the engineering capacity to understand all the different types of data that might be found.

But easier than parsing the general web

This is a very interesting assumption to tackle – how can non-Google players leverage all those new semantic JSON-LDs? I'm pretty sure the barrier is lower to build a product aggregator or a product-comparison site based on scraping the web. What else?


Brilliant. Stoked to give this a try. Clever path to radically low latency for some use cases.


Thanks very much, if you need any help or have any questions just let me know – reply here or email elliott[dot]brown[at]num.uk


> bypassing data hoarding web giants

but using cloudflare to resolve dns? hmm..


Cloudflare is just the resolver in this example. The data and the way to fetch it are just vanilla DNS. Pick a resolver, any resolver.


"All of the contact data on this website is fetched from DNS TXT files."

I found contact data for a number of companies in the Javascript files.

    {"object_display_name":"Organisation","name":"Co-op Insurance","slogan":"Car, Home, Travel, Pet and Life Insurance from Co-op","contacts":[
    {"object_display_name":"Organisation","name":"Columbus Direct","slogan":"Award Winning Travel Insurance for 30 Years","contacts":[
    {"object_display_name":"Organisation","name":"First Direct","slogan":"Online and telephone banking 24 7 365","contacts":[
    {"object_display_name":"Organisation","name":"Churchill","slogan":null,"contacts":[
    {"object_display_name":"Organisation","name":"MORE THAN","slogan":"Car, home, pet, life, travel and landlord insurance quotes","contacts":[
    {"object_display_name":"Organisation","name":"Saga","slogan":"Over 50s Insurance, Holidays, Money and Magazine","contacts":[
    {"object_display_name":"Organisation","name":"LV=","slogan":"Liverpool Victoria","contacts":[
    {"object_display_name":"Organisation","name":"Sheila\'s Wheels","slogan":"Car & Home Insurance with Style","contacts":[
    {"object_display_name":"Organisation","name":"Swiftcover","slogan":"Super Fast Car and Home Insurance","contacts":[
    {"object_display_name":"Organisation","name":"Barclays Bank","slogan":"A big world needs a big bank","contacts":[
    {"object_display_name":"Organisation","name":"Hastings Direct","slogan":"Car, Van, Bike and Home Insurance","contacts":[
    {"object_display_name":"Organisation","name":"Co-operative Bank","slogan":"For People with Purpose","contacts":[
    {"object_display_name":"Organisation","name":"Halifax UK","slogan":"Halifax makes it happen","contacts":[
    {"object_display_name":"Organisation","name":"Lloyds Bank","slogan":"By Your Side","contacts":[
    {"object_display_name":"Organisation","name":"Royal Bank of Scotland","slogan":"Enjoy better banking with RBS where people matter","contacts":[
    {"object_display_name":"Organisation","name":"Admiral","slogan":"Car, MultiCar and MultiCover Insurance Quotes","contacts":[
    {"object_display_name":"Organisation","name":"Aviva","slogan":"Insurance, Savings and Investments","contacts":[
    {"object_display_name":"Organisation","name":"AXA","slogan":null,"contacts":[


Thanks for highlighting this – it comes from company-api, which is built on top of the DNS: https://www.npmjs.com/package/company-api


Actually, I think this is a JSON object to populate the companies on the homepage. When you click one of these companies a NUM lookup is run on the domain name, then all contact data comes out of DNS.


I think you're leeching off someone else's infrastructure and using it to do things they never meant it to do. Sure, the technical capability is there, but your use case would drastically increase their costs. You are essentially cost-shifting your customers' costs onto theirs. Not cool.

It's like building a cloud storage solution off Gmail's free storage. It can be done, has been done, but that doesn't mean it's cool to do so.

Your system would increase costs for DNS providers all over the world, without their consent, just because you're using it as a loophole. It's a fix for a problem that wasn't there, done in a way that leeches from rather than gives back to the community.


Thanks for your point of view, I find it really interesting.

> I think you're leeching off someone else's infrastructure

Ok, who's the victim here? Cloudflare? Since we use their DoH endpoint?

Google Cloud DNS? Since that’s where we’re storing the data in DNS?

All of this is just standard DNS – Cloudflare DoH and GCDNS can be swapped out for any others because it's just vanilla DNS.

Let’s say Barclays wanted to serve out data using NUM and stored data in their own DNS zone, would they be abusing their DNS provider’s infrastructure? I don’t think so.

If we're successful with our plans for NUM and it becomes mainstream, then surely this presents a huge opportunity for DNS providers, who will see increased query revenue from their clients.

DNS resolvers will make their own decisions about whether they cache NUM queries (or perhaps even answer them at all), but resolvers that answer them quickly will surely have an edge on those that don't.

> and using it to do things they never meant it to do.

The DNS is a distributed database. It’s designed to convert human friendly data to machine friendly data and I think NUM fits this perfectly. I understand not everyone shares my point of view.

> Sure, the technical capability is there, but your use case would drastically increase their costs. You are essentially cost-shifting your customers' costs onto theirs. Not cool.

It increases the costs of Cloudflare / Google? Ok, if it's significant, they have a commercial decision to make – support full DNS as per the protocol spec, or partial DNS where they block certain use cases.

> It's like building a cloud storage solution off Gmail's free storage. It can be done, has been done, but that doesn't mean it's cool to do so.

No, it’s not. The DNS is owned by no one and everyone.

> Your system would increase costs for DNS providers all over the world, without their consent

Most will just pass this on to domain owners; DNS query costs are peanuts – 200 USD per billion at scale.

> It was a problem that wasn't there fixed in a way that leeches from rather than gives back to the community.

I respect your point of view but think the opposite is true. We're freeing data, opening it up for developers so that they can build things far outside the jurisdiction of the giants of the web – I think this is a fantastic way to give back to the community.


Distributed, decentralized projects are great, when they build up the infrastructure in a way that respects existing network traffic.

Yes, DNS is distributed and communal, but it's cheap only because it's minimal. Caching a few values for IP and MX lookups is relatively trivial, but if you purposefully start storing content in there, the whole network gets exponentially more expensive for everyone involved, especially once you cross a threshold where you can no longer easily send updates as simple key values and need to start worrying about encoding of larger chunks, network interruptions, checksums, etc. That complicates caching all over the DNS network. And if some DNS providers start supporting certain features and not others, it's just going to lead to further fragmentation, user delays, and confusion about where and how to store and fetch data from this system depending on a user's region and likely DNS providers. It also presents authentication and integrity challenges for unencrypted uses, as in the case of DNS hijacking by local ISPs or governments.

It's a shoehorning of data into a poor fit, and only because someone else is paying for it. That's what makes this endeavor selfish, not heroic. You're not "freeing" data, just shoving it into some dark corner of the web and hoping to profit from it.

There has been a lot of actual hard work on the problem of decentralized information, from IPFS to Freenet to Tor to blockchains to DHTs... they have all thought about the problem in depth and built the infrastructure to try to make it happen, instead of leeching off someone else's work and pretending that solves the problem.

Sorry to be harsh. This just seems like a money grab rather than technical innovation.


Thank you. I’m very grateful for the insight into your view, that’s why I’m here.

> DNS is distributed and communal, but it's cheap only because it's minimal.

Cheap for who? Users using free resolvers or businesses using DNS service providers?

Resolvers can choose not to serve/cache NUM answers. If there’s demand for NUM data the market will decide. Google, CF, Quad9 can look after themselves.

DNS Service providers could bill by bandwidth rather than per query. Again, the market will decide.

> It also presents authentication and integrity challenges for unencrypted uses, as in the case of DNS hijacking by local ISPs or governments.

I agree but DoH, DoT, DPRIVE and other initiatives are tackling this problem.

> It's a shoehorning of data into a poor fit, and only because someone else is paying for it.

Why is it a poor fit? We're converting a human-friendly domain (or NUM URI) into machine-friendly data. That's the whole purpose of DNS.

DNS is comfortable transferring 5 KB of data, but most NUM records will be smaller than DNSSEC responses. In fact, most NUM records are smaller than the original DNS UDP packet limit of 512 bytes.

> That's what makes this endeavor selfish, not heroic. You're not "freeing" data, just shoving it into some dark corner of the web and hoping to profit from it.

We’re making the data available to developers for free, that’s a fact. If DNS TXT records are a dark corner of the internet then I’m pleased to shed some light on that. If rules come about to stop us doing this, so be it.

> There have been a lot of actual hard work on the problem of decentralized information, from ipfs to freenet to tor to blockchains to dht... they all have thought about the problem in depth and built the infrastructure to try to make it happen

I’m a fan of them all but how many of your non-tech friends have used them? Zero.

Realistically how much have any of us used them to do useful things that make our life easier?

I really appreciate your point of view and feedback. Clearly we’re on opposite sides of this but as I said, that’s why I’m here.


Thanks for taking the feedback into consideration.


The idea I had for this over 10 years ago was to modify djb's dnstxt to output raw HTML with a MIME header. tinydns allows one to store arbitrary data so I could put anything in a 512-byte DNS packet, including "text/html" (newlines and carriage returns). I could request tiny web pages from tinydns. There have been other similar things by other folks like putting deCSS code in DNS, Wikipedia data, audio files, etc.

Now this might all sound silly, but since then 1. DNS packets have become massive and can carry much more data, and 2. big companies are pushing a "next-gen" UDP-based protocol (reminds me of djb's CurveCP) for HTTP, serving HTML and other web junk. They also want to use this UDP-based protocol for other things, like DNS, eventually.

Whatever. HTML in DNS worked great. Tiny pages that fit in a CPU cache.


Putting URLs in DNS. Could be useful.


I don't get it.

You have to host the static page somewhere. Why not host the data in the same place (as a JSON blob or whatever)? It seems like the implication is that all the data in DNS is static, so what's the benefit of adding DNS to the mix – you already have to use something else to host some of your static resources, so why not just have a single static host instead of two separate static hosting systems?

(Yes, in theory you could write a DNS server to dynamically answer queries, but that doesn't seem like it's what is being proposed here.)

Don't get me wrong, it's cool that you can access DNS from a browser, but I don't think this is a compelling use case (something like WebTorrent in the browser, with magnet links in DNS for a decentralized web, seems like it would make a good demo for this technique).


What we're trying to do here is standardise how machine-readable data is stored and retrieved, primarily for companies. Any of the companies on this page can adopt the standard and override the data on the website. So the point is, it's not static.

Sure, at the moment all this data is coming from one DNS zone (num.net) but that's only a failover query location, first we query e.g. barclays.co.uk for this data, so Barclays can override (or remove) this data if they want to.


So the interesting part here is not supposed to be querying dns data from the client side of a website, but a new standard for machine readable data in TXT records?

I mean between microdata, rdf, etc there's been a lot of standards for machine readable data storage on the internet. Most fail not for technical reasons, but because there is an incentive mismatch, where the people who have to maintain such data get no benefit out of maintaining it so they don't bother. In this scheme you can't even google it, so why would a company bother to use this?


> So the interesting part here is not supposed to be querying dns data from the client side of a website, but a new standard for machine readable data in TXT records?

Correct

> I mean between microdata, rdf, etc there's been a lot of standards for machine readable data storage on the internet. Most fail not for technical reasons, but because there is an incentive mismatch, where the people who have to maintain such data get no benefit out of maintaining it so they don't bother.

I'd argue the inconsistent way these standards have been adopted (or not in many cases) is the reason they are not an effective, reliable way for developers to find machine-readable data about _any_ business. Since they are not a reliable way to find data, developers can't build anything with that data and there's little incentive for companies to keep it up to date.

Currently, the only reason to adopt these standards is for SEO or to make your website look pretty when you share your website on Facebook / whatever.

> In this scheme you can't even google it, so why would a company bother to use this?

Google hold a lot of the data we're proposing to publish using NUM (contact data, for example). The difference between our approaches is that we're going to make it freely available to developers; Google don't (not really). Google could consume NUM data like anyone else. As far as the protocol is concerned, Google is the same as any other developer, so it's a level playing field.


And now you are adding another standard that will be inconsistently adopted, especially since it's even less accessible to web developers than the alternatives?


We're pre-populating NUM with useful data, so it doesn't require domain registrants to adopt it.

NUM modules are much simpler than other comparable web standards, so any stored data will be consistent and easy for the developers that consume it.



Thank you for sharing your project, Elliott.

It is an intriguing technology, and a clever way of putting together existing pieces in a new way.

I think it is very much in the spirit of how things are done on the Internet, although I'm not entirely convinced that NUM wouldn't gum up the DNS system if widely adopted.

My question to you is, how would I take advantage of NUM as a small-time personal-website operator, if my website was largely about small text-file snippets?


Thanks for your kind comment.

The standardised use cases of NUM (currently contact data, images [gravatar for domains] and some others) would allow you as a small website operator to publish any public data that you wanted to be machine-readable.

I think your question might be how to use NUM to serve out your small text-file snippets? If so, NUM could be used for that. NUM lookups are based on URIs, e.g.:

num://numexample.com:1

In this example, module 1 is contact data, and lookups for this URI involve a DNS query for:

dig 1._num.numexample.com TXT

If instead of publishing contact data you wanted to publish general-purpose data, you can do this using "module zero". For example, data could be published from the NUM URI num://numexample.com:0/foo

The DNS location is:

foo.0._num.numexample.com
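Putting those two examples together, the URI-to-DNS mapping can be expressed as a tiny helper (a sketch generalising from just these two cases, so treat the edge cases as assumptions):

    // num://<domain>:<module>[/<path>] → [<reversed path>.]<module>._num.<domain>
    function numUriToDnsName(uri: string): string {
      // Reuse the URL parser by swapping the scheme (NUM URIs are URL-shaped).
      const { hostname, port, pathname } = new URL(uri.replace(/^num:/, "https:"));
      const module = port || "0"; // assumed default when no module is given
      const labels = pathname.split("/").filter(Boolean).reverse();
      return [...labels, module, "_num", hostname].join(".");
    }

    numUriToDnsName("num://numexample.com:1");     // "1._num.numexample.com"
    numUriToDnsName("num://numexample.com:0/foo"); // "foo.0._num.numexample.com"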


>CompanyDirectory.UK is a NUM Technology [1]

>What is NUM?

>NUM is a DNS-based alternative to the World Wide Web for storing and retrieving structured data. The web is amazing but websites are built for browsing and are an inefficient way to find precise pieces of data like telephone numbers, bank details and more.

[1] https://www.num.uk



Thanks for your comment and for highlighting this.

As a summary for others: Telnic (and Telnames) allowed people to store contact data in the DNS of their .tel domain name.

> Sounds similar to Telnic

Telnic used NAPTR records and was only available on .tel domains; NUM uses TXT records and can be used on any domain name by publishing a "_num" zone.

Telnic was also "just" for contact data. What we're trying to do with NUM is make it possible to publish any kind of data. Contact data is just one "module".


The other key difference here as far as I understand, is that Telnic required you to register .tel domains, you couldn't bring your own domain and store contact information against that.


Please no.

CI systems like GitHub Actions and CircleCI used to have unlimited CI minutes until people started abusing them for things they weren't intended for (e.g. crypto mining).

I would hate it if I had to pay for DNS services (or be forced into using an ISP's resolver which does things like blacklist certain domains) just because people on the internet wanted to be 'cool' and show off how smort they are.

This is why we can't have nice things.


I like the limit on GitHub Actions. It won't be sustainable in the long term otherwise. You just can't count on people's self-discipline for this kind of stuff, and there will always be malicious usage as long as the system allows it.

If putting large random TXT records can harm the global DNS infrastructure, DNS providers should put a stricter limit on it. And OP's project will help us get there.


I don't see why this application would be abusive. Chrome does much worse things trying to detect if you are behind a captive portal.

Edit: oh I see, they aren't using generic DNS but Cloudflare's specific DoH gateway. I guess that could be problematic if everyone did that, although if any company would tolerate it, it would be Cloudflare.


Thanks for your feedback. Obviously I think it’s cool but I doubt that’ll be a good enough reason for most people.

NUM is a protocol for storage and retrieval of data, that’s all. Hopefully the demonstration shows it’s pretty good at that. If people choose to store huge amounts of data using it, I think that would be cool too.

I don't think there's any risk of it being used for crypto mining. NUM doesn't open up any new attack vectors that aren't already present in DNS, and DNS is an incredibly resilient and robust system as we know.


> If people choose to store huge amounts of data using it, I think that would be cool too.

This is what should be avoided. It adds unnecessary load to DNS servers around the globe.


If it's useful data that's being stored and retrieved, I'd argue it's better for the internet as a whole. It shifts load from web servers (where people are downloading 5 MB of page data just to find a phone number) to DNS, which is far more efficient.

If NUM is a success it will certainly increase load, but I don't think it would be an unnecessary load, since it would be serving a worthwhile purpose.


It's not better for the internet as a whole. Host your own data behind a CDN and S3.

There are also lots of free services (GitHub Pages, Netlify) which allow you to store static content.

Use these over a DNS server to store your data.


I appreciate your feedback and am interested by your point of view on this.

> It's not better for the internet as a whole.

We've built this because we believe if all the people manually trawling websites looking for particular pieces of data were instead fetching this data automatically through ultra-efficient DNS queries, it would be better for the internet as a whole.

> Host your own data behind a CDN and S3.

We could certainly do this, but we've made this a DNS-based protocol so domain owners can adopt it independently in their own DNS.


Have you considered that part of the reason why DNS is ultra-efficient is that it's a relatively constrained dataset of small but meaningful data objects? I worry about all of the places running caching DNS proxies suddenly tanking their resolver performance because their caches are flushing far more rapidly, and about root hint servers seeing a massive influx of data now being stored in the platform. The technical marvel is cool, but I sincerely worry about this becoming a widely used product.


> Have you considered that part of the reason why DNS is ultra-efficient is that it's a relatively constrained dataset of small but meaningful data objects?

I'm not sure it is. The size of DNSSEC responses is not insignificant. The apex for many domains is polluted with utter garbage: dig target.com TXT

The DNS is ultra-efficient because it's cached and distributed; yes, packet sizes are relatively small, but a standard NUM record would be smaller than the original UDP packet limit of 512 bytes.


Better for the internet as a whole is a pretty big leap of reasoning. There is plenty of useful data in the web that needs to be stored and retrieved. In fact, this is precisely what the web was built for. It is an information network. Why hack the DNS system for this? DNS is meant for domain name discovery. It has a specific purpose. In large architectures, it is almost never a good idea to break such separation of concerns.


Thanks for your comment.

The DNS is used for a lot more than converting domains to IPs and mail servers; run:

dig target.com TXT

And be amazed at the pollution at the apex.

SPF, DMARC and others show that DNS is already being used for much more.

The DNS is a distributed database. The primary key is the domain, and we believe that converting human-friendly domains to machine-readable data shouldn't stop at IPs, mail servers and anti-spam measures (SPF and DMARC). It should extend to phone numbers, GPS coordinates, bank details and more.


That is indeed quite some bloat. I suppose it makes sense for use cases where domain name related metadata is needed across the internet infrastructure (like for email with SPF). Just not sure it's a good idea to extend DNS for data like contact info that is ultimately meant for human consumption. Where do we draw the line otherwise? I could start caching web artefacts in TXT records or even serve entire web pages.


I agree there’s a line where putting data in DNS is just a gimmick (eg storing images in DNS) because the web does it infinitely better.

But in the case of contact information I think it makes a lot of sense. This demo site is an intense example because it's contact info for very large companies. For smaller companies it's just one DNS TXT record, usually under the original 512-byte DNS UDP packet limit.

I don't agree that contact information is meant for human consumption. The humans (or often machines) at the end of phone numbers are there for humans, but the route to reach them (e.g. the telephone system) is designed for machines.


Your domain names are free?


Someone needs to host the DNS servers which deliver DNS results.

Serverless deployment models under the covers still use servers.


What I am saying is that you are already paying for DNS services


What I said was I didn't want to have to pay to do DNS lookups.

E.g. if a client-side DNS call was rate-limited because DNS services like Cloudflare (1.1.1.1) or Google (8.8.8.8) got tired of caching data that was never supposed to be there.

There are people who have joked you can use Route 53 as a database (Corey Quinn originally?). But taking a joke too far is irresponsible.

Imagine having to pay a subscription to resolve DNS queries...

We have a good thing going with DNS, and this idea is like a knife in it (like my free CI minutes and cryptominers example).


I mean, technically speaking, DNS is a database. I am not saying I like this idea, but are there any regulations on what you can or cannot put in TXT records?

> Imagine having to pay a subscription to resolve DNS queries...

Great, I won't pay for resolving googleadservices.com then :)

Joking aside, I personally don't mind paying for DNS (I am already doing that with NextDNS and it has been the best DNS service I have ever used). And as has been pointed out by others, DNS providers can always put a limit in place or hit you with a ban hammer if they want.


Then presumably people would stop using these services and use others / make their own. It's not like Google has a monopoly on running a public DNS resolver.


On a cursory read-through, I'm still not sure what problem NUM actually solves. It seems like it moves data storage into DNS TXT records and defines some parsers.

What benefits does NUM have over HTML over HTTP? One can use some semantic HTML tags to organize the data for parsing, making it roughly equivalent to NUM modules. If a NUM client is running without a newly developed NUM module, how does it parse that data, and how is it better than HTML without semantic meaning?


> What benefits does NUM have over html over http?

Web standards (JSON-LD, microdata, RDFa) are inefficient, since you have to download the entire HTML to find a particular piece of data like a phone number.

Web standards are also not widely adopted enough for developers to build something on top of that data. For example, web standards can't be used to find a phone number for _any_ company with a phone number published to their website, since many small businesses don't mark up phone numbers with semantic web data.

We're pre-populating NUM with millions of pieces of data so if a phone number is on the public website, it'll be in the NUM record for that domain.

> One can use some semantic html tags to organize the data for parsing, making it roughly equivalent to NUM modules.

Of course – the problem is that not enough websites do this (especially those of small businesses).

> If a NUM client is running w/o a newly developed NUM module, how does it parse that data and how is it better than html without semantic meaning?

If NUM is used for general-purpose data storage (instead of for a particular module), then parsing of that data would need to be handled in a bespoke client – not a general-purpose NUM client. Any advantages of using NUM over HTTP in that case would be related to scaling and cost savings only.


Thinking about this a little more – I'm not sure anyone is asking for something "more efficient" than HTML; at least in the USA, we're no longer bandwidth-strapped. I thought the internet in the UK was decent enough – at least when I visited in 2016 it was good enough to load Google Maps on 4G.

For semantic parsing, we probably went too far down the semantic-web-tags path. A simple "tel:" link works for most people, and there's phone integration already. For example:

    <!DOCTYPE html>
    <title>My co</title>
    <a href="tel:123-456-7890">123-456-7890</a>
(not well-formed, but it works!) – no JS, no JSON required – good ole' HTML.


Thanks for coming back to this.

Internet speeds in the UK are comparable with the US. Many parts of the world are not as fortunate, with bandwidth severely limited and extortionate data charges (unless it’s Facebook data, then it’s free?!). However, NUM doesn’t exist because the world has a bandwidth problem.

You’re right of course that it’s possible to provide machine readable contact data on the web. NUM doesn’t exist because it’s not technically possible to do the same thing with data on the web.

NUM exists because after 15 years of the semantic web, this data simply isn’t available for most businesses on the web.

Even large businesses - take any example from the homepage of the demo site CompanyDirectory.uk, and see how they’ve implemented semweb standards. How they list their phone numbers for example. It’s an utter mess. Good luck to anyone trying to build a reliable service on top of that data.


> Web standards (JSON-LD, microdata, RDFa) are inefficient, since you have to download the entire HTML to find a particular piece of data like a phone number.

You might have picked a bad demo project to show off the efficiency of NUM. It takes me 58 requests and (on my very fast connection) 5.3 seconds to download all the information for Barclays. Downloading a JSON blob with all the data for Barclays, parsing it, and displaying it, would take a fraction of the time.

Sure, NUM might be great if you knew ahead of time exactly which DNS record the user would need, but in most cases where something like this would be helpful I'd rather see a complete list of all contact information so I can make the correct choice given all the options. Are there use cases you've thought of where the application would know exactly which DNS record to load and display, rather than displaying an index the way you do with companydirectory.uk?


> You might have picked a bad demo project to show off the efficiency of NUM. It takes me 58 requests and (on my very fast connection) 5.3 seconds to download all the information for Barclays.

That's very slow; I'd be interested in how much of the delay is from the DNS calls and how much is overhead from the demo site.

We’ve purposely chosen to demo this site because it’s an example of an intensive NUM implementation.

> Downloading a JSON blob with all the data for Barclays, parsing it, and displaying it, would take a fraction of the time.

Of course, but that data doesn't exist. Barclays' phone numbers are split over hundreds of web pages, which would require much more data download. I think your argument is that we could supply the data in JSON form, but then developers building on top of this data would have to count on our JSON response being fast and reliable.

What we're describing here is the current state of the art – an API model where developers usually pay to access that data.

Barclays is a massive organisation with a very complicated contact setup. Smaller businesses are obviously much simpler, with all of their contact data contained within a single UDP packet of maybe 300 bytes.


The 5.3 second figure is the time from the start of the first API call to the completion of the final API call, as shown in Firefox's Network tab.


Very interesting, thanks for the data. "API call" being the first and last call to Cloudflare DoH? If so, there's some throttling going on somewhere. It's completing in around 1-1.5s for most, I think.


I tried a few companies on the site and only AXA was slow for me - it looks like there are 4s delays for many of the queries to _num.axa.co.uk, so maybe they have some throttling/DDoS protection going on. They return a SERVFAIL status code.


Thanks for reporting this.

The protocol has a timeout when waiting for a response from the independent zone. It shouldn’t wait 4s for any response, so there could be a bug somewhere here. We’ll take a look.


This is interesting. I run https://newbusinessmonitor.co.uk/ and I’m often asked if I can provide contact data for UK companies, but this is difficult because no contact data is published when a company registers. I wonder how you’ll be getting this data for smaller companies.


The long game is for companies to store NUM data when they register a domain, potentially through their DNS provider or web designer. Maybe even storing machine readable data instead of a website, since many small business websites are cookie-cutter anyway – contact info, directions, menu etc.

Public data is available for most domain registries, e.g. Nominet [0] and others [1], so we crawl new domains to discover contact data.

It could be good to chat (email elliott[dot]brown@num.uk), we're in the process of mashing up Companies House data with our contact data in an effort to provide company numbers in NUM records too.

0. https://registrars.nominet.uk/uk-namespace/the-uk-zone-files...

1. https://czds.icann.org/


I think it would be wise to slightly change the design so as not to look so similar to Companies House.


If you are offering this to people for free, what is your company’s business model? Why centralize their info on a single domain instead of putting it on their own using a common format?


The data is free to developers and users. We offer a freemium service to businesses that they can use to publish and manage NUM data.

We prepopulate data in our domain (NUM.net) so the protocol is useful from launch.

The protocol allows anyone to publish data to their own DNS, they can also delegate their “_num” zone to us so we can easily manage the data.

Our plan isn’t to centralise everyone’s data in one domain (num.net), but that’s a fallback. Realistically, many small businesses will use this hosted solution long term and our business model counts on that.


Isn’t DNS de-facto centralised too?


Yes in the sense that DNS is a tree with a single root.

No in the sense that you can put your data into leaves of your choice and under your control. You can easily run your own DNS server.

Halfway-yes in the sense that this thing uses HTTPS access to DNS, which has few providers now, rather than classic DNS, because UDP from a browser is hard and even arbitrary TCP from a browser is hard.

I think that the use of DNS is a bit of a gimmick on this demo site; it could be more practically used in more interesting ways. A demo site should be and is very straightforward. But the idea of storing semantic-web-style information in the DNS looks interesting to me, and the suggested simplified format may be worth a look.

The service NUM is offering is also interesting, and one of the key parts of it is that it's not storing your data, so you are not beholden to it. It helps make the data accessible, but you store it yourself.


Ok, but, you can easily run your own Postgres server too, right?


Its data won't be cached by any number of caching DNS resolvers. If your DNS server has 90% uptime but your domain is often resolved, clients may not notice anything; if your database has 90% uptime, it's pretty noticeable.

This of course only applies if the TTL of your records is long enough, but usually you don't need a short TTL for TXT records of this kind.


But aren't you specifically using Cloudflare's DoH endpoint to do the query? Wouldn't that mean that there aren't any caches other than Cloudflare's? In which case, why not just put your website behind Cloudflare?


Only this website uses CF DoH. It’s just an example of how data from the protocol can be used.

The data is stored in vanilla DNS so would be cached by any resolver that queries it.


I feel like saying something is "decentralized" is like saying something is "secure": it's a meaningless statement. Nothing is 100% secure and nothing is 100% decentralized. The relevant questions are: from whom? Against what?


I think you could argue DNS over HTTPS is centralised (for now), with Cloudflare, Quad9 and Google making up the main players. But I think the DNS is the original decentralised system, due to the number of players in the domain registrar / DNS service provider / ISP [DNS resolver] space.



