
Yes, wplace could solve their whole need with a single, custom-built static PMTiles file. There's no need to serve 150 GB of OSM data for their use case.

They are actually static files. There are just too many of them, about 300 million. You cannot put that many files in Pages.

Is CloudFlare’s R2 an option for you?

I've just posted my question to the nginx community forum, after a lengthy debugging session with multiple LLMs. Right now, I believe it was the combination of multi_accept and an open_file_cache max set higher than worker_rlimit_nofile.

https://community.nginx.org/t/too-many-open-files-at-1000-re...

Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.


I'm pretty sure your open file cache is way too large. If you're doing 1k req/sec and you cache file descriptors for 60 minutes, assuming those are all unique, that's asking for about 3.6 million FDs to be cached when you've only got 1 million available. I've never used nginx or open_file_cache [1], but I would tune it way down and see if you even notice a difference in performance in normal operation. Maybe 10k files, 60s timeout.
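
A quick back-of-the-envelope check of that math, as a Python sketch; the rate, cache lifetime, and limit are just the numbers from this thread, not measured values:

    # Descriptor-budget check, using the numbers from this thread (assumed).
    requests_per_second = 1_000        # sustained request rate
    cache_inactive_seconds = 60 * 60   # descriptors cached for ~60 minutes
    worker_rlimit_nofile = 1_000_000   # the "1 million available" limit

    # Worst case: every request hits a distinct file, so each one can leave
    # a cached descriptor behind until the inactive timer expires.
    fds_demanded = requests_per_second * cache_inactive_seconds

    print(f"descriptors the cache may try to hold: {fds_demanded:,}")
    print(f"descriptor limit:                      {worker_rlimit_nofile:,}")
    print("over budget" if fds_demanded > worker_rlimit_nofile else "within budget")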

> Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.

For cost reasons or system overload?

If system overload: what kind of storage? Are you monitoring disk I/O? What kind of CPU do you have in your system? I used to push almost 10 Gbps with https on dual E5-2690s [2], but it was a larger file. 2690s were high end, but something more modern will have much better AES acceleration and should do better than 200 Mbps almost regardless of what it is.

[1] To be honest, I'm not sure I understand the intent of open_file_cache... opening files is usually not that expensive; maybe it matters at hundreds of thousands of rps, or if you have a very complex filesystem. PS: don't put tens of thousands of files in a single directory. Everything works better if you take your ten thousand files and put one hundred files into each of one hundred directories. You can experiment to see what works best with your load, but a tree of N layers of M directories where the last layer holds M files is a good plan, with 64 <= M <= 256. The goal is keeping the directories compact so searching and editing them stays cheap.
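
As an illustration only (the root path, file name, and hash-based layout below are invented for the example, not how OpenFreeMap stores its tiles), a minimal sketch of that kind of fan-out:

    # Two-level, 256-way directory fan-out (M = 256 in the terms above).
    # A hash of the file name picks the directories, keeping each one small.
    import hashlib
    from pathlib import Path

    def sharded_path(root: Path, name: str) -> Path:
        digest = hashlib.sha1(name.encode()).digest()
        level1 = f"{digest[0]:02x}"   # 256 first-level directories
        level2 = f"{digest[1]:02x}"   # 256 directories inside each of those
        return root / level1 / level2 / name

    print(sharded_path(Path("/var/tiles"), "12_2048_1360.pbf"))
    # prints something like /var/tiles/ab/cd/12_2048_1360.pbf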

[2] https://www.intel.com/content/www/us/en/products/sku/64596/i...


If you do 200 Mbps on a Hetzner server after Cloudflare caching, you are going to run out of traffic pretty rapidly. The limit is 20 TB/month, which you'd reach in roughly 9 days.
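
For reference, the rough arithmetic behind that figure (decimal units, a sketch only):

    # 200 Mbps sustained against a 20 TB monthly allowance (decimal units).
    mbps = 200
    quota_tb = 20

    tb_per_day = mbps / 8 * 1e6 * 86_400 / 1e12   # bits -> bytes -> TB per day
    print(f"{tb_per_day:.2f} TB/day -> allowance gone in {quota_tb / tb_per_day:.1f} days")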

One would think services like these wouldn't have to rely on hosted providers and would run their own rack of servers. Or is that so alien these days?

Small addition: that limit applies to Hetzner Cloud servers; their dedicated servers have unlimited traffic.

Depends on your connection, I think. Mine do 1 Gbit/s but have a 20 TB limit. The 100 Mbit ones are unlimited (last I checked).

One thing that might work for you is to actually create the empty tile file once and hard-link it everywhere it needs to be. Then you don't need to special-case it at runtime; you handle it at generation time instead.
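
A hypothetical sketch of that generation-time step (the paths, file names, and the idea of writing the shared tile first are assumptions for the example, not OpenFreeMap's actual pipeline):

    # Write the shared empty tile once, then hard-link it into every position
    # that should serve an empty tile. Hard links share one inode, so the data
    # is stored only once. Paths and names are placeholders.
    import os
    from pathlib import Path

    root = Path("/var/tiles")
    empty = root / "empty.pbf"
    empty.parent.mkdir(parents=True, exist_ok=True)
    empty.write_bytes(b"")  # or the real minimal empty-tile payload

    def link_empty(dest: Path) -> None:
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            os.link(empty, dest)  # no extra data on disk, just another name

    # Called for every tile coordinate whose generated tile came out empty:
    link_empty(root / "12" / "2048" / "1360.pbf")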

NVMe disks are incredibly fast and 1k rps is not a lot (IIRC my N100 seems to be capable of ~40k, were it not for the 1 Gbit NIC bottlenecking). I'd try benchmarking without the tuning options you've got. For example, do you actually get 40k concurrent connections from Cloudflare? If you keep the connections to your upstream alive (so no constant slow starts), then ideally you have numCores workers, each doing one thing at a time, and that's enough to max out your NIC. You only add concurrency if latency prevents you from maxing out bandwidth.


Yes, that's a good idea. But we are talking about 90+% of the tiles being empty (I might be wrong on that), so that's a lot of hard links. I think the nginx config just needs to be fixed; I hope I'll receive some help on their forum.

You could also try turning off the file descriptor cache. Keep in mind that NVMe SSDs can do ~30-50k random reads/second with no concurrency, or at least hundreds of thousands with concurrency, so even if every request hit the disk 10 times it should be fine. There's also kernel caching, which I think covers some of what you'd get from nginx's metadata cache?
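
A quick sanity check of that read budget (assumed numbers; the 10-reads-per-request figure is just the pessimistic guess above):

    # Disk-read budget at 1k rps if every request caused 10 random reads,
    # compared against the low end of the quoted NVMe figures.
    rps = 1_000
    reads_per_request = 10
    nvme_random_reads_per_sec = 30_000

    demand = rps * reads_per_request
    print(f"reads needed: {demand:,}/s  vs  drive capability: {nvme_random_reads_per_sec:,}/s")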

> so I couldn't have kept up _much_ longer, no matter the limits.

Why would that kind of rate cause a problem over time?


We are talking about an insane amount of data here. It was 56 Gbit/s (or 56 x 1 Gbit servers, 100% saturated!). This is not something a "caching server" could handle. You need something on the order of a CDN network, like Cloudflare, to be able to handle this.

> We are talking about an insane amount of data here. It was 56 Gbit/s. This is not something a "caching server" could handle.

You are not talking about an insane amount of data if it's 56 Gbit/s. Of course a caching server could handle that.

Source: Has written servers that saturated 40gig (with TLS) on an old quadcore.


OK, technically such servers might exist; I guess Netflix and friends are using them. But we are talking about a community-supported, free service here. Hetzner servers are my only option, because of their unmetered bandwidth.

It really depends on the size of the active set. If it fits into RAM of whatever server you are using, then it's not a problem at all, even with completely off-the-shelf hardware and software. Slap two 40gig NICs in it, install Varnish or whatever and you're good to go. (This is, of course, assuming that you have someone willing to pay for the bandwidth out to your users!)

If you need to go to disk to serve large parts of it, it's a different beast. But then again, Netflix was doing 800gig already three years ago (in large part from disk) and they are handicapping themselves by choosing an OS where they need to do significant amounts of the scaling work themselves.


I'm sure the server hardware is not the problem. The full dataset is 150 GB and the server has 64 GB of RAM; most of the dataset will never be requested, so I'm sure the frequently used tiles actually get served from the OS cache. If not, the data is on a locally attached RAID 0 NVMe SSD.

What I've been referring to is the fact that even unlimited 1 Gbps connections can be quite expensive; now try to find a 2x40 gig connection for reasonable money. That one user generated 200 TB in 24 hours! I have no idea about bandwidth pricing, but I bet it ain't cheap to serve that.


Well, “bandwidth is expensive” is a true claim, but it's also a very different claim from “a [normal] caching server couldn't handle 56 Gbit/sec”…?

You are correct. I was putting "a caching server on their side" in the context of their side being a single-dev hobby project running on a VPS, exploding over the weekend. I agree that these servers do exist and that some companies pay for this kind of bandwidth as part of their normal operations.

56 Gbit/sec costs you about €590/day even on Hetzner.

I realize that what constitutes "insane" is a subjective judgement. But, uh... I most certainly would call 56 Gbps insane. Which is not to say that hardware which handles it doesn't exist. It might not even be especially insane hardware. But that is a pretty insane data rate in my book.

I'd be somewhat surprised if nginx couldn't saturate a 10 Gbit link on an N150 serving static files, so I'd expect 6x $200 mini PCs to handle it. I'd think the expensive part would be the hosting/connection.

> or 56 x 1 Gbit servers 100% saturated

Presumably a caching server would be 10GbE, 40GbE, or 100GbE

56 Gbit/s of pre-generated data is definitely something you can handle from 1 or 2 decent servers, assuming each request doesn't generate a huge number of random disk reads or something.


Yes, I designed the whole path structure / location blocks with caching in mind. Here is the generated nginx.conf, if you are interested:

https://github.com/hyperknot/openfreemap/blob/main/docs/asse...


What if one user really wants to browse around the world and explore the map? I remember spending half an hour in Google Earth desktop, just exploring interesting places.

I think referer-based limits are better; this way I can ask heavy users to please choose self-hosting instead of the public instance.


Feel free to migrate. If you ever worry about High Availability, self-hosting is always an option. But I'm working hard on making the public instance as reliable as possible.

I got 700+ tokens/sec on o3 after the announcement; I suspect it's very much a quantized version.

https://x.com/hyperknot/status/1932476190608036243


Or maybe they just brought much faster, much cheaper hardware online.


Or they are using a speedy add-on decoder.


Do you also have numbers on intelligence before and after?


Is that input tokens or output tokens/s?


Update: news sites have pulled the video now, confirming parts of it were AI-generated.

https://deadline.com/2025/05/nbc-viral-chinese-paraglider-vi...


This is actually correct; I don't know why it is downvoted. All current models have their identity "burned in", so they don't need a system prompt for that.

