Hacker News | Ne02ptzero's comments

I wonder as well. Most of the climbing gyms in France's big cities are missing (Arkose, ClimbUp, etc.), and the School Room is not even on the map :(


I used the Google Maps API to fetch all the gyms, then cached them locally. It only allows (at least from what I can find) text search within a viewport. I think there are some problems when loading bouldering gyms in France. At first glance, I thought France was not really into bouldering...


That first glance was a very wrong glance, considering that Fontainebleau is widely regarded as the bouldering Mecca.


> At first glance, I thought France was not really into bouldering...

You'd be very wrong! There are at least 40 bouldering gyms in Paris (and its suburbs) alone.


You should consider feeding your data into OSM! Surely there's already a suitable metadata tag for olfactics?


> You can guarantee that you and someone else are listening to the same thing even across an ocean.

How can you guarantee that? NTP can't even guarantee that all clocks are synced inside a datacenter, let alone across an ocean. (I haven't read the code yet.)

EDIT: The wording got me. "Guarantee" & "Perfect" in the post title, and "Millisecond-accurate synchronization" in the README. Cool project!


Moreover, the speed of light puts a hard cap on how simultaneous you can be. Wolfram Alpha reckons New York to London is 19ms in a vacuum, more using fibre.

Going off on a tangent: back in the days of Live Aid, they tried doing a transatlantic duet. Turns out it's literally physically impossible, because if A sings when they hear B, then B hears A at least 38ms too late, which is too much delay for musicians to cope with and still play together.


It's a less hard problem than the duet. If the round trip is 38ms, you can estimate that the one-way latency is 19ms. You tell the other client to play the audio now, and you schedule it for 19ms in the future.

That's assuming standard OS and hardware and drivers can manage latency with that degree of precision, which I have serious doubts about.

In a duet, your partner needs to hear you now and you need to hear them now. With pre-recorded audio, you can buffer into the future.
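
A rough sketch of that scheduling idea (the function name, units, and margin are illustrative, not the project's actual code):

  #include <stdint.h>

  /* Estimate the one-way latency as half the measured round trip, then have
     both peers start playback at an agreed wall-clock time slightly in the
     future. The margin absorbs clock error and audio-stack buffering. */
  static uint64_t schedule_play_time_ms(uint64_t now_ms, uint64_t rtt_ms,
                                        uint64_t safety_margin_ms)
  {
      uint64_t one_way_ms = rtt_ms / 2;   /* e.g. 38ms RTT -> ~19ms one way */
      return now_ms + one_way_ms + safety_margin_ms;
  }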


You’re right that it’s an easier problem, but it’s still trickier than it looks. Remember the point of this is to be listening together. To do that, you need to be able to communicate your reactions. And then you’re back to the 38ms (in practice it’s probably twice that). Either way, at 120bpm that’s over a bar!

If you _don’t_ have real time communication, then you don’t really need to solve this problem. But the problem is fundamentally unsolvable because the speed of light (in a vacuum) is the speed of causality and, as I say, puts a hard cap on simultaneity. This tends to be regarded as obvious at interstellar distances but it affects us at transatlantic distances too.


You're basically right, but one 4-beat bar @120bpm is 2000ms.

Also, the latency demands of conversation are not nearly as tight as those of musical performance; see ubiquitous video conferencing.


My brain ain't working, and yeah, I don't tend to notice transatlantic delays on voice and video calls.


> More, the speed of light puts a hard cap on how simultaneous you can be.

Special relativity does indeed have something to say about simultaneity.

> Wolfram Alpha reckons New York to London is 19ms in a vacuum, more using fibre.

And this is not, in any respect, a limit on simultaneity. If the endpoints are moving very, very quickly relative to each other, then there are complications. Otherwise, you measure that 19ms or so and deal with it.


Haha yeah guarantee is a strong word. I just mean that it’s good enough to not be noticeable (even within the same physical room)


Out of curiosity, are there any documents on this, even for something other than AWS's S3? I find the idea very interesting.


Nothing specific that I'm aware of. Off the top of my head, and sticking to things that are pretty safe NDA territory, load-balancer algorithms typically do things round-robin style, or by least current connections, or speed of response, etc. They don't know anything about the underlying servers or the requests, just what they've sent in which direction. If you have multiple load balancers sitting in front of your fleet, they often don't know anything about what each other is doing either.

With an object storage service like S3, no two GET or PUT requests an LB serves are really the same, or have the same impact. They use different amounts of bandwidth, pull data at different speeds and latencies, and require different amounts of CPU for handling or checksumming, etc. It wasn't unusual to find API servers that were bored stiff while others were working hard, all with approximately the same number of requests going to them.

Smartphones used to be a nightmare, especially given how many would be on poor signal quality and/or connecting internationally. Millions of live connections just sitting there, slowly GETting or PUTting requests, using up precious connection resources on web servers but not much CPU time.
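
To make the "they only know what they've sent in which direction" point concrete, a least-connections pick is roughly this (an illustrative sketch, not any real load balancer's code):

  #include <stddef.h>

  struct backend {
      const char *addr;
      size_t open_conns;    /* connections *this* balancer currently has open */
  };

  /* Pick the backend with the fewest open connections. It knows nothing about
     how expensive each request is, or what any other balancer is doing. */
  static struct backend *pick_least_connections(struct backend *pool, size_t n)
  {
      struct backend *best = &pool[0];
      for (size_t i = 1; i < n; i++) {
          if (pool[i].open_conns < best->open_conns)
              best = &pool[i];
      }
      best->open_conns++;   /* account for the connection we're about to hand it */
      return best;
  }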


> Executing cargo bench on Limbo’s main directory, we can compare SQLite running SELECT * FROM users LIMIT 1 (620ns on my Macbook Air M2), with Limbo executing the same query (506ns), which is 20% faster.

Faster on a single query, returning a single result, on a single computer. That's not how database performance should be measured or compared.

In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway


The goal here is not to claim that it is faster, though (it isn't, in a lot of other things it is slower and if you run cargo bench you will see)

It is to highlight that we already reached a good level of performance this early in the project.

Your claim about the programming language having no impact is just false, though. It's exactly what people said back in 2015 when we released Scylla. It was already false then, it is even more false now.

The main reason is that storage is so incredibly fast today, the CPU architecture (of which the language is a part) does make a lot of difference.


Yo glommer, I am ... very surprised to see any benchmark beat the micro-tuned sqlite so early, congrats. Where do you think rust is picking up the extra 100ns or so from a full table scan?


> It is to highlight that we already reached a good level of performance this early in the project.

This is the right thing to do. It's a pity so many projects don't keep an eye on performance from the very first day. Getting a high-performing product is a process, not a single task you apply at the end. Especially in a performance-critical system like a database, if you don't pay attention to performance and instead delay optimizing till the end, you'll eventually need a major rewrite.


Thanks. I am sad, but not that surprised, that a lot of people here are interpreting this as us claiming that we're already faster than SQLite across the board.

I don't even care about being faster than SQLite; just not being slower, this early, is already the win I'm looking for.


Not sure which other comments you're seeing, but my original comment wasn't intended that way.


> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway

That was true maybe 30 years ago, with spinning disks and 100 Mbit Ethernet. Currently, with storage easily approaching speeds of 10 GB/s and networks at 25+ Gbit/s, it is quite hard to saturate local I/O in a database system. Like, you need not just a fast language (C, C++, Rust), but you also need to be very smart about how you write code.


I can’t trust benchmarks on code that isn’t feature complete. The missing functionality is never free.


> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway

Maybe! However, Rust makes some things easier. It is also easier to maintain and iterate on.


Segments are provided by community members, and you can adjust/edit a segment yourself if you find it to be badly timed.


Side topic: Didn't we have an is_const_eval equivalent in C? I seem to remember a dark-magic macro doing precisely that on the LKML somewhere, but I can't find it.


You might be able to do something truly arcane using GCC extensions, and maybe the __attribute__((error)) thing, but realistically it either won't compile or you won't notice unless it's too slow.

The compiler constant folds fairly effectively anyway.
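
The usual arcane version is something like this (a sketch; the function name is made up, it's GCC-specific, and it only fires with optimization enabled, since the error only triggers if the call survives constant folding):

  extern void not_a_constant(void)
      __attribute__((error("argument is not a compile-time constant")));

  #define ASSERT_CONSTANT(x) \
      do { if (!__builtin_constant_p(x)) not_a_constant(); } while (0)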


There is __builtin_constant_p (a GCC extension): https://gcc.gnu.org/onlinedocs/gcc-11.3.0/gcc/Other-Builtins...

It's not exactly the same thing, but it's probably what you were thinking of.
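
A tiny example of what it reports (results can vary with optimization level, since the compiler is allowed to prove more things constant at -O2):

  #include <stdio.h>

  #define REPORT(x) \
      printf(#x " is %sa compile-time constant\n", \
             __builtin_constant_p(x) ? "" : "not ")

  int main(void)
  {
      int n = 42;
      REPORT(42);            /* constant */
      REPORT(sizeof(int));   /* constant */
      REPORT(n);             /* usually not, unless the optimizer proves it */
      return 0;
  }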


This is what qemu uses:

  #define QEMU_BUILD_BUG_ON(x) \
    typedef char qemu_build_bug_on[(x)?-1:1] __attribute__((unused));
So you can write stuff like:

  QEMU_BUILD_BUG_ON(sizeof (struct foo) != 128);
(for example if the struct is used for some network protocol and so it must be 128 bytes long).


We've switched to _Static_assert() now that we can assume all our compilers support it:

   #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)

   #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)
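
Usage stays the same as before; for example (struct foo and the message are made up for illustration):

   QEMU_BUILD_BUG_ON(sizeof(struct foo) != 128);
   QEMU_BUILD_BUG_MSG(sizeof(struct foo) != 128, "struct foo must stay 128 bytes");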


I assume you're asking about a check for constness, like __is_constexpr() in Linux kernel.

Starting with C11, it's possible to implement it without any non-standard extensions: https://stackoverflow.com/a/49480926
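
For reference, the kernel's __is_constexpr is essentially the same trick as in that answer:

  /* If x is an integer constant expression, (void *)((long)(x) * 0l) is a null
     pointer constant, so the ?: result has type int * and sizeof(*...) equals
     sizeof(int). Otherwise the ?: result has type void *, sizeof(*...) is 1
     under GNU C, and the comparison is false. */
  #define __is_constexpr(x) \
      (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))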


There is no compile-time evaluation of functions and statements in C. You have constants, which can be expressions, but that's about it. Oh, and macros, of course.

Compile-time evaluation (constexpr) was introduced in C++11 and expanded in subsequent versions of the language standard.


Also, `const` values in C are not really evaluated at compile time: by default they have external linkage, so they are not really "constant"; they are more like "readonly", or `let` in Rust. They are always loaded from memory, and that's why #define is still king in C.

Doing

  const int X = 33;
  
  // ...
  
  void something() {
      int arr[X];
  }
works in both C and C++, but with a catch: in C++, X is a compile-time constant (and has internal linkage), while in C this only works since C99, because arr is actually a VLA.
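
The difference bites as soon as you need a real integer constant expression, e.g. (illustrative):

  const int X = 33;

  void f(int n)
  {
      int arr[X];   /* C: a VLA, sized at runtime; C++: an ordinary array */
      switch (n) {
      case X:       /* fine in C++; an error in C, because a case label needs
                       an integer constant expression, and a const int isn't
                       one in C */
          break;
      }
      (void)arr;
  }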


Although many compilers allow

  const int X = 33;

  enum { Y = X };


That's a GNU extension that everyone relies on so much that it's basically a de facto standard. You get a warning from Clang (or an error from GCC) if you ask for a more "standard" interpretation of the source:

    $ clang -o cs cs.c -Wall -std=c11 -pedantic
    cs.c:6:6: warning: expression is not an integer constant expression; folding it to a constant is a GNU extension [-Wgnu-folding-constant]
        A = X,
    $ gcc -o cs cs.c -Wall -std=c11
    cs.c:6:9: error: enumerator value for ‘A’ is not an integer constant
        6 |         A = X,
          |         ^
In general, that's illegal ISO C and should always be rejected, but as you see that's not usually the case.
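
For reference, a minimal cs.c reproducing those diagnostics looks roughly like this (reconstructed, not the exact file):

  const int X = 33;

  enum {
      A = X,   /* needs the GNU constant-folding extension */
  };

  int main(void) { return A; }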



What do you want to use it for? C++ uses it for compile-time programming together with templates. Essentially, C++ is evolving into a language where you write a program that gets interpreted at compile time to produce some code that then gets compiled. But I am not sure this is a good idea. It adds a lot of complexity, places many responsibilities that belong in a compiler onto library writers, increases compilation time a lot, and - with all the template expansion - causes a lot of bloat that hurts performance. It also makes debugging interesting. It looks really good in microbenchmarks, because you can create optimal code for special cases easily, but I just do not see how this translates into real-world performance.


If it's not in the language it will be in separate preprocessors, which will all be strictly worse.


> We ensure the CRDT is synced with at least two nodes in different geographical areas before returning an OK status to a write operation [...] we ensure durability of written data but without having to pay the synchronization penalty of Raft.

This is, in essence, strongly consistent replication, in the sense that you wait for a majority of replicas to acknowledge the write before answering the request. So you're still paying the latency cost of a round trip to at least one other node on each write. How is this any better than a Raft cluster with the same behavior (N/2+1 write consistency)?


Raft consensus apparently needs more round-trips than that (maybe two round-trips to another node per write?), as evidenced by this benchmark we made against Minio:

https://garagehq.deuxfleurs.fr/documentation/design/benchmar...

Yes, we do round trips to other nodes, but we do far fewer of them to ensure the same level of consistency.

This is to be expected from a distributed-systems-theory perspective, as consensus (or total ordering) is a much harder problem than what we are doing.

We haven't (yet) gone into dissecting the Raft protocol or Minio's implementation to figure out exactly why it is much slower, but the benchmark I linked above is already strong enough evidence for us.


I think it would be great if you could make a GitHub repo that is just about summarising the performance characteristics and round-trip counts of different storage systems.

You would invite Minio/Ceph/SeaweedFS/etc. authors to make pull requests in there to get their numbers right and explanations added.

This way, you could learn a lot about how other systems work, and users would have an easier time choosing the right system for their problems.

Currently, one only gets detailed comparisons from HN discussions, which arguably aren't a great place for reference and easily get outdated.


Raft needs a lot more round trips than that: it needs to send a message about the transaction, the nodes need to confirm that they received it, and then the leader needs to send back that it was committed (no response required). This is largely implementation-specific (etcd does more round trips than that, IIRC), but that's the bare minimum.
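
Spelled out, the minimum commit path per write looks something like this (simplified; real implementations batch and piggyback heavily):

  1. leader    -> followers : AppendEntries(new entry)
  2. followers -> leader    : success acks; the leader commits once a majority has acked
  3. leader    -> followers : advanced commit index, usually piggybacked on the
                              next AppendEntries/heartbeat (no response needed)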


You know it's a big outage when other people are writing status updates for you


Too bad Facebook can't write status updates on Facebook. They have a Twitter account, though.


> For the monthly expenses, with most of the components running on-premises, there was a 90.61% cost reduction, going from US$ 38,421.25 monthly to US$ 3,608.99, including the AWS services cost.

I might be missing something, but $3k/month (let alone $38k/month) sounds absolutely insane to me for how few metrics they're collecting (4.5k metrics per second, 2.7 TB of data per year). Is the money going to network bandwidth or something along those lines?


AWS, for example, charges for cross-AZ data transfers. Naive setups with multiple AZs (us-west-1a, 1b, etc.) and a centralized Prometheus setup will rack up quite the cost.


Reducing cross-AZ data transfer on one service resulted in a low-six-figure yearly savings. It's something we overlooked during the initial setup, and now it's something I check for when dealing with AWS networking.


Out of curiosity, what's your plan for recovery during an AZ or regional outage?


If RDS or other "hard to replicate quickly during a disaster" infra is being run, I personally would still have cross-AZ replication at a minimum; to reduce network costs, I would configure the "other zone" as a backup replica only, not for performance clustering.

With automation, we can spin up full new compute stacks, including load balancers and DNS, in about 5-10 minutes per "unique" environment configuration.

While it means we could never have a zero-downtime failover, we're okay with that, and we have more than halved our network costs (which admittedly were only about number 8 on our AWS bill by cost).


If AWS has a regional outage, this service is so far down the list of services to recover/restore that it will probably be overlooked. We accept the increased risk for the cost savings, since it still meets the reliability requirements of the service.


They say they ingest 226 GB per month. The cross-AZ transfer cost is $0.01 per GB. So that should come out to only $2.26/month for them.


They are also storing the data in S3 (ingest bandwidth, storage). Plus running an RDS instance (instance cost, x2 if replicated, bandwidth again possibly x2 or more, IO cost) as well as local storage (EBS size/IO cost). And the size of the raw metrics might not be the wire size, especially if it’s uncompressed JSON. My guess on reviewing the post is that a lot of their savings were inter-AZ bandwidth conservation. But it’s hard to say without poking around in their AWS console :)


I agree. I'm working with a metrics system that ingests just over 1 million metrics a second, and it has a similar run rate of ~$38k a month.


NASA actually made a pretty informative page about it[1], with some simulations of what sound on Mars would be like compared to Earth. Hopefully we won't need the simulation much longer!

[1] https://mars.nasa.gov/mars2020/participate/sounds/

