
> We wrote extensive pros and cons, emphasizing how each option fared by the criteria above: Collaboration, Abstraction, Migration, Learning, and Modding.

Would you really expect Godot to win out over Unity given those priorities? Godot is pretty awesome these days, but it's still going to be behind for those priorities vs. Unity or Unreal.



It is laziness to an extent, sure, but that's a huge part of language design. We wouldn't use Java or C# or Python or any of these high level languages if we weren't lazy; after all, we'd be writing assembly like the silicon gods intended!

The problem with Java checked exceptions is they don't work well with interfaces, refactoring, or layering.

For interfaces you end up with stupid stuff like ByteArrayInputStream#reset claiming to throw an IOException, which it obviously never will. And then for refactoring & layering, it's typical that you want to handle errors either close to where they occurred or far from where they occurred, but checked exceptions force all the middle stack frames that don't have an opinion to also be marked. It's verbose and false-positives a lot (in that you write a function, hit compile, then go "ah, forgot to add <blah> to the list that gets forwarded along..." -> repeat).

It'd be better if it were the inverse, if anything: exceptions are assumed to propagate until a function is explicitly marked as an exception boundary.


When I say lazy, I mean the essential work of modeling what's going on and making a decision which can only be made by a human. In this respect, choosing what exceptions-types to throw is like choosing what regular-types to return. If I return a GraphNode instead of a DatFile, then I should probably throw a GraphNodeException instead of a DatFileChecksumException.

Syntactic sugar should make it easier to capture the decision after it's been made. For example, like replacing "throws InnerException" (perhaps a leaky abstraction) with something like "throws MyException around InnerException".


Yes but you only make those types of decisions on library boundaries, which is a relatively small amount of code. Meanwhile checked exceptions make all of the code harder to deal with in non-trivial ways (eg, the ubiquitous "Runnable" cannot throw a checked exception). And it's that everywhere-else where "laziness" won and checked exceptions died.

> Exceptions (try/catch) became widespread much later

Exceptions, complete with try-catch-finally, were developed in the 60s & 70s, and languages such as Lisp and COBOL adopted them.

So I'm not sure what you're calling "much later" as they fully predate C89, which is about as far back as most people consider when talking about programming languages.


Nuclear is politically dead, maybe, but it's absolutely the technology we need right now and should be using a hell of a lot more than we are. Solar and wind are great, but they need massive grid storage systems that we don't really have great options for. Nuclear is the consistent, safe power that everyone should be using as their core power solution, with solar & wind augmenting it.

It cannot be the technology of "right now" because it takes way too long to build. The UK has had a plant under construction for about a decade.

Sure it can; just look at France, which got over 70% of its electricity from nuclear in 2018.

Now sure, the best time to have invested heavily in nuclear reactors was 30-40 years ago, but we still could today. The payoff won't be for 5-10 years, but wind & sun are never going to have continuous availability until we figure out space solar, after all.


That tells you they were the tech of when they were built. Current French nuclear power shows nuclear could've been the tech of the tomorrows of decades past, not today.

There are two different ways for wind and sun to be available continuously:

While I like the idea of a global* power grid, and have in fact done the maths showing it would work just fine without silly costs or ridiculously long periods of global aluminium production, geopolitical realities prevent it.

Storage is the other. The storage requirements for electrified personal transport are so large (several dozen kWh even for a small family car, and much more for professional vehicles) that ordinary household electricity use can be covered with the spare capacity.

* Regional grids also help reduce the influence and severity of Dunkelflauten, but the models I've seen for dealing with these cost-effectively are "overbuild capacity by a few hundred percent because it's cheap, then add a few days' worth of batteries because they're relatively pricey", so I count that as primarily option #2, storage.


New surveys just came out showing record-breaking support for nuclear. 61% approval! It's even seeing potential revivals in Germany. There's strong bipartisan support for nuclear in the US. I don't think calling it politically dead is right these days.

> It's even seeing potential revivals in Germany.

No, it isn't. The CSU ran with nuclear revival as one of their campaign promises and there seems to be some support for it among the population, but it's nowhere to be found in the coalition agreement between CDU/CSU and SPD. Söder (the leader of the CSU) has already backpedaled as well.

Really the only possibility for a nuclear revival in Germany would be a coalition between the climate-change-downplaying CDU/CSU and the straight-up-climate-change-denying AfD and I doubt that would be good for Germany's fight against climate change.


A 10kW system produces somewhere between 11,000 and 17,000 kWh/year, give or take. Qatar has one of the lowest electricity prices in the world at $0.03/kWh.

$0.03 * 11,000kWh/year * 87 years = $28,710.

So either you're vastly underestimating the amount you pay in electricity, or you're using vastly less electricity in which case you obviously wouldn't get a 10kW system.


> 40x faster trigonometry: Speed-up standard library functions like std::sin in just 3 lines of code.

Huh, ok, let's see how...

    *  By limiting the expansion to a few
    *  terms, we can approximate `sin(x)` as:
    *
    *      sin(x) ≈ x - (x^3)/3! + (x^5)/5!
    *
    *  This reduces the computational cost but comes with reduced accuracy.
I see. "reduced accuracy" is an understatement. It's just horrifically wrong for inputs outside the range of [-2, 2]

https://www.wolframalpha.com/input?i=plot+sin+x%2C+x+-+%28x%...

It cannot handle even a single interval of a sine wave, much less the repeating nature? What an absolutely useless "optimization"
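
For anyone who wants to see it concretely, here's a quick standalone check (my own snippet, not code from the repo) comparing the 3-term series against std::sin:

    #include <cmath>
    #include <cstdio>

    // The 3-term Maclaurin series quoted above.
    static double sin_maclaurin(double x) {
        return x - (x * x * x) / 6.0 + (x * x * x * x * x) / 120.0;
    }

    int main() {
        // Fine near zero, noticeably off at pi/2, badly wrong by pi, absurd by 2*pi.
        for (double x : {0.5, 1.5708, 3.1416, 6.2832}) {
            std::printf("x=%.4f  approx=%+9.4f  std::sin=%+.4f\n",
                        x, sin_maclaurin(x), std::sin(x));
        }
    }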


You can always get more accuracy by expanding those 3 lines to handle more of the Taylor components… but it’s important to remember that this is still educational material.

You can find more complete examples in my SimSIMD (https://github.com/ashvardanian/SimSIMD), but they also often assume that at a certain part of a kernel, a floating point number is guaranteed to be in a certain range. This can greatly simplify the implementation for kernels like Atan2. For general-purpose inputs, go to SLEEF (https://sleef.org). Just remember that every large, complicated optimization starts with a small example.


Educational material that misinforms its readers isn't educational, and it's insanely counterproductive.

People have already ragged on you for doing Taylor approximation, and I'm not the best expert on the numerical analysis of implementing transcendental functions, so I won't pursue that further. But there are still several other unaddressed errors in your trigonometric code:

* If your function is going to omit range reduction, say so upfront. Saying "use me to get a 40× speedup because I omit part of the specification" is misleading to users, especially because you should assume that most users are not knowledgeable about floating-point and thus they aren't even going to understand they're missing something without you explicitly telling them!

* You're doing polynomial evaluation via `x * a + (x * x) * b + (x * x * x) * c`, which is not the common way of doing it, and it's also a slow way of doing it. If you're trying to be educational, do it via the Horner form `((x * c + b) * x + a) * x` (see the sketch after this list)--that's how it's done, that's how it should be done.

* Also, doing `x / 6.0` is a disaster for performance, because fdiv is one of the slowest operations you can do. Why not do `x * (1.0 / 6.0)` instead?

* Doing really, really dumb floating-point code and then relying on -ffast-math to make the compiler unpick your dumbness is... a bad way of doing stuff. Especially since you're recommending people go for it for the easy speedup and saying absolutely nothing about where it can go catastrophically wrong. As Simon Byrne said, "Friends don't let friends use -ffast-math" (and the title of my explainer on floating-point will invariably be "Floating Point, or How I Learned to Start Worrying and Hate -ffast-math").
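
To make the Horner point concrete, here's a hedged sketch (my own illustration, with a/b/c standing in for whatever coefficients the series actually uses):

    // Naive form: every power of x is computed separately.
    inline double poly_naive(double x, double a, double b, double c) {
        return x * a + (x * x) * b + (x * x * x) * c;
    }

    // Horner form: ((x*c + b)*x + a)*x == a*x + b*x^2 + c*x^3,
    // one multiply and one add per coefficient.
    inline double poly_horner(double x, double a, double b, double c) {
        return ((x * c + b) * x + a) * x;
    }

    // And for the division point: (1.0 / 6.0) folds to a constant at compile
    // time, so this costs a multiply instead of an fdiv.
    inline double sixth(double x) {
        return x * (1.0 / 6.0);
    }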


-ffast-math has about 15 separate compiler flags that go into it, and on any given piece of code, about 3-5 of them are disastrous for your numerical accuracy (but which 3-5 changes by application). If you do the other 10, you get most of the benefit without the inaccuracy. -ffast-math is especially dumb because it encourages people to go for all of them or nothing.

I'd say `x * a + (x * x) * b + (x * x * x) * c` is likely faster (subject to the compiler being reasonable) than `((((x * c) * x + b) * x) + a) * x` because it has a shorter longest instruction dependency chain. Add/Mul have higher throughput than latency, so the latency chain dominates performance and a few extra instructions will just get hidden away by instruction level parallelism.

Also x/6 vs x*(1/6) is not as bad as it used to be; fdiv keeps getting faster. On my Zen 2 it's 10 cycles latency and 0.33/cycle throughput for (vector) div, and 3 cycles latency and 2/cycle throughput for (vector) add. So about 1/3 the speed, worse if you have a lot of divs and the pipeline fills up. Going back to the Pentium, the difference is ~10x and you don't get to hide it with instruction parallelism.

* The first expression has a chain of 4 instructions, each of which cannot start before the previous one finishes, `(((x * x) * x) * c) + the rest`, vs the entire expression being such a chain in the second version. Using fma instructions changes this a bit, making all the adds in both expressions 'free', but this changes precision and needs -ffast-math or such, which I agree is dangerous and generally ill advised.


Someone is still putting tremendous effort into this project, so I reckon it would be worthwhile to submit this obviously well-thought-through criticism as a PR to the repo!

For the range reduction, I've always been a fan of using revolutions rather than radians as the angle measure, as you can just extract the fractional bits to range reduce. Note that this comes at the cost of a more complicated calculus (derivatives pick up factors of 2π).
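
A hedged sketch of what that looks like (illustrative only; a real implementation would evaluate a polynomial directly in turns instead of calling std::sin):

    #include <cmath>

    // Angle measured in revolutions (1.0 == a full turn). Range reduction is
    // just dropping the integer part; no division or remainder by 2*pi needed.
    inline double sin_turns(double t) {
        t -= std::floor(t);                      // fractional part, now in [0, 1)
        return std::sin(t * 6.283185307179586);  // convert once, only for this demo
    }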

I can't for the life of me find the Sony presentation, but the fastest polynomial calculation is somewhere between Horner's method (which has a huge dependency tree in terms of pipelining) and full polynomial evaluation (which has redundancy in calculation).
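
Something like Estrin's scheme is that middle ground; a rough sketch for a cubic (my example, not from the presentation):

    // p(x) = c0 + c1*x + c2*x^2 + c3*x^3, split so the two halves can be
    // computed in parallel and then combined with x^2.
    inline double poly_estrin(double x, double c0, double c1, double c2, double c3) {
        double x2 = x * x;
        return (c0 + c1 * x) + (c2 + c3 * x) * x2;
    }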

Totally with you on not relying on fast math! Not that I had much choice when I was working on games because that decision was made higher up!


I don't know who the target audience is supposed to be, but who would be the type of person who tries to implement performance critical numerical codes but doesn't know the implications of Taylor expanding the sine function?

People who found that sin is the performance bottleneck in their code and are trying to find a way to speed it up.

One of the big problems with floating-point code in general is that users are largely ignorant of floating-point issues. Even something as basic as "0.1 + 0.2 != 0.3" shouldn't be that surprising to a programmer if you spend about five minutes explaining it, but the evidence is clear that it is a shocking surprise to a very large fraction of programmers. And that's the most basic floating-point issue, the one you're virtually guaranteed to stumble across if you do anything with floating-point; there's so many more issues that you're not going to think about until you uncover them for the first time (e.g., different hardware gives different results).
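
(The classic demo, if you've never run it yourself:)

    #include <cstdio>

    int main() {
        double a = 0.1 + 0.2;
        std::printf("%.17g\n", a);      // prints 0.30000000000000004
        std::printf("%d\n", a == 0.3);  // prints 0
    }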


Thanks to a lot of effort by countless legions of people, we're largely past the years when it was common for different hardware to give different results. It's pretty much just contraction, FTZ/DAZ, funsafe/ffast-math, and NaN propagation. Anyone interested in practical reproducibility really only has to consider the first two among the basic parts of the language, and they're relatively straightforward to manage.
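
Contraction is the easiest of those to demonstrate; a tiny example (mine, not from any particular codebase):

    #include <cmath>
    #include <cstdio>

    int main() {
        // a*b is exactly 1 - 2^-54, which rounds to 1.0 in double precision.
        double a = 1.0 + 0x1p-27, b = 1.0 - 0x1p-27, c = -1.0;
        std::printf("%a\n", a * b + c);         // 0x0p+0, or -0x1p-54 if the
                                                // compiler contracts this to an fma
        std::printf("%a\n", std::fma(a, b, c)); // always -0x1p-54
    }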

Divergent math library implementations are the other main category, and for many practical cases, you might have to worry about the parallelization factor changing things. For completeness' sake, I might as well add in approximate functions, but if you're using an approximate inverse square root instruction, well, you should probably expect that to differ on different hardware.

On the plus side, x87 excess precision is largely a thing of the past, and we've seen some major pushes towards getting rid of FTZ/DAZ (I think we're at the point where even the offload architectures are mandating denormal support?). Assuming Intel figures out how to fully get rid of denormal penalties on its hardware, we're probably a decade or so out from making -ffast-math no longer imply denormal flushing, yay. (Also, we're seeing a lot of progress on high-speed implementations of correctly-rounded libm functions, so I also expect to see standard libraries require correctly-rounded implementations as well).


The definition I use for determinism is "same inputs and same order = same results", down to the compiler level. All modern compilers on all modern platforms that I've tested take steps to ensure that for everything except transcendental and special functions (where it'd be an unreasonable guarantee).

I'm somewhat less interested in correctness of the results, so long as they're consistent. rlibm and related are definitely neat, but I'm not optimistic they'll become mainstream.


There are lots of cases where you can get away with moderate accuracy. Rotating a lot of batched sprites would be one of them; you could easily get away with a mediocre Taylor series approximation, even though it's leaving free accuracy on the table compared to minimax.

But not having _range reduction_ is a bigger problem; I can't see many uses for a sin() approximation that's only good for half a wave. And as others have said, if you need range reduction for the approximation to work in its intended use case, that needs to be included in the benchmark, because you're going to be paying that cost relative to `std::sin()`.


> tries to implement performance critical numerical codes but doesn't know the implications of Taylor expanding the sine function?

That would be me, I’m afraid. I know little about Taylor series, but I’m pretty sure it’s less than ideal for the use case.

Here’s a better way to implement faster trigonometry functions in C++ https://github.com/Const-me/AvxMath/blob/master/AvxMath/AvxM... That particular source file implements that for 4-wide FP64 AVX vectors.


I am genuinely quite surprised that the sine approximation is the eyeball catcher in that entire repo.

It will only take a 5-line PR to add Horner's method and Chebyshev polynomials, and probably around 20 lines of explanation, and everyone passionate about the topic is welcome to add them.

There are more than enough examples in the libraries mentioned above ;)


It's eye catching because it's advertised as a 40x speedup without any caveats.

Oh, I've missed that part. It's hard to fit the README's bullet points on a single line, and I've probably removed too many relevant words.

I'll update the README statement in a second, and already patched the sources to explicitly focus on the [-π/2, π/2] range.

Thanks!


I'd suggest simply adding `assert(-M_PI/2 <= x && x <= M_PI/2)` to your function. It won't slow down the code in optimized builds, and makes it obvious that it isn't designed to work outside that range even if people copy/paste the code without reading it or any comments.

Also, it would be good to have even in a "production" use of a function like this, in case something outside that range reaches it by accident.
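
For reference, roughly what that could look like (illustrative names, not the repo's actual code):

    #include <cassert>
    #include <cmath>

    // 3-term approximation with the suggested guard; the assert compiles away
    // under NDEBUG, so optimized builds pay nothing for it.
    inline double sin_approx(double x) {
        assert(-M_PI / 2 <= x && x <= M_PI / 2);
        return x - (x * x * x) / 6.0 + (x * x * x * x * x) / 120.0;
    }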


Yes, that’s a very productive suggestion! Feel free to open a PR, or I’ll just patch it myself in a couple of hours when I’m back to the computer. Thanks!

No. Do not use Taylor series approximations in your real code. They are slow and inaccurate. You can do much, much better with some basic numerical analysis. Chebyshev and Remez approximations will give you more bang for your buck every time.

> but it’s important to remember that this is still educational material.

Then it should be educating on the applicability and limitations of things like this instead of just saying "reduced accuracy" and hoping the reader notices the massive issues? Kinda like the ffast-math section does.


This is kind of a dumb objection. If your sine function has good accuracy in [-pi/2, pi/2], you can compute all other values by shifting the argument and/or multiplying the result by -1.

But then you have to include this in the benchmark and it will no longer be 40x faster.

There are a bunch of real situations where you can assume the input will be in a small range. And while reducing from [-pi;pi] or [-2*pi;2*pi] or whatever is gonna slow it down somewhat, I'm pretty sure it wouldn't be too significant, compared to the FP arith. (and branching on inputs outside of even that expanded target expected range is a fine strategy realistically)

Most real math libraries will do this with only a quarter of the period, accounting for both sine and cosine in the same numerical approximation. You can then do range reduction into the region [0, pi/2) and run your approximation, flipping the X or Y axis as appropriate for either sine or cosine. This can be done branchlessly and in a SIMD-friendly way, and is far better than using a higher-order approximation to cover a larger region.
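
A hedged scalar sketch of that idea (my own illustration; std::sin/std::cos stand in for the small [-pi/4, pi/4] polynomials, and it assumes the quadrant index fits in an int):

    #include <cmath>

    static double poly_sin(double r) { return std::sin(r); } // placeholder polynomials
    static double poly_cos(double r) { return std::cos(r); }

    double sin_reduced(double x) {
        const double half_pi = 1.57079632679489661923;
        double q = std::nearbyint(x * (1.0 / half_pi));  // nearest quadrant index
        double r = x - q * half_pi;                       // r in [-pi/4, pi/4]
        int iq = static_cast<int>(q) & 3;
        double s = (iq & 1) ? poly_cos(r) : poly_sin(r);  // swap sin/cos per quadrant
        return (iq & 2) ? -s : s;                         // flip sign per half-period
    }

The ternaries can compile down to selects, and the same structure maps onto SIMD masks; truly huge arguments would additionally need an extended-precision reduction step.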

> branching on inputs outside of the target expected range is a fine strategy realistically

branches at this scale are actually significant, and so will drastically impact being able to achieve 40x faster as claimed


That's only if they're unpredictable; sure, perhaps on some workload it'll be unpredictable whether the input to sin/cos is greater than 2*pi, but I'm pretty sure on most it'll be nearly-always a "no". Perhaps not an optimization to take in general, but if you've got a workload where you're fine with 0.5% error, you can also spend a couple seconds thinking about what range of inputs to handle in the fast path. (hence "target expected range" - unexpected inputs getting unexpected branches isn't gonna slow things down if you've calibrated your expectations correctly; edited my comment slightly to make it clearer that that's about being out of the expanded range, not just [-pi/2,pi/2])

Assuming an even distribution over a single iteration of sin, that is [0,pi], this will have a ~30% misprediction rate. That's not rare.

I'm of course not suggesting branching in cases where you expect a 30% misprediction rate. You'd do branchless reduction from [-2*pi;2*pi] or whatever you expect to be frequent, and branch on inputs with magnitude greater than 2*pi if you want to be extra sure you don't get wrong results if usage changes.

Again, we're in a situation where we know we can tolerate a 0.5% error, we can spare a bit of time to think about what range needs to be handled fast or supported at all.


Those reductions need to be part of the function being benchmarked, though. Even assuming a range limitation of [-pi,pi] would be reasonable; there are certainly cases where you don't need multiple revolutions around a circle. But this can't even do that, so it's simply not a substitute for sin, and claiming 40x faster is a sham.

Right; the range reduction from [-pi;pi] would be like 5 instrs ("x -= copysign(pi/2 & (abs(x)>pi/2), x)" or so), ~2 cycles throughput-wise or so, I think; that's slightly more significant than I was imagining, hmm.

It's indeed not a substitute for sin in general, but it could be in some use-cases, and for those it could really be 40x faster (say, cases where you're already externally doing range reduction because it's necessary for some other reason (in general you don't want your angles infinitely accumulating scale)).
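
For what it's worth, here's one concrete way to do that half-period fold (my sketch, not necessarily what was meant above), using sin(x) == sin(copysign(pi, x) - x) for |x| > pi/2:

    #include <cmath>

    // Fold [-pi, pi] into [-pi/2, pi/2] via the reflection sin(x) = sin(pi - x)
    // (and its mirror for negative x).
    inline double fold_half_period(double x) {
        const double pi = 3.14159265358979323846;
        double folded = std::copysign(pi, x) - x;
        return std::fabs(x) > pi / 2 ? folded : x;  // compilers can lower this to a select
    }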


At least do not name the function "sin". A former game dev works at my company and he uses similar tricks all the time. It makes the code so hard to read, and unless you are computing "sin" a lot, the speedup is not measurable.

At pi/2 that approximation gets you 1.0045, i.e. half a percent off, so it's not particularly good at that. (still could be sufficient for some uses though; but not the best achievable even with that performance)

Good argument reduction routines are not exactly easy for a novice to write, so I think this is a valid objection.

And it's not even using the Horner scheme for evaluating the polynomial.

It's not useless if it's good enough for the problem at hand.

Kaze Emanuar has two entire videos dedicated to optimizing sin() on the Nintendo 64 and he's using approximations like this without issues in his homebrew:

  - https://www.youtube.com/watch?v=xFKFoGiGlXQ

  - https://www.youtube.com/watch?v=hffgNRfL1XY

I came to these videos expecting someone else pushing Taylor series, but this video series was surprisingly good in terms of talking about hacking their way through some numerical analysis. These videos started with Taylor series, but did not end there. They came up with some hacky but much better polynomial approximations, which are respectable. Production math libraries also use polynomial approximations. They just don't use Taylor series approximations.

My small angle sin function is even faster: sin(x) = x

Oof this is bad. If you're going to ask people to approximate, use a Chebyshev approximation please. You will do sin(x) faster than this and more accurately.

> and I am thankful that Safari does not support it because good grief

Safari absolutely will support HDR images if it doesn't already. It might not support this PNG hack, but it's inevitable that it'll support HDR HEVC or JPEG images since those are what's produced by iOS and Android cameras respectively, and they obviously aren't going to just ignore them.


iPhones have supported HDR photos for over a decade, since at least the iPhone 5S; for whatever reason, they've ignored them for at least that long.

I think you're confusing which HDR is being used. HDR photography, the multi-exposure combination, was probably iPhone 5S, but that still results in an SDR image. That's a different thing entirely.

iPhones have not captured HDR images until much, much more recently. No earlier than the iPhone 12 (when they first could capture HDR video), although they keep fiddling with which format they use for the result. iOS 17 was when they added support for displaying these images in UIKit & Swift, which was only like 2 years ago, give or take. WWDC '23 was similarly when they started talking about handling HDR images. And at WWDC 2024 they announced they'll be adopting ISO 21496-1, the gain-map-style approach that Google & Adobe adopted with UltraHDR in 2023.


Thanks for the clarification.

There are many factors in play: where your SDR white point sits, how your OS handles HDR video, what the content contains, and finally what your brain is doing.

HDR10(+) & Dolby Vision, for example, encode content at absolute luminance, so they are basically trash formats since that's an insane thing to expect (the spec for authoring content in this format literally just goes "lol idk, do X if you think it's going to be seen in a movie theater or Y for TV, and hope"). Sadly, they are also quite common. Mobile phones (both Android & iOS) are instead pushing HLG, which is better. Although then, hilariously, macOS's handling of HLG was atrocious until the latest update, which fixed it, but only if the video contains a magic flag that iPhone sets, which isn't standard so nobody else sets it (the "avme" tag https://developer.apple.com/documentation/technotes/tn3145-h... )

There's then also just how your eyes & brain react. When HDR shows up and suddenly the white background of a page looks like a dim gray? That's 100% a perceptual illusion. The actual light being emitted didn't change, just your perception of it did. This is a very hard problem to deal with, and it's one that so far the HDR industry as a whole has basically just ignored. But it's why there's a push to artificially limit the HDR range in mixed conditions, eg https://github.com/w3c/csswg-drafts/issues/9074


You clearly know a lot about this, but I think there could be a misunderstanding. Not trying to offend, but when I see the YouTube link mentioned above in the other comment, my MacBook screen literally goes darker AROUND the video, which gets brighter. I am not making this up. I think it's how Chrome on MacBooks handles raw HDR encoding.

Can someone else confirm I am not mad?

PS - I am not trying to shut you down; you clearly know a lot in this space. I am just explaining what I'm experiencing on this hardware.


> my macbook screen literally goes darker AROUND the video , which gets brighter. I am not making this up

This is almost certainly your eyes playing tricks on you, actually. Set up that situation where you know if you scroll down or whatever it'll happen, but before triggering it cover up the area where the HDR will be with something solid - like a piece of cardboard or whatever. Then do it. You'll likely not notice anything change, or if there is a shift it'll be very minor. Yet as soon as you remove that thing physically covering the area, bam, it'll look gray.

It's a more intense version of the simultaneous contrast illusions: https://en.wikipedia.org/wiki/Contrast_effect & https://en.wikipedia.org/wiki/Checker_shadow_illusion

Eyes be weird.


The screen is literally getting darker so the HDR video will appear to have more contrast.

https://prolost.com/blog/edr


No, it literally isn't. It's literally doing the opposite: it increases the display brightness in order to show the HDR content. The SDR content is dimmed in proportion to the increase such that SDR has the same emitted brightness before & after the change.

SDR brightness is not reduced to "add contrast". The blog post doesn't seem to say that anywhere, either, but if it does it's simply wrong. As a general note it seems wrong about a lot of aspects, like saying that Apple does this on non-HDR displays. They don't. It then also conflates EDR with whether or not HDR is used. EDR is simply the representation of content between apps & the compositor. It's a working space, not entirely unlike scRGB, where 0.0-1.0 is simply the SDR range and values can go beyond that. But anything beyond the maximum reported EDR range, which can be as low as 1.0, is simply clipped. So they are not "simulating" HDR on a non-HDR display.


I agree with what you said, but I was trying to give the layman summary ;)

> The SDR content is dimmed proportional to the increase such that SDR has the same emitted brightness before & after the change.

That's the intent, but because things aren't perfect it actually tends to get darker instead of staying perceptually the same. It depends on which panel you're using. MBPs are prone to this, XDR displays aren't.


> I agree with what you said, but I was trying to give the layman summary ;)

Your layman summary is wrong, though. The summary is that brightness stays the same, whereas you said it gets darker.

> MBPs are prone to this, XDR displays aren't.

On my M1 16" MBP it doesn't have any issue. The transition is slow, but the end result is reasonably aligned to before the transition. But yes MBP displays are not Apple's best. Sadly that remains something exclusive to the iPad


> Screens can't often do full brightness on the whole screen so if you come across a video or image that is supposed to have a higher contrast ratio, the system will darken everything and then brighten up the pixels that are supposed to be brighter.

There's no system that does that. The only thing that's kind of similar is that, at the display level, there's a concept known as the "window size", since many displays cannot show peak brightness across the entire display. If you've ever seen brightness talked about in the context of a "5%" or "10%" window size, this is what it means - the brightness the display can do when only 5% of the display is max-white and the rest is black.

But outside of fullscreen this doesn't tend to be much of an issue in practice, and it depends on the display.


> There's no system that does that.

You mean the darkening of everything else to highlight bright HDR areas? All recent Macs do this, including the one I'm typing on right now. It's a little disconcerting the first time it happens, but the effect is actually great!


Apple doesn't darken SDR to amplify HDR. They keep SDR at the same brightness it was before the HDR showed up. It appears as if SDR gets dimmer because of your perception of contrast, but it isn't actually (to within a small margin of error).

