I think this is true on any x86_64 platform for which sizeof(long double) exceeds 8.
Yeah, I think there's a software emulation layer to give you 128 bits on other platforms or somesuch. Again, neither important nor relevant to a discussion about FSIN.
Edit: or are you referring to fsin at the C language level (in which case it's hard to relate it to the x87)?
What exist are various libraries that calculate sine more accurately than the fsin instruction from the x87 instruction set, and for all of them it doesn't actually matter whether they use SSE or SSE2: the better sine algorithms can just as well be implemented with the basic x87 instructions, or with anything else that lacks SSE and SSE2. The effect of SSE and SSE2 not having a sine instruction at all is that if you decide to use only those instruction sets for your x86_64 library, you have to implement everything with the basic instructions you do have; that is, you'd have to use some library code even in the range in which fsin would suffice.
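To make that concrete, here is a minimal toy sketch of the idea (my own illustration, not how any real libm is written): reduce the argument modulo pi/2 with a split constant and evaluate a plain Taylor polynomial, using nothing but adds, multiplies and a rounding function, so no fsin and no SSE/SSE2 is needed. Real libraries use minimax polynomials and a much more careful (Payne-Hanek style) reduction, so this is only trustworthy for modest |x| and is a couple of bits short of full double precision:

    #include <math.h>
    #include <stdio.h>

    /* pi/2 split into a high part and a low correction (Cody-Waite style),
       so that x - n*pi/2 can be computed without losing too many bits. */
    static const double PIO2_HI = 1.5707963267948966;     /* nearest double to pi/2 */
    static const double PIO2_LO = 6.123233995736766e-17;  /* pi/2 - PIO2_HI */
    static const double TWO_OVER_PI = 0.6366197723675814; /* nearest double to 2/pi */

    /* Taylor polynomials, good enough on |r| <= pi/4 for this demonstration. */
    static double sin_poly(double r) {
        double r2 = r * r;
        return r * (1.0 + r2 * (-1.0/6 + r2 * (1.0/120 + r2 * (-1.0/5040
                 + r2 * (1.0/362880 + r2 * (-1.0/39916800 + r2 * (1.0/6227020800.0)))))));
    }

    static double cos_poly(double r) {
        double r2 = r * r;
        return 1.0 + r2 * (-0.5 + r2 * (1.0/24 + r2 * (-1.0/720
                 + r2 * (1.0/40320 + r2 * (-1.0/3628800 + r2 * (1.0/479001600.0))))));
    }

    /* Toy sine: reduce to |r| <= pi/4, pick the quadrant, evaluate a polynomial. */
    double sin_basic(double x) {
        double n = nearbyint(x * TWO_OVER_PI);       /* which multiple of pi/2 */
        double r = (x - n * PIO2_HI) - n * PIO2_LO;  /* reduced argument */
        switch ((long long)n & 3) {                  /* quadrant of x */
            case 0:  return  sin_poly(r);
            case 1:  return  cos_poly(r);
            case 2:  return -sin_poly(r);
            default: return -cos_poly(r);
        }
    }

    int main(void) {
        for (double x = -10.0; x < 10.0; x += 1.7)
            printf("%8.4f  %+.16e  %+.16e\n", x, sin_basic(x), sin(x));
        return 0;
    }

Compile with something like gcc toy_sin.c -lm; the only point is that nothing beyond ordinary arithmetic is required.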
However, if the question was actually "why wasn't the implementation of the x87 fsin instruction updated to simply be better" (which certainly could have been done in microcode), the answer is that AMD apparently tried exactly that with their K5 (1996) and then, for later processors, had to revert to the "worse" behaviour to keep compatibility with existing programs; this is described in the original article or in its comments.
Though, disclaimer: I am not even really an amateur at this level of analysis, so take what I've said with its very own salt lick.
x87 supports 80-bit calculations; the shorter widths are the result of setting the precision control field in the x87 control word.
That's one of its advantages over both SSE and SSE2. There are still some use cases where it's reasonable to use x87.
SSE has single-precision (32-bit) instructions only.
SSE2 has double-precision (64-bit) instructions.
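You can see the three widths from C (a small sketch, assuming a typical x86_64 toolchain where float/double map to the SSE/SSE2 formats and long double maps to the x87 80-bit extended format):

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* Typically prints 24 / 53 / 64 significand bits; sizeof(long double)
           is usually 16 because the 80-bit format is padded in memory. */
        printf("float       %2d mantissa bits  %zu bytes\n", FLT_MANT_DIG,  sizeof(float));
        printf("double      %2d mantissa bits  %zu bytes\n", DBL_MANT_DIG,  sizeof(double));
        printf("long double %2d mantissa bits  %zu bytes\n", LDBL_MANT_DIG, sizeof(long double));
        return 0;
    }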
When stated as "why didn't x87 change to use SSE", the question is like asking "why didn't a dog change to use a cat": both x87 and SSE are instruction sets, defined differently from the start, and given that, one can't "use" the other.
The original question, however, referred to the fsin instruction of the x87 instruction set, but also reflected some confusion, as neither SSE nor SSE2 ever had an instruction to calculate a sine.
And the answer in which you apparently gave a "historic rationale" had incorrect statements, the corrections of which are: 1) x87 didn't "have 32 and 64 bits"; the x87 was ambitiously designed to do 80-bit precise calculations, with the shorter results as additional modes. 2) SSE was 32-bit only, but SSE2 added 64-bit instructions too. Still, x87 could not "use the implementation" of fsin from SSE2, as SSE2 doesn't provide a sine function. Finally, if the question was why "fsin" was never improved, please see the other responses here, including mine.
- either there was no user who seriously used it (to detect the error)
- or the user who seriously used sin and detected the error didn't get past the maintainers' "wall blocking casual contributions" (a major maintainer for a few years was kind of legendary for being very dismissive and hard to communicate with).
- or the detector of the error remained silent.
It seems that Bruce was the first who managed to induce the change in glibc, and he needed to reach Intel first for that.
Linus being wrong in believing the documentation instead of checking it himself:
I'm curious when Java fixed their bad assumptions:
It seems they were much faster: already in 2005 Gosling knew the truth:
"the x87 fsin/fcos use a particular approximation to pi, which effectively means the period of the function is changed, which can lead to large errors outside [-pi/4, pi/4]."
"What we do in the JVM on x86 is moderately obvious: we range check the argument, and if it's outside the range [-pi/4, pi/4]we do the precise range reduction by hand, and then call fsin."
Gosling actually quotes "Joe Darcy, our local Floating Point God."
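If anyone wants to see the effect Gosling is talking about, here is a minimal sketch (assuming GCC-style inline asm on x86/x86_64 and a libm that has already moved away from fsin): feed the double nearest to pi, where fsin's internal ~66-bit approximation of pi dominates the result, to both the raw instruction and the library:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 3.141592653589793;  /* nearest double to pi */
        double hw;

        /* Run the raw x87 instruction; "t" is the top of the x87 register stack. */
        __asm__ ("fsin" : "=t" (hw) : "0" (x));

        /* The library version does a full-precision range reduction first. */
        printf("fsin : %.17e\n", hw);
        printf("sin(): %.17e\n", sin(x));
        return 0;
    }

On hardware with the classic fsin behaviour the two results differ long before the last bit, which is exactly the "large errors outside [-pi/4, pi/4]" from the quote.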
Joseph D. Darcy was also a co-author of:
"How Java's Floating-Point Hurts Everyone Everywhere" with W. Kahan (http://www.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf)
and his master thesis was:
"Borneo: Adding IEEE 754 floating point support to Java" http://sonic.net/~jddarcy/Borneo/
"a dialect of the Java language designed to have true support for the IEEE 754 floating point standard." "Unfortunately, Java's specification creates several problems for numerical computation" ... "useful IEEE 754 features are either explicitly forbidden or omitted from the Java specification."
The actual situation was not that similar: the instruction that was supposed to speed up the calculation and return the correct bits in a specific range actually behaves as intended; the designers of the instruction assumed users would know its limitation, i.e. it was originally assumed that users would know the function is not the "general" one, and the "bug" was just in the misleading documentation. With fdiv you had an instruction that didn't behave as intended, being wrong for some very specific inputs and not only outside of the designed range.
glibc changed their implementation before I reported on this.
Is there a use case for such a numerically unstable use of the function, except as a party trick to get pi?
So it doesn't make sense to interpret this as an imprecision of the input number - the fsin instruction failed to give a result which was as accurate as its documentation promised. The documentation has been fixed now.
I don't know how often this actually matters, but it's worth noting that library implementations of sin() don't use fsin anymore, and I think that is partially because of this flaw.
Not guessing what was 'intended', just treating a number like "1.234567" as "any number that could have rounded to 1.234567000, whatever is most convenient". Does it cause any practical problems to do this? How often do people actually expect/need a double to be treated as having infinite precision, rather than 53 bits or slightly more than 53 bits of precision?
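For what it's worth, the "interval" reading is easy to make concrete; the neighbouring representable doubles bound the set of reals that round to a given value (a trivial sketch):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.234567;
        /* Every real strictly between the midpoints toward these two
           neighbours rounds (to nearest) to the same double x. */
        printf("previous double: %.20g\n", nextafter(x, -INFINITY));
        printf("this double    : %.20g\n", x);
        printf("next double    : %.20g\n", nextafter(x, INFINITY));
        return 0;
    }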
Let's build something better.
Turns out it's just hard to implement continuous math on physical hardware.
I'm curious (and a bit skeptical) about any efforts that think they can do better.
I also recommend his book, The End of Error - Unum Computing:
A transcript of a debate with William Kahan is here: