> extract a small token from the beginning of a large string, and then skip past it to do the same thing again after that small token: essentially, build a lexical analyzer in which the pattern matcher consists of calls to scanf.
Scanning floats with sscanf means undefined behavior on a datum like 1.0E999 against a %lf (double), where the exponent is out of range. IEEE 754 double goes to E308, if I recall.
C99 says "Unless assignment suppression was indicated by a[n asterisk], the
result of the conversion is placed in the object pointed to by the first argument following
the format argument that has not already received a conversion result. If this object
does not have an appropriate type, or if the result of the conversion cannot be represented
in the object, the behavior is undefined."
I'm seeing that on glibc, 1E999 scans to an Inf, but that is not a requirement.
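To make the observation concrete, here is a minimal test. glibc happens to produce an infinity; since the conversion is formally undefined per the C99 wording above, another implementation could do anything.

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    double d = 0.0;

    /* 999 is beyond IEEE 754 double's exponent range, so per C99
       this conversion is formally undefined behavior. */
    if (sscanf("1.0E999", "%lf", &d) == 1)
        printf("scanned: %g (isinf: %d)\n", d, isinf(d));
    return 0;
}
```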
This problem could be solved with sscanf, too, while maintaining the degenerate quadratic behavior. sscanf patterns could be used to destructure a floating-point number textually into pieces like the whole part, fractional part, and exponent after the E, which could then be validated separately: for instance, checking that the E part is between -308 and 308, and so forth. Someone could still do this in a loop over a string with millions of floats. I'd say, though, that by then you're well into territory that calls for a real lexical analyzer, not multiple passes through scanf.
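A rough sketch of that destructuring (the helper name and format strings are mine, and the range check is illustrative rather than exhaustive):

```c
#include <stdio.h>

/* Sketch only: a hypothetical helper that picks a float token apart
   textually and bounds the exponent before letting %lf see it, so the
   out-of-range case C99 leaves undefined never arises. */
static int scan_double_checked(const char *tok, double *out)
{
    char mant[64];
    long e = 0;

    /* %63[-+.0-9] grabs the mantissa text; %*1[eE] skips the E;
       %ld reads the exponent when one is present. */
    switch (sscanf(tok, "%63[-+.0-9]%*1[eE]%ld", mant, &e)) {
    case 2:
        /* IEEE 754 double runs out near E308; a production version
           would also need to bound the mantissa near that edge. */
        if (e < -308 || e > 308)
            return 0;
        /* fall through */
    case 1: /* no exponent part at all */
        return sscanf(tok, "%lf", out) == 1;
    default:
        return 0;
    }
}

int main(void)
{
    double d;
    printf("1.5E10  -> %d\n", scan_double_checked("1.5E10", &d));
    printf("1.0E999 -> %d\n", scan_double_checked("1.0E999", &d));
    return 0;
}
```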
What about parsing an unknown number of at-most-4-digit integers from a string (being careful, as you exemplified)? Isn't that a reasonable use of sscanf?
Do I need a full-fledged parser for that? No. Does it have to traverse the whole string every time? No, because I'm being very clear about the maximum length of the thing I'm trying to parse.
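A minimal sketch of that pattern: the width caps each token, and %n reports how far the match advanced so the next call starts past it. One caveat in the spirit of the thread: some sscanf implementations take the length of the whole remaining string on every call, so the width bounds the conversion but not necessarily the traversal.

```c
#include <stdio.h>

int main(void)
{
    const char *p = "12 345 6789 4 2";
    int v, used;

    /* %4d reads at most four digits; %n stores how many characters
       this call consumed, so we can step past them. */
    while (sscanf(p, "%4d%n", &v, &used) == 1) {
        printf("got %d\n", v);
        p += used;
    }
    return 0;
}
```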
This interface (fmemopen's "buf" plus size pair) smells like it might have the flaw that if someone wants "buf" to come from a null-terminated string, they must measure that string in order to supply the size parameter.
I.e. the underlying mechanism is binary, and does not stop at null characters.
There needs to be a separate mechanism with a different underlying stream implementation.
The fmemopen extension points to a solution to the original problem, though a non-portable one: open the entire buffer of numbers as a stream in one fmemopen call, and then use fscanf to march through it.
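A sketch of that approach (fmemopen was a glibc extension before POSIX.1-2008 standardized it, hence the feature-test macro; note the strlen call needed to supply the size, which is exactly the measuring complaint above):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *nums = "1.5 2.25 3.125 4.0";
    /* One fmemopen call wraps the whole buffer; the stream keeps its
       own position, so fscanf marches forward instead of rescanning
       from the start the way repeated sscanf calls would. */
    FILE *f = fmemopen((void *) nums, strlen(nums), "r");
    double d;

    if (!f)
        return 1;
    while (fscanf(f, "%lf", &d) == 1)
        printf("%g\n", d);
    fclose(f);
    return 0;
}
```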