
Why is reading lines from stdin much slower in C++ than Python? - luu
http://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python
======
dded
Back about 15 or more years ago, I wrote all my small utility programs
in C. These were typically small programs, but I often had a lot of (text)
input data in the form of netlists. I started reading about C++, getline(),
and some of the containers (that I had to build from scratch in C), so I
decided that C++ was for me. There were a number of disappointments, but a big
one was that C++'s getline() was _more than an order of magnitude_ slower than
fgets() (on my system, etc.).

With some experimenting, I discovered that even Perl was much faster than C++
with getline(). (Note that this was input of a file, not stdin as in this
article.)

I've not used getline() since.

~~~
nly
getline() works for (single) character delimited reads and lets you use a
std::string. The latter is reason enough to use it: it means you don't have to
worry about manual dynamic memory allocation and have zero risk of overflows.

In any case, the C++ equivalent to fgets is sgetn on the underlying streambuf
or filebuf -
[http://en.cppreference.com/w/cpp/io/basic_streambuf/sgetn](http://en.cppreference.com/w/cpp/io/basic_streambuf/sgetn).
I'd use this before I used fgets, if for no other reason than that I then don't
have to remember to call fclose().

The fact that you haven't used one of the most trivial functions in the
standard library because of a slight performance penalty 15 years ago is
worrying.

~~~
eliasmacpherson
What are you worried about? An order of magnitude is not a slight performance
penalty. The C++ standard library is vast, and we've had C++98, C++03, and
C++11 in those 15 or so years. There are few enough parts of it that you'll
revisit regularly. You can't go around expecting people to be experts in the
minutiae of all of it.

~~~
ufo
If both the C++ and the Perl code had the same performance I'd stick with the
Perl version.

------
Jach
Fun fact: the solution of using `cin.sync_with_stdio(false);` introduces a
fairly unimportant memory leak that you'll see when you use Valgrind. The
behavior was reported as a bug
([http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27931](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27931))
but it's actually part of the C++ standard not to clean up standard streams.
(Edit: Here's the page of the standard in question, see the end of the second
paragraph of 27.3: [http://imgur.com/P7wYcHn](http://imgur.com/P7wYcHn))

~~~
saurik
A "leak" normally implies something that will get worse: like, as you use this
stream, the stream will slowly lose memory or something. The standard is
simply saying that when the program exits the stream objects themselves--and
therefore the buffers they use--will not be destroyed: but of course closing
the program will destroy them... the developer who closed the bug even put
"leak" in quotes as this is more an artifact of valgrind's definition of
"leak" than even an "unimportant" issue in the behavior. (I want to clear this
up, as I can imagine people reading what you are saying and then avoiding this
feature as they are concerned that their extremely-long running program will
eventually exhaust memory.)

~~~
Jach
You're right, of course. My original comment did put quotes around 'leak', but
I figured someone would come along and argue a leak is a leak even if it's not
going to crash your program. Oh well. :)

~~~
saurik
This isn't a "leak": a leak is a leak, and this isn't one... this is just
something Valgrind reports because, from the perspective of a tool that can
only work from a dynamically maintained list of heap allocations and the event
"program has now terminated", it exhibits a pattern indistinguishable from
something that actually was leaked. Saying this is a leak is somewhat
equivalent to claiming that because the executable code itself wasn't
deallocated by the program before it exited, the code was "leaked": it just so
happens that, as an implementation detail, this data structure is lazily
dynamically allocated, but it should be considered a static part of the
program. In a world where things Valgrind reported were "leaks" as opposed to
simply "outstanding heap allocations", I'd be happier claiming this was a bug
in Valgrind than claiming that this feature of the language standard should
have the word "leak" used to describe it ;P.

------
nly
tl;dr: C++'s standard streams (cin, cout, cerr, clog) may be used alongside
code using the underlying libc I/O streams API and therefore, by default,
synchronize with libc's own buffers. At the very least, this means you don't
end up with output from the two sources interleaved character-by-character
between individual stream operations.

It's worth noting that even C's I/O APIs are typically synchronized across
threads, so you can output lines to stdout from vanilla threaded C code
without experiencing the same issue.

~~~
ef47d35620c1
I think you can unlink it:

        std::ios_base::sync_with_stdio(false);

~~~
drivers99
That's what they did to fix it.

------
dkhenry
This is an old discussion, but I actually have this thread bookmarked so I can
show young programmers the dangers of assuming that if you write in C++ it
will be faster than everything else out there.

~~~
Negitivefrags
Seems like an odd conclusion to reach given that the end result was C++ being
faster.

Nobody ever claimed that C++ didn't require more knowledge to use effectively.

~~~
dkhenry
Actually C was the fastest, by a fairly wide margin, but that doesn't matter
much, since you can make the C++ faster and you can make C go just as slow as
the input-buffered C++. There are still lots of people who think that speed is
solely a function of programming language, with very little impact from
program design. Like I said, I like to bring this out when showing newer
developers that no matter what language you pick, you need to be aware of what
is going on in the background, and most importantly you should test your code
before assuming which parts are fast and which are slow.

~~~
nly
Yes indeed, and many don't realize that C streams are also buffered and can
also be tuned:

[http://www.gnu.org/software/libc/manual/html_node/Buffering-...](http://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html)

[http://www.gnu.org/software/libc/manual/html_node/Controllin...](http://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html)

there are also threading issues:

[http://www.gnu.org/software/libc/manual/html_node/Streams-an...](http://www.gnu.org/software/libc/manual/html_node/Streams-and-Threads.html)

glibc even lets you implement custom streams (FILE* handles that will work
with fgets etc. but call your own source and sink to do I/O).

------
memracom
tl;dr: when performance is important, don't just use the defaults; optimize.
And learn how libraries and your OS work deep down at low levels. Even
Python's default I/O performance can be improved in many cases, by changing
buffer sizes or even by bypassing the file I/O subsystem and using
memory-mapped files. But no solution is right for all use cases.

Like they tell you in school, premature optimization is the root of all evil.
So don't worry about this until you need it.

------
NAFV_P
Thought I'd throw a point(er) alongside these comments...

Isn't the Python interpreter written in C?

~~~
Shish2k
Straightforward C is slow; straightforward Python is medium speed; expertly
written C (the foundations of Python) is fast.

~~~
dded
> Straightforward C is slow;

I've never found it to be so, and it's not a common complaint.

------
gdy
tl;dr It is not.

------
Keyframe
One of the first questions I asked myself years ago in C was what's the
difference between open, read, write and fopen, fread, fwrite. Buffering.

