Thursday, May 20, 2010

Are C++ iostreams really slow?

In the C++ world, one often hears the statement “iostreams are slow, you should use printf instead”. Is this true?

It is possible that this story results from abuse of endl. Beginner's C++ books often recommend endl over '\n' for writing a newline. The observable result is often the same, but endl results in extra overhead: it flushes the stream. This results in a system call, slowing things down while often unnecessary.

Using my newly written TimedSection class, I tested whether the story is actually true, or just a myth. I tried three methods of printing:

  • cout with endl for line endings
  • cout with '\n' for line endings
  • printf (also with '\n' for line endings, so no forced flushing)

Each of these was tested in the following scenarios:

  • printing an empty line
  • printing a line containing a string literal
  • printing a line containing a string variable, a string literal, and an integer variable

All strings used were 20 characters long; I figured this would be a typical average length for printing messages. Newline characters were absorbed into string constants wherever possible.

Each of these was run 10 million times, redirecting the output to /dev/null. The testing machine is an Intel i7 920, running Ubuntu 10.4, Linux 2.6.32 and gcc 4.4.3.

Here are the results:

cout with only endl                     1461.310252 ms
cout with only '\n'                      343.080217 ms
printf with only '\n'                     90.295948 ms
cout with string constant and endl      1892.975381 ms
cout with string constant and '\n'       416.123446 ms
printf with string constant and '\n'     472.073070 ms
cout with some stuff and endl           3496.489748 ms
cout with some stuff and '\n'           2638.272046 ms
printf with some stuff and '\n'         2520.318314 ms

Surprise! Yes, for printing newlines, printf greatly outperforms cout. We can also see the huge slowdown of using endl inappropriately, something which is clearly to be avoided.

However, even for printing a moderately-sized string constant, cout outperforms printf. This effect becomes more pronounced as the string gets longer, probably because printf has to do more processing per character.

Finally, for some slightly more complicated formatting, the difference is quite small. For longer string constants, it becomes even smaller.

We can conclude that iostreams are definitely not always slower than C-style printf. It depends on the particular use, and as the formatting becomes more complicated, the difference drops to zero. Part of the myth may be ascribed to inappropriate use of endl; maybe another part is due to unoptimized implementations of the standard library. Either way: myth busted.

Update: For the curious, here is the full source code. You may need to link with -lrt to get clock_gettime.

#include <stdio.h>
#include <iostream>
#include <ctime>

class TimedSection {
    char const *d_name;
    timespec d_start;
    public:
        TimedSection(char const *name) :
            d_name(name)
        {
            clock_gettime(CLOCK_REALTIME, &d_start);
        }
        ~TimedSection() {
            timespec end;
            clock_gettime(CLOCK_REALTIME, &end);
            double duration = 1e3 * (end.tv_sec - d_start.tv_sec) +
                              1e-6 * (end.tv_nsec - d_start.tv_nsec);
            std::cerr << d_name << '\t' << std::fixed << duration << " ms\n"; 
        }
};

int main() {
    const int iters = 10000000;
    char const *text = "01234567890123456789";
    {
        TimedSection s("cout with only endl");
        for (int i = 0; i < iters; ++i)
            std::cout << std::endl;
    }
    {
        TimedSection s("cout with only '\\n'");
        for (int i = 0; i < iters; ++i)
            std::cout << '\n';
    }
    {
        TimedSection s("printf with only '\\n'");
        for (int i = 0; i < iters; ++i)
            printf("\n");
    }
    {
        TimedSection s("cout with string constant and endl");
        for (int i = 0; i < iters; ++i)
            std::cout << "01234567890123456789" << std::endl;
    }
    {
        TimedSection s("cout with string constant and '\\n'");
        for (int i = 0; i < iters; ++i)
            std::cout << "01234567890123456789\n";
    }
    {
        TimedSection s("printf with string constant and '\\n'");
        for (int i = 0; i < iters; ++i)
            printf("01234567890123456789\n");
    }
    {
        TimedSection s("cout with some stuff and endl");
        for (int i = 0; i < iters; ++i)
            std::cout << text << "01234567890123456789" << i << std::endl;
    }
    {
        TimedSection s("cout with some stuff and '\\n'");
        for (int i = 0; i < iters; ++i)
            std::cout << text << "01234567890123456789" << i << '\n';
    }
    {
        TimedSection s("printf with some stuff and '\\n'");
        for (int i = 0; i < iters; ++i)
            printf("%s01234567890123456789%i\n", text, i);
    }
}

Timing a section of code in C++

Sometimes it's useful to measure how long a certain part of your C++ code takes to execute. Here's a little utility class that makes this extremely easy. It can be used like this:

void foo() {
    TimedSection s("foo()");
    // do some long operation...
}

Calling foo() will print something like this to stderr:

foo()   336.856127 ms

As you may have guessed, a TimedSection records the time between its construction and its destruction. Since C++ has no garbage collection, destructors are called at a deterministic moment: when the local object (called s above) goes out of scope. If you want to time a section that is not a scope by itself, you can introduce a pair of braces to scope the object. If the section is even more complex (e.g. spanning multiple functions), you can use new and delete, but that destroys the elegant simplicity of TimedSection's usage.

The implementation is rather simple as well:

#include <iostream>
#include <ctime>

class TimedSection {
    char const *d_name;
    timespec d_start;
    public:
        TimedSection(char const *name) :
            d_name(name)
        {
            clock_gettime(CLOCK_REALTIME, &d_start);
        }
        ~TimedSection() {
            timespec end;
            clock_gettime(CLOCK_REALTIME, &end);
            double duration = 1e3 * (end.tv_sec - d_start.tv_sec) +
                              1e-6 * (end.tv_nsec - d_start.tv_nsec);
            std::cerr << d_name << '\t' << std::fixed << duration << " ms\n"; 
        }
};
I hereby release this little code snippet into the public domain. Enjoy!