Atomic Log Stream for C++

Introduction

Logging is one of the most frequently needed tools. There are plenty of ready-to-use libraries, but these do not prevent programmers from inventing their own utilities. For this reason the subject of a "log class" comes up on forums and newsgroups every so often, feeding discussions on how to approach the API design.

Common expectations include the following two points:

The API should feel like streams to relieve users from string formatting issues.
The log itself should be thread-safe to allow deployment in a multi-threaded environment.

These two expectations introduce a natural challenge: how can an interface work with data units that are smaller than their logical grouping?

The following code example illustrates the problem:

log << a << b << c << " and " << x() << y() << z();

// or longer:

log << a;
log << b;
log << c;
log << " and ";
log << x();
log << y();
log << z();

If the data types involved have natural implementations of the stream insertion operators, then the above two versions are equivalent according to the conventions provided by IOStreams. In both cases, there are 7 formatting operations and some number of physical output operations. In single-threaded code, either version can be used to output a single logical record of information.

The problem introduced by multi-threading is twofold:

The individual formatting operations from separate threads do not have to follow one another and can be intermixed, or even lead to stream corruption.
The physical output operations can be mixed, leading to data corruption because logical record boundaries are not respected. Depending on the implementation, the output channel can be physically corrupted as well.

How can record boundaries be preserved with a stream-like interface in a multi-threaded environment?

There are various ideas to solve this problem.

One possible approach is to ensure exclusive access to the log object for the time when a single logical record is being formatted and output. This solution, even if correct, is error-prone, because it relies on the user to actually acquire and release the lock. Another problem is scalability: as long as a single thread locks the whole log object, no other thread can proceed with its own logging, even though the formatting part could be parallelized.
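As a minimal sketch of this explicit-locking approach, assume a hypothetical locked_log struct that pairs an output stream with a mutex; neither the struct nor the report function comes from an existing library. Correctness depends entirely on the caller taking the lock, and the formatting runs while the lock is held.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>

// Hypothetical shared log: an output stream and the mutex that guards it.
struct locked_log
{
    std::ostream & out;
    std::mutex mtx;
};

// The caller has to lock for the whole record; nothing enforces it.
inline void report(locked_log & shared, int id, double load)
{
    std::lock_guard<std::mutex> guard(shared.mtx); // forget this line and records may interleave
    shared.out << "worker " << id;                 // formatting happens under the lock,
    shared.out << " load " << load << '\n';        // so other threads wait even for this part
}

// Usage: one record, written while holding the lock.
inline void locked_log_demo()
{
    std::ostringstream sink;
    locked_log shared{sink};
    report(shared, 3, 0.5);
    assert(sink.str() == "worker 3 load 0.5\n");
}
```

Both insertion statements in report belong to one record only because they share the same guard; the compiler cannot check that every logging site follows this rule.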

In order to parallelize the formatting of records in separate threads, the whole logging operation needs to be divided into two distinct phases:

Record formatting, possibly in a private buffer.
Physical write to the output channel when the record is completed.
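The two phases can be sketched as a single function. The name write_record and its parameters are assumptions made for illustration; the point is only that the lock covers the physical write and not the formatting.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: formats privately, then writes under a short lock.
inline void write_record(std::ostream & out, std::mutex & out_mutex,
                         int id, std::string const & message)
{
    // Phase 1: format into a buffer owned by this call; no lock is needed.
    std::ostringstream record;
    record << "worker " << id << ": " << message << '\n';

    // Phase 2: hold the lock only while the finished record is written.
    std::lock_guard<std::mutex> guard(out_mutex);
    out << record.str();
}

// Usage: the sink receives the record in one piece.
inline void write_record_demo()
{
    std::ostringstream sink;
    std::mutex sink_mutex;
    write_record(sink, sink_mutex, 7, "done");
    assert(sink.str() == "worker 7: done\n");
}
```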

This approach provides much more opportunity for concurrent formatting in separate threads, but still leaves open the following question: how to discover that the record is complete or, in other words, how to discover the record boundary?

One popular approach is to introduce a special end-of-record mark. This can rely either on some arbitrarily chosen character value (end of line is often used, although it is an obvious obstacle with multi-line records) or on some specially crafted stream manipulator, similar in concept to the standard flush operation. This solution is also correct, but it is somewhat annoying and error-prone, because it relies on the user to remember to put this manipulator at the end of the logging statement.
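A minimal sketch of the manipulator variant follows. The names marked_log, end_of_record_t and endrec are hypothetical, and the per-thread buffer is a simplification: it is shared by all marked_log objects used from the same thread. The usage function shows what happens when the marker is forgotten.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical tag type and manipulator object marking the end of a record.
struct end_of_record_t {};
constexpr end_of_record_t endrec{};

class marked_log
{
public:
    explicit marked_log(std::ostream & out) : out_(out) {}

    template <typename T>
    marked_log & operator<<(T const & value)
    {
        buffer() << value; // format into the calling thread's buffer
        return *this;
    }

    // The marker triggers the physical write of the accumulated record.
    marked_log & operator<<(end_of_record_t)
    {
        std::ostringstream & buf = buffer();
        {
            std::lock_guard<std::mutex> guard(mtx_);
            out_ << buf.str() << '\n';
        }
        buf.str(std::string()); // start the next record empty
        return *this;
    }

private:
    static std::ostringstream & buffer()
    {
        thread_local std::ostringstream buf; // one buffer per thread
        return buf;
    }

    std::ostream & out_;
    std::mutex mtx_;
};

inline void marked_log_demo()
{
    std::ostringstream sink;
    marked_log out(sink);
    out << "a=" << 1 << endrec;
    out << "b=" << 2;            // marker forgotten: nothing is written yet
    assert(sink.str() == "a=1\n");
    out << " c=" << 3 << endrec; // the stale data leaks into the next record
    assert(sink.str() == "a=1\nb=2 c=3\n");
}
```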

An alternative solution is to rely on the C++ expression to define record boundaries. In other words, the logical record can be defined as the longest sequence of chained calls to the stream insertion operator, so that the following expression:

log << a << b << c << " and " << x() << y() << z();

defines a logging unit that is atomic in the resulting log output.

The following code presents an implementation skeleton that achieves this effect:

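#include <iostream>
#include <mutex>
#include <sstream>

class record_formatter
{
public:

    template <typename T>
    record_formatter & operator<<(T const & value)
    {
        // format into the private buffer of this record
        buffer_ << value;
        return *this;
    }

    ~record_formatter()
    {
        // lock the output stream (std::mutex is one possible choice);
        // the guard unlocks it again when the destructor returns
        std::lock_guard<std::mutex> guard(output_mutex());

        std::cout << buffer_.str();
    }

private:
    // a single mutex shared by all records
    static std::mutex & output_mutex()
    {
        static std::mutex m;
        return m;
    }

    std::ostringstream buffer_;
};

// no header that declares std::log (for example <cmath>)
// may be included after this definition
#define log record_formatter()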

What happens above is that whenever the macro log is used for logging, a temporary object of type record_formatter is created. This temporary object accumulates all data that is passed with the generic insertion operator: as long as there is some following operator in the chain, the same temporary object is used. This means that the internally managed buffer_ accumulates everything from left to right in the logging expression. The temporary is destroyed at the end of the full expression, so the end of the expression provides an implicit end-of-record marker and the destructor can safely perform the physical output of all accumulated data.
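The effect can be observed with a few threads that log concurrently. The sketch below repeats the formatter so that it is self-contained. The function run_workers and the macro name atomic_log are assumptions made for this example (the macro is not spelled log here so that it cannot clash with std::log), and the function-local static mutex is only one possible locking choice.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

class record_formatter
{
public:
    template <typename T>
    record_formatter & operator<<(T const & value)
    {
        buffer_ << value; // private buffer, no synchronization needed
        return *this;
    }

    ~record_formatter()
    {
        // Assumption: one process-wide mutex guards std::cout.
        static std::mutex output_mutex;
        std::lock_guard<std::mutex> guard(output_mutex);
        std::cout << buffer_.str();
    }

private:
    std::ostringstream buffer_;
};

#define atomic_log record_formatter()

// Starts `threads` workers that each emit `records_per_thread` records.
inline void run_workers(int threads, int records_per_thread)
{
    assert(threads > 0 && records_per_thread > 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([t, records_per_thread] {
            for (int i = 0; i < records_per_thread; ++i)
            {
                // One full expression, one temporary, one atomic record.
                atomic_log << "thread " << t << " record " << i << '\n';
            }
        });
    }
    for (std::thread & worker : pool)
    {
        worker.join();
    }
}
```

The order of records from different threads is not defined, but each output line is a complete record from exactly one thread.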

The above solution has the following important advantages:

The formatting in separate threads uses separate and private buffers to improve concurrency of this phase.
Formatting does not depend on any shared data, so no synchronization overhead is implied.
The source code layout defines a natural record boundary, improving readability.
No special character value and no special manipulator is needed.
The physical output channel is locked for a short period of time, only to write the already prepared data.
Both buffer management and locking strategy are well encapsulated and can be easily replaced without touching the user code.

The code example presented above is just a skeleton and there are various ways to modify it. One frequent question concerns performance. As already stated, the buffer management is encapsulated and can be easily replaced. Users should consider this as an opportunity to improve performance if this is actually needed; one possible replacement for the standard ostringstream class can be found in the FASTreams library. Another variation is to add severity codes or other meta information to the formatted records by means of macro parameters, or even dedicated formatter factories.
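As one illustration of the severity variation, the formatter can take the severity as a constructor argument supplied by a macro parameter. Everything named below (severity, severity_name, tagged_record, log_at, the bracketed prefix and the trailing newline) is an assumption made for this sketch, not part of the skeleton above.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>

enum class severity { debug, info, warning, error };

inline char const * severity_name(severity level)
{
    switch (level)
    {
    case severity::debug:   return "debug";
    case severity::info:    return "info";
    case severity::warning: return "warning";
    case severity::error:   return "error";
    }
    return "unknown";
}

class tagged_record
{
public:
    // The sink defaults to std::cout; passing another stream helps testing.
    explicit tagged_record(severity level, std::ostream & out = std::cout)
        : out_(out)
    {
        buffer_ << '[' << severity_name(level) << "] ";
    }

    template <typename T>
    tagged_record & operator<<(T const & value)
    {
        buffer_ << value;
        return *this;
    }

    ~tagged_record()
    {
        // Assumption: one mutex for all sinks, which is simple but coarse.
        static std::mutex output_mutex;
        std::lock_guard<std::mutex> guard(output_mutex);
        out_ << buffer_.str() << '\n';
    }

private:
    std::ostream & out_;
    std::ostringstream buffer_;
};

// The macro parameter selects the enumerator, e.g. log_at(error) << "...";
#define log_at(level) tagged_record(severity::level)

inline void tagged_record_demo()
{
    std::ostringstream sink;
    tagged_record(severity::warning, sink) << "disk usage at " << 91 << '%';
    assert(sink.str() == "[warning] disk usage at 91%\n");
}
```

The user code still consists of a single chained expression, so the record boundary is defined exactly as before.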

Independent of the numerous variants, the above solution can be a flexible and powerful basis for more elaborate logging utilities.