YAMI4 vs. ZeroMQ

You are looking for a messaging solution for your distributed system and you have found that there are a couple of options to choose from. You know that they are different and you are looking for ways to compare them with each other to make a final decision.

The purpose of this article is to explain the most important differences between YAMI4 and ZeroMQ.

Warning: YAMI4 is Inspirel's flagship product and this article can turn you into a YAMI4 advocate. Or you might even become a YAMI4 user. You have been warned.

Note: this article is not an ultimate authority on ZeroMQ and since both products constantly evolve, some statements might become out of date. Even though this article does not pretend to be unbiased, it was written with the intention of being accurate, so if you notice any inconsistencies, please let us know.

Note also that shortly after this article was published, it initiated a lively debate on Reddit (see http://redd.it/1d0pwj), which you might want to check as well, after reading this article first.

Origins and Motivation

YAMI4 and ZeroMQ projects have different backgrounds.

ZeroMQ was born as a result of frustration with the previously used AMQP protocol. The iMatix company decided to end their involvement in the AMQP workgroup and move on with the development of ZeroMQ with the hope of doing things better this time (see here). What remained were the existing use cases, which were strongly motivated by the technical challenges found in financial systems. That is, the focus of ZeroMQ is on small messages sent over a relatively small number of stable communication channels - prices, tickets, trade requests and other financial data that has to be quickly delivered from one server to another.

YAMI4, on the other hand, was created with experience gained from the large-scale distributed control system that was developed for the accelerator control complex at CERN. The challenges in such systems include a large number of devices communicating at the same time, involvement of embedded systems, presence of failing devices and independent handling of multiple messages. The data content in such systems includes a wide range of message sizes, from single temperature readings to camera images and anything in between.

Obviously, both YAMI4 and ZeroMQ are messaging solutions, but since they were motivated by different challenges, their set of features is also different. While ZeroMQ seems to be largely concerned with business-type systems, YAMI4 offers features that are more relevant in real-time control and monitoring messaging systems. The contrasting features are compared in the sections that follow.

Fixed Communication Patterns vs. Flexibility

Thread Safety

Sequential Throughput vs. Prioritized Traffic

Waiting and Controlling Time

Data Model

Special Guarantees for Critical Systems

Licensing and Commercial Use

Final Word on Performance

Fixed Communication Patterns vs. Flexibility

ZeroMQ offers several communication patterns (like request-response, publish-subscribe, etc.), but the user is expected to know up-front which pattern will be used with the given connection - and commit to this pattern. For example, if the user opens a request-response connection, it cannot be used for publish-subscribe communication. And the other way round. In other words, the communication pattern is a property of the connection and is fixed when the connection is created.

This seems to be fair if we associate physical connections with logical services (after all, you know that "google.com:80" is a request-response service, right?), but in general forces the user to operate at the wrong level of abstraction. Consider for example a thermometer-like device that can be accessed over the network (there can be thousands of these in a plant-monitoring system). Obviously, we can connect to this device and the physical connection is something that allows us to execute various scenarios. The possible scenarios could be:

ask for the current temperature readout
remote-configure the device by setting its parameters
establish automatic stream of updates

Each of these scenarios is an interaction with the same logical device, but they require different communication patterns - the first two can be request-response communications, but the last one looks like publish-subscribe. One could also imagine a one-way communication, which is useful for heartbeats or watchdogs that can operate with messages that are never replied to. Should we consider these as different services? Should we need multiple physical connections for all of them, even if it is still a single thermometer?

Things become even more interesting if we consider that in a typical system multiple thermometer devices can be controlled by a single master and it is that master device that is network-enabled. This means that not only are there many communication patterns, but there are also many logical targets behind the same network endpoint. There is absolutely no reason to create multiple physical connections to handle all these different interactions with separate logical thermometers, if they are managed by a single master controller.

YAMI4 offers communication flexibility by not forcing the user to commit to any particular scenario when the physical connection is created - every communication pattern is possible at any time after the connection is made. To be exact, even multiple concurrent communication patterns are possible over a single connection at any given moment (what about requesting one thermometer for the readout while another one is already in the publishing mode?). This is possible, because YAMI4 carefully distinguishes between physical connections and logical destinations, as they are concepts from different layers of abstraction. Forcing users to treat these different concepts as being equivalent could lead to bad designs or waste of resources. Or both.

Another interesting scenario which shows that premature commitment to a given communication pattern is a bad idea is the following request-response interaction:

send_request_A
receive_response_A

send_request_B
receive_response_B

send_request_C
receive_response_C

send_request_D
receive_response_D

The above sequence is what happens in ZeroMQ REQ connections. Request-response-request-response-... It looks OK as long as there is a causal relationship between all those requests, which might be true in some class of distributed systems. But in general this is not true and individual requests could be unrelated - or at least their ordering need not be strict. Imagine a web browser that has to request plenty of small images to fill the content of the given web page - all of these little images have to arrive to complete the rendering, but the browser need not care about the ordering of their arrival.

It might be also the case that these requests need some time for processing and can be processed by separate server threads or even separate servers if the one we are connecting to acts like a load-balancing proxy - there can be plenty of technical reasons not to process them one after another. Instead, what we really want to do in such cases is this:

send_request_A
send_request_B
send_request_C
send_request_D
...
receive_response_A
receive_response_B
receive_response_C
receive_response_D

Note that from the client's perspective this interaction achieves the same logical effect, but now gives the server the opportunity to benefit from the concurrency that it might have somewhere there, waiting to be used. This kind of interaction is not only very natural, it is actually more general than a repeated request-response one (single request-response is just a special case of this).

Unfortunately, this kind of interaction is not possible after the user of ZeroMQ commits to REQ (request-response) type of connection. In fact, ZeroMQ does not seem to directly support this natural interaction at all. Even though ZeroMQ offers other connection modes that can be used as building blocks for the above scenario (the ROUTER type is usually proposed as a solution here), there is still a lot to be done on the user side in order to properly match replies with requests, especially if out-of-order arrival is allowed, which is a subject that has to be somehow addressed in concurrent environments.

This is also where YAMI4 offers higher flexibility by focusing on messages as application-level entities instead of forcing the user to commit to any particular interaction type at the level of physical connection. It is perfectly possible to post multiple messages and then wait for their responses, achieving higher scalability without any complexity at the application level.

In other words, YAMI4 messages can be processed independently and this gives the flexibility to build arbitrary application-level interaction patterns.
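
The pipelined interaction described above depends on matching each reply to its request when replies are allowed to arrive out of order. A minimal sketch of that bookkeeping, using only the C++ standard library, could look as follows. The pending_requests class, its method names and the integer correlation id are hypothetical illustrations and are not part of the YAMI4 or ZeroMQ API; the sketch only shows the kind of work that has to be done somewhere, either by the library or by the application.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: tracks requests that were sent but not yet answered.
class pending_requests
{
public:
    // Registers a request and returns the correlation id to send along with it.
    std::uint64_t add(const std::string & name)
    {
        const std::uint64_t id = next_id_++;
        pending_[id] = name;
        return id;
    }

    // Matches a reply by its id, in any arrival order.
    // Returns false for unknown ids and for replies that were already matched.
    bool complete(std::uint64_t id, std::string & name_out)
    {
        const auto it = pending_.find(id);
        if (it == pending_.end())
        {
            return false;
        }
        name_out = it->second;
        pending_.erase(it);
        return true;
    }

    // Number of requests still waiting for their replies.
    std::size_t outstanding() const { return pending_.size(); }

private:
    std::uint64_t next_id_ = 1;
    std::map<std::uint64_t, std::string> pending_;
};
```

With such a table the client can send requests A, B, C and D back to back and accept their replies in whatever order the server produces them.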

Thread Safety

The approach to thread safety is another subject where ZeroMQ and YAMI4 differ.

ZeroMQ documentation states clearly that individual sockets should not be shared between threads, unless the application ensures proper synchronization when passing sockets from one thread to another. At the same time programmers are encouraged to create as many sockets as they like - in particular, as many as there are application threads that need to produce and send messages. This approach exposes the low-level nature of ZeroMQ.

YAMI4 hides the communication resources from the user, so that there is no direct interaction between the application code and sockets (or any other socket-like objects). Instead, the application code interacts with the agent, which encapsulates the management of all system resources, including sockets. That is, low-level resources are not exposed at the level of application code and since the agent interface is entirely thread-safe, the application code does not have to deal with any thread-safety issues.

In particular and in contrast to ZeroMQ, users of YAMI4 are not encouraged to create multiple sockets to the same destination just to reflect the multithreading nature of their applications.

Sequential Throughput vs. Prioritized Traffic

ZeroMQ was designed with a single and very well defined goal: to get the highest throughput that is possible. Indeed, the results are impressive, but you might want to check some performance comparisons to see the actual results.

ZeroMQ achieves its high throughput with the help of message batching, which allows several consecutive messages to be sent as a single unit, thus reducing the overhead that is associated with packet headers and other low-level details. This technique is indeed very useful and has been practiced in various forms for quite a long time - see Nagle's algorithm for a similar idea.

The idea of batching messages works fine, but with a very important assumption: that messages actually form a sequence. This is true in many business-type systems, where messages are related by natural order of time (for example, sales orders are opened before they are closed, etc.). Such sequential ordering, however, means that there is no place for the concept of importance or urgency. Consider this pseudocode:

send_small_message_A
send_BIG_message_B
send_another_BIG_message_C
send_another_small_message_D
...
send_short_but_EXTREMELY_IMPORTANT_message_X

The concept of sequence, and the message batching algorithms that rely on it, mean that messages cannot overtake other messages. In other words, in the above example messages A, B, C and so on have to be sent over the wire before the extremely important message X. If some of these messages are big (like B and C), then the extremely important message X will have to wait in the queue as long as necessary. Granted, the programmer could use separate physical connections for messages of different urgency, but as was already stated above this is operating at the wrong level of abstraction - and interestingly, there is nothing in the TCP/IP stack that would actually guarantee that the more important message will get any special treatment, as TCP connections do not have any such parameter. This means that even though ZeroMQ offers relatively good throughput, it might deliver your most important message too late.

Being fast but too late is a wrong answer in real-time systems and this is the reason why YAMI4 introduces the concept of message priority directly in its API.

YAMI4 messages are posted to the outgoing queue with some priority and it is the priority that decides where the message will be injected into the queue - not necessarily at the end. In the above example, if extremely important message X is given higher priority than previous messages, then it is guaranteed that it will be inserted at the front of the queue, so that it will be delivered before the others. In addition, to ensure that no big message monopolizes the channel, messages are chopped into frames (the size of the frame is configurable) and those frames are then subject to ordering in the priority queue. In other words, even if the transmission of big message B above was already started, the short but extremely important message X will be injected in the first slot in between B's frames and will be delivered before B finishes transmission - later on the transmission of B will be resumed from the place where it was paused and the transmission of remaining frames will go on as usual. There is no practical limit on how many different priorities can be used with a single physical connection. See the YAMI4 book for a more detailed description of this process.

The introduction of message priorities was a conscious design decision, which brings its own tradeoffs in terms of additional overhead and the interference with techniques like message batching. The result is that YAMI4 can be slower than ZeroMQ if raw throughput is the goal, but with the advantage that it can translate the application-level concept of urgency into practical terms by delivering important messages faster than those which can safely wait for their turn. This is what makes YAMI4 more adequate for real-time systems where being fast but too late is a wrong answer.

(You can compare how ZeroMQ and YAMI4 process messages in order to meet their goals: ZeroMQ glues messages together to form bigger batches to be fast, while YAMI4 chops messages into smaller chunks to be on time - these two libraries obviously have different... priorities.)

Note that the concept of priority does not prevent YAMI4 from keeping the order of messages within the same priority level - messages of the same priority are guaranteed to preserve ordering within the same communication channel.
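
The frame-level priority scheme described in this section can be modelled in a few lines of standard C++. The sketch below is only an illustration of the idea under stated assumptions: the frame, frame_order and outgoing_queue names are hypothetical, a higher number is assumed to mean a more urgent message, and the real YAMI4 implementation is not shown in this article and may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// One transmission unit: a slice of a message payload.
struct frame
{
    std::size_t priority;   // assumption: a higher value is sent first
    std::uint64_t sequence; // posting order, used as a tie-breaker
    std::string message_id;
    std::string payload;
};

// Orders frames so that the top of the priority queue is the next one to send.
struct frame_order
{
    bool operator()(const frame & a, const frame & b) const
    {
        if (a.priority != b.priority)
        {
            return a.priority < b.priority;
        }
        return a.sequence > b.sequence; // earlier frames go first
    }
};

class outgoing_queue
{
public:
    explicit outgoing_queue(std::size_t frame_size)
        : frame_size_(frame_size == 0 ? 1 : frame_size) {}

    // Chops the message into frames and queues each frame by priority.
    void post(const std::string & id, const std::string & data, std::size_t priority)
    {
        for (std::size_t pos = 0; pos < data.size(); pos += frame_size_)
        {
            frames_.push(frame{priority, next_sequence_++, id,
                data.substr(pos, frame_size_)});
        }
    }

    bool empty() const { return frames_.empty(); }

    // Removes and returns the next frame to put on the wire.
    frame pop()
    {
        assert(!frames_.empty());
        frame f = frames_.top();
        frames_.pop();
        return f;
    }

private:
    std::size_t frame_size_;
    std::uint64_t next_sequence_ = 0;
    std::priority_queue<frame, std::vector<frame>, frame_order> frames_;
};
```

If a big message B is posted with priority 0 and, after its first frame has been popped, a short message X is posted with priority 10, the next call to pop returns the frame of X and only then the remaining frames of B. Frames with equal priority leave the queue in posting order, which corresponds to the ordering guarantee within a single priority level.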

Waiting and Controlling Time

It is natural for clients to wait for responses, but in many cases (if not all!) this waiting should be implemented in a way that makes it possible to control. In other words, it should not be possible for the client to block indefinitely just because there is no response arriving from the server. This is usually done with timeouts and both ZeroMQ and YAMI4 offer some timeout-related functionality.

ZeroMQ is consistent with its focus on physical connections and this is also a place where this low-level concept leaks to the level of application. In other words, the client application sends the message, but has to wait for the physical connection to be ready and then proceed accordingly. This is similar to how the system function select is used with sockets.

This level of operation is not only uncomfortable - it is entirely counterproductive in more complex interactions and in YAMI4 it would be actually impossible, as it would be inconsistent with the idea of processing multiple messages independently. In other words, it is the message as an application-level entity that is in focus here, so it is also the message that is a subject of timeout. The following C++ code shows the idea:

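// send the message and keep the handle that tracks this particular message
std::auto_ptr<yami::outgoing_message> msg(
    client_agent.send(server_address, "calculator", "calculate", params));

// wait until the message reaches a terminal state or the timeout expires
msg->wait_for_completion(timeout);

// ...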

Above, the program sends the message and then waits for the message to complete, which means that the reply arrived or it was rejected or something else has happened that has put the message in one of its terminal states. There is no need to mess with the physical connection here. What is most important is that there can be many messages being processed concurrently (see the examples above for the discussion on interaction flexibility) and every message can have its own independent timeout. Timeouts can be both relative and absolute, which is a frequent policy in real-time systems.
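
To make the idea of a per-message timeout more concrete, here is a minimal standard-library model of a message handle that supports both relative and absolute waiting. This is a sketch under assumptions, not the YAMI4 implementation: the message_handle class, its state names and its use of std::chrono are hypothetical and were chosen only to keep the example self-contained.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a handle that tracks a single outgoing message.
class message_handle
{
public:
    enum class state { posted, replied, rejected };

    // Called by the (imagined) I/O side when the message reaches a terminal state.
    void complete(state s)
    {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            state_ = s;
        }
        cond_.notify_all();
    }

    // Relative timeout: returns true if a terminal state was reached in time.
    bool wait_for(std::chrono::milliseconds timeout)
    {
        return wait_until(std::chrono::steady_clock::now() + timeout);
    }

    // Absolute timeout: waits no longer than the given deadline.
    bool wait_until(std::chrono::steady_clock::time_point deadline)
    {
        std::unique_lock<std::mutex> lock(mtx_);
        return cond_.wait_until(lock, deadline,
            [this] { return state_ != state::posted; });
    }

    state current() const
    {
        std::lock_guard<std::mutex> lock(mtx_);
        return state_;
    }

private:
    mutable std::mutex mtx_;
    std::condition_variable cond_;
    state state_ = state::posted;
};
```

Because the waiting state lives in the handle and not in the connection, any number of such handles can be outstanding at the same time, each with its own deadline.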

Another interesting feature of YAMI4 that is related to time is the ability to track the progress of message transmission. When the message is posted an optional callback can be installed that will be automatically called whenever each chunk of the message is pushed out over the wire, so that the user application is constantly updated with regard to the progress of this process. A typical usage of this feature is to implement a nice progress bar in the GUI to show the progress in a visual way, but other uses are possible as well. Note that this progress tracking is useful only with very big messages, which can actually take some observable time to transmit.
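
The progress-tracking idea can be sketched as a sender that reports after every chunk it pushes out. The function below is a hypothetical illustration written with the standard library only; the name send_with_progress, the callback signature and the write_chunk parameter (which stands in for the real transport) are assumptions and do not reproduce the YAMI4 callback interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback type: (bytes sent so far, total bytes).
using progress_callback = std::function<void(std::size_t, std::size_t)>;

// Sends data in chunks through write_chunk and reports progress after each chunk.
void send_with_progress(const std::string & data, std::size_t chunk_size,
    const std::function<void(const std::string &)> & write_chunk,
    const progress_callback & on_progress)
{
    if (chunk_size == 0)
    {
        chunk_size = 1; // avoid an endless loop on a degenerate chunk size
    }

    std::size_t sent = 0;
    while (sent < data.size())
    {
        const std::size_t n = std::min(chunk_size, data.size() - sent);
        write_chunk(data.substr(sent, n));
        sent += n;
        if (on_progress)
        {
            on_progress(sent, data.size()); // the callback is optional
        }
    }
}
```

A GUI would typically pass a callback that converts the two numbers into a percentage for its progress bar.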

ZeroMQ does not offer any functionality like this and the only way to get it at the application level is to do it by hand and chop bigger messages into smaller parts and send those parts separately so that they can be individually reported. Not only is this strategy going against the message batching that is done internally, but it might not work at all due to the fact that ZeroMQ allows the application to post messages to the queue, but gives no information on when they are actually sent. Thus, posting multiple parts to the queue can seem to be very "fast", but there is no information on when the message is actually transmitted. That is, the GUI progress bar mentioned above will have no way to show real data.

Data Model

ZeroMQ and YAMI4 differ significantly in their approach to data model.

In short, ZeroMQ offers no data model at all and the user is expected to handle message content serialization on his own and implement it on top of an API that is as comfortable as memcpy. It does not seem to be a big deal in Hello World examples, where a single string "Hello World" is the only thing that is being transmitted, but things can get tricky pretty quickly with more complex data or when communicating machines have different byte ordering, and so on.
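
To show what handling serialization by hand involves, here is a minimal sketch of a byte-order-independent encoding for a 32-bit integer and a length-prefixed string. The wire format (big-endian length followed by raw bytes) and the function names are assumptions made for this example; neither ZeroMQ nor YAMI4 prescribes this particular format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Appends a 32-bit value in big-endian (network) byte order,
// independently of the byte order of the machine that runs this code.
void put_u32(std::vector<std::uint8_t> & out, std::uint32_t v)
{
    out.push_back(static_cast<std::uint8_t>((v >> 24) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 16) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// Reads a big-endian 32-bit value at pos and advances pos past it.
std::uint32_t get_u32(const std::vector<std::uint8_t> & in, std::size_t & pos)
{
    if (pos > in.size() || in.size() - pos < 4)
    {
        throw std::out_of_range("truncated integer");
    }
    const std::uint32_t v =
        (static_cast<std::uint32_t>(in[pos]) << 24) |
        (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
        (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
        static_cast<std::uint32_t>(in[pos + 3]);
    pos += 4;
    return v;
}

// Appends a string as a 32-bit length followed by its bytes.
void put_string(std::vector<std::uint8_t> & out, const std::string & s)
{
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Reads a length-prefixed string at pos and advances pos past it.
std::string get_string(const std::vector<std::uint8_t> & in, std::size_t & pos)
{
    const std::uint32_t n = get_u32(in, pos);
    if (in.size() - pos < n)
    {
        throw std::out_of_range("truncated string");
    }
    const std::string s(reinterpret_cast<const char *>(in.data()) + pos, n);
    pos += n;
    return s;
}
```

Even this small example needs explicit byte shuffling and bounds checks, and it covers only two data types.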

The typical solution here is to use third-party serializers and for example Google protocol buffers are frequently seen in this context. Having a separate library handling the data serialization can look like a valuable freedom, but in practice can lead to integration issues if separate projects choose to use different serialization schemes or if a given serializer does not support all programming languages that are used in the distributed system. What is most important, however, is the question: why should anybody use two distinct libraries to solve what is a single design problem?

YAMI4 offers a one-stop solution here by providing a consistent data model for all supported languages. This means that all applications that take part in communication can speak a reasonable common language, which is well integrated with the target programming language. For example, the data model maps directly to the notion of dictionary in Python - this is extremely programmer-friendly.

YAMI4 goes one step further in its support for a consistent data model with the YAMI4 Definition Language, which allows mappings to user-defined structs or record types to be automated for statically compiled languages. This support does not guarantee that the complete distributed system will be statically type-safe, but makes the integration at the language level very convenient.

Note that in addition to the consistent data model YAMI4 also supports so-called raw binary messages, which allow users to plug their own serializers. This feature exists exactly to help those users who would like to use additional third-party serializers like Google's protocol buffers. In other words, it is perfectly possible to use YAMI4 without any data model at all, which is comparable to what you can get with ZeroMQ. Guess what? As far as we know, nobody is using this feature. Maybe the idea of separating transport from data model is actually wrong?

Special Guarantees for Critical Systems

Every single company in the IT business claims to build "mission critical systems", which usually means that those systems are very important to their users. Of course, all systems are important in the eyes of their users, and this is how the concept of "critical system" gets blurred somehow.

It is not the intent of this article to define what is critical and what is not, but to get some focus let's say that programming technologies (languages, libraries, coding standards, etc.) can support the development of critical systems by offering features or guarantees that make it easier to reason about given properties of resulting products. For example, a programming language might support the development of critical systems by offering some guarantees related to the use of system resources like threads or dynamic memory, so that subsequent questions on what can go wrong can be answered with higher confidence.

YAMI4 supports the development of critical systems by offering guarantees related to the use of multiple threads and dynamic memory in its Ada, C and C++ core packages.

It is perfectly possible to build a client or server that operates in a single thread - this might be the only thread in the program or it might be integrated with some other program activity. In any case, a single-threaded system is very easy to analyse with respect to deadlocks or hazards. Well, in such a system there are no deadlocks and no hazards. End of proof.

Of course, even in a single-thread mode YAMI4 is still thread-safe and in general-purpose programming the library offers its own internal pool of threads to help the user with concurrent processing of requests, but if you need to be strict for the purpose of formal reasoning, there is such an option.

More interesting, however, is the set of guarantees that the core YAMI4 packages offer with respect to the use of dynamic memory: YAMI4 has its own memory allocator that can be configured to operate within a given block and it will never ever step outside of it. In other words, YAMI4 can work in a dedicated memory partition. The memory block that YAMI4 will use can be preallocated by the user in any way, either statically or dynamically - in any case, YAMI4 will use only that block for all its dynamic allocation and deallocation needs.
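
The idea of an allocator that never steps outside of a user-provided block can be illustrated with a simple fixed-size slot pool. Note that this is a deliberately simpler technique chosen for the example: the article does not describe the algorithm of the YAMI4 allocator, and the block_pool class and its interface are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hands out fixed-size slots carved from a caller-provided memory block.
// The free list is stored inside the free slots themselves, so the pool
// needs no memory other than the block it was given.
class block_pool
{
public:
    // block points to block_size bytes owned by the caller. The alignment of
    // returned slots follows from the alignment of block and from slot_size.
    block_pool(void * block, std::size_t block_size, std::size_t slot_size)
        : free_(nullptr),
          slot_size_(slot_size < sizeof(void *) ? sizeof(void *) : slot_size)
    {
        unsigned char * p = static_cast<unsigned char *>(block);
        const std::size_t count = block_size / slot_size_;
        // Thread every slot onto the free list, last slot first,
        // so that the first allocation returns the start of the block.
        for (std::size_t i = count; i > 0; --i)
        {
            release(p + (i - 1) * slot_size_);
        }
    }

    // Returns a slot, or nullptr when the block is exhausted.
    void * allocate()
    {
        if (free_ == nullptr)
        {
            return nullptr;
        }
        void * slot = free_;
        std::memcpy(&free_, slot, sizeof(void *)); // load the next free slot
        return slot;
    }

    // Returns a slot, previously obtained from allocate, to the pool.
    void release(void * slot)
    {
        std::memcpy(slot, &free_, sizeof(void *)); // remember the old list head
        free_ = slot;
    }

private:
    void * free_;
    std::size_t slot_size_;
};
```

Since all slots have the same size, a cyclic pattern of allocations and releases cannot fragment the block, which is the property that matters in the control-loop scenario described here.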

The possibility to work in a dedicated memory partition has very important consequences with regard to memory fragmentation, for example. If there is no interference with the other program activities, then the chances of fragmenting the memory are much lower - in fact, the internal YAMI4 allocator is designed to prevent fragmentation if the messaging activity is cyclic, which is typical in real-time control systems that have to predictably react to external stimuli. In other words, in such a system it is much easier to assert that if the system started at all, then it will never fail for internal reasons. This can be very valuable in those systems that deal with physical processes. We don't want our critical control systems to fail because of dynamic memory allocation hiccups, right?

Even without a dedicated memory partition YAMI4 can offer a mature approach to memory management by admitting that dynamic memory is a resource that is limited. Yes, of course you have your machine packed up to the ceiling, but no, it is still limited. This means that memory allocation can fail. You might want to send your next message, but you will fail if there is not enough memory. YAMI4 will not break into pieces because of this and it will report the problem either by means of an error code or by means of exceptions, depending on the language in use, and the state of the system will be as before the failure. This means that even in such cases it is possible to continue operation and try some alternative strategies. Sending a smaller message? Deallocating caches or other buffers to make more room for what is more essential for the system? There are ways to work it out.
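
The policy of reporting the failure and leaving the state untouched can be sketched as follows. This is a model and not YAMI4 code: the bounded_outbox class uses a fixed byte budget as a stand-in for limited memory, and all names, including post_with_fallback, are hypothetical.

```cpp
#include <cstddef>
#include <deque>
#include <string>

enum class post_result { ok, no_memory };

// Hypothetical outgoing queue whose byte budget models limited memory.
class bounded_outbox
{
public:
    explicit bounded_outbox(std::size_t budget) : budget_(budget), used_(0) {}

    // Either queues the whole message or changes nothing and reports the failure.
    post_result post(const std::string & message)
    {
        if (message.size() > budget_ - used_)
        {
            return post_result::no_memory; // state is exactly as before the call
        }
        queue_.push_back(message);
        used_ += message.size();
        return post_result::ok;
    }

    std::size_t queued() const { return queue_.size(); }
    std::size_t used() const { return used_; }

private:
    std::size_t budget_;
    std::size_t used_;
    std::deque<std::string> queue_;
};

// One possible alternative strategy: try the full message first
// and fall back to a smaller summary if it does not fit.
post_result post_with_fallback(bounded_outbox & out,
    const std::string & full, const std::string & summary)
{
    if (out.post(full) == post_result::ok)
    {
        return post_result::ok;
    }
    return out.post(summary);
}
```

The important property is that a failed post leaves the queue usable, so the caller is free to decide what to do next.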

ZeroMQ does not seem to offer any comparable guarantees. The dynamic memory management policy in ZeroMQ (at least at the time of writing this article, which means version 3.2) seems to be completely neglected, as the reaction to memory allocation failure in ZeroMQ is very simple: abort. In other words, ZeroMQ does not offer you any chance to work the problem out and no matter what else your program is doing, just abort. Sorry for being harsh, but it has to be said: this is just lousy.

(See also another article, Cheating With Asserts, which explains in detail why treating memory allocation failures this way is a sign of engineering incompetence.)

Licensing and Commercial Use

ZeroMQ and YAMI4 have somewhat different licensing schemes.

ZeroMQ has a single license, which is LGPL. This is sufficient for typical desktop or server use, because LGPL does not influence the application that uses the library, while still giving the user the opportunity to replace or modify the library on his own (this is a very important user right that many developers frequently forget about) - normally it is enough for the user to replace the shared library file to benefit from his LGPL freedoms. Things become more tricky outside of the desktop or server domain and this is where embedded systems have to be mentioned again.

If a company that builds industrial systems or other distributed systems where embedded devices take part in communication uses a library with an LGPL license, it is obligatory for that company to inform their users that an LGPL library is used in the product and provide the users with appropriate means to improve or replace that library, even if the application is closed. The problem is that many embedded devices do not have any technical means to do so anyway - see how Wikipedia defines Tivoization to describe this problem. Of course, it is still possible to foresee this possibility at the design stage and introduce some means for the user to replace the LGPL components in any given device, but if this is the only reason to introduce such means, then obviously it will be associated with higher production costs.

Both ZeroMQ and YAMI4 recognized this challenge, but attempted to solve it in different ways.

ZeroMQ offers a "static linking exception" that allows final programs to be statically linked with the library without imposing any restrictions on the licensing scheme of the final product. Such an exception does not seem to be explicitly foreseen by the Free Software Foundation (please correct us on this, we just cannot find any direct confirmation on the FSF website), but there is also no reason to doubt the intentions of the library authors.

YAMI4 addresses these challenges by offering two different licenses. In particular, professional users are offered the YAMI4 package with the Boost Software License, which is appropriate for both open- and closed-source developments, including products that are inaccessible to final users by design. In addition, YAMI4 has no external dependencies other than system and standard run-time libraries and therefore we are in a position to constructively address every imaginable licensing issue. If you don't like the existing licenses, let us know what your expectations are in this area.

Final Word on Performance

Congratulations if you have read the full article, but if you were looking for a single chart where one bar is higher than another and tells you which library to choose, then no, this is not that kind of comparison.

The reason why we do not publish performance test results is the following saying: "I do not believe in any statistics that I did not falsify myself". Benchmarks can be specifically prepared to show one product higher than another and even though it can have some desired publicity impact, it will have little meaning in the context of actual and practical use.

This is why you are encouraged to prepare such benchmarks yourself. YAMI4 comes with a set of simple examples which can be easily adapted for this purpose and there is a separate example somewhere there for measuring throughput directly, so you have all the necessary tools. The important thing here is that you should clearly define your needs before building any benchmarks and before actual evaluation.

YAMI4 was designed with distributed control and monitoring systems in mind, but its use is not limited to these categories. On the contrary - YAMI4 does not seem to have any restrictions or constraints that would prevent it from being successfully used in just about any distributed application domain. From controls to large-scale social gaming and from industrial monitoring to next generation trading systems (yes, YAMI4 is used in financial systems), YAMI4 has something to offer and performance does not seem to be a bottleneck in any of these places. Typically YAMI4 is about two orders of magnitude faster than the rest of the system in terms of number of processed requests per second, so the strategies to get better performance should focus not on finding the transport that is faster than light, but rather on interaction patterns between all distributed components, as this is where the system as a whole typically wastes most of its potential performance. Getting these patterns right is where YAMI4 has a lot to offer due to the flexibility and independent message handling that was already described above.

Of course, if you have any questions related to performance tuning or if you need some design tips, just let us know.