Benchmarks

Every time we push new code to GitHub, our continuous integration system re-runs all the benchmarks and regenerates these charts.

This section shows the current benchmark results and compares capnpy to various alternative implementations. The Evolution over time section shows how capnpy performance has changed over time.

How to read the charts

For each benchmark we show two charts, one for CPython and one for PyPy. Make sure to notice the different scale on the Y axis: PyPy is often an order of magnitude faster than CPython, so it does not make sense to compare them directly; within each chart, however, it is useful to compare the performance of capnpy to the other reference points.

Moreover, all benchmarks are written so that they repeat the same operation for a certain number of iterations inside a loop. The charts show the total time spent in the loop, not the time per iteration. Again, it is most useful to simply compare capnpy to the other reference points.
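For illustration, the overall structure of a benchmark is roughly the following sketch (this is not the actual benchmark code; the function name and iteration count are made up):

    import time

    def run_benchmark(operation, niter=10000000):
        # every benchmark repeats the same operation inside a loop;
        # the charts report the total time of the loop, not the per-iteration time
        start = time.time()
        for _ in range(niter):
            operation()
        return time.time() - start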

Most benchmarks compare the performance of capnpy objects against alternative implementations. In particular:

  • instance: the objects are instances of plain Python classes. This is a useful reference point because it often represents the best we can potentially do; the goal of capnpy is to be as close as possible to plain instances.
  • namedtuple: the same as above, but using collections.namedtuple instead of plain Python classes.
  • pycapnp: the default Cap’n Proto implementation for Python. It does not work on PyPy.
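For illustration, the instance and namedtuple reference points use plain Python objects exposing the same fields as the schema struct; a minimal sketch (the field names are made up):

    import collections

    # "instance": a plain Python class with the same fields as the schema struct
    class PointInstance(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

    # "namedtuple": the same fields, using collections.namedtuple
    PointNamedtuple = collections.namedtuple('PointNamedtuple', ['x', 'y'])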

Get Attribute

This benchmark measures how fast it is to read an attribute out of an object, for different types of attributes.

The benchmarks for group, struct and list are expected to take a bit longer than the others because, after getting the attribute, they “do something” with the result: they read another attribute in the case of group and struct, and get the length of the list in the case of list.
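Roughly, the extra work looks like this (a comment-only sketch; the attribute names are made up):

    # group:  x = obj.position.x      (read another attribute inside the group)
    # struct: name = obj.target.name  (read an attribute of the pointed-to struct)
    # list:   n = len(obj.items)      (get the length of the list)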

The PyPy charts show that uint64 fields are much slower than the others: this is because the benchmarks are run on PyPy 5.4, which misses an optimization in that area. With PyPy 5.6, uint64 is as fast as int64.

Special union attributes

If you have a union, you can inspect its tag value by calling which(), __which__() or one of the is_*() methods. Ultimately, all of them boil down to reading an int16 field, so the corresponding benchmark is included as a reference.

Note that on CPython, which() is slower than __which__(): this is because the former returns an Enum, while the latter returns a raw integer. On the other hand, PyPy is correctly able to optimize away all the abstraction overhead.
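For reference, a comment-only sketch of the three ways to inspect the tag, assuming a hypothetical Shape struct whose union has a circle alternative (the schema and names are made up, not taken from the benchmark code):

    # given a schema like:
    #
    #     struct Shape {
    #         union {
    #             circle @0 :Float64;
    #             square @1 :Float64;
    #         }
    #     }
    #
    # and an instance `shape` whose circle alternative is set:
    #
    #     shape.which()       # enum member naming the active union field
    #     shape.__which__()   # the raw int16 tag value
    #     shape.is_circle()   # True, since circle is the active alternative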

Lists

These benchmarks measure the time taken to perform various operations on lists. The difference with the list benchmark of the previous section is that here we do not take into account the time needed to read the list itself out of its containing struct, but only the time taken to perform the operations once we have it.

The iter benchmark iterates over a list of 4 elements.
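For instance, the iter case measures roughly the following (a comment-only sketch; the attribute name is made up, and the list is read once, outside the measured loop):

    # lst = obj.items        # read the list out of the struct, outside the loop
    # for item in lst:       # "iter": iterate over the 4-element list
    #     pass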

Hashing

If you use $Py.key (see Equality and hashing), you can hash your objects, and the resulting value is guaranteed to be the same as the hash of the corresponding tuple.

The simplest implementation would be to create the tuple and call hash() on it. However, capnpy uses an ad-hoc implementation that computes the hash value without creating the tuple. This is especially useful if you have text fields, as you completely avoid the expensive creation of the intermediate string.
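A comment-only sketch of the guarantee, assuming a schema annotated with $Py.key as described in Equality and hashing (the struct, fields and annotation argument are illustrative):

    # schema (illustrative):
    #
    #     struct Point $Py.key("x, y") {
    #         x @0 :Int64;
    #         y @1 :Int64;
    #     }
    #
    # then, for an instance p with x=1 and y=2:
    #
    #     hash(p) == hash((1, 2))   # same value as the corresponding tuple,
    #                               # but computed without building the tuple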

Constructors

This benchmark measures the time needed to create new objects. Because of the Cap’n Proto specs, this has to be more expensive than creating e.g. a new plain instance, as we need to do extra checks and pack all the data into a buffer. However, as the following charts show, creating new capnpy objects is almost as fast as creating plain instances. The charts also show that the performance differs depending on the types of the fields of the target struct.

List fields are special: normally, if you pass a list object to an instance or a namedtuple, you store only a reference to it. However, to construct a new Cap’n Proto object you need to copy the whole content of the list into the new buffer; in particular, if it is a list of structs, you need to deep-copy each item of the list separately. This explains why test_list looks slower than the rest.
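As an illustration of the difference (a comment-only sketch; the schema and names are made up): storing a list on a plain instance keeps a reference, while constructing a capnpy object copies every item into the new buffer:

    # points = [make_point(i, i) for i in range(100)]
    #
    # # plain instance / namedtuple: only a reference to `points` is stored
    # poly1 = PolygonInstance(points)
    #
    # # capnpy object: each Point is deep-copied into poly2's own buffer
    # poly2 = mymod.Polygon(points=points)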

Deep copy

Sometimes we need to perform a deep copy of a Cap’n Proto object. In particular, this is needed in the cases listed below (a short sketch follows the list):

  • if you construct a new object having a struct field
  • if you construct a new object having a list of structs field
  • if you dump() an object which is not “compact”
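A comment-only sketch of the three cases (the schema, module and names are made up; see the relevant sections for the exact API):

    # # struct field: `inner` is deep-copied into the new object's buffer
    # outer = mymod.Outer(inner=inner)
    #
    # # list-of-structs field: every item is deep-copied, one by one
    # container = mymod.Container(items=list_of_structs)
    #
    # # dumping a non-compact object (e.g. one read out of a capnpy list)
    # # also requires recursively copying it into a fresh, contiguous buffer
    # msg = non_compact_obj.dumps()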

capnpy includes a generic, schema-less implementation which can recursively copy an arbitrary Cap’n Proto pointer into a new buffer. It is written in pure Python but compiled with Cython, and heavily optimized for speed. pycapnp relies on the official Cap’n Proto implementation written in C++.

The copy_pointer benchmark repeatedly copies a big recursive tree, so that the majority of the time is spent inside the deep-copy function and we can ignore the small amount of time spent outside it. Thus, we are effectively benchmarking our Cython-based function against the heavily optimized C++ one. The resulting speed is very good: on some machines, it has been measured to be even faster than the C++ version.

Loading messages

These benchmarks measure the performance of reading a stream of Cap’n Proto messages, either from a file or from a TCP socket.

Note

pycapnp delegates the reading to the underlying C++ library, which accepts anything with a fileno() method, so we pass a socket object directly. capnpy, on the other hand, needs a file-like object, so we pass a BufferedSocket.

Buffered streams

As explained in the section Loading from sockets, capnpy provides its own buffered wrapper around sockets, which is immensely faster than socket.makefile().
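A sketch of the two setups being compared; the BufferedSocket import path is assumed here, see Loading from sockets for the authoritative API:

    import socket

    from capnpy.buffered import BufferedSocket   # assumed import path

    sock = socket.create_connection(('127.0.0.1', 5000))

    # capnpy needs a file-like object, so we wrap the socket:
    buf = BufferedSocket(sock)
    # ... and then load messages from `buf` in a loop

    # the stdlib alternative, which is much slower in these benchmarks:
    # buf = sock.makefile('rb')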

Dumping messages

These benchmarks measure the performance of dumping an existing capnpy object into a message to be sent over the wire. At a minimum, to dump a message you need to copy all the bytes which belong to the object: this is measured by test_copy_buffer, which blindly copies the entire buffer and is used as a baseline.

The actual implementation of dumps() needs to do more: in particular, it needs to compute the exact range of bytes to copy. Thus, the goal is that dumps() should be as close as possible to copy_buffer.

If the structure was inside a capnpy list, it will be “non compact”: in other words, it is not represented by a contiguous range of bytes in memory. In that case, dumps() needs to do even more work to produce the message. At the time of writing, the implementation of .compact() is known to be slow and non-optimized.
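A comment-only sketch of the relationship between the cases discussed above (the object names are made up):

    # # compact object: dumps() computes the exact byte range and copies it,
    # # so its cost is close to the raw copy_buffer baseline
    # msg = p.dumps()
    #
    # # non-compact object (e.g. one read out of a capnpy list): dumps() has to
    # # do extra work; the underlying .compact() step is currently slow
    # msg = q.dumps()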

Evolution over time