
QCoreApplication mini benchmark

For robustness and security reasons, it often makes sense to split functionality into various smaller binaries (daemons) rather than having a few big and monolithic applications.

Qt 4 introduced modularized Qt libraries so that Qt-based daemons don't have to depend on any GUI code. Thanks to the strong focus on embedded systems and several sane architectural decisions, Qt 5 takes this to a new level.

Let's look at a simple main function:

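#include <QtCore>  // QCoreApplication, QTimer
int main(int argc, char *argv[]) {
    QCoreApplication app(argc, argv);
    QTimer::singleShot(3000, &app, SLOT(quit())); // quit after ~3 seconds
    return app.exec();                             // spin the event loop
}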

This non-GUI Qt application idles for about 3 seconds, then quits.

On my vanilla i386 Kubuntu 12.04 with Qt 4.8.1, Valgrind's Massif tool reports a peak heap usage of around 102 KB, while Callgrind reports an instruction cost of about 1.9 million (*).

Now let's look at the numbers from today's build of Qt 5: Massif reports a peak heap usage of 4.9 KB, and Callgrind reports an instruction cost of about 114,000.

This means that Qt 5 uses about 20 times less peak heap memory and about 16 times fewer instructions to construct a QCoreApplication and spin its event loop.
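For reference, the two factors come straight from the measured values above (Qt 4.8.1 over Qt 5); the text rounds both down:

```latex
\frac{102\ \text{KB}}{4.9\ \text{KB}} \approx 20.8,
\qquad
\frac{1.9 \times 10^{6}\ \text{instructions}}{1.14 \times 10^{5}\ \text{instructions}} \approx 16.7
```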

There are several reasons for this. Most notably, Qt 5 assumes that all strings are Unicode, so the text conversion codecs are only initialized when the first non-Unicode string comes along. And even though Qt 5 has vastly improved plugin loading performance, not loading plugins at all is faster still :)
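The lazy initialization described above can be sketched in plain standard C++ (no Qt involved). Everything here is hypothetical: `CodecTable`, `Encoding` and `decode` are illustrative names rather than Qt API, and a Latin-1 decoder stands in for any non-Unicode codec. The expensive table lives in a function-local static, which C++11 constructs on first use in a thread-safe way, so input that is already UTF-8 never pays for it:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an expensive-to-build codec table (think: codec
// plugins being loaded). Not Qt code; Latin-1 plays the "non-Unicode" role.
class CodecTable {
public:
    CodecTable() { ++constructions(); }

    // How many times a CodecTable has been built (for demonstration only).
    static int &constructions() {
        static int count = 0;
        return count;
    }

    // Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
    std::string latin1ToUtf8(const std::string &in) const {
        std::string out;
        out.reserve(in.size() * 2);
        for (unsigned char c : in) {
            if (c < 0x80) {
                out += static_cast<char>(c);                 // ASCII maps 1:1
            } else {
                out += static_cast<char>(0xC0 | (c >> 6));   // 2-byte lead byte
                out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
            }
        }
        return out;
    }
};

// Built on the first call only; C++11 guarantees thread-safe initialization
// of function-local statics.
const CodecTable &codecTable() {
    static const CodecTable table;
    return table;
}

enum class Encoding { Utf8, Latin1 };

// Hypothetical decode entry point: UTF-8 input takes a fast path that never
// constructs the codec table.
std::string decode(const std::string &bytes, Encoding enc) {
    if (enc == Encoding::Utf8)
        return bytes;
    return codecTable().latin1ToUtf8(bytes);
}
```

Calling `decode` with `Encoding::Utf8` any number of times leaves `CodecTable::constructions()` at zero; the first Latin-1 call builds the table once, and later calls reuse it. The point of the design is simply to keep the costly object off the startup path.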

Various other improvements add up as well. For example, Qt 5's C++11 support means that creating Unicode QString objects from string literals (QStringLiteral) requires no heap allocation, and move semantics make moving objects around cheaper.
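A rough, Qt-free illustration of both points in standard C++11. A replaced global `operator new` counts heap allocations (a testing trick, not something to ship); a `static` UTF-16 literal is read without any allocation, which loosely mirrors the idea behind compile-time string literals; and moving a `std::vector` hands over its buffer instead of copying it. The helper names (`copyCounting`, `moveCounting`, `utf16Length`, `kGreeting`) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>
#include <vector>

// Counts every call to the replaced global operator new.
// Illustration only: a single-threaded, non-atomic counter.
static std::size_t g_allocations = 0;

void *operator new(std::size_t size) {
    ++g_allocations;
    if (size == 0)
        size = 1;                    // malloc(0) may return null; ask for 1 byte
    if (void *p = std::malloc(size))
        return p;
    throw std::bad_alloc();
}

void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// UTF-16 text placed in static storage at compile time: reading it never
// touches the heap (a loose analogy, not Qt's implementation).
static const char16_t kGreeting[] = u"Hello, daemon";

// Counts UTF-16 code units up to the terminating zero, without allocating.
std::size_t utf16Length(const char16_t *s) {
    std::size_t n = 0;
    while (s[n] != u'\0')
        ++n;
    return n;
}

// Deep-copies `v`, reporting how many heap allocations the copy needed.
std::vector<int> copyCounting(const std::vector<int> &v, std::size_t &allocs) {
    const std::size_t before = g_allocations;
    std::vector<int> copy(v);              // allocates a fresh buffer
    allocs = g_allocations - before;
    return copy;
}

// Moves out of `v`, reporting how many heap allocations the move needed.
std::vector<int> moveCounting(std::vector<int> &v, std::size_t &allocs) {
    const std::size_t before = g_allocations;
    std::vector<int> moved(std::move(v));  // takes over v's buffer
    allocs = g_allocations - before;
    return moved;
}
```

With a 1,000-element vector, `copyCounting` should report at least one allocation, while `moveCounting` reports none and the moved-to vector ends up owning the original buffer.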

In summary: have fun writing Qt 5-based daemons, and if you have ideas for making the code even faster, we'll see you on Qt's code review :)

(*) Disclaimer: The instruction cost does not tell you how fast the code runs, only how many instructions the CPU executed. Note also that in all cases I measured only main(), ignoring the overhead of the OS's symbol and library resolution, since that can be reduced with prelinking or by forking from a master process.

