Qt + Boost.Regex = QtBoostRegex
January 30, 2009 by sping
Almost everybody uses simple regular expressions from time to time, be it through grep, url_rewrite or QRegExp. Most people seem to stick to a basic set of regex features: alternation, character classes, greedy quantifiers, capturing, grouping and zero-width assertions (e.g. anchors). However, some people, myself included, expect more from a regex engine.
A few weeks before I started working at Qt Software Berlin (Trolltech at that time), I wrote to the qt-interest mailing list asking for QRegExp to be extended with:
- Lookbehind assertions
- A switch making ^ and $ match at every newline, not only at the start and end of the string
- A dot-matches-newline switch
- Callback input (matching text from sources other than plain arrays)
I knew there were existing libraries that could do that, and Boost.Regex actually does a lot more. As a side project, I started working on a Qt wrapper around Boost.Regex so I could feed QStrings to it directly. Boost.Regex is templated on the character type and works with char, wchar_t and more, but it cannot work with QChar directly. It does work with ushort, though, and a QChar is essentially a ushort holding a 16-bit Unicode character.
The wrapper is incomplete, but I did not want it to gather more dust. Maybe it is what a few people have been hoping for, just as I was back then, so I'm releasing it on Qt Labs now. Download the sources from here:
Please note that you need bcp (Boost Copy). On Debian-based systems you can install it with:
$ sudo apt-get install bcp
Things still to do include:
- try to move Boost.Regex "further inside"
- upgrade to a more recent Boost.Regex (e.g. to fix a GCC 4.3.x compile error)
Interestingly, QtBoostRegex outperforms QRegExp in some cases, while in others QRegExp is the clear winner. For details, please see the small test suite in ./examples/test.
Besides the extra features at the regex language level, this wrapper also lets you search non-array input. What does that mean? Usually a regex engine iterates over an array of characters, or over a structure wrapping that array. Boost.Regex adds a layer of abstraction: it can match over any bidirectional iterator. We use that to wrap an arbitrary non-array structure (e.g. the list of lines holding a document in a text editor) inside a bidirectional iterator, which I call a feeder here. The code contains an example of such an iterator wrapping a QStringList: StringListRegexFeeder. Please have a look at the ./examples/simple folder to see how it is used.
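To make the feeder idea more concrete, here is a minimal sketch of such an iterator in standard C++. It is not StringListRegexFeeder: it assumes a std::vector<std::string> in place of a QStringList, uses std::regex in place of Boost.Regex (whose regex_search likewise takes a pair of bidirectional iterators), and assumes a virtual '\n' after every line. The names LineListIterator, LineMatch and searchLines are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical feeder-style iterator: presents a list of lines as one
// character sequence with a virtual '\n' after every line, without
// concatenating the lines into a single buffer.
class LineListIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    // Default construction is part of the iterator requirements;
    // std::sub_match relies on it for unmatched groups.
    LineListIterator() = default;
    LineListIterator(const std::vector<std::string>& lines,
                     std::size_t line, std::size_t column)
        : lines_(&lines), line_(line), column_(column) {}

    // Past-the-end position: just after the last line's virtual '\n'.
    static LineListIterator end(const std::vector<std::string>& lines) {
        return LineListIterator(lines, lines.size(), 0);
    }

    std::size_t line() const { return line_; }
    std::size_t column() const { return column_; }

    reference operator*() const {
        assert(lines_ != nullptr && line_ < lines_->size());
        const std::string& text = (*lines_)[line_];
        // column_ == text.size() denotes the virtual line break.
        return column_ < text.size() ? text[column_] : newline();
    }

    LineListIterator& operator++() {
        if (column_ < (*lines_)[line_].size()) {
            ++column_;            // next character on the same line
        } else {
            ++line_;              // step over the virtual '\n'
            column_ = 0;
        }
        return *this;
    }
    LineListIterator operator++(int) { LineListIterator old = *this; ++*this; return old; }

    LineListIterator& operator--() {
        if (column_ > 0) {
            --column_;
        } else {
            --line_;              // back onto the previous line's '\n'
            column_ = (*lines_)[line_].size();
        }
        return *this;
    }
    LineListIterator operator--(int) { LineListIterator old = *this; --*this; return old; }

    friend bool operator==(const LineListIterator& a, const LineListIterator& b) {
        return a.line_ == b.line_ && a.column_ == b.column_;
    }
    friend bool operator!=(const LineListIterator& a, const LineListIterator& b) {
        return !(a == b);
    }

private:
    static const char& newline() { static const char nl = '\n'; return nl; }

    const std::vector<std::string>* lines_ = nullptr;
    std::size_t line_ = 0;
    std::size_t column_ = 0;
};

// Hypothetical helper: where (and what) the first match is.
struct LineMatch {
    bool found = false;
    std::size_t line = 0;
    std::size_t column = 0;
    std::string text;
};

LineMatch searchLines(const std::vector<std::string>& lines, const std::regex& re) {
    LineListIterator first(lines, 0, 0);
    LineListIterator last = LineListIterator::end(lines);
    std::match_results<LineListIterator> m;
    LineMatch result;
    if (std::regex_search(first, last, m, re)) {
        result.found = true;
        // The match boundaries are LineListIterators, so they carry their position.
        result.line = m[0].first.line();
        result.column = m[0].first.column();
        result.text = m[0].str();
    }
    return result;
}
```

Because the match boundaries are themselves iterator values, the caller gets a (line, column) position for free, and a pattern such as `1;\n//` can match across a line break even though this code never builds a joined copy of the document. Note that the C++ standard only requires std::regex to accept bidirectional iterators; whether a particular library implementation buffers the range internally is up to that implementation.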
To prepare the Boost.Regex sources, please run the script ./extract_boost_regex.sh first; without it, "qmake && make" will not succeed.
To summarize my post:
- QtBoostRegex is a Qt wrapper around Boost.Regex
- Boost.Regex is more powerful than QRegExp but slower in some cases
- QtBoostRegex is not finished yet; I will keep working on it as time permits
Have a nice day, Sebastian