Version 1.5.0.
This library contains a modern implementation of several types of string objects and various algorithms for working with strings.
The goal of the library is to make working with strings in C++ as simple and easy as in many other languages, especially scripting languages, while keeping performance at the level of C and C++, and in some cases improving on it.
It's no secret that working with strings in C++ often causes pain. The std::string class is often inconvenient or inefficient. Many functions that are usually needed when working with strings are simply not there, and everyone has to write them themselves. Even concatenating std::string and std::string_view became possible only with C++26. That's why I started creating this library for myself around 2012, and now I'm ready to share it with all C++ developers.
This library was not designed as an all-in-one tool that "can do everything". I implemented what I needed in my own work, trying to do it as efficiently as possible, and I modestly hope that I have succeeded in some respects and that the library will be useful to other people, either directly or as a source of ideas.
The library contains two parts:
- The header-only part: take the file "include/simstr/strexpr.h" and include it in your code. It provides string expressions, which work with standard strings (std::basic_string, std::basic_string_view), as well as simplified versions of the simple_str and simple_str_nt classes, which implement all those string algorithms of the library that do not require storing or modifying strings. Since this is a header-only part, it does not include working with UTF encodings and simplified Unicode.
- The full version ("include/simstr/sstring.h") adds its own string types with the ability to store and modify strings, and works with UTF encodings and simplified Unicode.
The library does not claim that you can "change the header and everything works better". It gets along well with standard strings and does not change the behavior of existing code that works with them. I tried to make many of its methods compatible with std::string and std::string_view, but did not pursue this rigorously. Rewriting your code to work with simstr will require some effort, but I assure you that it will pay off. And thanks to compatibility with standard strings, this work can be done in stages, in small pieces. Writing new string-handling code with it is easy and enjoyable :)
The main difference between simstr and std::string is that simstr does not use a single universal class for working with strings, but several types of objects, each suited to its own purposes, which also interact well with each other. If you have used std::string_view actively and understand its advantages and disadvantages compared to std::string, then the simstr approach will be clear to you too.
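The split between non-owning and owning string types can be illustrated with the standard library alone. The sketch below does not use simstr: `count_char`, `name_registry` and `usage_example` are hypothetical names chosen for illustration, and std::string_view and std::string stand in for the "view" and "owner" roles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Read-only access: a non-owning view accepts any character source
// (literal, std::string, part of a string) without copying it.
std::size_t count_char(std::string_view text, char ch) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ch) ++n;
    }
    return n;
}

// Storage: an owning string is needed when the characters must outlive
// the caller's buffer. The copy happens once, at the point of storage.
struct name_registry {
    std::vector<std::string> names;
    void add(std::string_view name) { names.emplace_back(name); }
};

// Hypothetical usage, showing the same function fed from different sources.
void usage_example() {
    std::string owned = "a-b-c";
    assert(count_char("banana", 'a') == 3);                          // literal
    assert(count_char(owned, '-') == 2);                             // std::string
    assert(count_char(std::string_view(owned).substr(0, 3), '-') == 1); // piece of a string

    name_registry reg;
    reg.add(owned);
    owned = "changed";                 // the registry keeps its own copy
    assert(reg.names.front() == "a-b-c");
}
```

The view type is cheap to pass around but must not outlive its source, while the owning type pays for a copy and in exchange has an independent lifetime.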
When using only #include "simstr/strexpr.h":
- char, char8_t, char16_t, char32_t, wchar_t.
- simstr string objects and standard strings (std::basic_string, std::basic_string_view), which allows you to use fast concatenation even where it is not yet possible to abandon standard strings. Also allows you to mix strings of compatible character types in operations.
- Splitter iterator.
- str::append(text, "count = "_ss + count + " times").
- 0x, 0, 0b, 0o, admissibility of the + sign. Parsing is implemented for all types of strings and characters.
- char and wchar_t, as well as character types compatible with them in size.
When using the full version of the library:
- sstring (shared string), lstring (local string).
- lstring - supports many mutable operations with strings - various replacements, insertions, deletions, etc. Allows you to set the size for the internal character buffer, which can turn Small String Optimization into Big String Optimization :).
- char <-> char8_t, wchar_t <-> char32_t in Linux, wchar_t <-> char16_t in Windows.
- format and sprintf formatting functions (with automatic buffer increase). Formatting is possible for char, wchar_t strings and strings compatible with them in size. That is, under Windows it is char8_t, char16_t, under Linux - char8_t, char32_t (writing my own formatting library for all types of characters was not part of my plans).
- upper, lower and case-insensitive string comparison. Works only for characters of the first Unicode plane (up to 0xFFFF). Case conversion does not handle cases where one code point converts into several; that is, it corresponds to std::towupper, std::towlower for the unicode locale, only faster and able to work with any type of characters.
- hash map for string type keys, based on std::unordered_map, with the possibility of more efficient storage and comparison of keys compared to std::string keys. Case-insensitive comparison of keys is supported (ASCII or minimal Unicode (see previous paragraph)).
String expressions are special objects that efficiently implement string concatenation using operator+. The key principle behind their efficiency: no matter how many operands the expression contains, no temporary (intermediate) strings are created, the total length of the result is calculated only once, and memory for the result's character buffer is allocated only once, after which the characters are copied directly to their final place in that buffer. No memory reallocations, no moving characters between intermediate buffers - everything is as efficient as possible.
Thanks to the capabilities of C++ templates and operator overloading, the expression is written as close as possible to the usual string addition syntax. In addition, there are special overloads for adding string objects and string literals, strings and numbers, for copying with replacement, for merging containers of strings and much more. The system is extensible, so new ways of building strings can be added, and development is ongoing.
All string objects from simstr are themselves string expressions, that is, they can be used directly in concatenation operations of string expressions. Standard strings (std::basic_string, std::basic_string_view) can also serve as operands in addition operations with string expressions. They can also be easily converted into a string expression by placing a unary + in front of them.
+s1 converts a std::string into a string expression object, which supports efficient concatenation with numbers and string literals.
According to benchmarks, the speedup is 1.6 - 2 times.
A speedup of 9 - 14 times!
ssa is an alias for simple_str<char>, an analogue of std::string_view. It lets a function accept, at minimal cost, any string object that does not need to be modified or passed to a C API: std::string, std::string_view, "string literal", simple_str_nt, sstring, lstring. Since it is also a "string expression", it can take part in concatenations directly.
According to measurements, the speedup is 1.5 - 9 times.
When prec != 0, the speedup is 1.5 - 2.2 times.
The speedup is 1.5 times or more, depending on the content of the strings.
In addition to the individual examples given here, you can look at the sources:
Available with any use:
- simple_str<K> - the simplest string (or piece of string), immutable, not owning, analogue of std::string_view.
- simple_str_nt<K> - the same, only declares that it ends with 0. For working with third-party C-API.
Available when using the entire library:
- sstring<K> - shared string, immutable, owning, with shared character buffer, SSO support.
- lstring<K, N> - local string, mutable, owning, with a specified size of the SSO buffer.
When connecting only strexpr.h, the types simple_str<K> and simple_str_nt<K> do not contain methods for working with UTF and Unicode.
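Why a separate "ends with 0" view type is useful for C APIs can be shown with standard C++ only. The sketch below is not simple_str_nt: `zview`, `c_api_length` and `prefix_view_is_terminated` are hypothetical names, and the class is a minimal stand-in that can only be built from sources known to be 0-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical non-owning view that guarantees a terminating 0,
// because it can only be created from 0-terminated sources.
class zview {
    const char* ptr_;
    std::size_t len_;
public:
    zview(const char* s) : ptr_(s), len_(std::strlen(s)) {}
    zview(const std::string& s) : ptr_(s.c_str()), len_(s.size()) {}
    const char* c_str() const { return ptr_; }
    std::size_t size() const { return len_; }
    // Dropping the guarantee is always safe, so this conversion is implicit.
    operator std::string_view() const { return {ptr_, len_}; }
};

// Stand-in for a third-party C function that expects a 0-terminated string.
std::size_t c_api_length(zview s) { return std::strlen(s.c_str()); }

// A plain view of part of a string is not terminated where the view ends:
// here strlen reads on to the end of the literal (11 characters, not 5).
bool prefix_view_is_terminated() {
    std::string_view whole = "hello world";
    std::string_view prefix = whole.substr(0, 5);
    return std::strlen(prefix.data()) == prefix.size();
}
```

A function taking `zview` documents in its signature that it needs a 0-terminated string, so a substring view cannot be passed to it by accident.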
The library can be used partially, just by taking the file "include/simstr/strexpr.h" and including it in your sources.
This brings in only string expressions and simplified implementations of simple_str and simple_str_nt, without UTF and Unicode functions.
The full version of the simstr library consists of three header files and two source files. You can connect it as a CMake project via add_subdirectory (the simstr library), or simply add the files to your project. Building also requires simdutf (when using CMake it is downloaded automatically).
The library is also available in vcpkg under the name orefkov-simstr.
simstr requires a compiler that supports at least the C++20 standard, since concepts and std::format are used. It has been tested under Windows with MSVC-19 and Clang-19, and under Linux with GCC-13 and Clang-21. It has also been tested in WASM, built with Emscripten 4.0.6, Clang-21.
Together with the library, two files are supplied that make viewing simstr string objects in debuggers more convenient. More details are described here.
Benchmarks are performed using the Google Benchmark framework. I tried to take measurements for the most typical operations that occur in normal work. I took measurements on my own hardware, under Windows and Linux (in WSL), using the MSVC, Clang and GCC compilers. Third-party results are welcome. I also took measurements in WASM, built with Emscripten. Note that Emscripten produces a 32-bit build for WASM, which means that the SSO buffers in objects are smaller.
Also, simstr is used in my projects: