String trimming utilities
Posted: Thu Sep 23, 2021 3:30 pm
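A lot of string manipulation occurs in C++ coding, and it is helpful to have a few routines handy to aid in manipulating strings.
One of the most common tasks is dealing with whitespace (space, tab, vertical tab, form feed, carriage return, newline). In C++ speak, that set is the whitespace constant defined in the first code block below.
It is common to trim whitespace off the end of a string (trim the right of the string); this is what rtrim() does.
It searches for the last non-whitespace character and adds one to its index (to get the length of the string to extract). If the string contains no non-whitespace character at all, find_last_not_of() returns npos, and adding one wraps around to 0, so an empty string is returned.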
So for instance, if the string was "hello ", find_last_not_of() would return 4, the position of the last non-whitespace character. One is added to it to get the length of the string we want to extract, 5 (remember, positions are zero-indexed).
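To make the index arithmetic concrete, here is a minimal sketch. The helper name keep_length and the constant ws are made up for this illustration and are not part of the routines in this post; the function only computes the length that would be passed to substr().

```cpp
#include <cstddef>
#include <string>

// Assumed name for this sketch; same character set as the post's whitespace constant.
const std::string ws = " \t\v\f\r\n";

// Hypothetical helper: the number of leading characters a right trim would keep.
std::size_t keep_length(const std::string &value) {
    // Position of the last non-whitespace character, plus one.
    // When nothing is found, npos + 1 wraps around to 0 (unsigned arithmetic).
    return value.find_last_not_of(ws) + 1;
}
```

With this, keep_length("hello ") gives 5 and a string made only of spaces gives 0, which is why the right trim needs no explicit npos check.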
We can do something similar for trimming the front of a string (the left of the string), which is ltrim() in the code below.
We find the first character that isn't whitespace; that index is where the substring starts. We need to ensure a non-whitespace character was actually found, hence the check for npos. If none was found, we return an empty string.
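As a sketch of why that npos check matters: substr() throws std::out_of_range when the start position is past the end of the string, and npos always is. The names ltrim_unchecked and throws_out_of_range below are hypothetical, written only to demonstrate the failure case.

```cpp
#include <stdexcept>
#include <string>

// Assumed name for this sketch; same character set as the post's whitespace constant.
const std::string ws = " \t\v\f\r\n";

// Hypothetical: a left trim WITHOUT the npos check.
std::string ltrim_unchecked(const std::string &value) {
    // If the string is empty or all whitespace, this passes npos to substr().
    return value.substr(value.find_first_not_of(ws));
}

// Hypothetical helper: reports whether the unchecked version throws for this input.
bool throws_out_of_range(const std::string &value) {
    try {
        ltrim_unchecked(value);
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```

An all-whitespace or empty input makes the unchecked version throw, while a string such as "  hi" is trimmed normally.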
So trimming a string on both sides becomes as easy as calling ltrim() on the result of rtrim(); see trim() below.
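Composing the two functions creates one intermediate string. If that ever matters, a variation is to find both ends first and take a single substring. This is only a sketch; trim_once and ws are names invented here, not part of the routines in this post.

```cpp
#include <string>

// Assumed name for this sketch; same character set as the post's whitespace constant.
const std::string ws = " \t\v\f\r\n";

// Hypothetical single-substring variant of trimming both sides.
std::string trim_once(const std::string &value) {
    const auto first = value.find_first_not_of(ws);
    if (first == std::string::npos) {
        return std::string();  // empty or all whitespace
    }
    // A non-whitespace character exists, so this search cannot return npos.
    const auto last = value.find_last_not_of(ws);
    return value.substr(first, last - first + 1);
}
```

For most uses the composed version is easier to read, so I would only reach for this one if profiling pointed at the extra copy.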
<cctype> provides a function called isspace(int c), which uses the current C locale to check for whitespace. We use that for the next case, where we want to "simplify" a string. Simplify trims a string (removes all leading and trailing whitespace) and replaces every run of whitespace between words with a single space. So if you had "this is \t\t\t a test ", after simplification it would be "this is a test".
It first trims the string. Then it checks whether each character is whitespace. If it is not, the character is appended. If it is whitespace and it is the first of a run, a single space is appended; otherwise it is ignored.
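For comparison, the same result can be had by letting a string stream split the input into words and joining them with single spaces. This is a swapped-in technique, not the method used in this post, and simplify_words is a made-up name.

```cpp
#include <sstream>
#include <string>

// Hypothetical alternative: split on whitespace with operator>>, join with ' '.
std::string simplify_words(const std::string &value) {
    std::istringstream in(value);
    std::string word;
    std::string result;
    // operator>> skips leading whitespace and reads one word at a time.
    while (in >> word) {
        if (!result.empty()) {
            result += ' ';
        }
        result += word;
    }
    return result;
}
```

It trades the explicit flag for stream overhead, so treat it as a readability option and not a speed-up.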
A simpler version, though perhaps slower, uses the regex functions (the last code block below).
It uses the functions in <regex>. Depending on string sizes, I believe the non-regex version is faster, but I include the regex example here to show how other substitutions can be done.
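As one example of another substitution in the same style, here is a sketch that tidies the spacing around commas. The function name normalize_commas and the pattern are my own choices for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace any whitespace around a comma with ", ".
std::string normalize_commas(const std::string &input) {
    // \s*,\s* matches a comma plus any whitespace on either side of it.
    static const std::regex re("\\s*,\\s*");
    return std::regex_replace(input, re, ", ");
}
```

For instance, "a ,b,   c" becomes "a, b, c". The regex is declared static so it is compiled only once, since constructing a std::regex is relatively expensive.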
Code:
#include <string>
const std::string whitespace = " \t\v\f\r\n";
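Code:
//=====================================================================
std::string rtrim(const std::string &value) {
    // Index of the last non-whitespace character, plus one, is the length to keep.
    // If there is none, npos + 1 wraps around to 0 and an empty string is returned.
    auto loc = value.find_last_not_of(whitespace) + 1;
    return value.substr(0, loc);
}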
Code:
//=====================================================================
std::string ltrim(const std::string &value) {
    // Index of the first non-whitespace character, or npos if there is none.
    auto loc = value.find_first_not_of(whitespace);
    if (loc == std::string::npos) {
        return std::string();
    }
    return value.substr(loc);
}
Code:
//=====================================================================
std::string trim(const std::string &value) {
    return ltrim(rtrim(value));
}
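Code:
//=====================================================================
// Requires <cctype> for std::isspace.
std::string simplify(const std::string &value) {
    auto temp = trim(value);
    auto append = false;  // true while the previous character was not whitespace
    std::string rvalue;
    for (auto ch : temp) {
        // Cast through unsigned char: passing a negative char to isspace is undefined.
        if (!std::isspace(static_cast<unsigned char>(ch))) {
            append = true;
            rvalue += ch;
        }
        else {
            // Only the first whitespace character of a run produces a space.
            if (append) {
                rvalue += ' ';
                append = false;
            }
        }
    }
    return rvalue;
}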
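Code:
// Alternative to the loop-based simplify() above (same signature, so keep only one of the two).
// Requires <regex>.
std::string simplify(const std::string &input) {
    // \s+ so that a single tab or newline is also replaced by a space;
    // trim() removes the space left at either end.
    static const std::regex re("\\s+");
    return trim(std::regex_replace(input, re, " "));
}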