
String trimming utilities

Posted: Thu Sep 23, 2021 3:30 pm
by punt
A lot of string manipulation occurs in C++ coding, and it is helpful to have a few routines handy to aid in manipulating strings.

One of the most common tasks is dealing with whitespace (space, tab, vertical tab, form feed, carriage return, newline). Or in C++ speak:

Code: Select all

const std::string whitespace = " \t\v\f\r\n" ;
It is common to trim whitespace off the end of a string (trim the right of the string):

Code: Select all

	//=====================================================================
	std::string rtrim(const std::string &value) {
		auto loc = value.find_last_not_of(whitespace) + 1;
		return value.substr(0,loc);
	}
This searches for the last non-whitespace character and adds one to its index (to create the length of the string to extract).
So for instance, if the string was "hello ", find_last_not_of() would return 4, the location of the last non-whitespace character. One is added to it to create the length of the string we want to extract (remember, locations are zero indexed). If the string is empty or all whitespace, find_last_not_of() returns std::string::npos; adding one to npos wraps around to 0, so substr(0,0) returns an empty string.
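
For anyone who would rather not rely on npos + 1 wrapping around to zero, here is a minimal sketch of the same routine with the "nothing but whitespace" case written out. The name rtrim_explicit is made up for this example; it should behave the same as rtrim above.

```cpp
#include <string>

const std::string whitespace = " \t\v\f\r\n";

// Hypothetical variant of rtrim: the all-whitespace case is handled
// explicitly instead of through unsigned wrap-around of npos + 1.
std::string rtrim_explicit(const std::string &value) {
	auto loc = value.find_last_not_of(whitespace);
	if (loc == std::string::npos) {
		// No non-whitespace character at all (this also covers the empty string).
		return std::string();
	}
	// loc is a zero-based index, so loc + 1 is the number of characters to keep.
	return value.substr(0, loc + 1);
}
```

Whether the extra check is worth it is a matter of taste; the wrap-around is well defined for unsigned types, it is just easy to miss when reading.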

We can do something similar for trimming the front of a string (left of the string):

Code: Select all

	//=====================================================================
	std::string ltrim(const std::string &value) {
		auto loc = value.find_first_not_of(whitespace) ;
		if (loc == std::string::npos){
			return std::string();
		}
		return value.substr(loc);
	}
We find the first character that isn't whitespace; its index is where the substring starts. We need to ensure a non-whitespace character was found, thus the check for npos. If none was found, we return an empty string.

So trimming a string on both sides becomes as easy as:

Code: Select all

	//=====================================================================
	std::string trim(const std::string &value){
		return ltrim(rtrim(value));
	}
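
The version above makes two copies (one in rtrim, one in ltrim). If that ever matters, one possible alternative is to find both ends first and call substr() once. This is only a sketch, and trim_once is a made-up name, not part of the utilities in this post.

```cpp
#include <string>

const std::string whitespace = " \t\v\f\r\n";

// Hypothetical single-copy trim: locate both ends, then extract once.
std::string trim_once(const std::string &value) {
	auto first = value.find_first_not_of(whitespace);
	if (first == std::string::npos) {
		// Empty or all whitespace.
		return std::string();
	}
	// A non-whitespace character exists, so this search cannot return npos.
	auto last = value.find_last_not_of(whitespace);
	// last - first + 1 is the count of characters from first through last.
	return value.substr(first, last - first + 1);
}
```

For short strings the difference is unlikely to be noticeable, so the two-call version is fine as a default.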
Now there is in <cctype> a function called isspace(int c). This uses the current C locale to check for whitespace. We use that for the next case, where we want to "simplify" a string. Simplify trims a string (removes all leading/trailing whitespace), and replaces each run of whitespace between words of the string with a single space. So if you had "this is \t\t\t a test ", after simplification it would be "this is a test".

Code: Select all

	//=====================================================================
	std::string simplify(const std::string &value) {
		auto temp = trim(value) ;
		auto append = false ;
		std::string rvalue ;
		for (auto &ch : temp){
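			// Cast through unsigned char: passing a negative char value to isspace() is undefined behaviour
			if (!std::isspace(static_cast<unsigned char>(ch))){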
				append = true ;
				rvalue += ch;
			}
			else {
				if (append) {
					rvalue += ' ';
					append = false ;
				}
			}
		}
		return rvalue ;
	}
It first trims the string. Then it checks whether each character is whitespace. If it is not, the character is appended. If it is whitespace and it is the first of a run, a single space is appended; otherwise it is ignored.
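
One thing to watch with isspace(): plain char may be signed, and handing isspace() a negative value (other than EOF) is undefined behaviour, so the usual advice is to convert to unsigned char first. Below is a rough sketch of a wrapper plus a variant of the loop that holds the space back until the next word arrives, so it does not need the trim() call up front. Both is_space and simplify_sketch are names invented for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: go through unsigned char so characters with the
// high bit set never reach std::isspace() as negative values.
inline bool is_space(char ch) {
	return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Hypothetical variant of simplify: the separating space is only written
// once the next non-whitespace character shows up.
std::string simplify_sketch(const std::string &value) {
	std::string rvalue;
	bool pending = false;	// a whitespace run was seen after some text
	for (char ch : value) {
		if (is_space(ch)) {
			// Leading whitespace is ignored because rvalue is still empty.
			pending = !rvalue.empty();
		} else {
			if (pending) {
				rvalue += ' ';
				pending = false;
			}
			rvalue += ch;
		}
	}
	// A trailing run leaves pending set but is never written out.
	return rvalue;
}
```

The result should match the trim-then-collapse approach; which one reads better is up to you.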

Now a simpler version, but perhaps slower, is to use the regex functions:

Code: Select all

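	// Requires #include <regex>
	std::string simplify(const std::string& input){
		// \s+ matches every run of whitespace, so a lone tab or newline is replaced too
		static const std::regex re("\\s+");
		// Trim first so no single space is left at either end
		return std::regex_replace(trim(input), re, " ");
	}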
This uses the regex functions in <regex>. Depending on string sizes, I believe the non-regex version is faster, but I include the regex example here as an illustration of how to do other substitutions.
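
As one example of another substitution, the same std::regex_replace() call can be used to trim both ends by replacing leading or trailing whitespace with nothing. This is just a sketch; regex_trim is a made-up name, and I would expect it to be slower than the find_first_not_of/find_last_not_of versions.

```cpp
#include <regex>
#include <string>

// Hypothetical regex-based trim: remove whitespace anchored at the start
// of the input or anchored at the end of the input.
std::string regex_trim(const std::string &input) {
	// static so the pattern is compiled once, not on every call
	static const std::regex re("^\\s+|\\s+$");
	return std::regex_replace(input, re, "");
}
```

Strings with embedded newlines may be worth testing on your own compiler, since how ^ and $ treat line breaks is not something I would assume is identical everywhere.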

Re: String trimming utilities

Posted: Thu Sep 23, 2021 3:38 pm
by punt
So, putting it all together, with a header, in a namespace of strutil, we have:

StringUtility.hpp

Code: Select all

#ifndef StringUtility_hpp
#define StringUtility_hpp

#include <string>

namespace strutil {
	extern const std::string whitespace ;
	
	//=====================================================================
	// Trim utilities
	//=====================================================================
	std::string ltrim(const std::string &value) ;
	std::string rtrim(const std::string &value) ;
	std::string trim(const std::string &value);
	std::string simplify(const std::string &value);
}
#endif /* StringUtility_hpp */

and the implementation
StringUtility.cpp

Code: Select all

#include "StringUtility.hpp"

#include <algorithm>
#include <cctype>

namespace strutil {
	//=====================================================================
	// Define the constant "whitespace"
	// Whitespace is space,tab,vertical tab,form feed,newline,carriage return
	const std::string whitespace = " \t\v\f\n\r";

	//=====================================================================
	// Trim utilities
	//=====================================================================
	//=====================================================================
	std::string rtrim(const std::string &value) {
		auto loc = value.find_last_not_of(whitespace) + 1;
		return value.substr(0,loc);
	}

	//=====================================================================
	std::string ltrim(const std::string &value) {
		auto loc = value.find_first_not_of(whitespace) ;
		if (loc == std::string::npos){
			return std::string();
		}
		return value.substr(loc);

	}
	
	//=====================================================================
	std::string trim(const std::string &value){
		return ltrim(rtrim(value));
	}

	//=====================================================================
	std::string simplify(const std::string &value) {
		auto temp = trim(value) ;
		auto append = false ;
		std::string rvalue ;
		for (auto &ch : temp){
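			// Cast through unsigned char: passing a negative char value to isspace() is undefined behaviour
			if (!std::isspace(static_cast<unsigned char>(ch))){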
				append = true ;
				rvalue += ch;
			}
			else {
				if (append) {
					rvalue += ' ';
					append = false ;
				}
			}
		}
		return rvalue ;
	}
}