Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


visual c++ - How can I implement an efficient whole-word string replacement in C++ without regular expressions?

Perhaps I'm overlooking something obvious, but I was wondering what the fastest way to implement whole-word string replacement in C++ might be. At first I considered simply concatenating spaces to the search word, but this misses matches at the start or end of the string and matches adjacent to punctuation.

This is my current abstraction for (non-whole-word) replacement:

void Replace(wstring& input, wstring find, wstring replace_with) {
  if (find.empty() || find == replace_with || input.length() < find.length()) {
    return;
  }
  for (size_t pos = input.find(find);
       pos != wstring::npos;
       pos = input.find(find, pos)) {
    input.replace(pos, find.length(), replace_with);
    pos += replace_with.length();
  }
}

If I only consider spaces as word boundaries, I could probably implement this by comparing the beginning and end of the input string against the find string to cover the string boundaries, and then following with a Replace(L' ' + find + L' ')... for everything in between. But I was wondering if there is a more elegant solution that also handles punctuation efficiently.

Let's consider a word to be any sequence of characters that is delimited by either whitespace or punctuation (to keep it simple, let's say !"#$%&'()*+,-./ at minimum, which together with the space character happen to correspond to (c > 31 && c < 48)).
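
As a rough sketch of that definition, a boundary test might look like the following. The name 'is_word_boundary' is hypothetical, and treating other whitespace (tab, newline and so on) as a boundary is an assumption layered on top of the (c > 31 && c < 48) range from the question.

```cpp
#include <cwctype> // std::iswspace, std::wint_t

// Hypothetical helper: true if 'c' separates words under the definition above.
// Code points 32..47 cover the space character and !"#$%&'()*+,-./
// Other whitespace is added through std::iswspace (an assumption, and its
// result for non-ASCII characters depends on the current C locale).
// Letters and digits, including non-ASCII letters, are not boundaries.
inline bool is_word_boundary(wchar_t c)
{
  return (c > 31 && c < 48)
      || std::iswspace(static_cast<std::wint_t>(c)) != 0;
}
```

Under this test a digit is part of a word, so "cat" inside "cat5" would not count as a whole word.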

In my application I have to call this function over a rather large array of short strings, which may include various Unicode characters that should not be treated as word boundaries. I would also like to avoid including any external libraries, but the STL is fine.

The purpose of not using regular expressions is the promise of less overhead and the goal of a fast function suited to this particular task over a large dataset.


1 Reply


I think you can do this, both doing whole-word matching and doing it efficiently. The key is to:

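  • detect "whole-word" boundaries with a character-classification function. For 'wstring' input the right call is 'std::iswalpha' from <cwctype> rather than 'std::isalpha', whose behavior is undefined for values that do not fit in an 'unsigned char'. How non-ASCII characters are classified depends on the current C locale, and note that digits count as boundaries under this test.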
  • do the replace "out of place" by creating a separate 'output' string that you swap with 'input' at the end of processing, instead of doing the work "in place" on the 'input' string itself.

Here's my take on your function:

#include <cstddef> // size_t
#include <cwctype> // iswalpha
#include <string> // wstring

using std::size_t;
using std::wstring;

/// @brief Do a "find and replace" on a string.
/// @note This function does "whole-word" matching: a match counts only if
///       the characters on both sides are not alphabetic (per the C locale).
/// @param[in,out] input_string The string to operate on.
/// @param[in] find_string The string to find in the input.
/// @param[in] replace_string The string to replace 'find_string'
///            with in the input.
void find_and_replace( wstring& input_string,
                       const wstring& find_string,
                       const wstring& replace_string )
{
  if( find_string.empty()
      || find_string == replace_string
      || input_string.length() < find_string.length() )
  {
    return;
  }

  wstring output_string;
  output_string.reserve( input_string.length() );
  size_t last_pos = 0u; // start of the input not yet copied to the output
  for( size_t new_pos = input_string.find( find_string );
       new_pos != wstring::npos;
       new_pos = input_string.find( find_string, new_pos ) )
  {
    const size_t match_end = new_pos + find_string.length();
    if( ( new_pos == 0u
          || !std::iswalpha( input_string.at( new_pos - 1u ) ) )
        && ( match_end == input_string.length()
             || !std::iswalpha( input_string.at( match_end ) ) ) )
    {
      // Whole-word match: copy the untouched text before it, then the replacement.
      output_string.append( input_string, last_pos, new_pos - last_pos );
      output_string.append( replace_string );
      last_pos = match_end;
      new_pos = match_end;
    }
    else
    {
      // Not a whole word: step one character so that a later,
      // overlapping candidate is not skipped.
      ++new_pos;
    }
  }
  output_string.append( input_string, last_pos,
                        input_string.length() - last_pos );

  input_string.swap( output_string );
}
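
If the alphabetic test is not the boundary rule you want (the question asks for whitespace and ASCII punctuation only, so that digits and non-ASCII characters stay inside words), the same out-of-place approach can take the boundary test as a parameter. This is only a sketch: the name 'replace_whole_words' and the predicate parameter are hypothetical, not part of the function above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant: whole-word replace with a caller-supplied boundary
// test. 'is_boundary' is any callable taking a wchar_t and returning bool.
template <typename IsBoundary>
void replace_whole_words(std::wstring& input,
                         const std::wstring& find,
                         const std::wstring& replace_with,
                         IsBoundary is_boundary)
{
  if (find.empty() || find == replace_with || input.length() < find.length()) {
    return;
  }
  std::wstring output;
  output.reserve(input.length());
  std::size_t copied_up_to = 0; // start of the input not yet copied to 'output'
  std::size_t pos = input.find(find);
  while (pos != std::wstring::npos) {
    const std::size_t end = pos + find.length();
    const bool left_ok = (pos == 0) || is_boundary(input[pos - 1]);
    const bool right_ok = (end == input.length()) || is_boundary(input[end]);
    if (left_ok && right_ok) {
      // Whole-word match: copy the gap before it, then the replacement.
      output.append(input, copied_up_to, pos - copied_up_to);
      output.append(replace_with);
      copied_up_to = end;
      pos = input.find(find, end);
    } else {
      // Not a whole word: step one character so overlapping candidates are seen.
      pos = input.find(find, pos + 1);
    }
  }
  output.append(input, copied_up_to, std::wstring::npos); // remaining tail
  input.swap(output);
}
```

A call such as replace_whole_words(s, L"cat", L"dog", [](wchar_t c) { return c > 31 && c < 48; }) would then use the character range from the question as the boundary rule. The lambda syntax assumes a C++11 compiler; a plain function pointer works as well.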

P.S. I was unsure what 'replace_all' was trying to accomplish in your initial example, so I removed it from my solution for clarity.

P.P.S. This code would be much cleaner with regular expressions. Can you rely on C++ TR1 or C++11 functionality? C++11 provides a standard <regex> library, and TR1 offers the same facility in namespace 'std::tr1'.
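
For comparison, a regex-based version might look roughly like this, assuming C++11 <regex> is available. The helper names are hypothetical. Note that '\b' follows the regex notion of a word character (letters, digits and underscore as classified by the regex locale), which is not the same boundary rule as the question's, and it only behaves as expected when the search text begins and ends with word characters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape ECMAScript metacharacters so that
// 'text' is matched literally.
inline std::wstring escape_for_regex(const std::wstring& text)
{
  static const std::wstring special = L"\\^$.|?*+()[]{}";
  std::wstring escaped;
  escaped.reserve(text.length() * 2);
  for (wchar_t c : text) {
    if (special.find(c) != std::wstring::npos) {
      escaped.push_back(L'\\');
    }
    escaped.push_back(c);
  }
  return escaped;
}

// Hypothetical helper: whole-word replace using '\b' word-boundary anchors.
// Caveat: '$' sequences in 'replace_with' are interpreted by the regex
// format rules rather than copied literally.
inline std::wstring regex_replace_whole_words(const std::wstring& input,
                                              const std::wstring& find,
                                              const std::wstring& replace_with)
{
  if (find.empty()) {
    return input;
  }
  const std::wregex pattern(L"\\b" + escape_for_regex(find) + L"\\b");
  return std::regex_replace(input, pattern, replace_with);
}
```

Building a 'std::wregex' on every call is likely to cost more than the hand-written loop, so for a large array of short strings it would make sense to construct the pattern once and reuse it.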

