Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - OS X sed -E doesn't accept extended regular expressions

I've been trying various ways to do some basic things with sed on OS X. Here are the results of some simple tests.

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed 's/foo/FOUND/g'

returns (as expected)

FOUND bar 2011-03-17 17:31:47 FOUND bar

but

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed -E 's/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/FOUND/g'

returns

foo bar 2011-03-17 17:31:47 foo bar

and (even more irritatingly)

echo "food bar 2011-03-17 17:31:47 food bar" | sed -E 's/\d/FOUND/g'

returns

fooFOUND bar 2011-03-17 17:31:47 fooFOUND bar

Now, the man sed pages say that

The following options are available:

 -E      Interpret regular expressions as extended (modern) regular
         expressions rather than basic regular expressions (BRE's).  The
         re_format(7) manual page fully describes both formats.

and man re_format says

          \d  Matches a digit character.  This is equivalent to
          `[[:digit:]]'.

And indeed:

echo "foo bar 2011-03-17 17:31:47 foo bar" | sed -E 's/[[:digit:]]{4}/FOUND/g'

gives me

foo bar FOUND-03-17 17:31:47 foo bar

...but this is annoying. Either I'm being dense, or the man pages are lying to me (to be honest, I'd prefer the former).

A quick literature review here on SO suggests that I am not alone in this, and that many recommend installing GNU coreutils (or indeed using something else, say perl -pe). However, I'd like to be certain:

Do EREs work with sed as it is bundled with OS X -- as implied by the man pages -- or not?

(I'm on 10.8 and 10.6.8)


1 Reply


On macOS, \d is part of a regex feature set called enhanced features - note the distinction in name: enhanced, which is NOT the same as extended.

Instead, enhanced features are a separate dimension from basic vs. extended, which can be activated for both basic and extended regexes. In other words: you can have enhanced basic regexes as well as enhanced extended regexes.

However, it appears that whether enhanced features are available in a given utility is precompiled into it; in other words: a given utility either supports enhanced features or it doesn't - no option can change that. (Options only allow you to choose between basic and extended, such as -E for sed and grep.)

For a description of all enhanced features, see section ENHANCED FEATURES in man re_format.

It should also be noted that if POSIX compatibility is important, enhanced features should be avoided with sed.

There are POSIX utilities, such as awk, that do support EREs (extended regular expressions), but (a) the POSIX spec has to state so explicitly, and (b) the syntax is limited to POSIX EREs, which are less powerful than the EREs offered by specific platforms.


In practice:

Sadly, the man pages for the various utilities do NOT state whether a given utility supports enhanced regex features, so it comes down to trial and error.
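The trial-and-error step can be scripted. The following is only a sketch: probe_backslash_d is a hypothetical helper name (not part of any tool), and it assumes the command under test accepts a sed-style s/// script as its last argument.

```shell
# Hypothetical helper: reports how a sed-like command interprets \d.
# Usage: probe_backslash_d COMMAND [OPTIONS...]
probe_backslash_d() {
  # Feed the line "d1" and substitute the first match of \d with X.
  # If the command fails, treat its output as empty.
  out=$(printf 'd1\n' | "$@" 's/\d/X/' 2>/dev/null) || out=''
  case $out in
    dX) echo 'digit class' ;;  # \d matched the digit 1
    X1) echo 'literal d' ;;    # \d matched the letter d
    *)  echo 'unknown' ;;      # error or some other interpretation
  esac
}

probe_backslash_d sed -E
```

Given the results shown in the question, the macOS sed discussed here would be expected to report "literal d". Other sed implementations may treat \d differently again, which is what the "unknown" branch is for.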

As of macOS 10.15:

macOS sed does NOT support enhanced features, which explains the OP's experience.

  • E.g., sed -E 's/\d//g' <<<'a10' has no effect, because \d isn't recognized as representing a digit (only [[:digit:]] is).
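For the timestamp in the question, a version that sticks to POSIX bracket expressions might look like the sketch below. The variable names input, d, and result are illustrative only; the pattern itself uses nothing beyond [[:digit:]] and ERE interval expressions, which the question already shows working with -E.

```shell
# [[:digit:]] is a POSIX character class, so it does not depend on
# enhanced features being compiled into sed.
input='foo bar 2011-03-17 17:31:47 foo bar'
d='[[:digit:]]'

# ${d}{4} expands to [[:digit:]]{4}; the braces after the variable are literal.
result=$(printf '%s\n' "$input" |
  sed -E "s/${d}{4}-${d}{2}-${d}{2} ${d}{2}:${d}{2}:${d}{2}/FOUND/g")

printf '%s\n' "$result"   # -> foo bar FOUND foo bar
```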

I have found only one utility that supports enhanced features, namely grep:

grep    -o '\d\+' <<<'a10' # -> '10' - enhanced basic regex
grep -E -o '\d+'  <<<'a10' # -> '10' - enhanced extended regex

If you know of others that do, please let us know.

