Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
596 views
in Technique[技术] by (71.8m points)

php - Split a text into sentences

How can I split a text into an array of sentences?

Example text:

Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End

Should output:

0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End

I tried some solutions that I've found on SO through search, but they all fail, especially at the 4th sentence.

/(?<=[!?.])./

/.|?|!/

/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]"))(s|
)(?="?[A-Z])/

/(?<=[.!?]|[.!?]['"])s+/    // <- closest one
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since you want to "split" sentences why are you trying to match them ?

For this case let's use preg_split().

Code:

$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])s+(?=[a-z])/i', $str);
print_r($sentences);

Output:

Array
(
    [0] => Fry me a Beaver.
    [1] => Fry me a Beaver!
    [2] => Fry me a Beaver?
    [3] => Fry me Beaver no. 4?!
    [4] => Fry me many Beavers...
    [5] => End
)

Explanation:

Well to put it simply we are spliting by grouped space(s) s+ and doing two things:

  1. (?<=[.?!]) Positive look behind assertion, basically we search if there is a point or question mark or exclamation mark behind the space.

  2. (?=[a-z]) Positive look ahead assertion, searching if there is a letter after the space, this is kind of a workaround for the no. 4 problem.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...