Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

excel - Use Regex to Split Numbered List array into Numbered List Multiline

I am trying to learn regex in order to answer a question on Stack Overflow in Portuguese.

Input (an array or a string in a cell, so .MultiLine = False):

 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With number 0n mid. 4. Number 9 incorrect. 11.12 More than one digit. 12.7 Ending (no word).

Output

 1 One without dot.
 2. Some Random String.
 3.1 With SubItens.
 3.2 With number 0n mid.
 4. Number 9 incorrect.
 11.12 More than one digit.
 12.7 Ending (no word).

My idea was to use Regex with Split, but I wasn't able to implement the following example in Excel.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "plum-pear"
      Dim pattern As String = "(-)" 

      Dim substrings() As String = Regex.Split(input, pattern)    ' Split on hyphens.
      For Each match As String In substrings
         Console.WriteLine("'{0}'", match)
      Next
   End Sub
End Module
' The method writes the following to the console:
'    'plum'
'    '-'
'    'pear' 
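
For comparison, here is a minimal sketch of the same behavior in Python. Python is used only as an illustration and is not part of the original example. As with .NET's Regex.Split, Python's re.split keeps the separator in the result when the pattern contains a capturing group, and drops it otherwise.

```python
import re

# Same input and pattern as the VB.NET example above.
parts_with_sep = re.split(r"(-)", "plum-pear")    # capturing group: separator is kept
parts_without_sep = re.split(r"-", "plum-pear")   # no group: separator is dropped

print(parts_with_sep)     # ['plum', '-', 'pear']
print(parts_without_sep)  # ['plum', 'pear']
```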

So, after reading this and this, I used the RegExr website with the expression /([0-9]{1,2})([.]{0,1})([0-9]{0,2})/igm on the input.

And the following is obtained:

[Image: RegExr screenshot]
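
The matches can also be reproduced outside RegExr. The following is a minimal Python sketch (Python is an assumption made for illustration; the i, g and m flags are left out because they make no difference for this input). It lists every match of the expression on the input string, and the list includes the stray 0 from "0n" and the 9 from "Number 9", which are not list numbers.

```python
import re

# The input string from the question.
text = (" 1 One without dot. 2. Some Random String. 3.1 With SubItens. "
        "3.2 With number 0n mid. 4. Number 9 incorrect. "
        "11.12 More than one digit. 12.7 Ending (no word).")

# The expression tried on RegExr, without the surrounding slashes and flags.
pattern = re.compile(r"([0-9]{1,2})([.]{0,1})([0-9]{0,2})")

# group(0) is the whole match, regardless of the three capturing groups.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
# ['1', '2.', '3.1', '3.2', '0', '4.', '9', '11.12', '12.7']
```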

Is there a better way to do this? Is the regex correct, or is there a better way to build it? The examples I found on Google didn't show me how to use RegEx with Split correctly.

Maybe I am confused about the logic of the Split function: I wanted to get the split indices, with the regex acting as the separator string.

1 Reply

I can make it so that each item ends with a word and a period

Use

\d+(?:\.\d+)*[\s\S]*?\w+\.

See the regex demo.

Details

  • \d+ - 1 or more digits
  • (?:\.\d+)* - zero or more sequences of:
    • \. - a literal dot
    • \d+ - 1 or more digits
  • [\s\S]*? - any 0+ chars, as few as possible, up to the first...
  • \w+\. - 1+ word chars followed by a literal dot
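
To see how the pieces above work together, here is a minimal sketch in Python (an assumption made for illustration; the pattern uses only syntax that VBScript.RegExp also accepts). Instead of splitting, it collects every match, so each match is one list item. The names ITEM, text and items are made up for this sketch.

```python
import re

# The pattern described above: number, optional sub-numbers, then as few
# characters as possible up to the first word followed by a dot.
ITEM = re.compile(r"\d+(?:\.\d+)*[\s\S]*?\w+\.")

text = (" 1 One without dot. 2. Some Random String. 3.1 With SubItens. "
        "3.2 With Another SubItem. 4. List item. 11.12 More than one digit.")

items = ITEM.findall(text)  # no capturing groups, so whole matches are returned
for item in items:
    print(item)
# 1 One without dot.
# 2. Some Random String.
# 3.1 With SubItens.
# 3.2 With Another SubItem.
# 4. List item.
# 11.12 More than one digit.
```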

Here is some sample VBA code:

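Sub PrintListItems() ' the Sub name is arbitrary
  Dim str As String
  Dim objRegExp As Object
  Dim objMatches As Object
  Dim m As Object
  str = " 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With Another SubItem. 4. List item. 11.12 More than one digit."
  ' Late binding; with a reference to "Microsoft VBScript Regular Expressions 5.5" you can use New RegExp instead
  Set objRegExp = CreateObject("VBScript.RegExp")
  objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
  objRegExp.Global = True ' find all matches, not only the first one
  Set objMatches = objRegExp.Execute(str)
  If objMatches.Count <> 0 Then
    For Each m In objMatches
        Debug.Print m.Value ' one list item per line in the Immediate window
    Next
  End If
End Sub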

NOTE

You may require the matches to stop only at a word + . that is followed by 0+ whitespace chars and then a number or the end of the string, using \d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$)).

The (?=\s*(?:\d+|$)) positive lookahead requires the presence of 0+ whitespaces (\s*) followed with 1+ digits (\d+) or end of string ($) immediately to the right of the current location.
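
The difference between the two patterns shows up when an item contains more than one sentence. Below is a minimal Python sketch (Python and the sample string are assumptions made for illustration; they are not from the original post). The first pattern stops at the first word + dot, while the lookahead version keeps going until the dot is followed by a number or the end of the string.

```python
import re

LOOSE = re.compile(r"\d+(?:\.\d+)*[\s\S]*?\w+\.")
STRICT = re.compile(r"\d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$))")

# Hypothetical input: the first item consists of two sentences.
sample = "1 First. Still first. 2 Second."

loose_items = LOOSE.findall(sample)
strict_items = STRICT.findall(sample)

print(loose_items)   # ['1 First.', '2 Second.']  ("Still first." is lost)
print(strict_items)  # ['1 First. Still first.', '2 Second.']
```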

