Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
326 views
in Technique[技术] by (71.8m points)

Is this the RegEx for matching any cell reference in an Excel formula?

I have been trying to create a regular expressions pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.

I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:

  • Formulas are preceded with an equal sign "="
  • Strings within formulas are enclosed in double quotes and need to be removed before looking for real references, otherwise =A1&"A1" would break regex
  • Worksheet names can be up to 31 characters long, excluding / ? * [ ] :
  • Worksheet names in external references must be succeeded with bang =Sheet1!A1
  • Workbook names in external references must be enclosed in square brackets =[Book1.xlsx]Sheet1!A1
  • Workbook paths, which Excel adds if a reference is to a range in a closed workbook, are always enclosed in single quotes and to the left of the brackets for the workbook name 'C:[Book1.xlsx]Sheet1'!A1
  • Some characters (non-breaking space, for example) cause Excel to enclose the workbook and worksheet name in an external reference in single quotes, but I don't know specifically which characters ='[Book 1.xlsx]Sheet 1'!A1
  • Even if R1C1-notation is enabled, Range.Formula still returns references in A1-notation. Range.FormulaR1C1 returns references in R1C1 notation.
  • 3D reference style allows a range of sheet names on one workbook =SUM([Book5]Sheet1:Sheet3!A1)
  • Named ranges can be specified in formulas:
  • The first character of a name must be a letter, an underscore character (_), or a backslash (). Remaining characters in the name can be letters, numbers, periods, and underscore characters.
  • You cannot use the uppercase and lowercase characters "C", "c", "R", or "r" as a defined name, because they are all used as a shorthand for selecting a row or column for the currently selected cell when you enter them in a Name or Go To text box.
  • Names cannot be the same as a cell reference, such as Z$100 or R1C1.
  • Spaces are not allowed as part of a name.
  • A name can be up to 255 characters in length.
  • Names can contain uppercase and lowercase letters. Excel does not distinguish between uppercase and lowercase characters in names.

Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:

Sub ReturnFormulaReferences()

    Dim objRegExp As New VBScript_RegExp_55.RegExp
    Dim objCell As Range
    Dim objStringMatches As Object
    Dim objReferenceMatches As Object
    Dim objMatch As Object
    Dim intReferenceCount As Integer
    Dim intIndex As Integer
    Dim booIsReference As Boolean
    Dim objName As Name
    Dim booNameFound As Boolean

    With objRegExp
        .MultiLine = True
        .Global = True
        .IgnoreCase = True
    End With

    For Each objCell In Selection.Cells
        If Left(objCell.Formula, 1) = "=" Then

            objRegExp.Pattern = """.*"""
            Set objStringMatches = objRegExp.Execute(objCell.Formula)

            objRegExp.Pattern = "('.*([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}'!" _
            & "|([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}!)?" _
            & "($?[a-z]{1,3}$?[0-9]{1,7}(:$?[a-z]{1,3}$?[0-9]{1,7})?" _
            & "|$[a-z]{1,3}:$[a-z]{1,3}" _
            & "|[a-z]{1,3}:[a-z]{1,3}" _
            & "|$[0-9]{1,7}:$[0-9]{1,7}" _
            & "|[0-9]{1,7}:[0-9]{1,7}" _
            & "|[a-z_\][a-z0-9_.]{0,254})"
            Set objReferenceMatches = objRegExp.Execute(objCell.Formula)

            intReferenceCount = 0
            For Each objMatch In objReferenceMatches
                intReferenceCount = intReferenceCount + 1
            Next

            Debug.Print objCell.Formula
            For intIndex = intReferenceCount - 1 To 0 Step -1
                booIsReference = True
                For Each objMatch In objStringMatches
                    If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
                    And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
                        booIsReference = False
                        Exit For
                    End If
                Next

                If booIsReference Then
                    objRegExp.Pattern = "('.*([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}'!" _
                    & "|([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}!)?" _
                    & "($?[a-z]{1,3}$?[0-9]{1,7}(:$?[a-z]{1,3}$?[0-9]{1,7})?" _
                    & "|$[a-z]{1,3}:$[a-z]{1,3}" _
                    & "|[a-z]{1,3}:[a-z]{1,3}" _
                    & "|$[0-9]{1,7}:$[0-9]{1,7}" _
                    & "|[0-9]{1,7}:[0-9]{1,7})"
                    If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
                        objRegExp.Pattern = "^('.*([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}'!" _
                        & "|([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}!)" _
                        & "[a-z_\][a-z0-9_.]{0,254}$"
                        If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
                            booNameFound = False
                            For Each objName In objCell.Worksheet.Parent.Names
                                If objReferenceMatches(intIndex).Value = objName.Name Then
                                    booNameFound = True
                                    Exit For
                                End If
                            Next
                            If Not booNameFound Then
                                objRegExp.Pattern = "^('.*([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}'!" _
                                & "|([.*])?([^:\/?*[]]{1,31}:)?[^:\/?*[]]{1,31}!)"
                                For Each objName In objCell.Worksheet.Names
                                    If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
                                        booNameFound = True
                                        Exit For
                                    End If
                                Next
                            End If
                            booIsReference = booNameFound
                        End If
                    End If
                End If

                If booIsReference Then
                    Debug.Print "  " & objReferenceMatches(intIndex).Value _
                    & " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
                    & objReferenceMatches(intIndex).Length & ")"
                End If
            Next intIndex
            Debug.Print

        End If
    Next

    Set objRegExp = Nothing
    Set objStringMatches = Nothing
    Set objReferenceMatches = Nothing
    Set objMatch = Nothing
    Set objCell = Nothing
    Set objName = Nothing

End Sub

Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.

stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula

Thanks jtolle!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...