Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - How to check if string is a valid XML element name?

I need a regex or a function in PHP that will check whether a string is a valid XML element name.

From w3schools:

XML elements must follow these naming rules:

  1. Names can contain letters, numbers, and other characters
  2. Names cannot start with a number or punctuation character
  3. Names cannot start with the letters xml (or XML, or Xml, etc)
  4. Names cannot contain spaces

I can write a basic regex that checks rules 1, 2 and 4, but it won't account for all the punctuation that is allowed, and it won't cover rule 3:

\w[\w0-9-]

Friendly Update

Here is a more authoritative source for well-formed XML element names:

Names and Tokens

NameStartChar   ::=
    ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
    [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | 
    [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | 
    [#x10000-#xEFFFF]

NameChar    ::=
    NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

Name    ::=
    NameStartChar (NameChar)*

A separate rule, stated in prose rather than in the grammar, is also specified:

Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
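
For what it's worth, the productions above can be transcribed almost one-to-one into a PCRE character class. This is only a rough sketch: the function name matchesXmlNameGrammar is invented for illustration, and it assumes UTF-8 input and PHP 7 or later.

```php
<?php
// Sketch: direct transcription of the Name production into PCRE.
// Assumes UTF-8 input; the function name is hypothetical.
function matchesXmlNameGrammar(string $name): bool
{
    // NameStartChar, written as the inside of a character class.
    $start = 'A-Za-z_:'
        . '\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}'
        . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}'
        . '\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}'
        . '\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';

    // NameChar adds "-", ".", digits and a few combining/extender ranges.
    $rest = $start . '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';

    // u: treat pattern and subject as UTF-8; \A and \z anchor the whole string.
    $pattern = '/\A[' . $start . '][' . $rest . ']*\z/u';

    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example, invalid UTF-8 in $name).
    return preg_match($pattern, $name) === 1;
}

var_dump(matchesXmlNameGrammar('foo-1.2')); // bool(true)
var_dump(matchesXmlNameGrammar('b:c'));     // bool(true)
var_dump(matchesXmlNameGrammar('1foo'));    // bool(false)
var_dump(matchesXmlNameGrammar('foo bar')); // bool(false)
```

This covers the Name production only. It does not apply the reserved "xml" prefix rule, and it knows nothing about namespace constraints on prefixed names.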


1 Reply


If you want to create valid XML, use the DOM extension. This way you don't have to bother with any regex. If you pass an invalid name to the DOMElement constructor, it throws a DOMException.

function isValidXmlName($name)
{
    try {
        // The constructor validates the name and throws on an invalid one.
        new DOMElement($name);
        return TRUE;
    } catch(DOMException $e) {
        return FALSE;
    }
}

This will give

var_dump( isValidXmlName('foo') );      // true   valid localName
var_dump( isValidXmlName(':foo') );     // true   valid localName
var_dump( isValidXmlName(':b:c') );     // true   valid localName
var_dump( isValidXmlName('b:c') );      // false  assumes QName

and is likely good enough for what you want to do.

Pedantic note 1

Note the distinction between localName and QName. ext/dom assumes you are using a namespaced element if there is a prefix before the colon, which adds constraints on how the name may be formed. Technically, though, b:b is a valid name under the grammar quoted in the question, because ":" is a NameStartChar and every NameStartChar is also a NameChar. If you want to accept such names, change the function to

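function isValidXmlName($name)
{
    try {
        new DOMElement(
            $name,
            null,
            // A colon after the first character means a prefix, so supply a
            // dummy namespace URI; '' (rather than null) means "no namespace"
            // and avoids the null-to-string deprecation on PHP 8.1+.
            strpos($name, ':') >= 1 ? 'http://example.com' : ''
        );
        return TRUE;
    } catch(DOMException $e) {
        return FALSE;
    }
}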

Pedantic note 2

Note that element names may start with "xml": the spec quoted in the question only reserves such names, it does not forbid them. W3schools (which is not affiliated with the W3C) apparently got this part wrong (wouldn't be the first time). If you really want to exclude names starting with xml, add

if(stripos($name, 'xml') === 0) return false;

before the try/catch.
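
Put together, the stricter variant could look roughly like the following. It is a sketch that assumes ext/dom is loaded; the name isValidNonReservedXmlName is made up here to keep it apart from the functions above.

```php
<?php
// Sketch: DOM-based validation plus the optional "xml" prefix check.
// The function name is hypothetical; requires the DOM extension.
function isValidNonReservedXmlName($name)
{
    // Stricter than the spec requires: reject the reserved prefix "xml"
    // in any letter case (stripos is case-insensitive).
    if (stripos($name, 'xml') === 0) {
        return false;
    }

    try {
        // Throws DOMException if $name is not an acceptable element name.
        new DOMElement($name);
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

var_dump(isValidNonReservedXmlName('foo'));    // bool(true)
var_dump(isValidNonReservedXmlName('XmlFoo')); // bool(false)
var_dump(isValidNonReservedXmlName('1foo'));   // bool(false)
```

The strict === 0 comparison matters: stripos returns false when "xml" does not occur at all, and a loose comparison would treat that the same as a match at position 0.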

