Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
362 views
in Technique[技术] by (71.8m points)

operators - Valid identifier characters in Scala

One thing I find quite confusing is knowing which characters and combinations I can use in method and variable names. For instance

val #^ = 1 // legal
val #  = 1 // illegal
val +  = 1 // legal
val &+ = 1 // legal
val &2 = 1 // illegal
val £2 = 1 // legal
val ?  = 1 // legal

As I understand it, there is a distinction between alphanumeric identifiers and operator identifiers. You can mix an match one or the other but not both, unless separated by an underscore (a mixed identifier).

From Programming in Scala section 6.10,

An operator identifier consists of one or more operator characters. Operator characters are printable ASCII characters such as +, :, ?, ~ or #.

More precisely, an operator character belongs to the Unicode set of mathematical symbols(Sm) or other symbols(So), or to the 7-bit ASCII characters that are not letters, digits, parentheses, square brackets, curly braces, single or double quote, or an underscore, period, semi-colon, comma, or back tick character.

So we are excluded from using ()[]{}'"_.;, and `

I looked up Unicode mathematical symbols on Wikipedia, but the ones I found didn't include +, :, ? etc. Is there a definitive list somewhere of what the operator characters are?

Also, any ideas why Unicode mathematical operators (rather than symbols) do not count as operators?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Working from the EBNF syntax in the spec:

upper ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
lower ::= ‘a’ | ... | ‘z’ and Unicode category Ll
letter ::= upper | lower and Unicode categories Lo, Lt, Nl
digit ::= ‘0’ | ... | ‘9’
opchar ::= “all other characters in u0020-007F and Unicode
            categories Sm, So except parentheses ([]) and periods”

But also taking into account the very beginning on Lexical Syntax that defines:

Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
Delimiter characters ‘‘’ | ‘’’ | ‘"’ | ‘.’ | ‘;’ | ‘,’

Here is what I come up with. Working by elimination in the range u0020-007F, eliminating letters, digits, parentheses and delimiters, we have for opchar... (drumroll):

! # % & * + - / : < = > ? @ ^ | ~ and also Sm and So - except for parentheses and periods.

(Edit: adding valid examples here:). In summary, here are some valid examples that highlights all cases - watch out for in the REPL, I had to escape as \:

val !#%&*+-/:<=>?@^|~ = 1 // all simple opchars
val simpleName = 1 
val withDigitsAndUnderscores_ab_12_ab12 = 1 
val wordEndingInOpChars_!#%&*+-/:<=>?@^|~ = 1
val !^?? = 1 // opchars ans symbols
val abcαβγ_!^?? = 1 // mixing unicode letters and symbols

Note 1:

I found this Unicode category index to figure out Lu, Ll, Lo, Lt, Nl:

  • Lu (uppercase letters)
  • Ll (lowercase letters)
  • Lo (other letters)
  • Lt (titlecase)
  • Nl (letter numbers like roman numerals)
  • Sm (symbol math)
  • So (symbol other)

Note 2:

val #^ = 1 // legal   - two opchars
val #  = 1 // illegal - reserved word like class or => or @
val +  = 1 // legal   - opchar
val &+ = 1 // legal   - two opchars
val &2 = 1 // illegal - opchar and letter do not mix arbitrarily
val £2 = 1 // working - £ is part of Sc (Symbol currency) - undefined by spec
val ?  = 1 // legal   - part of Sm

Note 3:

Other operator-looking things that are reserved words: _ : = => <- <: <% >: # @ and also u21D2 ? and u2190


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...