Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Why is the terminology of labels and levels in factors so weird?

An example of a non-settable function would be labels. You can only set factor labels when they are created with the factor function. There is no labels<- function. Not that 'labels' and 'levels' in factors make any sense....

> fac <- factor(1:3, labels=c("one", "two", "three"))
> fac
[1] one   two   three
Levels: one two three
> labels(fac)
[1] "1" "2" "3"

OK, I asked for labels, which one might assume were as set by the factor call, but I get something quite ... what's the word, unintuitive?

> levels(fac)
[1] "one"   "two"   "three"

So it appears that setting labels is really setting levels.

> fac <- factor(1:3, levels=c("one", "two", "three"))
> levels(fac)
[1] "one"   "two"   "three"

OK that is as expected. So what are labels when one sets levels?

> fac <- factor(1:3, levels=c("one", "two", "three"), labels=c("x", "y", "z"))
> labels(fac)
[1] "1" "2" "3"
> levels(fac)
[1] "x" "y" "z"

Effing weird, if you ask me. It would seem that 'labels' arguments for factor trump any 'levels' arguments for the specification of levels. Why should this be? Seems like a confused terminology. And why does labels() return what I would have imagined to be retrieved with as.character(as.numeric(fac))?

(This was a tangential comment [labelled as such] in an earlier answer about assignment functions, which I was asked to move to a separate question. So here's your opportunity to enlighten me.)


1 Reply


I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is this: levels tells R which values to look for in the input (x) and what order to use for the levels of the resulting factor object, while labels changes the values of those levels after the input has been coded as a factor. As Tommy's answer suggests, no part of the factor object returned by factor() is called labels; there are just the levels, which have been adjusted by the labels argument (clear as mud).

For example:

> f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
> f
[1] <NA> <NA> c  
Levels: c d e
> str(f)
Factor w/ 3 levels "c","d","e": NA NA 1

Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
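
To make the matching step easier to inspect, here is a small sketch that rebuilds the same factor and looks at its integer codes and level counts. It uses only base R; the calls to table() and droplevels() are my additions for illustration and are not part of the original example.

```r
# Same call as above: "a" and "b" are not in `levels`, so they become NA
f <- factor(x = c("a", "b", "c"), levels = c("c", "d", "e"))

as.integer(f)   # underlying integer codes: NA NA 1
is.na(f)        # TRUE TRUE FALSE
nlevels(f)      # 3, even though only "c" occurs in the data
table(f)        # "c" has count 1; "d" and "e" are listed with count 0

# droplevels() discards levels that never occur in the data
levels(droplevels(f))   # "c"
```

The unused levels stay attached to the factor until they are dropped explicitly, which is why "d" and "e" still show up in the printed Levels line.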

Now with labels:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
> f
[1] <NA> <NA> C   
Levels: C D E

After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
> f
[1] <NA> <NA> a   
Levels: a b c
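
One way to untangle this example is to compare the integer codes with and without the labels argument. The sketch below uses only base R; the variable names f_plain and f_relabel are made up for illustration.

```r
x <- c("a", "b", "c")

f_plain   <- factor(x, levels = c("c", "d", "e"))
f_relabel <- factor(x, levels = c("c", "d", "e"), labels = c("a", "b", "c"))

# Matching against `levels` decides the integer codes; `labels` only renames.
identical(as.integer(f_plain), as.integer(f_relabel))   # TRUE

# The surviving value reads "a" only because level "c" was renamed to "a".
as.character(f_relabel)   # NA NA "a"

# The object stores $levels and $class; there is no $labels component.
names(attributes(f_relabel))
```

So the "a" in the result has nothing to do with the "a" in the input: the input "a" was already turned into NA before any relabelling happened.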

Another way to think about it is that factor(x,levels=L1,labels=L2) is equivalent to

f <- factor(x,levels=L1)
levels(f) <- L2
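
The claimed equivalence can be checked directly. This is a minimal sketch for the simple case where L2 has the same length as L1 and contains no duplicates; the concrete values of x, L1 and L2 are taken from the earlier example, and the names one_step and two_step are made up for illustration.

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: match against L1, then relabel to L2 inside factor()
one_step <- factor(x, levels = L1, labels = L2)

# Two steps: match against L1 first, then overwrite the levels
two_step <- factor(x, levels = L1)
levels(two_step) <- L2

identical(one_step, two_step)   # TRUE
```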

I think an appropriately phrased version of this example might be nice for Pat Burns's The R Inferno; there are plenty of factor puzzles in section 8.2, but not this particular one ...

