Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
270 views
in Technique[技术] by (71.8m points)

null - Why should someone use {} for initializing an empty object in R?

It seems that some programmers are using:

a = {}
a$foo = 1
a$bar = 2

What is the benefit over a = list(foo = 1, bar = 2)?

Why should {} be used? This expression only returns NULL, so a NULL assignment would do the same, wouldn't it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your first query

Why should {} be used, this expression only returns NULL, so an NULL assignment would do the same, wouldn't it?

Yes, a <- NULL gives the same effect. Using {} is likely to be a personal style.


NULL

NULL is probably the most versatile and confusing R object. From R language definition of NULL:

It is used whenever there is a need to indicate or specify that an object is absent. It should not be confused with a vector or list of zero length.

The NULL object has no type and no modifiable properties. There is only one NULL object in R, to which all instances refer. To test for NULL use is.null. You cannot set attributes on NULL.

Strictly speaking, NULL is just NULL. And it is the only thing that is.null returns TRUE. However, according to ?NULL:

Objects with value NULL can be changed by replacement operators and will be coerced to the type of the right-hand side.

So, while it is not identical to a length-0 vector with a legitimate mode (not all modes in R are allowed in a vector; read ?mode for the full list of modes and ?vector for what are legitimate for a vector), this flexible coercion often makes it behave like a length-0 vector:

## examples of atomic mode
integer(0)  ## vector(mode = "integer", length = 0)
numeric(0)  ## vector(mode = "numeric", length = 0)
character(0)  ## vector(mode = "character", length = 0)
logical(0)  ## vector(mode = "logical", length = 0)

## "list" mode
list()  ## vector(mode = "list", length = 0)

## "expression" mode
expression()  ## vector(mode = "expression", length = 0)

You can do vector concatenation:

c(NULL, 0L)  ## c(integer(0), 0L)
c(NULL, expression(1+2))  ## c(expression(), expression(1+2))
c(NULL, list(foo = 1))  ## c(list(), list(foo = 1))

You can grow a vector (as you did in your question):

a <- NULL; a[1] <- 1; a[2] <- 2
## a <- numeric(0); a[1] <- 1; a[2] <- 2

a <- NULL; a[1] <- TRUE; a[2] <- FALSE
## a <- logical(0); a[1] <- TRUE; a[2] <- FALSE

a <- NULL; a$foo <- 1; a$bar <- 2
## a <- list(); a$foo <- 1; a$bar <- 2

a <- NULL; a[1] <- expression(1+1); a[2] <- expression(2+2)
## a <- expression(); a[1] <- expression(1+1); a[2] <- expression(2+2)

Using {} to generate NULL is similar to expression(). Though not identical, the run-time coercion when you later do something with it really makes them indistinguishable. For example, when growing a list, any of the following would work:

a <- NULL; a$foo <- 1; a$bar <- 2
a <- numeric(0); a$foo <- 1; a$bar <- 2  ## there is a warning
a <- character(0); a$foo <- 1; a$bar <- 2  ## there is a warning
a <- expression(); a$foo <- 1; a$bar <- 2
a <- list(); a$foo <- 1; a$bar <- 2

For a length-0 vector with an atomic mode, a warning is produced during run-time coercion (because the change from "atomic" to "recursive" is too significant):

#Warning message:
#In a$foo <- 1 : Coercing LHS to a list

We don't get a warning for expression setup, because from ?expression:

As an object of mode ‘"expression"’ is a list ...

Well, it is not a "list" in the usual sense; it is an abstract syntax tree that is list-alike.


Your second query

What is the benefit over a = list(foo = 1, bar = 2)?

There is no advantage in doing so. You should have already read elsewhere that growing objects is a bad practice in R. A random search on Google gives: growing objects and loop memory pre-allocation.

If you know the length of the vector as well as the value of its each element, create it directly, like a = list(foo = 1, bar = 2).

If you know the length of the vector but its elements' values are to be computed (say by a loop), set up a vector and do fill-in, like a <- vector("list", 2); a[[1]] <- 1; a[[2]] <- 2; names(a) <- c("foo", "bar").


In reply to Tjebo

I actually looked up ?mode, but it doesn't list the possible modes. It points towards ?typeof which then points to the possible values listed in the structure TypeTable in src/main/util.c. I have not been able to find this file not even the folder (OSX). Any idea where to find this?

It means the source of an R distribution, which is a ".tar.gz" file on CRAN. An alternative is look up on https://github.com/wch/r-source. Either way, this is the table:

TypeTable[] = {
    { "NULL",       NILSXP     },  /* real types */
    { "symbol",     SYMSXP     },
    { "pairlist",   LISTSXP    },
    { "closure",    CLOSXP     },
    { "environment",    ENVSXP     },
    { "promise",    PROMSXP    },
    { "language",   LANGSXP    },
    { "special",    SPECIALSXP },
    { "builtin",    BUILTINSXP },
    { "char",       CHARSXP    },
    { "logical",    LGLSXP     },
    { "integer",    INTSXP     },
    { "double",     REALSXP    }, /*-  "real", for R <= 0.61.x */
    { "complex",    CPLXSXP    },
    { "character",  STRSXP     },
    { "...",        DOTSXP     },
    { "any",        ANYSXP     },
    { "expression", EXPRSXP    },
    { "list",       VECSXP     },
    { "externalptr",    EXTPTRSXP  },
    { "bytecode",   BCODESXP   },
    { "weakref",    WEAKREFSXP },
    { "raw",        RAWSXP },
    { "S4",     S4SXP },
    /* aliases : */
    { "numeric",    REALSXP    },
    { "name",       SYMSXP     },

    { (char *)NULL, -1     }
};

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...