I'm going through Machine Learning for Hackers, and I am stuck at this line:
from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
Which generates the following error:
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
This is a traceback():
> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "
")
stop("with piece ", i, ":
", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
}(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
The priority.train object is a data frame, and here is more info:
> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date" "From.EMail" "Subject" "Message" "Path"
> sapply(priority.train, mode)
Date From.EMail Subject Message Path
"list" "character" "character" "character" "character"
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt"
$From.EMail
[1] "character"
$Subject
[1] "character"
$Message
[1] "character"
$Path
[1] "character"
> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame': 1250 obs. of 5 variables:
$ Date : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
$ From.EMail: chr "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
$ Subject : chr "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
$ Message : chr "
Hello,
I just installed redhat 7.2 and I think I have everything
working properly. Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you
downloaded the file. Also it might complain of a few depen"| __truncated__ "Lance wrote:
>Make sure you rebuild as root and you're in the directory that you
>downloaded the file. Also it might compl"| __truncated__ "Once upon a time, rob wrote :
> I dl'd gcc3 and libgcc3, but I still get the same error message when I
> try rpm --rebuil"| __truncated__ ...
$ Path : chr "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = """, na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = """, na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
I would post a sample, but the content is a bit long and I don't think the content is relevant here.
The same error also happens here:
> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
Does anyone have a clue on what's going on here? The error seems to be generated by a different object than priority.train, because its names attribute apparently has 9 elements.
I'd appreciate any help. Thanks!
Problem solved
I've found the problem thanks to @user1317221_G's tip of using the dput function. The problem is with the Date field, which is at this point a list that contains 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To solve the problem I've simply converted the dates into character vectors, used ddply then converted the dates back to Date:
> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
See Question&Answers more detail:
os