I have the following problem: I converted a corpus into a dfm, and this dfm contains some zero-entry documents (documents with no remaining features) that I need to remove before fitting an LDA model. I would usually do this as follows:
OutDfm <- dfm_trim(
  dfm(corpus,
      tolower = TRUE,
      remove = c(stopwords("english"), stopwords("german"),
                 stopwords("french"), stopwords("italian")),
      remove_punct = TRUE,
      remove_numbers = TRUE,
      remove_separators = TRUE,
      stem = TRUE,
      verbose = TRUE),
  min_docfreq = 5
)
Creating a dfm from a corpus input...
... lowercasing
... found 272,912 documents, 112,588 features
... removed 613 features
... stemming features (English), trimmed 27,491 feature variants
... created a 272,912 x 84,515 sparse dfm
... complete.
Elapsed time: 78.7 seconds.
# remove zero-entry documents
raw.sum <- apply(OutDfm, 1, FUN = sum)   # per-document feature totals
which(raw.sum == 0)                      # inspect which documents are empty
OutDfm <- OutDfm[raw.sum != 0, ]         # keep only non-empty documents
However, when I try to perform these last operations I get:

Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

hinting at the fact that the matrix is too large to be manipulated.
Has anyone encountered and solved this issue before? Is there an alternative strategy for removing the zero entries?
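For context, the direction I have been considering as a workaround is to avoid apply() altogether, since it appears to coerce the sparse dfm into a dense matrix. A minimal sketch, assuming quanteda's ntoken() and dfm_subset() (and the sparse-aware rowSums()) operate on the dfm without densifying it:

library(quanteda)

# sketch of a possible workaround (assumes these helpers stay sparse):
# drop documents with zero remaining tokens without converting to dense

# option 1: ntoken() gives per-document token counts on the sparse dfm
OutDfm <- dfm_subset(OutDfm, ntoken(OutDfm) > 0)

# option 2: rowSums() has a sparse method, so this also avoids apply()
# OutDfm <- OutDfm[rowSums(OutDfm) != 0, ]

I am not sure whether this actually avoids the Cholmod limit on a 272,912 x 84,515 dfm, so any confirmation or a better approach would be welcome.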
Thanks a lot!