python - Adding pandas columns to a sparse matrix

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have additional derived values for X variables that I want to use in my model.

from sklearn.model_selection import train_test_split

XAll = pd_data[['title', 'wordcount', 'sumscores', 'length']]
y = pd_data['sentiment']
X_train, X_test, y_train, y_test = train_test_split(XAll, y, random_state=1)

Because title holds text, I first convert it to a document-term matrix (dtm) separately:

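from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(max_df=0.5)
vect.fit(X_train['title'])
X_train_dtm = vect.transform(X_train['title'])
column_index = X_train_dtm.indices  # column indices of the non-zero entries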

print(type(X_train_dtm))                             # <class 'scipy.sparse.csr.csr_matrix'>
print("X_train_dtm shape", X_train_dtm.get_shape())  # (856, 2016)
print("column index:", column_index)                 # column index: [ 533  754  859 ...,  633  950 1339]

Now that I have the text as a document-term matrix, I would like to add the other, numeric features ('wordcount', 'sumscores', 'length') to X_train_dtm. I would then build the model on the new dtm, which I expect to be more accurate because it includes the additional features.

How do I add additional numeric columns of the pandas dataframe to a sparse csr matrix?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:58:55+0000

Found the solution. We can do this using sparse.hstack:

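import numpy as np
from scipy.sparse import hstack
# Reshape the column to (n_samples, 1) so it can sit next to the dtm; hstack returns a COO matrix
X_train_dtm = hstack((X_train_dtm, np.array(X_train['wordcount'])[:, None]))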
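
The snippet above adds only 'wordcount'. As a rough sketch of how the same idea might extend to all three numeric columns, and to the test split as well, something like the following could work. The toy data, the helper name add_numeric, and the choice of format="csr" are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for the asker's X_train / X_test (values are made up).
X_train = pd.DataFrame({
    "title": ["good movie", "bad movie", "great plot"],
    "wordcount": [2, 2, 2],
    "sumscores": [0.8, -0.6, 0.9],
    "length": [10, 9, 10],
})
X_test = pd.DataFrame({
    "title": ["good plot"],
    "wordcount": [2],
    "sumscores": [0.5],
    "length": [9],
})

numeric_cols = ["wordcount", "sumscores", "length"]

# Fit the vocabulary on the training titles only, then reuse it for the test titles.
vect = CountVectorizer()
X_train_dtm = vect.fit_transform(X_train["title"])
X_test_dtm = vect.transform(X_test["title"])


def add_numeric(dtm, frame, cols):
    """Hypothetical helper: append the numeric columns of `frame` to the right of `dtm`."""
    extra = csr_matrix(frame[cols].to_numpy(dtype=float))
    # format="csr" asks hstack for a CSR result instead of its default COO output.
    return hstack([dtm, extra], format="csr")


X_train_full = add_numeric(X_train_dtm, X_train, numeric_cols)
X_test_full = add_numeric(X_test_dtm, X_test, numeric_cols)
print(X_train_full.shape, X_test_full.shape)
```

The numeric columns end up as the last three columns of the stacked matrix, in the order given by numeric_cols, so the train and test matrices keep the same column layout.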
