Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Use Scikit Learn to do linear regression on a time series pandas data frame

I'm trying to do a simple linear regression on a pandas data frame using scikit-learn's LinearRegression. My data is a time series, and the data frame has a datetime index:

                value
2007-01-01    0.771305
2007-02-01    0.256628
2008-01-01    0.670920
2008-02-01    0.098047

Doing something as simple as

from sklearn import linear_model

lr = linear_model.LinearRegression()

lr.fit(data.index, data['value'])

didn't work:

float() argument must be a string or a number

So I tried to create a new column with the dates to try to transform it:

data['date'] = data.index
data['date'] = pd.to_datetime(data['date'])
lr.fit(data['date'], data['value'])

but now I get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

So the regressor can't handle datetime. I saw a bunch of ways to convert integer data to datetime, but couldn't find a way to convert from datetime to integer, for example.

What is the proper way to do this?

PS: I'm interested in using scikit because I'm planning on doing more stuff with it later, so no statsmodels for now.



1 Reply


You probably want something like the number of days since the start to be your predictor here. Assuming everything is sorted:

In [36]: X = (df.index - df.index[0]).days.to_numpy().reshape(-1, 1)

In [37]: y = df['value'].values

In [38]: linear_model.LinearRegression().fit(X, y)
Out[38]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
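For reference, here is a self-contained sketch of the steps above, using the four sample rows from the question. The variable names (df, model, slope_per_day) are illustrative, and the .to_numpy() call assumes a reasonably recent pandas, where .days returns an Index rather than a plain NumPy array.

```python
import pandas as pd
from sklearn import linear_model

# Sample data copied from the question, with a DatetimeIndex.
df = pd.DataFrame(
    {"value": [0.771305, 0.256628, 0.670920, 0.098047]},
    index=pd.to_datetime(["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"]),
).sort_index()

# Days elapsed since the first observation, shaped as a 2-D column
# of shape (n_samples, 1), which is what scikit-learn expects for X.
X = (df.index - df.index[0]).days.to_numpy().reshape(-1, 1)
y = df["value"].to_numpy()

model = linear_model.LinearRegression().fit(X, y)

# The slope is expressed per day because X is measured in days.
slope_per_day = model.coef_[0]
intercept = model.intercept_
```

The fitted slope is the change in value per day; model.intercept_ is the estimated value on the first date in the index.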

The exact units you use for the predictor don't really matter; it could be days or months. The coefficient and its interpretation change with the units, but as long as one unit is a constant multiple of the other, the fitted values come out the same. Also, notice that we needed reshape(-1, 1) because scikit-learn expects X to be a 2-D array of shape (n_samples, n_features).
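The point about units can be checked with a small sketch. It fits the question's sample data once with days and once with weeks (days divided by 7, chosen here because it is an exact rescaling), then predicts for a new date. The names fit_days, fit_weeks and new_date are made up for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

index = pd.to_datetime(["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"])
y = np.array([0.771305, 0.256628, 0.670920, 0.098047])

# The same predictor in two different units.
days = (index - index[0]).days.to_numpy().reshape(-1, 1).astype(float)
weeks = days / 7.0

fit_days = LinearRegression().fit(days, y)
fit_weeks = LinearRegression().fit(weeks, y)

# Fitted values agree; only the scale of the coefficient differs.
same_predictions = np.allclose(fit_days.predict(days), fit_weeks.predict(weeks))
slope_ratio = fit_weeks.coef_[0] / fit_days.coef_[0]

# To predict for a new date, measure it from the same origin, in the same unit.
new_date = pd.Timestamp("2008-03-01")
new_X = np.array([[(new_date - index[0]).days]], dtype=float)
prediction = fit_days.predict(new_X)[0]
```

The weekly slope is seven times the daily slope, while both models give the same fitted values. The one thing to keep consistent is the origin: new dates must be converted relative to the same starting date used for fitting.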

