Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

in Technique by (71.8m points)

python - Is it cheating to change the random state for test-train split to get the best r2 score?

I notice that my r2 score (for the test set) changes significantly when I play around with the random state of the train-test split for a linear regression model. The scores can change by orders of magnitude, fluctuating between -10^4 and 0.97. While the score is better for some random states, I can't help wondering whether picking one of them is cheating. After all, shouldn't a good model work for all selections of test and training data? By selecting a random state that works, aren't we really creating a model that works best for the given test data and may not work equally well for data points we see in the future?

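import random
from sklearn.model_selection import train_test_split
state = random.randint(0, 100)  # a different seed, and so a different split, on each run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=state)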


1 Reply

by (71.8m points)

Short answer: Yes, it's cheating.

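In principle, you must not tune any parameter, including the random state of the split, in order to increase your test score. Once you pick a setting because it scores well on the test set, the test score stops being an unbiased estimate of how the model will do on new data.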
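Rather than choosing one seed, a common alternative is to report the score over many different splits. The sketch below is only an illustration under assumptions: the data is synthetic (generated with make_regression as a stand-in for your own X and y), and the choice of 5 folds repeated 10 times is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5 folds repeated 10 times gives 50 different train/test partitions.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

# Report the whole distribution instead of the single best split.
print(f"R^2 over {len(scores)} splits: mean={scores.mean():.3f}, std={scores.std():.3f}")
print(f"worst={scores.min():.3f}, best={scores.max():.3f}")
```

If the spread between the worst and best split is large on your real data, that spread is the honest result to report, not the best value.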

Furthermore, what this tells you is that your model is not very good: its performance depends heavily on which rows end up in the training set, and that shouldn't happen. You are probably overfitting. Compare the train score with the test score; if they differ substantially, you are probably overfitting.
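The train-versus-test comparison suggested above could look roughly like this. It is a minimal sketch on synthetic data (an assumption, not the asker's dataset), built with few samples and many features so that a gap is likely to show up; the seed is fixed once and is not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: few samples relative to the number of features,
# which makes plain linear regression prone to overfitting.
X, y = make_regression(n_samples=60, n_features=40, noise=20.0, random_state=0)

# Fix the seed once for reproducibility; do not search over it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# For regressors, score() returns R^2.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
gap = train_r2 - test_r2

print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}, gap = {gap:.3f}")
```

A train score far above the test score points to overfitting; in that case fewer features, more data, or a regularized model such as Ridge are the usual things to try.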

