python - Is it cheating to change the random state for test-train split to get the best r2 score?

Question

posted Jan 27, 2021 in Technique by 深蓝 (71.8m points)

I notice that my r2 score (for the test set) changes significantly when I change the random state of the train-test split for a linear regression model. The score swings by orders of magnitude, anywhere between -10^4 and 0.97. Picking the best-scoring random state makes the reported performance look better, but I can't help wondering whether this is cheating. After all, shouldn't a good model work for any selection of test and training data? By selecting a random state that works, aren't we really creating a model that works best for the given test data and may not work equally well for all data points in the future?

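import random
from sklearn.model_selection import train_test_split
state = random.randint(0, 100)  # a different seed, and so a different split, on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=state)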

Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss gazes back…

1 Reply

深蓝 · Answer 1 · 2021-01-26T20:43:24+0000

Short answer: Yes, it's cheating.

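In principle, you must not adjust anything, the random state of the split included, in order to raise your score on the test set. Once a choice has been made by looking at the test score, that score is no longer an unbiased estimate of how the model will do on new data.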
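One way to avoid relying on a lucky seed is to score the model over many different splits and report the spread rather than the maximum. Here is a minimal sketch, assuming scikit-learn is available. The synthetic dataset from make_regression is only a stand-in for your own X and y, and the fold and repeat counts are arbitrary choices for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for the asker's data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5-fold cross-validation repeated 10 times gives 50 different train/test splits.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=cv)

# Report the whole distribution of scores, not just the best split.
print(
    f"r2 mean={scores.mean():.3f} std={scores.std():.3f} "
    f"min={scores.min():.3f} max={scores.max():.3f}"
)
```

If the minimum and maximum are as far apart as -10^4 and 0.97, the mean and standard deviation describe the model far more honestly than the single best split does.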

Furthermore, what this tells you is that your model is not quite good: its score depends heavily on which rows end up in the training set, and that should not happen. You are probably overfitting. Compare the train score with the test score; if they differ widely, overfitting is the likely cause.
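The train-versus-test check can be sketched as follows, again assuming scikit-learn and a synthetic dataset in place of your own X and y. The seed value 42 is arbitrary and is fixed once for reproducibility, not tuned:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Fix the seed once, up front, so the split is reproducible; do not tune it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)  # r2 on data the model was fitted to
test_r2 = model.score(X_test, y_test)     # r2 on held-out data
gap = train_r2 - test_r2

print(f"train r2={train_r2:.3f} test r2={test_r2:.3f} gap={gap:.3f}")
```

A large positive gap is a hint of overfitting rather than proof of it; with a small dataset, a few unusual rows in the test set can produce the same symptom.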
