I notice that my R² score on the test set changes significantly when I vary the random state used to split the data for a linear regression model. The scores differ by orders of magnitude, ranging from about -10^4 to 0.97. Picking a random state that scores well makes the performance look better, but I can't help wondering whether this is cheating. After all, shouldn't a good model work for any selection of training and test data? By choosing a random state that happens to work, aren't we really building a model that works best for this particular test set and may not work equally well on future data points?
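import random
from sklearn.model_selection import train_test_split
state = random.randint(0, 100)  # a different seed (and therefore a different split) on each run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=state)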
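
To make the concern concrete, here is a minimal sketch of what I have in mind: rather than keeping the one seed that scores best, score every seed and look at the spread. The synthetic data from make_regression and the helper name r2_for_seed are stand-ins for illustration, not my actual data or code.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my real X and y (an assumption for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)


def r2_for_seed(seed):
    """Fit on one 80/20 split and return the test-set R^2 for that seed."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


# Score every seed instead of keeping only the best one.
scores = np.array([r2_for_seed(seed) for seed in range(100)])
print(f"mean R^2 = {scores.mean():.3f}, std = {scores.std():.3f}")
print(f"min R^2 = {scores.min():.3f}, max = {scores.max():.3f}")
```

As far as I understand, if the spread across seeds is wide, the single best score probably says more about a lucky split than about the model itself, which is exactly what worries me.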