python - What's the difference between KFold and ShuffleSplit CV?

Question

Welcome To Ask or Share your Answers For Others

python - What's the difference between KFold and ShuffleSplit CV?

posted Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - What's the difference between KFold and ShuffleSplit CV?

It seems like KFold generates the same values every time the object is iterated over, while Shuffle Split generates different indices every time. Is this correct? If so, what are the uses for one over the other?

cv = cross_validation.KFold(10, n_folds=2,shuffle=True,random_state=None)
cv2 = cross_validation.ShuffleSplit(10,n_iter=2,test_size=0.5)
print(list(iter(cv)))
print(list(iter(cv)))
print(list(iter(cv2)))
print(list(iter(cv2)))

Yields the following output:

[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]                                     
[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]                                     
[(array([4, 6, 3, 2, 7]), array([8, 1, 9, 0, 5])), (array([3, 6, 7, 0, 5]), array([9, 1, 8, 4, 2]))]                                     
[(array([3, 0, 2, 1, 7]), array([5, 6, 9, 4, 8])), (array([0, 7, 1, 3, 8]), array([6, 2, 5, 4, 9]))]

question from:https://stackoverflow.com/questions/34731421/whats-the-difference-between-kfold-and-shufflesplit-cv

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:57:04+0000

Difference in KFold and ShuffleSplit output

KFold will divide your data set into prespecified number of folds, and every sample must be in one and only one fold. A fold is a subset of your dataset.

ShuffleSplit will randomly sample your entire dataset during each iteration to generate a training set and a test set. The test_size and train_size parameters control how large the test and training test set should be for each iteration. Since you are sampling from the entire dataset during each iteration, values selected during one iteration, could be selected again during another iteration.

Summary: ShuffleSplit works iteratively, KFold just divides the dataset into k folds.

Difference when doing validation

In KFold, during each round you will use one fold as the test set and all the remaining folds as your training set. However, in ShuffleSplit, during each round n you should only use the training and test set from iteration n. As your data set grows, cross validation time increases, making shufflesplits a more attractive alternate. If you can train your algorithm, with a certain percentage of your data as opposed to using all k-1 folds, ShuffleSplit is an attractive option.

Categories

python - What's the difference between KFold and ShuffleSplit CV?

python - What's the difference between KFold and ShuffleSplit CV?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags