Why is shuffling a dataset before conducting k-fold CV a bad idea in finance?


Problem

1. Why is shuffling a dataset before conducting k-fold CV generally a bad idea in finance? What is the purpose of shuffling? Why does shuffling defeat the purpose of k-fold CV in financial datasets?

2. Take a pair of matrices (X, y) representing observed features and labels. This pair could be one of the datasets derived in the exercises of Chapter 3.

(a) Derive the performance from a 10-fold CV of an RF classifier on (X, y), without shuffling.

(b) Derive the performance from a 10-fold CV of an RF classifier on (X, y), with shuffling.

(c) Why are the two results so different?

(d) How does shuffling leak information?

Textbook: Advances in Financial Machine Learning by Marcos López de Prado.
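Question 1 and part 2(d) hinge on one mechanism. In financial data a label is usually computed over a window of future observations, so the labels of nearby observations overlap. The pure-Python sketch below counts how many test observations share part of their label window with at least one training observation, with and without shuffling. It assumes a fixed horizon h, so that label i depends on observations i through i + h - 1. `kfold_indices` and `leaked_fraction` are hypothetical helpers, and the fold-splitting rule only mimics scikit-learn's `KFold`.

```python
import random


def kfold_indices(n, k, shuffle, seed=0):
    """Split range(n) into k folds: contiguous blocks unless shuffle=True."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra element, like scikit-learn's KFold.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def leaked_fraction(n, k, h, shuffle, seed=0):
    """Fraction of test observations whose label window [i, i + h) overlaps
    the label window of at least one training observation (assumed fixed h)."""
    leaked = 0
    for test in kfold_indices(n, k, shuffle, seed):
        test_set = set(test)
        for i in test:
            # Windows [i, i + h) and [j, j + h) overlap iff |i - j| < h.
            if any(0 <= j < n and j not in test_set
                   for j in range(i - h + 1, i + h)):
                leaked += 1
    return leaked / n


print(leaked_fraction(1000, 10, 5, shuffle=False))  # 0.072
print(leaked_fraction(1000, 10, 5, shuffle=True))
```

With 1,000 observations, 10 folds and h = 5, contiguous folds leak only at the fold boundaries (72 of 1,000 test observations), whereas after shuffling virtually every test observation has an overlapping neighbour in the training set. A model that has memorized those neighbours is effectively scored on information it has already seen. The boundary leaks that remain without shuffling are what the book's purging and embargo procedures are meant to remove.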
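Parts (a) to (c) can be sketched with scikit-learn, assuming it is installed. Since the Chapter 3 datasets are not reproduced here, `make_toy_data` (a hypothetical helper) builds a stand-in: a Gaussian random walk whose label is the sign of the next h-step change and whose features are the price level and two trailing changes. By construction the features carry no information about the future, so an honest out-of-sample accuracy should sit near 0.5. `cv_accuracy` is also hypothetical; `KFold`, `RandomForestClassifier` and `cross_val_score` are the library's own classes and function.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score


def make_toy_data(n=1000, h=10, seed=0):
    """Hypothetical stand-in for (X, y): overlapping labels, persistent features."""
    rng = np.random.default_rng(seed)
    price = np.cumsum(rng.standard_normal(n + h))  # random walk: no real edge
    t = np.arange(n)
    # Label: sign of the forward h-step change, so labels of nearby t share returns.
    y = (price[t + h] - price[t] > 0).astype(int)
    # Features use data up to t only, but barely change from t to t + 1.
    X = np.column_stack([
        price[t],
        price[t] - price[np.maximum(t - 20, 0)],
        price[t] - price[np.maximum(t - 50, 0)],
    ])
    return X, y


def cv_accuracy(X, y, shuffle, seed=0):
    """Mean 10-fold CV accuracy of a random forest, with or without shuffling."""
    # KFold rejects a random_state when shuffle=False, so pass None in that case.
    cv = KFold(n_splits=10, shuffle=shuffle, random_state=seed if shuffle else None)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()


X, y = make_toy_data()
acc_plain = cv_accuracy(X, y, shuffle=False)  # part (a)
acc_shuf = cv_accuracy(X, y, shuffle=True)    # part (b)
print(f"10-fold CV, no shuffling: {acc_plain:.3f}")
print(f"10-fold CV, shuffled:     {acc_shuf:.3f}")
```

On data like this, expect the unshuffled score to stay close to 0.5 and the shuffled score to come out well above it; the exact figures depend on the seed and sample size, so the gap rather than the numbers is the result. That gap bears on (c): with shuffling, each test observation's immediate neighbours in time sit in the training set with nearly identical features and a largely shared label, so the forest scores well by recalling them rather than by predicting.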
