I am using machine learning techniques in my projects in the field of education, and I am looking for help with the questions below.
I am using nationally representative data in my project, and the data were collected through a two-stage sampling process.
Because complex sampling was used to select participants, statistical models would usually account for the unequal probability of selection of individual participants (e.g., using sampling weights), for stratification/blocking, and for the non-independence of student outcomes within schools in order to obtain representative population estimates.
How is data from complex sampling designs handled in machine learning? (I would expect that at least the probability weights may need to be accounted for, but this is not my expertise.)
Another way to approach this question may be: does machine learning require a starting assumption that the observed data were obtained by random or representative sampling? Many of these methods resample from the initial dataset as part of the analytic algorithm, but I am wondering about the necessary conditions for the initial dataset itself.
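To make the question concrete, here is a minimal sketch of the two design features mentioned above: inverse-probability weights and the grouping of students within schools. Everything in it is hypothetical and for illustration only: the toy scores, the selection probabilities, and the helper names `design_weight`, `weighted_mean`, `weighted_mse`, and `split_by_school` are assumptions, not part of my data or of any library.

```python
import random

def design_weight(p_school, p_student):
    """Inverse of the overall inclusion probability in a two-stage design."""
    return 1.0 / (p_school * p_student)

def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each student counts in proportion to their weight."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)

def split_by_school(school_ids, test_fraction=0.5, seed=0):
    """Hold out whole schools so no school contributes to both train and test."""
    schools = sorted(set(school_ids))
    random.Random(seed).shuffle(schools)
    n_test = max(1, round(len(schools) * test_fraction))
    test_schools = set(schools[:n_test])
    train_idx = [i for i, s in enumerate(school_ids) if s not in test_schools]
    test_idx = [i for i, s in enumerate(school_ids) if s in test_schools]
    return train_idx, test_idx

# Hypothetical toy sample: school "A" was oversampled relative to school "B".
school_ids = ["A", "A", "B", "B"]
scores = [50.0, 60.0, 80.0, 90.0]
p_school = {"A": 0.5, "B": 0.1}   # assumed stage-1 selection probabilities
p_student = 0.5                   # assumed stage-2 selection probability

weights = [design_weight(p_school[s], p_student) for s in school_ids]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)

# Loss of a constant prediction, evaluated with the design weights.
loss = weighted_mse(scores, [weighted] * len(scores), weights)

train_idx, test_idx = split_by_school(school_ids)
print(unweighted, weighted, loss, train_idx, test_idx)
```

In this sketch the weighted and unweighted means differ because the oversampled school would otherwise dominate the estimate. As far as I understand, many scikit-learn estimators accept a `sample_weight` argument in `fit`, and `GroupKFold` keeps groups such as schools together during cross-validation, but I do not know whether that is sufficient for a complex design or how stratification should be handled.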
I would greatly appreciate your input!
question from:
https://stackoverflow.com/questions/65713572/how-is-data-from-complex-sampling-designs-handled-in-machine-learning