On the use of Expected Attainable Discrimination for feature selection in large scale medical risk prediction problems

D. R. Lovell, M. J. J. Scott, M. Niranjan, R. W. Prager, K. J. Dalton and R. Derom

This report investigates the use of expected attainable discrimination (EAD) as a measure for selecting discrete-valued features in two-class prediction problems. In essence, EAD tells us the performance we could expect to achieve with a simple histogram probability density model of a given dataset. For discrete-valued features, this kind of density model is bias-free but can have large variance. Given insufficient training data, such a model's test set performance will be lower than that of a suitably biased model. In light of this, we explore the usefulness of EAD for feature selection.
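
To make the variance issue concrete, the following is a minimal sketch of a histogram model for one discrete-valued feature, scored by area under the ROC curve on training and held-out data. It is not the report's definition of EAD, which the abstract does not spell out. The function names (fit_histogram, predict, auc), the toy data, and the fallback to the class prior for unseen feature values are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_histogram(xs, ys):
    """Estimate P(y=1 | x) by counting, with one bin per distinct feature value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [n_negative, n_positive]
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    prior = sum(ys) / len(ys)
    model = {x: pos / (neg + pos) for x, (neg, pos) in counts.items()}
    return model, prior


def predict(model, prior, xs):
    # Assumption: feature values never seen in training get the class prior.
    return [model.get(x, prior) for x in xs]


def auc(scores, ys):
    """Area under the ROC curve as P(score_pos > score_neg); ties count one half."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a single discrete feature and a binary outcome.
train_x, train_y = ["a", "a", "b", "b", "c"], [0, 0, 1, 0, 1]
test_x, test_y = ["a", "b", "c", "c", "d"], [0, 1, 0, 1, 1]

model, prior = fit_histogram(train_x, train_y)
train_auc = auc(predict(model, prior, train_x), train_y)
test_auc = auc(predict(model, prior, test_x), test_y)
print(f"training AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")
```

With so few training cases per bin, the per-bin estimates are noisy, so the training score is optimistic relative to the held-out score. This is the behaviour the abstract attributes to an unbiased, high-variance density model.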

Keywords: Feature selection, area under receiver operating characteristic (ROC) curve, medical risk prediction, obstetrics.