On data mining in context : cases, fusion and evaluation

Type: Doctoral Thesis
Title: On data mining in context : cases, fusion and evaluation
Author: Putten, Petrus Wilhelmus Henricus van der
Publisher: Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University
Issue Date: 2010-01-19
Keywords: Knowledge discovery; Data mining; Machine learning; Data mining process; Real world applications; Data fusion; Bias variance; Model profiling
Abstract: Data mining can be seen as a process, with modeling as the core step. However, other steps such as planning, data preparation, evaluation and deployment are of key importance for applications. This thesis studies data mining in the context of these other steps, with the goal of improving data mining applicability. We introduce cases that provide an end-to-end overview and serve as motivating examples, and then focus on specific research topics. We discuss the problem of data mining across multiple sources, with data fusion as a potential solution. This is an interesting research topic: fusion removes barriers for applications, and data mining itself can be used to carry out the fusion. We then analyze a large-scale experiment in real-world data mining. We use the bias-variance evaluation framework across all steps in the process to investigate the large spread in results for a data mining competition. We conclude with a study advocating model profiling for novel classifiers. Given that a novel classifier is unlikely to outperform all competing classifiers across all problems, it is more interesting to characterize on which problems it performs best and which other algorithms it most resembles in behavior.
Description: Promotor: J.N. Kok. With summary in Dutch.
Faculty: Faculteit der Wiskunde en Natuurwetenschappen (Faculty of Mathematics and Natural Sciences)
Citation: Putten, P.W.H. van der, 2010, Doctoral Thesis, Leiden University
ISBN: 9789088911439
Handle: http://hdl.handle.net/1887/14600

