Data mining scenarios for the discovery of subtypes and the comparison of algorithms

Leiden Repository

Type: Doctoral Thesis
Title: Data mining scenarios for the discovery of subtypes and the comparison of algorithms
Author: Colas, Fabrice Pierre Robert
Publisher: Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University
Issue Date: 2009-03-04
Keywords: Bioinformatics; Data; Mining; Statistics; Subtype; Text
Abstract: A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on osteoarthritis (OA) and Parkinson’s disease (PD), and to drug discovery. Because OA and PD are characterized by clinical heterogeneity, a more sensitive classification of the patient cohort may contribute to the search for the underlying disease mechanisms. In drug discovery, subtyping may improve the understanding of the similarity (and distance) between the different phenotypic effects induced by drugs and chemicals. Our second scenario aims to compare text classification algorithms. First, we show that common classifiers achieve comparable performance on most problems. Second, we show that tightly constrained SVM solutions are high performers. In that situation, most training documents are bounded support vectors, so SVM reduces to a nearest mean classifier and no training is necessary, which raises questions about the merits of SVM in sparse bag-of-words feature spaces. SVM is also shown to suffer from performance deterioration for particular combinations of training set size and number of features. This relates to outlying documents of distinct classes overlapping in the feature space.
Description: Promotor: J.N. Kok
With summary in Dutch
Faculty: Faculteit der Wiskunde en Natuurwetenschappen
Citation: Colas, F.P.R., 2009, Doctoral thesis, Leiden University
ISBN: 9789090238883
Sponsor: Netherlands BioInformatics Center (NBIC)
Handle: http://hdl.handle.net/1887/13575
 

Files in this item

Format            Description                   Size
application/pdf   Full text                     5.591 MB
text/html         Links to published articles   11.77 KB
application/pdf   Cover                         22.40 KB
application/pdf   Title page_Contents           87.51 KB
application/pdf   Introduction                  769.0 KB
application/pdf   Part 1                        89.73 KB
application/pdf   Chapter 1                     2.761 MB
application/pdf   Chapter 2                     1.638 MB
application/pdf   Chapter 3                     702.4 KB
application/pdf   Chapter 4                     1.193 MB
application/pdf   Chapter 5                     2.185 MB
application/pdf   Part 2                        89.08 KB
application/pdf   Chapter 6                     1.290 MB
application/pdf   Chapter 7                     837.6 KB
application/pdf   Chapter 8                     1.071 MB
application/pdf   Conclusions                   388.1 KB
application/pdf   Appendix A                    193.0 KB
application/pdf   Appendix B                    181.1 KB
application/pdf   Bibliography                  246.6 KB
application/pdf   Samenvatting (Summary)        82.04 KB
application/pdf   Curriculum Vitae              82.40 KB
application/pdf   Acknowledgements              71.37 KB
application/pdf   Propositions                  53.44 KB
