Started implementation

2017-08-22 00:38:12 +01:00
parent 97c0cfc1c6
commit 5a53a6ff67
1 changed files with 68 additions and 11 deletions
@@ -1169,20 +1169,77 @@ dynamic selection of models was seen to be one of the key contributors to the
 overall success of the algorithm.

 \subsection{Model Performance Evaluation Method}\label{metrics}
-Splitting of data into training and hidden test set
-Evaluation of intermediate models generated by both the SFFS and Particle Swarm
-algorithms.
-Final Evaluation of optimised model on test set, and on the full database using
-Leave-one-out cross validation and 10-fold cross validation
+To understand the performance of the system fully (and to evaluate the
+impact of design decisions throughout development), a set of scoring methods
+was implemented to test the system in a range of scenarios.
+The aim was to provide reliable metrics that would highlight the system's
+strengths and weaknesses, and quantifiable measures with which to
+compare the system to the range of alternative methods proposed in the
+literature.\\
+
+The most basic evaluation method was to score the trained model on a
+separate hold-out dataset. By reserving a selection of samples from across the
+databases, a trained model could be scored on this dataset for accuracy, sensitivity and
+specificity (metrics described in Section~\ref{ChallengeEnt}) to determine the
+system's performance on unseen samples. This method is widely used to
+provide a basic understanding of a model's ability to generalise to new data, a
+crucial requirement of the system. Data was split using a grouped stratified shuffle
+split, grouping by database. This ensured that randomly selected samples were
+taken from every database, with each class represented in the same proportion
+in the training and test sets. This approach was taken to avoid the class
+imbalance issues that arise when class frequencies differ significantly, as
+detailed in Section~\ref{Resample}. It should be noted that although samples were
+stratified by class, it was not possible to separate samples by patient. This
+may artificially inflate results: if samples from the same patient appear in
+both the training and test sets, the model may learn patterns specific to that
+patient that do not generalise to others.\\
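+
+The hold-out split and scoring described above can be sketched as follows.
+This is a minimal illustration in Python using only the standard library, not
+the project's code: the function names (\texttt{grouped\_stratified\_split},
+\texttt{score}), the 0/1 label encoding with 1 as the positive class and the
+default 20\% test fraction are assumptions. Samples are bucketed by database and
+class, and the same fraction of each bucket is reserved for testing; the
+metrics use the standard confusion-matrix definitions.

```python
import random
from collections import defaultdict


def grouped_stratified_split(labels, groups, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test) index lists.

    Samples are bucketed by (database, class) and the same fraction of every
    bucket is moved to the test set, preserving the class balance of each
    database in both sets. Names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for idx, (label, group) in enumerate(zip(labels, groups)):
        buckets[(group, label)].append(idx)

    train, test = [], []
    for key in sorted(buckets):  # sorted for a reproducible order
        members = buckets[key][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def score(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity
```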
+
+A more robust method for model evaluation is cross-validation. The full
+dataset is split into $k$ folds; a model is trained on $k-1$ folds and scored on
+the remaining fold, and this is repeated until every fold has been used once for
+testing. The per-fold metrics are then averaged to give a measure of the
+system's performance over all folds. 10-fold cross-validation, stratified
+by class, was chosen for evaluation of the system. This provides an insight
+into the performance of the algorithm across the whole dataset.\\
+Homsi et~al.\ highlight that a large amount of variance may be observed
+across folds~\parencite[p.1637]{Homsi2017}, and attribute this to the
+variation across databases, which makes generalisation difficult. To account for
+this, it is suggested that cross-validation be repeated multiple times and the
+results averaged to give a more accurate measurement of performance across folds.
+For the proposed system, the 10-fold cross-validation was repeated 10 times
+and the results averaged to produce the final scores. The standard deviation
+across these repetitions is also calculated to illustrate the extent of this
+variance.\\
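+
+A minimal sketch of the repeated, class-stratified 10-fold procedure is given
+below, again in standard-library Python. It assumes a hypothetical
+\texttt{fit\_and\_score} callback that trains on the training indices and
+returns one score on the test indices, and it assumes each repetition uses a
+different random partition into folds. Scores are averaged over all
+repetitions, and the standard deviation is taken over the per-repetition means.
+Scikit-learn's \texttt{RepeatedStratifiedKFold} generates comparable folds.

```python
import random
import statistics
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train, test) index lists for k folds with near-equal class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for idx in members:  # deal each class round-robin across the folds
            folds[position % k].append(idx)
            position += 1

    for i in range(k):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, sorted(folds[i])


def repeated_cv(labels, fit_and_score, k=10, repeats=10):
    """Repeat stratified k-fold CV, reshuffling the folds on each repetition.

    fit_and_score(train, test) is a hypothetical callback returning one score.
    Returns the overall mean and the standard deviation of the
    per-repetition mean scores.
    """
    repetition_means = []
    for r in range(repeats):
        fold_scores = [fit_and_score(train, test)
                       for train, test in stratified_kfold(labels, k=k, seed=r)]
        repetition_means.append(statistics.mean(fold_scores))
    # statistics.stdev needs at least two values.
    spread = statistics.stdev(repetition_means) if repeats > 1 else 0.0
    return statistics.mean(repetition_means), spread
```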
+
+A common theme throughout the literature was that of generalisation across
+databases. Many previous algorithms achieved high
+accuracies in standard cross-validation but performed significantly worse when
+tested on unseen databases~\parencite{Homsi2017, Bobillo2016}. For this
+reason, leave-one-out cross-validation at the database level was used to form a
+better understanding of the system's ability to generalise to unseen data from
+different sources: on each fold, a single database is held out for testing and
+the model is trained on all other databases.\\
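+
+The database-level folds can be generated with a few lines of Python, as in the
+sketch below. It assumes a hypothetical list \texttt{groups} naming the source
+database of each sample; scikit-learn's \texttt{LeaveOneGroupOut} produces the
+same kind of split.

```python
from collections import defaultdict


def leave_one_database_out(groups):
    """Yield (held_out, train, test) with one fold per database.

    groups[i] names the database that sample i came from (an assumed input);
    each fold tests on every sample of one database and trains on the rest.
    """
    by_database = defaultdict(list)
    for idx, database in enumerate(groups):
        by_database[database].append(idx)

    for held_out in sorted(by_database):
        train = [idx for database, members in by_database.items()
                 if database != held_out for idx in members]
        yield held_out, sorted(train), sorted(by_database[held_out])
```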
+
+The use of cross-validation was not limited to final
+evaluation. Intermediate models generated by both the SFFS and Particle Swarm
+algorithms could also be evaluated by further separating the test set into test
+and validation folds. This technique was used in the optimisation of the
+final model to provide scores for intermediate models, and to help avoid
+overfitting parameters and features to the training set used for
+optimisation.\\
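+
+A simplified sketch of this nested use of the held-out data is given below. It
+replaces the iterative SFFS and Particle Swarm searches with a fixed list of
+candidate models, and the \texttt{evaluate} callback and the 50\% validation
+fraction are assumptions. Candidates are compared only on the validation fold,
+and the test fold is used once, for the selected model.

```python
import random
from collections import defaultdict


def split_validation(indices, labels, validation_fraction=0.5, seed=0):
    """Divide held-out indices into (validation, test), stratified by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    validation, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_val = round(len(members) * validation_fraction)
        validation.extend(members[:n_val])
        test.extend(members[n_val:])
    return sorted(validation), sorted(test)


def optimise_then_evaluate(candidates, evaluate, held_out, labels, seed=0):
    """Pick the best candidate on the validation fold, then score it on the test fold.

    candidates: hypothetical intermediate models (e.g. feature subsets).
    evaluate(candidate, indices): hypothetical callback returning a score.
    """
    validation, test = split_validation(held_out, labels, seed=seed)
    # Intermediate models are compared on the validation fold only...
    best = max(candidates, key=lambda candidate: evaluate(candidate, validation))
    # ...so the test fold stays unseen until the chosen model is scored.
    return best, evaluate(best, test)
```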
+
+Discussion of the performance of the proposed system using these methods can be
+found in Section~\ref{Eval}.
 % TODO: Insert cross validation diagram from data science handbook
-~\ref{ChallengeEnt}
-Group cross-validation
-$k$-fold cross validation
-Final evaluation
- 

 \section{Implementation}
-This section details the implementation challenges posed by the experiment and describes how the project addresses them.
+This section describes the tools used in the realisation of the
+proposed system and the practical issues encountered throughout the
+implementation process. The rationale for decisions made during
+development is given, and any issues with the current implementation are
+outlined.
+
+\subsection{System Structure}
+From the outset, the project aimed to
 focus on using open source libraries throughout the project to avoid
 `reinventing the wheel'. Integration of external libraries
 Use of Python - quick development, wide variety of third party libraries to