Started implementation

2017-08-22 00:38:12 +01:00
parent 97c0cfc1c6
commit 5a53a6ff67
+68 -11
@@ -1169,20 +1169,77 @@ dynamic selection of models was seen to be one of the key contributors to the
overall success of the algorithm.
\subsection{Model Performance Evaluation Method}\label{metrics}
Splitting of data into training and hidden test set
Evaluation of intermediate models generated by both the SFFS and Particle Swarm
algorithms.
Final Evaluation of optimised model on test set, and on the full database using
Leave-one-out cross validation and 10-fold cross validation
In order to fully understand the performance of the system (and to evaluate the
impact of design decisions throughout development), a group of scoring methods
was implemented to test the system's performance in a selection of scenarios.
The aim was to provide reliable metrics that would highlight the system's
strengths and weaknesses and to provide quantifiable measures with which to
compare the system to the range of alternative methods proposed in the
literature.\\
One of the most basic metrics was the scoring of the trained model on a
separate hold-out dataset. By reserving a selection of samples from across the
databases, a trained model could then be scored on this dataset for accuracy, sensitivity and
specificity (metrics described in Section~\ref{ChallengeEnt}) to determine the
system's performance on an unseen set of samples. This method is widely used to
provide a basic understanding of a model's ability to generalise to new data, a
crucial requirement of the system. Data was split using a grouped stratified shuffle
split, grouping by database. This ensured that an equal number of randomly selected
samples from each class were taken from each database to produce the training and test sets. This
approach was taken to avoid class imbalance issues caused when there is a
significant difference in class frequency, as detailed in
Section~\ref{Resample}. It should be noted that although samples were
stratified by class, it was not possible to stratify samples by patient. This
may have an impact on results, as the presence of data from the same patient in
both the training and test sets may artificially inflate scores if the model has
learnt patterns specific to that patient that do not generalise to others.\\
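The split described above can be illustrated with a minimal sketch. This is not the project's implementation: the function name \texttt{stratified\_split\_by\_database}, the \texttt{test\_fraction} parameter and the toy data are hypothetical, and holding out a fixed fraction of every (database, class) stratum is only one reading of the grouped stratified shuffle split. Only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_split_by_database(labels, databases, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of the
    samples in every (database, class) stratum. Illustrative only."""
    rng = random.Random(seed)

    # Collect sample indices for each (database, class) pair.
    strata = defaultdict(list)
    for idx, (db, label) in enumerate(zip(databases, labels)):
        strata[(db, label)].append(idx)

    train_idx, test_idx = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random selection within the stratum
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical): two databases, two classes, five samples per stratum.
databases = ["a"] * 10 + ["b"] * 10
labels = ([0] * 5 + [1] * 5) * 2
train_idx, test_idx = stratified_split_by_database(labels, databases)
```

Because the sketch stratifies only by database and class, two recordings from the same patient can still fall on opposite sides of the split, which is the limitation noted above.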
A more robust method for model evaluation is cross-validation. By splitting
the full dataset into multiple folds, and training a model with each fold held out in turn, metrics can
be calculated on each held-out fold, and an average can be taken to provide a measure of
the system's performance over all folds. 10-fold cross-validation, stratified
by class, was chosen for evaluation of the system. This provides an insight
into the performance of the algorithm across the dataset.\\
Homsi et~al.\ highlight that a large amount of variance may be observed
across folds~\parencite[p.1637]{Homsi2017}, and attribute this to the
variations across databases, which make generalisation difficult. To account for
this, it is suggested that cross-validation is repeated multiple times and
averaged to provide a more accurate measurement of performance across folds.
For the proposed system, 10-fold cross-validation was repeated 10 times
and averaged to produce the final results. The standard deviation was also
calculated across these repetitions to illustrate the extent of this
variance.\\
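As a rough illustration of repeated, class-stratified $k$-fold cross-validation, the sketch below reshuffles the folds on every repetition, averages the fold scores within each repetition, and reports the mean and standard deviation across repetitions. The helper names (\texttt{stratified\_folds}, \texttt{repeated\_cv}, \texttt{majority\_baseline}) and the toy labels are assumptions made for this example; the scoring callback stands in for training and scoring the real model.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Deal the shuffled indices of each class round-robin into k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


def repeated_cv(labels, score_fold, k=10, repeats=10, seed=0):
    """Return (mean, standard deviation) of the per-repetition mean scores."""
    rng = random.Random(seed)
    repeat_means = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            fold_scores.append(score_fold(train_idx, test_idx))
        repeat_means.append(statistics.mean(fold_scores))
    return statistics.mean(repeat_means), statistics.stdev(repeat_means)


# Toy stand-in for a model (hypothetical): always predict the majority class
# of the training indices and report accuracy on the held-out fold.
labels = [0] * 50 + [1] * 50


def majority_baseline(train_idx, test_idx):
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


mean_score, std_score = repeated_cv(labels, majority_baseline)
```

On these balanced toy labels every fold contains five samples of each class, so the baseline scores 0.5 on each fold regardless of the shuffle; a real model would show non-zero spread across repetitions.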
A common theme throughout the literature was that of generalisation across
databases. It was observed that many previous algorithms achieved high
accuracies in standard cross-validation, but performed significantly worse when
tested on unseen databases~\parencite{Homsi2017, Bobillo2016}. For this
reason, leave-one-out cross-validation, applied at the database level, was used to form a better understanding
of the system's ability to generalise to unseen data from different sources. On
each fold, a single database is held out for testing and the model is trained on all other databases.\\
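The fold construction for this scheme is simple enough to sketch directly. The generator below is illustrative only: the name \texttt{leave\_one\_database\_out} and the toy database labels are hypothetical, and each yielded pair of index lists would be passed to whatever training and scoring routine is in use.

```python
def leave_one_database_out(databases):
    """Yield (held_out, train_idx, test_idx) once per distinct database.

    `databases` gives the source database of each sample; every sample from
    the held-out database goes to the test fold and all others to training.
    """
    for held_out in sorted(set(databases)):
        train_idx = [i for i, db in enumerate(databases) if db != held_out]
        test_idx = [i for i, db in enumerate(databases) if db == held_out]
        yield held_out, train_idx, test_idx


# Toy example (hypothetical): six samples drawn from three databases.
databases = ["a", "a", "b", "c", "c", "c"]
splits = list(leave_one_database_out(databases))
```

Unlike stratified $k$-fold, the folds here are of unequal size and class balance, since they follow the composition of each database.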
The use of cross-validation was not limited to final
evaluation. Intermediate models generated by both the SFFS and Particle Swarm
algorithms could also be evaluated by further separating the test set into test and
validation folds. This technique was used in the optimisation of the
final model to provide scores for intermediate models, as well as to help avoid
overfitting parameters and features to the training set used for
optimisation.\\
Discussion on the performance of the proposed system using these methods can be
found in Section~\ref{Eval}.
% TODO: Insert cross validation diagram from data science handbook
~\ref{ChallengeEnt}
Group cross-validation
$k$-fold cross validation
Final evaluation
-
\section{Implementation}
This section details the implementation challenges posed by the experiment and describes how the project addresses them.
This section describes the tools used in the realisation of the
proposed system and the practical issues encountered throughout the
implementation process. Rationale is given for decisions made throughout
production of the proposed system and any issues with the current implementation are
outlined.
\subsection{System Structure}
From the outset, the project aimed to
use open-source libraries throughout to avoid
`reinventing the wheel'. Integration of external libraries
Use of Python - quick development, wide variety of third party libraries to