Started implementation
+68
-11
@@ -1169,20 +1169,77 @@ dynamic selection of models was seen to be one of the key contributors to the
overall success of the algorithm.

\subsection{Model Performance Evaluation Method}\label{metrics}
Splitting of data into training and hidden test set
Evaluation of intermediate models generated by both the SFFS and Particle Swarm
algorithms.
Final Evaluation of optimised model on test set, and on the full database using
Leave-one-out cross validation and 10-fold cross validation

In order to fully understand the performance of the system (and to evaluate the
impact of design decisions throughout development), a group of scoring methods
was implemented to test the system's performance in a selection of scenarios.
The aim was to provide reliable metrics that would highlight the system's
strengths and weaknesses and to provide quantifiable measures with which to
compare the system to the range of alternative methods proposed in the
literature.\\

One of the most basic metrics was the score of the trained model on a
separate hold-out dataset. By reserving a selection of samples from across the
databases, a trained model could be scored on this dataset for accuracy,
sensitivity and specificity (metrics described in Section~\ref{ChallengeEnt})
to determine the system's performance on an unseen set of samples. This method
is widely used to provide a basic understanding of a model's ability to
generalise to new data, a crucial requirement of the system. Data was split
using a grouped stratified shuffle split, grouping by database. This ensured
that an equal number of randomly selected samples of each class was taken from
each database to produce the training and test sets. This approach was taken to
avoid the class imbalance issues caused by a significant difference in class
frequency, as detailed in Section~\ref{Resample}. It should be noted that
although samples were stratified by class, it was not possible to stratify
samples by patient. This may have an impact on results, as the presence of data
from the same patient in both the training and test sets may artificially
inflate results where the model has learnt patterns specific to that patient
that do not generalise to others.\\
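
As an illustration only, the grouped stratified shuffle split described above
can be sketched in plain Python. The record layout (sample identifier,
database, class label), the function name and the \texttt{test\_per\_class}
parameter are assumptions made for this example and are not taken from the
project's code.

```python
import random
from collections import defaultdict


def grouped_stratified_split(samples, test_per_class, seed=0):
    """Split samples into train/test, drawing the same number of test
    samples of every class from every database.

    `samples` is a list of (sample_id, database, label) tuples; this layout
    is an assumption made for the example.
    """
    rng = random.Random(seed)
    # Bucket sample ids by (database, class label).
    buckets = defaultdict(list)
    for sample_id, database, label in samples:
        buckets[(database, label)].append(sample_id)

    train, test = [], []
    for key in sorted(buckets):
        ids = buckets[key]
        rng.shuffle(ids)
        if len(ids) <= test_per_class:
            raise ValueError(f"not enough samples in bucket {key}")
        # The first `test_per_class` shuffled ids form the hold-out set.
        test.extend(ids[:test_per_class])
        train.extend(ids[test_per_class:])
    return train, test


# Toy data: two databases, two classes, six samples of each combination.
samples = [
    (f"{db}-{label}-{i}", db, label)
    for db in ("a", "b")
    for label in ("normal", "abnormal")
    for i in range(6)
]
train_ids, test_ids = grouped_stratified_split(samples, test_per_class=2)
```

Note that this sketch, like the split described in the text, does not group
by patient, so recordings from one patient may still fall on both sides of
the split.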

A more robust method for model evaluation is cross-validation. By splitting
the full dataset into multiple folds and, for each fold in turn, training a
model on the remaining folds and scoring it on the held-out fold, metrics can
be calculated per fold and averaged to provide a measure of the system's
performance over all folds. 10-fold cross-validation, stratified by class, was
chosen for evaluation of the system. This provides an insight into the
performance of the algorithm across the dataset.\\
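
A minimal sketch of fold assignment stratified by class is given below. It is
written in plain Python for illustration; the function name and the toy label
list are assumptions, and the project itself may rely on a library routine
instead.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds, stratified by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so that every fold receives
        # a near-equal share of that class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


# Toy labels with an 80/20 class ratio (illustrative only).
labels = ["normal"] * 80 + ["abnormal"] * 20
splits = list(stratified_k_fold(labels, k=10))
```

Each of the ten test folds in this toy example keeps the 80/20 class ratio of
the full label list.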

It is highlighted by Homsi et al.\ that a large amount of variance may be
observed across folds~\parencite[p.~1637]{Homsi2017}. Homsi et al.\ attribute
this to the variations across databases, which make generalisation difficult.
To account for this, it is suggested that cross-validation is repeated multiple
times and averaged to provide a more accurate measurement of performance across
folds. For the proposed system, cross-validation was repeated 10 times and the
per-fold results were averaged to produce the final results. The standard
deviation was also calculated across these repetitions to illustrate the extent
of this variance.\\
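
The aggregation over repetitions can be sketched as follows. The scores in the
example are made-up placeholders rather than results, and the use of the
sample standard deviation over the per-repetition means is an assumption about
how the spread is reported.

```python
import statistics


def summarise_repeats(scores_by_repeat):
    """Return (mean, standard deviation) over repeated cross-validation.

    `scores_by_repeat` holds one list of per-fold scores per repetition.
    Each repetition is first averaged over its folds; the mean and sample
    standard deviation are then taken across repetitions.
    """
    repeat_means = [statistics.mean(fold_scores) for fold_scores in scores_by_repeat]
    return statistics.mean(repeat_means), statistics.stdev(repeat_means)


# Placeholder scores: three repetitions of two folds each (not real results).
example_scores = [[0.8, 0.9], [0.7, 0.9], [0.9, 0.9]]
mean_score, std_score = summarise_repeats(example_scores)
```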

A common theme throughout the literature was that of generalisation across
databases. It was observed that many previous algorithms achieved high
accuracies in standard cross-validation, but performed significantly worse when
tested on unseen databases~\parencite{Homsi2017, Bobillo2016}. For this
reason, leave-one-out cross-validation, grouped by database, was used to form a
better understanding of the system's ability to generalise to unseen data from
different sources. On each fold, a single database is held out for testing and
the model is trained on all other databases.\\
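
The fold construction for this database-level leave-one-out scheme can be
sketched as below. The function name and the toy database labels are
assumptions made for the example.

```python
def leave_one_database_out(databases):
    """Yield (held_out, train_indices, test_indices), one fold per database.

    `databases` gives the source database of each sample, in sample order.
    """
    for held_out in sorted(set(databases)):
        train = [i for i, db in enumerate(databases) if db != held_out]
        test = [i for i, db in enumerate(databases) if db == held_out]
        yield held_out, train, test


# Toy example: six samples drawn from three databases.
databases = ["a", "a", "b", "c", "c", "c"]
folds = list(leave_one_database_out(databases))
```

Every sample appears in exactly one test fold, and no database contributes to
both the training and test sides of the same fold.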

The evaluation of models using cross-validation was not limited to final
evaluation. Evaluation of intermediate models generated by both the SFFS and
Particle Swarm algorithms was possible by further separating the test set into
test and validation folds. This technique was used in the optimisation of the
final model to provide scores for intermediate models, as well as to help avoid
overfitting parameters and features to the training set used for
optimisation.\\

A discussion of the performance of the proposed system using these methods can
be found in Section~\ref{Eval}.
% TODO: Insert cross validation diagram from data science handbook
~\ref{ChallengeEnt}
Group cross-validation
$k$-fold cross validation
Final evaluation
-

\section{Implementation}
This section details the implementation challenges posed by the experiment and describes how the project addresses them.
This section describes the tools used in the realisation of the
proposed system and the practical issues encountered throughout the
implementation process. Rationale is given for decisions made throughout
production of the proposed system and any issues with the current
implementation are outlined.

\subsection{System Structure}
From the outset, the project aimed to

focus on using open source libraries throughout the project to avoid
`reinventing the wheel'. Integration of external libraries
Use of Python - quick development, wide variety of third party libraries to