Lit review complete, started intro

Sam Perry
2017-08-09 11:13:06 +01:00
parent 46e3cc97db
commit bb4a1c4b83
+133 -105
@@ -19,7 +19,7 @@
\usepackage{booktabs}
\usepackage{tabulary}
\usepackage[margin=1.0in]{geometry}
\usepackage[pass]{geometry}
\usepackage{pdflscape}
\usepackage{graphicx}
@@ -122,20 +122,41 @@ Classification of Heart Abnormalities} \par}\\
\renewcommand{\abstractname}{Acknowledgements}
\begin{abstract}
I'd like to thanks anyone and everyone...
I'd like to thank anyone and everyone...
\end{abstract}
\tableofcontents
\newpage
\section{Introduction}
Cardiovascular diseases are the most prevalent cause of death in Europe,
accounting for 37.5\% of all deaths in 2013~\parencite{Eurostat2016}.
Traditionally, heart auscultation has been performed manually using a standard
stethoscope, with the aim of detecting heart defects aurally. However,
auscultation is a difficult skill that requires training and can usually only be
performed by a trained healthcare professional, such as a GP.
Due to recent advancements in technology, research into the automation of such
detection has shown promise, focusing primarily on analysis of
Electrocardiogram (ECG) signals. Although useful for detecting pathologies, ECG
equipment requires a trained professional for use and also remains expensive.
It is therefore not currently feasible in developing countries and rural areas,
where there may be few physicians relative to the size of the population.
A comparatively affordable alternative is the Phonocardiogram (PCG).
It is a widely used and inexpensive means of detecting conditions such as heart
valve disorders.
Automated auscultation could provide an initial diagnosis for heart defects
without the need for a trained medical health practitioner. This would allow
relatively cheap equipment to analyse a patient's heart sound, and
automatically recommend further inspection based on analysis. This could have
significant benefit in a number of situations, particularly in the developing
world and rural environments, where
% TODO: Write brief overview of history of PCG signal analysis
% TODO: Explain fundamental heart sounds
\section{Related Work}
There are currently a wide variety of methods employed for the analysis and
classification of PCG signals. Current methods can typically be divided into 3
areas, each of which are combined to create full classification system. These
areas, each of which are combined to create a full classification system. These
areas are: signal preprocessing, signal segmentation, and feature
extraction/classification. The performance and evaluation of complete systems
are also discussed in section~\ref{Classification}.
@@ -148,7 +169,7 @@ recordings: stethoscope type, make and model, its microphone/sensors, the
position used to record (i.e.\ lower left sternal border, apex, pulmonic area,
aortic area), built in filters/signal processing used by the stethoscope (i.e.\
noise filters, anti-tremor filters), medication that a patient may be taking,
as well as many other factors that may influence the recorded
as well as many other aspects that may influence the recorded
signal~\parencite[p.4]{Pavlopoulos2004}. This presents a significant issue when
attempting to analyse and compare a dataset of signals, as variations in
recordings and artefacts caused by factors other than heart sounds will most
@@ -167,10 +188,10 @@ highpass chebychev or butterworth filters are favoured with cutoff frequencies
ranging from 400--750Hz.\\
In addition, many methods decompose the filtered signal using wavelet based
methods such as the discrete wavelet transform
(DWT)~\parencite{Liang1997a, Pavlopoulos2004}, continuous
wavelet transform (CWT)~\parencite{Langley2016} or wavelet
package decomposition (WPD)~\parencite{Liang1998}.
methods, such as the discrete wavelet transform (DWT)~\parencite{Liang1997a,
Pavlopoulos2004}, continuous wavelet transform (CWT)~\parencite{Langley2016} or
wavelet packet decomposition (WPD)~\parencite{Liang1998}, which are commonly
used to separate components of a signal based on their spectral content.
Wavelet transforms are popular as, unlike Fourier transforms, they are well
localized in both the time and frequency domain. This allows for the analysis
of PCG signals across multiple frequency bands whilst maintaining transient
@@ -180,82 +201,83 @@ consist of higher frequency components than normal heart sounds.
\subsection{Signal Segmentation}\label{Segmentation}
Algorithms for the segmentation of PCG data aim to extract the structure of
the signal over time. This is a key stage in the analysis of PCG signals as the
the signal over time. This is a key stage in the analysis of PCG signals, as the
structure and relationships between the fundamental heart sounds (FHSs) form
the basis for much of the further analysis performed on PCG data.\\
the basis for much of further analysis performed on PCG data.\\
% TODO: insert segmented graph of PCG cycle
A number of methods exist for the extraction of FHSs. Traditional methods rely
on direct extraction of peaks from envelopes in the time domain to determine
the structure of a signal. These methods perform various transformation in
order to accentuate the transient events with the intention of isolating
them.\\
Liang et.\ al propose a method using the popular Shannon energy
envelope, achieving good accuracy across 37 recordings of
children~\citeyearpar{Liang1997b}. The algorithm aims to segment the data by
first extracting the envelope, then applying adaptive rule based thresholds to
on direct extraction of peaks from amplitude envelopes in the time domain to
determine the structure of a signal. These methods perform various
processing/transformations in order to accentuate the transient events with the
intention of isolating them.\\
Early work in this area by Liang et.\ al described a method using the popular
Shannon energy envelope, achieving good accuracy across 37 recordings of
children~\citeyearpar{Liang1997b}. The algorithm aimed to segment the data by
first extracting the envelope, then applying adaptive rule based thresholds, to
determine peaks corresponding to segmentation points. When comparing results to
hand annotated ground truth, the system achieves a reported accuracy score of
84\%. However, due to the small sample size, and potential lack of noise in the
dataset used, this may not translate to a larger dataset recorded in
hand annotated ground truth data, the system achieved a reported accuracy score
of 84\%. However, due to the small sample size, and potential lack of noise in
the dataset used, this may not translate to a larger dataset recorded in
sub-optimal conditions.\\
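For reference, the average Shannon energy of a frame of $N$ samples is commonly
written as below. The notation is assumed here for illustration and is not
necessarily that of the cited work: $x_{\mathrm{norm}}$ denotes the signal
normalised to unit peak amplitude and $E_s$ the resulting envelope value for
the frame.
\begin{align*}
E_s = -\frac{1}{N} \sum_{i=1}^{N} x_{\mathrm{norm}}^{2}(i) \log x_{\mathrm{norm}}^{2}(i)
\end{align*}
Since $-u \log u$ is maximised at $u = 1/e$, this measure weights
medium-intensity samples most heavily, rather than the largest peaks.\\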
More recent methods use spectral representations to assist in the splitting of
More recent methods used spectral representations to assist in the splitting of
the FHSs, in particular using wavelet decomposition. These methods tend to
perform more robustly on signals of varying conditions.\\
Building on previous work, Liang et.\ al present an improved method, using the
perform more robustly on signals of varying conditions.\\
Building on previous work, Liang et.\ al presented an improved method, using the
discrete wavelet transform to decompose and reconstruct the signal into 7
distinct frequency bands~\citeyearpar{Liang1997a}. Applying a similar method
distinct frequency bands~\citeyearpar{Liang1997a}. Applying a similar method
of envelope extraction and peak picking to each frequency band, the best
estimate of all frequency bands is then chosen as the final result. The criterion
for this choice is based on number of S1s and S2s detected, and the number of
artefacts discarded for each frequency band. This method achieved an improved
accuracy of 93\% accuracy across a larger dataset of 77 recordings. This
for this choice is based on the number of S1s and S2s detected, and the number
of artefacts discarded for each frequency band. This method achieved an
improved accuracy of 93\% across a larger dataset of 77 recordings. This
suggests that the algorithm is at least as robust as the previous work by
Liang et.\ al.\\
Vepa et.\ al proposed a wavelet decomposition based method that uses a
combination of simplicity and envelope features~\citeyearpar{Vepa2008}. This
approach attempts to improve robustness when analysing signals of varying
quality by using multiple complimentary features, allowing the method to base
quality by using multiple complementary features. This allows the method to base
decisions on a variety of statistical properties. Evaluating the algorithm on a
collection of 160 heart cycles from a variety of sources, a reported accuracy
of 84\% was achieved.\\
A variety of machine learning methods have been implemented with reasonable
success. Gupta et.\ al present a method that applies $k$-means clustering to
More recently, a variety of machine learning methods have been implemented with reasonable
success. Gupta et.\ al presented a method that applies $k$-means clustering to
replace standard threshold based methods for determining peak classification in
a standard envelope based segmentation algorithm~\citeyearpar{Gupta2007}. This achieved a reported
accuracy of 90.29\%. Due to the envelope based method for feature extraction,
this method is still susceptible to noise and artefacts that occur within the
frequency bands of the heart sounds.\\
Sepehri et.\ al propose a method that combines neural networks with Power
Spectral Density (PSD) estimates~\citeyearpar{Sepehri2010}. This method
exploits the periodic nature of S1 and S2 heart sounds, combined with their
narrow frequency range, to train a neural network to separate these sounds from
other sounds and murmurs. This method achieves a reported 93.6\% accuracy on a
significantly larger database than other methods detailed.\\
Sepehri et.\ al proposed a method that combines neural networks with Power
Spectral Density (PSD) estimates~\citeyearpar{Sepehri2010}. By exploiting the
periodic nature of S1 and S2 heart sounds, combined with their narrow frequency
range, a neural network is trained to separate these sounds from other events,
such as noise and murmurs. This method achieved a reported 93.6\% accuracy on a
significantly larger database than previous methods detailed.\\
The most significant success in segmentation algorithms has been observed through use
of probabilistic models such as Hidden Markov Models (HMMs). Early research
using these models by Ricke et.\ al utilized embedded HMMs to model the 4
using these models by Ricke et.\ al utilised embedded HMMs to model the 4
states of the PCG and their transitions~\citeyearpar{Ricke2005}. MFCCs and
Shannon Energy are used as feature vectors for the models. Results of
Shannon Energy were used as feature vectors for the models. Results of
98\% accuracy were reported, although this was tested on only a small database
of signals.\\
Gill et.\ al achieve similar results, most notably with specific consideration
Gill et.\ al achieved similar results, most notably with specific consideration
for the duration of each state in the HMM~\citeyearpar{Gill2005}. This is
handled through the extraction of 6 duration features based primarily on peaks,
which are then used as feature vectors for the HMM. Results of 98.6\%
handled through the extraction of 6 duration features based primarily on peaks.
These features form vectors for training the HMM. Results of 98.6\%
sensitivity, 96.9\% positive predictivity for S1 sounds and 98.3\% sensitivity,
96.5\% positive predictivity for S2 sounds were reported.
The issue of state duration is further addressed by Schmidt et.\ al through use
96.5\% positive predictivity for S2 sounds were reported.
The issue of state duration was further addressed by Schmidt et.\ al through use
of a duration-dependent hidden Markov model (DHMM)~\citeyearpar{Schmidt2015}. The
DHMM is a modified HMM that considers the duration of the current state when
calculating the probability of transition to another state. This modification
scored a reported sensitivity of 98.8\% and a positive predictivity of
98.6\%.\\
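As a sketch of how state duration can enter the decoding step, one common
formulation of a duration-dependent Viterbi recursion is given below. The
symbols are assumptions introduced for illustration rather than the notation of
the cited work: $\delta_t(j)$ is the likelihood of the best state sequence
ending in state $j$ at time $t$, $a_{ij}$ is the transition probability from
state $i$ to state $j$, $p_j(d)$ is the probability of remaining in state $j$
for $d$ samples, and $b_j(O_s)$ is the probability of observation $O_s$ given
state $j$.
\begin{align*}
\delta_t(j) = \max_{d} \left[ \max_{i \neq j} \left[ \delta_{t-d}(i)\, a_{ij} \right] p_j(d) \prod_{s=0}^{d-1} b_j(O_{t-s}) \right]
\end{align*}
In a standard HMM the duration term is implicit and decays geometrically with
$d$, which is a poor fit for states that have a characteristic duration.\\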
Building on previous work using HMMs, Springer et.\ al presents a segmentation
Building on previous work using HMMs, Springer et.\ al presented a segmentation
algorithm by using hidden semi-Markov models (HSMMs) in combination with
logistic regression~\citeyearpar{Springer2016}. Use of a hidden semi-Markov model
allows for a priori information on the duration of the current state to be used
@@ -305,46 +327,46 @@ Gupta et.\ al \citeyearpar{Gupta2007} & Homomorphic filtering, $k$-means clus
\doublespacing
\subsection{Feature extraction/Classification Models}\label{Classification}
\subsection{Feature extraction/Classification models}\label{Classification}
A wide variety of methods exist for the extraction of statistical features and
classification of PCG data. Most notably, the recent Physionet/Computing in
Cardiology (CinC) Challenge 2016 has prompted the development of a range of methods
that have improved the quality of abnormality classification in noisy signals.
The challenge was assembled to provide researchers with a large database of PCG
signals of varying quality. This enabled the development of algorithms that
could be evaluated on a significant database, in order to determine performance
across a range of conditions/signal qualities~\parencite{Clifford2016}. This
section first details significant work produced prior to the challenge, and
then highlights key works produced for the challenge to outline the breadth of
methods for robust heart sound analysis.
Cardiology (CinC) Challenge 2016 has prompted the development of a range of
methods that have improved the quality of abnormality classification in noisy
signals. The challenge was assembled to provide researchers with a large
database of normal/pathological PCG signals of varying quality. This enabled
the development of algorithms that could be evaluated on a significant
database, in order to determine performance across a range of conditions/signal
qualities~\parencite{Clifford2016}. This section first details significant work
produced prior to the challenge, then highlights key works produced for the
challenge to outline the breadth of methods for robust heart sound analysis.
\subsubsection{Work prior to the Physionet challenge}
Work prior to the Physionet challenge was conducted predominantly with the aim
of classifying specific heart conditions. Until recently, little research had
been produced with regards to general abnormality detection, with many projects
choosing to focus on specific conditions such as murmurs, atrial fibrillation
and flutter, and heart valve disease. This section outlines some key research
choosing to focus on specific conditions such as murmurs, atrial fibrillation,
flutter, and heart valve disease. This section outlines some key research
into these areas, alongside initial research into general abnormality
detection.\\
Reed et.\ al implement a simple general classification algorithm using artificial
Reed et.\ al implemented a simple general classification algorithm using artificial
neural networks (ANNs) and wavelet decomposition~\citeyearpar{Reed2004}. As
initial work into this field, preprocessing such as segmentation is not
performed and features remain relatively simple when compared to more recent
methods. Also, due to the comparatively small sample size used for training (1
patient per abnormality, 4 cycles per patient), a reported accuracy of 100\%
would likely generalise poorly. Thsi does however, serve as an early example of
would likely generalise poorly. This does, however, serve as an early example of
limited success in general heart sound classification.\\
Maglogiannis et.\ al present a classifier for discrimination of heart valve
Maglogiannis et.\ al presented a classifier for discrimination of heart valve
disease from regular heart sounds using an SVM (Support Vector Machine)
classifier~\citeyearpar{Maglogiannis2009}. Roughly 100 features were extracted
from the signal, based on direct analysis of each heart cycle component (S1,
Systole, S2, Diastole) and the average Shannon energy envelope of these
components. A database of 198 heart sounds was curated for the project,
acquired from 8 sources, such as medical CDs and pre-existing databases. An
accuracy of 91.43\% is reported using 10-fold stratified cross-validation. In
accuracy of 91.43\% was reported using 10-fold stratified cross-validation. In
addition, the project aimed to classify individual abnormalities in a 3 step
process, by distinguishing between systolic or diastolic murmurs, and then
distinguishing between aortic or mitral diseases. The classifier achieved
@@ -353,10 +375,10 @@ the potential for a system to accurately distinguish between normal and
abnormal heart sounds in a generalisable way, given carefully selected
features.\\
Ari et.\ al also propose an SVM based method for abnormality
Ari et.\ al also proposed an SVM based method for abnormality
classification~\citeyearpar{Ari2010}. A modified Least-squares SVM (LSSVM) is
used in order to improve separability between normal and abnormal datapoints
during training. 32 wavelet based features from previous literature are use as
during training. 32 wavelet based features from previous literature are used as
feature vectors for a modified LSSVM, un-modified LSSVM and a standard SVM.
Comparison of the system shows that the proposed technique performs
significantly better on all test sets with an accuracy of between 86\% and
@@ -375,17 +397,20 @@ of of 22 patients, 6 of which were labeled as having a systolic murmur. The
highest reported accuracy was achieved using MFCCs as the primary feature
vector achieving a 98\% accuracy on 10-fold cross validation.\\
Schmidt et.\ al aim to find features that can be used for classification of
Schmidt et.\ al aimed to find features that could be used for classification of
coronary artery disease through detection of small
murmurs~\citeyearpar{Schmidt2015}. A large number of features are then
calculated to provide vectors for classification. Parametric spectral features
murmurs~\citeyearpar{Schmidt2015}. A large number of features are
calculated to provide vectors for classification; parametric spectral features
such as ARMA are used, alongside instantaneous frequency and octave power
measurements. These are combined with complexity features such as sample
entropy and simplicity. Complexity features are chosen in an attempt to exploit
the likely stochastic nature of murmurs, when compared to normal heart sounds.
Given the large number of features calculated, PCA is used to retain only the
most relevant information. Quadratic discriminant analysis (QDA) is then used
as a classifier to provide a final accuracy score of 73\%.\\
measurements. Complexity features such as sample entropy and simplicity are
also calculated in an attempt to exploit the likely stochastic nature of
murmurs, when compared to normal heart sounds. Given the large number of
features calculated, PCA is used to retain only the most relevant information.
Quadratic discriminant analysis (QDA) is then used as a classifier to provide a
final accuracy score of 73\%.\\
An overview of significant research prior to the Physionet challenge is
provided in table~\ref{SumPrior}.
\newgeometry{margin=1cm} % modify this if you need even more space
@@ -395,6 +420,7 @@ as a classifier to provide a final accuracy score of 73\%.\\
\scriptsize
%\centering
\rowcolors{1}{gray!15}{white}
\label{SumPrior}
\doublespacing
\begin{tabulary}{\linewidth}{LLLLLL}
\dtoprule
@@ -417,8 +443,8 @@ Reed et.~al \citeyearpar{Reed2004} & ---
The 2016 Physionet/CinC Challenge aimed to encourage development of heart
abnormality detection algorithms by providing a large open database of PCG
signal recordings, sourced from a variety of both clinical and non-clinical
environments. (Further details on the provided database are provided in
section~\ref{Dataset} and it is described in full by Liu et.\
environments. (Further details on the database can be found in
section~\ref{Dataset}. The complete specification is presented by Liu et.\
al~\citeyearpar{Liu2016}). In addition, participants were provided with a
state-of-the-art heart sound segmentation algorithm, as proposed by Springer
et.\ al in Section~\ref{Segmentation}. Participants were then tasked with the
@@ -427,7 +453,7 @@ healthy and unhealthy heart sound samples. The challenge recieved 348 entries
in total, each of which was scored on a hidden test dataset
using a Modified accuracy measure ($MAcc$) as defined by Clifford et.
al~\citeyearpar{Clifford2016}:
\begin{table}[H]
\begin{table}[htbp]
\centering
\caption{Output Classification}
\label{OutputClassification}
@@ -464,25 +490,25 @@ Modified sensitivity ($Se$), specificity ($Sp$) and overall accuracy ($MAcc$) ar
\end{align*}
This section summarises some of the key works presented for the challenge,
including the some of the most accurate models, and a baseline classifier
including some of the most accurate models, and a baseline classifier
provided to participants as a starting point.\\
A simple baseline classifier was provided to participants, in order to
A simple baseline classifier was provided to participants in order to
demonstrate the basic structure of systems expected for
entries~\parencite{Liu2016}. The classifier extracted a selection of 20 basic
features primarily focused on relative timings and amplitudes of heart sounds.
features, primarily focused on relative timings and amplitudes of heart sounds.
A binary logistic regression model is chosen for classification. From the 20
extracted features, 13 were selected based on their statistical significance
(measured using foreward liklihood ratio selection). The system achieved a
reported score of 66\% on the test set, giving a baseline score for challengers to build on.
In addition, the system was trained using leave-one-out cross validation. By
removing a single training database on each fold, the generalisation of the algorithm
trained on all other databases could then be evaluated. Results showed that
performance decreased significantly when training via this method, giving an
average accuracy of 59\%, with Training database $b$ scoring as low as 47\%.
This could suggest that individual databases in the dataset are not sufficiently
represented by other databases, or that features do not model abnormalities
sufficiently.\\
extracted features, 13 were selected based on their statistical significance,
measured using forward likelihood ratio selection. The system achieved a
reported score of 66\% on the test set, giving a baseline score for participants
to build on. In addition, the system was trained using leave-one-out cross
validation. By removing a single training database on each fold, the
generalisation of the algorithm trained on all other databases could then be
evaluated. Results showed that performance decreased significantly when
training via this method, giving an average accuracy of 59\%, with training
database $b$ scoring as low as 47\%. This could suggest that individual
databases in the dataset are not sufficiently represented by other databases,
or that features do not model abnormalities sufficiently.\\
Homsi et.\ al proposed a system that utilised 131 time domain, STFT based and
wavelet based features, combined with nested ensemble classifiers to produce an
@@ -497,7 +523,7 @@ experimentation.\\
% TODO: Read into accuracy results for this method more closely
Potes et.\ al present a similar approach to that of Homsi et.\
al~\citeyearpar{Potes2016}. 124 similar time-frequency ($t-f$) features are extracted
al~\citeyearpar{Potes2016}. 124 similar TFR features are extracted
and used as vectors for an AdaBoost classifier. This was combined with a deep
learning approach using a Convolutional Neural Network (CNN) classifier. The
signal was decomposed into 4 frequency bands and segmented, to provide input to
@@ -535,34 +561,36 @@ out cross validation for a clearer understanding of the generalisation of the
algorithm, as well as highlighting issues with the underlying dataset that are
discussed in Section~\ref{Dataset}.
- Large number of features, tensor based feature reduction and
K-NN~\parencite{Bobillo2016}
- Convolutional neural networks, MFCCs~\parencite{Rubin2016}
\newgeometry{margin=1cm} % modify this if you need even more space
\begin{landscape}
\begin{table}[H]
\captionof{table}{Summary of Physionet Challenge 2016 entries} \label{PriorWorkTable}
\captionof{table}{Summary of top 10 Physionet Challenge 2016 entries} \label{PriorWorkTable}
\scriptsize
%\centering
\rowcolors{1}{gray!15}{white}
\doublespacing
\begin{tabulary}{\linewidth}{CCCCC}
\dtoprule
Author & Features & Classification Method & Reported Scores & Challenge Score \\ \midrule
Potes et.~al & 124 $t-f$ features & Combined AdaBoost/ANN & & 86.02\% \\
Zabihi et.~al & 40 temporal, spectral and $t-f$ features, reduced using SFS and LPC & 2 ensembles of neural networks & & 85.90\% \\
Kay et.~al & CWT, MFCCs, complexity measures, Inter-beat features, PCA & ANNs & & 85.20\% \\
Bobillo & MFCCs and WPD, reduced using tensor decomposition & $k$-NN & & 84.54\% \\
Homsi et.~al & 131 time domain, STFT based andwavelet based features & Combined ensembles of LogitBoost, Random Forrest and CSC & & 84.48\% \\
Maknickas et.~al & & & & 84.15\% \\
Plesinger et.~al & & & & 84.11\% \\
Rubin et.~al & & & & 83.99\% \\
Author & Features & Classification Method & Reported Scores & Challenge Score \\ \midrule
Potes et.~al \citeyearpar{Potes2016} & 124 TFR features & Combined AdaBoost/ANN & In-house test set accuracy: AdaBoost-abstain: 79\%, CNN: 82\%, Combined classifiers: 85\% & 86.02\% \\
Zabihi et.~al \citeyearpar{Zabihi2016} & 40 temporal, spectral and TFR features, reduced using SFS and LPC & 2 ensembles of neural networks & Training accuracy: Maximum of 91.50\% & 85.90\% \\
Kay et.~al \citeyearpar{Kay2017} & CWT, MFCCs, complexity measures, Inter-beat features, PCA & ANNs & A range of cross validation based tests were used to analyse performance. See paper for full details & 85.20\% \\
Bobillo \citeyearpar{Bobillo2016} & MFCCs and WPD, reduced using tensor decomposition & $k$-NN & A range of cross validation based tests were used to analyse performance. See paper for full details & 84.54\% \\
Homsi et.~al \citeyearpar{Homsi2017}    & 131 time domain, STFT based and wavelet based features & Combined ensembles of LogitBoost, Random Forest and CSC & Training accuracy 87.7\%, In-house test accuracy: 93.24\% & 84.48\% \\
Maknickas et.~al \citeyearpar{Maknikas2017} & MFCCs, reduced by Karhunen-Lo\`eve transform & Deep Neural Network & Training accuracy 99.7\%, Validation accuracy 95.2\% & 84.15\% \\
Plesinger et.~al \citeyearpar{Plesinger2017} & Statistical and symmetry properties of amplitude envelopes for S1 and S2 sounds & Custom probability assessment machine learning algorithm & Training accuracy 90.3\% & 84.11\% \\
Rubin et.~al \citeyearpar{Rubin2016} & MFCCs & Convolutional neural networks & -- & 83.99\% \\
Jiayu (paper not submitted) & -- & -- & -- & 82.82\% \\
Abdollahpur et.~al \citeyearpar{Abdolahpur2017} & time, TFR and perceptual features, reduced using Fisher's discriminant analysis & Combined ANNs & Training accuracy: 91.6\%, 87\%, 84.55\% (prior to ANN combination method) & 82.63\%\\
\dbottomrule\\
% TODO: Add footnote explanation for Ac = Accuracy
% TODO: Add citeyearpar references to authors
\end{tabulary}
\end{table}
\end{landscape}
\restoregeometry
% TODO: Insert table of previous research methods, datasets and results