Sam Perry
2016-11-08 23:23:27 +00:00
parent 2cefa6a95b
commit fd791ce17c
+48 -9
@@ -173,7 +173,7 @@ end
The final values for $\theta$ were:\\
$\theta = \big[ 340412.659574468, 110631.050278846, -6649.4742708198 \big]$\\
-It is suprising that the $\theta$ value relating to the number of rooms is
+It is surprising that the $\theta$ value relating to the number of rooms is
negative, suggesting that a house of a given size tends to be worth less if
it has a high number of rooms when compared to a house of similar size that
has fewer.
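As a rough illustration of how these coefficients combine, and assuming (this
is not stated above) that $x_1$ is the normalized house size and $x_2$ the
normalized number of rooms, the fitted hypothesis can be written as
\begin{align}
h_\theta(x) &= \theta_0 + \theta_1 x_1 + \theta_2 x_2 \\
            &\approx 340412.66 + 110631.05\,x_1 - 6649.47\,x_2 .
\end{align}
Under that assumption, holding $x_1$ fixed and increasing $x_2$ by one unit
lowers the predicted price by roughly $6649.47$, which is the effect described
above.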
@@ -355,7 +355,7 @@ good and bad generalization.}
The training error is the error produced when training the model on the
training data. The test error is the error when applying the model created
using the training data to the test data.
-The training set generalizes best when the set closely ressembles the test
+The training set generalizes best when the set closely resembles the test
set's shape. This produces a function that will perform well on the test set
and in theory on any new data. In this project, the best trained functions are
created from data that is spread most equally in the training set, as this
@@ -367,7 +367,7 @@ seen in figures~\ref{6train} and~\ref{6test}.
\caption{Function generated on training data (error: 0.20109)}
\makebox[\textwidth]{\includegraphics[width=1\textwidth]{graph4a}}
\label{4train}
-\caption{Function plotted agains test data (error: 0.49358)}
+\caption{Function plotted against test data (error: 0.49358)}
\makebox[\textwidth]{\includegraphics[width=1\textwidth]{graph4c}}
\label{4test}
\end{figure}
@@ -375,7 +375,7 @@ seen in figures~\ref{6train} and~\ref{6test}.
\caption{Function generated on training data (error: 0.18535)}
\makebox[\textwidth]{\includegraphics[width=1\textwidth]{graph6a}}
\label{6train}
-\caption{Function plotted agains test data (error: 0.82162)}
+\caption{Function plotted against test data (error: 0.82162)}
\makebox[\textwidth]{\includegraphics[width=1\textwidth]{graph6c}}
\label{6test}
\end{figure}
@@ -395,7 +395,8 @@ A rise in error over iterations as the cost in training set decreases suggests
that the training set bears little resemblance to the test set. This is shown
clearly in figure~\ref{costTestTrain60}, where a large number of training data
points minimizes the cost over the majority of the dataset; however, this does not
-result in a good fit over the few remaining points used for testing.
+result in a good fit over the few remaining points used for testing. This is
+because overfitting around the training points has occurred.
\begin{figure}
@@ -428,10 +429,10 @@ result in a good fit over the few remaining points used for testing.
\subsection{Explain why a logistic regression unit cannot solve the XOR
classification problem}
The XOR classification problem cannot be solved by logistic regression because
-it is ``Linearly inseperable''. This is to say that it is impossible to
-seperate classes in the decision space through use of a single line (as is used
+it is ``Linearly inseparable''. This is to say that it is impossible to
+separate classes in the decision space through use of a single line (as is used
in logistic regression). This can be clearly demonstrated by attempting to
-seperate the two classes in figure~\ref{XOR} through use of a single line (it is not
+separate the two classes in figure~\ref{XOR} through use of a single line (it is not
possible).
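One way to make this concrete, as a sketch: assume the unit predicts class 1
when $\theta_0 + \theta_1 x_1 + \theta_2 x_2 > 0$ and class 0 otherwise (the
parameter names here are illustrative). The four XOR points then require
\begin{align}
(0,0) \mapsto 0 &: \quad \theta_0 \le 0, \\
(1,0) \mapsto 1 &: \quad \theta_0 + \theta_1 > 0, \\
(0,1) \mapsto 1 &: \quad \theta_0 + \theta_2 > 0, \\
(1,1) \mapsto 0 &: \quad \theta_0 + \theta_1 + \theta_2 \le 0.
\end{align}
Adding the two middle inequalities gives $2\theta_0 + \theta_1 + \theta_2 > 0$,
while adding the first and last gives $2\theta_0 + \theta_1 + \theta_2 \le 0$.
These cannot both hold, so no single linear boundary classifies all four
points correctly.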
\begin{figure}
@@ -576,5 +577,43 @@ h_\Theta(x) &= \begin{bmatrix}
\end{bmatrix}.
\end{align}
% \printbibliography
\subsection{The Iris data set contains three different classes of data that we
need to discriminate between. How would this be accomplished if we used a
logistic regression unit? How is it different using a neural network?}
One method for applying logistic regression to multi-class classification is the
``one-vs-all'' method. A separate binary classifier is trained for each class,
each distinguishing that class from all of the others. Every new input is then
passed through all of the classifiers (three in this case) and assigned to the
class whose classifier returns the highest probability~\parencite{ng2014}.\\
This contrasts with the neural network approach to classification, in which a
single neural network can be trained to differentiate between multiple classes
simultaneously. During training, all paths to the outputs are updated on each
iteration to create a model that fits all outputs, rather than training each
classifier in isolation.
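The one-vs-all decision rule can be sketched as follows, where $\theta^{(k)}$
denotes the parameters of the classifier for class $k$ and $g$ the sigmoid
function (both are notational assumptions rather than symbols taken from the
text above):
\begin{align}
h_\theta^{(k)}(x) &= g\big((\theta^{(k)})^{T} x\big)
                   = \frac{1}{1 + e^{-(\theta^{(k)})^{T} x}},
                   \quad k \in \{1, 2, 3\}, \\
\hat{y} &= \arg\max_{k} \; h_\theta^{(k)}(x).
\end{align}
Each $\theta^{(k)}$ is fitted independently of the others, which is the main
difference from a network whose output units share hidden-layer weights.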
\subsection{What are the differences for each number of hidden neurons? Which
number do you think is the best to use? How well do you think that we have
generalized?}
Error is highest for the lowest numbers of neurons (1 to 3), as shown in
table~\ref{my-label}. The lack of complexity in the model results in a function
that fits both the training and test sets poorly. As the number of neurons
increases, the training error decreases dramatically; however, a similar
decrease is not seen in the test error. This is due to overfitting, as the
model fits the training data very well but does not fit the test data to the
same degree. As a result, generalization is generally poor. The best results,
for both training and test error, were found when using 7 neurons, and so this
would most likely be the optimal number of neurons to use.
\begin{table}[H]
\centering
\caption{Training and test error for each number of hidden neurons}
\label{my-label}
\begin{tabular}{lrrrrrr}
No. Neurons & 1 & 2 & 3 & 5 & 7 & 10 \\
\hline
Training Error & 26.563 & 31.9923 & 26.5633 & 3.9679 & 0.42502 & 1.2144 \\
Test Error & 30.2045 & 31.5443 & 30.2043 & 13.1716 & 12.5509 & 15.3001
\end{tabular}
\end{table}
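To put a number on the overfitting described above, the gap between test and
training error can be computed directly from the table. This is a simple
illustrative measure, and the symbols $E_{\text{test}}$ and $E_{\text{train}}$
are introduced here only for that purpose:
\begin{align}
\text{gap}(n) &= E_{\text{test}}(n) - E_{\text{train}}(n), \\
\text{gap}(1) &= 30.2045 - 26.563 = 3.6415, \\
\text{gap}(5) &= 13.1716 - 3.9679 = 9.2037, \\
\text{gap}(7) &= 12.5509 - 0.42502 = 12.12588, \\
\text{gap}(10) &= 15.3001 - 1.2144 = 14.0857.
\end{align}
Across these configurations the gap widens as neurons are added, even though
the test error itself is lowest at 7 neurons.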
\printbibliography
\end{document}