.. _descriptor_defs:

Audio Descriptor Definitions
============================
This section describes the audio descriptors used for analysing characteristics
of the audio files. Each descriptor measures a specific characteristic, and
multiple descriptors are combined to match grains based on the amalgamation of
these measurements. For example, using the F0 and RMS descriptors would match
audio based on its pitch and energy.

Centroid
~~~~~~~~~~~~~~~~~
The temporal centroid is a measure of the center of gravity of a signal. It is
used to determine the central point of a signal's amplitude and is calculated
as:

.. math::
    C(n) = \frac{\sum_{i=i_s(n)}^{i_e(n)}(i-i_s(n)) \cdot x(i)}{\sum_{i=i_s(n)}^{i_e(n)} x(i)}.

Ref: :cite:`lerch2012itaca`
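
The calculation above can be sketched in plain Python as follows. This is an
illustrative sketch rather than the program's own code: the function name is
hypothetical, the frame is assumed to hold non-negative values (for example an
amplitude envelope), and returning zero for a silent frame is an assumption:

.. code-block:: python

    def temporal_centroid(frame):
        """Amplitude-weighted mean sample offset within one frame."""
        total = sum(frame)
        if total == 0:
            return 0.0  # assumption: report 0 for a silent frame
        # enumerate() yields the offset (i - i_s) of each sample in the frame
        return sum(offset * x for offset, x in enumerate(frame)) / total

    print(temporal_centroid([0.0, 1.0, 1.0, 0.0]))

The result is expressed in samples relative to the start of the frame.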

F0 (Pitch detection)
~~~~~~~~~~~~~~~~~~~~
An important feature of any periodic audio is its pitch. Pitch is defined as
the perceived frequency of the signal. In order to determine the pitch of a
periodic signal, the fundamental frequency (:math:`f_0`) is estimated. Many
methods have been developed for estimating the :math:`f_0` of a signal. This
program uses the autocorrelation method, which was chosen for its simplicity
and reasonable versatility across a wide range of signals.

The :math:`f_0` is calculated by first calculating the autocorrelation of the
signal, defined as:

.. math::
    R_n(m) = \sum_{i=i_s(n)}^{i_e(n)} x(i) x(i-m)

then normalizing:

.. math::
    \Gamma_n(m) = \frac{R_n(m)}{\sqrt{\sum_{i=i_s(n)}^{i_e(n)}x(i)^2 \sum_{i=i_s(n)}^{i_e(n)}x(i-m)^2}}.

The fundamental period of the signal is then calculated as the lag between
:math:`T_{min}` and :math:`T_{max}` at which the delayed signal most closely
matches the original. :math:`T_{min}` and :math:`T_{max}` are the minimum and
maximum values of the fundamental period that are considered.

.. math::
    y = \underset{T_{min} \leq m \leq T_{max}}{\arg\max} \{\Gamma_n(m)\}.

In order to improve the accuracy of peak detection, parabolic interpolation is
used. The peak correlation and the values of its two closest neighbours are
used to estimate the fractional location of the peak.

The method for parabolic interpolation is defined as:

.. math::
    T_0^n = \frac{1}{2} \cdot \frac{\alpha - \gamma}{\alpha - 2\beta + \gamma} + y

    &\text{where:} \\
    &\alpha = \Gamma_n(y-1) \\
    &\beta = \Gamma_n(y) \\
    &\gamma = \Gamma_n(y+1)

Ref: :cite:`smith2011sasp`

From the fundamental period, the frequency is then calculated as:

.. math::
    f_0^n = \frac{1}{T_0^n}.

Ref: :cite:`itaa2014`
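
The steps above (normalized autocorrelation, peak picking, parabolic
interpolation and conversion to frequency) can be sketched as follows. This is
a minimal sketch under stated assumptions, not the program's implementation:
the function names are hypothetical, lags are measured in samples, the sums
only cover samples that overlap inside the frame, and the result is scaled by
an assumed ``sample_rate`` argument to give a value in Hz:

.. code-block:: python

    import math


    def normalized_autocorrelation(frame, lag):
        """Normalized correlation of the frame with itself delayed by `lag`."""
        head = frame[lag:]                # x(i)
        tail = frame[:len(frame) - lag]   # x(i - m)
        numerator = sum(a * b for a, b in zip(head, tail))
        energy = sum(a * a for a in head) * sum(b * b for b in tail)
        return numerator / math.sqrt(energy) if energy > 0 else 0.0


    def estimate_f0(frame, sample_rate, t_min, t_max):
        """Estimate f0 in Hz. Requires t_min >= 1 and t_max + 1 < len(frame)."""
        # One extra lag on each side so the peak always has two neighbours.
        gamma = {m: normalized_autocorrelation(frame, m)
                 for m in range(t_min - 1, t_max + 2)}
        y = max(range(t_min, t_max + 1), key=lambda m: gamma[m])
        # Parabolic interpolation around the peak lag.
        alpha, beta, gam = gamma[y - 1], gamma[y], gamma[y + 1]
        curvature = alpha - 2 * beta + gam
        offset = 0.5 * (alpha - gam) / curvature if curvature != 0 else 0.0
        period = y + offset               # fundamental period in samples
        return sample_rate / period


    # A 100 Hz sine sampled at 8 kHz has a period of 80 samples.
    sine = [math.sin(2 * math.pi * 100 * i / 8000) for i in range(1024)]
    print(estimate_f0(sine, 8000, 20, 120))

Keeping ``t_max`` below twice the expected period in this example avoids the
equally strong correlation peak that a periodic signal produces at integer
multiples of its period.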


FFT
~~~
The Fast Fourier Transform (FFT) is an efficient algorithm for computing the
discrete Fourier transform. Applied to successive windows of a signal, it
yields the Short Time Fourier Transform (STFT). The full description of this
transform is outside the scope of this project; however, it should be
understood that this analysis provides a description of the spectral content
of a windowed signal. By applying the transform to a frame of :math:`K`
samples, :math:`K` bins are calculated that detail the sine and cosine
amplitudes required to reconstruct the signal. The calculation of the STFT is
defined as:

.. math::
    X(k,n) = \sum_{i=i_s(n)}^{i_e(n)} x(i) \exp{\Big(-jk \cdot (i -
    i_s(n))\frac{2\pi}{K}\Big)}.

Ref: :cite:`lerch2012itaca`
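
To make the definition concrete, the sum can be evaluated directly for a single
frame. The sketch below is a naive transform written for illustration only (the
function name is hypothetical); an FFT routine produces the same values far
more efficiently, and no window function is applied here:

.. code-block:: python

    import cmath


    def dft_frame(frame):
        """Evaluate X(k, n) for every bin k of one frame of K samples."""
        K = len(frame)
        return [
            sum(x * cmath.exp(-2j * cmath.pi * k * offset / K)
                for offset, x in enumerate(frame))
            for k in range(K)
        ]


    # Eight samples of a cosine that completes two cycles per frame.
    spectrum = dft_frame([1, 0, -1, 0, 1, 0, -1, 0])
    print([round(abs(value), 6) for value in spectrum])

For this input the energy falls in bin 2 and its mirror image, bin 6. The
spectral descriptors in this section use the magnitudes of bins
:math:`0` to :math:`K/2-1` only.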

Harmonic Ratio
~~~~~~~~~~~~~~
The harmonic ratio can be used to differentiate between noisy and periodic
signals. Higher values suggest that the signal is more periodic (such as a sine
wave) and lower values represent less periodicity. This can be used as a form
of confidence measure in determining the validity of F0 values. It is
calculated as part of the F0 estimation algorithm as:

.. math::
    HR(n) = \max_{T_{min} \leq m \leq T_{max}}{\{\Gamma_n(m)\}}.

Ref: :cite:`lerch2012itaca`

Kurtosis
~~~~~~~~~~~~~~~~~
Temporal kurtosis measures how flat or peaked the distribution of the signal's
values is. Negative values indicate a flatter distribution and positive values
indicate a more "peaky" distribution, relative to a Gaussian distribution, for
which the value is zero. Kurtosis is calculated as:

.. math::
    TK(n)=\frac{1}{\sigma_x^4(n) \cdot K}\sum_{i=i_s(n)}^{i_e(n)}\Big(x(i)-\mu_x(n)\Big)^4-3.

Ref: :cite:`lerch2012itaca`
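
A minimal sketch of this calculation is shown below. The function name is
hypothetical, and returning zero for a constant frame (where the variance is
zero) is an assumption made to avoid a division by zero:

.. code-block:: python

    def temporal_kurtosis(frame):
        """Fourth central moment over squared variance, minus 3."""
        K = len(frame)
        mean = sum(frame) / K
        variance = sum((x - mean) ** 2 for x in frame) / K
        if variance == 0:
            return 0.0  # assumption: a constant frame is reported as 0
        fourth_moment = sum((x - mean) ** 4 for x in frame) / K
        return fourth_moment / variance ** 2 - 3.0


    print(temporal_kurtosis([-1.0, 1.0, -1.0, 1.0]))   # flat distribution
    print(temporal_kurtosis([0, 0, 0, 4, 0, 0, 0, 0])) # single sharp peak

The first frame gives a negative value and the second a positive one, matching
the interpretation given above.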

Peak Amplitude
~~~~~~~~~~~~~~
Peak amplitude measures the highest peak in the absolute signal. It is
calculated as:

.. math::
    P(n) = \max_{i_s(n) \leq i \leq i_e(n)}\{\left|x(i)\right|\}.
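
As a short illustration (the function name is hypothetical), the peak of a
frame can be found in one line:

.. code-block:: python

    def peak_amplitude(frame):
        """Largest absolute sample value in the frame."""
        return max(abs(x) for x in frame)


    print(peak_amplitude([0.1, -0.8, 0.5]))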

RMS
~~~
The perceived loudness of a signal is an important feature as it can be related
to the dynamics of the signal. RMS is used as a measure of sound intensity,
distinguishing between loud and quiet audio. With :math:`K` as the total number
of samples in the frame, it is calculated as:

.. math::
    RMS(n) = \sqrt{\frac{1}{K} \sum_{i=i_s(n)}^{i_e(n)} x(i)^2}.

Other methods that take the human perception of loudness into account may
provide more perceptually relevant results. However, the RMS measurement
produced acceptable results for this application.

Ref: :cite:`lerch2012itaca`
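
The RMS calculation can be sketched as follows, assuming a non-empty frame
held in a list of samples (the function name is hypothetical):

.. code-block:: python

    import math


    def rms(frame):
        """Root mean square of the samples in one frame."""
        return math.sqrt(sum(x * x for x in frame) / len(frame))


    print(rms([3.0, 4.0, 0.0, 0.0]))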

Spectral Centroid
~~~~~~~~~~~~~~~~~
The spectral centroid measures the centre of gravity across frequency bins to
determine the central point across the spectral content of the frame. High
values indicate that the spectral content is centred in higher frequencies and
lower values indicate a lower centre. The spectral centroid is calculated as:

.. math::
    SC(n) = \frac{\sum_{k=0}^{K/2-1} k \cdot | X(k,n) | ^2}{\sum_{k=0}^{K/2-1} | X(k,n) | ^2}.

The result is the sum of squared magnitudes, weighted by their bin index and
normalized by the unweighted sum.

Ref: :cite:`lerch2012itaca`
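
A minimal sketch of this calculation is given below. It assumes the input is a
list of magnitudes for bins :math:`0` to :math:`K/2-1`; the function name and
the zero returned for a silent frame are assumptions:

.. code-block:: python

    def spectral_centroid(magnitudes):
        """Power-weighted mean bin index of a magnitude spectrum."""
        power = [m * m for m in magnitudes]
        total = sum(power)
        if total == 0:
            return 0.0  # assumption: report 0 for a silent frame
        return sum(k * p for k, p in enumerate(power)) / total


    print(spectral_centroid([1.0, 0.0, 0.0, 1.0]))

The result is a (possibly fractional) bin index. Multiplying it by the sample
rate divided by :math:`K` converts it to a frequency in Hz.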

Spectral Crest Factor
~~~~~~~~~~~~~~~~~~~~~
The spectral crest factor can be used as a measure of the tonalness of the
signal. It is calculated by dividing the maximum magnitude by the sum of
magnitudes. This differentiates between flat spectra and sinusoidal spectra,
with low values representing the former and high values representing the
latter.

.. math::
    SCF = \frac{ \max_{0 \leq k \leq K/2-1} \{| X(k,n) | \}}{\sum_{k=0}^{K/2-1} | X(k,n) | }.

Ref: :cite:`lerch2012itaca`
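
The ratio can be sketched as follows, again assuming a list of magnitudes for
bins :math:`0` to :math:`K/2-1` (the function name and the handling of a silent
frame are assumptions):

.. code-block:: python

    def spectral_crest(magnitudes):
        """Largest magnitude divided by the sum of all magnitudes."""
        total = sum(magnitudes)
        if total == 0:
            return 0.0  # assumption: report 0 for a silent frame
        return max(magnitudes) / total


    print(spectral_crest([1.0, 1.0, 1.0, 1.0]))  # flat spectrum
    print(spectral_crest([0.0, 4.0, 0.0, 0.0]))  # single sinusoidal peak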

Spectral Flatness
~~~~~~~~~~~~~~~~~
Defined as the ratio between the geometric and arithmetic mean of the magnitude
spectrum, spectral flatness indicates the noisiness of a signal. Higher values
indicate a flatter spectrum (suggesting a noisy signal) as opposed to lower
values that represent a more tonal signal. Spectral flatness is calculated as:

.. math::
    TFl(n) = \frac{\sqrt[K/2]{\prod_{k=0}^{K/2-1} | X(k,n) | }}{2/K \cdot
    \sum_{k=0}^{K/2-1} | X(k,n) | }.

Ref: :cite:`lerch2012itaca`
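
A sketch of the calculation is shown below. The geometric mean is computed
through logarithms rather than a direct product, which is a design choice made
here to avoid numerical underflow for long spectra. The function name is
hypothetical, and any spectrum containing a zero magnitude is reported as zero
because its geometric mean is zero:

.. code-block:: python

    import math


    def spectral_flatness(magnitudes):
        """Geometric mean of the magnitudes divided by their arithmetic mean."""
        count = len(magnitudes)
        arithmetic = sum(magnitudes) / count
        if arithmetic == 0 or min(magnitudes) <= 0:
            return 0.0  # a zero bin makes the geometric mean zero
        geometric = math.exp(sum(math.log(m) for m in magnitudes) / count)
        return geometric / arithmetic


    print(spectral_flatness([2.0, 2.0, 2.0, 2.0]))    # flat spectrum
    print(spectral_flatness([0.01, 4.0, 0.01, 0.01])) # tonal spectrum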

Spectral Flux
~~~~~~~~~~~~~
Spectral flux is a measure of change between consecutive frames. It calculates
the average difference between frames to differentiate between adjacent frames
that are largely dissimilar (suggesting a non-stationary section of signal) and
similar frames (suggesting a steady-state signal). It is calculated as:

.. math::
    SF(n) = \frac{\sqrt{\sum_{k=0}^{K/2-1} \Big( | X(k,n) | - | X(k,n-1) | \Big)^2
    }}{K/2}.

Ref: :cite:`lerch2012itaca`
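
The comparison of two consecutive frames can be sketched as follows. Both
arguments are assumed to be magnitude lists of equal length covering bins
:math:`0` to :math:`K/2-1`; the function and parameter names are hypothetical:

.. code-block:: python

    import math


    def spectral_flux(magnitudes, previous_magnitudes):
        """Root of the summed squared bin differences, divided by bin count."""
        squared_difference = sum(
            (current - previous) ** 2
            for current, previous in zip(magnitudes, previous_magnitudes)
        )
        return math.sqrt(squared_difference) / len(magnitudes)


    print(spectral_flux([3.0, 0.0, 4.0, 0.0], [0.0, 0.0, 0.0, 0.0]))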

Spectral Spread
~~~~~~~~~~~~~~~
Spectral spread is a measurement of the concentration of magnitudes around the
spectral centroid. This description relates to the spectral shape of the signal
and is associated with perceptions of timbre. It is calculated as:

.. math::
    SS(n) = \sqrt{\frac{\sum_{k=0}^{K/2-1} \Big(k-SC(n)\Big)^2 \cdot | X(k,n)
    | ^2}{\sum_{k=0}^{K/2-1} | X(k,n) | ^2}}.

Ref: :cite:`lerch2012itaca`
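
A minimal sketch of this calculation follows. It recomputes the spectral
centroid internally so that the example is self-contained; the function name
and the zero returned for a silent frame are assumptions:

.. code-block:: python

    import math


    def spectral_spread(magnitudes):
        """Power-weighted standard deviation of bin index about the centroid."""
        power = [m * m for m in magnitudes]
        total = sum(power)
        if total == 0:
            return 0.0  # assumption: report 0 for a silent frame
        centroid = sum(k * p for k, p in enumerate(power)) / total
        weighted = sum((k - centroid) ** 2 * p for k, p in enumerate(power))
        return math.sqrt(weighted / total)


    print(spectral_spread([0.0, 0.0, 1.0, 0.0]))  # all energy in one bin
    print(spectral_spread([1.0, 0.0, 1.0, 0.0]))  # energy split over two bins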

Variance
~~~~~~~~
The variance of a signal measures its spread around the signal's arithmetic
mean. It is used in the calculation of kurtosis and is calculated as:

.. math::
    \sigma_x^2 = \frac{1}{K} \sum_{i=i_s(n)}^{i_e(n)}(x(i) - \mu_x(n))^2.

Ref: :cite:`lerch2012itaca`

Zero-Crossing
~~~~~~~~~~~~~
The zero-crossing rate measures how often a signal's value changes sign within
a frame. It is relevant to determining the noisiness of a signal, as noisy
signals will change sign more frequently than periodic signals. It is
calculated as:

.. math::
    Z(n) = \frac{1}{2K} \sum_{i=i_s(n)}^{i_e(n)} | \operatorname{sgn}[x(i)] - \operatorname{sgn}[x(i-1)] |

    \text{where the sgn function is defined as:}

    \operatorname{sgn}[x(i)] = \left\{
                \begin{array}{ll}
                1, & x(i) \geq 0\\
                -1, & x(i) < 0
                \end{array}
              \right.

Ref: :cite:`itaa2014`
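
The calculation can be sketched as follows. As a simplifying assumption, the
sum starts at the second sample of the frame so that no sample from before the
frame is needed; the function name is hypothetical:

.. code-block:: python

    def zero_crossing_rate(frame):
        """Sign changes between neighbouring samples, scaled by 1 / (2K)."""

        def sgn(value):
            return 1 if value >= 0 else -1

        K = len(frame)
        changes = sum(
            abs(sgn(frame[i]) - sgn(frame[i - 1])) for i in range(1, K)
        )
        return changes / (2 * K)


    print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # alternating signs
    print(zero_crossing_rate([1.0, 2.0, 3.0, 4.0]))    # no sign change

Each sign change contributes 2 to the sum, so the first frame (three changes
over four samples) gives 0.75.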

List of Symbols
~~~~~~~~~~~~~~~

====================  ================================================
Symbol                  Meaning
====================  ================================================
:math:`C`               Centroid
:math:`f`               Frequency
:math:`\Gamma`          Normalized autocorrelation
:math:`HR`              Harmonic ratio
:math:`i`               Sample index
:math:`i_e`             End index of frame
:math:`i_s`             Start index of frame
:math:`K`               Size of frame
:math:`k`               Frequency bin index
:math:`m`               Correlation time lag
:math:`\mu_x`           Arithmetic mean
:math:`n`               Frame index
:math:`P`               Peak amplitude
:math:`R`               Autocorrelation of signal
:math:`RMS`             Root mean square
:math:`\sigma_x^2`      Variance
:math:`SC`              Spectral centroid
:math:`SCF`             Spectral crest factor
:math:`SF`              Spectral flux
:math:`SS`              Spectral spread
:math:`T_0`             Fundamental period
:math:`TK`              Kurtosis
:math:`TFl`             Spectral flatness
:math:`x`               Audio signal
:math:`X(k,n)`          STFT of current frame
:math:`Z`               Zero-crossing rate
====================  ================================================

.. bibliography:: refs.bib
