Final proof changes
Executable → Regular
+472
-430
@@ -1,430 +1,472 @@
|
|||||||
\documentclass[titlepage]{scrartcl}
|
\documentclass{scrartcl}
|
||||||
\usepackage{enumitem}
|
\usepackage{enumitem}
|
||||||
\usepackage[british]{babel}
|
\usepackage[british]{babel}
|
||||||
\usepackage[style=apa, backend=biber]{biblatex}
|
\usepackage[style=apa, backend=biber, maxnames=99]{biblatex}
|
||||||
\DeclareLanguageMapping{british}{british-apa}
|
\DeclareLanguageMapping{british}{british-apa}
|
||||||
\usepackage{url}
|
\usepackage{filecontents}
|
||||||
\usepackage{float}
|
\usepackage{url}
|
||||||
\restylefloat{table}
|
\usepackage{float}
|
||||||
\usepackage{perpage}
|
\restylefloat{table}
|
||||||
\MakePerPage{footnote}
|
\usepackage{perpage}
|
||||||
\usepackage{abstract}
|
\MakePerPage{footnote}
|
||||||
\usepackage{graphicx}
|
\usepackage{abstract}
|
||||||
% Create hyperlinks in bibliography
|
\usepackage{graphicx}
|
||||||
\usepackage{hyperref}
|
% Create hyperlinks in bibliography
|
||||||
|
\usepackage{hyperref}
|
||||||
\renewcommand{\familydefault}{\sfdefault}
|
|
||||||
\usepackage{fontspec}
|
\renewcommand{\familydefault}{\sfdefault}
|
||||||
\setmainfont{Arial}
|
\usepackage{fontspec}
|
||||||
|
\setmainfont{Arial}
|
||||||
\usepackage{blindtext}
|
|
||||||
\setkomafont{disposition}{\normalfont\fontsize{12}{17}\bfseries}
|
\usepackage{blindtext}
|
||||||
\setkomafont{section}{\normalfont\fontsize{12}{17}\bfseries}
|
\setkomafont{disposition}{\normalfont\fontsize{12}{17}\bfseries}
|
||||||
\setkomafont{subsection}{\normalfont\fontsize{12}{17}\itshape}
|
\setkomafont{section}{\normalfont\fontsize{12}{17}\bfseries}
|
||||||
\setkomafont{subsubsection}{\normalfont\fontsize{12}{17}\itshape}
|
\setkomafont{subsection}{\normalfont\fontsize{12}{17}\itshape}
|
||||||
|
\setkomafont{subsubsection}{\normalfont\fontsize{12}{17}\itshape}
|
||||||
\graphicspath{{./resources/}}
|
|
||||||
\addbibresource{~/Documents/library.bib}
|
\graphicspath{{./resources/}}
|
||||||
|
\addbibresource{~/Documents/library.bib}
|
||||||
\usepackage[affil-it]{authblk}
|
|
||||||
|
% Hack to fix problem with underscores and other special characters in
|
||||||
% \usepackage{etoolbox}
|
% Mendeley bibliography.
|
||||||
% \makeatletter
|
\DeclareSourcemap{% Used when .bib/Bibliography is compiled, not when document is
|
||||||
% \expandafter\patchcmd\csname\string\maketitle\endcsname
|
\maps{
|
||||||
% {\vskip\z@\@plus3fill}
|
\map{% Replaces '{\_}', '{_}' or '\_' with just '_'
|
||||||
% {\vskip\z@\@plus2fill\box\abstractbox\vskip\z@\@plus1fill}
|
\step[fieldsource=url,
|
||||||
% {}{}
|
match=\regexp{\{\\\_\}|\{\_\}|\\\_},
|
||||||
% \makeatother
|
replace=\regexp{\_}]
|
||||||
%
|
}
|
||||||
|
\map{% Replaces '{'$\sim$'}', '$\sim$' or '{~}' with just '~'
|
||||||
\DeclareCiteCommand{\citeyearpar}
|
\step[fieldsource=url,
|
||||||
{}
|
match=\regexp{\{\$\\sim\$\}|\{\~\}|\$\\sim\$},
|
||||||
{\mkbibparens{\bibhyperref{\printdate}}}
|
replace=\regexp{\~}]
|
||||||
{\multicitedelim}
|
}
|
||||||
{}
|
}
|
||||||
|
}
|
||||||
\begin{document}
|
|
||||||
\title{Descriptor Driven Concatenative Synthesis Tool for Python}
|
\usepackage[affil-it]{authblk}
|
||||||
% \subtitle{\LARGE{Abstract Draft}}
|
|
||||||
\author{S. Perry\thanks{E-mail: \texttt{\href{mailto:u1265119@unimail.hud.ac.uk}{u1265119@unimail.hud.ac.uk}}}}
|
% \usepackage{etoolbox}
|
||||||
\date{Dated: \today}
|
% \makeatletter
|
||||||
|
% \expandafter\patchcmd\csname\string\maketitle\endcsname
|
||||||
\maketitle
|
% {\vskip\z@\@plus3fill}
|
||||||
|
% {\vskip\z@\@plus2fill\box\abstractbox\vskip\z@\@plus1fill}
|
||||||
\begin{abstract}
|
% {}{}
|
||||||
A command-line tool and Python framework is proposed for the exploration of
|
% \makeatother
|
||||||
a new form of audio synthesis known as ``concatenative synthesis'': A form
|
%
|
||||||
of synthesis that uses perceptual audio analyses to arrange small segments
|
|
||||||
of audio based on their characteristics. The tool is designed to
|
\DeclareCiteCommand{\citeyearpar}
|
||||||
synthesise representations of an input sound using a database of source
|
{}
|
||||||
sounds. This involves the segmentation and analysis of both the input sound
|
{\mkbibparens{\bibhyperref{\printdate}}}
|
||||||
and database, matching of input segments to their closest segment from the
|
{\multicitedelim}
|
||||||
database, and the re-synthesis of the closest matches to produce the final
|
{}
|
||||||
result. The project aims to provide a tool capable of generating high
|
\newenvironment{keywords}%
|
||||||
quality sonic representations of an input, to present a variety of examples
|
{\begin{trivlist}\item[]{\bfseries\sffamily Keywords:}\ }%
|
||||||
that demonstrate the breadth of possibilities that this style of synthesis
|
{\end{trivlist}}
|
||||||
has to offer and to provide a robust framework on which concatenative
|
|
||||||
synthesis projects can be developed easily.\\
|
\begin{document}
|
||||||
|
\section*{Descriptor driven concatenative synthesis tool for Python}
|
||||||
Results demonstrate the wide variety of sounds that can be produced using
|
Sam Perry\\
|
||||||
this method of synthesis. A number of technical issues are outlined that
|
E-mail: \href{mailto:u1265119@unimail.hud.ac.uk}{u1265119@unimail.hud.ac.uk}
|
||||||
impeded the overall quality of results and efficiency of the software.
|
\section*{Abstract}
|
||||||
However, the project clearly demonstrates the strong potential for this
|
A command-line tool and Python framework is proposed for the exploration of
|
||||||
type synthesis to be used for creative purposes.
|
a new form of audio synthesis known as `concatenative synthesis', a form
|
||||||
\end{abstract}
|
of synthesis that uses perceptual audio analyses to arrange small segments
|
||||||
|
of audio based on their characteristics. The tool is designed to
|
||||||
\section*{Background}
|
synthesise representations of an input target sound using a source database
|
||||||
The concept of constructing a new sound by arranging collections of smaller
|
of sounds. This involves the segmentation and analysis of both the input
|
||||||
sounds has gained popularity in the past 30 years through the introduction
|
sound and database, the matching of input segments to their closest segment
|
||||||
of ``Granular Synthesis''. Granular synthesis works on the theory that any
|
from the database, and the re-synthesis of the closest matches to produce
|
||||||
sound can be described through the arrangement of smaller samples (referred
|
the final result.\\
|
||||||
to as ``grains''). This representation of sound allows for the temporal
|
The project aims to provide a tool capable of generating high-quality
|
||||||
decomposition and re-arranging of real-world samples, with the potential to
|
sonic representations of an input, to present a variety of examples that
|
||||||
create new ``complex, dynamically-evolving
|
demonstrate the breadth of possibilities that this style of synthesis has
|
||||||
sounds.''~\parencite[p.1]{Roads1988}\\
|
to offer and to provide a robust framework on which concatenative synthesis
|
||||||
|
projects can be developed easily. The purpose of this project was primarily
|
||||||
Concatenative synthesis (CS) is a form of synthesis that has developed
|
to highlight the potential for further development in the area of
|
||||||
significantly over the past 15 years, driven by recent advancements in
|
concatenative synthesis, and to provide a simple and intuitive tool that
|
||||||
technology. Key advancements have been in easy access to large databases of
|
could be used by composers for sound design and experimentation. The
|
||||||
audio and the development of methods for extracting useful information from
|
breadth of possibilities for creating new sounds offered by this method of
|
||||||
these databases automatically~\parencite[p.1]{Schwarz2006a}. CS utilises
|
synthesis makes it ideal for digital sound design and electroacoustic
|
||||||
these technologies to provide a content-based extension to granular
|
composition.\\
|
||||||
synthesis; by analysing a database of source grains, grains can be
|
Results demonstrate the wide variety of sounds that can be produced using
|
||||||
differentiated based on their characteristics. These characteristics can
|
this method of synthesis. A number of technical issues are outlined that
|
||||||
then be used for grain selection in the process of synthesizing output for
|
impeded the overall quality of results and efficiency of the software.
|
||||||
a wide range of applications~\parencite[p.102]{Schwarz2007}.
|
However, the project clearly demonstrates the strong potential for this
|
||||||
|
type of synthesis to be used for creative purposes.
|
||||||
\section*{Related Works}
|
\begin{keywords}
|
||||||
A number of programs utilize CS to achieve various goals. The process has
|
Concatenative synthesis; Python; audio descriptor; audio analysis; command-line tool; Python framework; Python sound
|
||||||
been used for applications in areas such as speech synthesis, instrument
|
\end{keywords}
|
||||||
synthesis and for applications in creative sound design.\\
|
|
||||||
The wide range of applications demonstrates the versatility of this
|
\section*{Acknowledgements}
|
||||||
synthesis technique. It differs from traditional synthesis methods through
|
I would like to thank A Harker for his advice and guidance as a mentor
|
||||||
the use of real recorded samples, as opposed to traditional methods that
|
throughout the project, and A Harker and P Chen for access to their
|
||||||
focus on defining sets of rules for emulating real sounds. By transforming
|
vocal samples database. Thanks also to D Chaplin for his creative input
|
||||||
samples that have been directly recorded from a source, the subtle nuances
|
in generating results.
|
||||||
of the source's sound are preserved. These would be difficult to reproduce
|
\pagebreak
|
||||||
using other synthetic methods for modelling an
|
|
||||||
instrument~\parencite[p.24]{Maestre2009}.
|
\section*{Background}
|
||||||
|
The concept of constructing a new sound by arranging collections of smaller
|
||||||
\subsection*{Speech Synthesis}
|
sounds has gained popularity in the past 30 years through the introduction
|
||||||
Creating a natural and intelligible realisation is an important factor when
|
of granular synthesis, which works on the theory that any sound can be
|
||||||
developing a speech synthesis system. The Talkapillar project is one such
|
described through the arrangement of smaller samples (referred to as
|
||||||
example of how highly convincing results are possible with CS. Through
|
`grains'). This representation of sound allows for the temporal
|
||||||
careful analysis of a vocal database, the project aims to impose the
|
decomposition and re-arranging of real-world samples, with the potential to
|
||||||
qualities of the database voice on an input voice. This would result in the
|
create new `complex, dynamically-evolving
|
||||||
words of the input speaker being transformed to appear as if they were
|
sounds'~\parencite[p.1]{Roads1988}.\\
|
||||||
spoken by the voice in the database~\parencite{Hueber}.
|
|
||||||
|
Concatenative synthesis (CS) is a form of synthesis that has developed
|
||||||
\subsection*{Instrument Synthesis}
|
significantly over the past 15 years, driven by recent advancements in
|
||||||
Progress has also been made in improving the quality of instrumental
|
technology. The key advancements have been in ease of access to large databases of
|
||||||
synthesis. As with speech synthesis, the use of samples directly allows for
|
audio and the development of methods for extracting useful information from
|
||||||
natural sounding results, which provides a method for reproducing real
|
these databases automatically~\parencite[p. 1]{Schwarz2006a}. CS utilises
|
||||||
instruments convincingly. Another important aspect of instrumental synthesis is that of performer
|
these technologies to provide a content-based extension to granular
|
||||||
expression. The reproduction of performance qualities such as dynamics,
|
synthesis; analysis of a database of source grains enables them to be
|
||||||
timbre and timing are essential when emulating a real instrument and CS has
|
differentiated based on their characteristics. These characteristics can
|
||||||
been used to effectively reproduce these aspects. This is achieved through
|
then be used for grain selection in the process of synthesising output for
|
||||||
splicing of grains based on their expressive characteristics to form
|
a wide range of applications~\parencite[p. 102]{Schwarz2007}.
|
||||||
musical phrases. For example, just as a violinist might transition
|
|
||||||
seamlessly from one articulation to the next, the CS software will join
|
\section*{Related works}
|
||||||
grains to produce the variation in articulations. This contrasts the
|
A number of programs utilise CS to achieve various goals. The process has
|
||||||
traditional approach to sampling, where samples are played in isolation,
|
been used for applications in areas such as speech synthesis, instrument
|
||||||
resulting in a discontinuity between adjacent samples~\parencite[p.82]{Lindemann2007}.
|
synthesis and creative sound design.\\
|
||||||
The Caterpillar project is one such example of this use of CS.
|
The wide range of applications demonstrates the versatility of this
|
||||||
By using a Viterbi algorithm, the project is able to calculate the
|
synthesis technique. It differs from traditional synthesis methods as it
|
||||||
smoothest overall transition between grains across the output, resulting
|
uses real recorded samples, as opposed to traditional methods that focus on
|
||||||
in convincing synthesis of orchestral instrument performances~\parencite[p.5]{Schwarz2003}.
|
defining sets of rules for emulating real sounds. By transforming samples
|
||||||
|
that have been directly recorded from a source, the subtle nuances of the
|
||||||
\subsection*{Creative Sound Design}
|
source's sound are preserved. These would be difficult to reproduce using
|
||||||
The flexibility of CS allows for creativity in a broader context than simply
|
other synthetic methods for modelling an
|
||||||
emulating real-world instruments and speech. It can also be used as a tool
|
instrument~\parencite[p. 24]{Maestre2009}.
|
||||||
to explore the possibilities for synthesizing new abstract sounds for
|
|
||||||
creative purposes.\\
|
\subsection*{Speech synthesis}
|
||||||
A prominent project in this area of CS is IRCAM's CataRT
|
Creating a natural and intelligible realisation is an important factor when
|
||||||
project~\parencite{Schwarz2006a}. The project focuses on the playback of
|
developing a speech-synthesis system. The Talkapillar project is one such
|
||||||
source grains based on their proximity to a target in multi-dimensional
|
example of how highly convincing results are possible with CS. Through
|
||||||
descriptor space. By providing a target point in the descriptor space, the
|
careful analysis of a vocal database, the project aims to impose the
|
||||||
user is able to navigate the database, playing selections of samples that
|
qualities of the database voice on an input voice. This would result in the
|
||||||
are nearest to the target. This allows the user to explore the database
|
words of the input speaker being transformed to appear as if they were
|
||||||
intuitively through a graphical user interface, selecting a point in
|
spoken by the voice in the database~\parencite{Hueber}.
|
||||||
2-dimensional space with the mouse. Grains are then played back in
|
|
||||||
real-time to create an ``audio mosaic''.\\
|
\subsection*{Instrument synthesis}
|
||||||
Alternatively, target audio can be provided and analysed to create a target
|
Progress has also been made in improving the quality of instrumental
|
||||||
location based on its location in the descriptor space. Tremblay and
|
synthesis. As with speech synthesis, the use of samples directly allows for
|
||||||
Schwarz's~\citeyearpar{Tremblay2010} use of CataRT to explore
|
natural-sounding results, which provides a method for reproducing real
|
||||||
electroacoustic sample banks demonstrates the creative potential of this
|
instruments convincingly. Another important aspect of instrumental synthesis is that of performer
|
||||||
method. CS is used in this context as a means for synthesizing matches in a
|
expression. The reproduction of performance qualities such as dynamics,
|
||||||
corpus database to real-time input from an electric bass. Significance is
|
timbre and timing is essential when emulating a real instrument and CS has
|
||||||
placed on linking the playback of grains to the expressivity of the
|
been used to effectively reproduce these aspects. This is achieved through
|
||||||
performer. The use of perceptually based audio descriptors to match the
|
splicing of grains based on their expressive characteristics to form
|
||||||
source to the target allows the performer to navigate the database
|
musical phrases. For example, just as a violinist might transition
|
||||||
naturally based on factors such as the pitch and timbre of the bass
|
seamlessly from one articulation to the next, the CS software will join
|
||||||
guitar. The result is a performance that mixes characteristics of both the
|
grains to produce the variation in articulations. This contrasts with the
|
||||||
bass guitar output and the qualities of the corpus database to create a
|
traditional approach to sampling, where samples are played in isolation,
|
||||||
hybrid of the two.\\
|
resulting in a discontinuity between adjacent samples~\parencite[p. 82]{Lindemann2007}.
|
||||||
|
The Caterpillar project is one such example of this use of CS.
|
||||||
This is by no means an exhaustive overview of the projects and techniques
|
By using a Viterbi algorithm, the project is able to calculate the
|
||||||
that explore the vast possibilities of CS. For further information, please
|
smoothest overall transition between grains across the output, resulting
|
||||||
refer to: ``Concatenative Synthesis - The Early
|
in convincing synthesis of orchestral instrument performances~\parencite[p. 5]{Schwarz2003}.
|
||||||
Years''~\parencite{Schwarz2006b}
|
|
||||||
|
\subsection*{Creative sound design}
|
||||||
\section*{Concatenator}
|
The flexibility of CS allows for creativity in a broader context than simply
|
||||||
The concatenator project aims to provide an open source set of tools that
|
emulating real-world instruments and speech. It can also be used as a tool
|
||||||
allows composers to generate a variety of CS-driven realisations for
|
to explore the possibilities for synthesising new abstract sounds for
|
||||||
sound design purposes. In addition, the project aims to provide an
|
creative purposes.\\
|
||||||
intuitive API that Python programmers might use as the fundamental building
|
A prominent project in this area of CS is IRCAM's CataRT
|
||||||
blocks to build further concatenative synthesis applications on.
|
project~\parencite{Schwarz2006a}. The project focuses on the playback of
|
||||||
The result is a framework and command-line interface, built in Python, for
|
source grains based on their proximity to a target in multi-dimensional
|
||||||
easy access to basic CS techniques.
|
descriptor space. Providing a target point in the descriptor space enables the
|
||||||
The current implementation can be used for the concatenation of a source
|
user to navigate the database, playing selections of samples that
|
||||||
database onto target audio files, using a range of perceptual audio
|
are nearest to the target. This allows the user to explore the database
|
||||||
descriptors for matching. Database management, simple matching and
|
intuitively through a graphical user interface, selecting a point in
|
||||||
synthesis algorithms are used to achieve this, and are described in the
|
2-dimensional space with the mouse. Grains are then played back in
|
||||||
following sections.
|
real-time to create an `audio mosaic'.\\
|
||||||
|
Alternatively, target audio can be provided and analysed to create a target
|
||||||
\section*{Program Design and Implementation}
|
location based on its location in the descriptor space. Tremblay and
|
||||||
The Concatenator project consists of a number of components that work
|
Schwarz's~\citeyearpar{Tremblay2010} use of CataRT to explore
|
||||||
together to produce the final output. A complete description of all
|
electroacoustic sample banks demonstrates the creative potential of this
|
||||||
components and their usage in the concatenator project can be found in its
|
method. CS is used in this context as a means of synthesising matches in a
|
||||||
complete documentation at:\\
|
corpus database to real-time input from an electric bass. Significance is
|
||||||
|
placed on linking the playback of grains to the expressivity of the
|
||||||
*PERMANENT URL FOR DOCUMENTATION NEEDED*\\
|
performer. The use of perceptually based audio descriptors to match the
|
||||||
|
source to the target allows the performer to navigate the database
|
||||||
Output is generated by analysing overlapping segments of audio (known as
|
naturally based on factors such as the pitch and timbre of the bass
|
||||||
grains) from both the target sound and the source database, then searching
|
guitar. The result is a performance that mixes characteristics of both the
|
||||||
for the closest matching grain in the source database to the target sound.
|
bass guitar output and the qualities of the corpus database to create a
|
||||||
Finally, the output is generated by applying a Hann window and
|
hybrid of the two.\\
|
||||||
overlap-adding the best matches. Each component will be discussed in detail
|
This is by no means an exhaustive overview of the projects and techniques
|
||||||
in the following sections.\\
|
that explore the vast possibilities of CS. Further information can be found
|
||||||
|
in the article by~\textcite{Schwarz2006b}.
|
||||||
When designing the concatenator framework, ease of development, use and
|
\pagebreak
|
||||||
extensibility were primary considerations. It was for these reasons that
|
|
||||||
the framework was written in the Python programming language. Python has
|
\section*{Concatenator}
|
||||||
grown in popularity in the scientific community recently, primarily due to
|
The Concatenator project aims to provide an open source tool that allows
|
||||||
it's focus on productivity, readability and the large number of efficient
|
composers to generate a variety of CS-driven realisations for sound design
|
||||||
numeric processing libraries available (Numpy, SciPy, Scikitlearn
|
purposes. In addition, the project aims to provide an intuitive API that
|
||||||
etc...)~\parencite[p.11]{Fangohr2014}. This makes Python a good choice for
|
Python programmers might use as the fundamental building blocks on which to
|
||||||
quickly developing ideas in the context of audio signal processing.
|
build further CS applications. The result is a framework and command-line
|
||||||
Unfortunately, the language does sacrifice processing speed for simplicity
|
interface, built in Python, for easy access to basic CS techniques. All
|
||||||
and as a result is not suitable for real-time signal processing. Other
|
relevant material including source code, results, and documentation can be
|
||||||
performance focused languages such as C++ are better suited to this type of
|
found in the official online project repository~\parencite{perry2016a}.
|
||||||
processing. However, it was decided that the increase in productivity, lack
|
The current implementation can be used for the concatenation of a source
|
||||||
of prior CS research in Python and the author's previous experience,
|
database onto target audio files, using a range of perceptual audio
|
||||||
made it the most suitable choice for this project.\\
|
descriptors for matching. Database management, simple matching and
|
||||||
|
synthesis algorithms are used to achieve this, and are described in the
|
||||||
The choice to limit the project to offline processing has both positive and
|
following sections. \\
|
||||||
negative implications on the function of the project. A key disadvantage to
|
|
||||||
this type of processing is the lack of possibility for any live performance
|
The features and uses of this tool are most comparable to those of the
|
||||||
aspect. This method provides no way of exploring the feedback between
|
MATConcat project~\parencite{sturm2004}, which was developed to provide an
|
||||||
performer and system in a live environment, comparable to the work of
|
open source tool for generating similar representations of audio in MATLAB.
|
||||||
Tremblay and Schwarz's~\citeyearpar{Tremblay2010}.
|
Although there are technical differences such as the number of descriptors
|
||||||
However, there are advantages to offline processing that would not be
|
available for each project, both share a similar focus on the
|
||||||
possible in a real-time context.\\
|
electroacoustic compositional applications of CS. Results produced for the
|
||||||
One significant advantage is that databases can afford to be far larger
|
MATConcat project are comparable to those of the Concatenator project, and
|
||||||
than they could in real time. Without the requirement to process output in
|
both work offline to produce results. The Concatenator project builds on
|
||||||
a short period of time, more time can be taken to search vast databases in
|
this by providing a wider variety of descriptors and the ability to
|
||||||
the hope that the closest match to a target will be found.\\
|
artificially enhance matches (as discussed in the~\hyperref[sat]{Synthesis
|
||||||
Another advantage is in the global view of a target that can be taken in an
|
and Transformations section}).
|
||||||
offline approach. Because the complete audio file is available from the
|
|
||||||
start of processing, techniques can be applied that consider the output as
|
\section*{Program design and implementation}
|
||||||
a whole rather than on a grain by grain basis. This allows for algorithms
|
The Concatenator project consists of a number of components that work
|
||||||
such as the Viterbi algorithm to find the sequence of grains that provides
|
together to produce the final output. A complete description of all
|
||||||
the best continuity, as demonstrated in the Caterpillar
|
components and their usage in the Concatenator project can be found in its
|
||||||
project~\parencite[p.4]{Schwarz2003}. This would not be possible in
|
documentation.\\
|
||||||
real-time, as audio is processed on-the-fly.\\
|
|
||||||
|
Output is generated by analysing overlapping segments of audio (known as
|
||||||
An additional consideration was the method to be used for controlling the
|
grains) from both the target sound and the source database, then searching
|
||||||
target to be matched to. It was decided that the most interesting results
|
for the closest matching grain in the source database to the target sound.
|
||||||
would be produced through the matching of grains to a target audio file, as
|
Finally, the output is generated by applying a Hann window and
|
||||||
opposed to other approaches such as matching to MIDI scores. In this sense
|
overlap-adding the best matches. Each component is discussed in detail
|
||||||
the project is a form of offline audio-mosaicking tool similar to that of
|
in the following sections.\\
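The windowing and overlap-add step can be sketched in a few lines. This is a minimal illustration rather than the project's own code: it uses only the Python standard library, and the function names \texttt{hann\_window} and \texttt{overlap\_add}, the grain length and the hop size are all assumptions made for the example.

```python
import math


def hann_window(length):
    """Return a periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(grains, hop):
    """Window each equal-length grain and sum the grains at multiples of `hop` samples."""
    grain_length = len(grains[0])
    window = hann_window(grain_length)
    output = [0.0] * (hop * (len(grains) - 1) + grain_length)
    for index, grain in enumerate(grains):
        start = index * hop
        for n, sample in enumerate(grain):
            output[start + n] += sample * window[n]
    return output


# Four constant grains of 8 samples each, overlapped by half a grain (hop of 4).
grains = [[1.0] * 8 for _ in range(4)]
result = overlap_add(grains, hop=4)
```

With a half-grain hop, overlapping periodic Hann windows sum to one, so the constant test grains are reconstructed at unit level wherever two grains overlap; only the first and last half-grains fade in and out.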
|
||||||
CataRT.
|
|
||||||
|
When designing the Concatenator framework, ease of development, use and
|
||||||
\subsection*{Descriptor Implementation}
|
extensibility were primary considerations. It was for these reasons that
|
||||||
In order to differentiate between grains, a number of audio descriptors
|
the framework was written in the Python programming language. Python has
|
||||||
were implemented. Audio descriptors are used to measure a specific
|
grown in popularity in the scientific community recently, primarily due to
|
||||||
characteristic of a signal~\parencite[p.31]{Lerch2012}. For example, an RMS
|
its focus on productivity, readability and the large number of efficient
|
||||||
descriptor was implemented to give an indication of the overall intensity
|
numeric processing libraries available (\cite{Pedregosa2011,
|
||||||
of the grain. Another example is the F0 descriptor implemented to give a
|
Fangohr2014, Scipy}). This makes Python a good choice for
|
||||||
value relating to pitch for harmonic grains. These values could then be
|
quickly developing ideas in the context of audio signal processing.
|
||||||
used by the matching algorithm in order to find the best match between the
|
Unfortunately, the language does sacrifice processing speed for simplicity,
|
||||||
source and target grains. A full description of all descriptors implemented
|
and as a result, is not suitable for real-time signal processing. Other
|
||||||
can be found in the Concatenator documentation.\\
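As an illustration of how a descriptor reduces a grain to a single value, the sketch below computes an RMS value for each overlapping grain of a test signal. It is not taken from the Concatenator source: the helper names \texttt{split\_into\_grains} and \texttt{rms}, the sample rate, and the grain and hop sizes are assumptions chosen for the example, and only the standard library is used.

```python
import math


def split_into_grains(samples, grain_length, hop):
    """Slice a signal into overlapping grains of `grain_length` samples."""
    last_start = len(samples) - grain_length
    return [samples[start:start + grain_length] for start in range(0, last_start + 1, hop)]


def rms(grain):
    """Root mean square of a grain: a rough indicator of its overall intensity."""
    return math.sqrt(sum(sample * sample for sample in grain) / len(grain))


# A 100 Hz sine that is quiet (amplitude 0.1) and then loud (amplitude 0.8).
sample_rate = 8000
signal = [
    (0.1 if n < 1600 else 0.8) * math.sin(2.0 * math.pi * 100.0 * n / sample_rate)
    for n in range(3200)
]
grains = split_into_grains(signal, grain_length=800, hop=400)
rms_values = [rms(grain) for grain in grains]
```

Grains taken wholly from the quiet half yield an RMS close to $0.1/\sqrt{2}$ and those from the loud half close to $0.8/\sqrt{2}$, which is the kind of difference a matching stage can use to tell grains apart.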
|
performance-focused languages such as C++ are better suited to this type of
|
||||||
Due to time constraints on the project, only a limited number of basic
|
processing. However, it was decided that the increase in productivity, lack
|
||||||
descriptors were implemented. For this reason, it was ensured that new
|
of prior CS research in Python and the author's previous experience made
|
||||||
descriptors could be added easily to the project. The object-oriented
|
it the most suitable choice for this project.\\
|
||||||
design of the descriptors provides the potential for quick development of
|
|
||||||
any future descriptors to be added to the project.
|
The choice to limit the project to offline processing has both positive and
|
||||||
|
negative implications for the function of the project. A key disadvantage
|
||||||
\subsection*{Database Design}
|
of this type of processing is the lack of possibility for any live
|
||||||
When generating descriptors for a large database, large amounts of data are
|
performance aspect. This method provides no way of exploring the feedback
|
||||||
produced and so an efficient method of storing and retrieving the data was
|
between performer and system in a live environment, as in the work
|
||||||
needed to manage this. The Python interface to the HDF5
|
of Tremblay and Schwarz~\citeyearpar{Tremblay2010}.
|
||||||
file format~\parencite{Collette2016} was chosen for its simplicity and
|
However, there are advantages to offline processing that would not be
|
||||||
ability to compress the data automatically. Storing NumPy arrays of
|
possible in a real-time context.\\
|
||||||
descriptors in groups allowed for quick and easy access to analyses from a
|
One significant advantage is that databases can afford to be far larger
|
||||||
single, organized source.
|
than they could be in real time. Without the requirement to process output in
|
||||||
|
a short period of time, more time can be taken to search vast databases in
|
||||||
\subsection*{Matching Algorithms}
|
the hope that the closest match to a target will be found.\\
|
||||||
In order to match grains using the descriptor values, a matching algorithm
|
Another advantage is in the global view of a target that can be taken in an
|
||||||
was required. Initially a brute force matcher was used to compare each
|
offline approach. Because the complete audio file is available from the
|
||||||
descriptor value in the target to all values of the same descriptor type in
|
start of processing, techniques can be applied that consider the output as
|
||||||
the source. However, it quickly became apparent that this approach would be
|
a whole, rather than on a grain-by-grain basis. This allows for algorithms
|
||||||
far too slow, particularly for larger databases.\\
|
such as the Viterbi algorithm to find the sequence of grains that provide
|
||||||
For this reason, a k-dimensional tree search algorithm was used in an
|
the best continuity, as demonstrated in the Caterpillar
|
||||||
effort to improve matching efficiency. This approach produced the same
|
project~\parencite[p. 4]{Schwarz2003}. This would not be possible in
|
||||||
results as the brute force matcher, but by arranging descriptors in a tree
|
real time, as audio is processed `on the fly'.\\
|
||||||
structure, a far more efficient search to find the best match was possible.
|
|
||||||
This reduced matching time considerably.
|
An additional consideration was the method to be used for controlling the
|
||||||
|
target to which the grains would be matched. It was decided that the most
|
||||||
\subsection*{Synthesis and Transformations}
|
interesting results would be produced through the matching of grains to a
|
||||||
The final step in the program is to synthesize the matched output.
|
target audio file, as opposed to other approaches such as matching to MIDI
|
||||||
This process consisted of:
|
scores. In this sense the project is a form of offline audio-mosaicking
|
||||||
\begin{enumerate}
|
tool similar to that of CataRT.
|
||||||
\item Retrieving the best grain matches returned by the matching algorithm
|
|
||||||
\item Applying a window function
|
\subsection*{Descriptor Implementation}
|
||||||
\item Overlapping the grains
|
In order to differentiate between grains, a number of audio descriptors
|
||||||
\item Transforming grains to match target
|
were implemented. Audio descriptors are used to measure a specific
|
||||||
\item Saving the result to a file
|
characteristic of a signal~\parencite[p. 31]{Lerch2012}. For example, a
|
||||||
\end{enumerate}
|
root mean square (RMS) descriptor was implemented to give an indication of
|
||||||
Initially, grains were not transformed to better match the target. This
|
the overall intensity of the grain. Another example is the fundamental
|
||||||
worked effectively for large databases; however, it was observed that
|
frequency (F0) descriptor, which was implemented to give a value relating
|
||||||
results synthesized using small databases were of a lower quality as the
|
to pitch for harmonic grains. These values could then be used by the
|
||||||
chance of a closely matched grain was lower. To account for this, methods
|
matching algorithm in order to find the best match between the source and
|
||||||
for altering grains to better match their target were implemented. It was
|
target grains.\\
|
||||||
decided that the two most significant characteristics to alter were the
|
Owing to time constraints on the project, only a limited number of basic
|
||||||
pitch and intensity of the grains. By scaling the grains by the difference
|
descriptors were implemented. For this reason, the project was designed so that new
|
||||||
between the source and target RMS, it was possible to impose a closer
|
descriptors could easily be added. The object-oriented
|
||||||
intensity on a grain. Likewise, by shifting the pitch of a grain by the
|
design of the descriptors provides the potential for quick development of
|
||||||
difference, it was possible to better match the pitch contour of the output
|
any future descriptors to be added.
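
The object-oriented descriptor design described above might be sketched as follows. This is a minimal illustration rather than the project's actual code: the class names, the \texttt{analyse} method and the \texttt{Peak} descriptor are all assumptions introduced here, and a grain is represented simply as a list of samples.

```python
import math


class Descriptor:
    """Base class: a descriptor maps one grain (a list of samples) to a number."""

    name = "descriptor"  # hypothetical attribute used to label the result

    def analyse(self, grain):
        raise NotImplementedError


class RMS(Descriptor):
    """Root mean square: an indication of the overall intensity of a grain."""

    name = "rms"

    def analyse(self, grain):
        return math.sqrt(sum(s * s for s in grain) / len(grain))


class Peak(Descriptor):
    """Hypothetical extra descriptor: adding one only requires a new subclass."""

    name = "peak"

    def analyse(self, grain):
        return max(abs(s) for s in grain)


def analyse_grain(grain, descriptors):
    """Run every descriptor on a grain and collect the values by name."""
    return {d.name: d.analyse(grain) for d in descriptors}


grain = [0.5, -0.5, 0.5, -0.5]
values = analyse_grain(grain, [RMS(), Peak()])
```

Because every descriptor exposes the same method, the analysis loop does not need to change when a new descriptor is added.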
|
||||||
to that of the target audio. This improved the results significantly in
|
|
||||||
smaller databases, as poor matches could be improved to match the target
|
\subsection*{Database design}
|
||||||
more convincingly.
|
When generating descriptors for large databases, large amounts of data are
|
||||||
|
produced and so an efficient method of storing and retrieving the data was
|
||||||
\subsection*{Command line Interface}
|
needed in order to manage this. The Python interface to the HDF5
|
||||||
In order to make the framework accessible to users, a command-line interface
|
filesystem~\parencite{Collette2016} was chosen for its simplicity and
|
||||||
was developed. By supplying arguments to the program, users could alter
|
ability to compress the data automatically. Storing NumPy arrays of
|
||||||
parameters and experiment freely with the tool. Although this interface
|
descriptors in groups allowed for quick and easy access to analyses from a
|
||||||
was sufficient for testing and experimentation, it quickly became apparent
|
single, organised source.
|
||||||
that there were too many parameters to pass to the program via the command
|
|
||||||
line interface on each run. A configuration file parser was created to
|
\subsection*{Matching algorithms}
|
||||||
address this issue, allowing users to specify default parameters that would
|
In order to match grains using the descriptor values, a matching algorithm
|
||||||
be used by the program on each run. The combination of these interfaces
|
was required. Initially a brute-force matcher was used to compare each
|
||||||
provided an effective means for accessing all of the framework's features.
|
descriptor value in the target to all values of the same descriptor type in
|
||||||
|
the source. However, it quickly became apparent that this approach would be
|
||||||
\subsection*{Documentation and API}
|
far too slow, particularly for a larger database.\\
|
||||||
Complete documentation for the project was created in order to make the
|
For this reason, a k-dimensional tree search algorithm was used in an
|
||||||
project as user friendly as possible for both developers and users. As a
|
effort to improve matching efficiency. This approach produced the same
|
||||||
result, a full API is available alongside examples of use and instructions
|
results as the brute-force matcher, but by arranging descriptors in a tree
|
||||||
for command-line operation. This was created in the hope that it might form
|
structure, a far more efficient search to find the best match was possible.
|
||||||
a usable package that developers can build on quickly and effectively to
|
This reduced matching time considerably.
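
The difference between the two matchers can be illustrated with a small, self-contained sketch. It is not the project's implementation: the function names are hypothetical, each grain is assumed to be a short vector of descriptor values, and the tree is written in plain Python purely to show why both searches agree while the tree search visits fewer points.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_match(target, sources):
    """Compare the target against every source vector; return the closest index."""
    return min(range(len(sources)),
               key=lambda i: squared_distance(target, sources[i]))


def build_kd_tree(indices, sources, depth=0):
    """Recursively split the points on one descriptor axis per tree level."""
    if not indices:
        return None
    axis = depth % len(sources[0])
    indices = sorted(indices, key=lambda i: sources[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kd_tree(indices[:mid], sources, depth + 1),
        "right": build_kd_tree(indices[mid + 1:], sources, depth + 1),
    }


def kd_tree_match(node, target, sources, best=None):
    """Nearest-neighbour search that skips branches unable to hold a closer point.

    Returns a (squared distance, source index) pair.
    """
    if node is None:
        return best
    index = node["index"]
    dist = squared_distance(target, sources[index])
    if best is None or dist < best[0]:
        best = (dist, index)
    axis = node["axis"]
    diff = target[axis] - sources[index][axis]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = kd_tree_match(near, target, sources, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if diff * diff < best[0]:
        best = kd_tree_match(far, target, sources, best)
    return best


random.seed(0)
# Hypothetical two-value descriptor vectors, one per grain.
sources = [[random.random(), random.random()] for _ in range(200)]
targets = [[random.random(), random.random()] for _ in range(20)]

tree = build_kd_tree(list(range(len(sources))), sources)
brute = [brute_force_match(t, sources) for t in targets]
fast = [kd_tree_match(tree, t, sources)[1] for t in targets]
```

The tree is built once per source database, so its construction cost is shared across every target grain that is matched against it.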
|
||||||
build other CS projects, allowing for easier access to Python-based CS than
|
|
||||||
is currently available. The command line interface is equally documented to
|
\subsection*{Synthesis and transformations} \label{sat}
|
||||||
allow users to create their own realisations quickly and easily so that
|
The final step in the program was to synthesise the matched output.
|
||||||
this project may be used for creative sound design purposes.
|
This process consisted of:
|
||||||
|
\begin{enumerate}
|
||||||
\section*{Results and Evaluation}
|
\item Retrieving the best grain matches returned by the matching algorithm
|
||||||
Overall, results generated by this project showed promise; a variety of
|
\item Applying a window function
|
||||||
transformations were generated using open source instrument databases to
|
\item Overlapping the grains
|
||||||
demonstrate the project's potential for sound design application. This
|
\item Transforming grains to match the target
|
||||||
tested the project's ability to convincingly impose qualities of an
|
\item Saving the result to a file
|
||||||
instrument onto target sounds. A variety of examples are provided that
|
\end{enumerate}
|
||||||
outline the style of synthesis aimed for. These range from imposing
|
Initially, grains were not transformed to better match the target. This
|
||||||
acoustic guitar qualities on an electric guitar to imposing stringed
|
worked effectively for large databases; however, it was observed that
|
||||||
instrument qualities on vocal melodies. Current results have a clear
|
results synthesised using small databases were of a lower quality, as the
|
||||||
synthetic nature, but still clearly exhibit some of the main
|
chance of a closely matched grain was lower. To account for this, methods
|
||||||
characteristics of the database used.\\
|
for altering grains to better match their target were implemented. It was
|
||||||
|
decided that the two most significant characteristics to alter were the
|
||||||
\noindent
|
pitch and intensity of the grains. By scaling the grains by the difference
|
||||||
Concatenator project examples that demonstrate current results can be found at:\\
|
between the source and target RMS, it was possible to impose a closer
|
||||||
|
intensity on a grain. Likewise, by shifting the pitch of a grain by the
|
||||||
*PERMANENT URL FOR RESULTS NEEDED*\\
|
difference, it was possible to better match the pitch contour of the output
|
||||||
|
to that of the target audio. This improved the results significantly in
|
||||||
\section*{Research Limitations/Potential Development}
|
smaller databases, as poor matches could be improved to match the target
|
||||||
In retrospect, a great deal of time was spent trying to improve the
|
more convincingly.
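
The windowing, overlapping and intensity-matching steps listed above can be sketched as follows. Several assumptions are made for illustration: the function names are hypothetical, grains are plain lists of samples, the grain is scaled by the ratio of target to source RMS (one possible reading of `difference'), scaling is applied before windowing, and pitch shifting and file output are left out.

```python
import math


def hann_window(length):
    """Periodic Hann window; at 50% overlap, shifted copies sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_intensity(grain, target_rms):
    """Scale a grain so that its RMS equals the target RMS."""
    source_rms = rms(grain)
    if source_rms == 0.0:
        return list(grain)  # a silent grain cannot be scaled up
    gain = target_rms / source_rms
    return [s * gain for s in grain]


def overlap_add(grains, hop):
    """Window each grain and sum the grains into one output at hop-sample steps."""
    length = len(grains[0])
    window = hann_window(length)
    output = [0.0] * (hop * (len(grains) - 1) + length)
    for i, grain in enumerate(grains):
        start = i * hop
        for n, sample in enumerate(grain):
            output[start + n] += sample * window[n]
    return output


# Three matched grains of differing intensity, all imposed onto a target RMS of 0.5.
grains = [[1.0] * 8, [0.5] * 8, [0.25] * 8]
scaled = [match_intensity(g, 0.5) for g in grains]
output = overlap_add(scaled, hop=4)
```

With a hop of half the grain length, the overlapping windows sum to a constant, so the fully overlapped region of the output keeps the imposed level.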
|
||||||
efficiency of the project. Although this was necessary, as initial tests
|
|
||||||
were not feasible on most databases, it had a negative impact on the time
|
\subsection*{Command-line interface}
|
||||||
available for developing perceptual qualities of the output. As a result of
|
In order to make the framework accessible to users, a command-line interface
|
||||||
this, the overall quality of output may not be as natural as that of
|
was developed. By supplying arguments to the program, users could alter
|
||||||
other projects in this area. This is apparent in the vocal-to-string
|
parameters and experiment freely with the tool. Although this interface
|
||||||
instrument examples. Phrases tend to begin and end abruptly, failing to
|
was sufficient for testing and experimentation, it quickly became apparent
|
||||||
replicate any defined attack or decay of the string instruments, as would
|
that there were too many parameters to pass to the program via the command
|
||||||
be expected when hearing a string instrument naturally. Conversely, this
|
line interface on each run. A configuration file parser was created to
|
||||||
does give output its own synthetic characteristic, which may be desirable
|
address this issue, allowing users to specify default parameters that would
|
||||||
as perfect reproduction of an instrument may not be the reason for using
|
be used by the program on each run. The combination of these interfaces
|
||||||
this tool.\\
|
provided an effective means for accessing all of the framework's features.
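
The interplay between the configuration file and the command-line arguments might look like the sketch below, which uses only the Python standard library. The option names (\texttt{grain\_size}, \texttt{overlap}), the \texttt{synthesis} section and the file paths are hypothetical and are not taken from the project itself.

```python
import argparse
import configparser


def load_defaults(config_text):
    """Read default parameters from INI-style text (hypothetical option names)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["synthesis"]
    return {
        "grain_size": section.getint("grain_size"),
        "overlap": section.getint("overlap"),
    }


def parse_arguments(argv, defaults):
    """Command-line values override the configuration-file defaults."""
    parser = argparse.ArgumentParser(description="Concatenative synthesis sketch")
    parser.add_argument("source", help="path to the source database")
    parser.add_argument("target", help="path to the target audio file")
    parser.add_argument("--grain-size", type=int, default=defaults["grain_size"])
    parser.add_argument("--overlap", type=int, default=defaults["overlap"])
    return parser.parse_args(argv)


# In practice this text would be read from a file on disk.
CONFIG = """
[synthesis]
grain_size = 100
overlap = 2
"""

defaults = load_defaults(CONFIG)
args = parse_arguments(["./source_db", "target.wav", "--overlap", "4"], defaults)
```

Here only \texttt{overlap} is given on the command line, so \texttt{grain\_size} falls back to the value in the configuration text.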
|
||||||
In addition, the heavy computation involved means that large amounts of time are
|
|
||||||
needed to produce high quality results. An end user may not have the
|
\subsection*{Documentation and API}
|
||||||
patience required to reach the quality of results that might be
|
Complete documentation for the project was created in order to make the
|
||||||
possible. This is in part a drawback of the Python language, and could be
|
project as user friendly as possible for both developers and users. As a
|
||||||
better accounted for with further work on profiling the performance of the
|
result, a full API is available alongside examples of use and instructions
|
||||||
tool.\\
|
for command-line operation. This was created in the hope that it might form
|
||||||
However, the fundamental concepts such as descriptor matching and
|
a usable package that developers can build on quickly and effectively to
|
||||||
transforming matches to better fit the target, which are used in the most
|
build other CS projects, allowing for easier access to Python-based CS than
|
||||||
sophisticated CS projects, have been implemented in this project to
|
is currently available. The command-line interface is equally documented to
|
||||||
satisfying creative effect. As a proof of concept, this project displays
|
allow users to create their own realisations quickly and easily so that
|
||||||
the possibilities for CS in Python and there is evidently potential for
|
this project may be used for creative sound design purposes.
|
||||||
further development in this area.\\
|
|
||||||
|
\section*{Results and evaluation}
|
||||||
There are a number of further improvements that could be made to this
|
Overall, the results generated by this project showed promise; a variety of
|
||||||
project in order to improve the quality of results and extend its overall
|
transformations were generated using open source instrument databases to
|
||||||
usefulness. Some initial ideas for improvements are detailed in this
|
demonstrate the project's potential for sound design application. This
|
||||||
section below. These range from reasonably simple modifications that could
|
tested the project's ability to convincingly impose qualities of an
|
||||||
not be implemented purely due to time constraints, to more complex ideas
|
instrument onto target sounds. A variety of examples are provided that
|
||||||
that may take a considerable amount of work.\\
|
outline the style of synthesis aimed for. These range from imposing
|
||||||
|
acoustic guitar qualities on an electric guitar to imposing stringed
|
||||||
The current implementation uses only a small and relatively basic subset of
|
instrument qualities on vocal melodies. Current results have a clear
|
||||||
the audio descriptors available. This limits the analysis of audio and thus
|
synthetic nature, but still clearly exhibit some of the main
|
||||||
the quality of matches. Using a larger set of more advanced descriptors may
|
characteristics of the database used.
|
||||||
improve quality from this perspective. One way would be to incorporate the
|
|
||||||
open source Essentia audio descriptors~\parencite{Essentia2016}, giving the
|
\section*{Research Limitations/Potential Development}
|
||||||
project access to a vast quantity of descriptors for analysis.\\
|
In retrospect, a great deal of time was spent trying to improve the
|
||||||
|
efficiency of the project. Although this was necessary, as initial tests
|
||||||
Replacing the Hann window function used for grain windowing with a short
|
were not feasible on most databases, it had a negative impact on the time
|
||||||
cross-fade at grain overlaps should reduce amplitude modulation, resulting
|
available for developing perceptual qualities of the output. As a result of
|
||||||
in smoother transitions between grains. This might be further improved
|
this, the overall quality of output might not be as natural as that
|
||||||
through calculating the point of maximum similarity by cross-correlating
|
of other projects in this area. This is apparent in the
|
||||||
overlapping sections, as described by~\textcite[pp.~191--193]{Zolzer2011} in
|
vocal~\textrightarrow~string instrument examples. Phrases tend to begin and
|
||||||
the SOLA algorithm.\\
|
end abruptly, failing to replicate any defined attack or decay of the
|
||||||
|
string instruments, as would be expected when hearing a string instrument
|
||||||
A lack of continuity between grains was observed in results, most likely
|
naturally. Conversely, this does give output its own synthetic
|
||||||
due to the lack of any comparison of selected grains. A Viterbi algorithm
|
characteristic, which may be desirable as perfect reproduction of an
|
||||||
could be used to account for this, allowing for a search to be done amongst
|
instrument may not be the reason for using this tool.\\
|
||||||
the top matches to find the optimal set of grains. This takes advantage of
|
In addition, the amount of computation required results in large amounts of time
|
||||||
the offline nature of the project and has been shown to work effectively in
|
needed to produce high quality results. An end user may not have the
|
||||||
the Talkapillar project~\parencite{Hueber}.
|
patience required to reach the quality of results that might be
|
||||||
|
possible. This is in part a drawback of the Python language, and could be
|
||||||
Although the HDF5 filesystem allows for easy storage of descriptor values,
|
better accounted for with further work on profiling the performance of the
|
||||||
it also has drawbacks that limit the functionality of the project. One
|
tool.\\
|
||||||
significant problem is that it is difficult to implement parallel
|
However, the fundamental concepts such as descriptor matching and
|
||||||
processing using the library and for this reason asynchronous processing was
|
transforming matches to better fit the target, which are used in the most
|
||||||
not implemented in the project. An alternative method of storage may
|
sophisticated CS projects, have been implemented in this project to
|
||||||
accommodate this more easily, allowing for the speed-ups possible through
|
satisfying creative effect. As a proof of concept, this project displays
|
||||||
asynchronous processing. The overall design of the database management was
|
the possibilities for CS in Python and there is evidently potential for
|
||||||
also relatively naive and may benefit from being replaced by a technology
|
further development in this area.\\
|
||||||
such as an SQL database or similar. This has been shown to work effectively
|
|
||||||
in work such as the CataRT project~\parencite[p.3]{Schwarz2006a}.
|
There are a number of further improvements that could be made to this
|
||||||
|
project in order to improve the quality of results and extend its overall
|
||||||
\section*{Conclusion}
|
usefulness. These range from reasonably simple modifications that could not
|
||||||
This project has provided a functioning Python-based CS project with much
|
be implemented purely due to time constraints, to more complex ideas that
|
||||||
potential for further development. Given the number of technical issues
|
may take a considerable amount of work. The following is a list of some
|
||||||
faced with this style of synthesis (from the big data issues faced with
|
initial ideas for improvements.\\
|
||||||
analysis storage, to high efficiency requirements for processing the large
|
|
||||||
quantities of data), overall this project appears to work effectively. It
|
\begin{itemize}
|
||||||
provides a new and accessible means for tapping some of the vast amount of
|
\item The current implementation uses only a small and relatively basic
|
||||||
potential that concatenative synthesis has to offer.\\ With the ever
|
subset of the audio descriptors available. This limits the analysis
|
||||||
increasing quality of technology, it is predicted that new techniques such
|
of audio and thus the quality of matches. Using a larger set of
|
||||||
as concatenative synthesis may grow further in popularity, leading to an
|
more advanced descriptors may improve quality from this
|
||||||
increasing number of possibilities in this area of sound synthesis. It is
|
perspective. One way would be to incorporate the open source
|
||||||
hoped that this project might aid in highlighting the possibilities
|
Essentia audio descriptors~\parencite{Essentia2016}, giving the
|
||||||
offered by this form of synthesis and demonstrate some of the technical
|
project access to a vast quantity of descriptors for analysis.
|
||||||
obstacles that must be addressed to design a CS project successfully.
|
|
||||||
|
\item Replacing the Hann window function used for grain windowing
|
||||||
\section*{Acknowledgments}
|
with a short cross-fade at grain overlaps should reduce amplitude
|
||||||
The author would like to thank A. Harker for his advice and guidance
|
modulation, resulting in smoother transitions between grains. This
|
||||||
as a mentor throughout the project, and A. Harker and P. Chen for access
|
might be further improved through calculating the point of maximum
|
||||||
to their vocal samples database. Thanks also to D. Chaplin for his
|
similarity by cross-correlating overlapping sections, as described
|
||||||
creative input in generating results.
|
by~\textcite[pp.~191--193]{Zolzer2011} in the Synchronous Overlap and Add
|
||||||
|
(SOLA) algorithm.
|
||||||
\printbibliography
|
|
||||||
\end{document}
|
\item A lack of continuity between grains was observed in results, most
|
||||||
|
likely owing to the lack of any comparison of selected grains. A
|
||||||
|
Viterbi algorithm could be used to account for this, allowing for a
|
||||||
|
search to be done amongst the top matches to find the optimal set
|
||||||
|
of grains. This takes advantage of the offline nature of the
|
||||||
|
project and has been shown to work effectively in the Talkapillar
|
||||||
|
project~\parencite{Hueber}.
|
||||||
|
|
||||||
|
\item Although the HDF5 filesystem allows for easy storage of
|
||||||
|
descriptor values, it also has drawbacks that limit the
|
||||||
|
functionality of the project. One significant problem is that it is
|
||||||
|
difficult to implement parallel processing using the library and
|
||||||
|
for this reason asynchronous processing was not implemented in the
|
||||||
|
project. An alternative method of storage may accommodate this more
|
||||||
|
easily, allowing for the speed-ups possible through asynchronous
|
||||||
|
processing. The overall design of the database management was also
|
||||||
|
relatively naive and may benefit from being replaced by a
|
||||||
|
technology such as an SQL database or similar. This has been shown
|
||||||
|
to work effectively in work such as the CataRT
|
||||||
|
project~\parencite[p.3]{Schwarz2006a}.
|
||||||
|
\end{itemize}
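
The Viterbi suggestion in the list above can be made concrete with a short sketch. It assumes that the matcher returns several candidate source grains per target grain, each with a match cost, and that a join cost scores how well two source grains follow one another. The function names, the cost values and the particular join cost are hypothetical and chosen only to keep the example small.

```python
def viterbi_select(candidates, match_cost, join_cost):
    """Pick one candidate per target grain, minimising match plus join costs.

    candidates[t] lists the top source grains for target grain t,
    match_cost[t][k] is the descriptor distance of candidates[t][k], and
    join_cost(a, b) scores how badly source grain b follows source grain a.
    """
    # best[k]: lowest total cost of any path ending at candidates[t][k]
    best = list(match_cost[0])
    back_pointers = []
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for k, grain in enumerate(candidates[t]):
            costs = [best[j] + join_cost(prev, grain)
                     for j, prev in enumerate(candidates[t - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + match_cost[t][k])
            pointers.append(j_min)
        best = new_best
        back_pointers.append(pointers)
    # Trace the cheapest path backwards from the final target grain.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back_pointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


def join_cost(a, b):
    """Hypothetical continuity cost: consecutive source grains join for free."""
    return 0.0 if b == a + 1 else 1.0


# Two candidate source grains (by index) for each of three target grains.
candidates = [[10, 40], [55, 11], [12, 70]]
match_cost = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.0]]
selected = viterbi_select(candidates, match_cost, join_cost)
```

Choosing the best match for each grain independently would give grains 40, 55 and 70, whereas the search above prefers the consecutive grains 10, 11 and 12 because their slightly worse match costs are outweighed by the better continuity.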
|
||||||
|
|
||||||
|
\section*{Conclusion}
|
||||||
|
This project has provided a functioning Python-based CS project with much
|
||||||
|
potential for further development. Given the number of technical issues
|
||||||
|
faced with this style of synthesis (from the big data issues faced with
|
||||||
|
analysis storage, to high efficiency requirements for processing the large
|
||||||
|
quantities of data), overall this project appears to work effectively. It
|
||||||
|
provides a new and accessible means for tapping some of the vast amount of
|
||||||
|
potential that concatenative synthesis has to offer.\\
|
||||||
|
With the ever increasing quality of technology, it is predicted that new
|
||||||
|
techniques such as concatenative synthesis may grow further in popularity,
|
||||||
|
leading to an increasing number of possibilities in this area of sound
|
||||||
|
synthesis. It is hoped that this project might aid in highlighting the
|
||||||
|
possibilities offered by this form of synthesis and demonstrate some of the
|
||||||
|
technical obstacles that must be addressed to design a CS project
|
||||||
|
successfully.
|
||||||
|
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
|
\printbibliography
|
||||||
|
\end{document}
|
||||||
|
>>>>>>> Stashed changes
|
||||||
|
|||||||