Finished first draft

2016-08-28 12:46:16 +01:00
parent 9a26ad927b
commit c0ec255191
+85 -58
@@ -49,21 +49,18 @@
\begin{abstract}
A command-line tool and Python framework are proposed for the exploration of
a new form of audio synthesis known as ``concatenative synthesis'', a
form of synthesis that uses perceptual audio analyses to arrange small
segments of audio based on their characteristics. The tool is designed to
synthesise representations of an input sound using a database of source
sounds. This involves the segmentation and analysis of both the input sound
and database, matching of input segments to their closest segment from the
database, and the re-synthesis of the closest matches from the database to
produce the final result. The aim was to produce a tool capable of
generating high-quality sonic representations of an input, to present a
variety of examples that demonstrate the breadth of possibilities that
this style of synthesis has to offer, and to provide a robust framework on
which concatenative synthesis projects could be developed easily.\\
Results demonstrate the wide variety of sounds that can be produced using
this method of synthesis. A number of technical issues are outlined that
@@ -76,20 +73,20 @@
The concept of constructing a new sound by arranging collections of smaller
sounds has gained popularity in the past 30 years through the introduction
of ``Granular Synthesis''. Granular synthesis works on the theory that any
sound can be described through the arrangement of smaller samples (referred
to as ``grains''). This representation of sound allows for the temporal
decomposition and re-arranging of real-world samples, with the potential to
create new ``complex, dynamically-evolving
sounds.''~\parencite[p.1]{Roads1988}\\
Concatenative synthesis (CS) is a form of synthesis that has developed
significantly over the past 15 years, driven by recent advancements in
technology. Key advancements have been in easy access to large databases of
audio and the development of methods for extracting useful information from
these databases automatically~\parencite[p.1]{Schwarz2006a}. CS utilises
these technologies to provide a content-based extension to granular
synthesis; by analysing a database of source grains, grains can be
differentiated based on their characteristics. These characteristics can
then be used for grain selection in the process of synthesizing output for
a wide range of applications~\parencite[p.102]{Schwarz2007}.
@@ -103,18 +100,17 @@
focus on defining sets of rules for emulating real sounds. By transforming
samples that have been directly recorded from a source, the subtle nuances
of the source's sound are preserved. These would be difficult to reproduce
using other synthetic methods for modelling an
instrument~\parencite[p.24]{Maestre2009}.
\subsection*{Speech Synthesis}
Creating a natural and intelligible realisation is an important factor when
developing a speech synthesis system. The Talkapillar project is one such
example of how highly convincing results are possible with CS. Through
careful analysis of a vocal database, the project aims to impose the
qualities of the database voice on an input voice. This would result in the
words of the input speaker being transformed to appear as if they were
spoken by the voice in the database~\parencite{Hueber}.
\subsection*{Instrument Synthesis}
Progress has also been made in improving the quality of instrument
@@ -127,16 +123,16 @@
splicing of grains based on their expressive characteristics to form
musical phrases. For example, just as a violinist might transition
seamlessly from one articulation to the next, the CS software will join
grains to produce the variation in articulations. This contrasts with the
traditional approach to sampling, where samples are played in isolation,
resulting in a discontinuity between adjacent samples~\parencite[p.82]{Lindemann2007}.
The Caterpillar project is one such example of this use of CS.
By using a Viterbi algorithm, the project is able to calculate the
smoothest overall transition between grains across the output, resulting
in convincing synthesis of orchestral instrument performances~\parencite[p.5]{Schwarz2003}.
\subsection*{Creative Sound Design}
The flexibility of CS allows for creativity in a broader context than simply
emulating real-world instruments and speech. It can also be used as a tool
to explore the possibilities for synthesizing new abstract sounds for
creative purposes.\\
@@ -156,7 +152,7 @@
method. CS is used in this context as a means for synthesizing matches in a
corpus database to real-time input from an electric bass. Significance is
placed on linking the playback of grains to the expressivity of the
performer. The use of perceptually based audio descriptors to match the
source to the target allows the performer to navigate the database
naturally based on factors such as the pitch and timbre of the bass
guitar. The result is a performance that mixes characteristics of both the
@@ -183,17 +179,25 @@
following sections.
\section*{Program Design and Implementation}
The Concatenator project consists of a number of components, as shown below:\\
*INSERT Concatenator OVERVIEW DIAGRAM*\\
Output is generated by analysing overlapping segments of audio (known as
grains) from both the target sound and the source database, then searching
for the closest matching grain in the source database to the target sound.
Finally, the output is generated by overlap-adding the best matches. Each
component will be discussed in detail in the following sections.\\
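The segmentation and overlap-add stages described above can be illustrated with a minimal sketch. It is not taken from the Concatenator code base: the function names, the choice of a Hann window and the 50\% overlap are all assumptions made for illustration, and plain Python lists are used in place of NumPy arrays to keep the example self-contained.

```python
import math


def hann_window(length):
    """Return a periodic Hann window of the given length as a list of floats."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def segment_into_grains(signal, grain_size, hop):
    """Split a signal into overlapping grains of grain_size samples, hop samples apart."""
    grains = []
    for start in range(0, len(signal) - grain_size + 1, hop):
        grains.append(signal[start:start + grain_size])
    return grains


def overlap_add(grains, hop):
    """Window each grain and sum the grains back together at hop-sized offsets."""
    if not grains:
        return []
    grain_size = len(grains[0])
    window = hann_window(grain_size)
    output = [0.0] * (hop * (len(grains) - 1) + grain_size)
    for index, grain in enumerate(grains):
        offset = index * hop
        for n in range(grain_size):
            output[offset + n] += grain[n] * window[n]
    return output
```

With a hop of half the grain size, the periodic Hann windows sum to one wherever two grains overlap, so re-synthesising the unmodified grains reproduces the input away from the edges. In the matching case, the grains passed to \texttt{overlap\_add} would be the matched source grains rather than the target's own.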
When designing the concatenator framework, ease of development, use and
extensibility were primary considerations. It was for these reasons that
the framework was written in the Python programming language. Python has
grown in popularity in the scientific community recently, primarily due to
its focus on productivity, readability and the large number of efficient
numeric processing libraries available (NumPy, SciPy, scikit-learn,
etc.)~\parencite[p.11]{Fangohr2014}. This makes Python a good choice for
quickly developing ideas in the context of audio signal processing.
Unfortunately, the language does sacrifice processing speed for simplicity
and as a result is not suitable for real-time signal processing. Other
performance-focused languages such as C++ are better suited to this type of
processing. However, it was decided that the increase in productivity, lack
@@ -217,7 +221,7 @@
start of processing, techniques can be applied that consider the output as
a whole rather than on a grain-by-grain basis. This allows algorithms
such as the Viterbi algorithm to find the sequence of grains that provides
the best continuity, as demonstrated in the Caterpillar
project~\parencite[p.4]{Schwarz2003}. This would not be possible in
real-time, as audio is processed on the fly.\\
@@ -225,7 +229,7 @@
target to be matched to. It was decided that the most interesting results
would be produced through the matching of grains to a target audio file, as
opposed to other approaches such as matching to MIDI scores. In this sense
the project is a form of offline audio-mosaicking tool similar to that of
CataRT.
\subsection*{Descriptor Implementation}
@@ -246,11 +250,12 @@
\subsection*{Database Design}
When generating descriptors for large databases, large amounts of data are
produced and so an efficient method of storing and retrieving the data was
needed to manage this. The Python interface to the HDF5
filesystem~\parencite{Collette2016} was chosen for its simplicity and
ability to compress the data automatically. Storing NumPy arrays of
descriptors in groups allowed for quick and easy access to analyses from a
single, organized source.
\subsection*{Matching Algorithms}
In order to match grains using the descriptor values, a matching algorithm
@@ -258,19 +263,17 @@
descriptor value in the target to all values of the same descriptor type in
the source. However, it quickly became apparent that this approach would be
far too slow, particularly for larger databases.\\
For this reason, a k-dimensional tree search algorithm was used in an
effort to improve matching efficiency. This approach produced the same
results as the brute force matcher, but by arranging descriptors in a tree
structure, a far more efficient search to find the best match was possible.
This reduced matching time considerably.
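The difference between the two matchers can be illustrated with a small, self-contained sketch. It is not the Concatenator's implementation: the function names and the tuple-based tree layout are hypothetical, and a production system would more likely rely on an existing library. For $N$ target grains, $M$ source grains and $d$ descriptors, the linear scan costs on the order of $N \cdot M \cdot d$ operations, whereas a k-d tree is built once in roughly $M \log M$ time and, in low dimensions, answers each query in around $\log M$ steps on average. That advantage is known to shrink as $d$ grows.

```python
import math


def brute_force_match(target, sources):
    """Return the index of the source vector closest to target (linear scan)."""
    return min(range(len(sources)), key=lambda i: math.dist(target, sources[i]))


def build_kd_tree(points, indices=None, depth=0):
    """Recursively build a k-d tree as nested (index, left, right) tuples."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    median = len(indices) // 2
    return (
        indices[median],
        build_kd_tree(points, indices[:median], depth + 1),
        build_kd_tree(points, indices[median + 1:], depth + 1),
    )


def kd_tree_match(tree, points, target, depth=0, best=None):
    """Return (index, distance) of the stored point nearest to target."""
    if tree is None:
        return best
    index, left, right = tree
    distance = math.dist(target, points[index])
    if best is None or distance < best[1]:
        best = (index, distance)
    axis = depth % len(target)
    difference = target[axis] - points[index][axis]
    near, far = (left, right) if difference < 0 else (right, left)
    best = kd_tree_match(near, points, target, depth + 1, best)
    # Only search the far side if the splitting plane is closer than the best match.
    if abs(difference) < best[1]:
        best = kd_tree_match(far, points, target, depth + 1, best)
    return best
```

Both functions return the same nearest neighbour whenever it is unique; the tree simply avoids visiting branches that cannot contain a closer point.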
\subsection*{Synthesis and Transformations}
The final step in the program is to synthesize the matched output.
This process consisted of:
\begin{enumerate}
\item Retrieving the best grain matches returned by the matching algorithm
\item Applying a window function
\item Overlapping the grains
\item Transforming grains to match the target
@@ -305,7 +308,7 @@
In order to make the project as user-friendly as possible for both
developers and users, a significant amount of time was spent documenting
the code properly. As a result, full API documentation is available alongside examples
of use. This was written in the hope that it might form a usable package
that developers can extend quickly and effectively to create other CS
projects, allowing for easier access to Python-based CS than is currently
available. The command line interface is equally documented to allow users
@@ -314,7 +317,7 @@
\section*{Results and Evaluation}
In retrospect, a great deal of time was spent trying to improve the
efficiency of the project. Although this was necessary, as initial tests
were not feasible on most databases, it had a negative impact on the time
available for developing perceptual qualities of the output. As a result of
this, the overall quality of output may perhaps not be as high as that of
@@ -329,13 +332,13 @@
matching and transforming matches to better fit the target, that are used
in the most sophisticated CS projects, have been implemented in this
project to reasonable effect. As a proof of concept, this project displays
the possibilities for CS in Python and there is evidently potential for
further development in this area.
\section*{Research Limitations/Potential Development}
There are a number of further improvements that could be made to this
project in order to improve the quality of results and extend its overall
usefulness. Some initial ideas for improvements are detailed in this
section. These range from reasonably simple modifications that could not be
implemented purely due to time constraints, to more complex ideas that may
take a considerable amount of work.\\
@@ -354,22 +357,46 @@
overlapping sections, as described by~\textcite[pp.~191--193]{Zolzer2011} in
the SOLA algorithm.\\
A lack of continuity between grains was observed in the results, most likely
because consecutive selected grains are never compared with one another. A
Viterbi algorithm could be used to account for this, allowing a search to be
made amongst the top matches to find the optimal sequence of grains. This
takes advantage of the offline nature of the project and has been shown to
work effectively in the Talkapillar project~\parencite{Hueber}.
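A minimal sketch of such a search is given below. It is illustrative only and does not reproduce the algorithm of any cited project: the function name, the layout of \texttt{target\_costs} and the user-supplied \texttt{concat\_cost} function are all assumptions. Each target grain has a short list of candidate matches, and the path through these candidates with the lowest combined target and concatenation cost is selected by dynamic programming.

```python
def viterbi_select(target_costs, concat_cost):
    """Pick one candidate per target grain, minimising the total path cost.

    target_costs[t][k] is the (hypothetical) descriptor distance between target
    grain t and its k-th candidate; concat_cost(t, j, k) is the cost of moving
    from candidate j at step t - 1 to candidate k at step t.
    """
    if not target_costs:
        return []
    best = list(target_costs[0])
    back_pointers = []
    for t in range(1, len(target_costs)):
        new_best, pointers = [], []
        for k, cost in enumerate(target_costs[t]):
            # Cheapest predecessor for candidate k at this step.
            previous = min(range(len(best)), key=lambda j: best[j] + concat_cost(t, j, k))
            new_best.append(best[previous] + concat_cost(t, previous, k) + cost)
            pointers.append(previous)
        best = new_best
        back_pointers.append(pointers)
    # Trace back from the cheapest final state.
    state = min(range(len(best)), key=lambda k: best[k])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a concatenation cost of zero the search reduces to picking the closest match for every grain independently, which is the behaviour described for the current matcher; a non-zero cost trades some matching accuracy for smoother transitions.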
Although the HDF5 filesystem allows for easy storage of descriptor values,
it also has drawbacks that limit the functionality of the project. One
significant problem is that it is difficult to implement parallel
processing using the library and for this reason asynchronous processing was
not implemented in the project. An alternative method of storage may
accommodate this more easily, allowing for the speed-ups possible through
asynchronous processing. The overall design of the database management was
also relatively naive and may benefit from being replaced by a technology
such as an SQL database or similar. This has been shown to work effectively
in work such as the CataRT project~\parencite[p.3]{Schwarz2006a}.
High quantity of parameters is very time consuming ~\parencite{Petrushin2007}
Spectral matching~\parencite{Hoffman2009}
Use of RPM?~\parencite[p.82]{Lindemann2007}
Lack of continuity
Viterbi path search~\parencite[p.1]{Schwarz2006a}
\section*{Conclusion}
Given the limited time frame for the project and complexity of modern
approaches to this form of synthesis, only a basic implementation of CS is
presented. Nevertheless, this project has provided a functioning
Python-based CS tool with much potential for further development. Given the
high number of technical issues faced with this style of synthesis (from
the big data issues faced with analysis storage, to high efficiency
requirements for processing the large quantities of data), overall this
project appears to perform to a reasonable standard.\\
With the ever-increasing quality of technology, it is predicted that new
techniques such as concatenative synthesis may grow further in popularity,
leading to an increasing number of possibilities in this area of sound
synthesis. It is hoped that this project might aid in highlighting the
possibilities offered by this form of synthesis and demonstrate some of the
technical obstacles that must be addressed to design a CS project
successfully.
\section*{Acknowledgments}
The author would like to thank A. Harker for his advice and guidance
as a mentor throughout the project, and A. Harker and P. Chen for access
to their vocal samples database. Thanks also to D. Chaplin for his
creative input in generating results.
\printbibliography
\end{document}