Begun potential improvements section

2016-08-27 19:51:18 +01:00
parent 4b7789164f
commit 121a5d57bd
+92 -18
@@ -20,7 +20,7 @@
\usepackage{blindtext}
\setkomafont{disposition}{\normalfont\fontsize{12}{17}\bfseries}
\setkomafont{section}{\normalfont\fontsize{12}{17}\bfseries}
\setkomafont{subsection}{\normalfont\fontsize{12}{17}\bfseries\itshape}
\setkomafont{subsection}{\normalfont\fontsize{12}{17}\itshape}
\setkomafont{subsubsection}{\normalfont\fontsize{12}{17}\itshape}
\graphicspath{{./resources/}}
@@ -41,7 +41,7 @@
{}
\begin{document}
\title{Descriptor Driven Concatenative Synthesis Tool}
\title{Descriptor Driven Concatenative Synthesis Tool for Python}
% \subtitle{\LARGE{Abstract Draft}}
\author{Sam Perry}
@@ -249,43 +249,117 @@
produced and so an efficient method of storing and retrieving the data was
needed to manage this. The Python interface to the HDF5 file format (h5py)
was chosen for its simplicity and ability to compress the data
automatically. This allowed for quick and easy access to analyses from a
single, organized source.
automatically. Storing NumPy arrays of descriptors in groups allowed for
quick and easy access to analyses from a single, organized source.
\subsection*{Matching Algorithms}
Brute force matching
Kd tree matching
In order to match grains using the descriptor values, a matching algorithm
was required. Initially a brute force matcher was used to compare each
descriptor value in the target to all values of the same descriptor type in
the source. However, it quickly became apparent that this approach would be
far too slow, particularly for larger databases.\\
*INSERT O NOTATION FOR BRUTE FORCE MATCHER*
For this reason, a k-dimensional tree search algorithm was used in an
effort to improve matching efficiency. This approach produced the same
results as the brute force matcher, but arranging the descriptors in a tree
structure made a far more efficient search for the best match possible.
This reduced matching time considerably.
*INSERT O NOTATION FOR KD TREE SEARCH*
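The difference between the two matchers can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the tuple-based tree layout and the use of Euclidean distance over whole descriptor vectors are assumptions made for illustration. For $n$ target grains and $m$ source grains, a brute force search performs on the order of $n \times m$ distance calculations, whereas a balanced kd-tree typically needs around $\log m$ steps per query for low-dimensional data.

```python
import math


def brute_force_match(target, source):
    """For each target vector, return the index of the nearest source vector."""
    return [
        min(range(len(source)), key=lambda i: math.dist(t, source[i]))
        for t in target
    ]


def build_kd_tree(points, indices=None, depth=0):
    """Build a kd-tree as nested (index, axis, left, right) tuples."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    # Cycle through the descriptor dimensions as the tree gets deeper.
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        axis,
        build_kd_tree(points, indices[:mid], depth + 1),
        build_kd_tree(points, indices[mid + 1:], depth + 1),
    )


def kd_nearest(node, points, query, best=None):
    """Return (distance, index) of the point nearest to the query."""
    if node is None:
        return best
    index, axis, left, right = node
    dist = math.dist(query, points[index])
    if best is None or dist < best[0]:
        best = (dist, index)
    # Search the side of the splitting plane that contains the query first.
    diff = query[axis] - points[index][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = kd_nearest(near, points, query, best)
    # Only visit the far side if the plane is closer than the best match.
    if abs(diff) < best[0]:
        best = kd_nearest(far, points, query, best)
    return best


# Hypothetical two-dimensional descriptor vectors (e.g. RMS and pitch).
source = [(0.1, 0.9), (0.4, 0.2), (0.8, 0.5), (0.3, 0.3)]
target = [(0.38, 0.25), (0.9, 0.6)]

tree = build_kd_tree(source)
kd_matches = [kd_nearest(tree, source, t)[1] for t in target]
assert kd_matches == brute_force_match(target, source)
```

Both functions return the same matches; the kd-tree simply avoids visiting branches that cannot contain a closer grain.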
\subsection*{Synthesis and Transformations}
Windowing of grains
Pitch enforcement
RMS Enforcement
The final step in the program was to synthesize the matched output.
This process consisted of:
\begin{enumerate}
\item Retrieving the best grain matches returned by the matching algorithm
\item Applying a window function
\item Overlapping the grains
\item Transforming grains to match the target
\item Saving the result to a file
\end{enumerate}
Initially, grains were not transformed to better match the target. This
worked effectively for large databases; however, it was observed that
results synthesized using small databases were of a lower quality, as the
chance of finding a closely matched grain was lower. To account for this,
methods for altering grains to better match their target were implemented.
It was decided that the two most significant characteristics to alter were
the pitch and intensity of the grains. By scaling each grain by the
difference between the source and target RMS, it was possible to impose a
closer intensity on a grain. Likewise, by shifting the pitch of a grain by
the difference between the source and target pitch, it was possible to
better match the pitch contour of the output to that of the target audio.
This improved the results significantly for smaller databases, as poor
matches could be altered to match the target more convincingly.
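The intensity matching and overlapping steps described above might look something like the following sketch. It is only an illustration under stated assumptions: the function names, the Hann window, the gain computed as the ratio of target to grain RMS, the \texttt{max\_gain} limit and the equal grain lengths are all choices made here, not details taken from the project.

```python
import math


def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(grain, target_rms, max_gain=10.0):
    """Scale a grain so that its RMS approaches the target RMS.

    The gain is capped (max_gain is an assumed safeguard) so that very
    quiet grains are not amplified without limit.
    """
    grain_rms = rms(grain)
    if grain_rms == 0.0:
        return list(grain)  # A silent grain cannot be scaled to a level.
    gain = min(target_rms / grain_rms, max_gain)
    return [s * gain for s in grain]


def hann(n):
    """Hann window of n points (n must be greater than 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(grains, hop):
    """Window equal-length grains and sum them at intervals of hop samples."""
    length = hop * (len(grains) - 1) + len(grains[0])
    out = [0.0] * length
    for k, grain in enumerate(grains):
        window = hann(len(grain))
        for i, (sample, weight) in enumerate(zip(grain, window)):
            out[k * hop + i] += sample * weight
    return out


# Hypothetical matched grains and the RMS values of their targets.
grains = [[math.sin(2 * math.pi * i / 32) for i in range(64)] for _ in range(3)]
target_levels = [0.1, 0.2, 0.05]

matched = [match_rms(g, level) for g, level in zip(grains, target_levels)]
output = overlap_add(matched, hop=32)
```

In this sketch the grains are transformed before they are windowed and overlapped, so that each grain is scaled against its own target level.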
\subsection*{Command Line Interface}
High quantity of parameters is very time consuming ~\parencite{Petrushin2007}
In order to make the framework accessible to users, a command line interface
was developed. By supplying arguments to the program, users could alter
parameters and experiment freely with the tool. Although this interface
was sufficient for testing and experimentation, it quickly became apparent
that there were too many parameters to pass to the program via the command
line interface on each run. A configuration file parser was created to
address this issue, allowing users to specify default parameters that would
be used by the program on each run. The combination of these interfaces
provided an effective means for accessing all of the framework's features.
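One way to combine the two interfaces is to let values from the configuration file become the defaults for the command line arguments. The sketch below uses only the Python standard library; the \texttt{[synthesis]} section name, the \texttt{grain\_size} and \texttt{overlap} parameters and the \texttt{parse\_settings} function are hypothetical and are not taken from the project.

```python
import argparse
import configparser


def parse_settings(argv, config_text=""):
    """Merge built-in defaults, config file values and command line arguments.

    Command line arguments take priority over the configuration file,
    which in turn takes priority over the built-in defaults.
    """
    defaults = {"grain_size": "100", "overlap": "2"}

    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_section("synthesis"):
        defaults.update(config["synthesis"])

    parser = argparse.ArgumentParser(description="Hypothetical synthesis tool")
    parser.add_argument("--grain-size", type=int,
                        default=int(defaults["grain_size"]),
                        help="grain length in milliseconds")
    parser.add_argument("--overlap", type=int,
                        default=int(defaults["overlap"]),
                        help="number of overlapping grains")
    return parser.parse_args(argv)


# Example configuration text, as it might appear in a user's config file.
example_config = "[synthesis]\ngrain_size = 50\n"

settings = parse_settings(["--overlap", "4"], example_config)
assert settings.grain_size == 50  # taken from the configuration file
assert settings.overlap == 4      # overridden on the command line
```

With this arrangement a user only needs to pass the parameters that differ from their saved defaults on each run.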
\subsection*{Documentation and API}
Object oriented approach for intuitive API
In order to make the project as user-friendly as possible for both
developers and users, a significant amount of time was spent documenting
the code properly. As a result, a full API is available alongside examples
of use. This was written in the hope that it might form a usable package
that developers can extend quickly and effectively to create other CS
projects, allowing for easier access to Python-based CS than is currently
available. The command line interface is equally well documented to allow
users to create their own realisations quickly and easily so that this
project may be used for creative sound design purposes.
\section*{Results and Evaluation}
The choice to develop a purely offline project
Reasonable results, further development needed for it to be truly useful
In retrospect, a great deal of time was spent trying to improve the
efficiency of the project. Although this was necessary, as initial tests
were not feasible on most databases, it had a negative impact on the time
available for developing perceptual qualities of the output. As a result of
this, the overall quality of output may perhaps not be as high as that of
other projects in this area. It is clear that in its current state this
project does not have the level of sophistication that might be needed for
this style of synthesis. Factors such as the low quantity of descriptors
supplied and basic transformations impede the overall quality of results.
This is further exacerbated by the high computation required, resulting in
large amounts of time needed to produce high quality results. An end user
may not have the patience required to reach the quality of results that
might be possible. However, the fundamental concepts such as descriptor
matching and transforming matches to better fit the target, that are used
in the most sophisticated CS projects, have been implemented in this
project to reasonable effect. As a proof of concept, this project displays
the possibilities for CS in Python and there is clearly potential for
further development in this area.
\section*{Research Limitations/Potential Development}
Given the limited time frame and complexity of modern approaches to this
form of synthesis, only a basic implementation was possible.
There are a number of further improvements that could be made to this
project in order to improve the quality of results and extend its overall
usefulness. Some initial ideas for improvements are detailed in this
section. These range from reasonably simple modifications that could not be
implemented purely due to time constraints, to more complex ideas that may
take a considerable amount of work.
Using Essentia to vastly increase the number of available descriptors.
Replacement of HDF5 to allow parallel processing
High quantity of parameters is very time consuming~\parencite{Petrushin2007}
Better ways of windowing using SOLA/PSOLA methods
Replacement of HDF5 to allow parallel processing
Possible use of more sophisticated database management system as demonstrated in the Caterpillar project.
Spectral matching~\parencite{Hoffman2009}
Use of RPM?~\parencite[p.82]{Lindemann2007}
Lack of continuity
Viterbi path search~\parencite[p.1]{Schwarz2006a}
\section*{Conclusion}
Given the limited time frame for the project and complexity of modern
approaches to this form of synthesis, only a basic implementation was
possible.
\printbibliography
\end{document}