diff --git a/Journal_Article.tex b/Journal_Article.tex old mode 100755 new mode 100644 index cb4d3e3..4301cde --- a/Journal_Article.tex +++ b/Journal_Article.tex @@ -1,430 +1,472 @@ -\documentclass[titlepage]{scrartcl} -\usepackage{enumitem} -\usepackage[british]{babel} -\usepackage[style=apa, backend=biber]{biblatex} -\DeclareLanguageMapping{british}{british-apa} -\usepackage{url} -\usepackage{float} -\restylefloat{table} -\usepackage{perpage} -\MakePerPage{footnote} -\usepackage{abstract} -\usepackage{graphicx} -% Create hyperlinks in bibliography -\usepackage{hyperref} - -\renewcommand{\familydefault}{\sfdefault} -\usepackage{fontspec} -\setmainfont{Arial} - -\usepackage{blindtext} -\setkomafont{disposition}{\normalfont\fontsize{12}{17}\bfseries} -\setkomafont{section}{\normalfont\fontsize{12}{17}\bfseries} -\setkomafont{subsection}{\normalfont\fontsize{12}{17}\itshape} -\setkomafont{subsubsection}{\normalfont\fontsize{12}{17}\itshape} - -\graphicspath{{./resources/}} -\addbibresource{~/Documents/library.bib} - -\usepackage[affil-it]{authblk} - -% \usepackage{etoolbox} -% \makeatletter -% \expandafter\patchcmd\csname\string\maketitle\endcsname -% {\vskip\z@\@plus3fill} -% {\vskip\z@\@plus2fill\box\abstractbox\vskip\z@\@plus1fill} -% {}{} -% \makeatother -% - -\DeclareCiteCommand{\citeyearpar} - {} - {\mkbibparens{\bibhyperref{\printdate}}} - {\multicitedelim} - {} - -\begin{document} - \title{Descriptor Driven Concatenative Synthesis Tool for Python} - % \subtitle{\LARGE{Abstract Draft}} - \author{S. Perry\thanks{E-mail: \texttt{\href{mailto:u1265119@unimail.hud.ac.uk}{u1265119@unimail.hud.ac.uk}}}} - \date{Dated: \today} - - \maketitle - - \begin{abstract} - A command-line tool and Python framework is proposed for the exploration of - a new form of audio synthesis known as ``concatenative synthesis'': A form - of synthesis that uses perceptual audio analyses to arrange small segments - of audio based on their characteristics. The tool is designed to - synthesise representations of an input sound using a database of source - sounds. This involves the segmentation and analysis of both the input sound - and database, matching of input segments to their closest segment from the - database, and the re-synthesis of the closest matches to produce the final - result. The project aims to provide a tool capable of generating high - quality sonic representations of an input, to present a variety of examples - that demonstrated the breadth of possibilities that this style of synthesis - has to offer and to provide a robust framework on which concatenative - synthesis projects can be developed easily.\\ - - Results demonstrate the wide variety of sounds that can be produced using - this method of synthesis. A number of technical issues are outlined that - impeded the overall quality of results and efficiency of the software. - However, the project clearly demonstrates the strong potential for this - type synthesis to be used for creative purposes. - \end{abstract} - - \section*{Background} - The concept of constructing a new sound by arranging collections of smaller - sounds has gained popularity in the past 30 years through the introduction - of ``Granular Synthesis''. Granular synthesis works on the theory that any - sound can be described through the arrangement of smaller samples (referred - to as ``grains''). This representation of sound allows for the temporal - decomposition and re-arranging of real-world samples, with the potential to - create new ``complex, dynamically-evolving - sounds.''~\parencite[p.1]{Roads1988}\\ - - Concatenative synthesis (CS) is a form of synthesis that has developed - significantly over the past 15 years, driven by recent advancements in - technology. Key advancements have been in easy access to large databases of - audio and the development of methods for extracting useful information from - these databases automatically~\parencite[p.1]{Schwarz2006a}. CS utilises - these technologies to provide a content-based extension to granular - synthesis; by analysing a database of source grains, grains can be - differentiated based on their characteristics. These characteristics can - then be used for grain selection in the process of synthesizing output for - a wide range of applications~\parencite[p.102]{Schwarz2007}. - - \section*{Related Works} - A number of programs utilize CS to achieve various goals. The process has - been used for applications in areas such as speech synthesis, instrument - synthesis and for applications in creative sound design.\\ - The wide range of applications demonstrates the versatility of this - synthesis technique. It differs from traditional synthesis methods through - the use of real recorded samples, as opposed to traditional methods that - focus on defining sets of rules for emulating real sounds. By transforming - samples that have been directly recorded from a source, the subtle nuances - of the source's sound are preserved. These would be difficult to reproduce - using other synthetic methods for modelling an - instrument~\parencite[p.24]{Maestre2009}. - - \subsection*{Speech Synthesis} - Creating a natural and intelligible realisation is an important factor when - developing a speech synthesis system. The Talkapillar project is one such - example of how highly convincing results are possible with CS. Through - careful analysis of a vocal database, the project aims to impose the - qualities of the database voice on an input voice. This would result in the - words of the input speaker being transformed to appear as if they were - spoken by the voice in the database.~\parencite{Hueber} - - \subsection*{Instrument Synthesis} - Progress has also been made in improving the quality of instrumental - synthesis. As with speech synthesis, the use of samples directly allows for - natural sounding results, which provides a method for reproducing real - instruments convincingly. Another important aspect of instrumental synthesis is that of performer - expression. The reproduction of performance qualities such as dynamics, - timbre and timing are essential when emulating a real instrument and CS has - been used to effectively reproduce these aspects. This is achieved through - splicing of grains based on their expressive characteristics to form - musical phrases. For example, just as a violinist might transition - seamlessly from one articulation to the next, the CS software will join - grains to produce the variation in articulations. This contrasts the - traditional approach to sampling, where samples are played in isolation, - resulting in a discontinuity between adjacent samples~\parencite[p.82]{Lindemann2007}. - The Catapillar project is one such example of this use of CS. - By using a Viterbi algorithm, the project is able to calculate the - smoothest overall transition between grains across the output, resulting - in convincing synthesis of orchestral instrument performances~\parencite[p.5]{Schwarz2003}. - - \subsection*{Creative Sound Design} - The flexibility of CS allows for creativity in a broader context than simply - emulating real-world instruments and speech. It can also be used as a tool - to explore the possibilities for synthesizing new abstract sounds for - creative purposes.\\ - A prominent project in this area of CS is IRCAM's CataRT - project~\parencite{Schwarz2006a}. The project focuses on the playback of - source grains based on their proximity to a target in multi-dimensional - descriptor space. By providing a target point in the descriptor space, the - user is able to navigate the database, playing selections of samples that - are nearest to the target. This allows the user to explore the database - intuitively through a graphic user interface, selecting a point in - 2-dimensional space with the mouse. Grains are then played back in - real-time to create an ``audio mosaic''.\\ - Alternatively, target audio can be provided and analysed to create a target - location based on it's location in the descriptor space. Tremblay and - Schwarz's~\citeyearpar{Tremblay2010} use of CataRT to explore - electroacoustic sample banks demonstrates the creative potential of this - method. CS is used in this context as a means for synthesizing matches in a - corpus database to real-time input from an electric bass. Significance is - placed on linking the playback of grains to the expressivity of the - performer. The use of perceptually based audio descriptors to match the - source to the target allows the performer to navigate the database - naturally based on factors such as the pitch and timbre of the bass - guitar. The result is a performance that mixes characteristics of both the - bass guitar output and the qualities of the corpus database to create a - hybrid of the two.\\ - - This is by no means an exhaustive overview of the projects and techniques - that explore the vast possibilities of CS. For further information, please - refer to: ``Concatenative Synthesis - The Early - Years''~\parencite{Schwarz2006b} - - \section*{Concatenator} - The concatenator project aims to provide an open source set of tools that - allows composers to generate a variety of CS driven realisations for - sound design purposes. In addition, the project aims to provide an - intuitive API that Python programmers might use as the fundamental building - blocks to build further concatenative synthesis applications on. - The result is a framework and command-line interface, built in Python, for - easy access to basic CS techniques. - The current implementation can be used for the concatenation of a source - database onto target audio files, using a range of perceptual audio - descriptors for matching. Database management, simple matching and - synthesis algorithms are used to achieve this, and are described in the - following sections. - - \section*{Program Design and Implementation} - The Concatenator project consists of a number of components that work - together to produce the final output. A complete description of all - components and there usage in the concatenator project can be found in it's - complete documentation at:\\ - - *PERMANENT URL FOR DOCUMENTATION NEEDED*\\ - - Output is generated by analysing overlapping segments of audio (known as - grains) from both the target sound and the source database, then searching - for the closest matching grain in the source database to the target sound. - Finally, the output is generated by applying a hanning window and - overlap-adding the best matches. Each component will be discussed in detail - in the following sections.\\ - - When designing the concatenator framework, ease of development, use and - extensibility were primary considerations. It was for these reasons that - the framework was written in the Python programming language. Python has - grown in popularity in the scientific community recently, primarily due to - it's focus on productivity, readability and the large number of efficient - numeric processing libraries available (Numpy, SciPy, Scikitlearn - etc...)~\parencite[p.11]{Fangohr2014}. This makes Python a good choice for - quickly developing ideas in the context of audio signal processing. - Unfortunately, the language does sacrifice processing speed for simplicity - and as a result is not suitable for real-time signal processing. Other - performance focused languages such as C++ are better suited to this type of - processing. However, it was decided that the increase in productivity, lack - of prior CS research in Python and the author's previous experience, - made it the most suitable choice for this project.\\ - - The choice to limit the project to offline processing has both positive and - negative implications on the function of the project. A key disadvantage to - this type of processing is the lack of possibility for any live performance - aspect. This method provides no way of exploring the feedback between - performer and system in a live environment, comparable to the work of - Tremblay and Schwarz's~\citeyearpar{Tremblay2010}. - However, there are advantages to offline processing that would not be - possible in a real-time context.\\ - One significant advantage is that databases can afford to be far larger - than they could in real time. Without the requirement to process output in - a short period of time, more time can be taken to search vast databases in - the hope that the closest match to a target will be found.\\ - Another advantage is in the global view of a target that can be taken in an - offline approach. Because the complete audio file is available from the - start of processing, techniques can be applied that consider the output as - a whole rather than on a grain by grain basis. This allows for algorithms - such as the Viterbi algorithm to find the sequence of grains that provide - the best continuity, as demonstrated in the Catapillar - project~\parencite[p.4]{Schwarz2003} This would not be possible in - real-time, as audio is processed on-the-fly.\\ - - An additional consideration was the method to be used for controlling the - target to be matched to. It was decided that the most interesting results - would be produced through the matching of grains to a target audio file, as - opposed to other approaches such as matching to MIDI scores. In this sense - the project is a form of offline audio-mosaicking tool similar to that of - CataRT. - - \subsection*{Descriptor Implementation} - In order to differentiate between grains, a number of audio descriptors - were implemented. Audio descriptors are used to measure a specific - characteristic of a signal~\parencite[p.31]{Lerch2012}. For example, an RMS - descriptor was implemented to give an indication of the overall intensity - of the grain. Another example is the F0 descriptor implemented to give a - value relating to pitch for harmonic grains. These values could then be - used by the matching algorithm in order to find the best match between the - source and target grains. A full description of all descriptors implemented - can be found in the Concatenator documentation.\\ - Due to time constraints on the project, only a limited number of basic - descriptors were implemented. For this reason, it was ensured that new - descriptors could be added easily to the project. The object oriented - design of the descriptors provides the potential for quick development of - any future descriptors to be added to the project. - - \subsection*{Database Design} - When generating descriptors for large database, large amounts of data are - produced and so an efficient method of storing and retrieving the data was - needed to manage this. The Python interface to the HDF5 - filesystem~\parencite{Collette2016} was chosen for it's simplicity and - ability to compress the data automatically. Storing Numpy arrays of - descriptors in groups allowed for quick and easy access to analyses from a - single, organized source. - - \subsection*{Matching Algorithms} - In order to match grains using the descriptor values, a matching algorithm - was required. Initially a brute force matcher was used to compare each - descriptor value in the target to all values of the same descriptor type in - the source. However, it quickly became apparent that this approach would be - far to slow, particularly for larger database.\\ - For this reason, a k-dimensional tree search algorithm was used in an - effort to improve matching efficiency. This approach produced the same - results as the brute force matcher, but by arranging descriptors in a tree - structure, a far more efficient search to find the best match was possible. - This reduced matching time considerably. - - \subsection*{Synthesis and Transformations} - The final step in the program is to synthesize the matched output. - This process consisted of: - \begin{enumerate} - \item Retrieving the best grain matches returned by the matching algorithm - \item Applying a window function - \item Overlapping the grains - \item Transforming grains to match target - \item Saving the result to a file - \end{enumerate} - Initially, grains were not transformed to better match the target. This - worked effectively for large databases, however it was observed that - results synthesized using small databases were of a lower quality as the - chance of a closely matched grain was lower. To account for this, methods - for altering grains to better match their target were implemented. It was - decided that the two most significant characteristics to alter were the - pitch and intensity of the grains. By scaling the grains by the difference - between the source and target RMS, it was possible to impose a closer - intensity on a grain. Likewise, by shifting the pitch of a grain by the - difference, it was possible to better match the pitch contour of the output - to that of the target audio. This improved the results significantly in - smaller databases, as poor matches could be improved to match the target - more convincingly. - - \subsection*{Command line Interface} - In order to make the framework accessible to users, a commandline interface - was developed. By supplying arguments to the program, users could alter - parameters and experiment freely with the tool. Although this interface - was sufficient for testing and experimentation, it quickly became apparent - that there were too many parameters to pass to the program via the command - line interface on each run. A configuration file parser was created to - address this issue, allowing users to specify default parameters that would - be used by the program on each run. The combination of these interfaces - provided an effective means for accessing all of the framework's features. - - \subsection*{Documentation and API} - Complete documentation for the project was created in order to make the - project as user friendly as possible for both developers and users. As a - result, a full API is available alongside examples of use and instructions - for commandline operation. This was created in the hope that it might form - a usable package that developers can build on quickly and effectively to - build other CS projects, allowing for easier access to Python based CS than - is currently available. The command line interface is equally documented to - allow users to create their own realisations quickly and easily so that - this project may be used for creative sound design purposes. - - \section*{Results and Evaluation} - Overall, results generated by this project showed promise; a variety of - transformations were generated using open source instrument databases to - demonstrate the projects potential for sound design application. This - tested the project's ability to convincingly impose qualities of an - instrument onto target sounds. A variety of examples are provided that - outline the style of synthesis aimed for. These range from imposing - acoustic guitar qualities on an electric guitar to imposing stringed - instrument qualities on vocal melodies. Current results have a clear - synthetic nature, but still clearly exhibit some of the main - characteristics of the database used.\\ - - \noindent - Concatenator project examples that demonstrate current results can be found at:\\ - - *PERMENANT URL FOR RESULTS NEEDED*\\ - - \section*{Research Limitations/Potential Development} - In retrospect, a great deal of time was spent trying to improve the - efficiency of the project. Although this was necessary, as initial tests - were not feasible on most databases, it had a negative impact on the time - available for developing perceptual qualities of the output. As a result of - this, the overall quality of output may perhaps not be as natural as that of - other projects in this area. This is apparent in the vocal -> string - instrument examples. Phrases tend to begin and end abruptly, failing to - replicate any defined attack or decay of the string instruments, as would - be expected when hearing a string instrument naturally. Conversely, this - does give output it's own synthetic characteristic, which may be desirable - as perfect reproduction of an instrument may not be the reason for using - this tool.\\ - In Addition, the high computation required results in large amounts of time - needed to produce high quality results. An end user may not have the - patience required to to reach the quality of results that might be - possible. This is in part a set back of the Python language, and could be - better accounted for with further work on profiling the performance of the - tool.\\ - However, the fundamental concepts such as descriptor matching and - transforming matches to better fit the target, that are used in the most - sophisticated CS projects, have been implemented in this project to - satisfying creative effect. As a proof of concept, this project displays - the possibilities for CS in Python and there is evidently potential for - further development in this area.\\ - - There are a number of further improvements that could be made to this - project in order to improve the quality of results and extend it's overall - usefulness. Some initial ideas for improvements are detailed in this - section below. These range from reasonably simple modifications that could - not be implemented purely due to time constraints, to more complex ideas - that may take a considerable amount of work.\\ - - The current implementation uses only a small and relatively basic subset of - the audio descriptors available. This limits the analysis of audio and thus - the quality of matches. Using a larger set of more advanced descriptors may - improve quality from this perspective. One way would be to incorporate the - open source Essentia audio descriptors~\parencite{Essentia2016} giving the - project access to a vast quantity of descriptors for analysis.\\ - - Replacing the hanning window function used for grain windowing with a short - cross fade at grain overlaps should reduce amplitude modulation, resulting - in smoother transitions between grains. This might be further improved - through calculating the point of maximum similarity by cross-correlating - overlapping sections, as described by~\textcite[p.191-193]{Zolzer2011} in - the SOLA algorithm.\\ - - A lack of continuity between grains was observed in results, most likely - due to the lack of any comparison of selected grains. A Viterbi algorithm - could be used to account for this, allowing for a search to be done amongst - the top matches to find the optimal set of grains. This takes advantage of - the offline nature of the project and has been shown to work effectively in - the Talkapillar project~\parencite{Hueber}. - - Although the HDF5 filesystem allows for easy storage of descriptor values, - it also has drawbacks that limits the functionality of the project. One - significant problem is that it is difficult to implement parallel - processing using the library and for this reason asynchronous processing was - not implemented in the project. An alternative method of storage may - accommodate this more easily, allowing for the speed-ups possible through - asynchronous processing. The overall design of the database management was - also relatively naive and may benefit from being replaced by a technology - such as an SQL database or similar. This has been shown to work effectively - in work such as the CataRT project~\parencite[p.3]{Schwarz2006a}. - - \section*{Conclusion} - This project has provided a functioning Python based CS project with much - potential for further development. Given the number of technical issues - faced with this style of synthesis (from the big data issues faced with - analysis storage, to high efficiency requirements for processing the large - quantities of data), overall this project appears to work effectively. It - provides a new and accessible means for tapping some of the vast amount of - potential that concatenative synthesis has to offer.\\ With the ever - increasing quality of technology, it is predicted that new techniques such - as concatenative synthesis may grow further in popularity, leading to an - increasing number of possibilities in this area of sound synthesis. It is - hoped that this project might aid in the highlighting the possibilities - offered by this form of synthesis and demonstrate some of the technical - obstacles that must be addressed to design a CS project successfully. - - \section*{Acknowledgments} - The author would like to thanks A. Harker for his advice and guidance - as a mentor throughout the project, and to A. Harker and P. Chen for access - to their vocal samples database. Thanks also to D. Chaplin for his - creative input in generating results. - - \printbibliography -\end{document} +\documentclass{scrartcl} +\usepackage{enumitem} +\usepackage[british]{babel} +\usepackage[style=apa, backend=biber, maxnames=99]{biblatex} +\DeclareLanguageMapping{british}{british-apa} +\usepackage{filecontents} +\usepackage{url} +\usepackage{float} +\restylefloat{table} +\usepackage{perpage} +\MakePerPage{footnote} +\usepackage{abstract} +\usepackage{graphicx} +% Create hyperlinks in bibliography +\usepackage{hyperref} + +\renewcommand{\familydefault}{\sfdefault} +\usepackage{fontspec} +\setmainfont{Arial} + +\usepackage{blindtext} +\setkomafont{disposition}{\normalfont\fontsize{12}{17}\bfseries} +\setkomafont{section}{\normalfont\fontsize{12}{17}\bfseries} +\setkomafont{subsection}{\normalfont\fontsize{12}{17}\itshape} +\setkomafont{subsubsection}{\normalfont\fontsize{12}{17}\itshape} + +\graphicspath{{./resources/}} +\addbibresource{~/Documents/library.bib} + +% Hack to fix problem with underscores and other special charachters in +% Mendeley bibliography. +\DeclareSourcemap{% Used when .bib/Bibliography is compiled, not when document is + \maps{ + \map{% Replaces '{\_}', '{_}' or '\_' with just '_' + \step[fieldsource=url, + match=\regexp{\{\\\_\}|\{\_\}|\\\_}, + replace=\regexp{\_}] + } + \map{% Replaces '{'$\sim$'}', '$\sim$' or '{~}' with just '~' + \step[fieldsource=url, + match=\regexp{\{\$\\sim\$\}|\{\~\}|\$\\sim\$}, + replace=\regexp{\~}] + } + } +} + +\usepackage[affil-it]{authblk} + +% \usepackage{etoolbox} +% \makeatletter +% \expandafter\patchcmd\csname\string\maketitle\endcsname +% {\vskip\z@\@plus3fill} +% {\vskip\z@\@plus2fill\box\abstractbox\vskip\z@\@plus1fill} +% {}{} +% \makeatother +% + +\DeclareCiteCommand{\citeyearpar} + {} + {\mkbibparens{\bibhyperref{\printdate}}} + {\multicitedelim} + {} +\newenvironment{keywords}% + {\begin{trivlist}\item[]{\bfseries\sffamily Keywords:}\ }% + {\end{trivlist}} + +\begin{document} + \section*{Descriptor driven concatenative synthesis tool for Python} + Sam Perry\\ + E-mail: \href{mailto:u1265119@unimail.hud.ac.uk}{u1265119@unimail.hud.ac.uk} + \section*{Abstract} + A command-line tool and Python framework is proposed for the exploration of + a new form of audio synthesis known as `concatenative synthesis', a form + of synthesis that uses perceptual audio analyses to arrange small segments + of audio based on their characteristics. The tool is designed to + synthesise representations of an input target sound using a source database + of sounds. This involves the segmentation and analysis of both the input + sound and database, the matching of input segments to their closest segment + from the database, and the re-synthesis of the closest matches to produce + the final result.\\ + The project aims to provide a tool capable of generating high-quality + sonic representations of an input, to present a variety of examples that + demonstrated the breadth of possibilities that this style of synthesis has + to offer and to provide a robust framework on which concatenative synthesis + projects can be developed easily. The purpose of this project was primarily + to highlight the potential for further development in the area of + concatenative synthesis, and to provide a simple and intuitive tool that + could be used by composers for sound design and experimentation. The + breadth of possibilities for creating new sounds offered by this method of + synthesis makes it ideal for digital sound design and electroacoustic + composition.\\ + Results demonstrate the wide variety of sounds that can be produced using + this method of synthesis. A number of technical issues are outlined that + impeded the overall quality of results and efficiency of the software. + However, the project clearly demonstrates the strong potential for this + type of synthesis to be used for creative purposes. + \begin{keywords} + Concatenative synthesis; Python; audio descriptor; audio analysis; command line tool; Python framework; Python sound; + \end{keywords} + + \section*{Acknowledgments} + I would like to thank A Harker for his advice and guidance as a mentor + throughout the project, and A Harker and P Chen for access to their + vocal samples database. Thanks also to D Chaplin for his creative input + in generating results. + \pagebreak + + \section*{Background} + The concept of constructing a new sound by arranging collections of smaller + sounds has gained popularity in the past 30 years through the introduction + of granular synthesis, which works on the theory that any sound can be + described through the arrangement of smaller samples (referred to as + `grains'). This representation of sound allows for the temporal + decomposition and re-arranging of real-world samples, with the potential to + create new `complex, dynamically-evolving + sounds'~\parencite[p.1]{Roads1988}.\\ + + Concatenative synthesis (CS) is a form of synthesis that has developed + significantly over the past 15 years, driven by recent advancements in + technology. The key advancements have been in ease of access to large databases of + audio and the development of methods for extracting useful information from + these databases automatically~\parencite[p. 1]{Schwarz2006a}. CS utilises + these technologies to provide a content-based extension to granular + synthesis; analysis of a database of source grains enable them to be + differentiated based on their characteristics. These characteristics can + then be used for grain selection in the process of synthesising output for + a wide range of applications~\parencite[p. 102]{Schwarz2007}. + + \section*{Related works} + A number of programs utilise CS to achieve various goals. The process has + been used for applications in areas such as speech synthesis, instrument + synthesis and creative sound design.\\ + The wide range of applications demonstrates the versatility of this + synthesis technique. It differs from traditional synthesis methods as it + uses real recorded samples, as opposed to traditional methods that focus on + defining sets of rules for emulating real sounds. By transforming samples + that have been directly recorded from a source, the subtle nuances of the + source's sound are preserved. These would be difficult to reproduce using + other synthetic methods for modelling an + instrument~\parencite[p. 24]{Maestre2009}. + + \subsection*{Speech synthesis} + Creating a natural and intelligible realisation is an important factor when + developing a speech-synthesis system. The Talkapillar project is one such + example of how highly convincing results are possible with CS. Through + careful analysis of a vocal database, the project aims to impose the + qualities of the database voice on an input voice. This would result in the + words of the input speaker being transformed to appear as if they were + spoken by the voice in the database.~\parencite{Hueber} + + \subsection*{Instrument Synthesis} + Progress has also been made in improving the quality of instrumental + synthesis. As with speech synthesis, the use of samples directly allows for + natural-sounding results, which provides a method for reproducing real + instruments convincingly. Another important aspect of instrumental synthesis is that of performer + expression. The reproduction of performance qualities such as dynamics, + timbre and timing is essential when emulating a real instrument and CS has + been used to effectively reproduce these aspects. This is achieved through + splicing of grains based on their expressive characteristics to form + musical phrases. For example, just as a violinist might transition + seamlessly from one articulation to the next, the CS software will join + grains to produce the variation in articulations. This contrasts with the + traditional approach to sampling, where samples are played in isolation, + resulting in a discontinuity between adjacent samples~\parencite[p. 82]{Lindemann2007}. + The Catapillar project is one such example of this use of CS. + By using a Viterbi algorithm, the project is able to calculate the + smoothest overall transition between grains across the output, resulting + in convincing synthesis of orchestral instrument performances~\parencite[p. 5]{Schwarz2003}. + + \subsection*{Creative sound design} + The flexibility of CS allows for creativity in a broader context than simply + emulating real-world instruments and speech. It can also be used as a tool + to explore the possibilities for synthesising new abstract sounds for + creative purposes.\\ + A prominent project in this area of CS is IRCAM's CataRT + project~\parencite{Schwarz2006a}. The project focuses on the playback of + source grains based on their proximity to a target in multi-dimensional + descriptor space. Providing a target point in the descriptor space enable the + user to navigate the database, playing selections of samples that + are nearest to the target. This allows the user to explore the database + intuitively through a graphic user interface, selecting a point in + 2-dimensional space with the mouse. Grains are then played back in + real-time to create an `audio mosaic'.\\ + Alternatively, target audio can be provided and analysed to create a target + location based on it's location in the descriptor space. Tremblay and + Schwarz's~\citeyearpar{Tremblay2010} use of CataRT to explore + electroacoustic sample banks demonstrates the creative potential of this + method. CS is used in this context as a means of synthesising matches in a + corpus database to real-time input from an electric bass. Significance is + placed on linking the playback of grains to the expressively of the + performer. The use of perceptually based audio descriptors to match the + source to the target allows the performer to navigate the database + naturally based on factors such as the pitch and timbre of the bass + guitar. The result is a performance that mixes characteristics of both the + bass guitar output and the qualities of the corpus database to create a + hybrid of the two.\\ + This is by no means an exhaustive overview of the projects and techniques + that explore the vast possibilities of CS. Further information can be found + in the article by~\parencite{Schwarz2006b} + \pagebreak + + \section*{Concatenator} + The Concatenator project aims to provide an open source tool that allows + composers to generate a variety of CS driven realisations for sound design + purposes. In addition, the project aims to provide an intuitive API that + Python programmers might use as the fundamental building blocks on which to + build further CS applications. The result is a framework and command-line + interface, built in Python, for easy access to basic CS techniques. All + relevant material including source code, results, and documentation can be + found in the official online project repository~\parencite{perry2016a}. + The current implementation can be used for the concatenation of a source + database onto target audio files, using a range of perceptual audio + descriptors for matching. Database management, simple matching and + synthesis algorithms are used to achieve this, and are described in the + following sections. \\ + + The features and uses of this tool are most comparable to those of the + MATConcat project~\parencite{sturm2004}, which was developed to provide an + open source tool for generating similar representations of audio in MATLAB. + Although there are technical differences such as the number of descriptors + available for each project, both share a similar focus on the + electro-acoustic compositional applications of CS. Results produced for the + MATConcat project are comparable to those of the Concatenator project, and + both work offline to produce results. The Concatenator project builds on + this by providing a wider variety of descriptors and the ability to + artificially enhance matches (as discussed in the~\hyperref[sat]{Synthesis + and Transformations section}). + + \section*{Program design and implementation} + The Concatenator project consists of a number of components that work + together to produce the final output. A complete description of all + components and there usage in the Concatenator project can be found in it's + documentation.\\ + + Output is generated by analysing overlapping segments of audio (known as + grains) from both the target sound and the source database, then searching + for the closest matching grain in the source database to the target sound. + Finally, the output is generated by applying a hanning window and + overlap-adding the best matches. Each component is discussed in detail + in the following sections.\\ + + When designing the Concatenator framework, ease of development, use and + extensibility were primary considerations. It was for these reasons that + the framework was written in the Python programming language. Python has + grown in popularity in the scientific community recently, primarily due to + its focus on productivity, readability and the large number of efficient + numeric processing libraries available (\cite{Pedregosa2011, + Fangohr2014, Scipy}). This makes Python a good choice for + quickly developing ideas in the context of audio signal processing. + Unfortunately, the language does sacrifice processing speed for simplicity, + and as a result, is not suitable for real-time signal processing. Other + performance-focused languages such as C++ are better suited to this type of + processing. However, it was decided that the increase in productivity, lack + of prior CS research in Python and the author's previous experience made + it the most suitable choice for this project.\\ + + The choice to limit the project to offline processing has both positive and + negative implications for the function of the project. A key disadvantage + of this type of processing is the lack of possibility for any live + performance aspect. This method provides no way of exploring the feedback + between performer and system in a live environment, as in the work + of Tremblay and Schwarz~\citeyearpar{Tremblay2010}. + However, there are advantages to offline processing that would not be + possible in a real-time context.\\ + One significant advantage is that databases can afford to be far larger + than they could be in real time. Without the requirement to process output in + a short period of time, more time can be taken to search vast databases in + the hope that the closest match to a target will be found.\\ + Another advantage is in the global view of a target that can be taken in an + offline approach. Because the complete audio file is available from the + start of processing, techniques can be applied that consider the output as + a whole, rather than on a grain-by-grain basis. This allows for algorithms + such as the Viterbi algorithm to find the sequence of grains that provide + the best continuity, as demonstrated in the Catapillar + project~\parencite[p. 4]{Schwarz2003} This would not be possible in + real-time, as audio is processed 'on the fly'.\\ + + An additional consideration was the method to be used for controlling the + target to which the grains would be matched. It was decided that the most + interesting results would be produced through the matching of grains to a + target audio file, as opposed to other approaches such as matching to MIDI + scores. In this sense the project is a form of offline audio-mosaicking + tool similar to that of CataRT. + + \subsection*{Descriptor Implementation} + In order to differentiate between grains, a number of audio descriptors + were implemented. Audio descriptors are used to measure a specific + characteristic of a signal~\parencite[p. 31]{Lerch2012}. For example, a + root mean square (RMS) descriptor was implemented to give an indication of + the overall intensity of the grain. Another example is the fundamental + frequency (F0) descriptor, which was implemented to give a value relating + to pitch for harmonic grains. These values could then be used by the + matching algorithm in order to find the best match between the source and + target grains.\\ + Owing to time constraints on the project, only a limited number of basic + descriptors were implemented. For this reason, the project was designed so that new + descriptors could easily be added. The object-oriented + design of the descriptors provides the potential for quick development of + any future descriptors to be added. + + \subsection*{Database design} + When generating descriptors for large databases, large amounts of data are + produced and so an efficient method of storing and retrieving the data was + needed in order to manage this. The Python interface to the HDF5 + filesystem~\parencite{Collette2016} was chosen for it's simplicity and + ability to compress the data automatically. Storing Numpy arrays of + descriptors in groups allowed for quick and easy access to analyses from a + single, organised source. + + \subsection*{Matching algorithms} + In order to match grains using the descriptor values, a matching algorithm + was required. Initially a brute-force matcher was used to compare each + descriptor value in the target to all values of the same descriptor type in + the source. However, it quickly became apparent that this approach would be + far too slow, particularly for a larger database.\\ + For this reason, a k-dimensional tree search algorithm was used in an + effort to improve matching efficiency. This approach produced the same + results as the brute force matcher, but by arranging descriptors in a tree + structure, a far more efficient search to find the best match was possible. + This reduced matching time considerably. + + \subsection*{Synthesis and transformations} \label{sat} + The final step in the program was to synthesise the matched output. + This process consisted of: + \begin{enumerate} + \item Retrieving the best grain matches returned by the matching algorithm + \item Applying a window function + \item Overlapping the grains + \item Transforming grains to match the target + \item Saving the result to a file + \end{enumerate} + Initially, grains were not transformed to better match the target. This + worked effectively for large databases; however, it was observed that + results synthesised using small databases were of a lower quality, as the + chance of a closely matched grain was lower. To account for this, methods + for altering grains to better match their target were implemented. It was + decided that the two most significant characteristics to alter were the + pitch and intensity of the grains. By scaling the grains by the difference + between the source and target RMS, it was possible to impose a closer + intensity on a grain. Likewise, by shifting the pitch of a grain by the + difference, it was possible to better match the pitch contour of the output + to that of the target audio. This improved the results significantly in + smaller databases, as poor matches could be improved to match the target + more convincingly. + + \subsection*{Command-line interface} + In order to make the framework accessible to users, a command-line interface + was developed. By supplying arguments to the program, users could alter + parameters and experiment freely with the tool. Although this interface + was sufficient for testing and experimentation, it quickly became apparent + that there were too many parameters to pass to the program via the command + line interface on each run. A configuration file parser was created to + address this issue, allowing users to specify default parameters that would + be used by the program on each run. The combination of these interfaces + provided an effective means for accessing all of the framework's features. + + \subsection*{Documentation and API} + Complete documentation for the project was created in order to make the + project as user friendly as possible for both developers and users. As a + result, a full API is available alongside examples of use and instructions + for command-line operation. This was created in the hope that it might form + a usable package that developers can build on quickly and effectively to + build other CS projects, allowing for easier access to Python-based CS than + is currently available. The command-line interface is equally documented to + allow users to create their own realisations quickly and easily so that + this project may be used for creative sound design purposes. + + \section*{Results and evaluation} + Overall, the results generated by this project showed promise; a variety of + transformations were generated using open source instrument databases to + demonstrate the projects potential for sound design application. This + tested the project's ability to convincingly impose qualities of an + instrument onto target sounds. A variety of examples are provided that + outline the style of synthesis aimed for. These range from imposing + acoustic guitar qualities on an electric guitar to imposing stringed + instrument qualities on vocal melodies. Current results have a clear + synthetic nature, but still clearly exhibit some of the main + characteristics of the database used. + + \section*{Research Limitations/Potential Development} + In retrospect, a great deal of time was spent trying to improve the + efficiency of the project. Although this was necessary, as initial tests + were not feasible on most databases, it had a negative impact on the time + available for developing perceptual qualities of the output. As a result of + this, the overall quality of output might not perhaps be as natural as that + of other projects in this area. This is apparent in the + vocal~\textrightarrow~string instrument examples. Phrases tend to begin and + end abruptly, failing to replicate any defined attack or decay of the + string instruments, as would be expected when hearing a string instrument + naturally. Conversely, this does give output it's own synthetic + characteristic, which may be desirable as perfect reproduction of an + instrument may not be the reason for using this tool.\\ + In addition, the amount of computation required results in large amounts of time + needed to produce high quality results. An end user may not have the + patience required to reach the quality of results that might be + possible. This is in part a drawback of the Python language, and could be + better accounted for with further work on profiling the performance of the + tool.\\ + However, the fundamental concepts such as descriptor matching and + transforming matches to better fit the target, which are used in the most + sophisticated CS projects, have been implemented in this project to + satisfying creative effect. As a proof of concept, this project displays + the possibilities for CS in Python and there is evidently potential for + further development in this area.\\ + + There are a number of further improvements that could be made to this + project in order to improve the quality of results and extend it's overall + usefulness. These range from reasonably simple modifications that could not + be implemented purely due to time constraints, to more complex ideas that + may take a considerable amount of work. The following is a list of some + initial ideas for improvements.\\ + + \begin{itemize} + \item The current implementation uses only a small and relatively basic + subset of the audio descriptors available. This limits the analysis + of audio and thus the quality of matches. Using a larger set of + more advanced descriptors may improve quality from this + perspective. One way would be to incorporate the open source + Essentia audio descriptors~\parencite{Essentia2016} giving the + project access to a vast quantity of descriptors for analysis. + + \item Replacing the hanning window function used for grain windowing + with a short cross-fade at grain overlaps should reduce amplitude + modulation, resulting in smoother transitions between grains. This + might be further improved through calculating the point of maximum + similarity by cross-correlating overlapping sections, as described + by~\textcite[p.191-193]{Zolzer2011} in the Synchronus OverLap Add + (SOLA) algorithm. + + \item A lack of continuity between grains was observed in results, most + likely owing to the lack of any comparison of selected grains. A + Viterbi algorithm could be used to account for this, allowing for a + search to be done amongst the top matches to find the optimal set + of grains. This takes advantage of the offline nature of the + project and has been shown to work effectively in the Talkapillar + project~\parencite{Hueber}. + + \item Although the HDF5 filesystem allows for easy storage of + descriptor values, it also has drawbacks that limits the + functionality of the project. One significant problem is that it is + difficult to implement parallel processing using the library and + for this reason asynchronous processing was not implemented in the + project. An alternative method of storage may accommodate this more + easily, allowing for the speed-ups possible through asynchronous + processing. The overall design of the database management was also + relatively naive and may benefit from being replaced by a + technology such as an SQL database or similar. This has been shown + to work effectively in work such as the CataRT + project~\parencite[p.3]{Schwarz2006a}. + \end{itemize} + + \section*{Conclusion} + This project has provided a functioning Python based CS project with much + potential for further development. Given the number of technical issues + faced with this style of synthesis (from the big data issues faced with + analysis storage, to high efficiency requirements for processing the large + quantities of data), overall this project appears to work effectively. It + provides a new and accessible means for tapping some of the vast amount of + potential that concatenative synthesis has to offer.\\ + With the ever increasing quality of technology, it is predicted that new + techniques such as concatenative synthesis may grow further in popularity, + leading to an increasing number of possibilities in this area of sound + synthesis. It is hoped that this project might aid in the highlighting the + possibilities offered by this form of synthesis and demonstrate some of the + technical obstacles that must be addressed to design a CS project + successfully. + + \pagebreak + + \printbibliography +\end{document} +>>>>>>> Stashed changes