Video impression of the symposium:
View all video recordings of the lectures, demonstrations and concert here.
Download the booklet with abstracts here.
Synthetic speech is part of modern everyday life. Artificial voices not only serve a wide range of technological uses, but also feed back into research on the natural human voice. Moreover, artists, musicians and composers find a source of inspiration in the artificial sound of such voices. The symposium inquires into both the richness of the human voice and the limits and surplus of its theoretical modelling and mechanical and digital imitation. We are specifically interested in modelling and synthesizing so-called "extended vocal techniques": all sounds the human voice can produce that exceed conventional singing and speaking. The symposium covers the history of the artificial voice, extended vocal techniques, aspects of theoretical modelling and technical realization, and the role of the artificial voice in contemporary music. Academics, scientists and artists come together to exchange ideas and insights over three days of presentations, meetings, workshops and a concert. With a group of international experts we place the artificial voice in a broad perspective of historical, technical, socio-cultural, artistic and musical investigation.
Scheduled are sessions on:
- Mechanical voice synthesis
- Extending the voice
- Digital voice synthesis
Artist Martin Riches demonstrates his Talking Machine, MotorMouth and Singing Machine.
The Talking Machine performs in the concert in a new work by vocalist Ute Wassermann, who also gives a lecture-workshop.
Dr. Fabian Brackhane demonstrates his replica of the von Kempelen machine and gives an overview of the history of mechanical voice synthesis. He also talks about the relation of mechanical speech synthesis to organ technology.
Prof. dr. Hans Fidom discusses the relation between organ and voice and the vox humana.
Prof. dr. Julia Kursell discusses the artificial voice in the history of acoustics and elaborates on Helmholtz's vowel synthesis.
Prof. dr. Bruno Bossis talks about the models of the voice and the voice as model in electroacoustic music.
Dr. Hannah Bosma discusses how various ways of modelling the voice imply ideas on what the human voice is and how it functions, and poses the question of the possibility of alternative models.
Dr. Michael Edgerton gives a systematic overview of extended vocal techniques and analyses these as nonlinear phenomena.
Peter Pabon discusses digital voice synthesis: how to model the "grain" of the voice.
Dr. Arthur Dirksen discusses concatenative synthesis and why it sounds better than synthesis that is exclusively based on models of the voice.
Dr. Nicolas d'Alessandro demonstrates his singing synthesis instruments (on tablet and smartphone), performs in the concert and talks about digital voice synthesis techniques and about creating and controlling artificial voices. He also gives an artistic-technical workshop at STEIM.
Symposium: 11 & 12 May at the University of Amsterdam (Doelenzaal) and Orgelpark
Concert: 11 May at Orgelpark
Academic expert meeting: 12 May at the University of Amsterdam (Belle van Zuylenzaal)
Artistic-technical workshop: 13 May at STEIM
Artistic expert meeting: 13 May at STEIM
prof. dr. ir. Remko Scha (1945-2015), dr. Hannah Bosma, prof. dr. Julia Kursell.
With financial and practical support of:
University of Amsterdam: Computational Linguistics, Musicology, Amsterdam School for Cultural Analysis ASCA;
Royal Netherlands Academy of Arts and Sciences KNAW;
Orgelpark, Organ Studies Vrije Universiteit Amsterdam.
We thank the Berlinische Galerie - museum for modern art, photography and architecture for lending Martin Riches's MotorMouth.
The Art of Voice Synthesis is an initiative of Remko Scha, artist and professor emeritus in computational linguistics at the University of Amsterdam.
Remko Scha passed away on 9 November 2015.
We organize this conference in sad and thankful remembrance of his enthusiasm, generosity, keen interest and inspiring ideas.
Obituary Remko Scha (1945-2015)
Creating an artificial voice has been a preoccupation for several centuries. The first mechanical models imitated the parts of the human body that were most clearly involved in vocal production. Later models were based on theoretical principles for the speaking (and sometimes singing) voice. Electroacoustic synthetic or re-synthesized (quasi-)vocal sounds have been used in contemporary music and art for decades. Nowadays, artificial voices are abundant: in voice response systems (telephone), in navigation systems, as aids for the vocally or visually disabled, and in commercial singing synthesis software (Yamaha's Vocaloid). The current success of artificial voices comes with a change of focus from synthesis by rules, based on a model of the voice, to the use of recorded voices that are analysed, cut into tiny fragments, manipulated and put together to form new utterances. This shift has been facilitated by the enormous increase in computer data capacity and computing power.
We wish to look at the full range of techniques that are used and that have been used: from mechanical replication of the synthesis process (von Kempelen, 1791), through theory-based modelling (electro-mechanical: Helmholtz, 1863; or digital: Klatt, 1980s; synthesis by rule; physical modelling), to methods that are based on audio recordings (synthesis by analysis, analysis/re-synthesis; and concatenation, such as Vocaloid).
Our point of departure is contemporary music. Therefore, a system's capacity to produce acceptable-sounding speech is not our ultimate evaluation criterion. Likewise, the replication of classical opera technique, though definitely interesting, is not enough. Our reference frame includes the use of extended vocal techniques in the 20th-century avant-garde (as developed by Cathy Berberian, Trevor Wishart and others), as well as the singing styles of various popular and ethnic traditions. A new research question thus emerges: the artificial generation of the complete repertoire of human vocal possibilities.
What are the limitations of the existing voice synthesis models and techniques? And what do these limitations reveal about the complexity and diversity of real, embodied human voices? Is it possible to synthesize "the grain of the voice" (R. Barthes)?
Voice synthesis is, to varying degrees, based on a model of the voice informed by phonetics and voice acoustics. Time-to-frequency transformation (Fourier analysis), sound spectrum analysis, formants, and the source-resonance principle (larynx - vocal tract) are among its basic concepts. Musical instruments functioned as models, objects and inspiration for the science of acoustics; and with respect to the voice, the organ seems to be a privileged metaphor. What attitude(s) towards voice and sound do the existing voice synthesis models and the underlying concepts imply? How is this related to conceptions of sound, music, voice, body, gender and nature?
What alternative models have been conceived of the voice and its artificial synthesis? What alternative models could we think of? If temporal acuity is central in auditory processing (Oppenheim & Magnasco 2013), and the ear does not (only) perform spectral analysis, what are the consequences for the prevalent models of the voice in which vocal spectra (with formants) are of primary importance?
How are artificial voices used in music and other arts? Does this offer a different perspective on the models and methods of voice synthesis? And does the artistic use of voice synthesis offer different perspectives on the voice in general or on specific voices in particular?
In the realm of these questions we are organizing a conference where technical and historical experts meet to discuss the potentials, limitations, implications and contexts of the different voice synthesis techniques.