The Sixties: Speech Synthesis
The research on music technology at the University of Padova started in 1959, when Giovanni Battista Debiasi built a photoelectric organ in which the oscillations were produced by a rotating wheel with slits that modulated the light reaching a photodiode. The envelope was shaped by the lamp transient and a sliding window, and the harmonics were mixed by collecting the luminous fluxes on the photocathode of a single photocell.
In 1965, after a period at Stanford University, Debiasi began research on digital speech synthesis in the time domain. The idea was to develop text-to-speech synthesis for Italian by experimentally identifying an optimal set of elementary segments extracted from spoken language which, when appropriately recombined, allowed the synthesis of any message, with intelligibility as the main objective. In syllabic languages such as Italian, the intensity, duration and pitch of the segments can be normalized, which reduces their number and simplifies both their identification and the subsequent rules of text-to-speech conversion.
For each isolated vowel two units were devised: the first containing the initial transient and part of the steady state, the second containing the remaining part of the steady state and the transient to rest. For consonant-vowel groups one unit was used for the consonant-to-vowel transition, while the second unit was the same as for the isolated vowel. Almost all isolated consonants required one unit. Very few groups, such as some diphthongs, required specific pairs of units. In total the system comprised 165 units of 125 ms duration each, allowing highly intelligible, if robotic-sounding, speech synthesis.
The system was later extended to German, Greek and Serbo-Croatian, all of which are syllabic languages.
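The unit scheme described above can be illustrated with a small sketch. The unit names and the syllable decomposition below are hypothetical, for illustration only; the actual 165-unit inventory was determined experimentally.

```python
def units_for_syllable(syll):
    """Return the unit sequence for a vowel or consonant-vowel syllable.

    Isolated vowel: an onset unit (initial transient + part of the steady
    state) followed by a release unit (rest of steady state + transient
    to rest). Consonant-vowel group: a transition unit followed by the
    same release unit as for the isolated vowel.
    """
    if len(syll) == 1:
        v = syll
        return [f"{v}_onset", f"{v}_release"]
    c, v = syll[0], syll[1:]
    return [f"{c}{v}_transition", f"{v}_release"]

# A two-syllable word decomposes into four 125 ms units
units = units_for_syllable("ca") + units_for_syllable("sa")
```

Because every unit is normalized in intensity, duration and pitch, concatenation reduces to simple lookup and juxtaposition, which is what kept the conversion rules small.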
- Debiasi, G. B. (1959) Sulla riproduzione del suono dell’organo a canne con organi elettronici. L’Elettrotecnica, XLVI(11), 754–765.
- Francini, G. L., Debiasi, G. B., & Spinabelli, R. D. (1968). Study of a system of minimal speech reproducing units for Italian speech. Journal of the Acoustical Society of America, 43(6), 1282–1286.
The Seventies: Computer Sounds
Activity in computer music at the University of Padova began in 1972, using the experience acquired in the field of voice synthesis and analysis.
For music research the facilities used for speech synthesis were not sufficient. Thus, the first research objective was to develop an easy-to-use yet flexible complete system for music production using the equipment of the University Computing Centre: an IBM mainframe connected to an IBM S/7 for high-quality four-channel digital-to-analog conversion. The first musical sounds could be heard at the end of 1973.
For batch synthesis MusicV and Music360 programs were used.
For interactive synthesis the Interactive Computer Music System (ICMS), operating in a multiprogramming environment, was created by G. Tisato. It allowed real-time synthesis, the editing and mixing of selected musical material, reverberation and spatial distribution over four channels, and LPC sound analysis and synthesis using any sound source as stimulus. The system was successfully used in the production of many musical works, in acoustic and psychoacoustic research, and for educational purposes.
As for speech research, Graziano Tisato and Gian Antonio Mian developed a system for the automatic translation of any Italian text into naturally fluent speech. The system was built around a phonological processor, which mapped the phonological rules of Italian into prosodic structures, and a synthesizer, which processed and joined LPC-coded diphones (derived from the previous research on concatenative speech synthesis).
Two systems were realized for computer-aided composition: the MUSICA language, for the alphanumeric encoding of traditional scores, and the Emus software, which allowed the definition of both the general structure of a composition and the relationships among individual sounds, with particular attention to acoustic aspects.
In 1979 Debiasi founded the Centro di Sonologia Computazionale, one of the first computer music laboratories in Europe.
- Dashow, J., De Poli, G., Tisato, G., & Vidolin, A. (1978). Computer music at Padua University. In Proceedings of the 1978 International Computer Music Conference (ICMC-78), 486–493. Evanston, IL, USA.
- Tisato, G. (1976). An interactive software system for real-time sound synthesis. In Proceedings of the 1976 International Computer Music Conference (ICMC-76), 135–143. Boston, MA, USA.
The Eighties: Computer as a Musical Instrument
Research was mainly oriented toward making the computer a musical instrument, developing systems for real-time synthesis and processing aimed at live electronics performance. Giuseppe Di Giugno, in cooperation with IRCAM, La Biennale di Venezia and CSC, built the 4i sound processor. In the same period, a comprehensive software system for musical applications was developed. It was then used in many important musical productions, such as Prometeo, Tragedia dell’ascolto by Luigi Nono (1984) and Perseo ed Andromeda by Salvatore Sciarrino (1990).
As home computing spread around the globe, CSC moved its focus to digital signal processing and musical sound analysis and synthesis, aiming both at supplying composers with new timbres and at developing efficient algorithms for low-cost computer music systems.
Several synthesis techniques were investigated, in particular a generalized VOSIM oscillator, special functions for waveshaping synthesis, FM with phase or frequency series modulators, and special discrete modulation with phase distortion.
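Waveshaping can be illustrated with a generic Chebyshev-polynomial shaper, a standard textbook construction and not necessarily the special functions studied at CSC: since T_n(cos θ) = cos(nθ), a weighted sum of Chebyshev polynomials driven by a cosine gives direct control over the amplitude of each harmonic.

```python
import math

def chebyshev(n, x):
    """Chebyshev polynomial T_n(x) via the recurrence
    T_0 = 1, T_1 = x, T_n = 2x*T_{n-1} - T_{n-2}."""
    t0, t1 = 1.0, x
    for _ in range(n - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1 if n >= 1 else t0

def waveshape(freq, weights, sr=44100, dur=0.01):
    """Drive a weighted sum of Chebyshev shapers with a cosine:
    T_n(cos(wt)) = cos(n*wt), so weights[n] sets harmonic n's amplitude."""
    out = []
    for i in range(int(sr * dur)):
        x = math.cos(2 * math.pi * freq * i / sr)
        out.append(sum(w * chebyshev(n, x) for n, w in enumerate(weights)))
    return out

# Fundamental at full amplitude plus a half-amplitude second harmonic
samples = waveshape(440.0, [0.0, 1.0, 0.5])
```

Making the weights time-varying turns this static shaper into a dynamically evolving spectrum, which is the musical appeal of the technique.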
Giovanni De Poli collaborated with Aldo Piccialli of the University of Naples Federico II on time-domain algorithms for sound synthesis. Different strategies were proposed to bring the methodologies and techniques of digital signal processing into a granular synthesis context. With this goal in mind they developed a scenario in which the grains are synchronized with the pitch period of the signal. Pitch Synchronous Granular Synthesis is based on the source-filter model of sound production: the spectral envelope of the sound is determined by the Fourier transform of the grain, whereas the pitch articulations are determined by the temporal locations of the grains, so each grain can be interpreted as the finite impulse response of a filter. Waveform design and transformation techniques were proposed to produce sounds with time-varying formant regions.
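The source-filter reading of Pitch Synchronous Granular Synthesis can be sketched as follows. This is a minimal illustration: the grain here is a single damped cosine standing in for a formant's impulse response, not the actual grain designs studied at CSC.

```python
import math

def grain(formant_hz, bw_hz, sr, n):
    """One grain: the impulse response of a single resonance (a damped
    cosine). Its Fourier transform sets the spectral envelope."""
    decay = math.exp(-math.pi * bw_hz / sr)
    return [decay**i * math.cos(2 * math.pi * formant_hz * i / sr)
            for i in range(n)]

def psgs(f0_hz, formant_hz, bw_hz, sr=44100, dur=0.05):
    """Pitch-synchronous granular synthesis: overlap-add one grain per
    pitch period. Grain spacing fixes the pitch; grain shape fixes timbre."""
    out = [0.0] * int(sr * dur)
    g = grain(formant_hz, bw_hz, sr, int(3 * sr / f0_hz))  # grains overlap
    period = sr / f0_hz
    k = 0
    while k * period < len(out):
        start = int(round(k * period))
        for i, s in enumerate(g):
            if start + i < len(out):
                out[start + i] += s
        k += 1
    return out

# 200 Hz pitch with an 800 Hz formant of 100 Hz bandwidth
signal = psgs(200.0, 800.0, 100.0)
```

Changing `formant_hz` over time while keeping the grain spacing fixed is one way to obtain the time-varying formant regions mentioned above.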
In physical modeling synthesis, De Poli and Piccialli studied efficient algorithms for the simulation of specific musical instruments and the main mechanisms of sound excitation.
The sound analysis system helped the researchers understand in depth the acoustics of multiphonics on woodwind instruments, a technique explored in 20th-century music in which several tones are produced at once by using new fingerings or particular embouchures.
In 1989 the study of the voice begun in the Seventies culminated in research by Tisato on overtone singing, a technique of vocal virtuosity in which the singer, while holding the fundamental frequency constant, emphasizes selected upper harmonics, thus creating a two-voice melody. The study proved very interesting from both a perceptual and a musical point of view.
- Debiasi, G. B., De Poli, G., Tisato, G., & Vidolin, A. (1984) Centro di Sonologia Computazionale C.S.C. University of Padua. In Proceedings of the 1984 International Computer Music Conference (ICMC84), 287–298. Paris, France.
- De Poli, G. (1984). Sound synthesis by fractional waveshaping. Journal of the Audio Engineering Society, 32(11), 849–861.
- De Poli, G. & Piccialli, A. (1991) Pitch-synchronous granular synthesis. Representations of Musical Signals, De Poli, G., Piccialli, A., & Roads, C. (eds), 187–219. MIT Press, Cambridge, MA, USA.
Research on timbre aimed at investigating the physical relationships among sounds and at explaining the main factors that differentiate timbres.
We used different acoustic analysis methods, derived both from digital signal processing (such as mel-frequency cepstral coefficients) and from computational auditory models (such as a classical auditory model and an advanced physical model of the cochlea), to obtain relevant parametric representations. Then, with SOM neural networks and multivariate analysis, we obtained low-dimensional physical timbre spaces which preserve the perceptual topology of musical timbres: different sounds are distinguishable and, at the same time, similar sounds are close together.
These physical timbre spaces support the importance of the features of the steady-state portion in evaluating timbre quality, and shift the role of the attack toward source recognition.
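The mapping step can be sketched with a bare-bones self-organizing map. The 3-D feature vectors below are toy stand-ins for real MFCC or cochlear-model features, and the implementation is a minimal textbook SOM, not the software actually used in this research.

```python
import math, random

def train_som(data, grid=6, iters=1500, seed=0):
    """Minimal 2-D self-organizing map: each grid node holds a codebook
    vector; the best-matching unit and its neighbours move toward each
    input, so nearby nodes end up representing similar inputs."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[[rng.random() for _ in range(dim)] for _ in range(grid)]
         for _ in range(grid)]
    for t in range(iters):
        x = data[rng.randrange(len(data))]
        bi, bj = min(((i, j) for i in range(grid) for j in range(grid)),
                     key=lambda ij: sum((a - b) ** 2
                                        for a, b in zip(x, w[ij[0]][ij[1]])))
        lr = 0.5 * (1 - t / iters)                 # decaying learning rate
        sigma = 1 + (grid / 2) * (1 - t / iters)   # shrinking neighbourhood
        for i in range(grid):
            for j in range(grid):
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2)
                             / (2 * sigma ** 2))
                w[i][j] = [a + lr * h * (b - a) for a, b in zip(w[i][j], x)]
    return w

def bmu(w, x):
    """Map a feature vector to its grid coordinates (the timbre-space
    position of the corresponding sound)."""
    g = len(w)
    return min(((i, j) for i in range(g) for j in range(g)),
               key=lambda ij: sum((a - b) ** 2
                                  for a, b in zip(x, w[ij[0]][ij[1]])))

# Toy "timbre features": two clusters of 3-D vectors
rng = random.Random(1)
cluster_a = [[rng.gauss(0.2, 0.02) for _ in range(3)] for _ in range(20)]
cluster_b = [[rng.gauss(0.8, 0.02) for _ in range(3)] for _ in range(20)]
som = train_som(cluster_a + cluster_b)
```

After training, sounds with similar features land on nearby grid nodes, which is the topology-preservation property the timbre-space work relies on.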
Research on physical model synthesis continued by defining the concept of a generalized exciter and resonator as a unifying element. This structure allowed the realization of most of the classical mechanical and fluid dynamic exciters of musical instruments, as well as pseudophysical exciters.
The problem of delay-free computational loops, which often arise in these structures, was addressed. Two original methods were proposed to solve it: the K-method, which uses geometric transformations of the non-linearities and algebraic transformations of the equations in the time domain, and a generalization of the wave digital filter formalism to non-linear elements. The latter proved well matched with the waveguide models widespread in musical instrument simulation.
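The delay-free loop problem can be made concrete with a toy example, in the spirit of the K-method but not its actual formulation: the implicit, delay-free relation y[n] = tanh(x[n] + g·y[n]) (a hypothetical non-linearity chosen for illustration) is turned into an explicit per-sample map y[n] = h(x[n]) computed once offline, so no iteration remains inside the audio loop.

```python
import math, bisect

G = 0.5  # instantaneous feedback gain creating the delay-free loop

def solve_implicit(u, tol=1e-12):
    """Newton iteration for y = tanh(u + G*y), the implicit loop equation.
    Unique solution since |G * sech^2| <= G < 1."""
    y = 0.0
    for _ in range(50):
        f = y - math.tanh(u + G * y)
        df = 1.0 - G * (1.0 - math.tanh(u + G * y) ** 2)
        step = f / df
        y -= step
        if abs(step) < tol:
            break
    return y

# Offline: tabulate the transformed (now explicit) non-linearity
US = [i * 0.01 - 4.0 for i in range(801)]
H = [solve_implicit(u) for u in US]

def h(u):
    """Runtime: explicit map via linear interpolation of the table."""
    u = max(US[0], min(US[-1], u))
    k = min(bisect.bisect_left(US, u), len(US) - 1)
    if k == 0:
        return H[0]
    t = (u - US[k - 1]) / (US[k] - US[k - 1])
    return H[k - 1] + t * (H[k] - H[k - 1])

y = h(0.3)  # per-sample cost is now a lookup, not an iterative solve
```

The actual K-method transforms the system equations so the resulting mapping is exact; the lookup table above only conveys why removing the implicit dependency matters for real-time synthesis.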
As for the restoration of audio documents, we faced the problem of improving existing algorithms, in terms of both efficiency and quality of results, and of extending their applicability to previously neglected sounds and music, such as electronic music.
In affective computing, research on expressive music performance was initially oriented toward analyzing many performances played with different expressive intentions and looking for relationships between measurable parameters and those intentions, in order to understand the strategies employed by the performers. These analyses allowed the development of computational models for expressive content rendering and processing in multimodal interactive systems.
- Borin, G., De Poli, G., & Sarti, A. (1992). Algorithms and structures for synthesis using physical models. Computer Music Journal, 16(4), 30–42.
- Canazza, S., De Poli, G., Drioli, C., Rodà, A., & Vidolin, A. (2000). Audio morphing different expressive intentions for multimedia systems. IEEE MultiMedia, 7(3), 79–83.
- De Poli, G. & Prandoni, P. (1997). Sonological models for timbre characterization. Journal of New Music Research, 26(2), 170–197.
- Sarti, A. & De Poli, G. (1999). Toward nonlinear wave digital filters. IEEE Transactions on Signal Processing, 47(6), 1654–1668.
Today CSC carries out research in fields as diverse as Musical Cultural Heritage preservation and valorization, Multimodal Interaction for Learning and Well-Being, Computational Creativity, and Acoustic Analysis for Safety and Security in the Workplace. Our Visions of Sound is to use music from different perspectives, facilitating the inclusion of people with disabilities and the dialogue among different cultures and populations.
For further details on our current research, see the page on our Research Areas.