SAVAS · Sharing AudioVisual language resources for Automatic Subtitling

FP7 · Status: CLOSED · 1 May 2012 – 30 April 2014 · EU funding €1,978,000

Owing to recently approved European and national directives and laws, demand for subtitling has grown rapidly across Europe in the past few years. Manual subtitling is no longer feasible at this volume, given its cost in both time and personnel. As a result, broadcasters and subtitling companies are seeking alternatives that are more productive than the traditional manual process. Large Vocabulary Continuous Speech Recognition (LVCSR) is proving to be a useful technology for this purpose. Respeaking, a technique in which a professional listens to the source audio and re-speaks it to a speech recognition engine that transcribes it, is consolidating as the main subtitling technique for live and pre-recorded broadcast productions. Another trend in use today is applying speech recognition to generate a transcript of a programme's soundtrack automatically, without the need for a respeaker, and using this transcript as the basis for subtitles.

Unfortunately, the high cost of collecting and annotating the speech and text corpora required to train each LVCSR system for respeaking and/or automatic transcription has hindered development for new languages and application domains. However, in order to comply with the new audiovisual legal framework, European broadcasters and subtitling companies are generating speech and text corpora suitable for LVCSR development on a daily basis. SAVAS aims to acquire, share and reuse the audiovisual resources of broadcasters and subtitling companies so that high-tech European ASR companies can use the shared data to develop domain-specific LVCSR systems and/or LVCSR systems in new languages, addressing the automated subtitling needs of the media industry. Within the project, data and LVCSR technology for automated subtitling will be collected, shared and developed for six languages: Basque, Spanish, Italian, French, German and Portuguese.

Consortium · 8 organisations

coordinator

FUNDACION CENTRO DE TECNOLOGIAS DE INTERACCION VISUAL Y COMUNICACIONES VICOMTECH

ES · €444,801

participant

RADIO E TELEVISAO DE PORTUGAL SA

PT · €90,750

participant

EUSKAL TELEBISTA TELEVISION VASCA SA

ES · €100,200

participant

VOICEINTERACTION – TECNOLOGIAS DE PROCESSAMENTO DA FALA, SA

PT · €364,260

participant

MIXER SERVICIOS AUDIOVISUALES SL

ES · €140,904

participant

SYNTHEMA S.R.L.

IT · €638,170

participant

RAI-RADIOTELEVISIONE ITALIANA SPA

IT · €98,121

participant

SWISS TXT AG

CH · €100,794

Research fields

social sciences law

Source: CORDIS, Publications Office of the European Union.