Celtic Knot Conference 2017/Programme/CK110
Title: Welsh/Celtic speech technology in Wikipedia
Date: 6 July 2017
Time: 2:40pm to 3pm
Duration: 15-minute presentation plus 5 minutes of Q&A.
Venue: University of Edinburgh Business School - Lecture Theatre 1B.
- Delyth Prys, Head of the Language Technologies Unit, Canolfan Bedwyr.
Overview of topic:
Text-to-speech and speech recognition are becoming increasingly important in our digital world. Major languages such as English are well catered for, but smaller languages such as Welsh and the other Celtic languages are often left behind. Wikipedia is both a huge resource for creating automatic speech capabilities for the Celtic languages and a platform for deploying the technology. A new project to make text-to-speech possible for Wikipedia has been announced for English and Swedish (see https://www.mediawiki.org/wiki/Wikispeech), and it may be extended in time to other languages. However, as far as we know, there are no plans yet to develop speech recognition in the Wikipedia environment, and speech recognition for the Celtic languages in general remains underdeveloped. We have published our work so far in this field on our Welsh National Language Technologies Portal (see http://techiaith.cymru/speech/?lang=en), with the aim of disseminating our resources under free and generous licences. We now wish to engage with our Celtic colleagues to explore how we can create speech recognition for our languages with Wikipedia, starting with training on named entities and with question-and-answer modules, e.g. who was, where is, where/when was someone born, etc.
Notes: Etherpad link.