A new network worth its salt
Good news of great joy! The Unit has won a two-year grant to establish a research network between industry and academia in Wales in the field of Language and Speech Technologies.
This will enable us in Wales to take advantage of our bilingualism to create resources and applications that improve the way we interface with computers in Welsh, as in the case of our synthetic voice, which is now available on several websites. The project is also intended to give Welsh industry a boost and enable it to exploit that expertise commercially, both in Wales and in the wider multilingual world beyond.
The research network will organise a number of seminars for business and academia, inviting international speakers and home-grown experts from Welsh universities to share their wisdom.
We will provide support to companies with an interest in the field, and will set about discussing joint projects. The unit will also help businesses apply for money from suitable funding sources so that they can develop their products and services, something the unit has done successfully several times in the past.
Keep an eye on our blog for details of the launch event next month!
Project website: http://www.saltcymru.org
Press release: Cymru’n Labordy Byw (Wales as a Living Laboratory)
Language Revitalisation through Multimedia Technologies
We’ve finally got our programme together for our Symposium at next week’s Minority Languages Conference in Pécs, Hungary. This is a great opportunity to work again with some of our international collaborators and to meet up with old friends, hopefully making some new ones along the way.
Full details about the conference can be found at http://icml11.law.pte.hu/ but the details about our own sessions are given below.
Language Revitalisation through Multimedia Technology
Colloquium at the International Conference on Minority Languages
Friday July 5th, 2007, Pécs, Hungary
Chair: Dr Briony Williams, University of Wales, Bangor, UK
14:30 Briony Williams : Welcome and Introduction
14:35 Language Revitalisation through Multimedia Technology
Delyth Prys (University of Wales, Bangor)
If multimedia technologies are to be truly harnessed for the work of revitalising endangered languages, researchers active in the field need to share common strategies and resources. The issue of quality control and evaluation needs also to be addressed, as well as measurements of the success of such initiatives. Current international funding opportunities in the EU provide us with a means of formalising our network of collaboration as we give an overview of current best practice and examine practical proposals for forming long-term partnerships in the field.
15:05 The BLARKette concept: Developing a slimmed-down version of the BLARK matrix for minority languages
Dr Steven Krauwer (University of Utrecht, The Netherlands)
Recently many governments have launched national language and speech technology programmes to strengthen the position of their national languages. The development of these technologies crucially depends on the availability of language resources (e.g. corpora, dictionaries, and annotation tools). Hence the concept of a Basic Language Resources Toolkit (BLARK) has been developed to define the minimum resources needed to carry out language and speech technology research. For languages for which technological research has a short history, we propose an even smaller collection of language resources: the BLARKette. This should be small, cheap, fast to develop, and suitable for bootstrapping initial technology research. It should contain both static material (e.g. corpora and dictionaries) and tools and guidelines for acquiring, creating and processing new resources.
15:35 Bridging the gap: Cutting Edge Technologies working for lesser-resourced languages
Christian Monson (Carnegie Mellon University, USA) presenting for: Lori Levin, Jaime Carbonell, Alon Lavie, Robert Frederking, Ariadna Font Llitjos and Alison Alvarez (Carnegie Mellon University, USA)
Lesser-resourced languages lack the large corpora necessary for automatic training of corpus-based machine translation systems. However, our project has two ways of obtaining low-cost, corpus-based MT without large corpora. First, we do machine learning from small, highly structured corpora. Second, a machine learning system is guided by interaction with a human user. We are building experimental systems for three Western Hemisphere languages.
16:05 Coffee break
16:30 Technology is an effective tool to promote use of Basque: Strategies to develop HLT for minority languages
Dr Kepa Sarasola (University of the Basque Country, Donostia)
The IXA group at the University of the Basque Country have deployed a long-term strategy for Human Language Technologies for the Basque language, from basic research in the early years to applications development more recently. Beginning many years ago with the core technologies of computational lexicons and text corpora, they then created tools for developers (such as lemmatisers, spell-checkers, and corpus tools). Recently, they have created end-user applications such as a grammar checker, web crawler, and language-learning software. These applications have promoted the use of Basque, and have helped in the ongoing standardisation of Basque.
17:00 Speech Processing Resources for Minority Languages: The Irish Experience
Dr Ailbhe Ní Chasaide (Trinity College, Dublin, Ireland)
The development of speech technology could play an important role in the maintenance and preservation of minority languages. The WISPR project developed spoken corpora along with prerequisites for the synthesis of Irish. There was a need to gear the methodologies used to the particular constraints of Irish, and to maximise the reusability of resources. A major consideration is to develop resources so that they are independent of any single technical methodology.
17:30 Turning grammar exercises into interactive on-line games: a useful tool to teach Welsh mutations
Gruffudd Prys (University of Wales, Bangor, UK)
We describe the process of turning grammar exercises into interactive on-line games designed to improve the literacy of Welsh speakers who struggle with the language in its standard written form. A recent project demonstrates ways in which complicated grammatical rules can be made entertaining, allowing the engagement of a target audience that is difficult to reach. Technical issues will be discussed, including the question of using language tools (such as corpora, lexicons and spellcheckers) to supply content for language games.
18:00 End of colloquium
Software in Wales: Structuring Success
The day before yesterday we gave a presentation entitled ‘Developing Software for Multilingual and Multicultural Environments’ (pdf) at the ITWales conference on Software in Wales.
We gave a broad outline of our relationship with multilingual software, as language technologists who work mainly with Welsh. We illustrated this through our terminology and speech technology work. We also described the different kinds of multilingual software that have existed to date.
Traditionally, multilingual software has meant a package that offers its interface in different languages. But this only scratches the surface: multilingualism must also be supported in the program logic and in the content/data layers if systems are to support the user’s culture and language fully. We spoke about the need for multilingual software, especially in Wales, to support more than one language at a time, rather than languages in isolation, and to allow crossing between them at every level.
We also discussed our speech technology work, talking about what spurred us to develop text-to-speech systems and adapt them to specific environments and vocabularies. We also mentioned the integration work done with our text-to-speech voices, within Microsoft’s SAPI framework as well as the Readspeaker system.
Many thanks to Christine, Beti and Sali of ITWales Swansea for organising the conference.
Y blog cyntaf sy’n siarad Cymraeg
Ddoe, fe wnaeth Bwrdd yr Iaith Gymraeg lansio gwasanaeth Readspeaker XT ar eu gwefan nhw. Mae Readspeaker yn cynnig ffordd hwylus i roi gwasanaeth testun-i-leferydd ar unrhyw wefan, heb orfod gosod meddalwedd arbennig ar y wefan honno na llwytho dim i lawr at gyfrifiadur y defnyddiwr. Ar gyfer y Gymraeg, mae Readspeaker wedi partneru gyda ni, a defnyddio fersiwn arbennig o un o’n lleisiau Cymraeg sy’n rhedeg o fewn fframwaith Festival.
Yn hyn o beth, mae’r Bwrdd yn dilyn trywydd Cyngor Bwrdeistref Sir Wrecsam a Chyngor Sir y Fflint, sydd hefyd wedi lansio gwasanaeth Readspeaker Cymraeg ar eu gwefannau’n ddiweddar. Mae hi wedi bod yn ddiddorol gweld ymateb eraill i’r llais Cymraeg - mae Dafydd wedi cymharu ei ansawdd â’r llais Saesneg a gwneud y pwynt, yn ddigon teg, bod hwnnw’n adlewyrchu blynyddoedd os nad degawdau yn fwy o ymchwil, a llawer mwy o fuddsoddiad hefyd.
Gan taw ni ddatblygodd y llais, dyma gynnig i chi gyfle i glywed Readspeaker yn llefaru’r blog hwn hefyd. Dyma gyswllt i chi gychwyn, neu fe allwch chi ddefnyddio’r cyswllt parhaol ‘Dechreuwch Wrando’ ar y bar ochr hefyd. Defnyddiwch, mwynhewch, ac fe fydden ni’n gwerthfawrogi unrhyw sylwadau.
The first blog to speak Welsh
Yesterday, the Welsh Language Board launched the Readspeaker XT service on their website. Readspeaker provides a convenient way of adding a text-to-speech service to any website, without having to set up special software on that website, or downloading anything to the user’s computer. In the case of Welsh, Readspeaker has partnered with us, and has used a special version of one of our Welsh voices which runs within the Festival framework.
In this, the Welsh Language Board is following in the footsteps of Wrexham County Borough Council and Flintshire County Council, which have also recently launched a Welsh Readspeaker service on their websites. It has been interesting to see the responses of others to the Welsh voice - Dafydd has compared its quality to the English Readspeaker voice and has made the fair point that the English voice quality reflects years (if not decades) more research, and much more funding.
Since it was us who developed the voice, here’s an opportunity for you to hear Readspeaker reading this blog as well. Here’s a link to start you off, or otherwise you can use the permanent link ‘Dechreuwch Wrando’ (’Start Listening’) in the sidebar instead. Try it out, enjoy, and we’d value any comments.
EdGair - clywed lleisiau
Rwy’n cofio’n hynod y tro cyntaf wnes i glywed cyfrifiadur yn siarad. Tua wyth neu naw oed oeddwn i, mewn arddangosfa gyfrifiadurol ar gyfer busnesau yng Nghaerfyrddin. Dwi dal ddim yn siŵr iawn pam oeddwn i yno yn y lle cyntaf, ond un o’r atyniadau oedd cyfrifiadur ac ynddo feddalwedd fedrai lefaru, mae’n debyg, unrhyw destun Saesneg. Fyddwn i’n hoffi ei glywed yn gweithio?
Wel, roedd hyn yn gynnig rhy dda i’w wrthod. Felly dyma’r sawl oedd yn goruchwylio’r peiriant yn teipio brawddeg i mewn:
Rhys was here today.
a gwasgu’r botwm.
Dyna siom. Fe faglodd y system yn gyfan gwbl ar fy enw cyntaf. Mae’n wir iddo lwyddo unwaith i ‘Rhys’ newid i ‘Rees’, ond doedd hynny ddim cweit ‘run fath rywsut. Pa werth oedd yn y peth, meddyliais, os nad oedd hyd yn oed yn medru dweud fy enw i’n iawn?
Wel, mae’r rhod wedi troi, a’n tro ni ddoe oedd hi i gyflwyno cyfrifiaduron llafar i’r byd - rhai sy’n siarad Cymraeg. Hwn yw’r defnydd ehangaf eto o’r lleisiau Cymraeg a ddatblygwyd yn wreiddiol fel rhan o brosiect WISPR.
EdGair yw enw’r cynnyrch; prosesydd geiriau syml, wedi ei addasu o’r Saesneg gwreiddiol ar y cyd â Phrosiect Dyslecsia Cymru. Fel mae adroddiad y BBC yn egluro, gellir ei lwytho ar unrhyw gyfrifiadur o fewn ysgol, a’r gobaith yw y bydd yn werthfawr i bob disgybl, ac yn benodol i’r rhai ag anghenion ieithyddol arbennig.
Wrth ddatblygu’r llais ar gyfer y rhaglen, rydyn ni wedi ceisio sicrhau y gall lefaru amrywiaeth eang o destun Cymraeg. Mae geiriadur ffonetig cynhwysfawr yng nghrombil y rhaglen, a set o reolau hefyd sy’n penderfynu sut i droi geiriau yn synau. Ond mae’n rhaid i’r llais hefyd benderfynu pa eiriau i’w llefaru yn y lle cyntaf. A gall newid bach iawn i destun newid y geiriau hynny’n llwyr. Mae’n rhaid llefaru 1345, fel rhif yn y miloedd, yn gwbl wahanol i 13.45, ac o ran hynny 13:45, £13.45, $13.45… wel, fe gewch chi syniad o gymhlethdod y dasg. Ar ben hynny hefyd mae codau post, cyfeiriadau e-bost a gwe, acronymau a byrfoddau.
Fyddwn ni erioed mor hyderus a dweud y gall y llais lefaru unrhyw beth ar unrhyw dudalen. Mae defnydd yr iaith Gymraeg, a’r amrywiaeth testun mewn dogfennau, yn llawer rhy eang i neb fedru honni hynny. Ond, gobeithio, rydyn ni wedi llwyddo i greu deunydd sy’n hyblyg ac addas ar gyfer y mwyafrif o ddefnyddwyr.
Ac ydy, mae’n dweud fy enw i’n gywir. Fe wnes i’n siŵr o hynny.
EdGair - hearing voices
I remember clearly the first time I heard a computer talk. I was about eight or nine years old, at a computer exhibition for businesses in Carmarthen. I’m still not quite sure why I was there in the first place, but one of the attractions was a computer containing software that could supposedly speak any English text. Would I like to hear it work?
Well, that was an offer too good to refuse. So the person in charge of the machine typed in a sentence:
Rhys was here today.
and pressed the button.
What a disappointment. The system tripped up completely on my first name. It’s true that it succeeded once ‘Rhys’ had been changed to ‘Rees’, but that wasn’t quite the same thing somehow. What value was there in the thing, I thought, if it couldn’t even say my name correctly?
Well, the wheel has turned, and yesterday it was our turn to present talking computers to the world: computers that speak Welsh. This is the most extensive application so far of the Welsh voices that were originally developed as part of the WISPR project.
EdGair is the name of the product: a simple word-processor, adapted from the original English in partnership with the Wales Dyslexia Project (Prosiect Dyslecsia Cymru). As the BBC report explains, it can be installed on any computer in a school, and the hope is that it will be of value to every pupil, and especially to those who have language-related specific needs.
In developing the voice for this application, we have tried to ensure that it can pronounce a wide variety of Welsh text. A comprehensive phonetic dictionary lies in the innards of the software, along with a set of rules which decide how to convert words to sounds. But the voice also has to decide which words to pronounce in the first place, and a tiny change to the text can change those words completely. It has to pronounce 1345 (a number in the thousands) completely differently from 13.45, and for that matter 13:45, £13.45, $13.45… well, you can get an idea of the complexity of the task. On top of that there are postcodes, e-mail and web addresses, acronyms and abbreviations.
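To give a flavour of the problem, here is a minimal sketch in Python of the very first step: deciding what kind of thing a token is before choosing the words to speak. This is only an illustration and not the code inside EdGair or our voices; the function name `classify_token`, the category labels and the patterns are all hypothetical, and a real system would need many more categories (postcodes, e-mail addresses, acronyms and so on).

```python
import re

# Hypothetical token categories, checked in order. Each pattern must match
# the whole token, so "13:45" can only be a time and "£13.45" only currency.
TOKEN_PATTERNS = [
    ("currency", re.compile(r"[£$]\d+(\.\d{2})?")),      # £13.45, $13.45
    ("time", re.compile(r"([01]?\d|2[0-3]):[0-5]\d")),   # 13:45
    ("decimal", re.compile(r"\d+\.\d+")),                # 13.45
    ("integer", re.compile(r"\d+")),                     # 1345
]


def classify_token(token: str) -> str:
    """Return a category label for a token, or "word" if nothing matches."""
    for label, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "word"


if __name__ == "__main__":
    for token in ["1345", "13.45", "13:45", "£13.45", "$13.45", "Rhys"]:
        print(token, "->", classify_token(token))
```

Once a token has been labelled, a separate step would expand it into words in the appropriate way for that category, which is where most of the language-specific work lies.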
We would never be so rash as to claim that the voice can pronounce anything on any page. The usage of the Welsh language, and the sheer variety of text in documents, are far too wide for anyone to promise that. But hopefully we’ve succeeded in creating a resource which is flexible and suitable for most users.
And yes, it does pronounce my name correctly. I made sure of that.
