Interview: How open-source language tools are helping to keep the Sámi languages alive
February 9, 2021
Sámi National Day, on the 6th of February, celebrates the Sámi, the only indigenous people native to Europe. The Sámi have an estimated population of 80,000 people living in Norway, Sweden, Finland, and north-west Russia, but only around 30,000 still speak one of the nine languages in the Sámi family. How can we keep a language alive? One way is to offer digital inclusion, and that's what Divvun has been working towards since 2004. To talk about their work and the challenges of making language tools available to everyone, we interviewed Sjur Nørstebø Moshagen, Chief Engineer and Head of Divvun.
The group is responsible for developing and maintaining language technology tools, including spelling and grammar checkers, keyboards, dictionaries, and other digital and web services, with the vast majority provided free of charge as open-source software.
"The layout files are processed by an open-source code generation tool called ‘kbdgen’. This tool integrates open source spell checking by Divvun, specifically designed to handle languages with complex morphology and other requirements. It's also optimized for mobile devices to minimize memory usage and user-perceivable latency of suggestion results", explains Brendan Molloy, Lead Developer at The Techno Creatives.
What started with the Sámi languages is now a project covering more than a hundred languages in various stages of development, always working towards the fundamental freedoms of all language users, as you can read in the interview below.
The Techno Creatives - When Divvun first started, it focused on the Sámi language communities, right? How did the decision to extend it to other minority languages come about?
Sjur Nørstebø Moshagen - From the very beginning we also cooperated with the Greenlandic community. There has also been academic cooperation on other languages, mostly Uralic languages, so the extension came naturally over time.
TTC - And how many are available now? Do you plan to include others?
Sjur - There are presently 119 different languages in various stages of development, and we will always welcome new minority and indigenous languages.
TTC - How does Divvun work with under-resourced languages and make useful tools?
Sjur - We have established routines in which we work essentially as traditional linguists, describing the various parts of the language in question. The crucial point is that we use formalisms and descriptions that are machine-readable according to a defined standard, so that the descriptions can be turned into computer programs. From there, it is relatively straightforward to turn those programs into various tools.
TTC - 2019 was the UN International Year of Indigenous Languages, and we went together to the LT4All conference in Paris. The event called attention to the fact that “everyone should have the possibility to get access to Language Technologies in his/her native languages, including indigenous languages.” That’s exactly what Divvun has been working on since day one. Can you elaborate on what is still missing to accomplish this goal?
Sjur - The main issue is that the various operating system and application developers protect their platforms to the degree that they lock down API and service access.
It took Apple many years to open up iOS to third-party keyboards, and the access is still very limited compared to the native keyboard. This results in an inferior user experience for minority and indigenous language communities.
And this is only one example. The same pattern is found across Windows, Android, macOS, and ChromeOS, in various parts of each system.
TTC - Free, language-independent, open-source technology solutions: it’s an impressive project. What are the challenges in making your language tools available to everyone?
Sjur - In addition to the above, there are various challenges related both to development (you need knowledgeable linguists) and to uptake in the language community. If the community is not involved at some level, they either won’t know about the tools or will ignore them, because they have no sense of ownership.
So community involvement and commitment from the very beginning are crucial. In the best of cases the linguist doing the job is a community member, with native and fluent language knowledge.
TTC - Can you share Divvun’s plans for this year?
Sjur - We have just started a project to produce a text-to-speech system for Julev Sámi. That one will take a major part of our time. We will also work to cover more applications and systems for the existing tools we have, and of course try to provide more tools for more languages.
Divvun is a research and development group within UiT The Arctic University of Norway, in Tromsø, Norway, and is funded by the Ministry of Local Government and Modernisation. It works in cooperation with the Sámi Parliament in Norway (Sámediggi).
Population source: Sweden.se