No speech platform, whether Apple’s Siri, Google Home, or Amazon’s Alexa, can hear or respond to a single African language. But as speech interaction gradually takes over basic functions, from typing to touch, the non-profit Mozilla, which created the free web browser Firefox, is working to bring voice-integrated technology to the continent.
Mozilla’s Common Voice platform, which receives support from the German and UK governments as well as the Bill and Melinda Gates Foundation, is an open-source initiative that is already creating voice datasets for Kiswahili, a language spoken in Rwanda, Burundi, Kenya, Uganda, Tanzania, and South Sudan.
As Remy Muhire details for the Mozilla Foundation, most voice datasets used in voice-activated software are siloed, meaning they are held by a very small number of companies, which stifles innovation.
Common Voice wasn’t started exclusively to serve Africa; it set out to create an open-source platform to enable voice-activation tech in any of the 7,100 “living” languages currently spoken. To date it has recorded more than 9,000 hours of audio from 160,000 different speakers of 60 different languages, including Welsh, which should help people looking for directions to “Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch.”
The Common Voice platform is incredibly simple: as soon as you arrive on its homepage, you can add your voice to the datasets by taking a moment to record it.
The spirit of togetherness
Kinyarwanda is spoken by about 12 million people in Rwanda. Last year, Common Voice hosted a hackathon in Kigali to create a starting dataset for the language. It’s now the fastest-growing language on Common Voice, with over 1,700 hours of submissions.
The response to the hackathon gave rise to an AI solutions startup called Digital Umuganda, which takes its name from the Kinyarwanda word for a kind of cooperation and community.
The final Saturday of every month sees people take to the streets to pitch in on community projects like building or repairing roads—this is Umuganda, and the startup wants to take it to the digital space to create digital infrastructure.
They’ve created an AI-powered chatbot named Mbaza that uses the Common Voice Kinyarwanda dataset to enable citizens to access information and guidance in the local language.
Mbaza provides text-to-speech and speech-to-text functionality, removing illiteracy as a barrier for citizens who need important information, such as how to get in contact with local governments.
Just recently, Mozilla received a $3.4 million grant to expand the Common Voice platform across Africa, and Chenai Chair, Special Advisor on Africa, explained that Kiswahili is just the beginning.
“The next steps are… building up the community engagements, and the community supports because Common Voice is about people donating their voice and we want to do it right,” Chair tells GNN. “We don’t want to do it in a way that we end up with the same issues that other technology platforms have.”
“We are initially starting off with the East African community… then we want to strategically build up those other communities of other African languages so they can make use of the Common Voice or the Common Voice toolset.”
Chair explains that traditional illiteracy is a problem in the agriculture sector and is especially pronounced among the female half of African populations. Potential Common Voice applications could make tasks such as interacting with the increasingly digital functions of government, or using financial services like online banking, much easier.
No language left behind
Language contains far more than a few unique words or concepts. It acts as the decoding tool for speakers to know their history: all their stories, fables, and culture.
UNESCO, for example, is promoting voice technology to document Indigenous knowledge, save Indigenous languages, and increase access to information.
Like the large voice datasets of Microsoft, Google, and Apple, Wikipedia contains a version of the historical, spiritual, cultural, and indeed linguistic record of the African continent, and here also, community-driven initiatives are working to bridge the divide in access to information—particularly in African languages.
The WikiAfrica Education Program, created by the Moleskine Foundation, is an effort to foster creativity and an interest in culture in African school curriculums by teaching students how to prepare, submit, and edit articles on Wikipedia—especially in their own languages.
Adama Sanneh, Founder of the Foundation, has helped organize or been a part of community-driven events that have produced tens of thousands of Wikipedia entries in different African languages. This proved particularly helpful, he told GNN, during the pandemic’s early days.
“When we started the situation was very grim, there was only one article in Luba, or something like that,” said Sanneh in March. “We launched a campaign to ask people to translate… ten articles around COVID-19 that would allow the sparking of creative solutions.”
“In a couple of months we passed from one to more than 300 articles in more than 20 different African languages. That gave access to more than 300 million people when we look at the composition of the languages,” he said.