57 open-source ASR models fine-tuned across 19 African languages. Compact edge models outperform zero-shot giants by 26.9pp WER — using models 3–40× smaller.
2026 · arXiv:2606.02375
We benchmark three zero-shot foundation models against compact fine-tuned edge models across 19 African languages using the conversational WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER of 38.0% compared to 64.9% for the best zero-shot baseline — a 26.9pp reduction using models 3–40× smaller. An audit by native speakers across all 19 languages reveals systematic architectural failure patterns aligned with language family, script system, and morphological typology.
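For readers new to the metric, WER is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the benchmark's actual scoring pipeline is not described here, and the function name and example sentence are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word in a four-word reference.
wer = word_error_rate("the cat sat down", "the cat sat town")

# Headline gap from the abstract, in percentage points:
# best zero-shot baseline minus fine-tuned macro average.
gap_pp = round(64.9 - 38.0, 1)
```

The gap is an absolute difference in percentage points, not a relative reduction.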
Word Error Rate (%) on the WAXAL test set. Lower is better.
| Language | Family | MMS-300M | Whisper S | Whisper T |
|---|---|---|---|---|
| Acholi | Nilo-Saharan | **42.3** | **42.3** | 57.7 |
| Akan | Niger-Congo | 34.2 | **31.7** | 37.9 |
| Amharic | Afro-Asiatic | 37.8 | **33.6** | 41.3 |
| Dagaare | Niger-Congo | 34.9 | **34.0** | 37.3 |
| Dagbani | Niger-Congo | 35.0 | **34.0** | 39.5 |
| Ewe | Niger-Congo (Kwa) | **31.3** | 32.3 | 35.5 |
| Fula | Atlantic-Congo | 40.6 | 42.6 | **35.5** |
| Ikposo | Niger-Congo (Kwa) | **75.3** | 77.5 | 80.9 |
| Lingala | Niger-Congo (Bantu) | **42.6** | 42.7 | 49.0 |
| Luganda | Niger-Congo (Bantu) | **16.9** | 21.6 | 33.8 |
| Malagasy | Austronesian | **12.8** | 13.1 | 17.7 |
| Masaaba | Niger-Congo (Bantu) | **49.5** | 75.5 | 59.6 |
| Nyankole | Niger-Congo (Bantu) | **38.6** | 44.7 | 46.7 |
| Oromo | Afro-Asiatic | 26.9 | **25.2** | 29.3 |
| Shona | Niger-Congo (Bantu) | **25.0** | 26.9 | 31.4 |
| Sidama | Afro-Asiatic | 35.6 | **30.1** | 34.4 |
| Soga | Niger-Congo (Bantu) | **47.2** | 57.1 | 69.0 |
| Tigrinya | Afro-Asiatic | 57.1 | **53.5** | 60.3 |
| Wolaytta | Afro-Asiatic | **38.8** | 39.5 | 42.6 |
| Macro Average | | 38.0 | 39.9 | 44.2 |
Bold = best fine-tuned model per language
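The Macro Average row can be reproduced from the per-language rows as an unweighted mean over the 19 languages. The sketch below copies the WER values from the table above; the dictionary layout and function name are illustrative choices, not part of the benchmark's tooling.

```python
MODELS = ("MMS-300M", "Whisper S", "Whisper T")

# WER (%) per language, copied from the table: (MMS-300M, Whisper S, Whisper T)
WER = {
    "Acholi": (42.3, 42.3, 57.7),
    "Akan": (34.2, 31.7, 37.9),
    "Amharic": (37.8, 33.6, 41.3),
    "Dagaare": (34.9, 34.0, 37.3),
    "Dagbani": (35.0, 34.0, 39.5),
    "Ewe": (31.3, 32.3, 35.5),
    "Fula": (40.6, 42.6, 35.5),
    "Ikposo": (75.3, 77.5, 80.9),
    "Lingala": (42.6, 42.7, 49.0),
    "Luganda": (16.9, 21.6, 33.8),
    "Malagasy": (12.8, 13.1, 17.7),
    "Masaaba": (49.5, 75.5, 59.6),
    "Nyankole": (38.6, 44.7, 46.7),
    "Oromo": (26.9, 25.2, 29.3),
    "Shona": (25.0, 26.9, 31.4),
    "Sidama": (35.6, 30.1, 34.4),
    "Soga": (47.2, 57.1, 69.0),
    "Tigrinya": (57.1, 53.5, 60.3),
    "Wolaytta": (38.8, 39.5, 42.6),
}


def macro_average(column: int) -> float:
    """Unweighted mean WER over all languages for one model column."""
    values = [row[column] for row in WER.values()]
    return round(sum(values) / len(values), 1)


macro = {name: macro_average(i) for i, name in enumerate(MODELS)}
print(macro)
```

Because every language counts equally, a high-WER language such as Ikposo moves the average as much as a low-WER one such as Malagasy, regardless of how much test audio each has.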
Three model families, 19 languages each. All available on Hugging Face.
MMS-300M: Best character-level accuracy. Wins on all 6 Bantu languages. Immune to repetition loops.
Whisper S: Preferred for Afro-Asiatic languages. Strong language model prior aids complex morphology.
Whisper T: Ultra-lightweight edge deployment. Leads on Fula. Runs on mobile hardware.
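The per-family claims above can be checked against the results table by counting which model has the lowest WER in each language. The sketch below is illustrative only: it copies the Bantu, Afro-Asiatic, and Fula rows from the table, shortens "Niger-Congo (Bantu)" to "Bantu" as a grouping key, and uses a hypothetical helper name.

```python
from collections import Counter

MODELS = ("MMS-300M", "Whisper S", "Whisper T")

# language -> (family key, MMS-300M, Whisper S, Whisper T), WER in %
ROWS = {
    "Lingala": ("Bantu", 42.6, 42.7, 49.0),
    "Luganda": ("Bantu", 16.9, 21.6, 33.8),
    "Masaaba": ("Bantu", 49.5, 75.5, 59.6),
    "Nyankole": ("Bantu", 38.6, 44.7, 46.7),
    "Shona": ("Bantu", 25.0, 26.9, 31.4),
    "Soga": ("Bantu", 47.2, 57.1, 69.0),
    "Amharic": ("Afro-Asiatic", 37.8, 33.6, 41.3),
    "Oromo": ("Afro-Asiatic", 26.9, 25.2, 29.3),
    "Sidama": ("Afro-Asiatic", 35.6, 30.1, 34.4),
    "Tigrinya": ("Afro-Asiatic", 57.1, 53.5, 60.3),
    "Wolaytta": ("Afro-Asiatic", 38.8, 39.5, 42.6),
    "Fula": ("Atlantic-Congo", 40.6, 42.6, 35.5),
}


def wins_by_family(rows):
    """Count, per family, how often each model has the lowest WER."""
    wins = {}
    for family, *scores in rows.values():
        best = MODELS[scores.index(min(scores))]  # ties go to the first column
        wins.setdefault(family, Counter())[best] += 1
    return wins


wins = wins_by_family(ROWS)
for family, counts in wins.items():
    print(family, dict(counts))
```

On these rows MMS-300M has the lowest WER in all six Bantu languages, Whisper S in four of the five Afro-Asiatic languages (MMS-300M is lower on Wolaytta), and Whisper T on Fula.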
from transformers import pipeline
# Set `language` to any of the 19 ISO codes,
# e.g. lug, amh, sna, ewe, orm ...
language = "lug"
# Load the fine-tuned MMS-300M checkpoint for that language
asr = pipeline(
    "automatic-speech-recognition",
    model=f"waxal-benchmarking/mms-300m-waxal-{language}",
)
result = asr("audio.wav")  # path to a local audio file
print(result["text"])
31 researchers across 3 continents
Linguistic Acknowledgements: Ajara Oyinloye · Abubakari Sadic Mohammed · Hafiz Adjei · Aliga Norah Lele · Marie-Louise B. Ndamuso · Odong Diana
@article{waxalnet2026,
title = {The WAXAL ASR Benchmark: Fine-Tuned Edge Models Across 19 African Languages},
author = {Olufemi, Victor Tolulope and Babatunde, Oreoluwa and Njema, Ramsey and
Gbotemi, Bolarinwa and Yen, Wanchi Lucia and Uzodinma, John and
Ajayi, Sunday and Williams, Oluwademilade and Moshood, Kausar and
Anyaele, Innocent Elendu and Arefaine, Akebert Tesfahunegn and
Hunzwi, Candace and Daniel, Wongel Dawit and Namuganga, Emmilly Immaculate and
Kadima, Cleophas and Bahizire, Athanase Biluge and Ranaivoson, Onitsiky and
Aaron, Emmanuel and Ladislaus, Nicholaus Dismas and Muhammed, Idris and
Simenya, Jonathan Enoch and Koome, Martin and Endaylalu, Matewos Tegete and
Adeyemo, Peter Ifeoluwa and Birindwa, Hondi Prisca and Eze-Mbey, Ukachi Agnes and
Oduro-Yeboah, Yacoba and Aremu, Toluwani and Adjovi, Pericles and
Ngueajio, Mikel K and Mitra, Prasenjit},
year = {2026},
note = {arXiv preprint arXiv:2606.02375}
}