Morphology of Agglutinative Languages: A Fusional Learning of Under-Resourced Language

Authors

  • Aftab Saqib Durrani, FAST University, Lahore

Keywords:

Morphology, Agglutinative, Fusional, Language Learning

Abstract

This study presents a new and efficient method for automatically segmenting words into a stem and a sequence of suffixes. Shina and Burushaski are examples of fusional and agglutinating languages, respectively. Instead of corpus counts, the method uses a modest number of word pairs as training data, which is especially useful for under-resourced languages. For fusional languages, we first learn a tree of aligned suffix rules (TASR) from word pairs. The tree is constructed top-down, from general to specific rules, using suffix rule frequency and rule subsumption; it is then executed bottom-up, so that the most specific rule that fires is selected. TASR divides a word form into a stem and a suffix sequence. Learning through generation is crucial for accurate stem extraction in fusional languages. The suffix sequence is then segmented by an unsupervised algorithm, graph-based unsupervised suffix segmentation (GBUSS), which merges nodes of a suffix graph under the guidance of an information-theoretic measure to produce suffix sequences. Experiments on Shina validate the approach and show it to be effective. For word segmentation in agglutinating languages, only GBUSS is required; experiments on Burushaski yield promising results.
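The abstract describes TASR only at a high level. The sketch below is a hypothetical, simplified illustration of two of its ideas: rules registered from general to specific, and bottom-up execution that picks the most specific rule that fires, with rule frequency breaking ties. It assumes each training pair is a (word form, stem) pair and that a rule rewrites the word's ending into the stem's ending after aligning the two on their longest common prefix. The names `align`, `SuffixRuleTree`, `max_context`, and the English toy pairs are illustrative assumptions, not the author's implementation or data; rule subsumption is only approximated by the ending hierarchy, and GBUSS is not shown.

```python
from collections import Counter, defaultdict


def align(word: str, stem: str) -> tuple[str, str, str]:
    """Split a (word, stem) pair at their longest common prefix.

    Returns (shared_prefix, word_suffix, stem_suffix), e.g.
    ("studied", "study") -> ("stud", "ied", "y").
    """
    i = 0
    while i < min(len(word), len(stem)) and word[i] == stem[i]:
        i += 1
    return word[:i], word[i:], stem[i:]


class SuffixRuleTree:
    """Hypothetical, simplified stand-in for TASR (not the paper's code).

    Each key is a word ending: the bare surface suffix (most general rule)
    or the suffix plus up to `max_context` preceding stem characters (more
    specific rules). An ending one character longer is a child of the
    shorter one, so the keys implicitly form a suffix tree.
    """

    def __init__(self, max_context: int = 2) -> None:
        self.max_context = max_context
        # ending -> counts of (word_suffix, stem_suffix) rules seen there
        self.rules: dict[str, Counter] = defaultdict(Counter)

    def train(self, pairs: list[tuple[str, str]]) -> None:
        for word, stem in pairs:
            prefix, w_suf, s_suf = align(word, stem)
            # Register the rule from general (k = 0) to specific (k = max_context).
            for k in range(min(self.max_context, len(prefix)) + 1):
                ending = prefix[len(prefix) - k:] + w_suf
                self.rules[ending][(w_suf, s_suf)] += 1

    def split(self, word: str) -> tuple[str, str]:
        """Return (stem, suffix) using the most specific rule that fires."""
        # Try the longest ending first, i.e. start from the most specific node.
        for start in range(len(word) + 1):
            ending = word[start:]
            if ending in self.rules:  # membership test does not create keys
                # Among rules at this node, prefer the most frequent one.
                (w_suf, s_suf), _ = self.rules[ending].most_common(1)[0]
                return word[: len(word) - len(w_suf)] + s_suf, w_suf
        return word, ""  # no rule fires: treat the whole word as the stem


tree = SuffixRuleTree(max_context=2)
tree.train([("walked", "walk"), ("played", "play"), ("jumped", "jump"),
            ("studied", "study"), ("carried", "carry")])
print(tree.split("copied"))  # ('copy', 'ied')
print(tree.split("talked"))  # ('talk', 'ed')
```

Because longer endings are tried first, the specific rule "ied" to "y" fires for "copied" before the more general "ed" rule is ever reached, while "talked" matches a context node of the "ed" rule. In this sketch, frequency only decides between competing rules stored under the same ending.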

Published

2023-05-30

How to Cite

Durrani, A. S. (2023). Morphology of Agglutinative Languages: A Fusional Learning of Under-Resourced Language. Competitive Linguistic Research Journal, 4(1), 42–53. Retrieved from https://clrjournal.com/index.php/clrjournal/article/view/35