
Features of Initial Processing of Georgian Texts for NLP Tasks
Author: Maia Archuadze
Co-authors: Manana Khachidze, Magda Tsintsadze
Keywords: Text preprocessing, NLP
Text pre-processing is an integral and essential part of almost all NLP tasks (text component analysis, tagging, morphological and syntactic analysis, information generation, machine translation, etc.) [??]. Success in solving these tasks depends on the accuracy of its results. The initial processing of text is generally considered to be split into two stages: Sorting and Segmentation. These stages, in turn, include actions that must take into account various natural features of the text, which depend on the language of the document. The morphological complexity of the Georgian language makes it difficult to apply existing methods and algorithms to different NLP tasks without modification, and sometimes new methods have to be developed. This paper discusses all the basic steps of initial text processing for NLP tasks and the features specific to Georgian texts. The proposed algorithms incorporate these features into the process and form the basis of the software package.
Lecture files:
Features of Initial Processing of Georgian Texts for NLP Tasks [en]
ქართულენოვანი ტექსტების საწყისი დამუშავების თავისებურებანი NLP ამოცანებისათვის [ka]