An adaptable sentence segmentation based on Indonesian rules

Sukemi, Sukemi and Ermatita, Ermatita and Petrus, Johanes (2023) An adaptable sentence segmentation based on Indonesian rules. IAES International Journal of Artificial Intelligence (IJ-AI), 12 (3). pp. 1491-1499. ISSN 2089-4872

Text (SCOPUS-indexed international journal)
Paper IAES.pdf - Published Version (523kB)

Abstract

Sentence segmentation, which breaks a textual data string into individual sentences, is an important phase in natural language processing (NLP). Each word followed by a punctuation mark such as a period, question mark, or exclamation point is a candidate location for splitting the string. Humans can easily recognize this punctuation and split the string into sentences, but machines cannot do so as readily. These three punctuation marks also perform other functions (a period, for example, can end an abbreviation rather than a sentence), so a sentence segmentation process must be able to detect whether a word marked with one of them is actually a sentence boundary. This research proposes a rule-based sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI), or Indonesian language sentence segmentation, which can be used on Indonesian texts and adapted for English. The system consists of 34 rules built from combinations of a fairly complete set of 27 features. Experimental results show that SKBI achieves an F1-score of 96.89% on Indonesian text and 97.07% on English text. Both results still leave room for improvement, but they are better than those of previous research.
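The abstract does not list SKBI's 34 rules or 27 features, so the following is only a minimal sketch of the general idea of rule-based sentence boundary detection, not the authors' method. It assumes whitespace tokenization and a few hypothetical rules: question and exclamation marks always end a sentence, a period after a known abbreviation or a single-letter initial does not, and a period followed by a lowercase word does not. The names `segment`, `is_boundary`, `boundary_f1`, and the `ABBREVIATIONS` set are illustrative inventions. The sketch also shows how an F1-score like the ones reported can be computed over predicted versus gold boundary positions.

```python
import re

# Hypothetical abbreviation list for illustration only; NOT the SKBI rule set.
ABBREVIATIONS = {"dr.", "prof.", "dll.", "dsb.", "no.", "jl.", "mr.", "e.g.", "i.e."}
TERMINATORS = (".", "?", "!")


def is_boundary(token, next_token):
    """Return True if `token` ends a sentence, using a few hypothetical rules."""
    if not token.endswith(TERMINATORS):
        return False  # only words carrying . ? ! are candidate boundaries
    if token.endswith(("?", "!")):
        return True  # simplification: ? and ! always end a sentence
    if token.lower() in ABBREVIATIONS:
        return False  # period belongs to a known abbreviation
    if re.fullmatch(r"[A-Z]\.", token):
        return False  # single-letter initial such as "J."
    if next_token is None:
        return True  # a period at the end of the text ends the sentence
    if next_token[0].islower():
        return False  # a lowercase continuation suggests the sentence goes on
    return True


def segment(text):
    """Split `text` into sentences by applying is_boundary to each token."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_boundary(tok, nxt):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a closing terminator
        sentences.append(" ".join(current))
    return sentences


def boundary_f1(predicted, gold):
    """F1-score over sets of boundary positions (e.g. token indices)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(segment("Dr. Budi pergi ke pasar. Apakah dia membeli buah? Ya!"))
    print(boundary_f1({4, 8}, {4, 8, 9}))
```

On the Indonesian example, the period in "Dr." is rejected as a boundary by the abbreviation rule, giving three sentences: "Dr. Budi pergi ke pasar.", "Apakah dia membeli buah?", and "Ya!". A real system along the lines the abstract describes would combine many more features (capitalization, numbers, quotation marks, and so on) into its rules.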

Item Type: Article
Subjects: Q Science > Q Science (General) > Q334-342 Computer science. Artificial intelligence. Algorithms. Robotics. Automation.
Divisions: 09-Faculty of Computer Science > 56201-Computer Systems (S1)
Depositing User: Dr. Sukemi Sukemi
Date Deposited: 11 Apr 2023 13:56
Last Modified: 17 Apr 2023 02:30
URI: http://repository.unsri.ac.id/id/eprint/95797