Similarity An adaptable sentence segmentation based on Indonesian rules

Ermatita, Ermatita (2024) Similarity An adaptable sentence segmentation based on Indonesian rules. Turnitin Universitas Sriwijaya. (Submitted)

Abstract

Sentence segmentation, which breaks a string of text into individual sentences, is an important phase in natural language processing (NLP). Any word in the string that is followed by a punctuation mark such as a period, question mark, or exclamation point is a candidate location for splitting the string. Humans can easily see the punctuation and split the string into sentences, but machines cannot. These three punctuation marks also serve other functions, so the segmentation process must be able to detect whether a word marked with punctuation is actually a sentence boundary. This research proposes a sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI), or Indonesian language sentence segmentation, which applies a set of rules to Indonesian texts and can be adapted for English. The system consists of 34 rules built from combinations of 27 fairly complete features, which are a contribution of this research. The experimental results show that SKBI achieves an F1-score of 96.89% for Indonesian text and 97.07% for English text. Both results leave room for improvement but are already better than those of previous research.
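To make the boundary-detection problem concrete, the following is a minimal sketch of a rule-based splitter in Python. It is not the SKBI rule set: the abstract does not list the 34 rules or the 27 features, so the abbreviation list, the three rules, and the function names `is_boundary` and `split_sentences` are all assumptions chosen for illustration.

```python
import re

# Hypothetical abbreviation list for illustration only; not the SKBI features.
ABBREVIATIONS = {"dr.", "prof.", "mr.", "mrs.", "no.", "jl.", "dll.", "dsb.", "hlm."}

CLOSERS = "\"')]"   # closing quotes/brackets that may follow the punctuation
OPENERS = "\"'(["   # opening quotes/brackets that may precede the next word


def is_boundary(token, next_token):
    """Decide whether `token` ends a sentence, given the token after it."""
    core = token.rstrip(CLOSERS)
    if not core or core[-1] not in ".?!":
        return False              # no terminal punctuation on this token
    if next_token is None:
        return True               # end of text always closes a sentence
    if core[-1] in "?!":
        return True               # assumed rule: '?' and '!' always split
    if core.lower() in ABBREVIATIONS:
        return False              # assumed rule: known abbreviation
    if re.fullmatch(r"(?:[A-Za-z]\.)+", core):
        return False              # assumed rule: initials such as "A." or "e.g."
    first = next_token.lstrip(OPENERS)
    # Assumed rule: a period splits only if the next word starts with an
    # uppercase letter or a digit.
    return bool(first) and (first[0].isupper() or first[0].isdigit())


def split_sentences(text):
    """Split whitespace-tokenized text into sentences using the rules above."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_boundary(token, next_token):
            sentences.append(" ".join(current))
            current = []
    if current:                   # trailing words without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

Under these assumptions, a period inside a number such as `3.500` never triggers a split because the token does not end with punctuation, while `Dr.` and `Jl.` are kept inside the sentence by the abbreviation rule. The sketch also shows why a handful of rules is not enough: an abbreviation such as `dll.` at the true end of a sentence would be merged into the next sentence, which is the kind of ambiguity a larger rule set has to resolve.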

Item Type:	Other
Subjects:	#3 Repository of Lecturer Academic Credit Systems (TPAK) > Results of Ithenticate Plagiarism and Similarity Checker
Divisions:	09-Faculty of Computer Science > 55101-Informatics (S2)
Depositing User:	Dr Ermatita zuhairi
Date Deposited:	25 Jun 2024 05:58
Last Modified:	25 Jun 2024 05:58
URI:	http://repository.unsri.ac.id/id/eprint/147670
