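Resmy, Diah Shinta and Utami, Alvi Syahrini and Rodiah, Desty (2024) KLASIFIKASI TEKS BERITA NON-FORMAL DAN FORMAL DALAM BAHASA INDONESIA MENGGUNAKAN ALGORITMA LONG SHORT-TERM MEMORY BERDASARKAN KAMUS BESAR BAHASA INDONESIA [Classification of Non-Formal and Formal News Texts in Indonesian Using the Long Short-Term Memory Algorithm Based on the Kamus Besar Bahasa Indonesia]. Undergraduate thesis, Sriwijaya University.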
RAMA_55201_09021282025089.pdf - Accepted Version (2MB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_TURNITIN.pdf - Accepted Version (7MB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_01_front_ref.pdf - Accepted Version (1MB). Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_02.pdf - Accepted Version (607kB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_03.pdf - Accepted Version (533kB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_04.pdf - Accepted Version (1MB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_05.pdf - Accepted Version (397kB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_06.pdf - Accepted Version (213kB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
RAMA_55201_09021282025089_0022127804_0021128905_07_ref.pdf - Bibliography (230kB). Restricted to Repository staff only. Available under License Creative Commons Public Domain Dedication.
Abstract
This research aims to develop a classification system that distinguishes formal from non-formal Indonesian news texts. The system uses the Long Short-Term Memory (LSTM) algorithm with reference to the Big Indonesian Dictionary (KBBI). Before the data are passed to the LSTM model, they are preprocessed to improve their quality: data cleaning, lowercasing, punctuation removal, tokenising, stopword removal, and stemming. The dataset consists of 50,000 news texts divided into three parts: 60% training data, 20% validation data, and 20% testing data. The LSTM model was trained with the RMSProp optimiser, a batch size of 128, and 40 epochs. In evaluation, the model achieved 81.12% accuracy, 83.65% precision, 92.15% recall, and an F1-score of 87.70%. The main challenges in this classification task are the frequently changing variation of non-formal language, mislabelling in the dataset, limited representative data, the risk of overfitting, and the determination of optimal hyperparameters. The system nevertheless managed to overcome these challenges well.
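The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is not taken from the thesis: the function names `split_sizes` and `f1_score` are hypothetical, and it assumes the F1-score is the usual harmonic mean of precision and recall.

```python
def split_sizes(total, percentages):
    """Return the number of items in each split.

    `percentages` are whole-number percentages; the last split
    absorbs any remainder so the sizes always sum to `total`.
    """
    sizes = [total * p // 100 for p in percentages[:-1]]
    sizes.append(total - sum(sizes))
    return sizes


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Dataset size and split taken from the abstract: 50,000 texts, 60/20/20.
train, validation, test = split_sizes(50_000, (60, 20, 20))
print(train, validation, test)

# Precision and recall as reported in the abstract, in percent.
print(round(f1_score(83.65, 92.15), 2))
```

Under these assumptions the split gives 30,000 training, 10,000 validation, and 10,000 testing texts. The F1 value recomputed from the rounded precision and recall comes to roughly 87.69, which agrees with the reported 87.70% to within the rounding of the inputs.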
Item Type: | Thesis (Undergraduate) |
---|---|
Uncontrolled Keywords: | Classification, Natural Language Processing, Kamus Besar Bahasa Indonesia, Long Short-Term Memory |
Subjects: | Q Science > Q Science (General) > Q334-342 Computer science. Artificial intelligence. Algorithms. Robotics. Automation. |
Divisions: | 09-Faculty of Computer Science > 55201-Informatics (S1) |
Depositing User: | Diah Shinta Resmy |
Date Deposited: | 19 Nov 2024 07:54 |
Last Modified: | 19 Nov 2024 07:54 |
URI: | http://repository.unsri.ac.id/id/eprint/159606 |