LANGUAGE IDENTIFICATION IN TEXT USING THE LONG SHORT TERM MEMORY (LSTM) METHOD

Satrian, Sheva and Utami, Alvi Syahrini (2023) Identifikasi Bahasa pada Teks Menggunakan Metode Long Short Term Memory (LSTM) [Language Identification in Text Using the Long Short Term Memory (LSTM) Method]. Undergraduate thesis, Sriwijaya University.

Abstract

Language is the primary means of human communication, and the diversity of the world's languages reflects the diversity of cultures and identities. Language identification is therefore important for the development of communication technology and information processing. This research addresses language identification in text using the Long Short Term Memory (LSTM) method, with Word2vec as the word embedding method. The main objective is to develop a system that recognizes and classifies the language of a text with high accuracy. The dataset consists of 10,000 texts across 10 language classes with 1,000 texts each: Arabic, Chinese, Dutch, English, French, Indonesian, Japanese, Korean, Russian, and Spanish. The dataset is split into 80% training data and 20% test data, and the hyperparameters are selected by random search. The best LSTM configuration found uses a dropout of 0.3, a batch size of 32, 64 hidden units, a recurrent dropout of 0.2, and 15 epochs. Evaluated with a confusion matrix, the model achieves an average precision of 0.9859, recall of 0.9855, and F1-score of 0.9856, with an accuracy of 0.9856.
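
The evaluation step described in the abstract can be illustrated with a minimal sketch of how averaged precision, recall, F1-score, and accuracy are derived from a confusion matrix. This is not the thesis code: the function name `macro_metrics`, the toy 3-class matrix, and the choice of macro (unweighted) averaging are assumptions made for illustration, since the abstract does not state which averaging scheme was used.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption here.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 3-class confusion matrix (not data from the thesis).
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
scores = macro_metrics(toy_cm)
print({name: round(value, 4) for name, value in scores.items()})
```

For the 10-language setting in the abstract, the same function would take a 10 x 10 matrix built from the test-set predictions.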

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Language Identification, Confusion Matrix, Long Short Term Memory, Word Embedding, Text, Word2Vec
Subjects: P Language and Literature > P Philology. Linguistics > P98-98.5 Computational linguistics. Natural language processing
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Sheva Satrian
Date Deposited: 03 Jan 2024 01:41
Last Modified: 03 Jan 2024 01:41
URI: http://repository.unsri.ac.id/id/eprint/137345
