Extractive Question Answering System for Indonesian Text by Fine-Tuning IndoBERT

Dzaky, Dewa Sheva and Abdiansah, Abdiansah (2025) SISTEM TANYA JAWAB EKSTRAKTIF PADA TEKS BERBAHASA INDONESIA DENGAN FINE-TUNING INDOBERT [Extractive Question Answering System for Indonesian Text by Fine-Tuning IndoBERT]. Undergraduate thesis, Sriwijaya University.

Files (all available under License Creative Commons Public Domain Dedication):

RAMA_55201_09021182126005_COVER.pdf - Image, Accepted Version (596kB)
RAMA_55201_09021182126005.pdf - Text, Accepted Version, restricted to repository staff only (3MB)
RAMA_55201_09021182126005_TURNITIN.pdf - Text, Accepted Version, restricted to repository staff only (9MB)
RAMA_55201_09021182126005_0001108401_01_front_ref.pdf - Text, Accepted Version (1MB)
RAMA_55201_09021182126005_0001108401_02.pdf - Text, Accepted Version, restricted to repository staff only (637kB)
RAMA_55201_09021182126005_0001108401_03.pdf - Text, Accepted Version, restricted to repository staff only (350kB)
RAMA_55201_09021182126005_0001108401_04.pdf - Text, Accepted Version, restricted to repository staff only (1MB)
RAMA_55201_09021182126005_0001108401_05.pdf - Text, Accepted Version, restricted to repository staff only (871kB)
RAMA_55201_09021182126005_0001108401_06.pdf - Text, Accepted Version, restricted to repository staff only (185kB)
RAMA_55201_09021182126005_0001108401_07_ref.pdf - Text, Bibliography, restricted to repository staff only (192kB)
RAMA_55201_09021182126005_0001108401_08_lamp.pdf - Text, Accepted Version, restricted to repository staff only (114kB)

Abstract

The abundance of digital information makes extracting relevant information a major challenge, especially for Indonesian, which has distinctive linguistic characteristics. To address this challenge, this study develops an extractive question answering system for Indonesian text by fine-tuning the IndoBERT model so that it extracts a span of a context paragraph as the answer to a given question. The dataset is the Indonesian translation of the Stanford Question Answering Dataset (SQuAD) 2.0, which contains more than 100,000 question-answer pairs derived from Wikipedia articles. Fine-tuning was carried out in eight scenarios that combine dataset type (the full dataset, including unanswerable questions, or a modified dataset with all unanswerable questions removed), learning rate (2e-5 or 5e-5), and batch size (16 or 48). The model trained with a learning rate of 5e-5 and a batch size of 16 performed best. On the dataset with unanswerable questions, it achieved an exact match score of 60.57% and an F1 score of 70.84%. On the dataset without unanswerable questions, it achieved an exact match score of 54.79% and an F1 score of 73.06%.
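
For readers unfamiliar with the setup, the experimental grid and the two reported metrics can be sketched roughly as follows. This is an illustrative sketch and not the thesis code: the dataset labels and function names are hypothetical, and the normalization is a simplified form of the usual SQuAD-style scoring (lowercasing, punctuation removal, whitespace collapsing) that leaves out the English article stripping done by the official script.

```python
import string
from collections import Counter
from itertools import product

# The eight scenarios from the abstract: 2 dataset types x 2 learning rates
# x 2 batch sizes. The dataset labels are hypothetical names.
DATASETS = ("with_unanswerable", "answerable_only")
LEARNING_RATES = (2e-5, 5e-5)
BATCH_SIZES = (16, 48)
SCENARIOS = list(product(DATASETS, LEARNING_RATES, BATCH_SIZES))


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer span."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # An unanswerable question has an empty reference: the score is 1.0
    # only when the prediction is empty as well.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(len(SCENARIOS), "scenarios")
    print(exact_match("di Palembang", "Palembang"))
    print(round(f1_score("di Palembang", "Palembang"), 4))
```

A prediction such as "di Palembang" against the reference "Palembang" earns no exact match credit but still receives partial F1 credit, which is consistent with the F1 scores in the abstract being higher than the exact match scores.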

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Extractive Question Answering System, SQuAD, IndoBERT, fine-tuning, exact match, f1-score
Subjects: Q Science > Q Science (General) > Q334-342 Computer science. Artificial intelligence. Algorithms. Robotics. Automation.
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Dewa Sheva Dzaky
Date Deposited: 22 May 2025 02:12
Last Modified: 22 May 2025 02:12
URI: http://repository.unsri.ac.id/id/eprint/173583
