SISTEM TANYA JAWAB DETEKSI KANKER DINI MENGGUNAKAN METODE BERT DAN TF-IDF (Early Cancer Detection Question-Answering System Using the BERT and TF-IDF Methods)

GARCIA, LOUIS and Abdiansah, Abdiansah (2025) SISTEM TANYA JAWAB DETEKSI KANKER DINI MENGGUNAKAN METODE BERT DAN TF-IDF. Undergraduate thesis, Sriwijaya University.

Abstract

The rising cancer mortality rate, particularly in developing countries such as Indonesia, underscores the need for an effective question-answering system for early cancer detection. According to Globocan 2020 data, Indonesia recorded 396,914 new cancer cases and 234,511 cancer-related deaths. Riskesdas data further show that the prevalence of cancer in Indonesia rose from 1.4 per 1,000 population in 2013 to 1.79 per 1,000 population in 2018. This study aims to develop a question-answering system for early cancer detection using BERT (Bidirectional Encoder Representations from Transformers) and TF-IDF (Term Frequency-Inverse Document Frequency). Combining the two methods is expected to improve accuracy in understanding cancer symptoms, diagnosis, and treatment. The system was tested on a Kaggle dataset containing clinical data on various types of cancer, with case folding, stop word removal, stemming, and tokenization applied as preprocessing to improve data quality. In the performance evaluation, the highest accuracy was 98.85%, achieved with a fine-tuned BERT model. Compared with the BERT-only model (94.70%) and the TF-IDF-only model (96.55%), this result indicates that integrating BERT and TF-IDF is more effective at providing accurate and relevant responses. To test the validity of the system, the study also involved interviews with 10 medical students from Universitas Sriwijaya, class of 2021-2022. Of the 20 questions asked, the system answered 19 correctly, an accuracy of 95%. The findings contribute to the development of artificial intelligence (AI)-based health technology and support early cancer detection efforts in Indonesia by providing an efficient and reliable cancer detection system.
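
The TF-IDF side of such a system can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how BERT and TF-IDF are combined, so the BERT stage is left out, stemming is omitted, and the stop word list, the passages, and the names `preprocess`, `tfidf_vectors`, and `rank_passages` are all hypothetical. The sketch only shows how preprocessed passages could be ranked against a question by TF-IDF cosine similarity before a reader model extracts an answer.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop word list; a real system would use a full one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "what", "and", "to"}


def preprocess(text):
    """Case folding, tokenization, and stop word removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms found in every document keep a non-zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_passages(question, passages):
    """Return passage indices ordered from most to least similar to the question."""
    docs = [preprocess(p) for p in passages]
    vectors, idf = tfidf_vectors(docs)
    q_tokens = preprocess(question)
    # Question terms that never occur in the passages are ignored.
    q_tf = Counter(t for t in q_tokens if t in idf)
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Illustrative passages only; they are not taken from the Kaggle dataset.
PASSAGES = [
    "A persistent cough and chest pain can be symptoms of lung cancer.",
    "A lump in the breast is a common symptom of breast cancer.",
    "Regular exercise and a balanced diet support general health.",
]
QUESTION = "What are the symptoms of lung cancer?"

ranking = rank_passages(QUESTION, PASSAGES)
print(PASSAGES[ranking[0]])
```

In a combined pipeline, the top-ranked passage would typically be handed to a fine-tuned BERT reader to extract the answer span; that arrangement is an assumption here, not something the abstract confirms. Because the sketch skips stemming, "symptom" and "symptoms" count as different terms, which is the kind of mismatch the stemming step described above is meant to remove.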

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: BERT, TF-IDF, NLP, Model, AI-Based, System, Cancer
Subjects: T Technology > T Technology (General) > T58.5-58.64 Information technology > T58.5 General works Management information systems Cf. HD30.213 Industrial management Cf. HF5549.5.C6+ Communication in personnel management Cf. TS158.6 Automatic data collection systems (Production control)
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Louis Garcia
Date Deposited: 14 Mar 2025 07:32
Last Modified: 14 Mar 2025 07:32
URI: http://repository.unsri.ac.id/id/eprint/168832
