Spam Classification of Indonesian-Language Emails Using FastText and Bernoulli Naïve Bayes (Klasifikasi Spam pada Email Berbahasa Indonesia Menggunakan FastText dan Bernoulli Naïve Bayes)

Putri, Zatun Aulia and Kurniati, Rizki and Rachmatullah, Muhammad Naufal (2025) KLASIFIKASI SPAM PADA EMAIL BERBAHASA INDONESIA MENGGUNAKAN FASTTEXT DAN BERNOULLI NAÏVE BAYES. Undergraduate thesis, Sriwijaya University.

Available under License Creative Commons Public Domain Dedication.

Abstract

Indonesia ranks sixth globally by number of spam senders. Spam detection and filtering have been studied extensively, and Bayesian algorithms are among the most commonly used approaches. This study aimed to classify Indonesian-language email messages as spam or non-spam. A secondary dataset of 2,604 messages was used, comprising 1,362 spam messages and 1,242 non-spam messages. Words were represented using FastText with an n-gram approach that captures sub-word information, and classification was carried out with the Bernoulli Naïve Bayes algorithm, which operates on binary feature values. The experiments compared the performance of Bernoulli Naïve Bayes with and without FastText. Evaluation used accuracy, a confusion matrix, and a classification report, with a 70:30 data split. Both models, with and without FastText, achieved 95% accuracy. However, the model incorporating FastText showed more balanced performance across classes and higher recall in detecting spam. In contrast, the model without FastText achieved perfect precision and recall for spam but showed decreased performance for non-spam. The use of FastText therefore contributes to improving the sensitivity and balance of spam email classification in the Indonesian language.
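
The pipeline described in the abstract can be sketched roughly as follows. This is not the thesis's implementation: it does not use the FastText library or its learned embeddings, and instead substitutes the presence or absence of FastText-style character n-grams as binary features for a hand-written Bernoulli Naïve Bayes. The function and class names, the n-gram range (3 to 5), the smoothing value, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subwords: character n-grams of the word wrapped in < >."""
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    }


def binary_features(text):
    """Set of subword n-grams present in a message (presence only, no counts)."""
    feats = set()
    for word in text.lower().split():
        feats |= char_ngrams(word)
    return feats


class BernoulliNB:
    """Bernoulli Naive Bayes over binary features with Laplace smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = set().union(*docs)
        n_docs = {c: labels.count(c) for c in self.classes}
        doc_freq = {c: defaultdict(int) for c in self.classes}
        for feats, label in zip(docs, labels):
            for f in feats:
                doc_freq[label][f] += 1
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        # P(feature present | class), smoothed so it is never exactly 0 or 1.
        self.p = {
            c: {f: (doc_freq[c][f] + alpha) / (n_docs[c] + 2 * alpha) for f in self.vocab}
            for c in self.classes
        }
        return self

    def predict(self, feats):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for f in self.vocab:
                p = self.p[c][f]
                # Absent features contribute log(1 - p); this is what
                # distinguishes the Bernoulli model from the multinomial one.
                score += math.log(p) if f in feats else math.log(1.0 - p)
            scores[c] = score
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, confusion matrix, and precision/recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "confusion": [[tn, fp], [fn, tp]],  # rows: true class, columns: predicted
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up toy messages, not taken from the thesis dataset.
train = [
    ("menang hadiah uang tunai gratis klik sekarang", "spam"),
    ("promo gratis pulsa klik tautan ini", "spam"),
    ("rapat tim dijadwalkan besok pagi", "non-spam"),
    ("mohon kirim laporan proyek hari ini", "non-spam"),
]
model = BernoulliNB().fit(
    [binary_features(text) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(binary_features("klik untuk hadiah gratis")))
```

On a real dataset, the messages would first be divided 70:30 into training and test sets, and `evaluate` would be applied to the test-set predictions. Because the features are sub-word n-grams, a misspelled or inflected word still shares most of its features with the original form, which is the motivation for the n-gram approach mentioned in the abstract.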

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Indonesian Language, Bernoulli Naïve Bayes, FastText, Email Spam, Text Classification
Subjects: Q Science > Q Science (General) > Q334-342 Computer science. Artificial intelligence. Algorithms. Robotics. Automation.
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Zatun Aulia Putri
Date Deposited: 15 Aug 2025 01:35
Last Modified: 15 Aug 2025 01:35
URI: http://repository.unsri.ac.id/id/eprint/182739
