COMPARISON OF FEATURE SELECTION METHODS FOR INDONESIAN-LANGUAGE QUESTION CLASSIFICATION USING THE SUPPORT VECTOR MACHINE (SVM) ALGORITHM

ARUDA, SYECHKY AL QODRIN and Yusliani, Novi and Utami, Alvi Syahrini (2022) PERBANDINGAN METODE SELEKSI FITUR UNTUK KLASIFIKASI PERTANYAAN BERBBAHASA INDONESIA MENGGUNAKAN ALGORITMA SUPPORT VECTOR MACHINE (SVM). Undergraduate thesis, Sriwijaya University.

Abstract

Most texts have a large number of features. However, most of these features have low relevance, and some contain noise that can reduce classification accuracy. Feature selection reduces the dimensionality of the feature space by weighting every feature and eliminating those whose weight falls below a threshold. It aims to improve both the accuracy and the computation time of text classification. In this research, the Information Gain, Chi Square, and Mutual Information feature selection methods were used to classify text in the form of Indonesian-language questions using the Support Vector Machine (SVM) algorithm. The resulting classification models were then compared based on their evaluation results. The results showed that feature selection increased accuracy and reduced computation time. Chi Square feature selection on an SVM with a linear kernel and parameter C = 1 gave the best performance, with an average accuracy of 0.92, precision of 0.93, recall of 0.89, f-measure of 0.91, and a computation time of 8 seconds.
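The weighting-and-threshold step described in the abstract can be illustrated with a minimal Python sketch of Chi Square feature scoring. This is not the thesis's implementation: the toy questions, the class labels, the function names, the threshold value, and the choice to take the maximum score across classes are all assumptions made for illustration. Each term is scored from a 2x2 contingency table of term presence against class membership, and terms scoring below the threshold are dropped.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its highest chi-square value over all classes.

    Terms are treated as binary features (present or absent in a document).
    Taking the maximum over classes is an assumption of this sketch.
    """
    n = len(docs)
    doc_terms = [set(doc.lower().split()) for doc in docs]
    class_counts = Counter(labels)
    vocab = set().union(*doc_terms)

    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table for (term, cls)
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            c = n_cls - a          # class docs without the term
            d = n - n_cls - b      # other docs without the term
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        scores[term] = best
    return scores


def select_features(scores, threshold):
    """Keep terms whose score is at least the threshold; drop the rest."""
    return {term for term, score in scores.items() if score >= threshold}


# Hypothetical toy data: four Indonesian questions in two made-up classes.
docs = [
    "siapa presiden pertama indonesia",
    "siapa penemu lampu",
    "dimana letak candi borobudur",
    "dimana ibu kota indonesia",
]
labels = ["person", "person", "location", "location"]

scores = chi_square_scores(docs, labels)
selected = select_features(scores, threshold=2.0)  # threshold is an assumption
print(sorted(selected))
```

On this toy data the question words "siapa" and "dimana" separate the two classes perfectly and survive the threshold, while "indonesia", which occurs equally in both classes, scores zero and is eliminated. The selected terms would then define the reduced feature space passed to the classifier.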

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Text Classification, Number of Features, Feature Selection, Support Vector Machine
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics > Q325.5 Machine learning
Q Science > QA Mathematics > QA299.6-433 Analysis > Q334.A755 Artificial intelligence. Computational linguistics. Computer science.
Q Science > QA Mathematics > QA8.9-QA10.3 Computer science. Artificial intelligence. Computational complexity. Data structures (Computer scienc. Mathematical Logic and Formal Languages
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Syechky Al Qodrin Aruda
Date Deposited: 08 Jul 2022 03:03
Last Modified: 08 Jul 2022 03:03
URI: http://repository.unsri.ac.id/id/eprint/73447
