Alkausar, Alif Toriq and Yusliani, Novi and Darmawahyuni, Annisa (2024) Pemodelan Topik Menggunakan Pre-trained Language Model IndoBERT dan Variational Autoencoder (VAE) [Topic Modeling Using the Pre-trained Language Model IndoBERT and a Variational Autoencoder (VAE)]. Undergraduate thesis, Sriwijaya University.
Files:

- RAMA_55201_09021182025016.pdf (3MB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_TURNITIN.pdf (5MB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_01_front_ref.pdf (1MB) - Accepted Version; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_02.pdf (731kB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_03.pdf (391kB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_04.pdf (1MB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_05.pdf (669kB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_06.pdf (279kB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_07_ref.pdf (247kB) - Bibliography; restricted to Repository staff only; Creative Commons Public Domain Dedication
- RAMA_55201_09021182025016_0008118205_8968340022_08_lamp.pdf (153kB) - Accepted Version; restricted to Repository staff only; Creative Commons Public Domain Dedication
Abstract
The amount of information and the number of documents scattered across the internet today is enormous, which makes it difficult to find information by topic. This poses a challenge for grouping and managing information such as online news headlines. A topic modeling system that groups documents and information by topic addresses this problem. This research uses a topic modeling method that combines the pre-trained language model IndoBERT with a Variational Autoencoder (VAE): BERT provides the text embeddings, the VAE performs dimensionality reduction and learns a latent (hidden) representation, and the K-Means algorithm clusters the resulting vectors. The model was trained on 5,000 news headlines spanning 10 categories, collected from the online media outlets CNN Indonesia, detik.com, and Kompas. Testing was conducted on 2,000 news headlines that were not used in training. The topic modeling system produces 10 clusters, with an average CV coherence score of 0.78, a lowest value of 0.76, and a highest value of 0.80.
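To make the pipeline described in the abstract concrete, the sketch below strings the three stages together in Python: IndoBERT text embeddings, a small VAE for dimensionality reduction, and K-Means clustering of the latent vectors. It is a minimal illustration under assumptions, not the thesis implementation: the `indobenchmark/indobert-base-p1` checkpoint, the placeholder headlines, the VAE layer sizes, the latent dimension, and the training settings are all assumed here, and the thesis's preprocessing and CV coherence evaluation are omitted.

```python
# Minimal sketch of the abstract's pipeline, assuming the public
# "indobenchmark/indobert-base-p1" checkpoint and made-up placeholder
# headlines (not the thesis dataset or its hyperparameters).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

headlines = [  # hypothetical examples for illustration only
    "Harga BBM naik mulai pekan depan",
    "Timnas Indonesia lolos ke babak final",
    "Saham teknologi menguat di bursa Asia",
    "Gempa bumi mengguncang wilayah Sumatera",
    "Pemerintah umumkan kebijakan pajak baru",
    "Film lokal mendominasi bioskop akhir pekan ini",
]

# 1) Text embedding with IndoBERT (masked mean pooling of the last hidden state).
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
bert = AutoModel.from_pretrained("indobenchmark/indobert-base-p1")
with torch.no_grad():
    enc = tokenizer(headlines, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**enc).last_hidden_state                 # (n_docs, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1).float()
    embeddings = (hidden * mask).sum(1) / mask.sum(1)      # (n_docs, 768)

# 2) VAE that compresses the 768-d embeddings into a small latent space.
class VAE(nn.Module):
    def __init__(self, in_dim=768, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(200):  # illustrative epoch count only
    recon, mu, logvar = vae(embeddings)
    recon_loss = nn.functional.mse_loss(recon, embeddings)   # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    loss = recon_loss + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# 3) Cluster the latent means with K-Means. The thesis uses 10 clusters on
#    5,000 headlines; 3 clusters are used here because the toy list is tiny.
with torch.no_grad():
    latent = vae.mu(vae.encoder(embeddings)).numpy()
topic_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(latent)
print(list(zip(topic_ids, headlines)))
```

In this sketch the cluster assignments stand in for topics; evaluating them with a CV coherence score, as reported in the abstract, would additionally require the tokenized corpus and a coherence measure such as the one provided by gensim.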
Item Type: | Thesis (Undergraduate)
---|---
Uncontrolled Keywords: | News Headlines, Topic Modeling, Variational Autoencoder, Pre-trained Language Model, BERT, CV Coherence Score
Subjects: | Q Science > QA Mathematics > QA75-76.95 Calculating machines > QA76 Computer software
Divisions: | 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: | Alif Toriq Alkausar
Date Deposited: | 14 May 2024 05:08
Last Modified: | 14 May 2024 05:08
URI: | http://repository.unsri.ac.id/id/eprint/143978