TOPIC MODELING USING THE PRE-TRAINED LANGUAGE MODEL INDOBERT AND A VARIATIONAL AUTOENCODER (VAE)

ALKAUSAR, ALIF TORIQ and Yusliani, Novi and Darmawahyuni, Annisa (2024) Topic Modeling Using the Pre-trained Language Model IndoBERT and a Variational Autoencoder (VAE). Undergraduate thesis, Sriwijaya University.

Files (all available under a Creative Commons Public Domain Dedication license):

- RAMA_55201_09021182025016.pdf — Accepted Version, 3MB, restricted to repository staff only
- RAMA_55201_09021182025016_TURNITIN.pdf — Accepted Version, 5MB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_01_front_ref.pdf — Accepted Version, 1MB, publicly downloadable
- RAMA_55201_09021182025016_0008118205_8968340022_02.pdf — Accepted Version, 731kB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_03.pdf — Accepted Version, 391kB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_04.pdf — Accepted Version, 1MB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_05.pdf — Accepted Version, 669kB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_06.pdf — Accepted Version, 279kB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_07_ref.pdf — Bibliography, 247kB, restricted to repository staff only
- RAMA_55201_09021182025016_0008118205_8968340022_08_lamp.pdf — Accepted Version, 153kB, restricted to repository staff only

Abstract

The volume of information and documents scattered across the internet today is very large, which makes it difficult to find information by topic. This poses a challenge for grouping and managing information such as online news headline data. A topic modeling system addresses this problem by grouping information and documents according to their topics. This research uses a topic modeling method that combines the pre-trained language model IndoBERT with a Variational Autoencoder (VAE). The approach exploits BERT's capability for text embedding, the VAE's capability for dimensionality reduction and latent representation, and the K-means algorithm to cluster the data. For model training, 5,000 news headlines spanning 10 categories were collected from the online media outlets cnnindonesia, detik.com, and kompas. Testing was conducted on 2,000 news headlines that were not part of the training set. The topic modeling system produces 10 groups, with an average c_v coherence score of 0.78 (lowest 0.76, highest 0.80).
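The pipeline described above (sentence embedding → VAE encoder → K-means clustering) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: random vectors stand in for IndoBERT embeddings, and the VAE encoder is an untrained forward pass with random weights shown only to make the reparameterization and dimensionality-reduction step concrete; the dimensions and cluster count follow the abstract (768-dim BERT embeddings, 10 topic groups).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Placeholder for IndoBERT sentence embeddings: in the actual system, each
# news headline would be encoded by the pre-trained model into a 768-dim vector.
n_docs, embed_dim, latent_dim, n_topics = 200, 768, 32, 10
embeddings = rng.normal(size=(n_docs, embed_dim))

# VAE encoder forward pass (untrained, random weights, for illustration only):
# the encoder maps each embedding to a mean and log-variance, and the
# reparameterization trick samples a latent vector z = mu + sigma * eps.
W_mu = rng.normal(scale=0.02, size=(embed_dim, latent_dim))
W_logvar = rng.normal(scale=0.02, size=(embed_dim, latent_dim))
mu = embeddings @ W_mu
logvar = embeddings @ W_logvar
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps  # low-dimensional latent representation

# Cluster the latent representations into 10 topic groups with K-means.
labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(z)
print(labels.shape, len(set(labels.tolist())))
```

In a full implementation, the encoder weights would be learned by training the VAE (reconstruction loss plus KL divergence) on the training embeddings, and the resulting clusters would be evaluated with the c_v coherence measure.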

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: News Headlines, Topic Modeling, Variational Autoencoder, Pre-trained Language Model, BERT, Coherence Score c_v
Subjects: Q Science > QA Mathematics > QA75-76.95 Calculating machines > QA76 Computer software
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Alif Toriq Alkausar
Date Deposited: 14 May 2024 05:08
Last Modified: 14 May 2024 05:08
URI: http://repository.unsri.ac.id/id/eprint/143978
