TOPIC MODELING USING BERTOPIC WITH LLAMA2 AS TOPIC REPRESENTATION TUNING

KHAIRI, ALPIAN and Yusliani, Novi and Saputra, Danny Matthew (2024) TOPIC MODELING MENGGUNAKAN BERTOPIC DENGAN LLAMA2 SEBAGAI TOPIC REPRESENTATION TUNING. Undergraduate thesis, Universitas Sriwijaya.

Abstract

The growing use of social media, particularly Twitter, has produced large volumes of text data covering a wide range of topics and issues. Analyzing Twitter data can reveal important insights into topics relevant to society. Topic modeling is one of the more recent innovations in text data processing and is used to discover the topics present in a collection of texts. This research performs topic modeling on Indonesian tweets using BERTopic, with LLAMA2 used for topic representation tuning: LLAMA2 generates a label for each topic from the set of keywords produced by the c-TF-IDF calculation. The dataset consists of 10,000 Indonesian tweets taken from the Twitter account @detikcom, split into 8,000 tweets for training and 2,000 tweets for testing. Topic modeling with BERTopic yielded a total of 49 topics. The topic model was evaluated using the c_v coherence score, with an average score of 0.86 on the training data and 0.73 on the testing data.
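
The keyword-then-label step described in the abstract can be illustrated with a small sketch. The thesis code is not part of this record, so everything below is an assumption for illustration only: the scoring follows the published c-TF-IDF formula (term frequency within a topic multiplied by log(1 + A / term frequency across all topics), where A is the average number of words per topic), the helper names `c_tf_idf`, `top_keywords`, and `build_label_prompt` are hypothetical, the prompt wording is invented, and the toy tweets are made up rather than taken from the @detikcom dataset.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Score terms per topic with class-based TF-IDF (illustrative sketch).

    docs_per_topic maps a topic id to the list of tokenized documents
    assigned to that topic. All documents of one topic are treated as a
    single "class document".
    """
    # Term frequency of each term inside each topic.
    tf = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_per_topic.items()
    }
    # Frequency of each term across all topics.
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    # A: average number of words per topic.
    avg_words = sum(total_tf.values()) / len(tf)
    return {
        topic: {
            term: freq * math.log(1 + avg_words / total_tf[term])
            for term, freq in counts.items()
        }
        for topic, counts in tf.items()
    }


def top_keywords(term_scores, n=3):
    """Return the n highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def build_label_prompt(keywords):
    """Build a labeling prompt for an LLM (hypothetical wording)."""
    return (
        "I have a topic described by the following keywords: "
        + ", ".join(keywords)
        + ". Give a short label for this topic."
    )


# Toy input: invented tokenized tweets already grouped into two topics.
toy = {
    0: [["harga", "bbm", "naik"], ["bbm", "subsidi"]],
    1: [["timnas", "menang"], ["timnas", "gol", "menang"]],
}
scores = c_tf_idf(toy)
keywords = {topic: top_keywords(s, n=2) for topic, s in scores.items()}
prompts = {topic: build_label_prompt(kw) for topic, kw in keywords.items()}
```

In a setup like the one the abstract describes, each prompt would be passed to the language model, and the returned text would replace the raw keyword list as the topic's label. How the thesis actually phrases its prompt and how many keywords it passes per topic are not stated in this record.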

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: BERTopic, Coherence Score cv, Large Language Models, Topic Modeling, Tweet
Subjects: Q Science > QA Mathematics > QA75-76.95 Calculating machines > QA76.9.B45 Big data. Machine learning. Quantitative research. Metaheuristics.
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Depositing User: Alpian Khairi
Date Deposited: 27 May 2024 02:51
Last Modified: 27 May 2024 02:51
URI: http://repository.unsri.ac.id/id/eprint/145486
