ANALISA BIG DATA PADA CLUSTER KOMPUTER MENGGUNAKAN KOMPUTASI TERDISTRIBUSI

ZAINUDIN, ZAINUDIN and Heryanto, Ahmad (2023) ANALISA BIG DATA PADA CLUSTER KOMPUTER MENGGUNAKAN KOMPUTASI TERDISTRIBUSI. Undergraduate thesis, Sriwijaya University.

Files (all available under License Creative Commons Public Domain Dedication):
RAMA_56201_09011181924004.pdf - Accepted Version, 4MB, restricted to Repository staff only
RAMA_56201_09011181924004_TURNITIN.pdf - Accepted Version, 5MB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_01_front_ref.pdf - Accepted Version, 1MB
RAMA_56201_09011181924004_0022018703_02.pdf - Accepted Version, 598kB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_03.pdf - Accepted Version, 632kB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_04.pdf - Accepted Version, 2MB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_05.pdf - Accepted Version, 10kB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_06_ref.pdf - Bibliography, 245kB, restricted to Repository staff only
RAMA_56201_09011181924004_0022018703_07_lamp.pdf - Accepted Version, 538kB, restricted to Repository staff only

Abstract

With globalization, technology has spread across industrial sectors, and data now accumulates quickly enough to grow into the large-scale collections known as big data. The volume and complexity of big data make optimization problems harder to formulate, so a parallel and distributed computer cluster architecture is needed. Several methods support parallel and distributed data processing, including MPI (Message Passing Interface), OpenMP (Open Multi-Processing), Hadoop, and Spark. Data structures in big data also tend to be complex, high-dimensional, and large. This study uses the parallelization provided by the Apache Spark framework to run big data processing on a distributed computer cluster. The results show that the distributed cluster system on Spark reads big data effectively: in a word count experiment on 31,788,324 rows of data, Spark was faster, with a time difference of 84.6 seconds. Spark's machine learning library, MLlib, was then used for more advanced big data processing in classification and recommendation system experiments. Of the 6 classification models used, the best achieved an accuracy of 94.95%, an F1-score of 95%, a recall of 95.18%, and a precision of 94.77%. The recommendation system, built with the ALS (Alternating Least Squares) algorithm, obtained an RMSE of 0.46 from 5 experiments with different tuning parameters.
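
The word count experiment mentioned in the abstract follows a map-and-reduce pattern: each partition of the data is counted independently, and the partial counts are then merged. The sketch below illustrates that pattern in plain Python on a single machine. It is not the thesis code and does not use Spark; the function names, the round-robin partitioning, and the sample lines are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def split_into_partitions(lines: List[str], num_partitions: int) -> List[List[str]]:
    """Deal lines round-robin into num_partitions chunks (hypothetical helper)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_partition(lines: Iterable[str]) -> Counter:
    """Map step: count whitespace-separated words within one partition."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial counts by summing per-word totals."""
    return left + right


def word_count(lines: List[str], num_partitions: int = 4) -> Counter:
    """Count words by counting each partition separately, then merging."""
    partitions = split_into_partitions(lines, num_partitions)
    # On a cluster these per-partition counts would run on different workers;
    # here they run one after another in a single process.
    partials = [count_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


# Illustrative input only, not data from the thesis.
sample_lines = [
    "big data needs distributed computing",
    "distributed computing scales big data processing",
]
totals = word_count(sample_lines, num_partitions=2)
```

Because the per-partition counts do not depend on each other, the map step can be spread across cluster nodes, and only the much smaller partial counts need to be exchanged in the merge step.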

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Distributed Computing, Big Data, Computer Cluster, Apache Spark
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics > Q325.5 Machine learning
Divisions: 09-Faculty of Computer Science > 56201-Computer Systems (S1)
Depositing User: Zainudin Zainudin
Date Deposited: 22 Nov 2023 07:04
Last Modified: 22 Nov 2023 07:04
URI: http://repository.unsri.ac.id/id/eprint/130876
