COMPARISON OF THE JARO-WINKLER DISTANCE AND LEVENSHTEIN DISTANCE ALGORITHMS IN DETECTING THE SIMILARITY OF INDONESIAN-LANGUAGE DOCUMENTS

SYAKBANIA, NISVA and Yusliani, Novi and Arsalan, Osvari (2020) PERBANDINGAN ALGORITMA JARO-WINKLER DISTANCE DAN LEVENSHTEIN DISTANCE DALAM MENDETEKSI KEMIRIPAN DOKUMEN BAHASA INDONESIA. Undergraduate thesis, Sriwijaya University.

Available under License Creative Commons Public Domain Dedication.

Abstract

Document similarity detection calculates the similarity between two or more documents based on either semantic or lexical similarity. This research detects lexical similarity by applying string matching techniques to each document. Jaro-Winkler Distance and Levenshtein Distance are algorithms commonly used in string matching. Jaro-Winkler Distance involves calculating the lengths of the strings in the documents, counting their common characters, and counting transpositions. Levenshtein Distance calculates the minimum distance needed to transform one string into the other. Testing used a total of 19 authentic documents and 6 comparison documents. The results show that the average error value of Levenshtein Distance is 7.86%, while that of Jaro-Winkler Distance is 24.45%. As for computing time, four out of five testing configurations show that Jaro-Winkler Distance is faster than Levenshtein Distance.
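For readers unfamiliar with the two algorithms, the following is a minimal Python sketch of their textbook definitions. It is not taken from the thesis: the function names, the `prefix_scale` and `max_prefix` parameters (set to the conventional 0.1 and 4), and the length-based normalization in `levenshtein_similarity` are assumptions made for illustration, and the preprocessing and scoring actually used in the research may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a: str, b: str) -> float:
    """Jaro similarity from string lengths, common characters,
    and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters count as common only if they lie within this window.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        start = max(0, i - window)
        end = min(len(b), i + window + 1)
        for j in range(start, end):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that are out of order.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a)
            + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    score = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)
```

As a usage example, `levenshtein("makan", "makanan")` counts the two insertions needed to extend the first word into the second, and `jaro_winkler("MARTHA", "MARHTA")` is approximately 0.961. The nested loop in `levenshtein` always visits every pair of character positions, whereas `jaro` only scans a limited window around each character; this is one plausible reason for a difference in computing time, although the thesis's own measurements depend on its implementation.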

Item Type: Thesis (Undergraduate)
Uncontrolled Keywords: Document Similarity, Jaro-Winkler Distance, Levenshtein Distance
Subjects: P Language and Literature > P Philology. Linguistics > P98-98.5 Computational linguistics. Natural language processing
Divisions: 09-Faculty of Computer Science > 55201-Informatics (S1)
Date Deposited: 24 Sep 2020 04:26
Last Modified: 24 Sep 2020 04:26
URI: http://repository.unsri.ac.id/id/eprint/35560
