EARLY DETECTION SYSTEM FOR TRANSLATED PLAGIARISM IN DIGITAL DOCUMENTS
Yosua Alberth Sir, Dr.tech. Khabib Mustofa, S.Si., M.Kom
2011 | Thesis | Master's Program (S2) in Computer Science
The use of Internet applications that cross language boundaries has had a negative side effect: an increase in translated plagiarism. In academic settings, translated plagiarism is found in final projects, theses, and papers. In this thesis, we propose an early detection system for translated plagiarism (Indonesian-English) in digital documents, based on a modified version of the sentence-based detection algorithm. The proposed system works as follows: (i) it translates the input document using the Google Translate API component; (ii) it searches the World Wide Web for PDF documents similar to the translated input document using the Google AJAX Search API component; (iii) if such documents are found, it downloads them; (iv) it preprocesses the text by removing punctuation, numbers, and stopwords, lemmatizing words, and removing repeated words; and (v) it compares the content of the translated document against the content of the downloaded documents. To compare detection accuracy, we built two systems: (i) one based on the original sentence-based detection algorithm and (ii) one based on the modified algorithm. We tested both on the same 25 datasets and evaluated their accuracy using RMSE and a t-test as the basis for comparison. The results show a significant difference in accuracy between the two systems: the system based on the modified algorithm (RMSE = 24.95%) is more accurate than the system based on the original algorithm (RMSE = 38.54%).
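As an illustration only, the preprocessing step (iv) and the RMSE metric used for evaluation might be sketched in Python as follows. This is not the thesis implementation: the stopword list, the `naive_lemma` helper (a crude stand-in for real lemmatization), and the function names `preprocess` and `rmse` are all assumptions introduced for the example, and the abstract does not say how the similarity scores fed into RMSE were computed.

```python
import math
import string

# Illustrative stopword list (an assumption; the thesis does not list its stopwords).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are", "to"}


def naive_lemma(word):
    """Hypothetical stand-in for lemmatization: strips a trailing plural 's'.

    A real system would use a dictionary-based lemmatizer instead.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(sentence):
    """Apply the preprocessing steps named in the abstract to one sentence."""
    # Remove punctuation and digits in a single pass.
    table = str.maketrans("", "", string.punctuation + string.digits)
    tokens = sentence.lower().translate(table).split()
    # Remove stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Reduce each word to a base form (placeholder lemmatization).
    tokens = [naive_lemma(t) for t in tokens]
    # Remove repeated words, keeping the first occurrence of each.
    return list(dict.fromkeys(tokens))


def rmse(predicted, actual):
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

In this sketch, `preprocess` would be applied to each sentence of both the translated and the downloaded documents before comparison, and `rmse` would compare the similarity scores reported by a system against reference scores for the same test documents; a lower value indicates closer agreement.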
Keywords: Translated Plagiarism