
DECEPTIVE OPINION SPAM DETECTION USING CONVOLUTIONAL NEURAL NETWORK (CNN) ENSEMBLE

AFLAH NADHIF HAMMAM, Mardhani Riasetiawan, SE (Accounting), MT (IT), Dr (Computer Science)

2020 | Undergraduate Thesis | Bachelor's Program in Computer Science

User-generated content such as reviews has a large influence on consumer decisions on the internet. Several parties are interested in exploiting this platform to further their own agendas, such as profit or other malicious intentions, by manufacturing deceptive opinion spam (Ott et al., 2013). Deceptive opinion spam consists of fictitious reviews that have been deliberately written to sound authentic and that aim to deceive the reader (Ott et al., 2013). Human accuracy in detecting deceptive opinion spam has been reported to be as low as 57.33% (Ren and Ji, 2017), but research by Ott et al. (2013) shows that detection using machine learning methods is possible. This research aims to create CNN Ensemble models, which apply the concept of a collection of neural networks with a CNN as the base model, to further increase the reliability of the detection model given the existence of outliers in the base model. The CNN Ensemble implementation consists of 30 CNN models whose individual results are combined to produce the final CNN Ensemble result. Of the 30 CNN models, 15 use GloVe word embeddings and the other 15 use FastText word embeddings. Each individual model starts by embedding the reviews to obtain their numerical representations, then applies convolutional, pooling, and activation layers to obtain a vector representation of each sentence, and then applies the same kinds of layers again to obtain a vector representation of the whole review. Lastly, the review vector is used to produce the final prediction of the individual model. The CNN Ensemble model achieves an accuracy of 82%. Finally, CNN Ensemble experiments using 30 CNNs and 10 CNNs show that increasing the number of individual models in the ensemble, without adding more variation among the individual models, yields insignificant improvements.
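The abstract does not state how the individual results are combined, so the following is only a minimal sketch of two common options, assuming each CNN outputs a probability that a review is deceptive. The function names `soft_vote` and `majority_vote`, the 0.5 threshold, and the label encoding (1 = deceptive, 0 = truthful) are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Sequence


def _check_shape(member_probs: Sequence[Sequence[float]]) -> int:
    """Validate the (n_models x n_reviews) input and return n_reviews."""
    if len(member_probs) == 0:
        raise ValueError("need at least one model")
    n_reviews = len(member_probs[0])
    if any(len(row) != n_reviews for row in member_probs):
        raise ValueError("every model must score the same number of reviews")
    return n_reviews


def soft_vote(member_probs: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average the per-model probabilities, then threshold (assumed scheme)."""
    n_reviews = _check_shape(member_probs)
    n_models = len(member_probs)
    labels = []
    for j in range(n_reviews):
        mean_p = sum(row[j] for row in member_probs) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def majority_vote(member_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> List[int]:
    """Threshold each model first, then take a strict majority (assumed scheme)."""
    n_reviews = _check_shape(member_probs)
    n_models = len(member_probs)
    labels = []
    for j in range(n_reviews):
        votes = sum(1 for row in member_probs if row[j] >= threshold)
        # A tie counts as "truthful" here; this tie rule is an arbitrary choice.
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


# Hypothetical scores from three models on two reviews; the second model
# behaves like an outlier on the first review and is outweighed by the others.
example_probs = [
    [0.9, 0.4],
    [0.2, 0.3],
    [0.8, 0.6],
]
print(soft_vote(example_probs))
print(majority_vote(example_probs))
```

In this toy input both schemes label the first review as deceptive despite one dissenting model, which is the kind of robustness to outlier base models that the abstract gives as the motivation for the ensemble.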

Keywords: Convolutional Neural Network Ensemble, Convolutional Neural Network, Deceptive Opinion Spam Detection, Outliers
