A new resampling approach based on weighted geometric mean for unbalanced data

Gümüş, İbrahim Halil; Güldal, Serkan; Dal, Abdullah; Yavaş, Mustafa

dc.contributor.author	Gümüş, İbrahim Halil
dc.contributor.author	Güldal, Serkan
dc.contributor.author	Dal, Abdullah
dc.contributor.author	Yavaş, Mustafa
dc.date.accessioned	2022-02-08T12:40:56Z
dc.date.available	2022-02-08T12:40:56Z
dc.date.issued	2021
dc.identifier.issn	2149-0309
dc.identifier.uri	http://dspace.adiyaman.edu.tr:8080/xmlui/handle/20.500.12414/2369
dc.description.abstract	In recent years, data classification using machine learning methods has advanced considerably. As technology advances, the volume of data on the internet and in other environments is growing rapidly. Along with this growth, unbalanced and unclassified data have emerged. The imbalance problem is the situation in which one of two classes has fewer samples than the other. Most datasets, particularly those used in the medical field, have an unbalanced distribution. A dataset with an unbalanced distribution negatively affects the performance of classification algorithms. Many studies have been carried out to balance and classify such distributions. These studies operate at the data level and the algorithm level, and consist of undersampling and oversampling by means of resampling. In this study, the existing minority-class samples were synthetically oversampled and the datasets were balanced. For the resampling step, the nearest neighbors of every data point were identified among the minority-class samples using the Euclidean distance metric. Based on these neighbors, the desired number of new synthetic samples was generated between the samples using the Weighted Geometric Mean. As a result, the datasets were balanced. In addition, Random Undersampling (RUS), Random Oversampling (ROS), and the Synthetic Minority Over-sampling Technique (SMOTE) were also used to balance the datasets. The original and balanced datasets were classified with the Random Forest algorithm and the results were compared. An increase was observed in all performance values for the datasets balanced with the resampling approach. The datasets balanced by resampling with the proposed approach were shown to improve classification performance compared with the raw dataset and the other methods.	tr
dc.description.abstract	In recent years, data classification with machine learning methods has improved considerably. As technology advances, the volume of data on the internet and in other environments is growing rapidly, and with it unbalanced and unclassified data have emerged. The imbalance problem arises when one of two classes has fewer samples than the other. Most datasets, especially those used in the medical field, have an unbalanced distribution, which degrades the performance of classification algorithms. Many studies have sought to balance and classify such data. These studies work at the data level and the algorithm level, and rely on undersampling and oversampling. In this study, the existing minority-class samples were resampled synthetically and the datasets were balanced. For the resampling process, the nearest neighbors of every data point among the minority-class samples were determined using the Euclidean distance metric. Based on these neighbors, the desired number of new synthetic samples was created between the samples using the Weighted Geometric Mean, which balanced the datasets. In addition, Random Undersampling (RUS), Random Oversampling (ROS), and the Synthetic Minority Over-sampling Technique (SMOTE) were also used to balance the datasets. The raw and balanced datasets were classified with the Random Forest algorithm and the results were compared. All performance values increased for the datasets balanced with the new resampling approach. The datasets balanced with the proposed method improved classification performance compared with the raw dataset and the other methods.	tr
dc.language.iso	en	tr
dc.publisher	Adıyaman Üniversitesi	tr
dc.subject	Yeniden Örnekleme	tr
dc.subject	Ağırlıklı Geometrik Ortalama	tr
dc.subject	Dengesiz Veri	tr
dc.subject	SMOTE	tr
dc.subject	Resampling	tr
dc.subject	Weighted Geometric Mean	tr
dc.subject	Unbalanced Data	tr
dc.title	Dengesiz veriler için ağırlıklı geometrik ortalama tabanlı yeni bir yeniden örnekleme yaklaşımı	tr
dc.title.alternative	A new resampling approach based on weighted geometric mean for unbalanced data	tr
dc.type	Article	tr
dc.contributor.department	Adıyaman University, Graduate Education Institute, Department of Mathematics, Adıyaman, 02040, Turkey	tr
dc.contributor.department	Adıyaman University, Faculty of Arts and Sciences, Department of Physics, Adıyaman, 02040, Turkey	tr
dc.contributor.department	Adıyaman University, Vocational School of Technical Sciences, Department of Computer Technologies, Adıyaman, 02040, Turkey	tr
dc.identifier.endpage	352	tr
dc.identifier.issue	14	tr
dc.identifier.startpage	343	tr
dc.identifier.volume	8	tr
dc.source.title	Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi	tr
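
The resampling step summarized in the abstract (Euclidean nearest neighbors within the minority class, then a weighted geometric mean between a sample and one of its neighbors) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not give the number of neighbors, how the weights are chosen, or how non-positive feature values are handled. The names wgm_oversample, nearest_neighbors and weighted_geometric_mean, the parameter k, and the uniformly drawn weight w are all hypothetical, and features are assumed to be strictly positive so that the geometric mean x**w * y**(1 - w) is defined.

```python
import math
import random


def nearest_neighbors(points, index, k):
    """Return indices of the k points closest to points[index] (Euclidean)."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def weighted_geometric_mean(x, y, w):
    """Component-wise x**w * y**(1 - w); assumes strictly positive features."""
    return [math.exp(w * math.log(a) + (1.0 - w) * math.log(b))
            for a, b in zip(x, y)]


def wgm_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))            # pick a minority sample
        j = rng.choice(nearest_neighbors(minority, i, k))  # and one neighbor
        w = rng.random()  # assumption: weight drawn uniformly from [0, 1)
        synthetic.append(weighted_geometric_mean(minority[i], minority[j], w))
    return synthetic


# Toy minority class with two positive features (made-up values).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 4.0], [8.0, 9.0]]
new_points = wgm_oversample(minority, n_new=6, k=2)
```

Because a weighted geometric mean of two positive numbers always lies between them, each synthetic feature value stays within the range spanned by the sample and its chosen neighbor, much as SMOTE's linear interpolation does with the arithmetic mean.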