2021, Vol. 12, No. 4.
Permanent address of this page - https://sfk-mn.ru/en/21scsk421.html
The metadata for this article is also available in Russian
Full article in PDF format (file size: 537.2 KB)
For citation:
Samoylova T.A., Griber Yu.A. [A comparative study of the effectiveness of data classification algorithms in sociological analysis of color names] World of Science. Series: Sociology, Philology, Cultural Studies, 2021, Vol. 12, No. 4. Available at: https://sfk-mn.ru/PDF/21SCSK421.pdf (in Russian).
A comparative study of the effectiveness of data classification algorithms in sociological analysis of color names
Samoylova Tatyana Arkadyevna
Smolensk State University, Smolensk, Russia
E-mail: tatsamoilova24@gmail.com
RSCI: https://elibrary.ru/author_profile.asp?id=100995
Griber Yulia Alexandrovna
Smolensk State University, Smolensk, Russia
E-mail: y.griber@gmail.com
ORCID: https://orcid.org/0000-0002-2603-5928
RSCI: https://elibrary.ru/author_profile.asp?id=303167
Researcher ID: https://www.researcherid.com/rid/AAG-4410-2019
SCOPUS: https://www.scopus.com/authid/detail.url?authorId=56809444600
Abstract. The article presents the results of a comparative study of classification algorithms for predicting a respondent's gender from their answers in an online experiment that studied the social differentiation of the Russian color-naming system. The data were collected in an online experiment (http://colournaming.com) in which 2457 native Russian speakers (1402 women, 1055 men) from different age groups, ranging from 16 to 98 years old (mean age = 41.36 years, SD = 17.71), participated in 2018–2020. Each of the answers received in the course of the study (N = 55515) contained characteristics of fundamentally different kinds: not only the coordinates of the color sample in the CIELAB system and the color name assigned to it (a simple or compound word, a phrase, or a whole sentence), but also socio-demographic information about the respondent's sex and age, place of birth and permanent residence, educational level, and profession. The authors analyze various classification algorithms using the NumPy, Pandas, and Scikit-learn libraries for the Python programming language. The effectiveness of the classifiers is evaluated by accuracy, precision, F1-score, and the receiver operating characteristic (ROC) curve. Simulation results show that the decision tree algorithm classifies the data with 92% accuracy and an area under the ROC curve (AUC) of 0.99, which makes it the best of the compared algorithms for processing and analyzing the collected data. The methodology presented in the paper for assessing the effectiveness of a classification algorithm with a set of complementary metrics can serve as a model for selecting the most appropriate software tool in further sociological research, taking into account the specifics of a particular case.
Keywords: experiment; sociological data analysis; social differentiation of language; color naming; machine learning; classification; Python
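The evaluation approach described in the abstract (several classifiers compared on accuracy, precision, F1-score, and ROC AUC) can be illustrated with a minimal sketch in Scikit-learn. This is not the authors' code. The data are synthetic, the column names (`L`, `a`, `b`, `age`, `name_length`) and the label rule are hypothetical stand-ins for the real survey features, and Gaussian naive Bayes is an arbitrary comparison model chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the survey answers (assumption: the real data set
# has CIELAB coordinates plus demographic and color-name features).
df = pd.DataFrame({
    "L": rng.uniform(0, 100, n),
    "a": rng.uniform(-100, 100, n),
    "b": rng.uniform(-100, 100, n),
    "age": rng.integers(16, 99, n),
    "name_length": rng.integers(3, 30, n),
})

# Hypothetical binary label that depends weakly on two features plus noise.
score = 0.03 * df["a"] + 0.1 * df["name_length"] + rng.normal(0, 1, n)
df["gender"] = (score > score.median()).astype(int)

X = df.drop(columns="gender")
y = df["gender"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def evaluate(model):
    """Fit the model and return the four complementary metrics on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }


# Illustrative model choices; the abstract names only the decision tree.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive_bayes": GaussianNB(),
}

# One row per model, one column per metric.
results = pd.DataFrame.from_dict(
    {name: evaluate(model) for name, model in models.items()}, orient="index"
)
print(results.round(3))
```

Reporting the metrics side by side, rather than accuracy alone, makes it easier to see when a model with similar accuracy differs in precision or ranking quality. The numbers printed here come from synthetic data and say nothing about the results reported in the article.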
This work is licensed under a Creative Commons Attribution 4.0 License.
ISSN 2542-0577 (Online)