Optimizing hierarchical classifiers with parameter tuning and confidence scoring

Mashtalir, Sergii V.; Машталір, Сергій Володимирович; Nikolenko, Oleksandr V.; Ніколенко, Олександр Володимирович

Please use this identifier to cite or link to this item: http://dspace.opu.ua/jspui/handle/123456789/14686

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mashtalir, Sergii V.	-
dc.contributor.author	Машталір, Сергій Володимирович	-
dc.contributor.author	Nikolenko, Oleksandr V.	-
dc.contributor.author	Ніколенко, Олександр Володимирович	-
dc.date.accessioned	2024-10-13T13:04:37Z	-
dc.date.available	2024-10-13T13:04:37Z	-
dc.date.issued	2024-09-27	-
dc.identifier.issn	2663-0176	-
dc.identifier.issn	2663-7731	-
dc.identifier.uri	http://dspace.opu.ua/jspui/handle/123456789/14686	-
dc.description.abstract	Hierarchical classifiers play a crucial role in addressing complex classification tasks by breaking them down into smaller, more manageable sub-tasks. This paper continues a series of works focused on the hierarchical classification of technical Ukrainian texts, specifically the classification of repair works and spare parts used in automobile maintenance and servicing. We tackle the challenges posed by multilingual data inputs – specifically Ukrainian, Russian, and their hybrid – and the lack of standard data cleaning models for the Ukrainian language. We developed a novel classification algorithm, which employs TF-IDF vectorization with unigrams and bigrams, keyword selection, and cosine similarity for classification. This paper describes a method for training and evaluating a hierarchical classification model using parameter tuning for each node in a tree structure. The training process involves initializing weights for tokens in the class tree nodes and input strings, followed by iterative parameter tuning to optimize classification accuracy. Initial weights are assigned based on predefined rules, and the iterative process adjusts these weights to achieve optimal performance. The paper also addresses the challenge of interpreting multiple confidence scores from the classification process, proposing a machine learning approach using Scikit-learn's GradientBoostingClassifier to calculate a unified confidence score. This score helps assess classification reliability, particularly for unlabeled data, by transforming input values, generating polynomial parameters, and using logarithmic transformations and scaling. The classifier is fine-tuned using hyperparameter optimization techniques, and the final model provides a robust confidence score for classification tasks, enabling the verification and optimization of classification results across large datasets.
Our experimental results demonstrate significant improvements in classification performance. Overall classification accuracy nearly doubled after training, reaching 92.38 %. This research not only advances the theoretical framework of hierarchical classifiers but also provides practical solutions for processing large-scale, unlabeled datasets in the automotive industry. The developed methodology can enhance various applications, including automated customer support systems, predictive maintenance, and decision-making processes for stakeholders like insurance companies and service centers. Future work will extend this approach to more complex tasks, such as extracting and classifying information from extensive text sources like telephone call transcriptions.	en
dc.description.abstract	Ієрархічні класифікатори відіграють вирішальну роль у вирішенні складних задач класифікації, розбиваючи їх на менші, більш керовані підзадачі. Ця стаття продовжує серію робіт, зосереджених на ієрархічній класифікації технічних українських текстів, зокрема класифікації ремонтних робіт та запасних частин, що використовуються в обслуговуванні та ремонті автомобілів. Ми вирішуємо питання, пов'язані з багатомовними вхідними даними – зокрема українською, російською та їх міксом – і відсутністю стандартних моделей попередньої обробки даних для української мови. У цій статті описується метод навчання та оцінювання моделі ієрархічної класифікації за допомогою налаштування параметрів для кожного вузла в деревоподібній структурі. Процес навчання включає ініціалізацію ваг для токенів у вузлах дерева класів та вхідних рядках, після чого проводиться ітеративне налаштування параметрів для оптимізації точності класифікації. Початкові ваги призначаються на основі наперед визначених правил, а ітеративний процес коригує ці ваги для досягнення оптимальної продуктивності. Стаття також розглядає проблему інтерпретації множинних показників впевненості, отриманих з процесу класифікації, пропонуючи підхід машинного навчання з використанням GradientBoostingClassifier з бібліотеки Scikit-learn для розрахунку уніфікованого показника впевненості. Цей показник допомагає оцінити надійність класифікації, особливо для нерозмічених даних, шляхом трансформації вхідних значень, генерації поліноміальних параметрів та використання логарифмічних перетворень і масштабування. Класифікатор точно налаштовується за допомогою технік оптимізації гіперпараметрів, а фінальна модель забезпечує надійний показник впевненості для задач класифікації, дозволяючи перевіряти та оптимізувати результати класифікації на великих наборах даних. Загальна точність класифікації майже подвоїлася після навчання, досягнувши 92.38 %.
Це дослідження не тільки просуває теоретичну основу ієрархічних класифікаторів, але й надає практичні рішення для обробки великомасштабних, нерозмічених наборів даних в автомобільній індустрії. Майбутні роботи будуть спрямовані на розширення цього підходу на більш складні задачі, такі як знаходження та класифікація інформації з великих текстів, наприклад, транскрипцій телефонних дзвінків.	en
dc.language.iso	en	en
dc.publisher	Odessa Polytechnic National University	en
dc.subject	Natural language processing	en
dc.subject	tree-based classification	en
dc.subject	machine learning	en
dc.subject	data analysis	en
dc.subject	applied intelligent systems	en
dc.subject	обробка природної мови (NLP)	en
dc.subject	деревоподібна класифікація	en
dc.subject	машинне навчання	en
dc.subject	аналіз даних	en
dc.subject	прикладні інтелектуальні системи	en
dc.title	Optimizing hierarchical classifiers with parameter tuning and confidence scoring	en
dc.title.alternative	Оптимізація ієрархічних класифікаторів шляхом налаштування параметрів та оцінки впевненості	en
dc.type	Article	en
opu.citation.journal	Herald of Advanced Information Technology	en
opu.citation.volume	3	en
opu.citation.firstpage	231	en
opu.citation.lastpage	242	en
opu.citation.issue	7	en
Appears in Collections:	2024, Vol. 7, № 3
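
The abstract above names TF-IDF vectorization over unigrams and bigrams, followed by cosine similarity, as the core of the classifier. The following is a minimal sketch of that idea only, under stated assumptions: it is flat (one level, not a class tree), it omits the keyword selection, per-node token weights, and iterative tuning described in the abstract, and every name in it (`ngrams`, `fit_idf`, `classify`, the toy `CLASSES` descriptions) is hypothetical rather than taken from the article. The smoothed IDF formula is one common choice, not necessarily the authors'.

```python
import math
from collections import Counter


def ngrams(text):
    """Lowercase, split on whitespace, and return unigrams plus bigrams."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def fit_idf(documents):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(ngrams(doc)))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; n-grams unseen at fit time are dropped."""
    counts = Counter(term for term in ngrams(text) if term in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, class_descriptions, idf):
    """Return the best-matching class label and its cosine similarity."""
    query = tfidf_vector(text, idf)
    scores = {
        label: cosine(query, tfidf_vector(description, idf))
        for label, description in class_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy class descriptions (assumption, not data from the article).
CLASSES = {
    "brakes": "replace front brake pads brake disc",
    "engine": "replace engine oil oil filter",
    "suspension": "replace front shock absorber spring",
}
IDF = fit_idf(list(CLASSES.values()))

label, score = classify("front brake pads replacement", CLASSES, IDF)
print(label, round(score, 3))
```

In a hierarchical setting, a function like `classify` would presumably be applied at each node of the class tree to choose a child, and the similarity values collected along the path are the kind of multiple confidence scores that the abstract proposes to merge into a single one.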

Files in This Item:

File	Description	Size	Format
1_ Машталір Ніколенко .pdf		1.23 MB	Adobe PDF


All items in the electronic archive are protected by copyright, with all rights reserved.