Abstract:
People express their reactions in the media in many ways, and text is one of the most common:
comments, reviews, blog posts, messages, and so on. Analyzing the emotions expressed in such text is
in high demand for a variety of purposes. This study presents a method for performing
sentiment analysis of text using machine learning. The authors trained a classifier based
on the BERT encoder that recognizes emotions in English-language text messages written in chat style.
To handle raw chat-style messages, the authors developed an enhanced text standardization layer. The
list of emotions identified includes admiration, amusement, anger, annoyance, approval, caring,
confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear,
gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and
surprise. The model addresses a multi-class, multi-label text classification problem, meaning
that more than one class can be predicted for a single piece of text. The authors trained the model on
the GoEmotions dataset, which consists of 54,263 text comments from Reddit. On the test set, the model
achieved a macro-averaged F1-score of 0.50704 for emotion prediction and 0.7349 for sentiment
prediction. Compared with the baseline approach, the presented model improved the quality of emotion
prediction by 10.2% and of sentiment prediction by 6.5%.
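The abstract does not spell out the decision rule or how the metric is computed. The minimal Python sketch below illustrates the two ideas it relies on, assuming a common setup rather than the authors' exact one: in multi-label classification each emotion receives an independent sigmoid score and every class scoring at or above a threshold (0.5 here, an assumed value) is predicted, and the macro-averaged F1-score is the unweighted mean of per-class F1 scores. The three-label subset, the logits, and the gold labels are hypothetical illustrative data, not GoEmotions results.

```python
import math

# Hypothetical subset of labels; GoEmotions itself has 27 emotions plus "neutral".
LABELS = ["joy", "surprise", "anger"]


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: list[float], threshold: float = 0.5) -> list[int]:
    """Multi-label decision: each class is scored independently with a sigmoid,
    so zero, one, or several classes can be predicted for one text.
    The 0.5 threshold is an assumption, not a value taken from the paper."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def macro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Macro-averaged F1: compute F1 = 2TP / (2TP + FP + FN) for each class,
    then take the unweighted mean, so rare and frequent classes count equally.
    A class with no true and no predicted positives gets F1 = 0.0 here."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(t[c] == 1 and p[c] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[c] == 0 and p[c] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[c] == 1 and p[c] == 0 for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


if __name__ == "__main__":
    # Hypothetical per-message logits for (joy, surprise, anger).
    logits = [[2.0, 0.3, -1.5], [-0.5, 1.2, 0.8], [0.1, -2.0, -0.2]]
    y_true = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
    y_pred = [predict_labels(row) for row in logits]
    for row in y_pred:
        print([name for name, flag in zip(LABELS, row) if flag])
    print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Running it prints the predicted label sets ['joy', 'surprise'], ['surprise', 'anger'], and ['joy'], followed by a macro F1 of 0.5556. The "anger" class contributes an F1 of 0 because its only prediction is a false positive, which shows why macro averaging is sensitive to errors on individual, possibly rare, emotions.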