MultiEmotions-It: a New Dataset for Opinion Polarity and Emotion Analysis for Italian

Abstract. This paper presents a new linguistic resource for Italian, called MultiEmotions-It, containing comments on music videos and advertisements posted on YouTube and Facebook. These comments are manually annotated according to four different dimensions: relatedness, opinion polarity, emotions and sarcasm. For the annotation of emotions we adopted Plutchik's model, taking into account both basic and complex emotions, i.e., dyads.


Introduction
Emotions play an influential role in consumer behaviour, affecting the decision to purchase goods and services of different types, including music (Mizerski and White, 1986; Lacher, 1989). Both positive and negative emotions have an influence, and this is why marketing strategies have always focused on both rational and emotional aspects (Cotte and Ritchie, 2005). With the advent of social media, platforms such as YouTube and Facebook have gained importance in the marketing industry because they make it possible to connect with and engage consumers (Kujur and Singh, 2018). The progressive consolidation of social media as marketing spaces has highlighted the need to monitor unstructured data written by social media users. In this context, the application of Sentiment Analysis techniques has flourished with the aim of tracking customers' opinions and attitudes by analysing comments or reviews posted on social media channels (Micu et al., 2017). In this paper we present a new linguistic resource for Italian, called MultiEmotions-It, containing comments on music videos and advertisements posted on YouTube and Facebook. Comments are manually annotated according to four different dimensions: relatedness, opinion polarity, emotions and sarcasm. Particular attention is devoted to the annotation of emotions, for which we adopted the model proposed by Plutchik (1980). Following Plutchik, we take into consideration both the eight basic emotions (joy, sadness, fear, anger, trust, disgust, surprise, anticipation) and the dyads, that is, feelings composed of two basic emotions (e.g., love is a blend of joy and trust). At the time of writing, MultiEmotions-It is the only freely available manually annotated dataset for emotion analysis for Italian.

Related Works
The computational study of opinions and emotions falls within the scope of the Sentiment Analysis research field (Liu, 2012). Opinion polarity identification is a task that aims at understanding whether a text expresses a positive, negative or neutral sentiment towards its subject. As for emotions, their analysis follows two main approaches (Buechel and Hahn, 2017): in the first, emotions are classified into discrete categories based on the theories of psychologists such as Ekman (Ekman, 1992) and Plutchik, whereas in the second, emotions are represented in a dimensional form using continuous values such as valence, arousal and dominance (the so-called VAD model). Survey papers like the ones by Hakak et al. (2017), Bostan & Klinger (2018) and Kim & Klinger (2019) report on studies that focus on different text genres, mainly news (Strapparava and Mihalcea, 2007), social media (Mohammad, 2012) and literary works (Alm et al., 2005).
Among social media, Twitter is the most studied platform, and datasets of annotated tweets are available for different Sentiment Analysis tasks. For emotion analysis see, among others, EmpaTweet (Roberts et al., 2012) and EmoTweet (Liew et al., 2016). The literature also reports works on Facebook posts and YouTube comments, with corpora and systems developed for various languages such as English (Preoţiuc-Pietro et al., 2016), Thai (Sarakit et al., 2015), Bangla (Tripto and Ali, 2018) and Indonesian (Savigny and Purwarianti, 2017). As for Italian, there are several emotion lexicons, for example (Araque et al., 2019; Passaro and Lenci, 2016; Mohammad and Turney, 2013; Mohammad, 2018), but, at the moment, no dataset with annotated emotions has been released (see footnote 3). Similarly to SenTube (Uryupina et al., 2014), MultiEmotions-It includes YouTube comments and contains the annotation of opinion polarity; however, we also include comments on Facebook posts and we pay particular attention to the categorical annotation of emotions. More specifically, our emotion annotation is inspired by that proposed by Phan et al. (2016), which goes beyond the classification of the basic emotions alone to include Plutchik's dyads, so as to better capture the spectrum of human emotional experience.

Data Collection
Comments were scraped from YouTube and Facebook around mid-April 2020 using "Web Scraper", a browser extension. We focused on two genres of media content: music videos (MVs) on YouTube and advertisements (Ads), both in the form of short videos (on YouTube and Facebook) and pictures (only on Facebook). We chose 9 music videos of the songs presented during Sanremo Music Festival 2020, selecting both songs that reached the top of the chart in the contest and those that ranked in the last positions. All those videos had thousands of comments: we downloaded the most recent ones, at least one hundred comments per video. Finding advertising videos with lots of comments on YouTube was more complicated because many brands disable the possibility of adding comments to their channel. In the end, we managed to select 20 videos of various products, mostly of food and services, such as telecommunication and banking. Similar products and services were also chosen on Facebook by downloading the comments from 13 different posts.
Footnote 3: Annotated datasets for emotion analysis have been mainly developed in enterprises and are not public; see, for example, (Bolioli et al., 2013).

Data Annotation
The annotation was performed in the context of the "Sentiment Analysis" seminar held within the "Comunicazione per l'impresa, i media e le organizzazioni complesse" master's degree at Università Cattolica del Sacro Cuore in Milan. The annotation process lasted one week and involved thirty-six students: each student annotated 30 comments for each category (i.e., YouTube MVs, YouTube Ads, Facebook Ads) for a total of 90 comments. Each comment was annotated by two students. It is important to note that students had no previous experience in linguistic annotation but had specific training in the strategic management of communication flows on various media platforms.
Annotation Guidelines. Students were required to annotate the following four dimensions for each comment; a comment may consist of more than one sentence but was analysed as a single unit:
1. Relatedness: does the comment refer to the media content? Is the comment written in a [...]
[Table 2 column headers: COMMENT, UNR, NEU, POS, NEG, JOY, TRU, SAD, ANG, FEA, DIS, SUR, ANT, SAR, EMOTIONS; the first example comment begins "Saludos desde"]
[...] (2000), we define sarcasm as a language device that conveys the opposite of its literal meaning (Cignarella et al., 2018).
Annotation was carried out using spreadsheets in which the aforementioned dimensions were converted into 13 fields: unrelated, neutral, positive, negative, joy, trust, sadness, anger, fear, disgust, surprise, anticipation, sarcasm. Each field had to be filled in with a binary value: 0 (the dimension is absent) or 1 (the dimension is present). Spreadsheets contained 4 additional metadata fields: type, title, URL, comment. For the annotation, students were provided with the images of Plutchik's "Wheel of Emotions" and of the combination of emotions in dyads.
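The spreadsheet layout described above can be illustrated with a minimal sketch that validates one row. The column names are assumed to match the field names listed in the text (the actual spreadsheet headers may differ, for example by using abbreviations), and `parse_row` is a hypothetical helper, not part of the released resource.

```python
# The 13 binary annotation fields and 4 metadata fields named in the text.
# Assumption: the spreadsheet columns use exactly these names.
LABEL_FIELDS = (
    "unrelated", "neutral", "positive", "negative",
    "joy", "trust", "sadness", "anger",
    "fear", "disgust", "surprise", "anticipation", "sarcasm",
)
METADATA_FIELDS = ("type", "title", "URL", "comment")


def parse_row(row):
    """Check that a row has all 17 fields and that every label cell is 0 or 1."""
    missing = [f for f in LABEL_FIELDS + METADATA_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    parsed = {f: row[f] for f in METADATA_FIELDS}
    for field in LABEL_FIELDS:
        value = str(row[field]).strip()
        if value not in ("0", "1"):
            raise ValueError(f"{field} must be 0 or 1, got {value!r}")
        parsed[field] = int(value)
    return parsed
```

A check of this kind would catch cells left empty or filled with values other than 0 and 1 before agreement is computed.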
Inter-Annotator Agreement. Table 1 reports the results of the inter-annotator agreement (IAA): we measured Krippendorff's Alpha for each label and for each pair of annotators, and then we computed the average for each type of comment. The average across the three types of comments is reported in the table as well. For all the labels, IAA is below the 0.8 threshold usually considered as good reliability for content analysis research (Klaus, 1980; Artstein and Poesio, 2008); however, these results are in line with the ones obtained in similar works presenting a multi-label annotation of emotions or the annotation of mixed emotions (Aman and Szpakowicz, 2007; Phan et al., 2016). The analysis of the cases of disagreement revealed several interesting issues: i) the labels unrelated and neutral tended to be confused with each other. For example, the comment Qualcuno mi sa dire dove si trova il porticato della quinta immagine? (EN: Can anyone tell me where the portico in the fifth image is located?) is related to the content of the video but it is neutral; ii) sarcasm was confused with other forms of figurative language such as metaphors, e.g., È l'Ibrahimovic dei biscotti: perfetto (EN: it is the Ibrahimovic of biscuits: perfect); iii) the assignment of positive and negative labels registered the highest scores (average Alpha across the 3 categories: 0.71 for positive and 0.61 for negative). Nevertheless, annotators sometimes failed to distinguish between the annotation of opinion polarity and the annotation of emotions, assigning a negative polarity to comments containing negative emotions. However, the two dimensions do not always match: for example, the comment sta canzone meritava molto di più (EN: this song deserved much more) expresses disappointment but also an implicit appreciation for the song and thus a positive opinion polarity; iv) the IAA on the single emotion labels varies greatly: a similarly wide variability is also reported in previous works, even when the annotation is not multi-label (Strapparava and Mihalcea, 2008; Aman and Szpakowicz, 2007).
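As an illustration of the agreement measure, the following is a minimal sketch of Krippendorff's Alpha for one binary label and one pair of annotators, using the standard nominal-data formulation with no missing values. It is not the script used for Table 1; the function name and the toy data in the usage note are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's Alpha for two coders, nominal values, no missing data."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must annotate the same units")
    # Coincidence matrix: each unit contributes both ordered value pairs.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n = 2 * len(coder_a)  # total number of pairable values
    marginals = Counter()
    for (c, _k), count in coincidences.items():
        marginals[c] += count
    # Observed and expected disagreement, up to the common factor 1/n.
    observed = sum(cnt for (c, k), cnt in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c, k in permutations(marginals, 2))
    if expected == 0:
        return float("nan")  # undefined when only one value is ever used
    return 1.0 - (n - 1) * observed / expected
```

For example, with the toy annotations `[0, 1, 0, 0, 0, 0, 0, 0, 1, 0]` and `[1, 1, 1, 0, 0, 1, 0, 0, 0, 0]` the two coders disagree on 4 of 10 units and Alpha is about 0.095. Under the procedure described above, such a value would be computed per label and per annotator pair and then averaged.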
Creation of the Ground Truth. All comments were manually revised and disagreements were reconciled so as to assign gold labels. In this way, we generated a ground truth dataset in which the noise introduced by non-expert annotators was minimized. Moreover, the field emotions was added to the spreadsheets so as to make explicit the names of the emotions conveyed by the comments. Table 2 shows the structure of the final dataset (metadata fields are not displayed due to space limitations) and some examples of annotation. In particular, the table reports: an unrelated comment, a neutral comment, a comment with a negative polarity, a basic emotion (i.e., disgust) and sarcasm, a comment with a negative polarity and a dyad (i.e., despair, which is made of sadness and fear), and a comment with mixed polarity and mixed emotions.

Dataset Analysis
Table 3 summarizes the statistics of our final dataset, showing the distribution of labels in the three categories of media content. MultiEmotions-It contains 3,240 comments for a total of more than 58,000 tokens. Only 470 comments (14.5% of the whole dataset) have no associated emotions because they were annotated as unrelated or neutral. Comments with positive opinion polarity outnumber those with negative polarity: this is especially evident for YouTube MVs, which are mostly commented on by supporters of the artists performing in the video. Sarcasm is not a pervasive phenomenon: the number of comments annotated with the corresponding label is marginal, covering 1.6% of the total number of comments with an affective content, i.e., annotated with at least one emotion. More specifically, sarcasm co-occurs with two basic emotions: anger (10 comments) and disgust (9 comments). As for emotions, trust is the most frequent one: indeed, many comments express admiration towards the media content in different ways, for example by thanking the brand, declaring loyalty to a product or expressing appreciation for a specific feature of the media content (e.g., the location of the video). The emotion trust appears in the dataset not only as a basic emotion but also in several combinations: indeed, 36.5% of the comments with an affective content are annotated with a dyad and 18.3% with a mix of emotions. Table 4 reports the 3 most frequent dyads and mixes of emotions in the dataset together with an example. As shown in the table, sentimentality (that is, a combination of trust and sadness) plays an important role in Ads that try to induce a deep, overwhelming emotional response. Indeed, sentimentality is an emotion that marketing research has identified as a fundamental purchase decision variable (Morton et al., 2013).
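The relation between the binary basic-emotion fields and the emotion names can be sketched as a lookup over pairs of basic emotions. The sketch below covers only the dyads whose composition is stated in this paper (love, sentimentality, optimism, pessimism). The function name and the rule that a comment receives a dyad only when exactly two basic emotions are active are assumptions made for illustration, not the procedure used to build the dataset.

```python
BASIC_EMOTIONS = (
    "joy", "sadness", "fear", "anger",
    "trust", "disgust", "surprise", "anticipation",
)

# Only the dyads whose composition is given in the text.
DYADS = {
    frozenset({"joy", "trust"}): "love",
    frozenset({"trust", "sadness"}): "sentimentality",
    frozenset({"anticipation", "joy"}): "optimism",
    frozenset({"anticipation", "sadness"}): "pessimism",
}


def emotion_names(labels):
    """Map binary basic-emotion labels to a list of emotion names.

    Assumption: exactly two active basic emotions forming a known dyad
    yield the dyad name; any other combination is returned as a list of
    the active basic emotions.
    """
    active = [e for e in BASIC_EMOTIONS if labels.get(e, 0) == 1]
    if len(active) == 2:
        dyad = DYADS.get(frozenset(active))
        if dyad is not None:
            return [dyad]
    return active
```

For instance, a comment labelled with joy and trust would be named love, whereas a comment labelled with joy and anger would keep both basic emotions.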
Optimism (anticipation + joy) and pessimism (anticipation + sadness) are not very frequent in the dataset with 65 and 16 occurrences respectively. However, it is interesting to note that they are mainly associated with comments on advertisements related to the COVID-19 pandemic, for example:

Baseline System
To establish a baseline on our data, we developed a simple multi-label classification model using the fastText library (Joulin et al., 2016). The aim of the model is to assign the correct emotion labels to comments. To this end, we randomly split comments and their annotated emotion labels into train and validation sets following an 80:20 ratio, thus having 2,592 comments for training and the remaining 648 for testing the performance of the learned classifier on new data. Texts have been lowercased and punctuation removed. We trained the model with the following parameters:
• learning rate: 0.5
• epochs: 25
• word n-grams: 2
• loss function: one-vs-all
With the previous setting, we obtained 0.57 Precision, 0.43 Recall and 0.49 F-measure. Only four labels registered an F-measure above 0.5: trust (0.68), love (0.54), delight (0.53) and sentimentality (0.50).
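The baseline setup can be approximated with the sketch below, which is not the original training script. The helper names, the random seed, the file path argument and the choice to replace punctuation with spaces are assumptions; the training call uses the `train_supervised` function of the fastText Python module with the parameters listed above.

```python
import random
import re


def preprocess(text):
    """Lowercase and drop punctuation (assumption: replaced by spaces)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def to_fasttext_line(text, emotions):
    """Format one comment in fastText's '__label__x ... text' layout."""
    labels = " ".join(f"__label__{e}" for e in emotions)
    return f"{labels} {preprocess(text)}".strip()


def split_80_20(items, seed=0):
    """Random 80:20 split; the seed is an arbitrary illustrative choice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100
    return shuffled[:cut], shuffled[cut:]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def train_baseline(train_path):
    """Train with the reported parameters (requires the fasttext package)."""
    import fasttext  # third-party dependency, imported only when training
    return fasttext.train_supervised(
        input=train_path, lr=0.5, epoch=25, wordNgrams=2, loss="ova"
    )
```

With 3,240 comments the split yields 2,592 training and 648 held-out items, and `f_measure(0.57, 0.43)` rounds to the reported 0.49.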

Conclusion and Future Work
This paper describes MultiEmotions-It, a new manually annotated dataset for opinion polarity and emotion analysis made of more than 3,000 comments on music videos and advertisements published on YouTube and Facebook. As for future work, we plan to: (i) extend the annotation guidelines to distinguish the specific object towards which the opinion is directed (e.g. the product, the actor, the location of the video) following the work by Severyn et al. (2016), (ii) extend the dataset with new comments taken also from Instagram and Twitter, (iii) extract a new word-emotion association lexicon from MultiEmotions-It using vector space models (Passaro et al., 2015) in order to cover complex emotions.