Tracing Metonymic Relations in T-PAS: An Annotation Exercise on a Corpus-based Resource for Italian

In this paper we address the main issues and results of a research thesis (Romani, 2020) dedicated to the annotation of metonymies in T-PAS, a corpus-based digital repository of Italian verbal patterns (Ježek et al., 2014). The annotation was performed on the corpus instances of a selected list of 30 verbs and was aimed both at enriching the resource with metonymic patterns and at identifying and mapping the metonymic relations that occur in the verbal patterns. The annotated corpus data (consisting of 1218 corpus instances), the patterns, and the relations can be useful for NLP tasks such as metonymy recognition.


Introduction
Metonymy is a language phenomenon in which one referent is used to denote another referent associated with it (Lakoff & Johnson, 1980; Fauconnier, 1985; Ježek, 2016). For example, in the sentence 'he drank a glass at the pub', glass (the metonymic or source type, denoting a container) refers to its content (the target type, a beverage). In our research, we investigated metonymy from a corpus-based perspective, through the analysis of corpus data and the annotation performed in T-PAS, a corpus-based resource for Italian verbs. T-PAS consists of a repository of typed predicate argument structures (Ježek et al., 2014), i.e. verbal patterns together with semantically-specified arguments, linked to manually annotated corpus instances (see Section 3.1). An example of a pattern (or t-pas) for the verb bere 'to drink' is reported in Figure 1.
The annotation of metonymies was performed on the corpus instances of a list of 30 verbs contained in T-PAS, taken from a background study. As emerged from this background study, the semantic properties of those verbs were likely to convey metonymies in their argument structures. Starting from this list, our work was intended as an extension of the resource; specifically, we annotated metonymic corpus instances and created metonymic sub-patterns linked to them.
The research had two main aims. First, we were interested in studying the phenomenon qualitatively in and through the corpus instances, and in extending the annotation tool of the resource with a specific feature that allowed us to encode metonymic arguments in the verbal patterns. For the latter purpose, we collaborated with the Faculty of Informatics at Masaryk University in Brno (CZ), which provided the technical support for extending the annotation tool.
Second, our intention was to conceive a general theoretical framework to represent the metonymies found through the qualitative corpus analysis, by designing a map of metonymies and by drafting a list of the metonymic relations that occur in the verbal patterns (see Section 4).
The paper is organized as follows. In Section 2 we present related studies. In Section 3 we describe the methodology we followed in annotating the corpus instances for metonymies, together with a brief introduction to T-PAS. In Section 4 we present the results of our annotation: a generalization of the metonymic relations found, and a map which visually highlights the semantic and cognitive connections between the semantic types. Further developments of the project are described in Section 5; our intention is to increase the number of analysed verbs and possibly add new types of metonymic relations.

Related works
Corpus-based studies on metonymy are often intended for NLP tasks. Markert & Nissim (2006) provide a corpus-based annotation scheme for metonymies with the aim of improving automatic metonymy recognition and resolution. Relatedly, Markert & Nissim (2007) present the results of a supervised task on metonymy resolution; an analogous task was addressed by Pustejovsky et al. (2010) within the scope of SemEval-2010. A more recent study developed a computational model for the detection of metonymies based on the dataset of Pustejovsky et al. (2010) (McGregor et al., 2017).
Corpus-based studies on metonymies do not necessarily address NLP tasks. An attempt to extend corpus-based resources so that they display metonymies is described in Ježek & Frontini (2010). Also, Pustejovsky & Ježek (2008) present a corpus investigation aimed at identifying metonymic mechanisms in predicate-argument constructions from a theoretical perspective. Finally, Marini & Ježek (2020) performed an equivalent corpus-based metonymy annotation on a sample of 101 Croatian verbs within the scope of CROATPAS (Marini & Ježek, 2019), a sister project of T-PAS.

The resource: T-PAS
T-PAS is the corpus-based resource used in this research. It consists of a repository of Typed Predicate Argument Structures (T-PAS) (Ježek et al., 2014) for Italian verbs. The resource has three components: 1) a repository of corpus-derived predicate argument structures for verbs with semantic specification of the arguments, linked to lexical units (verbs); 2) an inventory of about 200 corpus-derived semantic classes for nouns, relevant for the disambiguation of the verb in context; 3) a corpus of sentences that instantiate T-PAS, tagged with lexical unit (verb) and pattern number. The corpus is a reduced and cleaned version of ItWaC (Baroni et al., 2009), a corpus of Italian texts, available in the Sketch Engine tool (Kilgarriff et al., 2014).
Typed predicate argument structures (or t-pass) are patterns which display the syntactic and semantic properties of verbs: for each meaning of a verb a specific t-pas is provided. Verb sense is determined by the arguments the verb combines with (subject, object, etc.), each of which is defined through a specific Semantic Type. T-pass are corpus-derived: patterns were acquired through the manual clustering and annotation of corpus instances for each verb following the CPA procedure (Hanks, 2013). Each t-pas in the resource is labelled with a number and connected to a list of corpus instances realizing the specific verb meaning. Each pattern is associated with a sense description, a brief definition of the meaning of the verb (see the second line in Figure 1). Each pattern can have sub-patterns, created by annotators for corpus instances that do not reflect the prototypical semantic behaviour of the verb or of its arguments, as in metonymic uses. Like their patterns, sub-patterns are connected to annotated instances from the corpus. In our work, we extended the annotation tool by adding a new label ('.m'), which we used to annotate metonymic uses in sub-patterns (see Figure 2).
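The organisation just described (numbered patterns, sense descriptions, sub-patterns, and the '.m' label) can be pictured with a small data model. The following Python sketch is only an illustration under our own assumptions: the class names, field names, and the structure strings for bere are hypothetical and do not reproduce the actual T-PAS storage format or the content of Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class SubPattern:
    """A sub-pattern for uses that deviate from the prototypical pattern."""
    label: str        # e.g. "1.m"; the ".m" suffix marks a metonymic sub-pattern
    structure: str    # typed argument structure, kept as a plain string here
    instances: list[str] = field(default_factory=list)

    @property
    def is_metonymic(self) -> bool:
        # ".m" is the label described in the text for metonymic uses.
        return self.label.endswith(".m")


@dataclass
class TPas:
    """One numbered pattern (t-pas) of a verb, with its sense description."""
    verb: str
    number: int
    structure: str
    sense_description: str
    instances: list[str] = field(default_factory=list)
    sub_patterns: list[SubPattern] = field(default_factory=list)

    def metonymic_sub_patterns(self) -> list[SubPattern]:
        """Return only the sub-patterns carrying the metonymic label."""
        return [sp for sp in self.sub_patterns if sp.is_metonymic]


# Illustrative content only: the structure strings are assumptions,
# not copied from the resource.
bere_1 = TPas(
    verb="bere",
    number=1,
    structure="[Human] bere [Beverage]",
    sense_description="(placeholder sense description)",
)
bere_1.sub_patterns.append(
    SubPattern(
        label="1.m",
        structure="[Human] bere [Container]",
        instances=["he drank a glass at the pub"],
    )
)
```

Keeping the metonymic flag derivable from the label, rather than storing it separately, mirrors the idea that the '.m' label is the only marker an annotator adds.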

Methodology
We conceived an empirical methodology in order to obtain significant results from the corpus analysis: we manually extracted relevant instances from the corpus and annotated them as metonymic instances under their specific pattern. In order to annotate the instances, we exploited the Sketch Engine functions available for analysing the corpus. The annotation scheme can be summarized as follows:
1) Random sampling of about 200 corpus instances for each of the 30 verbs (the sample allowed us to reduce the time spent in skimming the instances, while still providing a balanced overview of the kind of instances contained in the corpus);
2) Manual annotation of the metonymic instances through the sub-label (signalled with ".m");
3) Implementation of the sub-pattern in the resource by adding metonymic semantic types (see 1.m in Figure 1);
4) Definition of the metonymic relation (see Table 2) between the source and the target semantic type (e.g. [Container] 'contains' [Beverage]), with its encoding in the sense description, translated into Italian (see Figure 2).
In Table 1, we show the number of instances annotated for each of the 30 verbs. Overall, the dataset consists of 1218 annotated instances. The number of annotated instances from the random sample varies from one verb to another, because verbs have different frequencies in the corpus and metonymic phenomena can be more or less pervasive depending on the verb under examination. Some verbs (e.g. divorare 'to devour' in Table 1) did not provide any metonymic instance at all (for an explanation and further discussion on this point, see Romani, 2020). The annotation was conducted manually by a single annotator (the first author) and, so far, it has not been possible to evaluate our annotation procedure, as we focused on the qualitative analysis and the retrieval of the relations; we intend to do so in the future, as it is essential for further progress in the research.
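Step 1 of the scheme and the per-verb counts of Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in Sketch Engine: the function names, the fixed seed, and the representation of annotations as (verb, label) pairs are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_instances(instances, k=200, seed=0):
    """Return up to k instances drawn uniformly at random, without replacement.

    If a verb has k or fewer instances, all of them are returned unchanged.
    """
    instances = list(instances)
    if len(instances) <= k:
        return instances
    # A seeded generator keeps the sample reproducible across runs.
    return random.Random(seed).sample(instances, k)


def count_metonymic(annotations):
    """Count, per verb, the annotations whose label ends in '.m'.

    `annotations` is an iterable of (verb, label) pairs, e.g. ("bere", "1.m").
    Verbs without metonymic annotations simply do not appear in the result.
    """
    return Counter(verb for verb, label in annotations if label.endswith(".m"))
```

A verb such as divorare, for which no metonymic instance was found, would be absent from the counter, which returns 0 when queried for a missing key.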
So far, the adopted annotation scheme has not yielded ambiguous cases, as metonymies were usually clear-cut and the shift of referent from the source to the target semantic type was easily identifiable. This may differ from metaphors, for example, where the shift between literal and non-literal meaning may be perceived as more gradual. However, further investigation is needed, through the annotation of a larger number of instances (expanding the list of verbs) and through the comparison and evaluation of the annotation results of more than one annotator.
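As an illustration of how annotations by two annotators might be compared, the sketch below computes Cohen's kappa, a chance-corrected agreement measure commonly used for this purpose. The choice of kappa is our assumption for the example; the text above does not commit to a specific measure, and the label values ("m" for metonymic, "lit" for literal) are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same instances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of instances with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four instances.
annotator_1 = ["m", "m", "lit", "lit"]
annotator_2 = ["m", "lit", "lit", "lit"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four instances while chance agreement is one half, so kappa is 0.5.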

Results
The overall aim of the research was to give a theoretical account of the metonymic relations found through the corpus analysis and pattern annotation. Therefore, the main results of the study are a list of metonymic relations between the target and the metonymic (source) semantic type (Table 2, Appendix) and a map where the target semantic types are connected to the metonymic types, and the relation between the two is expressed (Figure 3).
For each relation found, Table 2 reports instances from the reduced ItWaC corpus. For each instance, the metonymic argument (exemplifying the source, or metonymic, semantic type) is highlighted in bold, and the verb triggering the metonymy is in italics. As a second step, we attempted to draw a map of the metonymic relations, by connecting the target semantic types to their metonymic arguments. In Figure 3, each target semantic type is at the centre of a circular area (target type area), highlighted in bold; each area includes the metonymic types related to the target semantic type, and each target semantic type is given a different colour (the relations are listed in Table 2 in the Appendix). As mentioned, we included metonymic semantic types in the areas of the map. In our representation, metonymic and target semantic types are connected to each other through arrows, on which the relation is notated. The direction of the arrow traces the direction of the metonymic shift: from the metonymic semantic type to the target semantic type (e.g., from [Container] to [Beverage]).
The structure of the map we conceived draws attention to two main aspects. First, it depicts the complexity of the metonymic relations between semantic types and highlights how metonymy is not a unidirectional phenomenon but is instead fluid and changeable. Second, from a cognitive point of view, [Human] is at the centre of most of the relations and each target type area is connected to it by multiple relations. In particular, in our data, [Human] is deeply connected to and involved in the [Sound] area (for more details, see Romani, 2020).
Finally, as far as the limited sample of verbs under investigation is concerned, it is interesting to note that, although there are various source types, there are only six potential target semantic types. We may argue that there is a limited number of target types that attract different source types, in particular [Human] and [Sound], which have the highest number of relations (see Table 2). Further investigation on this point is necessary, together with an extension of the number of examined verbs and instances.

Conclusions
In this paper, we approached the study of metonymy from a corpus-based perspective. The research was conducted on a selected list of verbs, taken from a background study. Our aim was to search for metonymic phenomena in a corpus of Italian and to register them in a resource for Italian verbs, T-PAS. To do so, we conceived an annotation scheme and procedure that gave us relevant results and allowed us to register a variety of metonymic relations.
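The map can be thought of as a labelled directed graph, with arrows running from the metonymic (source) type to the target type. The Python sketch below illustrates this view under stated assumptions: only the [Container] 'contains' [Beverage] triple comes from the text; the other two triples are explicitly hypothetical placeholders, and the function names are ours.

```python
from collections import defaultdict

# Each triple is (source type, relation, target type).
RELATIONS = [
    ("Container", "contains", "Beverage"),          # example given in the text
    ("PlaceholderSourceA", "relation_a", "Human"),  # hypothetical placeholder
    ("PlaceholderSourceB", "relation_b", "Human"),  # hypothetical placeholder
]


def build_map(relations):
    """Group the arrows by target type, mirroring the target type areas."""
    by_target = defaultdict(list)
    for source, relation, target in relations:
        by_target[target].append((source, relation))
    return dict(by_target)


def licensing_relation(source_type, target_type, relations):
    """Return the relation label for a source-to-target shift, or None if absent."""
    for source, relation, target in relations:
        if source == source_type and target == target_type:
            return relation
    return None


relation_map = build_map(RELATIONS)
# Number of incoming arrows per target type.
relations_per_target = {t: len(arrows) for t, arrows in relation_map.items()}
```

Counting incoming arrows per target type is one simple way to see which target types attract the most source types; a lookup such as `licensing_relation("Container", "Beverage", RELATIONS)` returns the relation label for a given shift.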
We also attempted to make some theoretical generalizations based on the metonymic relations we found through the corpus analysis. We therefore created a list of metonymic relations and designed a map in which the relations are connected to the semantic types they involve. Both the map and the list depict the complexity and variety of the phenomenon, in terms of the number of possible metonymic relations and of the semantic types involved.
In future work, we intend to enrich the map and the list with new relations by extending the number of verbs investigated, and to evaluate the annotation procedure. For future annotations, we will provide the current version of the list and of the map on the online public version of T-PAS (upcoming). We are also interested in comparing our results with those in Marini & Ježek (2020), in a cross-linguistic perspective.
In line with previous studies (Section 2), we believe that the annotated corpus data, as well as the relations in Table 2, could be useful for the automatic detection of metonymies. To our knowledge, little work has been done on this for Italian; it would therefore be interesting to test our data in NLP tasks.