


Given the difficulties that still affect the correct identification of irony within Sentiment Analysis tasks, in this paper we describe the main issues that emerged during the development of a novel resource for Italian annotated for irony. The project mainly consists in applying to the Twitter corpus TWITTIRÒ a multi-layered scheme for the fine-grained annotation of irony, as proposed in a multilingual setting and previously applied also to French and English datasets (Karoui et al.). In applying the annotation to this corpus, we outline and discuss the issues and peculiarities that emerged in using the semantic scheme for Twitter messages in Italian, thus shedding some light on the future directions that can be followed in a multilingual and cross-language perspective too. We present, in particular, an analysis of the annotation process and of the distribution of the labels of each layer involved in the scheme. This is supported by a discussion of the outcome of the annotation carried out by native Italian speakers in the development of the corpus. In particular, an in-depth discussion of the inter-annotator agreement and of the sources of disagreement is included. The result is a novel gold standard corpus for irony detection in Italian, which enriches the set of multilingual datasets available for this challenging task and is ready to be used as a benchmark in automatic irony detection experiments and evaluation campaigns.

The recognition of irony and the identification of the pragmatic and linguistic devices that activate it are known to be very challenging tasks for both humans and automatic tools (Mihalcea and Pulman 2007; Reyes, Rosso, and Buscaldi 2010, 2012; Kouloumpis, Wilson, and Moore 2011; Maynard and Funk 2011). Since the most promising tools for irony detection apply supervised approaches, the development of linguistic resources on which they can be trained and tested is becoming increasingly crucial.

The main aim of the work described here is the creation of a currently missing resource, namely an Italian corpus annotated for irony, i.e. TWITTIRÒ, to be used as a benchmark for systems addressing the irony detection task within evaluation campaigns of NLP tools for Italian (see the shared task IronITA proposed at EVALITA 2018 (Cignarella, Frenda, et al.)). Nevertheless, in order to lay the foundation for future comparisons with other languages and resources, we based the annotation on a scheme designed within the context of a multilingual project and described in (Karoui et al.). Considering the complexity of the phenomenon of irony, often described in the literature (Grice 1975, 1978; Sperber and Wilson 1981; Wilson and Sperber 2007), this scheme includes different layers and allows a fine-grained description of the phenomenon. It has been successfully applied to French and English corpora, but also to a small dataset for Italian, which can be seen as the preliminary stage of the work presented in (Cignarella, Bosco, and Patti 2017).

The analysis of the disagreement among the native Italian speakers involved in the annotation of the Italian resource confirms that the fine-grained annotation of irony is an especially challenging task, owing to the subjectivity of the annotators involved. Human annotators, even skilled ones or domain experts, are always tied to their individual experiences, their individual sense of humor, and a certain situational context. Nevertheless, even if they are biased, humans can easily detect the presence of irony when it occurs, even at quite early stages of their lives. The investigation that we propose here is at a deeper level and concerns the linguistic devices known in pragmatics as signals of irony and their relevance for modeling irony in social media from a computational perspective. Such finer-grained annotation proves to be more challenging, also because of some peculiarities of the micro-blog textual genre. Our analysis of the disagreement on the developed corpus aims at shedding some light on the difficulties related to the task.

The paper is organized as follows. The next section surveys the main contributions in this area, while Section 3 describes the collection of the novel dataset for Italian and its organization into different sub-corpora, which represent different kinds of Twitter texts. The scheme is the object of Section 4, where each label applied in the annotation is associated with several examples extracted from the different sections of the corpus.
