Abstract Jezek & Lenci

Investigating sense modulation and coercion phenomena occurring at the V-N boundary through corpus data.

Generative Lexicon Theory (GL) is one of the most prominent models of the semantic lexicon. It tries to account for sense variation in context through a rich array of representational devices (e.g. qualia structure) and of compositional operations between lexemes. On the one hand, GL argues for a highly structured organization of lexicalized concepts, forming a multidimensional concept space. On the other hand, a series of generative mechanisms are at play in the syntagmatic construction of meaning, enriching, modulating and variously modifying the semantics of lexical items in context. One of the major claims of the GL approach is that at least a relevant subset of sense modulation phenomena can be explained in terms of lexically constrained semantic processes operating at the predicate-argument boundary. Pustejovsky (2006) proposes a taxonomy of these operations, including mechanisms that exploit part of the semantic content of lexical items and true operations that introduce new conceptual dimensions in context. Compositional operations act on an extended notion of semantic type associated with lexical items, which is designed to capture the multidimensional character of lexical content. The result is a complex interplay between the semantic potential inherently encoded in lexical items in terms of their associated type and their enrichment in context. This approach raises relevant issues concerning the shape and dynamics of the lexical system: How can we characterize and represent the repertoire of semantic types and their structure? How can we better define and characterize the battery of compositional operations? Which constraints do they obey? How do they interact with the internal conceptual constitution of lexical entries?

The goal of this paper is to investigate how corpus data can help shed new light on these issues. While the use of large-scale corpora is nowadays common practice in computational linguistics, their use to validate lexical semantic theories is still not widespread. In this work we present some preliminary results of a data-driven survey of the generative operations at play in V-N combinations in Italian. Data have been collected for a number of pilot verbs that we selected as probes for identifying the semantic phenomena occurring at the V-N boundary. In this preliminary phase, our analysis has been limited to data extracted from a small 3-million-word section of the PAROLE Italian corpus of written text, and from CORAL-ROM, a linguistically annotated corpus of spoken texts. In the next stages we plan to automatically extract verb-noun contexts from much larger corpora, to gain a wider perspective on the shape and effects of the different syntagmatic semantic processes.

The collected examples have been analyzed according to the following grid of generative semantic devices proposed in Pustejovsky (2006): 1. pure type selection, 2. type accommodation, 3. type exploitation, 4. type introduction.

The results of the empirical analysis bring to the fore a highly complex picture of the semantic dynamics occurring at the predicate-argument boundary in actual contexts of use. This may call for a more fine-grained taxonomy of generative devices, as well as a more fine-grained articulation of the type system. In general, the analysis raises various issues about the architecture of the lexicon and its interaction with the conceptual system, showing the crucial role of corpus data as a testing ground for models of the lexicon. Specifically, one of the main issues concerns the interplay between the paradigmatic organization of lexical types and the syntagmatic processes modulating their composition in context.



