Corpus-aided Construction Grammar: Semantic Tools in the Russian National Corpus
The paper demonstrates the research potential of the Russian National Corpus (RNC) manager (http://www.ruscorpora.ru) for studies in Construction Grammar. The latest release of the RNC is annotated for both grammatical categories and lexico-semantic classes. The semantic features are based on a taxonomic classification of the Russian lexicon, which includes:
|main ontological classes||names of ‘spaces’, ‘texts’; verbs of ‘motion’, ‘emotion’; adjectives and adverbs of ‘speed’, ‘direction’ etc.|
|mereological classes of nouns||names of ‘parts’, ‘sets’ etc.|
|topological classes of nouns||names of ‘containers’, ‘horizontal surfaces’ etc.|
|causative/non-causative verbs|
|positive and negative polarity items|
|derivational classes||diminutives, prefixal verbs, adjectives derived from the names of plants etc.|
The combination of morphological and semantic features makes it possible to search for various types of constructions. For instance, examples of so-called constructions of measure with parameters can be found by means of the following tags:
(1) Noun&‘physical object’ + Adj&GEN&‘size’ + Noun&GEN&‘parameter’
e.g. èkran bol’shogo razmera
screen.NOM big.GEN size.GEN
‘screen of big size’
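Query (1) can be read as a sequence of slots, each of which requires a set of tags on one token. The following is a minimal sketch of that idea in Python, not the RNC's actual query engine: the `Token` class, the `find_matches` function and the tag spellings are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    tags: frozenset  # grammatical and semantic tags of one token


def tok(form, *tags):
    """Build a token from a word form and its tags (hypothetical helper)."""
    return Token(form, frozenset(tags))


def find_matches(sentence, pattern):
    """Return every contiguous span whose i-th token carries all tags of slot i."""
    n = len(pattern)
    hits = []
    for start in range(len(sentence) - n + 1):
        window = sentence[start:start + n]
        if all(slot <= t.tags for slot, t in zip(pattern, window)):
            hits.append([t.form for t in window])
    return hits


# Query (1): Noun&'physical object' + Adj&GEN&'size' + Noun&GEN&'parameter'
MEASURE_WITH_PARAMETER = [
    {"Noun", "physical object"},
    {"Adj", "GEN", "size"},
    {"Noun", "GEN", "parameter"},
]

# The example from the text, annotated by hand for this sketch.
sentence = [
    tok("èkran", "Noun", "NOM", "physical object"),
    tok("bol'shogo", "Adj", "GEN", "size"),
    tok("razmera", "Noun", "GEN", "parameter"),
]

matches = find_matches(sentence, MEASURE_WITH_PARAMETER)
print(matches)
```

Each slot is satisfied only when all of its tags are present on the token, which mirrors the `&` operator in the query notation.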
In this paper the author addresses two aspects of the further development of the semantic project: (1) the problem of polysemy and (2) prototypical and peripheral zones in lexical classes.
At present, the semantic annotation takes into account all possible meanings of a polysemous word. As a result, when a user searches for separate words (lemmas of a certain class), the results turn out to be very “noisy”; e.g. the results of a ‘body parts’ query include both klan’at’s’a v pojas ‘to bow from the waist’ and zat’anut’ pojas ‘to pull in a belt’, both stojat’ na kolen’ax ‘to be on one's knees’ and koleno truby ‘pipe ell’, etc. Strikingly, the proportion of inappropriate examples decreases greatly when the user searches for a word combination or a construction. For instance, the construction of level (po pojas ‘up to the waist’, po koleno ‘knee-deep’ etc.), which can be retrieved from the corpus by the following features:
(2) Noun|Verb + po&Preposition + Noun&ACC&‘human body part’,
does not mix with other constructions at all. This illustrates the well-known postulate of cognitive linguistics that polysemy exists only in the dictionary and disappears in discourse, where it is suppressed by the context.
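The contrast between a word-level query and the construction query (2) can be sketched on a toy sample. This is an illustration under stated assumptions, not RNC output: the phrases come from the examples above, except that the contexts stojat’ po pojas and voda po koleno are constructed here to fill the first slot of (2); the helper names and tag spellings are hypothetical. As in the annotation described above, every occurrence of pojas and koleno carries the ‘human body part’ tag regardless of its meaning in context.

```python
BODY = "human body part"


def w(form, *tags):
    """Pair a word form with its set of tags (hypothetical helper)."""
    return (form, frozenset(tags))


def slot_matches(slot, tags):
    """A slot is a list of alternatives; one alternative must be fully present."""
    return any(alt <= tags for alt in slot)


def find_construction(phrase, pattern):
    """Return contiguous spans of the phrase that satisfy every slot in order."""
    n = len(pattern)
    hits = []
    for start in range(len(phrase) - n + 1):
        window = phrase[start:start + n]
        if all(slot_matches(s, tags) for s, (_, tags) in zip(pattern, window)):
            hits.append(" ".join(form for form, _ in window))
    return hits


# Toy sample; semantic tags are not disambiguated by context.
corpus = [
    [w("klan'at's'a", "Verb"), w("v", "Preposition", "v"), w("pojas", "Noun", "ACC", BODY)],
    [w("zat'anut'", "Verb"), w("pojas", "Noun", "ACC", BODY)],
    [w("stojat'", "Verb"), w("na", "Preposition", "na"), w("kolen'ax", "Noun", "LOC", BODY)],
    [w("koleno", "Noun", "NOM", BODY), w("truby", "Noun", "GEN")],
    [w("stojat'", "Verb"), w("po", "Preposition", "po"), w("pojas", "Noun", "ACC", BODY)],
    [w("voda", "Noun", "NOM"), w("po", "Preposition", "po"), w("koleno", "Noun", "ACC", BODY)],
]

# Query (2): Noun|Verb + po&Preposition + Noun&ACC&'human body part'
LEVEL = [
    [{"Noun"}, {"Verb"}],
    [{"po", "Preposition"}],
    [{"Noun", "ACC", BODY}],
]

# Word-level query: any phrase containing a token tagged as a body part.
word_hits = [p for p in corpus if any(BODY in tags for _, tags in p)]

# Construction query: only spans that fit all three slots of (2).
construction_hits = [h for p in corpus for h in find_construction(p, LEVEL)]

print(len(word_hits), construction_hits)
```

In this sample the word-level query returns every phrase, including the ‘belt’ and ‘pipe ell’ readings, while the construction query keeps only the two level expressions.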
The computational approach to lexical classes assumes that lexeme clustering is highly context-sensitive and varies from one construction to another. Nevertheless, our empirical (non-quantitative) exploration of the issue shows that constructional restrictions are not chaotic. For example, the class of containers is associated with the locative construction (to drink from a glass) and the construction of measure (a glass of water). Within the class of containers we can define a prototypical zone, which includes cups and plates, bags and boxes, and a peripheral zone, which includes rooms, lakes and seas. While the members of the core are associated with both constructions, the items that form the peripheral zone cannot be used in the construction of measure. Thus, the notions of ‘periphery’ and sub-clustering help to detect constructions in the corpus more reliably.
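The split into a prototypical and a peripheral zone can be sketched as a simple grouping over (lemma, construction) attestations. This is a minimal illustration, not the authors' procedure: the `split_zones` function and the construction labels are hypothetical, and the observations are toy data that mirror the claims above rather than corpus counts.

```python
from collections import defaultdict

LOCATIVE = "locative"  # e.g. "to drink from a glass"
MEASURE = "measure"    # e.g. "a glass of water"

# Toy attestations of container nouns in the two constructions.
observations = [
    ("glass", LOCATIVE), ("glass", MEASURE),
    ("cup", LOCATIVE), ("cup", MEASURE),
    ("box", LOCATIVE), ("box", MEASURE),
    ("room", LOCATIVE),
    ("lake", LOCATIVE),
    ("sea", LOCATIVE),
]


def split_zones(observations, required=(LOCATIVE, MEASURE)):
    """Core = lemmas attested in every required construction; periphery = the rest."""
    attested = defaultdict(set)
    for lemma, construction in observations:
        attested[lemma].add(construction)
    core = sorted(l for l, seen in attested.items() if set(required) <= seen)
    periphery = sorted(l for l in attested if l not in core)
    return core, periphery


core, periphery = split_zones(observations)
print("core:", core)
print("periphery:", periphery)
```

Under this reading, a measure-construction query could be restricted to the core sub-cluster, while a locative query would keep the whole class.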