Cognitive-linguistic approaches: what can we gain by computational treatment of data?
Work with empirical data is important, if not essential, to cognitive linguistics. Electronic corpora of written texts or transcriptions of speech are increasingly used, and sometimes purposefully collected, by linguists investigating phenomena such as metaphor, metonymy, idioms, and frames. In the course of their work, some linguists also compile more or less private electronic archives of phenomena studied in cognitive linguistics, such as searchable lists, classifications, and databases. Moreover, they have to deal with these phenomena, usually in cooperation with computational linguists and computer scientists, when building general lexical resources for the automatic processing of language.
Problems that arise when working with corpora stem from the way the corpora are prepared for, and processed by, corpus tools (concordancers, corpus managers). For example, despite some attempts in computational linguistics to detect metaphors in running text, no corpus manager offers a "Show all metaphors" function. Instead, to search a corpus for metaphors, linguists devise their own methods, whether theory-based or data-driven.
Other problems arise when creating archives of language usage examples classified by cognitive linguistic criteria, whether for a single project or for more general use. Here, linguists decide which criteria to use in their classifications and which features of the archived data to annotate. These decisions are often made on a project-specific basis, so classifications from different projects may be difficult to compare.
On a larger scale, the same applies to general linguistic resources developed for Human Language Technology applications. The decisions taken while building such resources can then be evaluated, by the resource developers or by others, on the basis of the large quantities of data encoded in the resources themselves. Such evaluations also serve as test-beds for theories put forth in cognitive linguistics, and their results provide valuable feedback for theory development.
In this theme session, we would like to discuss methods of exploiting electronic corpora for cognitive linguistic research of any kind, not restricted to the phenomena mentioned above, as well as practical experience with resource building in cognitive linguistics. We also invite contributions that evaluate, from the viewpoint of cognitive linguistic theory, the implications of data encoded in computational resources.