Interdisziplinäres Zentrum für Kognitive Sprachforschung

Abstract Riepl & Schmid

The notion of entrenchment has always been firmly connected to frequency values as a means of measuring how deeply single lexemes or more complex structures are entrenched in the language user’s mind (Bybee 1985, 117). Although this relation between degree of entrenchment and frequency of use seems clear-cut and straightforward at first glance, several questions remain open. Measuring the frequency of certain lexemes or constructions in discourse (especially in large corpora) does not necessarily tell us anything about individual entrenchment, but rather about the degree of collective conventionalization. What is more, raw frequencies cannot be taken as the sole measure of entrenchment and/or conventionalization, as they need to be seen relative to the syntactic or communicative functions of the items in question (‘cotext-free’ vs. ‘contextual entrenchment’, Schmid forthc.). In addition, the notorious relation between type and token frequency (and its relevance to the schematicity of units/constructions) has not yet been fully clarified.
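
To make the distinction between token and type frequency concrete, the following minimal sketch counts both for the fillers of one slot in a construction. The two filler lists are invented for illustration and are not taken from any corpus; the function name `type_token_counts` is likewise hypothetical.

```python
from collections import Counter


def type_token_counts(slot_fillers):
    """Return (token_frequency, type_frequency) for the fillers of one slot."""
    counts = Counter(slot_fillers)
    token_frequency = sum(counts.values())  # every attested instance
    type_frequency = len(counts)            # distinct fillers only
    return token_frequency, type_frequency


# Invented data: verb-slot fillers of two hypothetical constructions.
construction_a = ["give", "give", "give", "give", "send", "give"]
construction_b = ["give", "send", "hand", "offer", "lend", "pass"]

for name, fillers in [("A", construction_a), ("B", construction_b)]:
    tokens, types = type_token_counts(fillers)
    # Same token frequency, very different type frequency.
    print(name, tokens, types, round(types / tokens, 2))
```

Both invented constructions have the same token frequency, but only the second is attested with many different fillers, which is the kind of contrast that raw token counts alone cannot capture.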

Especially when it comes to producing converging evidence, attempts to tally data on discourse frequency with the results of psycholinguistic experiments (Gries, Hampe and Schönefeld 2005, Wiechmann 2008, Gilquin & Gries 2009) have been scarce and not entirely satisfactory in several respects, e.g. the quality of the experiments carried out or simply taken as a benchmark, and hence also the methodology used for comparing experimental with corpus data. Tackling these shortcomings is the main objective of one of the two methodological service projects within the Collaborative Research Centre (CRC). Modern, tailor-made statistical models capable of integrating mixed data will be developed in order to arrive at measures that allow for a psychologically realistic assessment of corpus data with regard to degrees of entrenchment. These mixed data sets will comprise information retrieved from corpora as well as from (neuro-)psychological tests, such as sentence completion, priming, cloze tests or ERP.

The overarching aim of both service projects is to provide all projects in the CRC with additional expertise in the fields of computing, statistics and experimental psychology. Project M2 assists in the structuring of object- and metalanguage data, the administration of databases and the rule-based, structured processing of data by means of web-based, platform-independent software modules or programming tools. It also provides methods for appropriate statistical modelling, e.g. Poisson regression models containing stochastic and non-linear effects, which have been developed intensively in statistics in recent years. These and other complex regression models are likely to be very useful in solving problems within the Collaborative Research Centre. In addition, modern procedures at the interface between statistics and data mining are available, which are useful for performing multivariate analyses in the Collaborative Research Centre.
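
As a rough illustration of the simplest member of this model family, the sketch below fits a plain Poisson regression with a log link and one predictor by Newton-Raphson. It is an assumption-laden toy: it contains neither the stochastic (random) nor the non-linear effects mentioned above, the function name `fit_poisson` is hypothetical, and the predictor values and counts are invented rather than drawn from project data.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x to count data by Newton-Raphson."""
    # Start from the intercept-only model: b0 = log(mean count), b1 = 0.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented data: an arbitrary predictor and the observed count of a construction.
xs = [0, 1, 2, 3, 4, 5]
ys = [2, 3, 6, 7, 12, 18]

b0, b1 = fit_poisson(xs, ys)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

In this setup `math.exp(b1)` is the multiplicative change in the expected count per unit of the predictor. Models with random or smooth terms would normally be fitted with a dedicated statistics package rather than by hand.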

As the measures and tests developed in these projects will be beneficial to all projects with quantitative and experimental research methodologies, these service projects hold a key position in the larger architecture of the Collaborative Research Centre. With regard to linguistic theorizing, they will contribute to clarifying the relation between type frequency and token frequency, on the one hand, and entrenchment and conventionalization processes, on the other.



