I am due to give a paper entitled ‘sign language corpus creation as digital humanities ethnography’ at the 4th International Corpus Linguistics Conference, to be held in Birmingham (UK) from 27 to 30 July 2007.
With this paper I aim to suggest that if a corpus is designed so that its research value extends well beyond linguistics to other disciplines (e.g. education, sociology, psychology, history), then the methodology for creating that corpus can usefully be based on ethnographic methods. My case is sign language (see for example the ECHO sign language datasets), and the methods I have in mind include, in particular, ethnographic collaboration with language users, which here means collaboration with deaf people themselves. At this point I am unsure to what extent this might also include virtual ethnographic approaches. Any ideas?
Secondly, it is clear that current language corpora are less dynamic than they could be, in terms of interaction between the users, producers and commentators of language within the datasets. For example, wikis might be considered corpora of lexical meanings that derive from broad participation in the construction of lemmata and the semantic description of lexis, although they lack data on word frequency and distribution. The second question I am pondering for my paper is therefore: what role might virtual ethnographers play in working towards ‘next-generation’ language corpora, that is, more dynamic datasets based on broad participation? Again, your comments are welcome.