Mitton, Roger and Hardcastle, David and Pedler, Jennifer (2007) BNC! Handle with care! Spelling and tagging errors in the BNC. In: Fourth Corpus Linguistics Conference, 27-30 July 2007, Birmingham, U.K.
Abstract
"You loose your no-claims bonus," instead of "You lose your no-claims bonus," is an example of a real-word spelling error. One way to enable a spellchecker to detect such errors is to prime it with information about likely features of the context for "loose" (verb) as compared with "lose". To this end, we extracted all the examples of "loose" used as a verb from the BNC (World edition, text). There were, apparently, 159 occurrences of "loose" (VVB or VVI). However, on inspection, well over half of these were not verbs at all (tagging errors), and over half of the rest were misspellings of "lose". Only about 15% were actual occurrences of "loose" as a verb. This prompted us to undertake a small investigation into errors in the BNC. We report on some words that occur more often as misspellings than in their own right (only one of the 63 occurrences of "ail", for example, is correct, the others possibly being OCR errors) and on some words that are always mistagged, such as "haulier" and "glazier" (never NN) and "hanker" and "loiter" (never VV). We note in particular that, if a rare word resembles a common word in spelling, it is more likely to appear as a misspelling of the common word than as a correct spelling of the rare word. These cases require some modification of an earlier conclusion on misspellings of rare words (Damerau and Mays, 1989). We conclude with a discussion of the desirability, or otherwise, of correcting errors in corpora such as the BNC. The results may be of interest to people who use the BNC as training data or for teaching.
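The extraction step described in the abstract (pulling out every token of "loose" tagged VVB or VVI, then inspecting each one by hand) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: it assumes the corpus has already been reduced to a sequence of (word, C5 tag) pairs, it ignores the BNC's own markup, and it matches exact tags only, so any combined ambiguity tags the corpus may use would be missed. The function names `count_as_verb` and `concordance` and the toy sentence in the usage example are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence

# Assumption: the corpus is pre-processed into (word, C5 tag) pairs.
# VVB = finite base form of a lexical verb; VVI = infinitive.
VERB_BASE_TAGS = {"VVB", "VVI"}


def count_as_verb(tokens: Sequence[tuple[str, str]], target: str) -> Counter:
    """Count tokens equal to `target` (case-insensitive) tagged VVB or VVI."""
    counts: Counter = Counter()
    for word, tag in tokens:
        if word.lower() == target and tag in VERB_BASE_TAGS:
            counts[tag] += 1
    return counts


def concordance(
    tokens: Sequence[tuple[str, str]], target: str, window: int = 4
) -> list[str]:
    """Return one keyword-in-context line per matching token, for manual review."""
    words = [word for word, _ in tokens]
    lines = []
    for i, (word, tag) in enumerate(tokens):
        if word.lower() == target and tag in VERB_BASE_TAGS:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left} [{word}/{tag}] {right}")
    return lines


if __name__ == "__main__":
    # Hypothetical toy data: one misspelling tagged as a verb, one adjective.
    sample = [("You", "PNP"), ("loose", "VVB"), ("your", "DPS"),
              ("no-claims", "AJ0"), ("bonus", "NN1"), (".", "PUN"),
              ("a", "AT0"), ("loose", "AJ0"), ("thread", "NN1")]
    print(count_as_verb(sample, "loose"))
    for line in concordance(sample, "loose"):
        print(line)
```

The counts reflect only what the tagger assigned: the toy "You loose your no-claims bonus" is counted as a verb even though it is a misspelling of "lose". Separating genuine verb uses from mistagged words and misspellings still requires reading each concordance line, which is the manual inspection the abstract reports.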
Metadata
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Keyword(s) / Subject(s) | BNC, British National Corpus, spelling errors, misspellings, tagging errors, spellcheckers |
| School | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
| Depositing User | John Mitton |
| Date Deposited | 11 Oct 2007 |
| Last Modified | 09 Aug 2023 12:29 |
| URI | https://eprints.bbk.ac.uk/id/eprint/591 |