Piao, S, Rayson, P, Archer, DE, Wilson, A and McEnery, T (2003) Extracting multiword expressions with a semantic tagger. In: ACL 2003, 41st Annual Meeting of the Association for Computational Linguistics, 07 July 2003 - 12 July 2003, Sapporo, Japan.
Abstract
Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present our research work in which we tested approaching the MWE issue using a semantic field annotator. We use an English semantic tagger (USAS) developed at Lancaster University to identify multiword units which depict single semantic concepts. The Meter Corpus (Gaizauskas et al., 2001; Clough et al., 2002) built in Sheffield was used to evaluate our approach. In our evaluation, this approach extracted a total of 4,195 MWE candidates, of which, after manual checking, 3,792 were accepted as valid MWEs, producing a precision of 90.39% and an estimated recall of 39.38%. Of the accepted MWEs, 68.22% or 2,587 are low frequency terms, occurring only once or twice in the corpus. These results show that our approach provides a practical solution to MWE extraction.
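The percentages reported in the abstract can be checked against the raw counts it gives. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code; the variable names are illustrative, and the implied total of valid MWEs is a back-of-envelope derivation from the estimated recall, not a figure stated in the abstract.

```python
# Counts reported in the abstract.
candidates = 4195      # MWE candidates extracted
accepted = 3792        # candidates accepted after manual checking
low_frequency = 2587   # accepted MWEs occurring only once or twice

# Precision: accepted candidates over all extracted candidates.
precision = accepted / candidates

# Share of accepted MWEs that are low-frequency terms.
low_freq_share = low_frequency / accepted

# Assumption: if recall = accepted / total valid MWEs, the estimated
# recall of 39.38% implies this approximate total. It is derived here
# for illustration and is not reported in the abstract.
estimated_recall = 0.3938
implied_total = accepted / estimated_recall

print(f"precision: {precision:.2%}")
print(f"low-frequency share: {low_freq_share:.2%}")
print(f"implied total valid MWEs (derived): {implied_total:.0f}")
```

The first two printed values reproduce the 90.39% and 68.22% figures from the counts alone.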