Li, Zhao ORCID: https://orcid.org/0009-0009-1071-5708 and Shardlow, Matthew ORCID: https://orcid.org/0000-0003-1129-2750 (2024) How do control tokens affect natural language generation tasks like text simplification. Natural Language Engineering. pp. 1-28. ISSN 1351-3249
Published Version
Available under License Creative Commons Attribution.
Abstract
Recent work on text simplification has focused on the use of control tokens to advance the state of the art. However, further improvement is difficult without an in-depth understanding of the mechanisms underlying control tokens. One unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influences of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show how performance varies with each of the four control tokens separately. We also uncover how the design of control tokens could influence performance and give some suggestions for designing control tokens. We show that the newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model), and has potential in real applications.