How do control tokens affect natural language generation tasks like text simplification

Li, Zhao ORCID: https://orcid.org/0009-0009-1071-5708 and Shardlow, Matthew ORCID: https://orcid.org/0000-0003-1129-2750 (2024) How do control tokens affect natural language generation tasks like text simplification. Natural Language Engineering. pp. 1-28. ISSN 1351-3249

Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.1017/s1351324923000566

Abstract

Recent work on text simplification has focused on the use of control tokens to advance the state of the art. However, further improvement is difficult without an in-depth understanding of the mechanisms underlying control tokens. One previously unexplored factor is the tokenization strategy, which we also investigate. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influence of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show how performance varies with each of the four control tokens separately. We also uncover how the design of control tokens can influence performance and give some suggestions for designing them. We show that the newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and has potential in real applications.

Item Type:	Article (Article)
Peer-reviewed:	Yes
Date Deposited:	24 Jun 2024 14:41
Publisher:	Cambridge University Press (CUP)
Additional Information:	This is an open access article which first appeared in Natural Language Engineering
Divisions:	Faculties > Science and Engineering
Subject terms:	0801 Artificial Intelligence and Image Processing, 1702 Cognitive Sciences, 2004 Linguistics, Artificial Intelligence & Image Processing
URI:	https://mmu-uat.leaf.cosector.com/id/eprint/634945
DOI:	https://doi.org/10.1017/S1351324923000566
ISSN:	1351-3249
e-ISSN:	1469-8110
