
Frontiers of Information Technology & Electronic Engineering >> 2024, Volume 25, Issue 1 doi: 10.1631/FITEE.2300296

Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, China;

Received: 2023-04-27 Accepted: 2024-02-19 Available online: 2024-02-19


Abstract

Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the noisy translated target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with the cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
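As a rough illustration only, and not the authors' implementation, the Python sketch below mixes a sequence-level reward computed against the source-language summary with a token-level cross-entropy term. The reward function, the function names (word_overlap_reward, mixed_loss), the mixing weight lambda_rl, and the REINFORCE-style surrogate are all assumptions made for the example.

from typing import List


def word_overlap_reward(source_summary: List[str],
                        generated_summary: List[str]) -> float:
    # Toy stand-in for a fine-grained reward: the fraction of source-summary
    # words (e.g., after word alignment or bilingual lookup) that also appear
    # in the generated target-language summary.
    if not source_summary:
        return 0.0
    gen = set(generated_summary)
    return sum(w in gen for w in source_summary) / len(source_summary)


def mixed_loss(token_log_probs: List[float],
               source_summary: List[str],
               generated_summary: List[str],
               baseline_reward: float = 0.0,
               lambda_rl: float = 0.5) -> float:
    # Combine cross-entropy with a policy-gradient surrogate weighted by the
    # (reward - baseline) advantage: L = (1 - lambda) * L_CE + lambda * L_RL.
    ce_loss = -sum(token_log_probs) / max(len(token_log_probs), 1)
    advantage = word_overlap_reward(source_summary, generated_summary) - baseline_reward
    rl_loss = -advantage * sum(token_log_probs)
    return (1.0 - lambda_rl) * ce_loss + lambda_rl * rl_loss

For example, mixed_loss([-0.2, -0.5, -0.3], ["a", "b"], ["b", "c"]) returns a single scalar; in an actual training loop the log-probabilities would be differentiable model outputs rather than plain floats, so the combined loss could be backpropagated through the CLS model.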
