Engineering, 2022, Volume 18, Issue 11. doi: 10.1016/j.eng.2021.03.023

Progress in Machine Translation

a Baidu Inc., Beijing 100193, China
b Baidu Research, Sunnyvale, CA 94089, USA

Received: 2020-11-15 Revised: 2021-01-30 Accepted: 2021-03-29 Available online: 2021-07-14

Abstract

After more than 70 years of evolution, great achievements have been made in machine translation. Especially in recent years, translation quality has been greatly improved with the emergence of neural machine translation (NMT). In this article, we first review the history of machine translation from rule-based machine translation to example-based machine translation and statistical machine translation. We then introduce NMT in more detail, including the basic framework and the current dominant framework, Transformer, as well as multilingual translation models to deal with the data sparseness problem. In addition, we introduce cutting-edge simultaneous translation methods that achieve a balance between translation quality and latency. We then describe various products and applications of machine translation. At the end of this article, we briefly discuss challenges and future research directions in this field.
