Construction and Application of English-Chinese Interpretation Corpus Based on Big Data

Author Names:

Xiaomeng Hu, Fengru Zhang

Author Affiliation:

Public Course Teaching Department, Hainan Vocational College of Politics and Law, Haikou, China

Author Email:

zhang_fengru@outlook.com

Publication Date:

April 24, 2026

Page numbers:

2229-2239

DOI Number:

https://doi.org/10.1177/14727978251367161

Abstract:

Interpreting teaching and research require large amounts of authentic, high-quality interpreting data, but existing interpreting corpora have many shortcomings, such as small scale, limited variety, and uneven quality. In this paper, we use big data technology to build a powerful, easy-to-use, and openly shared English-Chinese interpreting corpus database that provides rich and diverse high-quality interpreting examples for interpreting teaching and research. We collect English-Chinese interpreting data of various types, scenarios, topics, and levels from the Internet, TV broadcasts, and other channels; clean, standardize, segment, align, and annotate the data; store the metadata in XML format; and design and implement the structure, functions, and interfaces of the corpus. This paper mainly introduces the data method, model construction, and application effect of the corpus, covering its collection, organization, annotation, storage, management, retrieval, analysis, display, and application.

Keywords:

big data, English-Chinese interpreting, corpus, construction and application
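
Illustrative sketch:

The abstract states that aligned and annotated data are stored with metadata in XML format. As a rough illustration only, the Python sketch below shows one way a single aligned English-Chinese segment might be written to XML and read back. The element names (segment, source, target), the attribute names (id, scenario, topic), and the AlignedSegment class are all hypothetical; the paper's actual schema is not given in this section.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One source utterance paired with its interpretation (hypothetical schema)."""
    segment_id: str
    scenario: str   # e.g. "press conference"; the label set is an assumption
    topic: str
    source_en: str  # English source utterance
    target_zh: str  # Chinese interpretation


def segment_to_xml(seg: AlignedSegment) -> ET.Element:
    """Serialize a segment; metadata go in attributes, text in child elements."""
    elem = ET.Element(
        "segment",
        {"id": seg.segment_id, "scenario": seg.scenario, "topic": seg.topic},
    )
    ET.SubElement(elem, "source", {"lang": "en"}).text = seg.source_en
    ET.SubElement(elem, "target", {"lang": "zh"}).text = seg.target_zh
    return elem


def xml_to_segment(elem: ET.Element) -> AlignedSegment:
    """Rebuild a segment from the XML layout produced by segment_to_xml."""
    return AlignedSegment(
        segment_id=elem.get("id"),
        scenario=elem.get("scenario"),
        topic=elem.get("topic"),
        source_en=elem.findtext("source"),
        target_zh=elem.findtext("target"),
    )


if __name__ == "__main__":
    seg = AlignedSegment(
        "s1", "press conference", "trade",
        "Thank you all for coming.", "感谢各位的到来。",
    )
    # encoding="unicode" returns a str, keeping the Chinese text readable
    print(ET.tostring(segment_to_xml(seg), encoding="unicode"))
```

Keeping metadata in attributes and the aligned text in child elements makes it straightforward to filter segments by scenario or topic before reading the text itself, which is one plausible basis for the retrieval functions the abstract mentions.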