
Strategic Study of CAE >> 2023, Volume 25, Issue 2 doi: 10.15302/J-SSCAE-2023.02.018

Development and Prospect of Big Data Knowledge Engineering

1. Xi'an Jiaotong University, Xi'an 710049, China; 

2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China;

3. Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an 710049, China

Funding project: National Natural Science Foundation of China (62250009); China Engineering Science and Technology Knowledge Center Project (CKCEST-2022-1-40)

Received: 2023-01-16 Revised: 2023-03-02 Available online: 2023-03-24


Abstract

Big data knowledge engineering is an infrastructure for artificial intelligence, a common requirement across industries and fields, and the inevitable path from digitalization to intelligence. In this paper, we first elaborate on the background and connotation of big data knowledge engineering and propose a research framework of “data knowledgeization, knowledge systematization, and knowledge reasoning”. Second, we review the key technologies of knowledge acquisition and fusion, knowledge representation, and knowledge reasoning, and introduce engineering applications in typical scenarios such as smart education, tax risk control, and smart healthcare. Third, we summarize the challenges faced by big data knowledge engineering and anticipate future research directions, including complex big data knowledge acquisition, knowledge+data hybrid learning, and brain-inspired knowledge coding and memorizing. Finally, we offer three suggestions: (1) guide interdisciplinary integration and establish major and key R&D projects to promote breakthroughs in the basic theory and technology of big data knowledge engineering; (2) strengthen communication and cooperation between enterprises and research institutions and turn cutting-edge research results into application demonstrations, so as to establish an industry-standard system for big data knowledge engineering; and (3) explore school-enterprise cooperation in line with market demand, oriented towards major application needs, to accelerate the deployment of big data knowledge engineering technology in the country's important industries.


