
Strategic Study of CAE (《中国工程科学》), 2023, Vol. 25, No. 2. doi: 10.15302/J-SSCAE-2023.02.018

Current Status and Prospects of Big Data Knowledge Engineering

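1. Xi'an Jiaotong University, Xi'an 710049;

2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049;

3. Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an 710049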

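Funding: National Natural Science Foundation of China (62250009); China Knowledge Centre for Engineering Sciences and Technology project (CKCEST-2022-1-40). Received: 2023-01-16. Revised: 2023-03-02. Published: 2023-03-24.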


Abstract

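Big data knowledge engineering is the "infrastructure" of artificial intelligence, a need shared across many industries and fields, and the necessary path from informatization to intelligentization. This article describes the background and conceptual scope of big data knowledge engineering and proposes a research framework built on three goals: turning data into knowledge, organizing knowledge into systems, and making knowledge amenable to reasoning. It reviews the key technologies of big data knowledge engineering, including knowledge acquisition and fusion, knowledge representation, and knowledge reasoning, together with engineering applications in typical scenarios such as smart education, tax risk control, and smart healthcare. It then summarizes the challenges facing big data knowledge engineering and identifies future research directions, including knowledge acquisition from complex big data, hybrid "knowledge + data" learning, and brain-inspired knowledge encoding and memory. The study recommends guiding multidisciplinary integration and establishing major and key research and development programs to advance the fundamental theory and core technologies of big data knowledge engineering; strengthening exchange and cooperation between enterprises and research institutions, promoting frontier research results, forming application demonstrations, and establishing an industry standards system for big data knowledge engineering; and, guided by major application needs, exploring university-enterprise collaborative training models and accelerating the deployment of big data knowledge engineering technologies in key industries.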


