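Strategic Study of CAE, 2023, Vol. 25, No. 2. doi: 10.15302/J-SSCAE-2023.02.018
Development Status and Prospects of Big Data Knowledge Engineering
1. Xi'an Jiaotong University, Xi'an 710049;
2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049;
3. Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an 710049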
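Abstract
Big data knowledge engineering is the "infrastructure" of artificial intelligence. It is also a common need across many industries and fields and the necessary path from informatization to intelligentization. This paper describes the background and conceptual connotation of big data knowledge engineering. It proposes a research framework of three steps: turning data into knowledge, organizing knowledge into systems, and making knowledge inferable. The paper reviews the key technologies of big data knowledge engineering, including knowledge acquisition and fusion, knowledge representation, and knowledge reasoning. It also reviews engineering applications in typical scenarios such as smart education, tax risk management and control, and smart healthcare. It then summarizes the challenges facing big data knowledge engineering and assesses future research directions. These include knowledge acquisition from complex big data, hybrid knowledge-and-data learning, and brain-inspired knowledge encoding and memory. Finally, the paper makes three recommendations. First, encourage interdisciplinary integration and set up major and key R&D programs to advance the fundamental theory and core technologies of big data knowledge engineering. Second, strengthen exchange and cooperation between enterprises and research institutions, promote cutting-edge research results through application demonstrations, and establish an industry standard system for big data knowledge engineering. Third, guided by applications that serve major national needs, explore university-enterprise collaborative talent-training models to accelerate the deployment of big data knowledge engineering technologies in key industries.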