
Engineering, 2023, Volume 25, Issue 6. doi: 10.1016/j.eng.2022.12.008

Data-Driven Learning for Data Rights, Data Pricing, and Privacy Computing

a College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
b International Digital Economy Academy, Shenzhen 518045, China
c Craiditx, Shanghai 200050, China
d Antgroup, Hangzhou 310023, China
e Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
f School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, Canada

Received: 2022-01-11 Revised: 2022-10-17 Accepted: 2022-12-25 Published: 2023-02-09


Abstract

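In recent years, data has become one of the most important factors of production in the digital economy. Unlike traditional factors of production, data is digital by nature, which makes it difficult to contract and trade. Establishing an efficient, standardized data-transaction market system would therefore help lower costs and improve productivity for all parties in the industry. Although many studies have addressed data regulation and other data-transaction issues, such as privacy and pricing, few works offer a comprehensive review of this research from the perspective of machine learning and data science. To provide a complete and up-to-date understanding of the topic, this paper covers three key issues in the data-transaction process: data rights, data pricing, and privacy computing. By clarifying the relationships among these topics, it presents an overall picture of a data ecosystem in which data is generated by data subjects such as individuals, research institutions, and governments, while data processors acquire data for innovative or operational purposes and allocate the resulting benefits, through an appropriate pricing mechanism, according to each data subject's ownership. For artificial intelligence (AI) to benefit the long-term development of human society, AI algorithms need to be evaluated against data protection regulations (that is, privacy protection regulations) to help build trustworthy AI systems for daily life.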


