
Engineering, 2023, Volume 25, Issue 6. doi: 10.1016/j.eng.2022.12.008

Data-Driven Learning for Data Rights, Data Pricing, and Privacy Computing

a College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
b International Digital Economy Academy, Shenzhen 518045, China
c Craiditx, Shanghai 200050, China
d Antgroup, Hangzhou 310023, China
e Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
f School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, Canada

Received: 2022-01-11 Revised: 2022-10-17 Accepted: 2022-12-25 Available online: 2023-02-09


Abstract


In recent years, data has become one of the most important resources in the digital economy. Unlike traditional resources, data is digital in nature, which makes it difficult to value and to contract for. Establishing an efficient, standardized data-transaction market would therefore lower costs and improve productivity for the parties in this industry. Although numerous studies address compliance with data regulations and other data-transaction issues such as privacy and pricing, little work has been done to provide a comprehensive review of these studies in the fields of machine learning and data science. To provide a complete and up-to-date understanding of this topic, this review covers three key issues of data transaction: data rights, data pricing, and privacy computing. By connecting these topics, this paper provides a big picture of a data ecosystem in which data is generated by data subjects such as individuals, research agencies, and governments; data processors acquire data for innovative or operational purposes; and benefits are allocated according to the data’s respective ownership via an appropriate price. With the long-term goal of making artificial intelligence (AI) beneficial to human society, AI algorithms will then be assessed against data protection regulations (i.e., privacy protection regulations) to help build trustworthy AI systems for daily life.


