
Engineering >> 2019, Volume 5, Issue 6 doi: 10.1016/j.eng.2019.02.011

Big Data Creates New Opportunities for Materials Research: A Review on Methods and Applications of Machine Learning for Materials Design

a Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg 39106, Germany
b Process Systems Engineering, Otto-von-Guericke University Magdeburg, Magdeburg 39106, Germany

Received: 2018-11-21 Revised: 2018-12-13 Accepted: 2019-02-25 Available online: 2019-08-22



Materials development has historically been driven by human needs and desires, and this is likely to continue in the foreseeable future. The global population is expected to reach ten billion by 2050, which will drive growing demand for clean and high-efficiency energy, personalized consumer products, secure food supplies, and professional healthcare. New functional materials made and tailored for targeted properties or behaviors will be key to meeting this challenge. Traditionally, advanced materials have been found empirically or through experimental trial and error. As big data generated by modern experimental and computational techniques becomes more readily available, data-driven or machine learning (ML) methods have opened new paradigms for the discovery and rational design of materials. In this review article, we provide a brief introduction to various ML methods and related software or tools. The main ideas and basic procedures for employing ML approaches in materials research are highlighted. We then summarize important recent applications of ML for the large-scale screening and optimal design of polymer and porous materials, catalytic materials, and energetic materials. Finally, concluding remarks and an outlook are provided.




