Toward Next-Generation Heterogeneous Catalysts: Empowering Surface Reactivity Prediction with Machine Learning

Xinyan Liu , Hong-Jie Peng

Engineering, 2024, Vol. 39, Issue (8): 26-47. DOI: 10.1016/j.eng.2023.07.021
Research
Review

Abstract

Heterogeneous catalysis remains at the core of various bulk chemical manufacturing and energy conversion processes, and its revolution necessitates the hunt for new materials with ideal catalytic activities and economic feasibility. Computational high-throughput screening presents a viable solution to this challenge, as machine learning (ML) has demonstrated its great potential in accelerating such processes by providing satisfactory estimations of surface reactivity with relatively low-cost information. This review focuses on recent progress in applying ML in adsorption energy prediction, which predominantly quantifies the catalytic potential of a solid catalyst. ML models that leverage inputs from different categories and exhibit various levels of complexity are classified and discussed. At the end of the review, an outlook on the current challenges and future opportunities of ML-assisted catalyst screening is supplied. We believe that this review summarizes major achievements in accelerating catalyst discovery through ML and can inspire researchers to further devise novel strategies to accelerate materials design and, ultimately, reshape the chemical industry and energy landscape.

Keywords

Machine learning / Heterogeneous catalysis / Chemisorption / Theoretical simulation / Materials design / High-throughput screening

Cite this article

Xinyan Liu, Hong-Jie Peng. Toward Next-Generation Heterogeneous Catalysts: Empowering Surface Reactivity Prediction with Machine Learning. Engineering, 2024, 39(8): 26-47 DOI:10.1016/j.eng.2023.07.021

1. Introduction

Positioned at the heart of the chemical industry, catalytic reactions are involved in the processes of over 80% of all manufactured products [1]. Among various catalysis scenarios, heterogeneous catalysis using solid catalysts receives exceptional attention due to its high scalability for bulk manufacturing and its outstanding advantages in product separation and catalyst recycling [2,3]. Contemporary industrialized heterogeneous catalytic processes, such as methane reforming [4], ammonia synthesis [5], hydrocarbon cracking [6], and a variety of selective hydrogenation/dehydrogenation reactions [7-10], are mostly thermochemical and usually require high-temperature and/or high-pressure conditions to shift the chemical equilibria and modulate the reaction rates. Moreover, these conventional processes rely heavily on the use of fossil resources as reactants and for energy inputs, as well as on precious metals (e.g., Pt, Pd, Ru, and Rh) as catalysts, thereby deviating from the goal of global sustainability [11,12]. Therefore, it is imperative to design new catalytic reactions and processes that are more energetically efficient, environmentally friendly, and economically favorable. Along with the continuous advancement of human civilization, the journey to hunt for such processes and corresponding key materials never ceases.

The rapid development of renewable energy technology, such as photovoltaics, has spurred this journey by enabling large-scale and low-cost "green" electricity generation [11,13]. To better utilize surplus electricity, one of the most well-known initiatives is to replace fossil-fuel-derived "grey" hydrogen with "green" hydrogen, the production of which relies on key technology such as electrochemical water splitting [14,15]. Similar concepts of green electricity-to-chemical energy conversion have also been implemented in the carbon dioxide reduction reaction (CO₂RR) [16-22] and ammonia electrosynthesis [23-26]. In turn, renewably synthesized hydrogen, carbon-containing fuels, and ammonia are attractive feeds for fuel cells/engines or raw materials for the chemical industry, helping to close the fossil-resource-free loops of carbon and nitrogen. The rational design of highly efficient and earth-abundant catalytic materials plays a central role in achieving this goal, prior to subsequent reaction engineering and scaling up. Unfortunately, current catalytic materials are still far from satisfactory in terms of efficiency and/or scalability [27-30]. Innovations in next-generation catalyst design are therefore in high demand.

The design, optimization, and further development of novel catalytic materials traditionally rely on Edisonian trial-and-error processes (Fig. 1, Scheme 1). However, the efficiency of such processes is limited, as it usually takes decades to discover and commercialize a new catalyst. Furthermore, as it is impossible to exhaust all, or even the majority, of the vast candidate space of both compositions and structures, a more efficient methodology to navigate through this space is indispensable. In fact, the flourishing of computational methods and theoretical modeling (e.g., density functional theory (DFT) calculations) has enabled another path that can replace tedious experimental exploration (Fig. 1, Scheme 2) [31-34]. It has been revealed that the surface reaction rates on a solid catalyst can be correlated to the surface bond energies of adsorbed species present in the reaction network (including transition states (TSs)), which are accessible through state-of-the-art computations [34-36]. Thus, it is possible to conduct "virtual" experiments on computers to assess the catalytic activity of a material by calculating the relevant energies. When reaction rates are reformulated as functions of only one or two descriptor(s), the high-dimensional problem of searching for candidates with desirable catalytic performance can be further collapsed to the hunt for catalysts exhibiting optimal descriptor values, where the descriptor is often a physical or chemical property that can be calculated or measured [36]. This so-called descriptor-based approach opens up new possibilities for the high-throughput computational screening of undiscovered catalysts. Among various electronic and geometric descriptors, the adsorption energies of surface species are frequently adopted, as ① they can be obtained via computations and ② the calculation results can be verified through accurate calorimetric experiments [37]. More importantly, the adsorption-energy-based activity map can be viewed as a quantitative implementation of the classical Sabatier principle, providing a rational understanding of trends in heterogeneous catalysis [38]. Although establishing activity maps helps to expedite the discovery of novel catalysts, acquiring energetic descriptors through modeling is still computationally demanding on a large scale, especially considering the enormous compositional and/or structural heterogeneities when searching for multicomponent and/or multisite catalytic materials. To explore the vast material space for heterogeneous catalyst screening, it is therefore vital to develop ways to obtain surface adsorption strengths more efficiently and effectively.

Over the past decades, the rapid development of computer science and artificial intelligence (AI), along with the establishment of comprehensive databases, has enabled numerous possibilities for applying AI in chemistry and materials sciences for experiments, characterizations, and modeling [39-54]. Incorporating advanced machine learning (ML) models in catalyst design and screening makes it possible to directly predict the surface reactivity from fewer or less computationally expensive properties, with huge potential for improvements in cost and accuracy (Fig. 1, Scheme 3). Consequently, the acceleration of the entire screening process can be envisioned. In addition, unveiling hidden patterns and correlations through ML offers alternative opportunities to further our physical understanding of catalytic systems and obtain fresh perspectives on catalyst design [55]. In this context, the application of ML for adsorption energy prediction and high-throughput catalyst screening, while still in its infancy, has already demonstrated its huge potential in enabling a paradigm shift in the discovery of new materials for emerging catalytic processes. Thus, summarizing the latest advances in ML-empowered high-throughput catalyst screening and proposing promising directions is beneficial and necessary for future research.

Unlike the existing reviews covering the many aspects of ML application in catalysis research [56-66], this review has a particular focus on the data-driven prediction of adsorption energies, as the surface reactivity dominantly quantifies the catalytic potential of a solid catalyst. In addition, we highlight efforts to combine ML models with experimental exploration. In this review, we first categorize ML models according to the inputs adopted, namely ab initio or non-ab initio features, and discuss related research progress in two consecutive sections. In each section, works targeting systems with different levels of complexity are summarized, along with the physical understandings these works might supply. Next, ML-guided experimental catalyst discovery is showcased, based on either the ML model's predictive power or interpretable insights. Finally, we provide an outlook on current challenges and future opportunities in ML-assisted catalyst screening. As this is a focus review, we do not discuss the general principles and common models of ML, or the application of ML in other aspects of catalysis research such as high-throughput experimentation and ML-accelerated theoretical modeling; detailed information on these topics is already available in other reviews [57-61].

2. ML with ab initio features

2.1. Features based on calculated adsorption energies

2.1.1. Adsorption scaling relations

As discussed in the previous section, the idea of high-throughput screening can be realized with the availability of material descriptors (e.g., adsorption energy and electronic structure) through ab initio calculations and the proposal of the descriptor-based approach. The main idea of this approach is the dimension reduction brought by so-called scaling relations, which project reaction energetics onto a few properties [35,36,38]. In brief, it has been found that the adsorption energies of different adsorbates that bind to the surface through the same atom(s) tend to scale with each other, usually in a linear fashion. The foundation of this scaling relation lies in the d-band model initially proposed by Hammer and Nørskov [67] to explain the exceptional nobleness of gold among the transition metals, which is now a well-established and widely recognized quantitative theory for catalysis after years of continuous research and development [68]. In the d-band theory, the chemisorption abilities of transition metal surfaces can be well described by the energy distribution of the d-band of the corresponding metal surfaces, which is mostly quantified using the average energy of the band, namely the d-band center. The adsorption of similar species on these surfaces therefore tends to correlate when the species' adsorption energies rely only on the adsorbate valence and metallic d-band properties. Given the ubiquity of chemical bond formation between transition metal sites and adsorbates in heterogeneous catalysis, such a relation has been found to hold across a broad range of materials.
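As an illustration of the d-band center described above (the average energy of the d-band), the following minimal Python sketch computes the first moment of a d-projected density of states sampled on an energy grid. The function name `band_moments`, the Gaussian toy DOS, and the width convention (square root of the second central moment) are assumptions made for this example and are not taken from the cited works.

```python
import math


def band_moments(energies, dos):
    """Return (center, width) of a band from a DOS sampled on an energy grid.

    center: first moment of the DOS (the "d-band center" when dos is d-projected).
    width:  square root of the second central moment (one possible convention).
    Integrals are evaluated with the trapezoidal rule.
    """
    def trapz(values):
        return sum(
            0.5 * (values[i] + values[i + 1]) * (energies[i + 1] - energies[i])
            for i in range(len(energies) - 1)
        )

    norm = trapz(dos)
    center = trapz([e * d for e, d in zip(energies, dos)]) / norm
    second = trapz([(e - center) ** 2 * d for e, d in zip(energies, dos)]) / norm
    return center, math.sqrt(second)


# Toy d-projected DOS (made-up): a Gaussian centered 2 eV below the Fermi level.
energies = [-10.0 + 0.01 * i for i in range(1501)]  # -10 eV to +5 eV
dos = [math.exp(-((e + 2.0) ** 2) / 2.0) for e in energies]

center, width = band_moments(energies, dos)
print(f"d-band center = {center:.2f} eV, width = {width:.2f} eV")
```

In practice the DOS would come from a DFT calculation of the clean surface, and energies are usually referenced to the Fermi level.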

In a foundational work, Abild-Pedersen et al. [69] found that the adsorption energies of hydrogen-containing molecules, AHₓ, correlated linearly with the adsorption energy of atom A (A = C, N, O, and S). The mean absolute error (MAE) was reported to be only 0.13 eV when such linear relations were applied to describe the adsorption strengths of hydrogenated species over a range of pure metals. The successful prediction of the adsorption energies of hydrogenated species based on their atomic counterparts simplifies the estimation of the reaction energies of dehydrogenation and hydrogenation reactions, and similar relations can also be established for other, more sophisticated reactions. For example, Chowdhury et al. [70] investigated the adsorption energies of surface species involved in the decarboxylation and decarbonylation of propionic acid over eight flat monometallic transition-metal surfaces (the (111) surfaces of Ni, Pt, Pd, Ru, Rh, Re, Cu, and Ag). They found that multivariate linear scaling relations with a combination of descriptors (i.e., the adsorption energies of CHCHCO*, OH*, and C*, where * refers to the adsorbed species) yielded exceptionally accurate results, with an MAE of 0.12 eV, which could not be outperformed by any of the nonlinear models tested. It is only when the training dataset is incomplete (i.e., contains a random subset of adsorption energies) that kernel-based nonlinear ML models start to become superior. Although this comparison accentuates the effectiveness of linear scaling relations in rationalizing a complete and large dataset, it also points out the inadequacy of linear models in predicting adsorption energies from a limited dataset.
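A linear scaling relation of the kind described above can be sketched as an ordinary least-squares fit of the adsorption energy of a hydrogenated species against that of its central atom, followed by an MAE evaluation. The energies below are synthetic numbers chosen for illustration only (not DFT data), and the helper names `fit_line` and `mean_absolute_error` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic adsorption energies (eV) of atom A on six hypothetical metal surfaces.
e_atom = [-7.0, -6.5, -6.0, -5.0, -4.5, -3.5]
# Synthetic AHx energies: a linear trend plus small fixed deviations.
deviation = [0.05, -0.08, 0.02, 0.10, -0.06, -0.03]
e_hydrogenated = [0.25 * e - 0.5 + d for e, d in zip(e_atom, deviation)]

slope, intercept = fit_line(e_atom, e_hydrogenated)
predicted = [slope * e + intercept for e in e_atom]
error = mean_absolute_error(e_hydrogenated, predicted)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f} eV, MAE = {error:.3f} eV")
```

Once the slope and intercept are known for a given adsorbate, only the atomic adsorption energy needs to be computed for each new surface.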

While scaling relations are generally an effective and efficient way to largely reduce the reaction intermediate space to a few descriptors, several challenges remain when applying scaling relations in high-throughput catalyst screening. First, scaling relations usually only apply to similar adsorbates that bind through the same atom, with accuracies limited to around 0.1-0.2 eV. Second, stemming from the d-band theory, scaling relations work quite well for pure and alloyed transition metals; however, although a variety of scaling relations have been successfully established for inorganic compounds such as oxides, some of these apply only to limited systems with specificities in either composition or crystal structure [71,72]. Third, for complex reactions involving large organic molecules (e.g., alkanes containing more than three carbon atoms), the possibilities of single descriptors or descriptor pairs start to explode, interfering with the determination of a good catalyst, as the as-constructed activity maps are highly dependent on the chosen descriptor(s). For example, Wang et al. [73] showcased the importance of descriptor engineering with a selective propane dehydrogenation reaction to propylene. They found that the adoption of both CH₃CHCH₂* and CH₃CH₂CH* bindings as descriptors not only resulted in an overall MAE lower than 0.09 eV for all scaling relations but also enabled the greatest differentiation of elemental metals. Nevertheless, the use of such an approach to determine descriptors often requires the input of external knowledge (e.g., CH₃CHCH₂* as the selectivity-determining species in the above showcased reaction). Developing strategies that do not rely on significant domain input or human intuition is therefore highly desirable.

2.1.2. Improving scaling relations through ML

Given the challenges outlined above, numerous efforts have been devoted to improving scaling relations. In this regard, Mamun et al. [74] proposed a Bayesian framework to extend the single-descriptor linear scaling relation to a multi-descriptor linear regression model. The Bayesian information criterion (BIC) was adopted as the model evidence to select the best model, providing a statistical rationalization of the descriptor selection regarding how many and which descriptors should be employed to yield the best bias-variance trade-off (Fig. 2(a)). In an attempt to further improve the prediction accuracy, the researchers also leveraged Gaussian process regression (GPR) to predict the residual of the selected model (i.e., residual learning; Fig. 2(b)). When applied to the (111) or (100) facet of 2035 binary alloy materials in their A1, L1₀, and L1₂ Strukturbericht designations and six typical hydrogen-containing adsorbates (CH*, CH₂*, CH₃*, OH*, NH*, and SH*), the as-devised framework demonstrated an impressive performance, with a test MAE of 0.1 eV, which is comparable to the typical DFT error. This is a promising example of how ML can improve model fidelity and yield more accurate adsorption energy predictions than conventional linear scaling relations.
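The descriptor-selection step described above can be illustrated with a small sketch that fits a linear model for every subset of candidate descriptors and ranks the subsets by BIC. This is only a sketch under stated assumptions: it uses the common Gaussian-error form BIC = n ln(RSS/n) + k ln(n), synthetic data in which the target depends on descriptors 0 and 1 only, and hypothetical helper names (`solve`, `fit_rss`, `bic`). It does not reproduce the cited framework, and the GPR residual-learning step is not included.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rss(columns, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def bic(rss, n, k):
    """BIC for a linear model with Gaussian errors (additive constants dropped)."""
    return n * math.log(rss / n) + k * math.log(n)


# Synthetic data: three candidate descriptors, target depends on the first two only.
rng = random.Random(0)
n = 40
descriptors = [[rng.uniform(-2.0, 0.0) for _ in range(n)] for _ in range(3)]
target = [
    0.6 * descriptors[0][i] + 0.3 * descriptors[1][i] + 0.2 + rng.gauss(0.0, 0.05)
    for i in range(n)
]

scores = {}
for size in range(1, 4):
    for subset in itertools.combinations(range(3), size):
        rss = fit_rss([descriptors[j] for j in subset], target)
        scores[subset] = bic(rss, n, k=size + 1)  # k counts the intercept too

best_subset = min(scores, key=scores.get)
print("selected descriptors:", best_subset)
```

The `k ln(n)` term penalizes each added descriptor, so a descriptor is retained only if it reduces the residual enough to justify the extra parameter.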

Similarly, García-Muelas and López [75] reported the application of a statistical principal component analysis (PCA) and principal component regression (PCR) model to the DFT-computed adsorption strengths of 71 C₁-C₂ species on 12 close-packed metal surfaces (Cu, Ag, Au, Ni, Pd, Pt, Rh, Ir, Ru, Os, Zn, and Cd). As a common method for dimension reduction in unsupervised learning, PCA revealed that the majority of the thermochemistry of a given metal can be sufficiently estimated with two principal components (PCs) constructed from the formation energies of three predictors: O*, OH*, and CCHOH*. One component represents the affinity of a metal to form covalent bonds with an intermediate, while the other describes the ionicity of the metal-adsorbate bond (Figs. 2(c) and (d)). The inclusion of the second component was found to be the key in extending the adsorbate thermochemistry predictions on transition metals beyond conventional d-band theory, especially for adsorbates or metals with almost-filled valence shells or d-bands. A later PCR further confirmed this finding, exhibiting an MAE of 0.12 eV on the validation set. This model was also applied to single-atom and near-surface alloy systems. With a minimum of DFT energy evaluations (around 1800), a full set of 31000 formation energies was predicted with high accuracy (MAE = 0.19 eV). The high predictive power of statistical learning based on PCA/PCR was thereby demonstrated.
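To make the dimension-reduction idea concrete, the sketch below extracts the first principal component of a small metals-by-adsorbates energy matrix using power iteration on the covariance matrix and reports the fraction of variance it explains. The matrix is synthetic and deliberately close to rank one; the function name `first_principal_component` is hypothetical, and power iteration is simply a convenient stdlib-only substitute for a full eigendecomposition, not the procedure of the cited work.

```python
import math


def first_principal_component(data, n_iter=200):
    """Return (unit loading vector, explained-variance fraction) of the first PC."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0] * p  # starting vector for power iteration
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Synthetic formation energies (eV): 6 hypothetical metals x 4 hypothetical adsorbates.
metal_scores = [-1.5, -1.0, -0.4, 0.2, 0.9, 1.8]
adsorbate_loadings = [1.0, 0.8, 0.5, 0.3]
energies = [
    [t * l + 0.05 * math.sin(3 * i + 7 * j) for j, l in enumerate(adsorbate_loadings)]
    for i, t in enumerate(metal_scores)
]

pc1, explained = first_principal_component(energies)
print("PC1 loadings:", [round(x, 3) for x in pc1])
print(f"explained variance fraction: {explained:.3f}")
```

In a PCR workflow, the per-metal scores along the leading components would then serve as regressors for the remaining adsorbates.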

2.1.3. Estimating activation energies through ML

The theoretical justification of estimating the activity of a solid catalyst through its adsorption energies relies on the existence of the Brønsted-Evans-Polanyi (BEP) relation, which states that the activation energy of an elementary step is positively correlated with its reaction energy [76]. There are cases, however, in which the linear BEP relations fail to capture the catalytic trend [77-79]. Thus, it remains more desirable yet challenging to directly predict activation energies and assess the influence brought by other parameters besides reaction energy. Takahashi and Miyazato [81] attempted to incorporate ML algorithms into conventional BEP relations in order to improve the accuracy of activation energy prediction. Their work was based on the open-access database CatApp [80], which contains a set of DFT-calculated reaction energies and activation energies for a large number of elementary steps on single-crystal metal surfaces, including those with low symmetry such as stepped (211) surfaces. In addition to reaction energies, other features describing the catalyst, surface plane, reactants, and product were considered in nonlinear models such as random forest and support vector regression, resulting in better accuracy than linear models.
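For reference, the linear BEP relation discussed above is commonly written in the following form. The symbols are notation assumed here rather than taken from the source: E_a is the activation energy of an elementary step, ΔE is its reaction energy, and α and β are fitted constants for a given class of reactions.

```latex
E_\mathrm{a} = \alpha \, \Delta E + \beta, \qquad \alpha > 0
```

The ML models described in this subsection can be read as replacing this one-variable linear map with a nonlinear function of ΔE and additional features of the catalyst, surface plane, reactants, and products.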

Similarly, Artrith et al. [82] demonstrated an MAE of 0.20 eV (lower than an MAE of 0.35 eV through BEP approximation) when predicting the TS energies of various C-C and C-O scission steps involved in ethanol reforming, using a set of ab initio (e.g., reaction energies) and non-ab initio (e.g., electronegativity and nearest-neighbor distance of chemical species) features in the ML model. The TS energies predicted in this model were further adopted as features in a second model based on a smaller experimental database, enabling the direct prediction of ethanol reforming activity/selectivity without the need to know detailed reaction mechanisms or establish theoretical activity/selectivity volcano maps. These works provide methods for the rapid estimation of activation energy, although their transferability to catalysts beyond transition metals/alloys and to reactions beyond thermochemical reactions requires further demonstration.

In sum, this subsection focused on improvements upon the traditional linear scaling relations that have been extensively relied on in conventional catalyst screening. One obvious advantage of the approaches discussed above lies in their physical rationality, as the theoretical foundation of linear scaling relations is fairly solid. However, these approaches all utilize features related to adsorption energies, which require DFT relaxations and are expensive to obtain. Furthermore, the adsorption energy is already an overall reflection of many geometric and electronic structural factors, whose contributions are challenging to understand and disentangle from a fundamental perspective. Therefore, it is still desirable to incorporate features that are formulated directly from the material electronic structure for adsorption energy prediction, which is discussed in the next subsection.

2.2. Features based on calculated electronic structure properties

Aside from the adsorption energies of some basic species, the ab initio electronic structure properties can also be calculated and employed as informative features for the ML-enabled estimation of adsorption energies. In this section, we discuss works that leverage electronic structure features, which not only present stronger potential for generalization but could also lead to physical understandings of specific heterogeneous catalytic processes.

2.2.1. Formulated electronic structure properties

The incorporation of domain knowledge, such as the d-band theory, can help researchers identify and formulate suitable electronic structure properties as feature inputs. Along this line, Ma et al. [83] and Li et al. [84] evaluated several characteristics of the d-band distribution and the local Pauling electronegativity, which reflects the delocalized sp-states, as features in neural network (NN) models to predict CO* binding energies on (100)- and (111)-terminated multi-metallic alloys for CO₂RR catalyst screening (Fig. 3(a)). The root-mean-square errors (RMSEs) for the predictions were approximately 0.1-0.2 eV, depending on the surface models. Similarly, with a target space holding various C-, N-, and O-containing adsorbates over different facets ((100), (111), and (211)) of 11 transition metals with a face-centered cubic (fcc) bulk structure (Co, Rh, Ir, Ni, Pd, Pt, Ru, Os, Cu, Ag, and Au), Praveen and Comas-Vives [85] devised a single ML model capable of predicting the adsorption strengths of multiple adsorbates simultaneously. With features related to the properties of the active sites, the elements involved in direct bonding, and electronic structure properties obtained from DFT calculations of free adsorbates and clean metal surfaces, the researchers trained an extreme gradient boosting (XGBoost) regressor that remained effective for adsorption energy prediction, with MAEs for the training and testing sets of 0.074 and 0.174 eV, respectively.

A more important aspect of leveraging electronic features in ML-based adsorption energy prediction is to assist in the identification of the most influential features. Understanding why these features are important can prevent researchers from taking ML-based analyses at face value and allow for the identification of the principal factors determining surface catalytic chemistry, as well as potential ways to tailor better catalysts. The study by Praveen and Comas-Vives [85] mentioned above suggests that the most important features are electronic properties, primarily from the adsorbate and then from the metal, according to their feature importance analysis. Aside from feature importance ranking, a Bayesian learning approach (called Bayeschem) has been proposed to bridge the complexity of electronic descriptors [86]. Built upon the well-established d-band theory and a Newns-Anderson-type Hamiltonian for capturing the essential physics of chemisorption processes, a model optimized with pristine transition-metal data demonstrated impressive prediction accuracies (0.1-0.2 eV) and uncertainty quantifications for adsorbates such as O* and OH* at a diverse range of atomically tailored metal sites. More importantly, insights into the orbital-wise nature of chemical bonding at adsorption sites with d-state characteristics ranging from bulk-like semi-elliptic bands to free-atom-like discrete energy levels can be naturally drawn from the model.

Beyond pure metallic systems, ML methods have also been found to be efficient in describing the reactivity of metal compound catalysts. For example, Göltl et al. [87] adopted an ML genetic algorithm (GA) to analyze the correlation between various DFT-calculated electronic structure properties and CO*/NO* adsorption strengths on transition metal sites (Cu, Ni, Co, and Fe) in zeolites (SSZ-13 and mordenite). Through this analysis, the position of the s orbital, the number of valence electrons of the active site, and the highest occupied molecular orbital (HOMO)-lowest unoccupied molecular orbital (LUMO) gap of the adsorbate were found to be the most important electronic descriptors. Moreover, this work pointed out the importance of capturing site reconstruction in adsorption prediction. Similarly, molecular-orbital-based analysis was performed to quantify the interactions between a variety of small molecules and the surfaces of group 13 metal oxides [88]. The HOMO energies of the adsorbates and the surface energies of the oxide surfaces were identified as two major factors governing the solid-adsorbate interactions in such systems.

The application of ML-based predictive models has also been extended to the screening of single-atom catalysts (SACs) [89-91]. In this regard, Chen et al. [92] constructed a comprehensive dataset comprising 1060 atomically dispersed metal/nonmetal co-doped graphene systems as model carbon-supported SACs for CO₂RR, as well as an ML model based on XGBoost and simple features, revealing that the Pauling electronegativity and covalent radius of central metal atoms are more important features than the metal d-electron number. These understandings obtained for zeolites, oxides, or SACs are generally quite different from those gained from transition metals, highlighting the great opportunities to leverage ML to disclose unique catalytic chemistry beyond transition metals.

In addition to the identification of the main factors affecting the interactions between adsorbates and surfaces, ML models exhibit the capability to construct new descriptors from explicit expressions of these influential factors. For example, Andersen et al. [93] proposed so-called "data-driven" descriptors, whose predictive power was shown to extend over a wide range of adsorbates, multi-metallic transition metal surfaces, and facets. Identified using the recently developed compressed sensing method sure independence screening and sparsifying operator (SISSO), the descriptors are expressed as nonlinear functions of the intrinsic properties of the clean catalyst surface, including the coordination numbers and d-band moments (Fig. 3(b)). The good agreement between DFT-calculated and SISSO-predicted adsorption strengths demonstrates the effectiveness of new descriptors over scaling relations, as well as the possibility of extending them to broader material spaces.

2.2.2. Raw electronic structure properties

While the aforementioned works adopt statistical features computed from electronic structure properties such as the d-band center or width, it is also possible to construct frameworks that directly digest raw electronic structural data such as the density of states (DOS). For example, Fung et al. [94] leveraged the DOS of catalytic surfaces for adsorption prediction, using the same dataset reported by Mamun et al. [74]. Unlike the previous work by Mamun et al. [74], Fung et al. [94] additionally computed the DOS of the surfaces. A convolutional neural network (CNN) model, which has been widely utilized in image processing and characterization, was adopted to automatically extract information from the raw DOS data without the need for external knowledge (Fig. 4(a)), yielding a low test MAE on the order of 0.1 eV. In addition, with the incorporation of domain knowledge, the as-devised model (referred to as "DOSnet") supplied physically meaningful guidance through occlusion sensitivity analyses, by which the energetic responses to perturbations on electronic structures could be well estimated. This CNN-aided framework can thus potentially accelerate the discovery of new catalysts by enabling the exploration of an electronic structure space without adsorption energy calculations. As only a single calculation is required for each catalytic surface, DOSnet will exhibit even greater potential in computational savings and high-throughput screening when investigating surfaces containing a large quantity of unique adsorption sites (e.g., high-entropy alloy (HEA) surfaces).
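The occlusion sensitivity analysis mentioned above can be sketched generically: successive windows of the input DOS are masked, and the change in the model's predicted energy is recorded for each window. The example below is a toy illustration only. The stand-in `toy_model` is a simple weighted sum rather than a CNN, and all names and numbers are hypothetical; they are not taken from DOSnet.

```python
def occlusion_sensitivity(model, signal, window):
    """Return the change in model output when each window of the input is zeroed."""
    baseline = model(signal)
    changes = []
    for start in range(0, len(signal), window):
        stop = min(start + window, len(signal))
        occluded = signal[:start] + [0.0] * (stop - start) + signal[stop:]
        changes.append(model(occluded) - baseline)
    return changes


# Stand-in for a trained predictor: only DOS bins 4-7 contribute to the energy.
weights = [0.0] * 4 + [-0.5] * 4 + [0.0] * 4


def toy_model(dos):
    return sum(w * d for w, d in zip(weights, dos))


dos = [1.0] * 12  # flat toy DOS over 12 energy bins
sensitivity = occlusion_sensitivity(toy_model, dos, window=4)
print(sensitivity)  # one value per occluded window
```

Windows whose occlusion changes the prediction the most indicate the energy ranges of the DOS that the model relies on.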

In an attempt to obtain more interpretable features and descriptors, further engineering of DOS can be performed. For example, an automated framework was proposed to obtain accurate and interpretable descriptors of chemical activity for metal alloys and oxides using unsupervised ML (Fig. 4(b)) [95]. PCA was first adopted to identify a lower dimension basis of the DOS matrix, which consisted of PC descriptors. Models leveraging different features (namely, the traditional electronic descriptors, the full DOS, and 10 PC descriptors with top scores) were compared for C*, O*, N*, and H* adsorption energy predictions on layered alloys; the PC-based models exhibited the most accurate results, with RMSEs smaller than those of the other two models by a factor of about two. In addition to prediction accuracies, this model is endowed with physical interpretability via the signal reconstruction of electronic-structure patterns captured by PC descriptors; thus, it provides suggestions on potential design motifs for future catalysts and establishes a link between the material's geometric and catalytic properties.

The importance and indispensable role of electronic structure-related features in adsorption prediction is clearly demonstrated by the works discussed above. In addition to providing great predictive power, these features make nontrivial contributions to the model interpretability, through which fundamental understandings of the most influential electronic structural factors can be acquired and consequent objective catalyst design can be further enabled. However, the computational burden is a major concern in these approaches, as obtaining ab initio features can be expensive, especially in large systems. Realizing accurate adsorption prediction with only non-ab initio features is more appealing, in this sense. Such approaches are discussed in the following section.

3. ML with non-ab initio features

The central role of electronic structures in determining adsorbate-surface interactions makes it natural to include related features for adsorption energy predictions. However, acquiring these features often requires ab initio calculations, especially for unexplored new materials that cannot be found in existing databases. The resulting increase of the computational cost is obviously undesirable, especially given the aim for high-throughput screening in a material space with unlimited possibilities of crystallographic orientations, surface compositions, and binding sites (e.g., HEAs and high-entropy metal compounds). Therefore, there has been a strong tendency to realize adsorption predictions using only low-cost features that do not require new ab initio calculations. For example, Toyao et al. [96] were the first to adopt 12 readily available elemental properties (EPs; e.g., surface energy, melting point, and group in the periodic table) as features in ML models for predicting the adsorption energies of CH₄-related species (CH₃*, CH₂*, CH*, C*, and H*) on copper (Cu)-based alloys, realizing decent accuracy with MAEs < 0.3 eV. Once non-ab initio features are further rationally engineered to yield better model performance, we can anticipate a boost in new catalyst discovery, as time-consuming DFT calculations will no longer be heavily relied on.

3.1. Physically inspired non-ab initio features

Incorporating well-established theory into a predictive model is a general strategy for engineering simple, non-ab initio features with physical rationality. Aiming to predict CO* binding energies on alloys, Noh et al. [97] proposed a framework leveraging active learning (AL) and kernel ridge regression. More specifically, they adopted the d-band width calculated from linear muffin-tin orbital (LMTO) theory to account for the local coordination environment and the geometric mean of electronegativity to describe adsorbate renormalization. Demonstrated mostly on the (100) facets of subsurface alloy systems in an fcc bulk structure (Fig. 5(a)), the automated framework yields an impressive prediction MAE of only 0.05 eV when only adopting LMTO-derived features, which instills confidence in applying this model to screen for ideal subsurface alloys to catalyze CO₂RR (Fig. 5(b)).
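As a rough illustration of the regression step, the sketch below implements kernel ridge regression with a Gaussian kernel in plain Python and builds one feature as a geometric mean of electronegativities. All feature values, target energies, the kernel width `gamma`, and the regularization strength `lam` are made-up placeholders, not data or settings from Ref. [97], and the AL loop is omitted.

```python
import math

def geometric_mean(values):
    """Geometric mean, e.g., of the electronegativities of atoms around a site."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def krr_fit(X, y, gamma=1.0, lam=1e-6):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = [[rbf_kernel(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def krr_predict(X_train, alpha, x, gamma=1.0):
    """Predict the target for one new feature vector x."""
    return sum(a * rbf_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))

# Placeholder features: [d-band width (eV), geometric-mean electronegativity].
X_train = [
    [2.0, geometric_mean([1.9, 2.2])],
    [3.0, geometric_mean([1.9, 1.9])],
    [4.0, geometric_mean([2.2, 2.2])],
]
y_train = [-0.5, -0.7, -0.3]  # placeholder CO* binding energies (eV)
alpha = krr_fit(X_train, y_train)
```

Calling `krr_predict(X_train, alpha, [3.5, 2.0])` then returns an interpolated binding energy for an unseen site; in practice `gamma` and `lam` would be tuned by cross-validation.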

Leveraging tree-based models, Esterhuizen et al. [98] proposed an interpretable generalized additive model (iGAM) to investigate perturbations brought by strain or the ligand effect (Figs. 5(c)-(e)). The study focused on the chemisorption of species representative of both electron-rich (OH* and Cl*) and electron-poorer (O* and S*) adsorbates on the (111) facets of subsurface metal alloys. Aside from its superior predictive capabilities (in general, with training RMSEs < 0.032 eV and testing RMSEs < 0.065 eV), the iGAM can provide further information, as it forces the model fit through construction to be a linear combination of different functions, where each function is only dependent on one feature of interest. In this case, the chemisorption strength was found to be impacted by three crucial site-related features: the strain in the surface layer, the number of d-electrons in the ligand metal, and the size of the ligand atom.
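The additive structure described above can be written compactly. The notation below is generic GAM notation rather than that of Ref. [98]: $x_1$, $x_2$, and $x_3$ are assumed to stand for the three site-related features (surface strain, ligand d-electron count, and ligand atom size), $\beta_0$ is an intercept, and each $f_j$ is a learned one-dimensional shape function.

```latex
\Delta E_{\mathrm{ads}} \approx \beta_0 + f_1(x_1) + f_2(x_2) + f_3(x_3)
```

Because each $f_j$ depends on a single feature, plotting it against $x_j$ shows how that feature alone shifts the predicted chemisorption energy, which is the source of the model's interpretability.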

Other than the manually selected features, new features can be constructed through ML. For example, the SISSO method was found to be effective in assembling initial features whose values are readily available in existing databases into new combinations, thereby either enlarging the feature space for chemisorption prediction on different metal alloys [99] or deriving more accurate descriptors for Pt-based oxygen reduction reaction (ORR) catalysts [100]. Insights on the critical physical concepts that control the chemisorption process on metal surfaces can also be further extracted.

In the above work, the features were mostly formulated using known theories or domain knowledge. However, an inverse approach can be used based on previous theoretical models. For example, based on a unified empirical model [101] that correlates adsorption strength with a few electronic structure parameters including the d-band center, the number of p electrons, and the matrix coupling element between the adsorbate and the metal states, Montemore et al. [102] first predicted these parameters using ML and then derived the adsorption energies of a broad range of species (C, N, O, OH, H, S, K, and F) on flat metal and alloy surfaces with the predicted parameters as inputs to the empirical model, achieving an MAE of 0.29 eV. Given the large ranges of the adsorbates and surfaces in this study, this model can be deemed to be general and reusable. Nevertheless, through a comparison between the two approaches, we note that these physically inspired models may present the dilemma of lower model accuracy or less generalizability, and such a balance often depends on how well the established theory works with the target chemical space.

3.2. Enhanced representation of surfaces and molecules

The works described above mostly focus on a single or a few adsorption sites, along with simple adsorbates. This might be sufficient for describing the activities of simple flat facets such as (111) and (100), which exhibit relatively high symmetry. However, as has been well established in many catalytic reactions, stepped-like surfaces are much more reactive and make major contributions to the overall activities [103,104]. Modeling catalytic reactions on these surfaces presents greater challenges, due to the broken surface symmetry and the resulting increase in surface heterogeneity. To accommodate various possible binding sites, traditional screening typically relies on the introduction of geometric descriptors [105,106] or the establishment of multiple site-specific activity maps [107,108]. On the other hand, emerging catalytic applications such as biomass [109,110] and plastic valorization [111,112] often require the description of interactions between large molecules and catalytic surfaces. Explicitly obtaining either the site-specific structure-activity relationships or the surface adsorption/reaction energetics involving large molecules adds up to a heavy computational burden. In this regard, ML is extremely suitable for overcoming this hurdle, once the enhanced representation of complex surfaces, molecules, or catalytic systems under more realistic conditions is implemented.

3.2.1. Enhanced representation of complex surfaces

As mentioned above, prediction on stepped alloy surfaces serves as an example of a scenario in which the increased structural diversity of the catalytic surfaces must be considered. This scenario can be rather simple if the host metal remains unchanged, such as when predicting H* adsorption on stepped silver (Ag) alloys. An ML model yielded an MAE as low as 0.014 eV while only using non-ab initio features related to the dopant atoms, without deliberate consideration of local geometric variations [113]. However, ML cannot work well without appropriate surface representation if the alloy composition is more variable. Saxena et al. [114] compared several ML models in predicting C* and O* binding energies on the (211) surfaces of A₃B alloys with some common non-ab initio feature inputs, obtaining RMSEs of 0.31-0.38 eV depending on the surface termination and the adsorbate. However, the vast number of site possibilities on a (211) surface were not considered, leading to a prediction accuracy that fell short of those of the aforementioned models on simpler surfaces. Taking a step further, our group focused on the (211) surfaces of binary L1₂-type alloys across 37 common metal and metalloid elements with site-specific binding configurations, generating a rich library of site motifs and yielding a comprehensive dataset containing about 2000 adsorption energies [115]. With the inclusion of only low-cost, non-ab initio features encoding both the electronic structure properties and the coordinate-based geometric information of the surface sites, our models demonstrated satisfactory prediction accuracies, with test MAEs of 0.14 and 0.18 eV for C* and O* binding, respectively. Furthermore, interpretable physical insights could be extracted from the feature importance distributions and Kullback-Leibler divergence analysis, showing the most probable structural and compositional characteristics of an ideal alloy catalyst for a specific reaction.
The proposed models were further validated through DFT calculations and microkinetics modeling, with low-temperature methanol synthesis as a test reaction and a Cu₃Pd alloy as a promising candidate identified by ML. In principle, due to its simplicity, the use of this model as a rapid screening tool prior to any detailed theoretical or experimental investigations is readily applicable to other reactions that are well described by C* and O* binding strengths. Other coordinate-based geometric representations, such as the generalized coordination number, have also been found to be effective in improving the prediction accuracy of ML models based on non-ab initio electronic structure features [116-118].

The above examples tend to focus on a system consisting of only one or two elements; however, it is also beneficial to realize effective adsorption evaluation across a broader spectrum of elements. Thus, prediction on HEA surfaces serves as another example of a scenario in which compositional heterogeneity plays an interesting role. For example, Batchelor et al. [119] explored HEAs composed of five elements (Ir, Pd, Pt, Rh, and Ru) as candidate catalysts for ORR, in which the adsorption strengths of O* and OH* were targeted. The researchers constructed a very simple linear model that leveraged parameterizations based solely on the nearest-neighbor compositions to the binding sites. Three and five types of atomic zones in (111)-type HEAs were classified for OH* and O* adsorption, respectively (Fig. 6(a)). By adopting the adsorption energies on a random subset of available binding sites as the training set, the model exhibited impressive prediction accuracy, with RMSEs of 0.063 and 0.076 eV for OH* and O* adsorption, respectively, on other possible sites. More importantly, the as-developed model was then applied to optimize the HEA composition, offering a design platform for the discovery of novel alloys by promoting sites with exceptional catalytic activities (Fig. 6(b)).
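A linear model of this kind can be sketched as a sum over zones and elements of a fitted coefficient times the number of atoms of that element in the zone. The zone names, the coefficient values, and the example site below are hypothetical and chosen only for illustration; they are not the parameters of Ref. [119], and the least-squares fitting step is omitted.

```python
ELEMENTS = ("Ir", "Pd", "Pt", "Rh", "Ru")

def zone_counts(zone_atoms):
    """Count how many atoms of each element occupy one zone around the site."""
    return [zone_atoms.count(element) for element in ELEMENTS]

def predict_energy(site, coefficients):
    """Linear model: sum over zones and elements of coefficient * atom count."""
    energy = 0.0
    for zone, atoms in site.items():
        for coef, count in zip(coefficients[zone], zone_counts(atoms)):
            energy += coef * count
    return energy

# Hypothetical coefficients (eV per atom), ordered as in ELEMENTS.
coefficients = {
    "binding":    [0.10, 0.60, 0.45, 0.20, 0.05],
    "surface":    [0.01, 0.03, 0.02, 0.01, 0.00],
    "subsurface": [0.00, 0.01, 0.01, 0.00, 0.00],
}

# Hypothetical on-top site described by three zones of neighboring atoms.
site = {
    "binding": ["Pt"],
    "surface": ["Pd", "Pd", "Ir", "Ru", "Rh", "Pt"],
    "subsurface": ["Ir", "Pt", "Pd"],
}

predicted = predict_energy(site, coefficients)
```

Because the prediction depends only on per-zone element counts, evaluating every site of a random alloy surface is cheap, which is what makes composition optimization over such a model tractable.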

A similar concept of site representation was adopted for screening bimetallic or HEA catalysts for either CO₂ hydrogenation to methanol [120] or the hydrogen evolution reaction (HER) [121]. The use of distance-based descriptors as an alternative to the nearest-neighbor information was found to contribute to the accurate prediction of H* adsorption on multi-metallic surfaces [122]. Nevertheless, the prediction of multi-metallic or HEA catalysts is mainly limited to (111) or (100) model surfaces at present. Accurate predictive models capable of encompassing both structural and compositional variations (e.g., HEA catalysts with non-ideal flat surfaces) are still lacking and require future development.

The coordinate-based representation method further enables the AL-based fully automated theoretical framework to guide the DFT calculations of desirable energetic descriptors, as demonstrated by Tran and Ulissi [123]. More specifically, these researchers proposed a fingerprinting method to represent the adsorption site numerically (Fig. 6(c)). This method describes each element type coordinated with the adsorbates using a vector of four numbers: the atomic number; the Pauling electronegativity; the number of atoms of the element coordinated with the adsorbate, as determined by the Voronoi tessellation; and the median adsorption energy between the adsorbate and the pure element, ΔE. Having enumerated all possible binding sites over 1499 different intermetallic combinations across 31 elements, the researchers were able to identify 54 candidates with surfaces having near-optimal CO* binding for electrochemical CO₂RR and 102 candidates with ideal H* binding for the HER (Figs. 6(d) and (e)). The prediction MAEs were reported to be 0.29 and 0.24 eV for CO* and H*, respectively. This proposed framework is a successful example of combining flexibility, automation, and ML guidance to enable holistic analyses across numerous adsorption sites, surfaces, and material spaces and the consequent acceleration of theoretical discovery. It should be noted that, although the AL framework basically adopted non-ab initio features (except for ΔE), additional DFT calculations were iteratively performed to verify the prediction and generate new DFT data for model retraining.
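The four-number fingerprint can be sketched as a small data structure. In the example below, the neighbor list is assumed to come from a Voronoi analysis that has already been run elsewhere, the pure-element adsorption energies are placeholder numbers, and the class and function names are hypothetical; only the atomic numbers and Pauling electronegativities of Cu and Al are real values.

```python
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class ElementFingerprint:
    atomic_number: int
    electronegativity: float         # Pauling scale
    coordination_count: int          # atoms of this element coordinated with the adsorbate
    median_adsorption_energy: float  # eV, adsorbate on the pure element

    def as_vector(self):
        """Return the four descriptors as a flat list of floats."""
        return [float(self.atomic_number), self.electronegativity,
                float(self.coordination_count), self.median_adsorption_energy]

def site_fingerprint(neighbors, properties, pure_element_energies):
    """Build one fingerprint per element type found among the site neighbors."""
    counts = {}
    for element in neighbors:
        counts[element] = counts.get(element, 0) + 1
    fingerprints = []
    for element in sorted(counts, key=lambda e: properties[e][0]):
        z, chi = properties[element]
        fingerprints.append(ElementFingerprint(
            z, chi, counts[element], median(pure_element_energies[element])))
    return fingerprints

# (atomic number, Pauling electronegativity)
properties = {"Cu": (29, 1.90), "Al": (13, 1.61)}
# Placeholder adsorption energies (eV) on the pure elements.
pure_element_energies = {"Cu": [-0.6, -0.5, -0.4], "Al": [-0.3, -0.2, -0.1]}
# Neighbors of the adsorbate, assumed to be precomputed by Voronoi tessellation.
neighbors = ["Cu", "Cu", "Al"]

fingerprint = site_fingerprint(neighbors, properties, pure_element_energies)
vectors = [fp.as_vector() for fp in fingerprint]
```

Sorting by atomic number is an arbitrary choice made here so that the same site always yields the same feature ordering.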

Compared with coordinate-based methods, graph-based deep learning (DL) methods have advantages in high-level feature representations [124]. With the same dataset as that used in Ref. [123], Back et al. [125] demonstrated lower MAEs of 0.15 eV with CNNs that were built on top of the graph representation and used only initial structures as inputs. Even more impressive prediction accuracies (i.e., test MAEs of 0.116 and 0.085 eV for CO* and H* binding, respectively) were achieved with an ensemble of crystal graph CNNs (CGCNNs) and a labeling method representing the binding site atoms of the unrelaxed bare surface geometry [126]. The site labeling method (Fig. 7(a)) enables the complete removal of DFT-based surface relaxation by generating unrelaxed surface structures from relaxed bulk structures that are computationally cheaper or even readily available in open-source databases such as that of Ref. [127]. In principle, such a universal method can be applied to any DL-based adsorption prediction model without modification. These works demonstrate that the combination of a novel site description method and advanced ML algorithms provides a viable solution for the high-throughput prediction of complex catalytic surfaces, significantly extending the search space from single-crystal model catalysts to more practical ones.

When combined with different ML methods or modules, graph-based representations also provide a promising strategy for increasing the interpretability of features extracted from electronic structure properties such as DOS. Wang et al. [128] directly infused the famous d-band theory into DL, obtaining a framework capable of supplying physical insights from learned data by design. This so-called theory-infused NN (TinNet) approach contains two sequential components: a convolutional-NN-based regression module that encodes the atomic and electronic structural information from the raw data; and a theory module that takes outputs from the regression module and predicts the adsorption properties of a metal site (Fig. 7(b)). The effectiveness of TinNet was demonstrated with representative simple adsorbates such as OH* and O*. With an MAE of 0.118 eV, the prediction performance was among the best in comparison with existing models or algorithms such as GPR [74], Bayeschem [86], DOSnet [94], and CGCNN. In addition to having a prediction performance on par with purely data-driven ML methods, TinNet allows for the decomposition of d-contributed adsorption energy into Pauli repulsion and orbital hybridization, a detailed analysis of which sheds light on potential paths to tailor novel motifs with desired catalytic properties.

3.2.2. Enhanced representation of complex molecules

Since the interaction between surfaces and molecules plays a central role in heterogeneous catalysis, the numbers of both possible adsorption configurations and possible reaction pathways increase drastically when the target reactions involve larger molecules. Thus, the explicit calculation of all adsorption energies can be very resource- and time-consuming. As has been well-established and demonstrated in general molecular ML for organic synthesis or drug discovery, many molecular representation methods have been directly implemented in predictive ML models for catalysis [129-133]. For example, Li et al. [134] compared different combinations of methods, including EP [96] and Coulomb matrix [129] representations for surfaces, as well as extended connectivity fingerprint (ECFP) [130], spectral London Axilrod-Teller-Muto (SLATM) [131], and bags-of-bonds (BOB) [132] representations for adsorbates, and found that the EP + SLATM combination yielded the lowest MAE of approximately 0.18 eV for 68 adsorbates on four low-index metal facets (Cu(111), Pt(111), Pd(111), and Ru(0001)). The researchers further extended the simple surfaces to broader transition metal/alloy surfaces and modified various representation methods [123,126,133] by replacing the atomic number with the elemental group and period, thereby achieving an MAE of about 0.05 eV for H* binding prediction and MAEs of about 0.1 eV for other strong binding adsorbates (C*, N*, O*, and S*) [135]. Using molecular fingerprints based on simplified molecular input line entry system (SMILES) notation (Fig. 8(a)) [136,137], Chowdhury et al. [137] constructed multiple filter-based NN models to extrapolate from a C₄ dataset to a C₂/C₃ dataset on Pt(111), where C₂-C₄ refer to species made up of two to four carbon atoms. The SMILES-based representation was demonstrated to lower the extrapolation MAE by approximately 20% compared with coordinate-based ones.
Similar feature engineering has also helped to predict and compare the adsorption energies of ring and chain species on metal surfaces [138]. Both works demonstrate the effectiveness of SMILES notation in encoding complex molecular structures in predictive ML models.

Similar to surface representation, graph-based methods enable enhanced and efficient molecular representation due to their conveniently readable and extendable data structure. For example, various graph-based methods such as graph NN (GNN) have been employed to represent up to 315 C₁/C₂ surface intermediates and TSs on Rh(111) for syngas-to-ethanol conversion [139]. The best RMSE and MAE for adsorption energy prediction were found to be 0.19 and 0.15 eV, respectively, and the error for activation energy prediction was lower than those of conventional BEP relations. Very recently, the superiority of GNN in representing complex molecules was substantiated by Pablo-García et al. [140], who demonstrated the construction of a well-balanced chemically diverse dataset and a new GNN architecture called graph-based adsorption on metal energy-neural network (GAME-Net) (Fig. 8(b)). Their dataset is very comprehensive, containing closed-shell C₁₋₄ molecules with functional groups including N, O, S, and C₆₋₁₀ aromatic rings (3315 entries). The optimal adsorption configuration and position of all the molecules were explored through DFT calculations after extensive sampling. Only the lowest energy configurations were included in the dataset. A molecule adsorbed on a close-packed metal surface was further represented as an integral graph to train GAME-Net, consisting of fully connected layers, convolutional layers, and a pooling layer. The strong predictive power of GAME-Net was demonstrated by a low MAE of 0.18 eV on the test set and six orders of magnitude less time consumed compared with DFT. The model could even be directly adopted to predict larger plastic and biomass molecules with up to 30 heteroatoms, which were not present in the initial training dataset, yielding an MAE of 0.016 eV per atom that showed the model’s promising accuracy.
Although this model still has a few limitations, such as the requirement of highly symmetric surfaces (i.e., only close-packed pure metal is considered) and neglect of lateral effects, the simplicity and generality of this model make it a useful tool for the fast screening of catalytic materials for unique applications that cannot be easily simulated by traditional methods such as DFT.
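Since this passage quotes both a plain MAE and an MAE per atom, a short sketch of the two metrics may help when comparing the numbers. The per-atom normalization shown here (dividing each absolute error by the atom count of the molecule before averaging) is one plausible definition and an assumption on our part; the exact convention used in Ref. [140] should be checked against that work. All energies below are placeholders.

```python
def mae(predicted, reference):
    """Mean absolute error over a set of adsorption energies (eV)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

def mae_per_atom(predicted, reference, n_atoms):
    """Mean of |error| / (number of atoms in the molecule), in eV per atom."""
    errors = (abs(p - r) / n for p, r, n in zip(predicted, reference, n_atoms))
    return sum(errors) / len(reference)

# Placeholder values for two large adsorbates.
predicted = [-1.0, -2.5]
reference = [-1.2, -2.0]
n_atoms = [10, 25]

plain_error = mae(predicted, reference)
normalized_error = mae_per_atom(predicted, reference, n_atoms)
```

Normalizing by size keeps errors on large molecules comparable with those on small ones, which is why a per-atom figure is not directly comparable with the 0.18 eV test-set MAE.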

3.2.3. Enhanced representation of catalytic systems under more realistic conditions

While the above works focus on model catalytic systems such as single-crystal surfaces with low coverage of adsorbates, efforts to leverage ML in order to better describe and predict more practical catalytic systems also benefit from enhanced representation. For example, the importance of accurate surface representation is further demonstrated by the prediction of practical catalytic materials beyond single-crystal model surfaces, such as nanoparticles (NPs) and small clusters. With a focus on describing the catalytic NO decomposition performance of RhAu alloy NPs (Fig. 9(a)), Jinnouchi and Asahi [141] proposed a universal ML scheme to investigate reaction activities based on local atomic configurations. To evaluate the structural similarities, the researchers adopted a so-called smooth overlap of atomic positions (SOAP) similarity kernel, which consists of overlap integrals between three-dimensional (3D) atomic distributions within a cutoff radius from different surface sites. The success of this model demonstrates that adsorbate binding is rather local and that the prediction accuracy can be systematically improved by increasing the amount of DFT data to cover all possible local structures. Similar conclusions were drawn when a research group combined SOAP descriptors with ML models to predict H* adsorption on a variety of MoS₂ and Cu-Au nanoclusters [142].

Advanced local structure representation can then be assembled using various global structure generation methods into ML pipelines for predicting structurally diverse practical catalytic systems. Chen et al. [143] devised an NN model to identify the active sites on gold (Au) NPs and dealloyed Au₃Fe NPs for CO₂RR to CO. The researchers focused on a performance indicator called the a-value, which can be expressed as $a = \Delta E_{\mathrm{CO}} - 1.4423\,\Delta E_{\mathrm{HOCO}}$, where $\Delta E_{\mathrm{CO}}$ and $\Delta E_{\mathrm{HOCO}}$ represent the adsorption energies of CO and the surface carboxyl HOCO*, respectively. Both energies can be obtained by means of quantum mechanics (QM). Using a developed force field for reactive systems called ReaxFF [144], the researchers first constructed a 10 nm Au NP, which contained more than 10 000 surface sites. Then, features based on the interatomic distances between the Au atoms were leveraged to describe the extremely irregular and disordered Au surfaces, with RMSEs of approximately 0.05 and 0.06 eV for the $\Delta E_{\mathrm{CO}}$ and $\Delta E_{\mathrm{HOCO}}$ predictions, respectively. The catalytic activity of the whole surface was further mapped to illustrate the desirable site geometries of the NPs (Fig. 9(b)) and guide the design of high-performance electrocatalysts for CO₂RR. A similar ML-QM-ReaxFF framework was applied to study CO₂RR on Au NPs while considering solvation effects and roughened Cu surfaces, demonstrating the good versatility of this strategy [145,146].
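Once the two adsorption energies have been predicted for every surface site, evaluating the a-value is a single arithmetic step per site, as the following sketch shows. The site labels and energies are placeholders rather than data from Ref. [143], and no claim is made here about which a-values correspond to the most active sites.

```python
A_VALUE_COEFFICIENT = 1.4423  # coefficient quoted in the text

def a_value(delta_e_co, delta_e_hoco):
    """Performance indicator a = dE_CO - 1.4423 * dE_HOCO (energies in eV)."""
    return delta_e_co - A_VALUE_COEFFICIENT * delta_e_hoco

def map_a_values(site_energies):
    """Compute the a-value for each site from (dE_CO, dE_HOCO) pairs."""
    return {site: a_value(e_co, e_hoco)
            for site, (e_co, e_hoco) in site_energies.items()}

# Placeholder ML-predicted energies (eV) for three hypothetical surface sites.
site_energies = {
    "site_001": (-0.50, 0.20),
    "site_002": (-0.30, 0.10),
    "site_003": (-0.80, -0.10),
}

a_map = map_a_values(site_energies)
```

Applied to the full set of surface sites of a nanoparticle, the resulting dictionary is what would be colored onto the surface to produce an activity map.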

Different site representation and initial structure generation methods can be considered to further modify the workflow. By leveraging the fingerprint labeling method [126], Gu et al. [147] integrated the force field, DFT, ML, and kinetic Monte Carlo in an end-to-end multiscale simulation framework to elucidate the alkaline HER kinetics of jagged platinum (Pt) nanowires. This framework not only achieved a high prediction accuracy for H* adsorption energies, with an MAE < 0.05 eV, but also offered insights into the autobifunctional alkaline HER mechanism. It also suggested structure motifs of highly active Pt catalysts for alkaline HER. Similarly focusing on HER catalysts but with an amorphous system, Zhang et al. [148] adopted a GA optimization method implemented in the Universal Structure Predictor: Evolutionary Xtallography (USPEX) code to obtain over 600 amorphous surface structures of Ni₂P. Non-ab initio features relying only on the local chemical environment were utilized to predict the frozen adsorption energies of H*, with an RMSE < 0.1 eV. However, we note that the H* adsorption energy consists of a frozen term and a relaxation term. The prediction of the latter, which accounts for the energy change upon site and surface deformation, still requires ab initio features, in accordance with prior discussions on the zeolite system [87].
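The decomposition mentioned at the end of this paragraph can be written as a simple sum. The symbols below are our own shorthand and are not taken from Ref. [148]: $\Delta E_{\mathrm{frozen}}$ denotes the adsorption energy evaluated with the site and surface held fixed, and $\Delta E_{\mathrm{relax}}$ the energy change upon their deformation.

```latex
\Delta E_{\mathrm{H^*}} = \Delta E_{\mathrm{frozen}} + \Delta E_{\mathrm{relax}}
```

In this picture, the non-ab initio model described above targets only the first term, so the total adsorption energy remains only partially predicted until the second term is supplied.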

Another aspect of practical catalytic complexity stems from lateral effects such as adsorbate-adsorbate interactions and solvation. Explicitly accounting for these effects in ab initio simulations, however, is often extremely computationally demanding. For example, identifying the optimal binding configuration on a surface at high coverages normally requires enumerating all possible binding configurations and then acquiring the energy of each configuration using DFT calculations. The exploration of such a large space of atomistic configurations could take orders of magnitude more time than a single calculation at a low coverage. To address this challenge, the Greeley group developed an ML-based surrogate model, named the adsorbate chemical environment-based graph convolution neural network (ACE-GCN), to replace expensive DFT calculations in determining the atomistic configurations of high-coverage catalytic surfaces (Fig. 9(c)) [149]. This model was based on the SurfGraph algorithm, which allows for the conversion of atomistic configurations to undirected graph representations [150]. The graph representations were further split into subgraphs for featurization and model training. This splitting into subgraphs is the key to explicitly accounting for the local environment of the adsorbate so that subtle atomistic interplay such as adsorbate-adsorbate interaction can be accurately captured. Illustrated by OH* adsorption on a stepped Pt(221) surface, the ACE-GCN not only enabled the use of a mixed training dataset (high-coverage data obtained on both Pt(221) and Pt(100) surfaces) to improve the model’s reliability in ranking the most likely adsorption configurations but also successfully identified energetically favorable and unfavorable high-coverage (corresponding to 1/2 monolayer) OH* adsorption configurations on Pt(221) with 96% fewer DFT relaxations (Fig. 9(d)).
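The idea of splitting a surface graph into adsorbate-centered subgraphs can be illustrated with a generic breadth-first extraction of each adsorbate's neighborhood. This is a minimal sketch of the concept only; it is not the SurfGraph or ACE-GCN implementation, and the toy graph, node labels, and the `radius` parameter are hypothetical.

```python
from collections import deque

def ego_subgraph(adjacency, center, radius):
    """Return the nodes and edges within `radius` bonds of `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the cutoff
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    edges = {frozenset((u, v)) for u in nodes for v in adjacency[u] if v in nodes}
    return nodes, edges

# Toy undirected graph: two OH adsorbates on a three-atom Pt patch.
adjacency = {
    "OH1": ["Pt1"],
    "OH2": ["Pt2"],
    "Pt1": ["OH1", "Pt2", "Pt3"],
    "Pt2": ["OH2", "Pt1", "Pt3"],
    "Pt3": ["Pt1", "Pt2"],
}

# One subgraph per adsorbate, each capturing its local chemical environment.
subgraphs = {ads: ego_subgraph(adjacency, ads, radius=2) for ads in ("OH1", "OH2")}
```

With a larger radius, the subgraph of one adsorbate starts to include neighboring adsorbates, which is how such a representation can expose adsorbate-adsorbate interactions to the model.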

The rigorous description of catalytic systems embracing both the complexities originating from nanostructured catalysts and realistic reaction conditions is rarely reported, except for a very recent study by Cao and Mueller [151], who adopted a machine-learned cluster expansion method to map ORR activity on Pt-Ni alloy nanoparticles. Nevertheless, it is definitely a promising direction to accelerate the in situ theoretical description of practical catalytic systems using ML and advanced representation methods.

4. ML-guided experimental catalyst discovery

An accurate estimation of the adsorbate binding strength helps lay the foundation for efficient high-throughput catalyst screening and catalyst design, the effectiveness of which, of course, still requires experimental validation. In this section, we present a few examples of the successful development of highly active catalysts under ML guidance to further demonstrate the significance of ML methods in accelerating experimental catalyst discovery.

For example, Zhong et al. [152] adopted the AL framework discussed above [123] to investigate CO adsorption strengths on alloy surfaces. Based on insights obtained from a scaling-derived volcano map, which indicated that the optimal CO binding for CO₂RR should be around -0.67 eV [107], the researchers examined a wide range of alloys to identify the ideal catalysts that exhibit adsorption strengths around that value. As illustrated by its t-distributed stochastic neighbor embedding (t-SNE) diagram [153], the Cu-Al alloy presents multiple sites and surface orientations with near-optimal CO binding, demonstrating its great potential for efficient and selective CO₂RR catalysis. This was later confirmed with a synthesized Cu-Al catalyst, which efficiently reduces CO₂ to ethylene with the highest reported Faradaic efficiency of over 80%. Similarly, ML has been verified to be effective in designing alloy catalysts for nitrogen-related chemistries such as ammonia oxidation. For example, adopting the aforementioned TinNet framework [128], Pillai et al. [154] explored the immense design space of ternary Pt alloy nanostructures (Figs. 10(a) and (b)). With a training dataset of ab initio data, concurrent predictions of site reactivity, surface stability, and catalyst synthesizability descriptors can be realized. An AL workflow showed Pt₃Ru-M (M = Fe, Co, or Ni) alloys to be promising iridium (Ir)-free candidates, and their catalytic potential was confirmed by the corresponding experimentally synthesized nanocubes, which exhibited higher activities than state-of-the-art Pt catalysts and their bimetallic alloy counterparts (Figs. 10(c) and (d)). The great potential of ML in guiding and accelerating the experimental exploration of catalysts in a vast chemical space such as that of a multi-metallic system was thereby established.
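The first screening criterion described above, selecting sites whose predicted CO binding lies near the volcano optimum of about -0.67 eV, amounts to a simple filter. In the sketch below, the site labels, the predicted energies, and the 0.1 eV tolerance window are illustrative assumptions and are not values from Ref. [152].

```python
TARGET_CO_BINDING = -0.67  # eV, optimum quoted from the volcano map [107]

def near_optimal_sites(site_energies, tolerance=0.1):
    """Return site labels within `tolerance` eV of the target, closest first."""
    selected = [site for site, energy in site_energies.items()
                if abs(energy - TARGET_CO_BINDING) <= tolerance]
    return sorted(selected, key=lambda s: abs(site_energies[s] - TARGET_CO_BINDING))

# Placeholder ML-predicted CO binding energies (eV) for hypothetical sites.
site_energies = {
    "A": -0.70,
    "B": -0.30,
    "C": -0.66,
    "D": -0.80,
}

candidates = near_optimal_sites(site_energies)
```

Counting how many sites and facets of a given alloy pass such a filter is one simple way to rank compositions before committing to synthesis.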

In addition to its use in high-throughput screening, ML’s attractive capability to supply valuable physical insights for experimental catalyst design has been established. Along this line, Zhai et al. [155] devised an NN model correlating the ORR activity of perovskite oxides to nine ionic descriptors including the ionic Lewis acid strength (ISA) on A- and B-sites, which was later confirmed to be the most influential feature according to the feature importance ranking. Tuning the ISAs of perovskites is therefore suggested as a viable approach for optimizing perovskites’ ORR activity. Experimental characterization has revealed that decreased A-site and increased B-site ISAs can considerably improve the surface exchange kinetics of perovskite oxides. Based on this premise, four perovskite oxides were synthesized, whose superior catalytic performance substantiated the effectiveness of ML-derived catalyst design principles. Similarly, machine-learned insights through Bayeschem [86] were found to be effective in discovering novel catalysts for the electrochemical nitrate reduction reaction (NO₃RR) that break the adsorption-energy scaling limitations posed by conventional catalysts [156]. More specifically, Bayeschem was used to determine that the non-scaling behavior originated from site-specific Pauli repulsion interactions of the metal d-states with the adsorbate frontier orbitals and could be realized on (100)-type sites, where *N and *NO₃ exhibited different orbital overlap degrees with subsurface metal atoms. As a result, tuning the subsurface elements in ordered B2 intermetallics became a rational strategy to optimize the NO₃RR performance. This strategy was further verified by synthesizing and testing monodisperse ordered B2 CuPd nanocubes with (100)-like surface orientations, which displayed a high Faradaic efficiency of 92.5% for NO₃RR to ammonia and higher ammonia yield rates than Cu or Pd.
This success in translating machine-learned insights into rational experimental catalyst design principles sheds light on ML-guided new catalyst discovery aside from direct computational high-throughput screening.

5. Summary and outlook

The search for efficient catalysts for the next-generation chemical industry will continue to be a research hotspot for decades to come. As a rising field that is still in its infancy, ML-aided surface reactivity evaluation has already demonstrated its huge potential to enable a paradigm shift in high-throughput catalyst screening. Considering the progress that has already been achieved, we point out two major propellants (Fig. 11) in the development of ML models for adsorption energy prediction:

(1) The construction and curation of datasets. Rather than generating a completely new set of training data points from scratch, many works leverage datasets from previous papers or public data repositories to devise novel models for binding strength prediction. For example, the datasets reported in Refs. [74,84,93,123] have been widely adopted in other works, which present fresh perspectives by tackling these published data from a different angle. Public data repositories such as CatApp [80] and Catalysis-Hub.org [157] maintained by the SUNCAT center at the Stanford Linear Accelerator Center (SLAC) have also been frequently used. The reuse of the same dataset for the demonstration of different ML models enables objective performance comparison, where the establishment of appropriate benchmarks encourages the development of more accurate and robust models. With the aim of constructing extensive datasets for heterogeneous catalysis, Fundamental AI Research at Meta AI (originally Facebook AI) and Carnegie Mellon University’s Department of Chemical Engineering launched the Open Catalyst (OC) project in 2020. Its original dataset, OC2020, consists of 1.28 million DFT relaxations (~260 million single-point evaluations), spanning 55 elements, 82 adsorbates, and unary/binary/ternary inorganic materials [158]. The release of such a large-scale dataset is undoubtedly beneficial in attracting broader interest and gathering the research community together to address open challenges in developing generalizable ML models for catalyst discovery [159].

(2) The implementation and improvement of matter representation. As demonstrated in Section 3.2, ML model accuracy is largely dependent on an appropriate representation of surfaces and molecules, whose role becomes even more predominant when modeling the catalytic activities of structurally or compositionally complex systems such as nanoparticles and HEAs. Given the ubiquity of site diversity that results from likely catalyst reconstruction under realistic conditions, it is therefore crucial to rationalize and optimize matter representation. DL-based approaches have recently exhibited great potential in sophisticated matter representation [124-126,140,150,160]. Their representations are more expressive than hand-crafted ones and are expected to be compatible with large-scale datasets, as revealed by a comparative study on the OC2020 dataset [159].

Despite the impressive achievements that have been made so far, accessing adsorption strengths directly through ML still presents the following nontrivial challenges (Fig. 11):

(1) Generalizability. Because most previous works have focused on systems based on specific chemistries and material compositions (e.g., predominantly metal alloys) with limited demonstration of their generalizability, it remains a "holy grail" task in this field to develop a universal model that can operate across the abundant space of materials and molecular adsorbates. As with AI/ML model optimization in other fields, a model’s predictive capability generally improves as the amount of data increases. Unfortunately, this improvement does not scale as simply for catalysis. As revealed by the OC team [158] using current baseline models, the scaling between the dataset size and model performance is less favorable for catalysis datasets than for datasets of organic small molecules and inorganic materials. Innovations in ML models are therefore greatly needed to overcome this hurdle.

(2) Efficiency. Given access to large-scale datasets, the next task is to enhance model efficiency. This usually relies on the utilization of low-cost features (e.g., using only the graphic information of initial atomistic structures, as in OC2020 tasks [158]) and the improvement of prediction accuracy. As the ultimate goal is to identify materials with desirable properties within an almost unlimited candidate space, the adoption of computationally costly information is not preferable. On the other hand, the prediction accuracy of ML models remains essential, since inadequate results eventually waste time and resources, which defeats the purpose of accelerated material screening. Unfortunately, reducing the cost and improving the accuracy often pull in opposite directions, as demonstrated by the comparison between models using ab initio and non-ab initio features. It is therefore vital to balance these two demands carefully.

(3) Complexity. Despite the commendable efforts that have been made to predict adsorption energies for species involved in complicated reaction networks or on complex catalytic surfaces, training datasets are mostly obtained on idealized surfaces with simple assumptions such as a high vacuum, low adsorbate coverage, and single surface species. These approximations, however, can be too crude and may deviate substantially from the actual reaction conditions, especially for the electrocatalytic reactions used in a wide swath of future clean-energy-related applications. In addition to some common complexities introduced by, for example, species co-adsorption or adsorbate-adsorbate interaction [107,161], these electrocatalytic reactions involve additional complications stemming from the inherent electrochemical interfaces, which can lead to profound solvation and charge separation effects [162-164]. The predictions of ML models will be far less useful and impactful if these complexities cannot be well captured, however satisfactory the prediction accuracies of such models might be [149].

(4) Reliability. The energetic data in most current databases are obtained through generalized gradient approximation (GGA)-level DFT computation. Consequently, the accuracies of ML models built upon these data are also restricted to such a level. More sophisticated methods such as meta-GGA or hybrid functionals are capable of supplying more reliable results, but they usually incur an enormous computational burden at the same time, making it impractical to construct datasets with these methods. In addition, some systems, such as those with spin polarization or strong electron correlation (e.g., magnetic 3d metal oxides), require the delicate tuning of DFT parameters to yield physically sensible results, presenting another hurdle in the formulation of large-scale datasets. For example, the OC2020 dataset simply considers no spin polarization for all systems [158]. This inconsistency in computational methods introduces additional uncertainties when adopting databases from different sources. Uncertainty quantification therefore remains necessary. Developing reliable methods to accelerate high-precision DFT simulations or to provide accurate DFT surrogates is another valuable direction, in which ML has already demonstrated its great potential [165-169]. A discussion on this aspect, however, lies beyond the scope of this review.

(5) Interpretability. Improving a model’s interpretability helps to better exploit its predictive power. Beyond merely obtaining a few promising candidates, it is also of paramount significance to acquire fresh understandings and new principles to aid in the design of better catalysts through objective optimization. Most previous works have adopted purely data-driven approaches, which yield impressively low prediction errors but provide limited interpretability. Post-training analysis is therefore a common yet effective way to extract more physical insights from such models. Better still, mechanistic understandings can be intentionally woven into the ML framework, in which case the physical rationality of the model is ensured by construction and the model’s interpretability comes naturally. More importantly, merging interpretability into ML models can help to partially address the reliability concern, as experts can try to rationalize the derived interpretations and compare them with known physics [55].
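The data-scaling concern raised under generalizability can be made concrete with a learning curve. A common empirical assumption (not a result from the works cited here) is that test error falls roughly as a power law in training-set size, error ~ a·N^(-b), so that the exponent b summarizes how much additional data buys. The following is a minimal sketch of fitting such a curve; the function names and all numbers are hypothetical and synthetic, and are not taken from OC2020 or any other dataset.

```python
import numpy as np

def fit_power_law(n_train, mae):
    """Fit mae ~ a * n_train**(-b) by least squares in log-log space."""
    log_n = np.log(np.asarray(n_train, dtype=float))
    log_e = np.log(np.asarray(mae, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return np.exp(intercept), -slope  # (a, b)

def samples_needed(a, b, target_mae):
    """Training-set size at which the fitted curve reaches target_mae."""
    return (a / target_mae) ** (1.0 / b)

# Synthetic learning curve (illustrative values only, in eV).
sizes = np.array([1e3, 1e4, 1e5, 1e6])
errors = 2.0 * sizes ** -0.15

a, b = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"data multiplier to halve the error: {2 ** (1 / b):.0f}x")
```

Under this assumed form, halving the error requires 2^(1/b) times more data, so a small exponent (here 0.15, chosen arbitrarily) implies roughly a hundredfold increase. This is one way to read the statement that adding data alone may not be a practical route to generalizable models.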

We note that the above challenges can be highly entangled, and that there might not be a single ideal ML model capable of overcoming all obstacles simultaneously. Instead, we envision a hierarchical workflow that leverages multiple ML models with distinct strengths, in which the overall mission of high-throughput screening is decomposed into a sequence of steps with different requirements for accuracy, complexity, and scalability. For example, purely data-driven ML models can first be employed to rapidly navigate through the vast material space with simple assumptions and compromised prediction accuracies. Given appropriate uncertainty quantification, it would still be possible to locate the subspace enclosing possible promising candidates. Next, highly reliable prediction and knowledge extraction could be enabled by focusing on this specific subspace while utilizing ML models that accommodate smaller datasets, leverage more accurate computational methods, incorporate more realistic approximations, and exhibit greater interpretability. Finally, the obtained physical insights could be further applied to reexamine the entire material space in an attempt to search for potential missing candidates that align well with the extracted patterns. In sum, despite the many challenges presented by the application of ML for surface reactivity prediction and high-throughput catalyst screening, we believe that this remains an extremely promising field with great potential to improve computational science, accelerate materials design, and ultimately reshape the future chemical industry and energy landscape.
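The first two stages of the hierarchical workflow envisioned above can be sketched as follows. This is an illustrative outline under stated assumptions, not an implementation from any cited work: the cheap model is represented by an ensemble whose spread serves as a simple uncertainty estimate, the "accurate" model is a stand-in callable, and all names (`coarse_screen`, `refine`, `true_energy`) and numbers are hypothetical.

```python
import numpy as np

def coarse_screen(ensemble_preds, target, tol, k=2.0):
    """Stage 1: keep candidates whose uncertainty band overlaps the target window.

    ensemble_preds has shape (n_models, n_candidates) and holds predicted
    adsorption energies (eV) from a cheap ensemble of models.
    """
    mean = ensemble_preds.mean(axis=0)
    std = ensemble_preds.std(axis=0)
    lower, upper = mean - k * std, mean + k * std
    keep = (lower <= target + tol) & (upper >= target - tol)
    return np.flatnonzero(keep)

def refine(shortlist, accurate_model, target, tol):
    """Stage 2: re-evaluate only the shortlist with a costlier, more reliable model."""
    energies = np.array([accurate_model(i) for i in shortlist])
    return shortlist[np.abs(energies - target) <= tol]

# Synthetic demonstration: the "truth" stands in for the accurate model.
rng = np.random.default_rng(0)
true_energy = rng.uniform(-2.0, 1.0, size=1000)
ensemble = true_energy + rng.normal(0.0, 0.2, size=(5, 1000))

target, tol = -0.5, 0.1
shortlist = coarse_screen(ensemble, target, tol)
final = refine(shortlist, lambda i: true_energy[i], target, tol)
print(len(shortlist), "candidates shortlisted;", len(final), "retained after refinement")
```

The width factor `k` controls the trade-off discussed under efficiency: a wider band sends more candidates to the expensive stage but lowers the chance of discarding a good one. The third stage, reexamining the full space with extracted physical patterns, is omitted here because it depends on the specific insights obtained.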

Acknowledgment

This work was supported by the National Natural Science Foundation of China (22109020 and 22109082).

Compliance with ethics guidelines

Xinyan Liu and Hong-Jie Peng declare that they have no conflict of interest or financial conflicts to disclose.

References

[1]

C.R. Catlow, M. Davidson, C. Hardacre, G.J. Hutchings. Catalysis making the world a better place. Philos Trans R Soc A Eng Sci, 374 (2061) (2016), p. 20150089.

[2]

R. Schlögl. Heterogeneous catalysis. Angew Chem Int Ed Engl, 54 (11) (2015), pp. 3465-3520.

[3]

A.Q. Wang, J. Li, T. Zhang. Heterogeneous single-atom catalysis. Nat Rev Chem, 2 (6) (2018), pp. 65-81.

[4]

J.R. Rostrup-Nielsen, J. Sehested, J.K. Nørskov. Hydrogen and synthesis gas by steam- and CO2 reforming. Adv Catal, 47 (2002), pp. 65-139.

[5]

Q.R. Wang, J.P. Guo, P. Chen. Recent progress towards mild-condition ammonia synthesis. J Energy Chem, 36 (2019), pp. 25-36.

[6]

E.T.C. Vogt, B.M. Weckhuysen. Fluid catalytic cracking: recent developments on the grand old lady of zeolite catalysis. Chem Soc Rev, 44 (20) (2015), pp. 7342-7370.

[7]

X. Jiang, X. Nie, X. Guo, C. Song, J.G.G. Chen. Recent advances in carbon dioxide hydrogenation to methanol via heterogeneous catalysis. Chem Rev, 120 (15) (2020), pp. 7984-8034.

[8]

K. Tomishige, Y. Nakagawa, M. Tamura. Taming heterogeneous rhenium catalysis for the production of biomass-derived chemicals. Chin Chem Lett, 31 (5) (2020), pp. 1071-1077.

[9]

P. Schwach, X. Pan, X. Bao. Direct conversion of methane to value-added chemicals over heterogeneous catalysts: challenges and prospects. Chem Rev, 117 (13) (2017), pp. 8497-8520.

[10]

Y. Dai, X. Gao, Q. Wang, X. Wan, C. Zhou, Y. Yang. Recent progress in heterogeneous metal and metal oxide catalysts for direct dehydrogenation of ethane and propane. Chem Soc Rev, 50 (9) (2021), pp. 5590-5630.

[11]

Z.W. Seh, J. Kibsgaard, C.F. Dickens, I. Chorkendorff, J.K. Nørskov, T.F. Jaramillo. Combining theory and experiment in electrocatalysis: insights into materials design. Science, 355 (6321) (2017), Article eaad4998.

[12]

R.M. Bullock, J.G.G. Chen, L. Gagliardi, P.J. Chirik, O.K. Farha, C.H. Hendon, et al. Using nature’s blueprint to expand catalysis with Earth-abundant metals. Science, 369 (6505) (2020), Article eabc3183.

[13]

S. Chu, Y. Cui, N. Liu. The path towards sustainable energy. Nat Mater, 16 (1) (2016), pp. 16-22.

[14]

P. Nikolaidis, A. Poullikkas. A comparative overview of hydrogen production processes. Renew Sustain Energy Rev, 67 (2017), pp. 597-611.

[15]

M.F. Lagadec, A. Grimaud. Water electrolysers with closed and open electrochemical systems. Nat Mater, 19 (11) (2020), pp. 1140-1150.

[16]

L. Zhang, Z.J. Zhao, J. Gong. Nanostructured materials for heterogeneous electrocatalytic CO2 reduction and their related reaction mechanisms. Angew Chem Int Ed Engl, 56 (38) (2017), pp. 11326-11353.

[17]

D.F. Gao, R.M. Arán-Ais, H.S. Jeon, B. Roldan Cuenya. Rational catalyst and electrolyte design for CO2 electroreduction towards multicarbon products. Nat Catal, 2 (3) (2019), pp. 198-210.

[18]

S. Nitopi, E. Bertheussen, S.B. Scott, X. Liu, A.K. Engstfeld, S. Horch, et al. Progress and perspectives of electrochemical CO2 reduction on copper in aqueous electrolyte. Chem Rev, 119 (12) (2019), pp. 7610-7672.

[19]

M.B. Ross, P. De Luna, Y.F. Li, C.T. Dinh, D. Kim, P. Yang, et al. Designing materials for electrochemical carbon dioxide recycling. Nat Catal, 2 (8) (2019), pp. 648-658.

[20]

X.Y. Liu, B.Q. Li, B. Ni, L. Wang, H.J. Peng. A perspective on the electrocatalytic conversion of carbon dioxide to methanol with metallomacrocyclic catalysts. J Energy Chem, 64 (2022), pp. 263-275.

[21]

Z. Zhu, Z. Li, J. Wang, R. Li, H. Chen, Y. Li, et al. Improving NiNx and pyridinic N active sites with space-confined pyrolysis for effective CO2 electroreduction. eScience, 2 (4) (2022), pp. 445-452.

[22]

Z.Q. Gao, J.J. Li, Z.C. Zhang, W.P. Hu. Recent advances in carbon-based materials for electrochemical CO2 reduction reaction. Chin Chem Lett, 33 (5) (2022), pp. 2270-2280.

[23]

J.G. Chen, R.M. Crooks, L.C. Seefeldt, K.L. Bren, R.M. Bullock, M.Y. Darensbourg, et al. Beyond fossil fuel-driven nitrogen transformations. Science, 360 (6391) (2018), Article eaar6611.

[24]

B.H.R. Suryanto, H.L. Du, D.B. Wang, J. Chen, A.N. Simonov, D.R. MacFarlane. Challenges and prospects in the catalysis of electroreduction of nitrogen to ammonia. Nat Catal, 2 (4) (2019), pp. 290-296.

[25]

S.Z. Andersen, V. Čolić, S. Yang, J.A. Schwalbe, A.C. Nielander, J.M. McEnaney, et al. A rigorous electrochemical ammonia synthesis protocol with quantitative isotope measurements. Nature, 570 (7762) (2019), pp. 504-508.

[26]

X.Y. Cui, C. Tang, Q. Zhang. A review of electrocatalytic reduction of dinitrogen to ammonia under ambient conditions. Adv Energy Mater, 8 (22) (2018), Article 1800369.

[27]

Y. Jiao, Y. Zheng, M. Jaroniec, S.Z. Qiao. Design of electrocatalysts for oxygen- and hydrogen-involving energy conversion reactions. Chem Soc Rev, 44 (8) (2015), pp. 2060-2086.

[28]

C.C.L. McCrory, S. Jung, I.M. Ferrer, S.M. Chatman, J.C. Peters, T.F. Jaramillo. Benchmarking hydrogen evolving reaction and oxygen evolving reaction electrocatalysts for solar water splitting devices. J Am Chem Soc, 137 (13) (2015), pp. 4347-4357.

[29]

M. Shao, Q. Chang, J.P. Dodelet, R. Chenitz. Recent advances in electrocatalysts for oxygen reduction reaction. Chem Rev, 116 (6) (2016), pp. 3594-3657.

[30]

J. Kibsgaard, I. Chorkendorff. Considerations for the scaling-up of water splitting catalysts. Nat Energy, 4 (6) (2019), pp. 430-433.

[31]

J.K. Nørskov, T. Bligaard, J. Rossmeisl, C.H. Christensen. Towards the computational design of solid catalysts. Nat Chem, 1 (1) (2009), pp. 37-46.

[32]

A. Bruix, J.T. Margraf, M. Andersen, K. Reuter. First-principles-based multiscale modelling of heterogeneous catalysis. Nat Catal, 2 (8) (2019), pp. 659-670.

[33]

B.W.J. Chen, L. Xu, M. Mavrikakis. Computational methods in heterogeneous catalysis. Chem Rev, 121 (2) (2021), pp. 1007-1048.

[34]

A.H. Motagamwala, J.A. Dumesic. Microkinetic modeling: a tool for rational catalyst design. Chem Rev, 121 (2) (2021), pp. 1049-1076.

[35]

J. Greeley. Theoretical heterogeneous catalysis: scaling relationships and computational catalyst design. Annu Rev Chem Biomol Eng, 7 (1) (2016), pp. 605-635.

[36]

Z.J. Zhao, S.H. Liu, S.J. Zha, D.F. Cheng, F. Studt, G. Henkelman, et al. Theory-guided design of catalytic materials using scaling relationships and reactivity descriptors. Nat Rev Mater, 4 (12) (2019), pp. 792-804.

[37]

C.T. Campbell. Energies of adsorbed catalytic intermediates on transition metal surfaces: calorimetric measurements and benchmarks for theory. Acc Chem Res, 52 (4) (2019), pp. 984-993.

[38]

A.J. Medford, A. Vojvodic, J.S. Hummelshoj, J. Voss, F. Abild-Pedersen, F. Studt, et al. From the Sabatier principle to a predictive theory of transition-metal heterogeneous catalysis. J Catal, 328 (2015), pp. 36-42.

[39]

K.T. Butler, D.W. Davies, H. Cartwright, O. Isayev, A. Walsh. Machine learning for molecular and materials science. Nature, 559 (7715) (2018), pp. 547-555.

[40]

V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571 (7763) (2019), pp. 95-98.

[41]

T. Zhou, Z. Song, K. Sundmacher. Big data creates new opportunities for materials research: a review on methods and applications of machine learning for materials design. Engineering, 5 (6) (2019), pp. 1017-1026.

[42]

A. Chen, X. Zhang, Z. Zhou. Machine learning: accelerating materials development for energy storage and conversion. InfoMat, 2 (3) (2020), pp. 553-576.

[43]

Y. Liu, B.R. Guo, X.X. Zou, Y.J. Li, S.Q. Shi. Machine learning assisted materials design and discovery for rechargeable batteries. Energy Storage Mater, 31 (2020), pp. 434-450.

[44]

X. Chen, X. Liu, X. Shen, Q. Zhang. Applying machine learning to rechargeable batteries: from the microscale to the macroscale. Angew Chem Int Ed, 60 (46) (2021), pp. 24354-24366.

[45]

J.Z. Li, X.B. Huang, P. Pianetta, Y.J. Liu. Machine-and-data intelligence for synchrotron science. Nat Rev Phys, 3 (12) (2021), pp. 766-768.

[46]

S. Xu, J. Li, P. Cai, X. Liu, B. Liu, X. Wang. Self-improving photosensitizer discovery system via Bayesian search with first-principle simulations. J Am Chem Soc, 143 (47) (2021), pp. 19769-19777.

[47]

S.N. Li, Y.J. Liu, D. Chen, Y. Jiang, Z.W. Nie, F. Pan. Encoding the atomic structure for machine learning in materials science. Wiley Interdiscip Rev Comput Mol Sci, 12 (1) (2022), p. e1558.

[48]

T. Lombardo, M. Duquesnoy, H. El-Bouysidy, F. Årén, A. Gallo-Bueno, P.B. Jørgensen, et al. Artificial intelligence applied to battery research: hype or reality? Chem Rev, 122 (12) (2022), pp. 10899-10969.

[49]

X.Y. Liu, X.Q. Zhang, X. Chen, G.L. Zhu, C. Yan, J.Q. Huang, et al. A generalizable, data-driven online approach to forecast capacity degradation trajectory of lithium batteries. J Energy Chem, 68 (2022), pp. 548-555.

[50]

M. Lin, J. Xiong, M. Su, F. Wang, X. Liu, Y. Hou, et al. A machine learning protocol for revealing ion transport mechanisms from dynamic NMR shifts in paramagnetic battery materials. Chem Sci, 13 (26) (2022), pp. 7863-7872.

[51]

X. Wang, S. Jiang, W. Hu, S. Ye, T. Wang, F. Wu, et al. Quantitatively determining surface-adsorbate properties from vibrational spectroscopy with interpretable machine learning. J Am Chem Soc, 144 (35) (2022), pp. 16069-16076.

[52]

J.C.A. Oliveira, J. Frey, S.Q. Zhang, L.C. Xu, X. Li, S.W. Li, et al. When machine learning meets molecular synthesis. Trends Chem, 4 (10) (2022), pp. 863-885.

[53]

X. Liu, H.J. Peng, B.Q. Li, X. Chen, Z. Li, J.Q. Huang, et al. Untangling degradation chemistries of lithium-sulfur batteries through interpretable hybrid machine learning. Angew Chem Int Ed Engl, 61 (48) (2022), p. e202214037.

[54]

Z.P. Yao, Y.W. Lum, A. Johnston, L.M. Mejia-Mendoza, X. Zhou, Y.G. Wen, et al. Machine learning for a sustainable energy future. Nat Rev Mater, 8 (3) (2022), pp. 202-215.

[55]

J.A. Esterhuizen, B.R. Goldsmith, S. Linic. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat Catal, 5 (3) (2022), pp. 175-184.

[56]

A.J. Medford, M.R. Kunz, S.M. Ewing, T. Borders, R. Fushimi. Extracting knowledge from data through catalysis informatics. ACS Catal, 8 (8) (2018), pp. 7403-7429.

[57]

P.S. Lamoureux, K.T. Winther, J.A.G. Torres, V. Streibel, M. Zhao, M. Bajdich, et al. Machine learning for computational heterogeneous catalysis. ChemCatChem, 11 (16) (2019), pp. 3581-3601.

[58]

T. Toyao, Z. Maeno, S. Takakusagi, T. Kamachi, I. Takigawa, K. Shimizu. Machine learning for catalysis informatics: recent applications and prospects. ACS Catal, 10 (3) (2020), pp. 2260-2297.

[59]

G.H. Gu, C. Choi, Y. Lee, A.B. Situmorang, J. Noh, Y.H. Kim, et al. Progress in computational and machine-learning methods for heterogeneous small-molecule activation. Adv Mater, 32 (35) (2020), p. 1907865.

[60]

S.C. Ma, Z.P. Liu. Machine learning for atomic simulation and activity prediction in heterogeneous catalysis: current status and future. ACS Catal, 10 (22) (2020), pp. 13213-13226.

[61]

J. Xu, X.M. Cao, P. Hu. Perspective on computational reaction prediction using machine learning methods in heterogeneous catalysis. Phys Chem Chem Phys, 23 (19) (2021), pp. 11155-11179.

[62]

L.T. Chen, X. Zhang, A. Chen, S. Yao, X. Hu, Z. Zhou. Targeted design of advanced electrocatalysts by machine learning. Chin J Catal, 43 (1) (2022), pp. 11-32.

[63]

L. Cao. Recent advances in the application of machine-learning algorithms to predict adsorption energies. Trends Chem, 4 (4) (2022), pp. 347-360.

[64]

H. Li, Y. Jiao, K. Davey, S.Z. Qiao. Data-driven machine learning for understanding surface structures of heterogeneous catalysts. Angew Chem Int Ed, 62 (9) (2023), Article e202216383.

[65]

T.Y. Mou, H.S. Pillai, S.W. Wang, M.Y. Wan, X. Han, N.M. Schweitzer, et al. Bridging the complexity gap in computational heterogeneous catalysis with machine learning. Nat Catal, 6 (2) (2023), pp. 122-136.

[66]

H. Yang, Z.Q. He, M.D. Zhang, X.J. Tan, K. Sun, H.Y. Liu, et al. Reshaping the material research paradigm of electrochemical energy storage and conversion by machine learning. EcoMat, 5 (5) (2023), p. e12330.

[67]

B. Hammer, J.K. Nørskov. Why gold is the noblest of all the metals. Nature, 376 (6537) (1995), pp. 238-240.

[68]

J.K. Nørskov, F. Studt, F. Abild-Pedersen, T. Bligaard. Fundamental concepts in heterogeneous catalysis. John Wiley & Sons, Inc., Hoboken (2014).

[69]

F. Abild-Pedersen, J. Greeley, F. Studt, J. Rossmeisl, T.R. Munter, P.G. Moses, et al. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces. Phys Rev Lett, 99 (1) (2007), Article 016105.

[70]

A.J. Chowdhury, W.Q. Yang, E. Walker, O. Mamun, A. Heyden, G.A. Terejanu. Prediction of adsorption energies for chemical species on metal catalyst surfaces using machine learning. J Phys Chem C, 122 (49) (2018), pp. 28142-28150.

[71]

I.C. Man, H.Y. Su, F. Calle-Vallejo, H.A. Hansen, J.I. Martinez, N.G. Inoglu, et al. Universality in oxygen evolution electrocatalysis on oxide surfaces. ChemCatChem, 3 (7) (2011), pp. 1159-1165.

[72]

A.A. Latimer, A.R. Kulkarni, H. Aljama, J.H. Montoya, J.S. Yoo, C. Tsai, et al. Understanding trends in C-H bond activation in heterogeneous catalysis. Nat Mater, 16 (2) (2017), pp. 225-229.

[73]

T. Wang, X.J. Cui, K.T. Winther, F. Abild-Pedersen, T. Bligaard, J.K. Nørskov. Theory-aided discovery of metallic catalysts for selective propane dehydrogenation to propylene. ACS Catal, 11 (10) (2021), pp. 6290-6297.

[74]

O. Mamun, K.T. Winther, J.R. Boes, T. Bligaard. A Bayesian framework for adsorption energy prediction on bimetallic alloy catalysts. npj Comput Mater, 6 (1) (2020), p. 177.

[75]

R. García-Muelas, N. López. Statistical learning goes beyond the d-band model providing the thermochemistry of adsorbates on transition metals. Nat Commun, 10 (1) (2019), p. 4687.

[76]

T. Bligaard, J.K. Nørskov, S. Dahl, J. Matthiesen, C.H. Christensen, J. Sehested. The Brønsted-Evans-Polanyi relation and the volcano curve in heterogeneous catalysis. J Catal, 224 (1) (2004), pp. 206-217.

[77]

L. Yu, F. Abild-Pedersen. Bond order conservation strategies in catalysis applied to the NH3 decomposition reaction. ACS Catal, 7 (1) (2017), pp. 864-871.

[78]

H.J. Peng, M.T. Tang, X.Y. Liu, P. Schlexer Lamoureux, M. Bajdich, F. Abild-Pedersen. The role of atomic carbon in directing electrochemical CO2 reduction to multicarbon products. Energy Environ Sci, 14 (1) (2021), pp. 473-482.

[79]

Y.L. Cheng, C.T. Hsieh, Y.S. Ho, M.H. Shen, T.H. Chao, M.J. Cheng. Examination of the Brønsted-Evans-Polanyi relationship for the hydrogen evolution reaction on transition metals based on constant electrode potential density functional theory. Phys Chem Chem Phys, 24 (4) (2022), pp. 2476-2481.

[80]

J.S. Hummelshøj, F. Abild-Pedersen, F. Studt, T. Bligaard, J.K. Nørskov. CatApp: a web application for surface chemistry and heterogeneous catalysis. Angew Chem Int Ed Engl, 51 (1) (2012), pp. 272-274.

[81]

K. Takahashi, I. Miyazato. Rapid estimation of activation energy in heterogeneous catalytic reactions via machine learning. J Comput Chem, 39 (28) (2018), pp. 2405-2408.

[82]

N. Artrith, Z.X. Lin, J.G. Chen. Predicting the activity and selectivity of bimetallic metal catalysts for ethanol reforming using machine learning. ACS Catal, 10 (16) (2020), pp. 9438-9444.

[83]

X. Ma, Z. Li, L.E.K. Achenie, H. Xin. Machine-learning-augmented chemisorption model for CO2 electroreduction catalyst screening. J Phys Chem Lett, 6 (18) (2015), pp. 3528-3533.

[84]

Z. Li, S.W. Wang, W.S. Chin, L.E. Achenie, H.L. Xin. High-throughput screening of bimetallic catalysts enabled by machine learning. J Mater Chem A, 5 (46) (2017), pp. 24131-24138.

[85]

C.S. Praveen, A. Comas-Vives. Design of an accurate machine learning algorithm to predict the binding energies of several adsorbates on multiple sites of metal surfaces. ChemCatChem, 12 (18) (2020), pp. 4611-4617.

[86]

S. Wang, H.S. Pillai, H. Xin. Bayesian learning of chemisorption for bridging the complexity of electronic descriptors. Nat Commun, 11 (1) (2020), p. 6132.

[87]

F. Göltl, P. Muller, P. Uchupalanun, P. Sautet, I. Hermans. Developing a descriptor-based approach for CO and NO adsorption strength to transition metal sites in zeolites. Chem Mater, 29 (15) (2017), pp. 6434-6444.

[88]

C. Liu, Y.X. Li, M. Takao, T. Toyao, Z. Maeno, T. Kamachi, et al. Frontier molecular orbital based analysis of solid-adsorbate interactions over group 13 metal oxide surfaces. J Phys Chem C, 124 (28) (2020), pp. 15355-15365.

[89]

M.V. Jyothirmai, D. Roshini, B.M. Abraham, J.K. Singh. Accelerating the discovery of g-C3N4-supported single atom catalysts for hydrogen evolution reaction: a combined DFT and machine learning strategy. ACS Appl Energy Mater, 6 (10) (2023), pp. 5598-5606.

[90]

T.Y. Liu, X. Zhao, X.F. Liu, W.J. Xiao, Z.J. Luo, W.T. Wang, et al. Understanding the hydrogen evolution reaction activity of doped single-atom catalysts on two-dimensional GaPS4 by DFT and machine learning. J Energy Chem, 81 (2023), pp. 93-100.

[91]

H. Sun, Y.Z. Li, L.Y. Gao, M.Y. Chang, X.R. Jin, B.Y. Li, et al. High throughput screening of single atomic catalysts with optimized local structures for the electrochemical oxygen reduction by machine learning. J Energy Chem, 81 (2023), pp. 349-357.

[92]

A. Chen, X. Zhang, L.T. Chen, S. Yao, Z. Zhou. A machine learning model on simple features for CO2 reduction electrocatalysts. J Phys Chem C, 124 (41) (2020), pp. 22471-22478.

[93]

M. Andersen, S.V. Levchenko, M. Scheffler, K. Reuter. Beyond scaling relations for the description of catalytic materials. ACS Catal, 9 (4) (2019), pp. 2752-2759.

[94]

V. Fung, G. Hu, P. Ganesh, B.G. Sumpter. Machine learned features from density of states for accurate adsorption energy prediction. Nat Commun, 12 (1) (2021), p. 88.

[95]

J.A. Esterhuizen, B.R. Goldsmith, S. Linic. Uncovering electronic and geometric descriptors of chemical activity for metal alloys and oxides using unsupervised machine learning. Chem Catal, 1 (4) (2021), pp. 923-940.

[96]

T. Toyao, K. Suzuki, S. Kikuchi, S. Takakusagi, K. Shimizu, I. Takigawa. Toward effective utilization of methane: machine learning prediction of adsorption energies on metal alloys. J Phys Chem C, 122 (15) (2018), pp. 8315-8326.

[97]

J. Noh, S. Back, J. Kim, Y. Jung. Active learning with non-ab initio input features toward efficient CO2 reduction catalysts. Chem Sci, 9 (23) (2018), pp. 5152-5159.

[98]

J.A. Esterhuizen, B.R. Goldsmith, S. Linic. Theory-guided machine learning finds geometric structure-property relationships for chemisorption on subsurface alloys. Chem, 6 (11) (2020), pp. 3100-3117.

[99]

T.R. Wang, J.C. Li, W. Shu, S.L. Hu, R.H. Ouyang, W.X. Li. Machine-learning adsorption on binary alloy surfaces for catalyst screening. Chin J Chem Phys, 33 (6) (2020), pp. 703-711.

[100]

X. Zhang, Z. Wang, A.M. Lawan, J.H. Wang, C.Y. Hsieh, C.R. Duan, et al. Data-driven structural descriptor for predicting platinum-based alloys as oxygen reduction electrocatalysts. InfoMat, 5 (6) (2023), p. e12406.

[101]

M.M. Montemore, J.W. Medlin. A unified picture of adsorption on transition metals through different atoms. J Am Chem Soc, 136 (26) (2014), pp. 9272-9275.

[102]

M.M. Montemore, C.F. Nwaokorie, G.O. Kayode. General screening of surface alloys for catalysis. Catal Sci Technol, 10 (13) (2020), pp. 4467-4476.

[103]

G.A. Somorjai, J.Y. Park. Molecular surface chemistry by metal single crystals and nanoparticles from vacuum to high pressure. Chem Soc Rev, 37 (10) (2008), pp. 2155-2162.

[104]

J.K. Nørskov, T. Bligaard, B. Hvolbaek, F. Abild-Pedersen, I. Chorkendorff, C.H. Christensen. The nature of the active site in heterogeneous metal catalysis. Chem Soc Rev, 37 (10) (2008), pp. 2163-2171.

[105]

F. Calle-Vallejo, D. Loffreda, M.T.M. Koper, P. Sautet. Introducing structural sensitivity into adsorption-energy scaling relations by means of coordination numbers. Nat Chem, 7 (5) (2015), pp. 403-410.

[106]

F. Calle-Vallejo, J. Tymoczko, V. Colic, Q.H. Vu, M.D. Pohl, K. Morgenstern, et al. Finding optimal surface sites on heterogeneous catalysts by counting nearest neighbors. Science, 350 (6257) (2015), pp. 185-189.

[107]

X. Liu, J. Xiao, H. Peng, X. Hong, K. Chan, J.K. Nørskov. Understanding trends in electrochemical carbon dioxide reduction rates. Nat Commun, 8 (1) (2017), p. 15438.

[108]

T.S. Choksi, L.T. Roling, V. Streibel, F. Abild-Pedersen. Predicting adsorption properties of catalytic descriptors on bimetallic nanoalloys with site-specific precision. J Phys Chem Lett, 10 (8) (2019), pp. 1852-1859.

[109]

R.A. Sheldon. Green and sustainable manufacture of chemicals from biomass: state of the art. Green Chem, 16 (3) (2014), pp. 950-963.

[110]

C. Mondelli, G. Gözaydın, N. Yan, J. Pérez-Ramírez. Biomass valorisation over metal-based solid catalysts from nanoparticles to single atoms. Chem Soc Rev, 49 (12) (2020), pp. 3764-3782.

[111]

I. Vollmer, M.J.F. Jenks, M.C.P. Roelands, R.J. White, T. Van Harmelen, P. de Wild, et al. Beyond mechanical recycling: giving new life to plastic waste. Angew Chem Int Ed Engl, 59 (36) (2020), pp. 15402-15423.

[112]

H. Zhou, Y. Wang, Y. Ren, Z.H. Li, X.G. Kong, M.F. Shao, et al. Plastic waste valorization by leveraging multidisciplinary catalytic technologies. ACS Catal, 12 (15) (2022), pp. 9307-9324.

[113]

R.A. Hoyt, M.M. Montemore, I. Fampiou, W. Chen, G. Tritsaris, E. Kaxiras. Machine learning prediction of H adsorption energies on Ag alloys. J Chem Inf Model, 59 (4) (2019), pp. 1357-1365.

[114]

S. Saxena, T.S. Khan, F. Jalid, M. Ramteke, M.A. Haider. In silico high throughput screening of bimetallic and single atom alloys using machine learning and ab initio microkinetic modelling. J Mater Chem A, 8 (1) (2020), pp. 107-123.

[115]

X.Y. Liu, C. Cai, W.H. Zhao, H.J. Peng, T. Wang. Machine learning-assisted screening of stepped alloy surfaces for C1 catalysis. ACS Catal, 12 (8) (2022), pp. 4252-4260.

[116]

Z. Yang, W. Gao, Q. Jiang. A machine learning scheme for the catalytic activity of alloys with intrinsic descriptors. J Mater Chem A, 8 (34) (2020), pp. 17507-17515.

[117]

X. Zong, D.G. Vlachos. Exploring structure-sensitive relations for small species adsorption using machine learning. J Chem Inf Model, 62 (18) (2022), pp. 4361-4368.

[118]

J. Yang, Z. Wang, Z. Liu, Q. Wang, Y. Wen, A. Zhang, et al. Rational ensemble design of alloy catalysts for selective ammonia oxidation based on machine learning. J Mater Chem A, 10 (47) (2022), pp. 25238-25248.

[119]

T.A.A. Batchelor, J.K. Pedersen, S.H. Winther, I.E. Castelli, K.W. Jacobsen, J. Rossmeisl. High-entropy alloys as a discovery platform for electrocatalysis. Joule, 3 (3) (2019), pp. 834-845.

[120]

D. Roy, S.C. Mandal, B. Pathak. Machine learning-driven high-throughput screening of alloy-based catalysts for selective CO2 hydrogenation to methanol. ACS Appl Mater Interfaces, 13 (47) (2021), pp. 56151-56163.

[121]

N.K. Pandit, D. Roy, S.C. Mandal, B. Pathak. Rational designing of bimetallic/trimetallic hydrogen evolution reaction catalysts using supervised machine learning. J Phys Chem Lett, 13 (32) (2022), pp. 7583-7593.

[122]

X. Zhang, K.P. Li, B. Wen, J. Ma, D.F. Diao. Machine learning accelerated DFT research on platinum-modified amorphous alloy surface catalysts. Chin Chem Lett, 34 (5) (2023), Article 107833.

[123]

K. Tran, Z.W. Ulissi. Active learning across intermetallics to guide discovery of electrocatalysts for CO2 reduction and H2 evolution. Nat Catal, 1 (9) (2018), pp. 696-703.

[124]

T. Xie, J.C. Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys Rev Lett, 120 (14) (2018), Article 145301.

[125]

S. Back, J. Yoon, N. Tian, W. Zhong, K. Tran, Z.W. Ulissi. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J Phys Chem Lett, 10 (15) (2019), pp. 4401-4408.

[126]

G.H. Gu, J. Noh, S. Kim, S. Back, Z. Ulissi, Y. Jung. Practical deep-learning representation for fast heterogeneous catalyst screening. J Phys Chem Lett, 11 (9) (2020), pp. 3185-3191.

[127]

A. Jain, S.P. Ong, G. Hautier, W. Chen, W.D. Richards, S. Dacek, et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater, 1 (1) (2013), Article 011002.

[128]

S.H. Wang, H.S. Pillai, S. Wang, L.E.K. Achenie, H. Xin. Infusing theory into deep learning for interpretable reactivity prediction. Nat Commun, 12 (1) (2021), p. 5288.

[129]

K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, et al. Assessment and validation of machine learning methods for predicting molecular atomization energies. J Chem Theory Comput, 9 (8) (2013), pp. 3404-3419.

[130]

D. Rogers, M. Hahn. Extended-connectivity fingerprints. J Chem Inf Model, 50 (5) (2010), pp. 742-754.

[131]

B. Huang, O.A. Von Lilienfeld. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat Chem, 12 (10) (2020), pp. 945-951.

[132]

K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O.A. Von Lilienfeld, K.R. Müller, et al. Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J Phys Chem Lett, 6 (12) (2015), pp. 2326-2331.

[133]

A.S. Christensen, L.A. Bratholm, F.A. Faber, O.A. Von Lilienfeld. FCHL revisited: faster and more accurate quantum machine learning. J Chem Phys, 152 (4) (2020), p. 044107.

[134]

X. Li, R. Chiong, Z. Hu, D. Cornforth, A.J. Page. Improved representations of heterogeneous carbon reforming catalysis using machine learning. J Chem Theory Comput, 15 (12) (2019), pp. 6882-6894.

[135]

X. Li, R. Chiong, A.J. Page. Group and period-based representations for improved machine learning prediction of heterogeneous alloy catalysts. J Phys Chem Lett, 12 (21) (2021), pp. 5156-5162.

[136]

D. Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci, 28 (1) (1988), pp. 31-36.

[137]

A.J. Chowdhury, W. Yang, K.E. Abdelfatah, M. Zare, A. Heyden, G.A. Terejanu. A multiple filter based neural network approach to the extrapolation of adsorption energies on metal surfaces for catalysis applications. J Chem Theory Comput, 16 (2) (2020), pp. 1105-1114.

[138]

A.J. Chowdhury, W.Q. Yang, A. Heyden, G.A. Terejanu. Comparative study on the machine learning-based prediction of adsorption energies for ring and chain species on metal catalyst surfaces. J Phys Chem C, 125 (32) (2021), pp. 17742-17748.

[139]

B.C. Wang, T.J. Gu, Y.J. Lu, B. Yang. Prediction of energies for reaction intermediates and transition states on catalyst surfaces using graph-based machine learning models. Mol Catal, 498 (2020), Article 111266.

[140]

S. Pablo-García, S. Morandi, R.A. Vargas-Hernández, K. Jorner, Z. Ivković, N. López, et al. Fast evaluation of the adsorption energy of organic molecules on metals via graph neural networks. Nat Comput Sci, 3 (5) (2023), pp. 433-442.

[141]

R. Jinnouchi, R. Asahi. Predicting catalytic activity of nanoparticles by a DFT-aided machine-learning algorithm. J Phys Chem Lett, 8 (17) (2017), pp. 4279-4283.

[142]

M.O.J. Jager, E.V. Morooka, F.F. Canova, L. Himanen, A.S. Foster. Machine learning hydrogen adsorption on nanoclusters through structural descriptors. npj Comput Mater, 4 (2018), p. 37.

[143]

Y. Chen, Y. Huang, T. Cheng, W.A. Goddard III. Identifying active sites for CO2 reduction on dealloyed gold surfaces by combining machine learning with multiscale simulations. J Am Chem Soc, 141 (29) (2019), pp. 11651-11657.

[144]

A.C.T. Van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III. ReaxFF: a reactive force field for hydrocarbons. J Phys Chem A, 105 (41) (2001), pp. 9396-9409.

[145]

S. Naserifar, Y.L. Chen, S. Kwon, H. Xiao, W.A. Goddard III. Artificial intelligence and QM/MM with a polarizable reactive force field for next-generation electrocatalysts. Matter, 4 (1) (2021), pp. 195-216.

[146]

K. Jiang, Y.F. Huang, G.S. Zeng, F.M. Toma, W.A. Goddard III, A.T. Bell. Effects of surface roughness on the electrochemical reduction of CO2 over Cu. ACS Energy Lett, 5 (4) (2020), pp. 1206-1214.

[147]

G.H. Gu, J. Lim, C. Wan, T. Cheng, H. Pu, S. Kim, et al. Autobifunctional mechanism of jagged Pt nanowires for hydrogen evolution kinetics via end-to-end simulation. J Am Chem Soc, 143 (14) (2021), pp. 5355-5363.

[148]

J.W. Zhang, P.J. Hu, H.F. Wang. Amorphous catalysis: machine learning driven high-throughput screening of superior active site for hydrogen evolution reaction. J Phys Chem C, 124 (19) (2020), pp. 10483-10494.

[149]

P.G. Ghanekar, S. Deshpande, J. Greeley. Adsorbate chemical environment-based machine learning framework for heterogeneous catalysis. Nat Commun, 13 (1) (2022), p. 5788.

[150]

S. Deshpande, T. Maxson, J. Greeley. Graph theory approach to determine configurations of multidentate and high coverage adsorbates for heterogeneous catalysis. npj Comput Mater, 6 (1) (2020), p. 79.

[151]

L. Cao, T. Mueller. Catalytic activity maps for alloy nanoparticles. J Am Chem Soc, 145 (13) (2023), pp. 7352-7360.

[152]

M. Zhong, K. Tran, Y. Min, C. Wang, Z. Wang, C.T. Dinh, et al. Accelerated discovery of CO2 electrocatalysts using active machine learning. Nature, 581 (7807) (2020), pp. 178-183.

[153]

L. Van der Maaten. Accelerating t-SNE using tree-based algorithms. J Mach Learn Res, 15 (1) (2014), pp. 3221-3245.

[154]

H.S. Pillai, Y. Li, S.H. Wang, N. Omidvar, Q. Mu, L.E.K. Achenie, et al. Interpretable design of Ir-free trimetallic electrocatalysts for ammonia oxidation with graph neural networks. Nat Commun, 14 (1) (2023), p. 792.

[155]

S. Zhai, H.P. Xie, P. Cui, D.Q. Guan, J. Wang, S.Y. Zhao, et al. A combined ionic Lewis acid descriptor and machine-learning approach to prediction of efficient oxygen reduction electrodes for ceramic fuel cells. Nat Energy, 7 (9) (2022), pp. 866-875.

[156]

Q. Gao, H.S. Pillai, Y. Huang, S. Liu, Q. Mu, X. Han, et al. Breaking adsorption-energy scaling limitations of electrocatalytic nitrate reduction on intermetallic CuPd nanocubes by machine-learned insights. Nat Commun, 13 (1) (2022), p. 2338.

[157]

K.T. Winther, M.J. Hoffmann, J.R. Boes, O. Mamun, M. Bajdich, T. Bligaard. Catalysis-Hub.org, an open electronic structure database for surface reactions. Sci Data, 6 (1) (2019), p. 75.

[158]

L. Chanussot, A. Das, S. Goyal, T. Lavril, M. Shuaibi, M. Riviere, et al. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catal, 11 (10) (2021), pp. 6059-6072.

[159]

A. Kolluru, M. Shuaibi, A. Palizhati, N. Shoghi, A. Das, B. Wood, et al. Open challenges in developing generalizable large-scale machine-learning models for catalyst discovery. ACS Catal, 12 (14) (2022), pp. 8572-8581.

[160]

C. Chen, W.K. Ye, Y.X. Zuo, C. Zheng, S.P. Ong. Graph networks as a universal machine learning framework for molecules and crystals. Chem Mater, 31 (9) (2019), pp. 3564-3572.

[161]

N. Yang, A.J. Medford, X. Liu, F. Studt, T. Bligaard, S.F. Bent, et al. Intrinsic selectivity and structure sensitivity of rhodium catalysts for C2+ oxygenate production. J Am Chem Soc, 138 (11) (2016), pp. 3705-3714.

[162]

R. Sundararaman, D. Vigil-Fowler, K. Schwarz. Improving the accuracy of atomistic simulations of the electrochemical interface. Chem Rev, 122 (12) (2022), pp. 10651-10674.

[163]

X. Liu, P. Schlexer, J. Xiao, Y. Ji, L. Wang, R.B. Sandberg, et al. pH effects on the electrochemical reduction of CO2 towards C2 products on stepped copper. Nat Commun, 10 (1) (2019), p. 32.

[164]

H.J. Peng, M.T. Tang, J. Halldin Stenlid, X. Liu, F. Abild-Pedersen. Trends in oxygenate/hydrocarbon selectivity for electrochemical CO2 reduction to C2 products. Nat Commun, 13 (1) (2022), p. 1399.

[165]

F.A. Faber, L. Hutchison, B. Huang, J. Gilmer, S.S. Schoenholz, G.E. Dahl, et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J Chem Theory Comput, 13 (11) (2017), pp. 5255-5264.

[166]

M. Bogojeski, L. Vogt-Maranto, M.E. Tuckerman, K.R. Müller, K. Burke. Quantum chemical accuracy from density functional approximations via machine learning. Nat Commun, 11 (1) (2020), p. 5223.

[167]

M.K. Bisbo, B. Hammer. Efficient global structure optimization with a machine-learned surrogate model. Phys Rev Lett, 124 (8) (2020), Article 086102.

[168]

J. Behler. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew Chem Int Ed Engl, 56 (42) (2017), pp. 12828-12840.

[169]

P. Friederich, F. Häse, J. Proppe, A. Aspuru-Guzik. Machine-learned potentials for next-generation matter simulations. Nat Mater, 20 (6) (2021), pp. 750-761.
