1. Introduction

Artificial intelligence (AI) seeks to mimic human decision making. Machine learning (ML), a subset of AI techniques, enables computers to learn from external data how to act beyond the confines of their explicitly programmed behavior. ML has transformed many industries and fields of study, with applications ranging from stock market analysis to self-driving cars. With the advent of Internet of Things (IoT) devices and big data (i.e., high volumes of data generated at high velocity and in many different varieties), ML has become one of the most important technologies for gleaning actionable insights from such data.

In the oil and gas industry, models are divided into three main categories: physical, mathematical, and empirical models [1]. A physical model is a scaled-down or scaled-up version of an object, developed to simplify the understanding of how a physical object or scenario looks or operates. Physical models are costly and time-consuming to develop and may not be sufficiently accurate in some cases. Empirical models are established from experiments; they are subject to a variety of errors, such as human and measurement errors, and do not generalize well. Mathematical models encode physical laws to simulate the underlying physics; however, they require many assumptions and simplifications [1]. To overcome the limitations of these three model types, derive insights, and make intelligent decisions in a timely manner, a more capable technique is required. ML can fill this role because it can capture and act upon insights from vast datasets in which the relationships between the data and the insights are too complex to be handled by purely programmatic rules.

The oil and gas industry is rapidly transitioning to oil-field digitization, and there is a growing drive to apply data-driven modeling and ML algorithms to various petroleum engineering challenges. Data-driven modeling uses mathematical equations derived from data analysis, as opposed to knowledge-driven modeling, in which logic is the main tool used to represent a theory [2,3]. Some data-driven algorithms do not learn from data and thus cannot be called ML; ML is the subset of data-driven approaches that learns from data and thereby exhibits a form of AI. Fig. 1 summarizes different types of ML algorithms.

《Fig. 1》

Fig. 1. Different types of ML algorithms. DBSCAN: density-based spatial clustering of applications with noise; HDBSCAN: hierarchical density-based spatial clustering of applications with noise.

ML has been widely used in different areas of the petroleum industry, including geoscience, reservoir engineering, production engineering, and drilling engineering. The following four subsections present a critical review of, and perspective on, the application of ML in each of these areas.

1.1. Intelligent geoscience

Geoscience has used algorithms such as decision trees, Markov chains, and K-means clustering since as early as the 1960s. Markov chains have been applied in sedimentology [4], hydrology [5], and well-log analysis [6]. Preston and Henderson [7] used K-means clustering to interpret the periodicity of sediment deposits. Early applications of decision trees can be found in economic geology and prospectivity mapping [8,9]. Owing to a variety of factors, including a lack of computational power and the immaturity of the field, ML did not live up to initial expectations; hence, little AI development occurred in the 1970s. In the 1980s, Zhao and Mendel [10] employed recurrent neural networks (NNs) to perform seismic deconvolution, work that can be regarded as part of a resurgence of interest in AI. A shift from knowledge-driven to data-driven ML occurred in the 1990s, when McCormack [11] published the first review of NNs in geophysics. McCormack’s review explored pattern recognition and summarized NN applications over the previous 30 years, along with applied examples of seismic trace editing and automated well-log analysis. Deep learning (DL), and more specifically convolutional neural networks (CNNs), were revitalized in the 2010s; in geoscience, Waldeland and Solberg [12] applied a small CNN to seismic data for salt recognition. Fault interpretation [13–15], horizon picking [16], and facies classification [17,18] are other applications of CNNs in geoscience. Mosser et al. [19] were early adopters in geoscience of the generative adversarial network (GAN), an ML model in which two NNs (a generator and a discriminator) are trained competitively so that the generator learns to produce realistic samples; they used it to perform pore-scale modeling of three-dimensional (3D) porous media. GANs have also been used in seismogram generation and geostatistical inversion [20].
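Markov-chain analysis of a stratigraphic or well-log sequence typically begins by estimating a transition probability matrix from an ordered sequence of facies. The Python sketch below illustrates only that counting-and-normalizing step. The facies codes and the sequence are hypothetical, and the cited studies [4–6] may use other formulations (e.g., embedded chains that exclude self-transitions).

```python
from collections import defaultdict


def transition_matrix(sequence):
    """Estimate first-order Markov transition probabilities from an ordered
    sequence of states (e.g., facies codes read upward through a well)."""
    states = sorted(set(sequence))
    counts = {s: defaultdict(int) for s in states}
    # Count each observed transition from one state to the next
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # A state that never has a successor (e.g., seen only at the top) gets a zero row
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


# Hypothetical facies column: S = sandstone, H = shale, L = limestone
column = ["S", "S", "H", "S", "H", "H", "L", "H", "S"]
P = transition_matrix(column)
print(P["H"])  # → {'H': 0.25, 'L': 0.25, 'S': 0.5}
```

Each row of `P` sums to 1 and gives the probability of the next facies given the current one.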

Seismicity is another important field in geoscience in which ML has become widely used. Mousavi et al. [21] used ML algorithms to discriminate deep microseismic events from shallow ones based on features of waveforms recorded by surface receivers. He et al. [22] used an ML algorithm to improve the risk management of induced seismic events. Their model was a set of simple closed-form expressions with the advantages of high transparency and fast execution speed, giving operators the greatest chance of success. Industrial activities such as mining, oil and gas field depletion, wastewater injection, and geothermal operations can induce seismicity [23,24]. In western Canada, seismicity induced by hydraulic fracturing (HF) has galvanized public and academic attention [25]. Investigating the correlations between induced seismicity and HF has long been challenging because of the complexity introduced by strongly coupled geomechanical, geophysical, and geological behaviors. Thus, there is considerable room for exploring ML applications in seismicity.

1.2. Intelligent reservoir engineering

ML algorithms have become popular in various areas of reservoir engineering, particularly in reservoir characterization and in pressure, volume, and temperature (PVT) computations. Gharbi and Elsharkawy [26] developed a two-layer NN to estimate the bubble point pressure and oil formation volume factor of oil reservoirs. In another study, radial basis function and multilayer perceptron NNs were employed to estimate the formation volume factor, isothermal compressibility, and brine salinity [27]. Wang et al. [28] used artificial neural networks (ANNs) in a compositional reservoir simulation for phase equilibrium calculations, including phase stability tests and phase splitting calculations. A combination of two approaches, namely a support vector machine (SVM) and fuzzy logic, was used to predict permeability and porosity with real-life well logs as inputs [29]. Patel and Chatterjee [30] used classification algorithms to carry out quick and accurate rock typing (i.e., classifying reservoir rock into different categories based on similarities). An [31] explored the performance of an ANN with a single hidden layer in the presence of random noise and established a model to predict the thickness of a low-velocity layer. The approach was also applied to an oil field in northern Alberta, Canada, to construct a distribution map of porosity-net pay thickness; based on this map, four wells were drilled, and field production increased by almost 20% [32]. Jamialahmadi and Javadpour [33] used a radial basis function NN, with depth measurements and core porosity data as inputs, to estimate the permeability of an entire oil field in southern Iraq. Wang et al. [34] developed an ensemble ML model (a random forest) to predict time-lapse saturation profiles at well locations, using actual production and injection data from a structurally complicated and highly faulted offshore oil field as the major inputs.
A new framework for predicting multiple reservoir parameters (i.e., porosity, saturation, lithofacies, and shale content) was developed using an extreme learning machine (ELM) [35], a single-hidden-layer feed-forward NN whose hidden-layer weights are assigned randomly and whose output weights are computed analytically rather than by iterative training. Compared with the classic, iteratively trained single-layer feed-forward NN approach, the method requires fewer computing resources and less training time without sacrificing accuracy.
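An ELM is a single-hidden-layer feed-forward network whose input-to-hidden weights are drawn at random and then frozen; only the output weights are fitted, in one linear least-squares solve, which is where the savings in training time come from. The NumPy sketch below shows that idea. The tanh activation, the hidden-layer size, and the synthetic two-feature data are assumptions for illustration; this is not the model of [35].

```python
import numpy as np


class ExtremeLearningMachine:
    """Minimal ELM regressor: random fixed hidden layer + least-squares output layer."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a tanh nonlinearity
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are sampled once and never trained
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution of H @ beta = y (no backpropagation)
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Synthetic, hypothetical data: a smooth nonlinear target of two input features
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2
model = ExtremeLearningMachine(n_hidden=50).fit(X, y)
rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

In practice the fit would be judged on held-out data, and a small ridge penalty on `beta` is a common addition when `H` is ill-conditioned.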

1.3. Intelligent production engineering

Production prediction and optimization, together with HF, are two other areas of the energy industry in which ML has grown popular. Many parameters must be taken into account for production prediction and optimization, including the recovery process, proppant type, well spacing, treatment rate, and number of fracturing stages. Although optimizing operational parameters can save millions of dollars and significantly enhance unconventional reservoir production, traditional reservoir simulations are computationally expensive, and the cost multiplies when many variations of reservoir characteristics must be considered [36,37]. Hence, production prediction and optimization are good candidates for AI, as shown by the recent development and analysis of ML algorithms for various recovery processes, such as water and chemical floods and steam injection [38–40]. Dang et al. [41] used an NN for the multidimensional interpolation of relative permeability to account for the effects of different parameters (i.e., polymer, surfactant, and salinity) during hybrid recovery processes. Production forecasting for wells in different reservoirs using geological, core, and log data is a widely used ML application in this domain [42,43]. Tadjer et al. [44] used DeepAR and Prophet (two time-series forecasting algorithms) as alternatives to decline curve analysis for short-term oil and gas well forecasting. Another application of ML in this area is the use of an NN to predict bottom hole pressure in vertical wells, a crucial parameter in the design of production facilities [45]. A long short-term memory (LSTM) model combined with a feature-selection method was applied to predict the daily production rates of shale gas wells in the Duvernay Formation in Canada [46]. Popa and Connel [47] investigated horizontal well placement optimization via stratigraphic performance estimation, using a combination of fuzzy logic and NNs.
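Decline curve analysis, the conventional baseline mentioned above, commonly relies on the Arps relations. As a point of reference, the sketch below evaluates the Arps rate equation q(t) = q_i / (1 + b D_i t)^(1/b), which reduces to exponential decline q_i exp(-D_i t) when b = 0. The well parameters are hypothetical, and fitting the parameters to production history (e.g., by nonlinear least squares) is not shown.

```python
import math


def arps_rate(t, qi, di, b):
    """Arps decline rate q(t).

    qi: initial rate; di: initial nominal decline rate (per unit of t);
    b: hyperbolic exponent (b = 0 exponential, 0 < b < 1 hyperbolic, b = 1 harmonic).
    """
    if b == 0:
        return qi * math.exp(-di * t)
    return qi / (1.0 + b * di * t) ** (1.0 / b)


# Hypothetical well: qi = 1000 bbl/d, Di = 0.1 per month, b = 0.5
for month in (0, 12, 24):
    print(month, round(arps_rate(month, 1000.0, 0.1, 0.5), 1))
```

Data-driven forecasters such as those in [44] are typically compared against curves like this one on the same held-out production history.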

In the last two decades, the growing number of HF jobs has produced a substantial amount of measured data that can be used to construct ML prediction models. Mohaghegh [48] mapped a natural fracture network in the Utica shale using fuzzy-logic cluster analysis. He et al. [49] developed a model to optimize HF design in shale gas reservoirs using AI and fuzzy-logic analysis. A novel SVM-based model was developed to determine the hydraulic apertures of rough rock fractures [50]. Yang et al. [51] established a data analytics approach that combines design parameters derived from acoustic wireline logs with post-fracturing analysis to optimize fracturing treatment design. The resulting fracturing optimization algorithm was validated using production logging tool data and deep shear-wave imaging along horizontal wells in the Marcellus shale. Wang and Sun [52] presented an integrated approach combining ML, reservoir simulation, and HF to optimize well spacing in Permian shales, considering a typical well for each representative region of this large area. Bangi and Kwon [53] applied a reinforcement learning algorithm to achieve a uniform proppant concentration along fractures in order to improve HF productivity; they coupled dimensionality reduction with transfer learning to speed up the learning process. Duplyakov et al. [54] presented a model that combines boosting algorithms and ridge regression to predict the cumulative oil production of a well completed with multistage fractures. A case study was performed on 74 hydraulically fractured wells in the Montney Formation in Alberta, Canada, to predict cumulative production profiles over a five-year period using well spacing, rock mechanical properties, and completion parameters as input features [55].
A deep-NN proxy model was developed to predict cumulative gas production in shale reservoirs with production, completion, and HF data as input features; the model was validated using field data from 1239 horizontal wells in the Montney Formation [56].

1.4. Intelligent drilling engineering

Because huge volumes of real-time data are produced daily during drilling operations, drilling engineering has also benefited greatly from ML. Owing to the volatility of oil prices in recent years, operators have sought methods that ensure good economics across a variety of price scenarios, and ML has become an increasingly common means of alleviating drilling challenges in real time. Drilling operation optimization and stuck pipe prediction are two of the most critical areas in drilling engineering and have frequently been investigated using AI. Mohaghegh [57] used an ANN for the real-time identification of drilling anomalies and their related nonproductive time (NPT). Unrau et al. [58] developed an ML model to determine a real-time alarm threshold for detecting anomalies in flow rate and mud volume data during drilling operations. This model assists in the early detection of lost circulation and minimizes false alarms. Pollock et al. [59] applied reinforcement learning algorithms to refine an NN pretrained on data from 14 horizontal wells in the Permian and Appalachian basins. The refined model minimized tortuosity and deviations from planned trajectories with less than 3% error. Zhao et al. [60] applied ML algorithms to derive trends in different drilling parameters in order to identify anomalous incidents and propose remedial actions accordingly. ML algorithms have also been applied to optimize the rate of penetration (ROP) using drilling features such as weight on bit, flow rate, and revolutions per minute [61]. Goebel et al. [62] developed an ML model to predict stuck pipe incidents by monitoring and investigating various parameters, including ROP, pipe rotation, inclination angle, and flow rate. A year later, Dursun et al. [63] presented real-time risk prediction during drilling.
ML algorithms were coupled with data mining and natural language processing (NLP) techniques to analyze daily drilling reports (DDRs) from two onshore fields in the Middle East in an exceptionally short time, in order to categorize productive time and NPT and to discover critical factors contributing to NPT [64].
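The idea of a real-time alarm threshold, as in Unrau et al. [58], can be illustrated with a simple statistical baseline. The Python sketch below flags a flow reading when it departs from the mean of a trailing window by more than k standard deviations. The window length, k, the noise floor `min_std`, and the readings are all hypothetical, and this rule-based stand-in is not the ML model of [58].

```python
import statistics
from collections import deque


def rolling_alarms(readings, window=10, k=3.0, min_std=0.5):
    """Return indices of readings that deviate from the trailing-window mean by
    more than k standard deviations.

    min_std is a floor on the standard deviation (e.g., the sensor noise level),
    so that a very steady signal does not alarm on negligible changes.
    """
    history = deque(maxlen=window)
    alarms = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = max(statistics.pstdev(history), min_std)
            if abs(value - mean) > k * std:
                alarms.append(i)
        # Every reading, including flagged ones, enters the window; a sustained
        # shift therefore widens the band and the alarm eventually clears.
        history.append(value)
    return alarms


# Hypothetical flow-out readings (gal/min): steady, then a sudden drop
flow_out = [500, 502, 498, 501, 499, 500, 503, 497, 500, 501, 499, 500, 420, 418]
print(rolling_alarms(flow_out))
```

An ML approach would replace the fixed k and window with thresholds learned from labeled events, which is how false alarms can be reduced relative to a static rule.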

2. Challenges and opportunities

ML algorithms can be applied very effectively to three main types of problems: building surrogate models for well-understood problems to reduce computational costs; building ML models for problems that currently require human intervention and expert knowledge for analysis; and building ML models for complex problems that were previously impractical to address. ML succeeds fastest in domains in which the environment is straightforward, data is readily available, and decisions are inexpensive. Most ML use cases in the petroleum industry meet none of these criteria: environments are usually heterogeneous, decisions are expensive (e.g., drilling a well), and data is sporadic. Nevertheless, investing in the effective application of ML for longer-term gains can provide a great deal of value, provided that it is accompanied by thoughtful design and close collaboration with domain experts [65].
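The first problem type, a surrogate for an already-understood but expensive computation, can be sketched in a few lines. Below, a cheap analytic function stands in for a costly simulator (a hypothetical assumption), and a least-squares polynomial fitted to a handful of "simulation runs" serves as the surrogate. In practice the surrogate might be an NN, a Gaussian process, or another regression model, and the input space would be multidimensional.

```python
import numpy as np


def expensive_simulator(x):
    """Hypothetical stand-in for a costly physics-based run."""
    return np.exp(-x) * np.cos(3 * x)


# Run the "simulator" at a small number of design points only
x_train = np.linspace(0.0, 2.0, 15)
y_train = expensive_simulator(x_train)

# Fit a cheap polynomial surrogate (degree chosen ad hoc for this illustration)
surrogate = np.polynomial.Polynomial.fit(x_train, y_train, deg=8)

# Query the surrogate densely at negligible cost and check its error
x_test = np.linspace(0.0, 2.0, 200)
max_err = np.max(np.abs(surrogate(x_test) - expensive_simulator(x_test)))
print(f"max surrogate error: {max_err:.2e}")
```

The savings come from replacing many simulator calls with evaluations of the fitted model; the surrogate is only trustworthy inside the range of the design points.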

Applying ML algorithms to petroleum engineering problems requires a variety of challenges to be overcome. One challenge is that the data often comes in high volume (i.e., large amounts of data), with wide variety (i.e., many different data formats), questionable veracity (i.e., inconsistency and inaccuracy), and high velocity (i.e., a high rate of data influx). Massive amounts and varieties of data are produced daily by downhole and surface sensors installed on operational equipment in the petroleum industry. The industry uses both structured and unstructured data to keep track of production, safety, and maintenance. Acquiring accurate data in the petroleum industry is usually difficult or impossible, and can be expensive. As a result, obtaining sufficient quantities of high-quality data for training and verification is a prevalent challenge, and the resulting uncertainty and noise in the training data compromise the generalizability and accuracy of ML models. In addition, raw data is often not ready for ML algorithms and must be preprocessed and cleaned. Subsurface uncertainties and data-processing time delays are also important considerations. Moreover, such data usually resides in departmental silos, and the corresponding models are either unavailable or not open to others because of confidentiality concerns and competitive advantage, a problem that is particularly prevalent in academic environments [65]. Furthermore, model explainability is important in geoscience, since knowing the reason for a result can be just as important as knowing the result itself.
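A typical first cleaning step for well-log data can be sketched as follows: samples equal to the null sentinel -999.25 (a value commonly declared as NULL in LAS well-log files) or outside a physically plausible range are treated as missing, and interior gaps are filled by linear interpolation over depth. The valid range and the sample values are hypothetical, and real pipelines add steps such as depth matching, despiking filters, and unit checks.

```python
import math

LAS_NULL = -999.25  # null value commonly declared in LAS well-log files


def clean_log(depths, values, valid_range, null=LAS_NULL):
    """Replace nulls and out-of-range samples with NaN, then linearly
    interpolate interior gaps over depth. Leading/trailing gaps stay NaN."""
    lo, hi = valid_range
    cleaned = [
        float("nan") if (v == null or not lo <= v <= hi) else float(v)
        for v in values
    ]
    good = [i for i, v in enumerate(cleaned) if not math.isnan(v)]
    for left, right in zip(good, good[1:]):
        # Fill every missing sample between two valid neighbours
        for i in range(left + 1, right):
            frac = (depths[i] - depths[left]) / (depths[right] - depths[left])
            cleaned[i] = cleaned[left] + frac * (cleaned[right] - cleaned[left])
    return cleaned


# Hypothetical porosity log (fraction) with a null and an implausible spike
depths = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0]
phi = [0.12, -999.25, 0.16, 3.5, 0.20]
print(clean_log(depths, phi, valid_range=(0.0, 0.5)))
```

Leaving leading and trailing gaps as NaN, rather than extrapolating, keeps the sketch from inventing values outside the measured interval.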

Perhaps because of the challenges mentioned above, ML adoption in geoscience is not progressing as quickly as in many other fields. Although ML is a promising technique for using big data to discover input–output relationships and derive insights, its performance can be degraded by the high dimensionality of the data, which may lead to misleading correlations and impractical or unreliable clustering. Data is usually ambiguous in its raw state; thus, various preprocessing techniques are required to identify salient features so that the ML model can learn a system’s behavior. There is a risk that missing data and a lack of system stability will introduce biases into ML models, making it difficult to extract useful knowledge from the data [66]. Moreover, the challenges of combining data from diverse sources must be taken into account, as must privacy, security, and ethical considerations related to data. Hybrid modeling, which integrates ML algorithms with physics-based methods, is one way to mitigate these problems. Furthermore, transfer learning, in which a model pretrained on another dataset is used as a starting point and then further trained on one’s own data, is a relatively recent ML technique that could be beneficial in geoscience.
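One common response to high dimensionality is to project the data onto a few directions that carry most of the variance. The NumPy sketch below implements principal component analysis (PCA) via the singular value decomposition. PCA is named here as one illustrative technique rather than one prescribed by the source, and the synthetic data (10 correlated measurements generated from 2 hypothetical latent factors plus small noise) is chosen so that two components should explain nearly all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project samples onto the leading principal components.

    Returns the projected data and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)  # centre each feature
    # Rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:n_components].T
    return scores, explained[:n_components]


# Hypothetical: 10 correlated measurements driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))
scores, ratio = pca(X, n_components=2)
print(scores.shape, ratio.sum())
```

PCA is linear and unsupervised; it reduces redundancy among correlated inputs but does not by itself guarantee that the retained directions are the ones most predictive of a target.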

3. Perspectives

The potential of ML has not been fully exploited in two areas of the petroleum industry: reservoir simulation and text mining. Reservoir simulation involves differential equations (DEs) that describe how physical properties change over time and space, and DEs are thus useful for describing physical phenomena in nature. Many problems in science and engineering require solving complicated DEs. However, DEs are remarkably difficult to solve, and their associated simulations are extremely complex and computationally intensive. This level of complexity requires high-performance computing resources and explains researchers’ interest in AI in this area. DL, which uses NNs with multiple hidden layers, is a promising technique that could speed up the solution of DEs and save scientists and engineers a great deal of time and effort. Caltech researchers have introduced a DL technique for solving DEs, the Fourier neural operator, which is reported to be up to three orders of magnitude faster than traditional numerical solvers while being more accurate and generalizable than previous DL-based solvers [67]. Rather than learning convolution kernels in physical (Euclidean) space, as in traditional DL, this approach parameterizes its integral kernel directly in Fourier space. This development could not only lessen the dependence on supercomputers but also increase the computational capacity to model more intricate problems efficiently.
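The Fourier-space idea behind the Caltech work [67] (the Fourier neural operator) can be illustrated with a single spectral convolution: transform the input function with the FFT, keep only the lowest few Fourier modes, multiply those modes by learned complex weights, and transform back. The NumPy sketch below is a single-channel, 1D simplification under stated assumptions: the weights are random (untrained), the number of retained modes is arbitrary, and a full operator would add channel mixing, pointwise linear paths, nonlinearities, and a training loop.

```python
import numpy as np


def spectral_conv_1d(u, weights):
    """One Fourier layer (single channel, 1D), in the spirit of a Fourier neural operator.

    u: real samples of a function on a uniform periodic grid, shape (n,).
    weights: complex multipliers for the lowest len(weights) Fourier modes.
    """
    n_modes = len(weights)
    u_hat = np.fft.rfft(u)                          # to Fourier space
    out_hat = np.zeros_like(u_hat)                  # higher modes are discarded
    out_hat[:n_modes] = u_hat[:n_modes] * weights   # reweight the retained low modes
    return np.fft.irfft(out_hat, n=len(u))          # back to physical space


# Hypothetical input: a smooth periodic signal plus a high-frequency component
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
u = np.sin(x) + 0.2 * np.sin(20 * x)
rng = np.random.default_rng(0)
w = rng.normal(size=8) + 1j * rng.normal(size=8)  # untrained, illustrative weights
v = spectral_conv_1d(u, w)
print(v.shape)
```

Because the weights act on Fourier modes rather than on grid points, the same layer can be evaluated on a finer grid without changing its parameters, which is part of what makes this design attractive for DE surrogates.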

The petroleum industry is just beginning to harness the power of ML for smart reporting and for extracting information from text documents. Daily drilling and completion reports are two of the main text-based documents in the industry; they contain important free-text descriptions as well as other types of data, such as depths, casing sizes, hole sizes, and perforation depths. NLP and DL algorithms can be used to develop models for automated quality control of operations and for performance improvement, providing approaches that are far more efficient than the traditional reliance on the knowledge of subject matter experts [68]. Several studies have investigated text processing in the petroleum industry, with a focus on topics including text mining of operational data for risk management and issue prediction [69], metric generation and pattern recognition based on contextual analysis of reports [70,71], and report classification [71]. Although such studies have applied text mining techniques to text-based challenges in the industry, the potential of ML in this area remains largely untapped and warrants further exploration.
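Classifying short report entries as productive time or NPT is a natural entry point for such text mining. The pure-Python sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing, a simple, well-known text classifier named here as an illustrative choice; the training snippets and labels are invented, and the studies cited above [64,69–71] rely on far larger corpora and richer NLP pipelines.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (numbers are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior of the class
            for word in tokenize(text):
                # Smoothed log likelihood; unseen words get a small nonzero probability
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical daily-report snippets (not real DDR text)
train = [
    ("drilled ahead 8.5 in hole to 2450 m", "productive"),
    ("tripped in hole and drilled ahead", "productive"),
    ("ran casing and cemented", "productive"),
    ("waited on weather, rig idle", "NPT"),
    ("repaired top drive, waited on parts", "NPT"),
    ("stuck pipe, worked pipe free", "NPT"),
]
model = NaiveBayesText().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("rig waited on parts for top drive repair"))  # → NPT
```

Working in log space avoids numerical underflow when many word probabilities are multiplied for longer entries.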

4. Concluding remarks

Data-driven approaches and AI algorithms hold enough promise that they may someday be relied upon even more than physics-based methods. Their main input is data, the fundamental element of every scenario; these algorithms learn from data and reveal previously unseen patterns. Within the petroleum industry, there is great interest in using this technology to gain insight from the huge volumes of data generated every second. Many studies explore AI applicability in various subdisciplines of this industry; however, most of this research has one of two shortcomings: it is either not practical enough to be applied to real-field challenges, or it is limited to a specific problem and not generalizable. Attention must be given to the data itself and to how it is classified and stored. Although tremendous volumes of data are produced in different disciplines, such data remains within departmental silos and is not accessible to others. To derive as much insight as possible, data must be stored in a centralized repository from which it can be readily consumed for different applications. Between data acquisition and the application of AI and ML techniques, data must be processed so that features can be extracted effectively and the data can support the algorithms. Although AI and ML techniques are increasingly important within the petroleum and reservoir engineering domain, they are only part of a holistic system. For this system to deliver value, algorithms must be applied to this challenging domain with careful consideration, and the right type, quality, and volume of data must be available and effectively processed to achieve the desired outcomes. Thus, although AI is a critical tool for efficiently managing the world’s underground resources, data is the key to fully exploiting its possibilities.

Compliance with ethics guidelines

Mohammad Ali Mirza, Mahtab Ghoroori, and Zhangxin Chen declare that they have no conflict of interest or financial conflicts to disclose.