1. Introduction

Deep learning is a type of machine learning method that can be used to achieve data representation, abstraction, and advanced tasks by simulating a multi-layer artificial neural network in a computer [1]. Recent advances in deep learning have attracted much attention. In some fields, deep learning performance has been shown to be superior to that of human experts. Deep learning has revolutionized the fields of artificial intelligence (AI) and computer science, and great advances have been made in these fields. Deep learning has been widely applied to computer vision [2], voice/image recognition [3–5], robotics [6], and other applications [7–9]. However, electronic and active deep learning is strongly limited by the von Neumann architecture in terms of processing time and energy consumption [10,11]. In the last several decades, optical information processing, which implements the operations of convolution, correlation, and Fourier transformation in an optical system, has been found to exhibit unique advantages for parallel processing and has been widely investigated [12–17]. Computer-based deep learning has been achieved in optical systems by use of diffractive optical elements, and optical deep learning based on diffractive optical elements has been validated using image classification [18–21]. The terahertz diffractive deep neural network (D2NN) based on three-dimensional (3D) printing technology is the landmark method for optical deep learning [18].

A D2NN is designed on a computer using deep learning and error backpropagation. During training, each pixel of a diffractive optical element layer acts as a neuron. The transmission coefficients (i.e., the complex amplitude distribution) of these pixels are optimized by the computer so that the light diffracted from the input plane to the output plane performs the required task. When training is complete, the passive diffractive optical element layers can be physically fabricated. Stacked together, these layers form an all-optical network that executes the trained function without consuming energy, apart from the coherent illumination light used to encode the input object’s information and the output detectors. D2NNs [18–21] have been used to implement both nonlinear functions, such as a Fourier-space D2NN in which the optical nonlinearity is introduced using ferroelectric thin films [17], and linear functions, such as handwritten digit and fashion product classifiers that use no nonlinear optoelectronic materials [18–20].

The 3D-printed terahertz D2NN has been a great success, and the performance and validity of D2NNs on terahertz platforms have been verified in many studies [18–20]. Despite these advantages, terahertz radiation suffers from some well-known limitations in practical applications, such as material losses [22] and limited interparticle coupling [22]. There is therefore a growing need to adapt the terahertz scheme to visible or near-infrared [23] wavelengths. Extending the working wavelength from the terahertz range to visible light has the potential to offer novel perspectives [24–29]. However, some restrictions and shortcomings must be addressed to adapt a D2NN from a long wavelength to a short wavelength. In particular, the working wavelength, neuron size, and fabrication limitations place conflicting demands on the design: a shorter wavelength requires a smaller neuron size, which makes fabrication more difficult, and the traditional maximal half-cone diffraction angle theory cannot resolve this contradiction [18–20].

In this work, a new general formula for the D2NN framework, together with a detailed analysis of the parameters that span the design space, is proposed to overcome the abovementioned contradiction. The formula extends the traditional maximal half-cone diffraction angle theory by including the related design variables, and it provides a quantitative analysis of how to extend a D2NN to visible light sources such as a helium–neon (He–Ne) laser. A series of simulation analyses are designed to verify the proposed formula. As an example, a D2NN classifier is trained to recognize handwritten digits from 0 to 9 using a subset of the Modified National Institute of Standards and Technology (MNIST) training dataset, which reduces the training time and complexity. A phase-only, five-layer D2NN with a total of 32 000 neurons is used to prevent overfitting. The numerical blind testing accuracy for different cases on a new test dataset is used to verify the proposed formula.

Based on the proposed general formula, a novel visible light D2NN classifier was designed and experimentally verified. The light source was a He–Ne laser with a wavelength of 632.8 nm. In contrast with existing D2NN classifiers [18–20], the proposed visible light D2NN classifier can classify handwritten digits from 0 to 9 even when the target has been changed, for example by being covered or altered, which are two common cases. This identification capability was obtained by extending the training dataset: the 55 000 handwritten digits in the MNIST training dataset were supplemented with 25 000 covered or altered digits (the deformed MNIST training dataset), giving a new training dataset of 80 000 images. Using a five-layer, phase-only D2NN with five million neurons, a numerical blind testing accuracy of 91.57% was achieved on a new test dataset of 11 000 images. The visible light D2NN was fabricated using a multi-step photolithography-etching process on a silicon dioxide (SiO2) substrate. The inputs for the experiment were 50 transmissive digit objects fabricated using micro-fabrication technology, for which a blind testing accuracy of 84% was achieved. Together, the experimental (84%) and numerical (91.57%) classification accuracies quantify the match between the theoretical design and the performance of the fabricated system. The relatively small reduction in the performance of the experimental network compared with the numerical testing supports the validity of the design theory for the visible light D2NN.

By using these systematic advances for designing a D2NN, the reported method and the improvement set the state of the art for visible light D2NNs. Deep neural networks are sometimes called black boxes [30], as the hidden layers can be difficult to extract to explain how data is processed. The proposed D2NN may provide some insights into this issue. Additionally, understanding the interaction between biological neurons in the human brain is of fundamental interest when building deep neural networks [31]. The proposed D2NN may provide insight into the current understanding of the interactions between biological neurons. The presented framework and theory can be used to apply D2NNs to various practical applications for human–computer interaction equipment and AI interaction devices and can promote progress in biology and computational science.

2. Material and methods

2.1. Deep learning and D2NN architecture

The proposed visible light D2NN follows optical diffraction theory and a deep learning structure [32–35]. The D2NN system differs from traditional deep learning systems in two respects: ① the forward-propagation model of the D2NN obeys optical diffraction theory, such as Huygens’ principle and Rayleigh–Sommerfeld diffraction [36]; ② the training loss function used in this system, namely the softmax-cross-entropy (SCE) loss function, is based on the light power incident on different detector regions in the output plane. Based on the error calculated with respect to the target output according to this loss function, the network structure and its neuron phase values can be optimized using an error back-propagation algorithm.
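As a rough illustration of a loss built from detector power, the sketch below sums the output-plane intensity over rectangular detector regions and feeds the normalized powers to a softmax-cross-entropy loss. The region layout, the normalization step, and the function names `detector_powers` and `softmax_cross_entropy` are assumptions made for this example; the exact mapping from power to loss used in the reported system is not specified here.

```python
import numpy as np

def detector_powers(intensity, regions):
    """Sum the output-plane intensity inside each rectangular detector region.

    regions is a list of (row_start, row_stop, col_start, col_stop) tuples
    (a hypothetical layout chosen for this sketch).
    """
    return np.array([intensity[r0:r1, c0:c1].sum() for (r0, r1, c0, c1) in regions])

def softmax_cross_entropy(powers, label):
    """Softmax-cross-entropy with normalized detector powers used as logits."""
    logits = powers / powers.sum()      # assumption: remove the overall illumination scale
    shifted = logits - logits.max()     # shift for numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum())
    return -log_softmax[label]

# Toy output plane with two detectors; most of the light lands on detector 0.
intensity = np.zeros((8, 8))
intensity[0:2, 0:2] = 1.0
intensity[4:6, 4:6] = 0.1
regions = [(0, 2, 0, 2), (4, 6, 4, 6)]

powers = detector_powers(intensity, regions)
predicted = int(np.argmax(powers))                 # detector with the maximum optical signal
loss_if_label_0 = softmax_cross_entropy(powers, 0)
loss_if_label_1 = softmax_cross_entropy(powers, 1)
```

The loss is lower when the brightest detector matches the label, which is the behaviour the back-propagation step relies on.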

Generally, in deep learning, the phase values for the neurons in each layer are iteratively adjusted (or trained) to perform a specific function by feeding training data to the input layer and then computing the network’s output through optical diffraction. Before a specific analysis of the forward-propagation model within a D2NN, some definitions are provided for the lth layer as follows (where $l$ represents the layer number of the network):

• $t^{l}$ is the lth layer’s complex transmission function. The complex transmission coefficient $t_i^{l}(x_i, y_i, z_i)$ of the ith neuron, where i represents the neuron located at $(x_i, y_i, z_i)$ of layer $l$ (where $x_i$, $y_i$, and $z_i$ are the spatial coordinates of the ith neuron along the x-, y-, and z-axis, respectively), is composed of amplitude and phase terms. It can be further defined as $t_i^{l}(x_i, y_i, z_i) = a_i^{l}(x_i, y_i, z_i)\exp\left[j\Phi_i^{l}(x_i, y_i, z_i)\right]$, where $a_i^{l}$ is the amplitude term, $\Phi_i^{l}$ is the neuron phase term, and $j$ is the imaginary unit, $j = \sqrt{-1}$. For a phase-only D2NN architecture, the amplitude $a_i^{l}$ is assumed to be a constant value equal to 1.

• $g^{l}$ is the secondary diffractive wave just after the lth layer. The secondary diffractive wave after the ith neuron located at $(x_i, y_i, z_i)$ for layer $l$ is $g_i^{l}(x_i, y_i, z_i)$. The amplitude and phase of $g_i^{l}$ are determined by the product of the input wave to layer $l$ and its complex transmission coefficient $t_i^{l}$, which are complex-valued functions.

• $f^{l}$ is the output function when the secondary diffractive wave function $g^{l}$ propagates across the distance between the diffractive layers $l$ and $l+1$. $f^{l}$ is also the input function for the next layer $l+1$. When the input function $f^{l-1}$ is vertically irradiated at the lth diffraction layer, the complex amplitude distribution of $g^{l}$ is the product of the transmission function $t^{l}$ and the input function $f^{l-1}$:

$$g^{l} = f^{l-1} \circ t^{l} \tag{1}$$

where $\circ$ is the Hadamard product operation.

After the secondary diffractive wave $g^{l}$ propagates from the diffractive layer $l$ to the diffractive layer $l+1$, a phase shift is introduced with a corresponding phase factor as follows:

$$\exp\left(jkd\cos\gamma\right) = \exp\left(jkd\sqrt{1-\cos^{2}\alpha-\cos^{2}\beta}\right) \tag{2}$$

where $d$ is the distance between the diffractive layers $l$ and $l+1$; $\alpha$, $\beta$, and $\gamma$ are the angles between the propagation direction of the secondary diffractive wave and the x-, y-, and z-axis, respectively; $k = 2\pi/\lambda$ is the wave number; and $\lambda$ is the wavelength.

Assuming that $u = \cos\alpha/\lambda$ and $v = \cos\beta/\lambda$, where $u$ and $v$ are spatial frequencies, the corresponding phase factor after propagation of the secondary diffractive wave from the diffractive layer $l$ to the diffractive layer $l+1$ can be expressed as follows:

$$H(u, v) = \exp\left[jkd\sqrt{1-(\lambda u)^{2}-(\lambda v)^{2}}\right] \tag{3}$$

We then obtain

$$f^{l} = \mathcal{F}^{-1}\left\{\mathcal{F}\left\{g^{l}\right\} \cdot H(u, v)\right\} \tag{4}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and its inverse. Therefore, the overall analysis of the forward-propagation model within a D2NN is given as follows:

$$f^{l} = \mathcal{F}^{-1}\left\{\mathcal{F}\left\{f^{l-1} \circ t^{l}\right\} \cdot H(u, v)\right\} \tag{5}$$
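The forward model described above (a Hadamard product with each layer's transmission function, followed by free-space propagation to the next layer) can be sketched with the standard angular-spectrum method. This is a minimal NumPy illustration and not the TensorFlow implementation used in this work: the function names, the 64 × 64 grid, the 5 mm layer spacing, and the decision to discard evanescent components are all assumptions made for the example.

```python
import numpy as np

def transfer_function(n, pitch, wavelength, distance):
    """Angular-spectrum phase factor exp(j*k*d*sqrt(1 - (lambda*u)^2 - (lambda*v)^2))."""
    freqs = np.fft.fftfreq(n, d=pitch)                  # spatial frequencies u and v
    u, v = np.meshgrid(freqs, freqs, indexing="ij")
    arg = 1.0 - (wavelength * u) ** 2 - (wavelength * v) ** 2
    k = 2.0 * np.pi / wavelength
    kz = k * np.sqrt(np.maximum(arg, 0.0))
    # Assumption: evanescent components (arg < 0) are simply set to zero.
    return np.where(arg >= 0.0, np.exp(1j * distance * kz), 0.0)

def d2nn_forward(field, phases, pitch, wavelength, distance):
    """Propagate a square complex field through a stack of phase-only layers."""
    h = transfer_function(field.shape[0], pitch, wavelength, distance)
    for phi in phases:
        g = field * np.exp(1j * phi)                    # element-wise product, amplitude fixed to 1
        field = np.fft.ifft2(np.fft.fft2(g) * h)        # propagate to the next plane
    return field

rng = np.random.default_rng(0)
n = 64                                                   # toy grid, far smaller than 1000 x 1000
phases = [rng.uniform(0.0, np.pi, (n, n)) for _ in range(5)]
field_in = np.zeros((n, n), dtype=complex)
field_in[24:40, 24:40] = 1.0                             # amplitude-encoded square "object"
field_out = d2nn_forward(field_in, phases, pitch=4e-6,
                         wavelength=632.8e-9, distance=5e-3)
intensity = np.abs(field_out) ** 2                       # what a detector would record
```

With a 4 μm pitch at 632.8 nm every sampled spatial frequency is propagating, so the phase-only layers and the propagation step conserve the total optical power in this toy run.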

2.2. TensorFlow-based design for a D2NN and processing flow

The proposed visible light D2NN was realized using Python (v3.7) and the TensorFlow (v1.12.0, Google Inc., USA) framework. The proposed D2NN system was trained for 20 epochs using a desktop computer with a GeForce GTX 1080 Ti graphical processing unit (GPU), an Intel® Core™ E5-2650 central processing unit (CPU) @2.00 GHz, and 128 GB of random access memory (RAM), running the Windows 10 operating system (Microsoft Corporation, USA).

The trainable parameters in a diffractive neural network are the modulation values for each layer, which, here, were optimized by back-propagation using the adaptive moment estimation (Adam) optimizer with a learning rate of $10^{-3}$. Furthermore, the number of network layers and the axial distance between these layers are also design parameters. The training time for a five-layer visible light D2NN to classify both unchanged and changed (covered or altered) handwritten digits was approximately 20 h.
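For readers unfamiliar with Adam, the sketch below writes out one update step in NumPy with the learning rate quoted above. The decay rates `beta1 = 0.9` and `beta2 = 0.999` and the constant `eps = 1e-8` are the commonly used defaults and are assumptions here, as is the quadratic toy objective, which merely stands in for the D2NN loss.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad            # running mean of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # running mean of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)                  # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective (hypothetical): pull three "phase values" toward fixed targets.
target = np.array([0.5, 1.0, 2.0])
theta = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
loss_start = np.sum((theta - target) ** 2)
for t in range(1, 201):
    grad = 2.0 * (theta - target)                   # gradient of the squared error
    theta, m, v = adam_step(theta, grad, m, v, t)
loss_end = np.sum((theta - target) ** 2)
```

In a real D2NN the gradient would come from back-propagating the detector-based loss through the diffraction model rather than from a closed-form expression.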

The input digit objects were encoded based on the input amplitude into the D2NN and were fabricated by laser direct writing (LDW). The target objects were fabricated on a soda glass substrate. The glass substrate was first cleaned using acetone and isopropyl alcohol. The clear substrate was coated with a layer of chromium (Cr) with a thickness of a few hundred nanometres using electron beam evaporation. After spin-coating positive photoresist and a prebake process, the handwritten digit patterns were exposed using LDW technology. The exposed resist was stripped using a developer and the uncovered Cr was stripped using chrome mordant. Finally, any remnant resist was also cleaned using acetone and isopropyl alcohol.

To ease fabrication of the visible wavelength D2NN, a sigmoid function was applied to limit the phase value of each neuron to 0–π, which enabled the neurons to be easily fabricated using a traditional multi-step photolithography-etching method. Before processing, the neuron phase values Φ need to be converted into a relative height map $\Delta h = \lambda\Phi/(2\pi\Delta n)$, where $\Delta n$ is the refractive index difference between the fabricated substrate and air. The D2NN layers were fabricated onto a SiO2 substrate using a similar cleaning process as above. After cleaning, equipment pre-treated with hexamethyldisilazane was used to change the surface activity of the SiO2 substrate to enhance the adhesion between the photoresist and the substrate. After spin-coating photoresist and exposure, the exposed resist was stripped using a developer. Then, after an oxygen plasma sizing treatment, magnetic neutral loop discharge etching was used. This process was repeated until the D2NN layer structure was achieved. More processing details can be found in Figs. S1 and S2 in Appendix A.
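The conversion from trained phase to etch depth can be sketched as follows, using the standard thin-element relation Δh = λΦ/(2πΔn). The index contrast `delta_n = 0.457` (SiO2 against air near 633 nm) and the choice of eight discrete height levels are assumptions for illustration; the actual number of etch steps is not stated in this section.

```python
import numpy as np

def sigmoid_phase(raw):
    """Map unconstrained trainable values to phases in the open interval (0, pi)."""
    return np.pi / (1.0 + np.exp(-raw))

def phase_to_height(phase, wavelength, delta_n):
    """Relative height that produces the given phase delay in a thin element."""
    return wavelength * phase / (2.0 * np.pi * delta_n)

def quantize_levels(height, max_height, n_levels):
    """Snap heights to n_levels equally spaced values between 0 and max_height."""
    step = max_height / (n_levels - 1)
    return np.round(height / step) * step

wavelength = 632.8e-9      # He-Ne wavelength in metres
delta_n = 0.457            # assumed index contrast of SiO2 against air
n_levels = 8               # assumed number of etch levels

raw = np.linspace(-4.0, 4.0, 9)                    # stand-in for trained parameters
phase = sigmoid_phase(raw)
height = phase_to_height(phase, wavelength, delta_n)
max_height = phase_to_height(np.pi, wavelength, delta_n)
quantized = quantize_levels(height, max_height, n_levels)
```

Under these assumptions a full π phase step corresponds to an etch depth of roughly 0.69 μm, which indicates the depth range a multi-step process would need to cover.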

3. Results

By using a multi-step photolithography-etching process on a SiO2 substrate, a D2NN was fabricated as five layers of diffractive optical elements, which were mounted as shown in Fig. 1. The identification capability of the visible light D2NN was improved by extending the training dataset. For a changed target, for example, a target that has been covered or altered, existing deep learning systems will falsely identify the target [18–20], even with sufficient object recognition accuracy improvements [20]. Therefore, our D2NN was trained as a digit classifier to perform automated classification of handwritten digits. The designed D2NN can classify unchanged digit targets from 0 to 9 as well as targets that have been changed (by being covered or altered), as shown in Fig. 1(a). For these tasks, a phase-only, transmissive, five-layer D2NN was designed by training on 80 000 images, comprising 55 000 unchanged handwritten digits obtained from the MNIST training dataset and 25 000 changed handwritten digits (i.e., covered or altered digits) derived from the deformed MNIST training dataset. The input digits were encoded into the amplitude of the input field of the D2NN. The diffractive neural network was trained to map the input digits to eleven detector regions, which are marked by different numbers in Fig. 1(a). The unchanged input digits from 0 to 9 were mapped to the No. 0 to No. 9 detector regions, respectively, and the changed input digits were all mapped to the No. X detector region. The classification criterion was to find the detector with the maximum optical signal.

Fig. 1. Schematic diagram and experimental setup of the visible light D2NN. (a) Schematic diagram of the classifier used for the unchanged handwritten digit targets from 0 to 9 and the changed handwritten digit targets, such as the covered digit 7 and the altered digit 5. The spatial distribution of the detectors is also shown in this figure. (b) Numerical phase values for the neurons of the five layers L1, L2, L3, L4, and L5. (c) The experimental schematic. (d, e) The experimental setup. CCD: charge coupled device.

Once the training was completed, the improved D2NN digit classifier was numerically tested using 11 000 additional images, which were not used as training image sets and comprised 10 000 unchanged handwritten digits that were obtained from the MNIST test dataset and 1000 deformed handwritten digits (i.e., covered or altered digits) that were derived from the deformed MNIST test dataset. The improved system achieved a blind classification accuracy of 91.57%.

Using the numerical phase values for the neurons in each layer, as shown in Fig. 1(b), the designed five-layer D2NN was processed. The phase of the neurons in each layer was physically encoded based on the relative thickness of each layer point. Therefore, the diffractive optical element for the D2NN was processed using a multi-step photolithography-etching process on a SiO2 substrate. The experimental schematic for the whole five-layer D2NN is shown in Fig. 1(c). The experimental setup based on a He–Ne laser (25-STP-912-230, Melles Griot, USA) is shown in Figs. 1(d) and (e). In the experiment, a He–Ne laser beam was collimated by lenses 1 and 2, and a pinhole was used as a filter. The collimated He–Ne laser beam was used to illuminate the target objects in the input plane. Using the five-layer D2NN, the diffractive field for the output plane was detected by a charge coupled device (CCD) (Beijing Daheng Imaging Vision Co., Ltd., China). Since the training set contains a large quantity of samples, each layer contains one million neurons (1000 × 1000), and there are five million neurons in total in the five-layer D2NN. The wavelength of the applied He–Ne laser was 632.8 nm. The power of the applied He–Ne laser was 5 mW. Each neuron had a size of approximately 4 μm and each layer had an area of 4 mm × 4 mm. The distance between two adjacent diffractive layers is approximately 5 cm. The details for the alignment of the five-layer D2NN can be found in Fig. S3 in Appendix A.

For the numerical testing of the 11 000 test images, the classification accuracy of the designed five-layer D2NN was determined to be 91.57%. The confusion matrix is given in Fig. 2(a) and shows the details and distribution of the correctly identified examples and the incorrectly identified examples. For the 50 digital objects fabricated by LDW, the experimental blind testing accuracy was found to be 84%. The relatively small reduction in the performance of the experimental network compared to the numerical testing indicates that the design theory is correct. The confusion matrix in Fig. 2(b) shows the experimental details for examples of correct and incorrect identification. A CCD with a specially designed light barrier was applied to each illuminated input object to obtain the D2NN output. The transmission regions for the light barrier correspond to the detector positions from No. 0 to No. X, respectively. The remaining regions are opaque. The first step in this test was to assess the recognition capability of the unchanged and changed handwritten test numbers. A handwritten 3, an altered handwritten 3, a handwritten 4, and a covered handwritten 4 were chosen as the input objects, as shown in Fig. 3(a). The simulated results and the experimental results in Fig. 3(b) indicate that the visible light D2NN can be used to easily classify the deformed object inputs. As shown in Fig. 3(c), the energy distribution shows that the system can identify the maximum optical signal for the correct detector. The second step in this test was to use four different forms of the handwritten digit 6 as input objects, as shown in Fig. 4. The simulated results and the experimental results in Fig. 4(c) demonstrate that the fabricated diffractive neural network and the inference capability are valid. The average intensity distribution at the output plane of the visible light D2NN can converge the maximum input energy to the corresponding detector assigned to that digit.

Fig. 2. Confusion matrix for the simulated and experimental results. (a) Confusion matrix for the simulated results. Numerical testing of the five-layer D2NN design achieves a classification accuracy of 91.57% over around 11 000 different test images. (b) Confusion matrix for the experimental results obtained for 50 different handwritten digits prepared by LDW. The classification accuracy is approximately 84%.

Fig. 3. Handwritten digit classifier for a visible light D2NN. (a) Objects under an optical microscope, including a handwritten 3, an altered handwritten 3, a handwritten 4, and a covered handwritten 4. Amp: amplitude. (b) Simulated results and experimental results showing that the handwritten digit classifier D2NN successfully classifies handwritten input digits based on 11 different detector regions at the output plane of the network, each corresponding to one digit. As an example, the output of the handwritten input of 3 and 4 are focused onto the No. 3 and No. 4 detectors, as indicated by the white arrows. The altered and covered handwritten input of 3 and 4 are all indicated by the No. X detector. Max: maximal. (c) Energy distribution percentage for our experimental results and simulated results, which demonstrates the success of the fabricated diffractive neural network and its inference capability.

Fig. 4. Handwritten digit classifier for a visible light D2NN. (a) Objects under an optical microscope, including four different forms of the handwritten 6. (b) Simulated results and experimental results showing that the handwritten digit classifier D2NN successfully classifies different types of handwritten input digits. Four different forms of 6 were all focused onto the No. 6 detector, as indicated by the white arrows. (c) Energy distribution percentage for our experimental results and simulated results, which demonstrate the success of the fabricated diffractive neural network and its inference capability.

In summary, the proposed D2NN illuminated by a He–Ne laser was demonstrated to successfully recognize unchanged targets (from 0 to 9) and changed targets (i.e., targets that are covered or altered) at a visible wavelength of 632.8 nm. Additionally, the proposed visible D2NN system was shown to have a transfer learning ability, as shown in Fig. 5. When the laser passes directly into an existing D2NN system [18–20] without passing through any handwritten digit, the system still diffracts the light to one of the digit detectors, which means that the bare incident light is misjudged to be a number. When the laser is directly incident onto the proposed D2NN system, as shown in Fig. 5(a), the system focuses the incident light onto the No. X detector, which indicates that the input has been identified as not belonging to the set of digits from 0 to 9. The experimental results in Fig. 5(b) show strong agreement with the simulated results.

Fig. 5. Verification of the transfer learning ability of the proposed D2NN. (a) Simulated results for the light-field distribution in the output plane when a plane wave passes directly into our system without passing through any handwritten digits. Most of the light is concentrated in the No. X region, which indicates that an incorrect number or incorrect case has been identified. (b) Experimental results for the light-field distribution in the output plane. The experimental results are in good agreement with the simulated results.

The demonstrated D2NN can be used to address the contradictions that occur when adapting from a long wavelength to a visible light source. The quantitative analysis performed here demonstrates the building of a visible light D2NN and addresses the existing contradictions between wavelength, neuron size, and processing difficulty.

Connectivity between layers is a dominant factor that is governed by the diffraction of the neurons and, in turn, determines the information transfer and the inference performance of the D2NN. A fully connected D2NN can achieve sufficient information transfer and optical interconnection between neurons. A fully connected network requires the diffraction angle of every neuron to be large enough to optically cover the diffractive optical element in the next layer. The maximal half-cone diffraction angle of a neuron ($\varphi_{\max}$), which is governed by the wavelength and the neuron size, can be described for a fully connected structure as follows:

$$\varphi_{\max} = \sin^{-1}\left(\frac{\lambda}{2d_f}\right) \tag{6}$$

where $d_f$ is the neuron size.

To obtain a large diffraction angle, a small neuron size and a long wavelength are needed. In previous work [18], a terahertz source wavelength of 0.75 mm, a neuron size of approximately 0.4 mm, and a maximal half-cone diffraction angle of approximately 70° were used. For visible light, however, the wavelength of the He–Ne laser used here is 632.8 nm, which is approximately 1200 times shorter than the terahertz wavelength. To obtain a 70° half-cone diffraction angle, the neuron size should be less than 330 nm, which is also approximately 1200 times smaller than the neuron size used in the terahertz range. A maximal unit size of 330 nm requires a complicated fabrication technique, which creates a contradiction between wavelength, neuron size, and processing difficulty. Therefore, a general method should be applied when designing visible light D2NNs.
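The numbers in this paragraph can be reproduced with a few lines of Python, assuming the half-cone angle takes the form φmax = arcsin(λ/(2df)), which matches the terahertz values quoted from [18]. The function names are introduced only for this sketch.

```python
import math

def max_half_cone_angle_deg(wavelength, neuron_size):
    """Maximal half-cone diffraction angle in degrees, assuming arcsin(lambda / (2 * d_f))."""
    return math.degrees(math.asin(wavelength / (2.0 * neuron_size)))

def neuron_size_for_angle(wavelength, angle_deg):
    """Neuron size that gives the requested half-cone angle at this wavelength."""
    return wavelength / (2.0 * math.sin(math.radians(angle_deg)))

# Terahertz design: 0.75 mm wavelength, 0.4 mm neurons (lengths in metres).
thz_angle = max_half_cone_angle_deg(0.75e-3, 0.4e-3)

# He-Ne laser: neuron size needed to keep a 70 degree half-cone angle.
visible_size = neuron_size_for_angle(632.8e-9, 70.0)

# Angle obtained with the 4 um neurons used in the fabricated network.
visible_angle = max_half_cone_angle_deg(632.8e-9, 4e-6)
```

The terahertz case gives an angle close to 70°, the visible case calls for neurons of roughly a third of a micrometre, and a 4 μm neuron diffracts into a cone of only a few degrees, which is why the layer spacing has to become a design variable.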

For a propagation distance $D$ between two adjacent diffractive layers, the radius $R$ of the diffraction spot of each neuron can be expressed as follows:

$$R = D\tan\varphi_{\max} \tag{7}$$

If $d_f$ is the neuron size and $N$ is the number of neurons in each diffractive layer, the side length $w$ of each diffractive layer is

$$w = \sqrt{N}\,d_f \tag{8}$$

where the diffraction layer is assumed to be a square [18–20]. To obtain a fully connected D2NN, the condition $R \geq w$ needs to be met. Therefore, based on the traditional maximal half-cone diffraction angle theory, a new formula is proposed as follows:

$$D\tan\left[\sin^{-1}\left(\frac{\lambda}{2d_f}\right)\right] \geq \sqrt{N}\,d_f \tag{9}$$

The improved formula can be used to quantitatively analyse the D2NN connectivity. A fully connected D2NN has better inference performance when the parameters satisfy Eq. (9). This formula indicates that the connectivity of a D2NN is affected by the wavelength, the neuron size, the number of neurons, and the distance between layers. The contradiction between wavelength, neuron size, and processing difficulty can therefore be alleviated by adjusting the number of neurons and the layer spacing instead of using a longer wavelength. This formula provides a general case for the application of a D2NN to any wavelength.
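As a quick plausibility check, the sketch below computes the smallest layer spacing that satisfies the full-connectivity condition R ≥ w, taking R = D·tan(φmax) with φmax = arcsin(λ/(2df)) and w = √N·df for a square layer. These expressions are the forms assumed in this example, and `min_layer_distance` is a hypothetical helper name.

```python
import math

def min_layer_distance(wavelength, neuron_size, n_neurons):
    """Smallest spacing D for which the diffraction spot radius covers the next layer."""
    sin_phi = wavelength / (2.0 * neuron_size)
    if sin_phi >= 1.0:
        # The half-cone already spans 90 degrees, so any positive spacing suffices.
        return 0.0
    tan_phi = sin_phi / math.sqrt(1.0 - sin_phi ** 2)
    layer_width = math.sqrt(n_neurons) * neuron_size    # side length of a square layer
    return layer_width / tan_phi

# Parameters of the fabricated network: 632.8 nm, 4 um neurons, 1000 x 1000 per layer.
d_min = min_layer_distance(632.8e-9, 4e-6, 1000 * 1000)
```

With these inputs the minimum spacing comes out at about 5 cm, in line with the approximately 5 cm layer separation used in the experiment.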

The experimental parameters in Figs. 2–5 were chosen using Eq. (9), and the accuracy of this new formula was confirmed by the experimental results. To further verify the proposed formula, a series of simulation analyses were performed. To reduce the training time and complexity, a phase-only, five-layer D2NN was trained as a digit classifier to recognize only handwritten digits from 0 to 9 using a subset of the MNIST training image dataset, as shown in Fig. 6(a). The training set contained 10 000 randomly selected handwritten digit images (from 0 to 9), with approximately 1000 images of each digit. These 10 000 input digits were encoded into the amplitude of the input field of the D2NN. The diffractive network was trained to map the input digits to ten detector regions, with one region for each digit. The classification criterion was to find the detector with the maximum optical signal. After training, the D2NN digit classifier design was numerically tested using 500 images, which were also randomly selected from the MNIST test dataset and were not contained within the training or validation image sets. The blind testing accuracy for the test set was used to verify the new proposed theory.

Fig. 6. Schematic diagram of the D2NN classifier used to verify the improved Eq. (9). (a) Schematic diagram of the D2NN classifier used to reduce the training time for the unchanged handwritten digit targets from 0 to 9. The location of each detector is displayed. (b) The fitting curve. (c) Four cases. The blind testing accuracies for the different parameters in (c) were investigated, and the confusion matrices are shown in (d) case 1, (e) case 2, (f) case 3, and (g) case 4, which support the revised theory.

A quantitative analysis of the D2NN connectivity was performed, and the fitting curve for Eq. (9) is given in Fig. 6(b). To prevent overfitting, the number of neurons in each layer was set to 6400 (80 × 80) based on previous experience. Fig. 6(b) divides the connectivity space of the D2NN according to the relationship between the wavelength, distance, and neuron size. When the parameters lie on or above the fitting curve, as indicated by the green arrow, the D2NN realizes full connectivity and high inference accuracy is obtained. For the case marked by the red arrow, the D2NN cannot achieve full connectivity. In Fig. 6(b), cases 1 and 2 lie on the fitting curve, case 3 is above the fitting curve, and case 4 is below the fitting curve. The blind testing accuracies for cases 1 to 3 are all above 90%, while the accuracy for case 4 is approximately 0.1%. The confusion matrices for cases 1–4 are shown in detail in Figs. 6(d)–(g), respectively. These results support the accuracy of the improved theory for the connectivity. The improved theory offers a quantitative analysis for building a D2NN and demonstrates the performance advantages of a fully connected D2NN. The simulated results shown in Fig. 6(c) are consistent with previous studies [18–20]. A comparison of cases 1 and 2 shows that the proposed fully connected theory can overcome the contradiction between neuron size and processing difficulty by adjusting the distance. The D2NN performance for a long distance $D$ ($5.7 \times 10^{3}$) and a large neuron size $d_f$ (6) is consistent with that for a short distance $D$ (15) and a small neuron size $d_f$ (0.53); because the neuron size can be enlarged as long as the distance is increased accordingly, the processing difficulty is reduced. We also analysed the influence of alignment errors between diffractive layers and of the phase depth error for the diffractive layer. Further details can be found in Figs. S4 and S5 in Appendix A.
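The two parameter sets quoted above can be checked by hand against the full-connectivity condition $R \geq w$, with $R = D\tan[\sin^{-1}(\lambda/2d_f)]$ and $w = \sqrt{N}\,d_f$. The worked values below assume that $D$ and $d_f$ are expressed in units of the wavelength $\lambda$; this unit is not stated explicitly, and the check should be read with that caveat.

```latex
% Assumption: D and d_f are given in units of \lambda, with N = 6400 (80 x 80).
\begin{align*}
\text{Long distance, large neuron:}\quad
  & \sin\varphi_{\max} = \tfrac{1}{2 \times 6} \approx 0.083,\quad
    \tan\varphi_{\max} \approx 0.084, \\
  & R \approx 5.7 \times 10^{3} \times 0.084 \approx 4.8 \times 10^{2},\qquad
    w = 80 \times 6 = 480. \\[4pt]
\text{Short distance, small neuron:}\quad
  & \sin\varphi_{\max} = \tfrac{1}{2 \times 0.53} \approx 0.943,\quad
    \tan\varphi_{\max} \approx 2.84, \\
  & R \approx 15 \times 2.84 \approx 42.7,\qquad
    w = 80 \times 0.53 = 42.4.
\end{align*}
```

In both cases $R$ and $w$ agree to within about 1%, so under this assumption both designs sit essentially on the boundary of the full-connectivity condition, which fits their placement on the fitting curve.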

4. Discussion and conclusions

In this work, a general model for a D2NN at visible wavelengths was proposed. A visible wavelength D2NN can overcome some of the drawbacks of the terahertz band and has many potential practical applications [25–29]. However, some restrictions and shortcomings make it challenging to move from the terahertz band to the visible light region. The first difficulty is the contradiction between the working wavelength, neuron size, and fabrication limitations: shorter wavelengths require smaller neuron sizes, which make the processing more complex. A general theory that includes a revised formula was proposed to overcome this contradiction, and a series of simulation analyses were designed that verified the proposed formula. Based on this theory, a novel visible light D2NN classifier was used to recognize unchanged targets (handwritten images of digits ranging from 0 to 9) as well as changed targets (i.e., covered or altered targets) at a visible wavelength of 632.8 nm. A numerical classification accuracy of 91.57% was obtained, which agrees reasonably well with the experimental classification accuracy of 84% and indicates that both the theoretical analysis and the designed system can be successfully used.

Although there has been some recent success implementing deep neural networks for optical platforms [18–22,24], an all-optical design has not yet been fully demonstrated and realized. For example, computers are still required for the training process and the advantages of low energy consumption and high speed offered by optical information processing have not yet been realized. Additionally, applications for optical deep learning techniques are still emerging and many early attempts [18–20] use standard machine learning models, which may not be the best choice for an optical deep learning design. Other learning paradigms, such as unsupervised learning [37], generative adversarial networks [38], and reinforcement learning [39,40], should also be integrated into an optical neural network. It is expected that faster and more accurate optical deep learning frameworks will be proposed in the future, which may be able to offer capabilities that go beyond even current human knowledge.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (61675056 and 61875048).

Compliance with ethics guidelines

Hang Chen, Jianan Feng, Minwei Jiang, Yiqun Wang, Jie Lin, Jiubin Tan, and Peng Jin declare that they have no conflict of interest or financial conflicts to disclose.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2020.07.032.