1. Introduction

Video analysis has attracted increasing attention from researchers in the computer vision community in recent years. Lately, research into topics such as object tracking [1–3], gait recognition [4,5], and activity recognition [6–8] has achieved competitive results and demonstrated promise for the future.

Abnormal event detection, which involves detecting the specific frames in a video that contain an anomaly, is one of the most active research topics in the video field. In comparison with the tasks mentioned above, abnormal event detection has greater significance for national security and for people’s lives. With the modernization of society, an increasing number of surveillance cameras are being deployed in various places, producing an enormous quantity of video every second. It is impossible for humans to sift through this much data and identify the abnormal events contained within it. However, missing even one anomaly in a surveillance video could result in unbearable loss. Thus, there is a need for an automatic video abnormality detector that can process millions of video frames and alert people, enabling a timely and effective response when an anomaly happens.

Ref. [9] describes the many difficulties inherent in anomaly detection. Although it is simple enough to list a few types of abnormality for a specific scene, such as the presence of a car or a bicyclist in a pedestrian crowd, it is impractical to enumerate all the abnormal events that could possibly occur within that scene; thus, this classification task has countless positive classes. Furthermore, due to a lack of abnormal samples—that is, video frames including abnormal events—the training set is severely imbalanced, making it infeasible to train a multi-class classification model. All these difficulties indicate that the anomaly detection task is a one-class classification problem that is hard to handle.

Some methods have been suggested for abnormal event detection. For example, Ref. [10] proposes a method based on histograms of the optical flow orientation descriptor. Because this handcrafted feature descriptor was constructed based on human experience, it cannot learn feature representations through a training process; thus, it performs worse than current deep learning methods. Deep learning methods have developed rapidly in recent years, as described in Refs. [11,12], largely due to the availability of big data and efficient hardware. Such methods are intensively applied in the computer vision field and have achieved great results. In Ref. [13], Wang et al. used a convolutional neural network (CNN) for defect detection in product quality control. However, the original CNN for face recognition is not applicable to this task, because its training requires samples of different classes. Considering the success of the principal component analysis network (PCAnet) [14] in image classification, Ref. [15] proposes a PCAnet-based method to extract information from raw images for anomaly detection, with a one-class classifier constructed on a clustering algorithm. However, this method has natural limitations due to the K-medoids clustering algorithm, which has difficulty dealing with the high-dimensional features extracted by the PCAnet. In this paper, we propose a self-supervised network, termed the abnormal event detection network (AED-Net), to deal with the task of video anomaly detection, with only normal samples provided as training data. Since PCAnet has been demonstrated to extract features effectively as an unsupervised model, it was chosen for our self-supervised AED-Net. In addition, a one-class classifier is used to handle the extracted high-dimensional features in order to determine the abnormality of frames.

To be more specific, this new self-supervised network uses optical flow maps as the input, because these maps are well suited to representing motion. Next, high-level semantics of a crowd’s situation are extracted by the PCAnet. Subsequently, a simple but effective one-class classifier, kernel principal component analysis (kPCA) [16], is used to classify the high-dimensional features. Combining the advantages of both models, AED-Net is trained to understand each frame and conduct detection. More importantly, a local response normalization (LRN) layer (a technique used in CNNs to aid generalization) is incorporated to improve the AED-Net. It is worth noting that this new network can be trained with unlabeled data and performs better than state-of-the-art methods in the abnormal event detection task. Our new self-supervised network can effectively detect abnormal events even in crowded situations, as demonstrated by experiments on the public Monitoring Human Activity dataset from the University of Minnesota (UMN dataset) and the Anomaly Detection dataset from the University of California, San Diego (UCSD dataset).

The rest of this paper is organized as follows. Section 2 provides a brief review of related works, and Section 3 reviews the basic algorithms of our framework, the PCAnet and kPCA. Next, the complete architecture of AED-Net is elaborated in Section 4, along with our improvements to it. Section 5 illustrates and discusses the experimental results on the UMN [17] and UCSD [18] datasets. Finally, Section 6 presents our conclusions.

2. Related work

In general, traditional methods for anomaly detection can be divided into two major classes. The first class is based on trajectory, and has been widely used in abnormal event detection [19–22]. In Refs. [23–25], the authors extract the trajectories of normal events to indicate normal modes; trajectories that differ from the normal patterns are then considered abnormal. However, occlusion between moving objects affects the effectiveness of this method when it is applied to crowded scenes. To tackle this problem, a new model is suggested in Ref. [26] to deal with the interrelatedness of human behavior and to ameliorate the representation of objects’ interactions. In Ref. [27], a discrete transformation is utilized to develop a reliable multi-target tracking algorithm that associates objects in different frames. However, the methods listed above do not address the occlusion problem in an effective way, and it strongly affects their results. Hence, the tracking strategy is not adopted in our work.

Spatiotemporal methods form the other category, and promising research on them has been proposed. In Ref. [28], Wang et al. propose a covariance matrix as a feature descriptor, which encodes the optical flow and partial derivatives of adjacent frames. In Refs. [29–32], the authors model motion patterns with histograms of pixel changes. In Refs. [33–35], distributions of optical flow are used as the basic features, and models for detecting abnormal events are then built based on optical flow features. Ref. [36] proposes an approach to estimate the interaction between moving objects. Another study [9] uses a detector that combines time and space anomalies. The wavelet transform used in image processing can also be utilized to analyze motion [37,38]. In these cases, delicate feature descriptors were designed manually and tended to work well only under specified conditions. In our work, the features are extracted by a self-supervised network.

With its rapid development, deep learning has recently achieved outstanding results in the field of abnormal event detection. Unlike manually designed features, features extracted by a deep learning network are obtained through a learning process. In the proposed AED-Net, a self-supervised learning method is used for abnormal event detection, in which only normal samples are learned.

3. Self-supervised feature extraction and anomaly detection

Self-supervised learning is a learning paradigm in which there is no external supervised information—that is, labels—as ground truth beyond the data itself. Under this paradigm, the self-supervised learning method simply adopts the raw data as the material for training, which means that the model learns to extract latent supervised information in the data. Data categories are not employed in the training process.

The self-supervised learning model is applicable to the anomaly detection task. Since we can only use normal data to train the model, no external supervised information is given to the model. Thus, the model must fully understand what normal data look like from the input video clips, and then use this as supervised information to tune its parameters. Table 1 introduces the notations used in this paper.

Table 1 A description of the notations used in this paper.

3.1. PCAnet for feature extraction

Both traditional and deep learning methods have been applied to extract features from video frames. In Ref. [10], the global optical flow descriptor is used as the feature. However, optical flow only contains low-level motion information in the frames; high-level information, such as people’s running patterns or how many people are in the frame, cannot be represented by it. Thus, a deep learning method is used to deal with this high-level feature extraction problem. The most popular model is the CNN, which stacks layers and extracts deeper and deeper features step by step. However, that particular model requires strong external supervised information, which is not provided in our task. Thus, we chose the PCAnet [14], a comparable model for feature extraction that utilizes the power of deep learning without requiring external supervised information.

PCAnet [14] is a deep learning network that has been proposed within the prevailing trend of deep learning. Although it is simple in comparison with other popular deep learning networks, such as the deep CNN, PCAnet is capable enough to handle challenging tasks such as face recognition. Thus, this model was chosen for its efficiency and competitive ability in feature extraction.

PCAnet is a cascaded linear network. A typical two-stage PCAnet architecture is shown in Fig. 1. Because it is inspired by the CNN, each stage of PCAnet consists of an independent principal component analysis (PCA) filter bank that must be learned in order to perform feature extraction. Feature maps from the first stage are linearly cascaded to the next stage to extract higher-level features. As discussed by Chan et al. [14], although a two-stage network performs better than a one-stage network, networks with more than two stages have few advantages over a two-stage network; therefore, for the benefit of computational efficiency, a two-stage PCAnet is sufficient for the task at hand.

Fig. 1. Typical structure of the two-stage PCAnet used in our method. Conv: convolution.

A two-stage PCAnet was therefore used to extract features. In the training phase, at the beginning of Stage 1, each optical flow map $I_i$ of size $h \times w$ is sampled around each pixel into small patches of size $k_1 \times k_2$, as shown in Fig. 1 by the upper gray arrows. Next, the vectorized patches $x_{i,1}, x_{i,2}, \ldots, x_{i,hw} \in \mathbb{R}^{k_1 k_2}$ compose the sample matrix $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,hw}] \in \mathbb{R}^{k_1 k_2 \times hw}$. We then perform mean subtraction on $X_i$ to obtain $\bar{X}_i$. (See Table 1 for a list of all the notations used in this paper.)

For the N input optical flow maps $I_1, I_2, \ldots, I_N$, PCAnet initially samples all of them to obtain the following:

$$X = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_N] \in \mathbb{R}^{k_1 k_2 \times Nhw}$$

Next, PCAnet computes $L_1$ convolution kernels based on the sample matrix $X$ by implementing PCA, as shown by the lower gray arrow in Fig. 1, to obtain the following:

$$W_l^1 = \mathrm{vec2mat}_{k_1, k_2}\big(q_l(XX^{\mathrm{T}})\big) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \ldots, L_1$$

where $q_l(XX^{\mathrm{T}})$ denotes the $l$th principal eigenvector of $XX^{\mathrm{T}}$, and $\mathrm{vec2mat}_{k_1,k_2}(\cdot)$ maps a vector from $\mathbb{R}^{k_1 k_2}$ to a matrix in $\mathbb{R}^{k_1 \times k_2}$. At the end of Stage 1, the convolution operation is performed to extract features:

$$C_i^l = I_i * W_l^1, \quad i = 1, 2, \ldots, N; \; l = 1, 2, \ldots, L_1$$

where $*$ denotes two-dimensional (2D) convolution, and $C_i^l$ refers to the $l$th feature map of the $i$th input $I_i$; the number of outputs in Stage 1 is $L_1 N$. Note that the boundary of $I_i$ is zero-padded in order to ensure that the outputs have the same size as the input—that is, $h \times w$. As implied, in the test phase, PCAnet directly performs the convolution operation on the inputs $I$ using the kernels obtained from the training phase.
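For concreteness, the following minimal NumPy sketch illustrates Stage 1 filter learning and convolution as described above. It is an illustration only: the function names (`extract_patches`, `learn_pca_filters`, `stage_forward`) are our own choices, not part of the original implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def extract_patches(img, k1, k2):
    """Vectorize the k1 x k2 patch around every pixel (zero-padded),
    giving a (k1*k2, h*w) sample matrix."""
    h, w = img.shape
    padded = np.pad(img, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    cols = [padded[r:r + k1, c:c + k2].ravel()
            for r in range(h) for c in range(w)]
    return np.array(cols).T  # shape (k1*k2, h*w)

def learn_pca_filters(maps, k1, k2, L):
    """Learn L PCA filters from a list of h x w optical flow maps."""
    X = np.hstack([extract_patches(m, k1, k2) for m in maps])
    X -= X.mean(axis=0, keepdims=True)          # patch-wise mean removal
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)  # ascending eigenvalue order
    top = eigvecs[:, ::-1][:, :L]               # L principal eigenvectors
    return [top[:, l].reshape(k1, k2) for l in range(L)]  # vec2mat

def stage_forward(maps, filters):
    """Convolve every input map with every filter (zero-padded 'same')."""
    return [convolve2d(m, f, mode="same") for m in maps for f in filters]
```

Stage 2 (below) reuses the same two steps on the Stage-1 feature maps, so no separate code is needed for it.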

Stage 2 is conducted in almost the same way as Stage 1. In the training phase, each input $C_i^l$ is sampled into $k_1 \times k_2$ patches. These patches are vectorized and, after mean subtraction, compose the matrix $S_2$:

$$S_2 = [\bar{Y}_1^1, \bar{Y}_2^1, \ldots, \bar{Y}_N^{L_1}] \in \mathbb{R}^{k_1 k_2 \times L_1 Nhw}$$

where $\bar{Y}_i^l$ refers to the mean-removed sample matrix of $C_i^l$. We then compute the $L_2$ convolution kernels of Stage 2:

$$W_\ell^2 = \mathrm{vec2mat}_{k_1, k_2}\big(q_\ell(S_2 S_2^{\mathrm{T}})\big) \in \mathbb{R}^{k_1 \times k_2}, \quad \ell = 1, 2, \ldots, L_2$$

Finally, we obtain the outputs of Stage 2 by convolution:

$$O_i^{l,\ell} = C_i^l * W_\ell^2, \quad i = 1, \ldots, N; \; l = 1, \ldots, L_1; \; \ell = 1, \ldots, L_2$$

The number of outputs in Stage 2 is $L_2 L_1 N$.

After Stage 2, we binarize the outputs with the Heaviside step function $H(\cdot)$, which assigns 1 to positive entries and 0 to zero or negative entries. This gives the network nonlinearity, making it capable of capturing high-level semantics in the optical flow maps. For each input image, each of its $L_1$ Stage-1 feature maps has $L_2$ real-valued outputs $\{O_i^{l,\ell}\}_{\ell=1}^{L_2}$ in the second stage. Around each pixel, there are thus $L_2$ binary bits; we can view them as a decimal number, converting the $L_2$ outputs into a single integer-valued image:

$$T_i^l = \sum_{\ell=1}^{L_2} 2^{\ell-1} H\big(O_i^{l,\ell}\big)$$

Finally, the output features of PCAnet are block histograms (with $2^{L_2}$ bins) computed based on all $T_i^l$. Note that one histogram does not represent the whole of $T_i^l$, but a region of it: $T_i^l$ is partitioned into $B$ blocks, and a histogram is computed in each block. Next, all the histograms are concatenated into one vector, $\mathrm{Bhist}(T_i^l)$. For a single input optical flow map $I_i$, the feature is as follows:

$$f_i = \big[\mathrm{Bhist}(T_i^1), \mathrm{Bhist}(T_i^2), \ldots, \mathrm{Bhist}(T_i^{L_1})\big]^{\mathrm{T}} \in \mathbb{R}^{(2^{L_2}) L_1 B}$$
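A minimal sketch of the hashing and block-histogram steps, under the same assumptions as the previous snippet (the names `hash_maps` and `block_hist` are illustrative):

```python
import numpy as np

def hash_maps(stage2_outputs):
    """Binarize the L2 Stage-2 outputs of one Stage-1 map and pack them
    into a single integer-valued image T with values in [0, 2^L2 - 1]."""
    T = np.zeros_like(stage2_outputs[0], dtype=np.int64)
    for bit, O in enumerate(stage2_outputs):
        T += (O > 0).astype(np.int64) << bit   # Heaviside step, weight 2^bit
    return T

def block_hist(T, L2, block):
    """Concatenate 2^L2-bin histograms over non-overlapping blocks of T."""
    h, w = T.shape
    hists = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            patch = T[r:r + block, c:c + block]
            hists.append(np.bincount(patch.ravel(), minlength=2 ** L2))
    return np.concatenate(hists)  # one Bhist(T) vector
```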

The local blocks can be either overlapping or non-overlapping. The latter setting is beneficial for detection tasks other than face detection [14], so non-overlapping blocks are used in this paper. Besides the overlap choice, the hyperparameters of the PCAnet include the filter size $k_1 \times k_2$, the number of filters in each stage, $L_1$ and $L_2$, and the block size for the local histograms.

3.2. A self-supervised learning method for anomaly detection: kPCA

Because we can only utilize video sequences of normal scenes, and it is necessary to distinguish normal frames from abnormal frames containing previously unknown anomalies, it is appropriate to cast this task as one-class classification.

The common idea in a one-class classification task is to train a classifier that encloses the training data—that is, the normal data—and thereby separates the abnormal data from the normal data. The support vector domain description (SVDD) classifier is a good example of this approach. However, this classifier often generates an overly large decision boundary that hinders good performance. Using Gaussian process priors, Kemmler et al. [39] built a model for one-class classification that uses different measures derived from Gaussian process regression and approximate Gaussian classification. However, this model relies strongly on hyperparameter tuning of the re-parameterized kernel function.

In contrast, by learning the distribution of the data, which is usually nonlinear, a kPCA classifier [16] can generate a decision boundary that smoothly follows this distribution, and it tends to classify more accurately.

The structure of a kPCA classifier is shown in Fig. 2. The essential idea of this one-class classifier is that the features of normal frames have a similar distribution, while the features of abnormal frames have a very different distribution. Thus, after using PCA filters that were computed based on training features—that is, normal features—in order to perform PCA on both normal features and abnormal features, we were able to observe a clear difference in reconstruction error between normal features and abnormal features. The classification could then be conducted according to this disparity.

Fig. 2. The structure of a one-class classifier: kPCA.

As discussed by Hoffmann [16], PCA cannot capture the nonlinear structure of the input. Hence, kPCA is introduced to overcome this drawback: it maps an input $x_i$ to a feature $\Phi(x_i)$ in a higher-dimensional space. PCA is then performed in that feature space. Computation here only requires the scalar products of the mapped features—that is, $\Phi(x_i) \cdot \Phi(x_j)$—so the scalar product can be replaced by a kernel function $k(x_i, x_j)$ to perform the same task. Here, we use the Gaussian kernel $k(x_i, x_j) = \exp\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$. Furthermore, we obtain $\tilde{\Phi}(x_i)$ from $\Phi(x_i)$ by performing mean subtraction; the eigenvectors $V$ of the covariance matrix in the higher-dimensional space can then be expressed in terms of $\tilde{\Phi}(x_i)$ as follows:

$$V = \sum_{i=1}^{n} \alpha_i \tilde{\Phi}(x_i)$$

It turns out that $n\lambda\alpha = K\alpha$, where $\lambda$ is an eigenvalue of the kernel matrix $K$ and $\alpha = (\alpha_1, \ldots, \alpha_n)^{\mathrm{T}}$ collects the expansion coefficients. Each component of $K$—that is, $K_{i,j}$—is a scalar product of $\tilde{\Phi}(x_i)$ and $\tilde{\Phi}(x_j)$. Thus,

$$K_{i,j} = \tilde{\Phi}(x_i) \cdot \tilde{\Phi}(x_j) = \tilde{k}(x_i, x_j)$$

According to Hoffmann [16], given a feature $F_Z$, the reconstruction error is calculated in feature space as follows:

$$p(F_Z) = \Big\|\tilde{\Phi}(F_Z) - \sum_{l=1}^{q} f_l V^l\Big\|^2$$

where $f_l = V^l \cdot \tilde{\Phi}(F_Z)$ is the projection of $\tilde{\Phi}(F_Z)$ onto the $l$th eigenvector. The equation above can then be expressed more clearly, as follows:

$$p(F_Z) = \tilde{k}(F_Z, F_Z) - \sum_{l=1}^{q} f_l^2$$

In the equation above, $\tilde{k}(F_Z, F_Z)$ is expressed as follows:

$$\tilde{k}(F_Z, F_Z) = k(F_Z, F_Z) - \frac{2}{n}\sum_{i=1}^{n} k(F_Z, F_{X_i}) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(F_{X_i}, F_{X_j})$$

Hence, we obtain the desired form of the measurement $p(F_Z)$ with which to detect the anomaly.

The hyperparameters of this classifier are the number of eigenvectors $q$ and the kernel width $\sigma$. Their values depend on the specific experimental environment.

Finally, given an input $X$ and its extracted feature $F_X$, we define the classifier as follows:

$$\mathrm{class}(X) = \begin{cases} \text{normal}, & p(F_X) \le \theta \\ \text{abnormal}, & p(F_X) > \theta \end{cases}$$

The threshold $\theta$ above is the maximum reconstruction error computed in the training phase, as shown in Fig. 2.
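The following NumPy sketch of the kPCA one-class classifier follows Hoffmann's formulation above. It is a minimal illustration, assuming the Gaussian kernel (for which $k(z, z) = 1$); the class name `KPCADetector` and its interface are our own.

```python
import numpy as np

class KPCADetector:
    """One-class classifier: kPCA reconstruction error (Hoffmann, 2007)."""

    def __init__(self, q, sigma):
        self.q = q          # number of retained eigenvectors
        self.sigma = sigma  # Gaussian kernel width

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.sigma ** 2))

    def fit(self, F):                      # F: (n, d) normal training features
        self.F = F
        n = F.shape[0]
        K = self._kernel(F, F)
        one = np.full((n, n), 1.0 / n)
        Kc = K - one @ K - K @ one + one @ K @ one   # centered kernel matrix
        vals, vecs = np.linalg.eigh(Kc)
        vals, vecs = vals[::-1], vecs[:, ::-1]       # descending order
        # rescale so the feature-space eigenvectors V^l have unit length
        self.alpha = vecs[:, :self.q] / np.sqrt(np.maximum(vals[:self.q], 1e-12))
        self.K_mean, self.K_row = K.mean(), K.mean(axis=0)
        self.theta = self.score(F).max()             # max training error
        return self

    def score(self, Z):                    # reconstruction error p(F_Z)
        Kz = self._kernel(Z, self.F)                 # (m, n)
        # centered kernel between test and training points
        Kz_c = Kz - Kz.mean(1, keepdims=True) - self.K_row + self.K_mean
        # spherical potential k~(z, z); k(z, z) = 1 for the Gaussian kernel
        pot = 1.0 - 2.0 * Kz.mean(1) + self.K_mean
        f = Kz_c @ self.alpha                        # projections f_l
        return pot - (f ** 2).sum(1)

    def predict(self, Z):                  # True where a frame is abnormal
        return self.score(Z) > self.theta
```

In AED-Net, `fit` would receive the block-wise histogram features of normal frames, and `predict` the features of test frames.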

4. Proposed AED-Net

Given the task of anomaly detection in video frames, we propose AED-Net, an integral self-supervised detection framework that trains on normal data only. To perform feature extraction from the input video frames, PCAnet is adopted as an effective network. For one-class classification, we then use kPCA, a particular one-class classifier, to determine the abnormality of the frames.

4.1. Optical flow computation

Initially, we obtain raw video frames, S. To detect the abnormal events in these frames, the moving area should first be separated from the static background in S in order to simplify the detection task. Optical flow, which represents the motion field between frames [40], is applicable to this motion extraction requirement.

The Horn–Schunck (H–S) method [41] can be used to compute optical flow. Considering the constraints of pixel-value consistency and flow smoothness across the image, this method constructs an energy function and optimizes it to obtain the optical flow in the form of $u$ and $v$ [41], which are the horizontal and vertical components of the optical flow. The smoothness constraint is added to the function in order to mitigate the aperture problem. The proposed energy function is as follows:

$$E = \iint \Big[ (I_x u + I_y v + I_t)^2 + \alpha^2 \big( \|\nabla u\|^2 + \|\nabla v\|^2 \big) \Big] \, \mathrm{d}x \, \mathrm{d}y$$

where $E$ is the global energy; $I_x$, $I_y$, and $I_t$ are the partial derivatives of the pixel values along the width, height, and time directions; and $\alpha$ is the hyperparameter controlling the smoothness term.

Next, in order to process the optical flow feature in the same way as an image, we visualize the optical flow $(u, v)$ and obtain optical flow maps, $I$, using the Munsell color system.
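As an illustration, a minimal Horn–Schunck iteration in NumPy might look as follows; it is a sketch under the usual discretization assumptions, and the derivative stencils, averaging kernel, and iteration count are illustrative choices rather than the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(f1, f2, alpha=1.0, n_iter=100):
    """Estimate optical flow (u, v) between two grayscale frames."""
    f1, f2 = f1.astype(float), f2.astype(float)
    # simple finite-difference estimates of Ix, Iy, It
    kx = np.array([[-1, 1], [-1, 1]]) * 0.25
    ky = np.array([[-1, -1], [1, 1]]) * 0.25
    Ix = convolve(f1, kx) + convolve(f2, kx)
    Iy = convolve(f1, ky) + convolve(f2, ky)
    It = f2 - f1
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]]) / 12.0  # neighborhood mean
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_bar, v_bar = convolve(u, avg), convolve(v, avg)
        # closed-form Jacobi update minimizing the H-S energy at each pixel
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```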

4.2. AED-Net

On an intuitive level, the anomaly detection task in our proposed AED-Net is to assign each video frame a score indicating its abnormality. During the training phase, the largest reconstruction error is set as the threshold for anomaly detection. Thus, in the testing phase, the abnormality of the test frames can be determined by comparing their scores with the threshold. To fulfill this task, we incorporate both PCAnet and kPCA to build AED-Net.

The framework of our proposed AED-Net is shown in Fig. 3, and the algorithm of AED-Net is shown in Algorithm 1. First, optical flow maps, $I$, are used as the input of the whole framework for training and testing. Next, the PCAnet model is trained to extract high-level information from the spatiotemporal features that better represents the situation of the scenes. Finally, using the block-wise histograms extracted by PCAnet as classification features, kPCA is trained to learn the nonlinear data distribution of normal scenes and to determine the max normality score, the maximum reconstruction error on normal training data, as the threshold.

Fig. 3. Architecture of the whole framework.

At test time, in order to minimize the influence of frames that carry little relevant information, foreground detection is first performed, and frames in the test video clip that contain few people are removed. Next, block-wise histogram features are extracted by the previously trained PCAnet, and a test score is computed for every frame by kPCA. Finally, the test score is compared with the max normality score to determine whether the frame is abnormal.
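Putting the pieces together, a test-time pass of AED-Net could be sketched as follows, reusing the illustrative helpers defined earlier (`horn_schunck`, `stage_forward`, `hash_maps`, `block_hist`, `KPCADetector`). The foreground filter is abstracted as a placeholder predicate, and the flow magnitude stands in for the Munsell-color encoding of $(u, v)$ used in the paper; both are our simplifications.

```python
import numpy as np

def pcanet_features(flow_maps, W1, W2, L2, block):
    """Two-stage PCAnet feature vector for each optical flow map."""
    feats = []
    for m in flow_maps:
        stage1 = stage_forward([m], W1)          # L1 Stage-1 maps
        f = []
        for c in stage1:
            stage2 = stage_forward([c], W2)      # L2 Stage-2 maps per Stage-1 map
            f.append(block_hist(hash_maps(stage2), L2, block))
        feats.append(np.concatenate(f))
    return np.array(feats)

def detect(frames, W1, W2, detector, L2=7, block=7, keep=lambda f: True):
    """Return a boolean abnormality decision per retained frame."""
    flows = [horn_schunck(a, b) for a, b in zip(frames[:-1], frames[1:])]
    maps = [np.hypot(u, v) for u, v in flows]    # stand-in for Munsell coloring
    maps = [m for m in maps if keep(m)]          # foreground filtering
    feats = pcanet_features(maps, W1, W2, L2, block)
    return detector.predict(feats)               # compare with max normality score
```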

4.3. Improved PCAnet with normalization technique

In the machine learning field, generalization, which measures an algorithm’s performance on new data, is an important but difficult goal. Nowadays, the most popular and effective normalization technique in the deep learning field is batch normalization (BN) [42]. BN improves a network’s generalization ability: when given a sample as input, the output is determined by the whole mini-batch, so BN never produces a deterministic output for a sample. The role of BN in elevating a model’s generalization ability has been proven experimentally [42]. However, BN is not applicable to our self-supervised model, because its implementation has two trainable parameters, γ and β, and we could not find ways to train these parameters in AED-Net. Besides, we do not feed data in mini-batches in our method. However, LRN, a lightweight normalization technique with no trainable parameters, is applicable to our task and achieved good results in the experiments.

Proposed by Krizhevsky et al. [43], the LRN scheme has been found to aid the generalization ability of a model. It introduces response competition among contiguous feature maps at the same spatial position. For an output value $a_{x,y}^i$ at position $(x, y)$ on the $i$th feature map, the normalized output $b_{x,y}^i$ can be calculated as follows:

$$b_{x,y}^i = a_{x,y}^i \Big/ \Big( \gamma + \delta \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big(a_{x,y}^j\big)^2 \Big)^{\beta}$$

where $\gamma$, $\delta$, $\beta$, and $n$ are configurable parameters: $\delta$ weights the outputs of adjacent feature maps; $\gamma$ is the bias term for computational safety; $\beta$ controls the total magnitude of the normalization term; $n$ denotes how many adjacent feature maps are included in the normalization; and $N$ is the total number of feature maps. The ordering of a network’s feature maps is fixed once the network is initialized.

We introduced this scheme from CNNs into PCAnet in order to improve the model’s ability to generalize. It is added after the feature maps are computed by the convolution operation at each stage. In addition, the LRN parameters are all set before training and are not learnable, making LRN suitable for our self-supervised framework.
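A minimal NumPy version of this normalization, matching the formula above (the function name is illustrative; the defaults are the values used in Section 5.3):

```python
import numpy as np

def local_response_norm(maps, n=5, gamma=2.0, delta=1e-4, beta=0.75):
    """Normalize a stack of feature maps (N, h, w) across adjacent maps."""
    maps = np.asarray(maps, dtype=float)
    N = maps.shape[0]
    out = np.empty_like(maps)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (gamma + delta * (maps[lo:hi + 1] ** 2).sum(axis=0)) ** beta
        out[i] = maps[i] / denom   # response competition among adjacent maps
    return out
```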

5. Experiments

We carried out experiments on the UMN dataset [17] and on the UCSD Ped1 and Ped2 datasets [18] for local abnormal event detection. These public datasets, which are open to the entire research community, were used to evaluate the proposed AED-Net with different criteria: the frame-level criterion and the pixel-level criterion. The UMN dataset was used to evaluate the model with the frame-level criterion, while the UCSD Ped1 and Ped2 datasets were used to evaluate it with both the pixel-level and the frame-level criteria. Both evaluation criteria are based on true-positive rates (TPR) and false-positive rates (FPR), in which "abnormal events" are denoted as "positive" and "normal status" is denoted as "negative." The results of the experiments were compared with those of other state-of-the-art methods, and demonstrate the superiority of our method.
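For reference, the frame-level metrics reported below can be computed as in the following sketch, which derives TPR/FPR over a sweep of thresholds and reads off the equal error rate (EER) where FPR ≈ 1 − TPR (function names are ours):

```python
import numpy as np

def roc_curve(scores, labels):
    """TPR/FPR arrays over a threshold sweep; labels: 1 = abnormal."""
    order = np.argsort(-scores)        # most abnormal-looking frames first
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / max(labels.sum(), 1)
    fpr = np.cumsum(1 - labels) / max((1 - labels).sum(), 1)
    return fpr, tpr

def auc_eer(fpr, tpr):
    auc = np.trapz(tpr, fpr)           # area under the ROC curve
    eer_idx = np.argmin(np.abs(fpr - (1 - tpr)))
    return auc, fpr[eer_idx]           # EER where FPR = 1 - TPR
```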

5.1. Detection performance on the UMN dataset

The UMN dataset [17] is composed of three scenes—namely, a lawn, an indoor scene, and a plaza—with a resolution of 240 × 320. All scenes involve the escaping action of crowds; in this dataset [17], the evacuation behaviors of crowds are labeled as abnormal. We detect the abnormality of each frame, which is measured by the frame-level criterion. Fig. 4 shows a couple of frames from each UMN scene. For computational efficiency, all optical flow maps extracted from the original video frames are resized to a small size, which has been shown to retain sufficient information for detection.

Fig. 4. Examples of video frames from the three scenes. (a, d) show a scene on a lawn, (b, e) show an indoor scene, and (c, f) show a scene in a plaza. The evacuation behaviors of crowds (d–f) are labeled as abnormal.

Foreground detection is used in this experiment to avoid the disturbance of uninformative frames. Frames that contain fewer than three whole human-body motion shapes, as shown in Fig. 5, are detected directly in our work by measuring the moving foreground blobs, and are removed.
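The paper does not specify the foreground detector; one plausible minimal stand-in uses OpenCV background subtraction with an area threshold (both the subtractor choice and the threshold value are our assumptions):

```python
import cv2
import numpy as np

def keep_frame(frame, subtractor, min_area=1500):
    """Heuristic filter: keep a frame only if enough moving foreground
    (roughly, at least a few human-sized blobs) is present."""
    mask = subtractor.apply(frame)                 # moving-pixel mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return int((mask > 0).sum()) >= min_area

# usage sketch
# sub = cv2.createBackgroundSubtractorMOG2()
# frames = [f for f in frames if keep_frame(f, sub)]
```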

Fig. 5. Examples of video frames that are filtered out by measuring the foreground area in the frame, owing to their disturbance to detection. As before, (a) shows a scene on a lawn, (b) shows an indoor scene, and (c) shows a scene in a plaza.

To improve the generalization ability of the AED-Net, a data augmentation technique is adopted in this experiment. Each optical flow map is first resized to 120 × 160, and nine sub-maps of size 96 × 128 are cropped from the resized map. Next, all ten maps (one of 120 × 160 and nine of 96 × 128) are resized uniformly to 24 × 32 for training and testing.
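This tenfold augmentation can be sketched as follows; the crop positions are our assumption, with a 3 × 3 grid of offsets being one natural reading of "nine sub-maps":

```python
import cv2

def augment(flow_map):
    """One resized map plus nine 96 x 128 crops, all resized to 24 x 32."""
    base = cv2.resize(flow_map, (160, 120))     # cv2 dsize is (width, height)
    outs = [cv2.resize(base, (32, 24))]
    for r in (0, 12, 24):                       # 120 - 96 = 24 -> offsets 0/12/24
        for c in (0, 16, 32):                   # 160 - 128 = 32 -> offsets 0/16/32
            crop = base[r:r + 96, c:c + 128]
            outs.append(cv2.resize(crop, (32, 24)))
    return outs                                  # ten 24 x 32 maps
```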

After removing interfering frames, we construct a training set and a test set for each scene. In the lawn scene, 760 normal frames are used for training, which, after the tenfold augmentation described above, forms a training set of 7600 maps; the other normal and abnormal frames are used for testing. For the indoor scene and the plaza scene, the numbers of frames used for training are 1100 and 1000, respectively.

For all three scenes, the hyperparameters in AED-Net are set as follows: the filters at each stage are sized 3 × 3, both stages have eight filters in order to retain enough variance, and the block size is 8 × 8. The hyperparameters of the classifier, the kernel width σ and the number of eigenvectors q, differ for each scene. After cross-validation, (σ, q) are set at (1, 2800), (1, 3800), and (0.25, 4200) for the lawn, indoor, and plaza scenes, respectively. The receiver operating characteristic (ROC) curve, area under the curve (AUC), and equal error rate (EER) are analyzed with the frame-level criterion. When plotting the ROC curve, the threshold for determining the abnormality of the frames is varied. The results, along with comparisons with other methods, are presented in Table 2 [9,15,23,34,36]. As shown in Table 2, our method achieves respectable results on frame-level anomaly detection as measured by both AUC and EER. Given the simplicity of the whole framework, this result is remarkable, and it is better than the state-of-the-art methods.

Table 2 Results comparison on the UMN dataset.

Bold values indicate the results of the present work. SF: social force.

5.2. Detection performance on the UCSD dataset

The UCSD dataset [18] contains video clips with a resolution of 158 × 238, obtained from a camera mounted above pedestrian walkways. There are 34 training samples and 36 test samples in the Ped1 scene, and 16 training samples and 12 test samples in the Ped2 scene, which includes people walking in different directions. The video clips that are labeled abnormal each contain a single anomaly, such as a car or a bicyclist. A frame with a car anomaly is shown in Fig. 6. Each video frame is partitioned into patches sized 12 × 16, each of which contains part of either the walking people or the anomaly. These patches are then utilized as raw data. Determining the abnormality of these patches is called "anomaly detection with the pixel-level criterion," because it involves classifying the abnormality of different sections of the pixels of a frame.

Fig. 6. Examples of frames of video clips containing an anomaly. (a) Frame of video clip with the anomaly of a bicyclist; (b) frame of video clip with the anomaly of a car.

As in the previous experiments, foreground detection is performed here to avoid disturbance. After that, normal patches from the video frames containing the anomaly of a bicyclist are used as the training set, and abnormal patches from two frames of two video clips are used as the test set. The hyperparameters in AED-Net are set as follows: k1 = k2 = 5, L1 = L2 = 7, and a block size of 7 × 7. The hyperparameters in the kPCA classifier are set as (σ, q) = (0.8, 1350).

Ped1 pixel-level and frame-level results, along with a comparison with other methods, are shown in Fig. 7 and Table 3 [9,18,28,34,36]. Ped2 pixel-level and frame-level results are shown in Table 4 [9,18,36]. In all the experiments, the proposed framework outperforms the state-of-the-art methods, especially in terms of AUC.

Fig. 7. Results for the Ped1 scene. (a) Pixel-level ROC for Ped1; (b) frame-level ROC for Ped1.

Table 3 Results comparison for the UCSD Ped1 scene.

MPPCA: mixture of probabilistic principal component analyzers; CDAE: covariance matrix of optical flow features for detection of abnormal events.

Table 4 Results comparison for the UCSD Ped2 scene.

5.3. Experiments on the improved AED-Net

After adding an LRN layer to the PCAnet, the whole framework was tested on the UCSD dataset, using the same experimental settings as before. The hyperparameters of the LRN were set as γ = 2, δ = 1 × 10⁻⁴, n = 5, and β = 0.75.

The results (shown in Fig. 7 and Tables 3 and 4) indicate that after the addition of the LRN, the whole framework shows better performance in detecting anomalies as measured by both AUC and EER. These findings indicate that this strategy improves our method by promoting its generalization ability.

6. Conclusion

In this work, we propose a simple but efficient framework, AED-Net, based on a self-supervised learning method. Raw data from surveillance video clips are used to calculate optical flow maps; their high-level features are then extracted by PCAnet and used to determine the abnormality of both local and global events. The experimental results show that the framework performs well in detecting both global abnormal events and local abnormal events. Furthermore, after an LRN layer was added to address the overfitting problem, the performance of the framework improved further. The framework achieves results that are better than state-of-the-art methods, indicating that it can effectively extract motion patterns from raw video and use them to detect anomalies.

Acknowledgements

This work is partially supported by the National Key Research and Development Program of China (2016YFE0204200), the National Natural Science Foundation of China (61503017), the Fundamental Research Funds for the Central Universities (YWF-18-BJ-J-221), the Aeronautical Science Foundation of China (2016ZC51022), and the Platform CAPSEC (capteurs pour la sécurité) funded by Région Champagne-Ardenne and FEDER (fonds européen de développement régional).

Compliance with ethics guidelines

Tian Wang, Zichen Miao, Yuxin Chen, Yi Zhou, Guangcun Shan, and Hichem Snoussi declare that they have no conflict of interest or financial conflicts to disclose.