Resource Type

Journal Article 79

Year

2023 12

2022 19

2021 17

2020 4

2019 6

2018 4

2017 7

2016 1

2015 1

2008 1

2007 2

2006 1

2004 1

2002 1

2000 1

1999 1


Keywords

Computer vision 6

Deep learning 6

Artificial intelligence 5

application scenarios 3

Optical flow 2

Simultaneous localization and mapping (SLAM) 2

Visual tracking 2

3D parametric model 1

3D visual knowledge 1

5G technology 1

AI root technology 1

API protocol mining 1

ATM research and development 1

ATM solution 1

Active vision 1

Additive manufacturing 1

Adsorption 1

Aerial manipulation 1

Air traffic management (ATM) 1



On visual understanding Perspective

Yunhe PAN

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 9,   Pages 1287-1289 doi: 10.1631/FITEE.2130000

Abstract:

1 Problems and development in the field of visual recognition
From the beginning of artificial intelligence (AI), pattern recognition has been an important aspect of the field. In recent years, the maturity of deep neural networks (DNNs) has significantly improved the accuracy of visual recognition. DNNs have been widely used in applications such as medical image classification, vehicle identification, and facial recognition, and have thus driven the development of the AI industry to a climax. However, visual recognition based on DNN technology currently has critical defects. For example, these networks usually require a very large amount of labeled training data, and have weak cross-domain transferability and task generalization. Their learning and reasoning processes are still hard to understand, which leads to unexplainable predictions. These challenges present an obstacle to the development of AI research and application. If we look at current visual recognition technology from a larger and broader perspective, we find that the above defects are fundamental, because the DNN model currently in use must be trained with a large amount of labeled visual data and is then used in the process of visual recognition. In essence, it is a classification process based on data statistics and pattern matching, so it is heavily dependent on the training sample distribution. For interpretability and transferability, however, visual classification alone is not good enough; visual understanding becomes indispensable.

2 Three-step model of visual understanding
Visual recognition is not equivalent to visual understanding. We propose that there are three steps in visual understanding, of which classification is only the first. After classification, one proceeds to the second step: visual parsing. In the process of visual parsing, the components of the visual object and their structural relationships are further identified and compared. Identification involves finding components and structures in visual data that correspond to the components and structures of known visual concepts. Parsing verifies the correctness of the classification results and establishes the structure of the visual object data. After completing visual parsing, one proceeds to the third step: visual simulation. In this step, predictive motion simulation and operations including causal reasoning are carried out on the structure of the visual objects to judge whether physical constraints in reality can reasonably be met, so as to verify the previous recognition and parsing results. We can take a picture of a cat as an example to illustrate the modeling process of visual understanding. The process is as follows:

1. Recognition: It is a cat. Extract the visual concept of the cat and proceed to the next step; otherwise, stop here.
2. Parsing: Based on the structure contained in the visual concept, identify whether the cat's head, body, feet, tail, and their relationships fit the cat concept. If not, return to step 1 for re-identification; if yes, proceed to the next step.
3. Simulation: Simulate various activities of the cat to investigate whether the cat's activities in various environments can be completed reasonably. If not, return to step 2; if yes, proceed to the next step.
4. End visual understanding: Incorporate the processed structured data into the knowledge about cats.
3 Characteristics of the three-step visual understanding model
To further understand the above three-step visual understanding model, we discuss some of its characteristics:

1. The key step in visual understanding is visual parsing. This is an identification of the components contained in the object according to a conceptual structure based on the visual concept obtained by visual recognition. Parsing a visual object, in order from top to bottom, is a process of identifying and constructing visual data from the root of the concept tree to the branches and leaves.
2. Human visual parsing tasks are often aimed only at the main components of concepts. The main components have existing, commonly used names. For subsidiary parts that have not been described in language, such as the area between the cheekbones and chin of the face, only experts specialized in anatomy (such as doctors or artists) have professional concepts and memories. Therefore, visual parsing is a cross-media process that incorporates multiple kinds of knowledge, including vision and language.
3. Visual knowledge is essential for visual parsing and visual simulation, because the visual concept structure provides a reliable source for component identification and comparison. Parents and teachers play a large role in establishing visual knowledge. When they say to a child, "Look, this is a kitten. Kittens have pointed ears, round eyes, long whiskers, and four short legs. When they run fast and leap high, they can catch a mouse," they are guiding the child in constructing basic visual knowledge in long-term memory.
4. Visual data that have been understood have in effect been structured into visual knowledge. Such visual knowledge can easily be incorporated into long-term memory. For example, when one sees a cat whose head is very small, or whose fur color and markings are unusual, or who has a particular gait, this information may be included in one's "cat" memory by expanding the concept of "cat". The category of a visual concept is very important, and its extent reflects how general the knowledge is. In fact, it is not always useful to collect a large amount of sample data to train a DNN model. However, the more widely distributed and balanced the data are within a concept category, the better, because the robustness and generalization ability of a model trained on such sample data are stronger.
5. Visual information learned in this way can naturally be explained, because it rests on deep structural cognition; it can also be used for transfer learning, because the semantic concepts have cross-media relevance. This semantic information can clearly indicate the reasonable direction for transferable recognition.

4 Advancing visual recognition to visual understanding
Visual understanding is important because it can potentially work with visual knowledge and multiple knowledge representation to open a new door for AI research. Visual understanding involves not only in-depth visual recognition, but also thorough learning and application of visual knowledge. AI researchers have been studying visual recognition for more than half a century. Speech recognition, a research task started in parallel with visual recognition, moved on to the analysis of words, sentences, and paragraphs quite early, and has successfully led to human-computer dialogue and machine translation, setting a well-known milestone.
Therefore, we suggest that it is necessary to advance visual recognition to visual understanding, and that this is an appropriate time to target this deeper visual intelligence behavior.
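
A minimal sketch of the recognize-parse-simulate loop described above, written in Python. The concept store and the three helper functions are hypothetical placeholders; only the control flow of the three-step model follows the text, and nothing here is Pan's implementation.

```python
# Minimal sketch of the three-step visual understanding loop described above.
# recognize / parse_components / simulate are placeholder stand-ins; only the
# control flow (recognize -> parse -> simulate -> accept) follows the text.

def recognize(image):
    """Step 1: classify the image and return a visual concept name (or None)."""
    return "cat"  # placeholder classifier output

def parse_components(image, concept):
    """Step 2: check that the parts/structure required by the concept are present."""
    required = {"cat": ["head", "body", "feet", "tail"]}
    found = ["head", "body", "feet", "tail"]  # placeholder component detector output
    return all(part in found for part in required.get(concept, []))

def simulate(image, concept):
    """Step 3: run predictive motion / physical-plausibility checks on the parsed structure."""
    return True  # placeholder: assume the parsed structure behaves plausibly

def understand(image, max_attempts=3):
    for _ in range(max_attempts):
        concept = recognize(image)
        if concept is None:
            return None                      # stop: nothing recognized
        if not parse_components(image, concept):
            continue                         # back to step 1: re-identify
        if not simulate(image, concept):
            continue                         # back to step 2 via a fresh pass
        return {"concept": concept, "structured": True}  # step 4: add to knowledge
    return None

if __name__ == "__main__":
    print(understand("cat.jpg"))
```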

Hybrid-augmented intelligence: collaboration and cognition Review

Nan-ning ZHENG, Zi-yi LIU, Peng-ju REN, Yong-qiang MA, Shi-tao CHEN, Si-yu YU, Jian-ru XUE, Ba-dong CHEN, Fei-yue WANG

Frontiers of Information Technology & Electronic Engineering 2017, Volume 18, Issue 2,   Pages 153-179 doi: 10.1631/FITEE.1700053

Abstract: This paper discusses the basic framework of human-machine collaborative hybrid-augmented intelligence, and the basic elements of cognitive-computing-based hybrid-augmented intelligence: intuitive reasoning and causal models, memory, and knowledge evolution. In particular, it discusses the role and basic principles of intuitive reasoning in complex problem solving, and a cognitive learning network for visual scene understanding based on memory and reasoning.

Keywords: Human-machine collaboration     Hybrid-augmented intelligence     Cognitive computing     Intuitive reasoning     Causal model     Cognitive mapping     Visual scene understanding     Autonomous vehicles    

Visual knowledge: an attempt to explore machine creativity Perspectives

Yueting Zhuang, Siliang Tang; yzhuang@zju.edu.cn, siliang@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2100116

Abstract: A question that has long puzzled the field of artificial intelligence is whether AI can be creative, or in other words, whether the reasoning process of an algorithm can be creative. This paper explores the question of AI creativity from the perspective of the science of thinking. First, it reviews related research on visual thinking and reasoning; then, it focuses on a particular form of visual knowledge representation, the visual scene graph; finally, it describes in detail the construction of visual scene graphs and their potential applications. All the evidence suggests that visual knowledge and visual thinking can not only improve the performance of current AI tasks, but can also be used in the practice of machine creativity.
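
The abstract highlights the visual scene graph as a representation. Below is a minimal, generic sketch of a scene graph as typed object nodes plus subject-predicate-object edges; the class and field names are this sketch's own illustrative choices, not the paper's formalism.

```python
# Generic scene-graph sketch: objects as nodes, relations as (subject, predicate, object) edges.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str                     # e.g. "cat"
    bbox: tuple                   # (x, y, w, h) in image coordinates
    attributes: list = field(default_factory=list)

@dataclass
class SceneGraph:
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # (subj_idx, predicate, obj_idx)

    def add_object(self, obj):
        self.objects.append(obj)
        return len(self.objects) - 1

    def add_relation(self, subj_idx, predicate, obj_idx):
        self.relations.append((subj_idx, predicate, obj_idx))

    def triples(self):
        return [(self.objects[s].name, p, self.objects[o].name)
                for s, p, o in self.relations]

g = SceneGraph()
cat = g.add_object(SceneObject("cat", (10, 20, 80, 60), ["striped"]))
mat = g.add_object(SceneObject("mat", (0, 70, 200, 40)))
g.add_relation(cat, "sitting_on", mat)
print(g.triples())   # [('cat', 'sitting_on', 'mat')]
```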

Keywords: Science of thinking     Visual thinking and reasoning     Visual knowledge representation     Visual scene graph    

Unsupervised object detection with scene-adaptive concept learning Research Articles

Shiliang Pu, Wei Zhao, Weijie Chen, Shicai Yang, Di Xie, Yunhe Pan; xiedi@hikvision.com

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2000567

Abstract: Object detection is one of the hottest research directions in computer vision; it has already made impressive progress in academia and has many valuable applications in industry. However, mainstream detection methods still have two shortcomings: (1) even a model that is well trained using large amounts of data often cannot be used across different kinds of scenes; (2) once a model is deployed, it cannot autonomously evolve along with the accumulated unlabeled scene data. To address these problems, and inspired by the theory of visual knowledge, we propose a novel scene-adaptive evolution algorithm that can decrease the impact of scene changes through the concept of object groups. We first extract a large number of object proposals from unlabeled data through a pre-trained detection model. Second, we build a dictionary of object concepts by clustering the proposals, in which each cluster center represents an object prototype. Third, we look into the relations between different clusters and the object information of different groups, and propose a graph-based group information propagation strategy to determine the category of an object concept, which can effectively distinguish positive and negative proposals. With these pseudo labels, we can easily fine-tune the pre-trained model. The effectiveness of the proposed method is verified through different experiments, and significant improvements are achieved.
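
A rough sketch of the pipeline the abstract outlines (cluster proposal features into a dictionary of object prototypes, then assign pseudo labels to proposals), using scikit-learn KMeans on synthetic features. The distance-threshold rule and all names are illustrative assumptions; the paper's graph-based group-information propagation is not reproduced here.

```python
# Sketch: cluster proposal features into a dictionary of object prototypes, then
# pseudo-label proposals by nearest prototype. Synthetic data and a simple distance
# threshold stand in for the paper's graph-based group-information propagation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
proposal_features = rng.normal(size=(500, 128))     # features from a pre-trained detector

# Build the concept dictionary: each cluster center is an object prototype.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(proposal_features)
prototypes = kmeans.cluster_centers_                # shape (20, 128)

def pseudo_label(feature, prototypes, max_dist=15.0):
    """Assign the nearest prototype as a pseudo label, or -1 (negative) if too far."""
    dists = np.linalg.norm(prototypes - feature, axis=1)
    k = int(np.argmin(dists))
    return k if dists[k] < max_dist else -1

labels = [pseudo_label(f, prototypes) for f in proposal_features[:10]]
print(labels)   # pseudo labels usable for fine-tuning the pre-trained detector
```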

Keywords: Visual knowledge     Unsupervised video object detection     Scene-adaptive learning    

Miniaturized five fundamental issues about visual knowledge Perspectives

Yun-he Pan; panyh@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2040000

Abstract: Cognitive psychology has long pointed out that an important part of human knowledge memory is visual knowledge, which is used for visual (imagery) thinking. Therefore, vision-based artificial intelligence (AI) is a topic that AI cannot bypass, and one of great significance. Following the article "On visual knowledge," this paper discusses five related fundamental issues: (1) visual knowledge representation; (2) visual recognition; (3) simulation of visual imagery thinking; (4) learning of visual knowledge; (5) multiple knowledge representation. The unique advantages of visual knowledge are its abilities of comprehensive imagery generation, spatio-temporal evolution, and imagery display, which are exactly what symbolic knowledge and deep neural networks lack. Combining AI with computer-aided design, computer graphics, and computer vision will provide an important foundational driving force for new developments of AI in creation, prediction, human-machine fusion, and other respects. Research on visual knowledge and multiple knowledge representation is the key to developing new visual intelligence, and is also the key theory and technology for promoting major breakthroughs in AI 2.0. This is a barren, cold, and damp yet fertile "Great Northern Wilderness," and a promising "no man's land" that is worth exploring boldly through multidisciplinary cooperation.

Keywords: Visual knowledge representation     Visual recognition     Simulation of visual imagery thinking     Visual knowledge learning     Multiple knowledge representation    

Three-dimensional shape space learning for visual concept construction: challenges and research progress Perspective

Xin TONG

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 9,   Pages 1290-1297 doi: 10.1631/FITEE.2200318

Abstract: Human beings can easily categorize three-dimensional (3D) objects with similar shapes and functions into a set of "visual concepts" and learn "visual knowledge" of the surrounding 3D real world. Developing efficient methods to learn the computational representation of the visual concept and the visual knowledge is a critical task in artificial intelligence. A crucial step to this end is to learn the shape space spanned by all 3D objects that belong to one visual concept. In this paper, we present the key technical challenges and recent research progress in 3D shape space learning, and discuss the open problems and research opportunities in this area.
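
As an illustration of what learning a shape space can mean in the simplest linear case, here is a PCA-style sketch over aligned vertex coordinates (a classical morphable-model-like baseline). It is not the method surveyed in the paper, and the data are synthetic placeholders.

```python
# Linear (PCA) shape-space sketch over a set of aligned 3D meshes with shared topology.
# Each shape is flattened to a vector of vertex coordinates; new shapes are generated
# by moving along the principal components. Synthetic data; illustrative baseline only.
import numpy as np

rng = np.random.default_rng(1)
n_shapes, n_vertices = 40, 500
shapes = rng.normal(size=(n_shapes, n_vertices * 3))      # stand-in for registered meshes

mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape

# PCA via SVD: rows of vt are shape-space basis directions ("modes of variation").
u, s, vt = np.linalg.svd(centered, full_matrices=False)
k = 5
basis = vt[:k]                                            # (k, n_vertices*3)

def synthesize(coeffs):
    """Generate a new shape from k shape-space coefficients."""
    return (mean_shape + coeffs @ basis).reshape(n_vertices, 3)

new_shape = synthesize(np.array([2.0, -1.0, 0.5, 0.0, 0.0]))
print(new_shape.shape)    # (500, 3)
```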

Keywords: Visual concept     Visual knowledge     3D geometry learning     3D shape space     3D structure    

The Entropy Perspective on Human Illness and Aging

Zhiguo Wang

Engineering 2022, Volume 9, Issue 2,   Pages 22-26 doi: 10.1016/j.eng.2021.08.014

A quantitative attribute-based benchmark methodology for single-target visual tracking Article

Wen-jing KANG, Chang LIU, Gong-liang LIU

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 3,   Pages 405-421 doi: 10.1631/FITEE.1900245

Abstract: In the past several years, various visual object tracking benchmarks have been proposed, and some of them have been widely used in numerous recently proposed trackers. However, most of the discussions focus on overall performance and cannot describe the strengths and weaknesses of the trackers in detail. Meanwhile, several benchmark measures that are often used in tests lack a convincing interpretation. In this paper, 12 frame-wise visual attributes that reflect different aspects of the characteristics of image sequences are collated, and a normalized quantitative formulaic definition is given to each of them for the first time. Based on these definitions, we propose two novel test methodologies, a correlation-based test and a weight-based test, which provide a more intuitive and easier demonstration of the trackers' performance for each aspect. These methods were then applied to the raw results from one of the most famous tracking challenges, the Visual Object Tracking (VOT) Challenge 2017. The tests showed that most trackers did not perform well when the size of the target changed rapidly or intensely, and that even advanced deep-learning-based trackers did not fully solve this problem. The scale of the targets was not considered in the calculation of the center location error; however, in a practical test, the center location error is still sensitive to the targets' changes in size.
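
A minimal sketch of the correlation-style and weight-style analyses described above, assuming per-frame attribute values and per-frame tracker accuracy are already available. The attribute definition and data below are synthetic placeholders, not the paper's normalized formulas.

```python
# Correlation- and weight-style test sketch: relate a frame-wise attribute (e.g. target
# size change) to frame-wise tracking accuracy (e.g. overlap), per tracker.
# Synthetic placeholders stand in for the paper's normalized attribute definitions.
import numpy as np

rng = np.random.default_rng(2)
n_frames = 300
size_change = rng.random(n_frames)                        # frame-wise attribute in [0, 1]
overlap = np.clip(0.8 - 0.5 * size_change + 0.1 * rng.normal(size=n_frames), 0, 1)

# Correlation-based test: a strongly negative value suggests the tracker is
# sensitive to this attribute (accuracy drops as the attribute grows).
corr = np.corrcoef(size_change, overlap)[0, 1]

# Weight-based test: weight each frame's accuracy by the attribute value, so frames
# where the attribute is pronounced dominate the score.
weighted_accuracy = np.average(overlap, weights=size_change)

print(f"correlation: {corr:.3f}, attribute-weighted accuracy: {weighted_accuracy:.3f}")
```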

Keywords: Visual tracking     Performance evaluation     Visual attributes     Computer vision    

Visual commonsense reasoning with directional visual connections Research Articles

Yahong Han, Aming Wu, Linchao Zhu, Yi Yang; yahong@tju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2000722

Abstract: To boost research into cognition-level visual understanding, i.e., making an accurate inference based on a thorough understanding of visual details, visual commonsense reasoning (VCR) has been proposed. Compared with traditional visual question answering, which requires models to select correct answers, VCR requires models to select not only the correct answers but also the correct rationales. Recent research into human cognition has indicated that brain function or cognition can be considered as a global and dynamic integration of local neuron connectivity, which is helpful in solving specific cognition tasks. Inspired by this idea, we propose a directional connective network to achieve VCR by dynamically reorganizing the visual neuron connectivity that is contextualized using the meaning of questions and answers, and by leveraging the directional information to enhance the reasoning ability. Specifically, we first develop a GraphVLAD module to capture visual neuron connectivity and fully model visual content correlations. Then, a contextualization process is proposed to fuse sentence representations with visual neuron representations. Finally, based on the contextualized connectivity, we propose directional connectivity to infer answers and rationales, which includes a ReasonVLAD module. Experimental results on the VCR dataset and visualization analysis demonstrate the effectiveness of our method.
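
For readers unfamiliar with the VCR task format referenced above, here is a minimal sketch of the two-stage selection protocol (pick an answer to the question, then pick a rationale for that answer) around a generic scoring function. It illustrates the task setup only; the scorer is a placeholder, not the directional connective network.

```python
# VCR task-format sketch: stage 1 picks an answer to the question, stage 2 picks a
# rationale supporting the chosen answer. `score` is a placeholder for any model that
# maps (image, question, candidate) to a relevance score; it is not the paper's network.

def score(image, question, candidate):
    """Placeholder scorer; a real model would fuse visual and textual features."""
    return float(len(candidate))   # trivial stand-in so the sketch runs

def answer_then_rationalize(image, question, answers, rationales):
    # Q -> A: choose the best answer.
    best_a = max(answers, key=lambda a: score(image, question, a))
    # QA -> R: choose the rationale that best supports question + chosen answer.
    best_r = max(rationales, key=lambda r: score(image, question + " " + best_a, r))
    return best_a, best_r

answers = ["She is reading.", "She is about to jump over the puddle."]
rationales = ["Her knees are bent and she is looking at the water.",
              "There is a book on the table."]
print(answer_then_rationalize("img.jpg", "What is the woman doing?", answers, rationales))
```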

Keywords: Visual commonsense reasoning     Directional connective network     Visual neuron connectivity     Contextualized connectivity     Directional connectivity    

Visual Inspection Technology and its Application

Ye Shenghua,Zhu Jigui,Wang Zhong,Yang Xueyou

Strategic Study of CAE 1999, Volume 1, Issue 1,   Pages 49-52

Abstract:

Visual inspection, especially active visual inspection and passive visual inspection based on the triangulation method, has the advantages of being non-contact, fast, and flexible. It is an advanced inspection technology that satisfies the demands of modern manufacturing. This paper discusses the principle of visual inspection and studies several developed and applied visual inspection systems; from different points of view, these systems demonstrate the wide application prospects of visual inspection.
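
Since the abstract rests on the triangulation principle, a small worked example may help: for a standard rectified stereo (or laser-triangulation) setup, depth follows Z = f * B / d from focal length f, baseline B, and disparity d. The numbers below are made up for illustration.

```python
# Triangulation depth sketch for a rectified stereo / laser-triangulation setup:
#   Z = f * B / d   (focal length f in pixels, baseline B, disparity d in pixels).
# Illustrative numbers only; a real inspection system also needs calibration and
# sub-pixel feature localization.

def triangulate_depth(focal_px: float, baseline_mm: float, disparity_px: float) -> float:
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px   # depth in mm

# Example: f = 1200 px, B = 120 mm, d = 18 px  ->  Z = 8000 mm
print(triangulate_depth(1200.0, 120.0, 18.0))
```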

Keywords: active visual inspection     passive visual inspection     inspection system     modern manufacturing    

Study on Fire Design in Performance-based Design

Xu Liang,Zhang Heping,Yang Yun,Zhu Wuba

Strategic Study of CAE 2004, Volume 6, Issue 1,   Pages 64-67

Abstract:

Fire design is a key step in performance-based design. In this paper, several fire design methods are introduced, and the fire design of a high-rack warehouse is presented as an example to show how to apply these methods.
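
The entry's keywords point to fire growth curves and heat release rate. One widely used design input, though the abstract does not say which model the paper adopts, is the t-squared curve Q(t) = alpha * t^2 with standard growth coefficients for slow to ultra-fast fires; the sketch below is illustrative only.

```python
# t-squared design fire sketch: Q(t) = alpha * t^2 (kW), capped at a peak heat release
# rate. The alpha values are the commonly tabulated slow/medium/fast/ultra-fast growth
# coefficients; the abstract does not specify which design fire the paper uses.

ALPHA = {          # kW/s^2
    "slow": 0.00293,
    "medium": 0.01172,
    "fast": 0.0469,
    "ultra-fast": 0.1876,
}

def heat_release_rate(t_s: float, growth: str = "fast", q_peak_kw: float = 5000.0) -> float:
    """Heat release rate (kW) at time t_s seconds for a t-squared design fire."""
    return min(ALPHA[growth] * t_s ** 2, q_peak_kw)

# Example: a fast-growth fire reaches about 1055 kW after 150 s.
print(round(heat_release_rate(150.0, "fast")))
```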

Keywords: fire design     fire growth curve     heat release rate    

On visual knowledge Perspective

Yun-he PAN

Frontiers of Information Technology & Electronic Engineering 2019, Volume 20, Issue 8,   Pages 1021-1025 doi: 10.1631/FITEE.1910001

Abstract: This paper proposes the concept of "visual knowledge." Visual knowledge is a new form of knowledge representation, different from the knowledge representation methods used in artificial intelligence (AI) so far. A visual concept has elements such as a prototype and category structure, a hierarchical structure, and an action structure. Visual concepts can form visual propositions, which include scene structures and dynamic structures, and visual propositions can form visual narratives. The paper points out that visual knowledge representation, together with its reasoning and operations, can be realized by restructuring the achievements of computer graphics, and that visual knowledge learning can be realized by restructuring the achievements of computer vision. Realizing the techniques of visual knowledge representation, reasoning, learning, and application will be one of the important directions for breakthroughs in AI 2.0.
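
To make the ingredients listed above concrete, here is a minimal, illustrative data-structure sketch of a visual concept (prototype, category/hierarchy links, parts, action structure) and a visual proposition built from such concepts. The field names are this sketch's own assumptions, not Pan's formalism.

```python
# Illustrative data-structure sketch of the elements named in the abstract:
# a visual concept with a prototype, category/hierarchy links, and an action structure,
# plus a visual proposition (scene + dynamics) built from concepts.
from dataclasses import dataclass, field

@dataclass
class VisualConcept:
    name: str
    prototype: dict                                 # typical shape/appearance parameters
    parents: list = field(default_factory=list)     # category / hierarchy links
    parts: list = field(default_factory=list)       # structural decomposition
    actions: list = field(default_factory=list)     # action structure (named motions)

@dataclass
class VisualProposition:
    scene: list                                     # concepts present and their spatial relations
    dynamics: list                                  # time-ordered actions (the dynamic structure)

cat = VisualConcept(
    name="cat",
    prototype={"size": "small", "legs": 4},
    parents=["mammal"],
    parts=["head", "body", "feet", "tail"],
    actions=["run", "leap", "catch"],
)
prop = VisualProposition(scene=[("cat", "on", "mat")], dynamics=[("cat", "leap")])
print(cat.name, prop.dynamics)
```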

Keywords: None    

Performance analysis of visual markers for indoor navigation systems Article

Gaetano C. LA DELFA,Salvatore MONTELEONE,Vincenzo CATANIA,Juan F. DE PAZ,Javier BAJO

Frontiers of Information Technology & Electronic Engineering 2016, Volume 17, Issue 8,   Pages 730-740 doi: 10.1631/FITEE.1500324

Abstract: The massive diffusion of smartphones, the growing interest in wearable devices and the Internet of Things, and the exponential rise of location-based services (LBSs) have made the problem of localization and navigation inside buildings one of the most important technological challenges of recent years. Indoor positioning systems have a huge market in the retail sector and contextual advertising; in addition, they can be fundamental to increasing the quality of life for citizens if deployed inside public buildings such as hospitals, airports, and museums. Sometimes, in emergency situations, they can make the difference between life and death. Various approaches have been proposed in the literature. Recently, thanks to the high performance of smartphones' cameras, marker-less and marker-based computer vision approaches have been investigated. In a previous paper, we proposed a technique for indoor localization and navigation using both Bluetooth low energy (BLE) and a 2D visual marker system deployed on the floor. In this paper, we present a qualitative performance evaluation of three 2D visual markers, Vuforia, ArUco, and AprilTag, which are suitable for real-time applications. Our analysis focuses on the specific case study of visual markers placed onto the tiles, to improve the efficiency of our indoor localization and navigation approach by choosing the best visual marker system.
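
Because ArUco is one of the marker systems compared, a short detection sketch may be useful. It assumes opencv-contrib-python 4.7 or newer (older releases expose cv2.aruco.detectMarkers as a free function instead of the ArucoDetector class), and the image path is a placeholder.

```python
# ArUco marker detection sketch using OpenCV's contrib aruco module.
# Assumes opencv-contrib-python >= 4.7; the input image path is a placeholder.
import cv2

image = cv2.imread("floor_tile.jpg")                     # placeholder input frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

corners, ids, rejected = detector.detectMarkers(gray)
if ids is not None:
    for marker_id, c in zip(ids.flatten(), corners):
        cx, cy = c[0].mean(axis=0)                       # marker center in pixels
        print(f"marker {marker_id} at ({cx:.1f}, {cy:.1f})")
else:
    print("no markers detected")
```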

Keywords: Indoor localization     Visual markers     Computer vision    

Grasp Planning and Visual Servoing for an Outdoors Aerial Dual Manipulator Article

Pablo Ramon-Soria, Begoña C. Arrue, Anibal Ollero

Engineering 2020, Volume 6, Issue 1,   Pages 77-88 doi: 10.1016/j.eng.2019.11.003

Abstract:

This paper describes a system for grasping known objects with unmanned aerial vehicles (UAVs) provided with dual manipulators using an RGB-D camera. Aerial manipulation remains a very challenging task. This paper covers three principal aspects for this task: object detection and pose estimation, grasp planning, and in-flight grasp execution. First, an artificial neural network (ANN) is used to obtain clues regarding the object’s position. Next, an alignment algorithm is used to obtain the object’s six-dimensional (6D) pose, which is filtered with an extended Kalman filter. A three-dimensional (3D) model of the object is then used to estimate an arranged list of good grasps for the aerial manipulator. The results from the detection algorithm—that is, the object’s pose—are used to update the trajectories of the arms toward the object. If the target poses are not reachable due to the UAV’s oscillations, the algorithm switches to the next feasible grasp. This paper introduces the overall methodology, and provides the experimental results of both simulation and real experiments for each module, in addition to a video showing the results.
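
The pipeline above filters the estimated 6D pose with an extended Kalman filter. The snippet below shows a deliberately simplified linear Kalman update on the position part only (constant-position model, synthetic noise figures), as a hint of the role the filter plays rather than the authors' full 6D EKF.

```python
# Simplified pose-filtering sketch: a linear Kalman filter with a constant-position model
# over the (x, y, z) part of the pose. The authors use a full extended Kalman filter on
# the 6D pose; this stripped-down version only illustrates the predict/update cycle.
import numpy as np

def kalman_step(x, P, z, Q=1e-3, R=2e-2):
    """One predict/update cycle for a constant-position model (state = measurement space)."""
    # Predict: state unchanged, uncertainty grows by process noise Q.
    P = P + Q * np.eye(3)
    # Update: blend prediction with the new pose measurement z using gain K.
    K = P @ np.linalg.inv(P + R * np.eye(3))
    x = x + K @ (z - x)
    P = (np.eye(3) - K) @ P
    return x, P

rng = np.random.default_rng(3)
true_pos = np.array([1.0, 0.5, 2.0])
x, P = np.zeros(3), np.eye(3)
for _ in range(50):                                  # noisy detections from the ANN/alignment stage
    z = true_pos + 0.1 * rng.normal(size=3)
    x, P = kalman_step(x, P, z)
print(np.round(x, 3))                                # filtered position close to (1.0, 0.5, 2.0)
```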

Keywords: Aerial manipulation     Grasp planning     Visual servoing    

Visual Prostheses: Technological and Socioeconomic Challenges Perspective

John B. Troy

Engineering 2015, Volume 1, Issue 3,   Pages 288-291 doi: 10.15302/J-ENG-2015080

Abstract:

Visual prostheses are now entering the clinical marketplace. Such prostheses were originally targeted for patients suffering from blindness through retinitis pigmentosa (RP). However, in late July of this year, for the first time a patient was given a retinal implant in order to treat dry age-related macular degeneration. Retinal implants are suitable solutions for diseases that attack photoreceptors but spare most of the remaining retinal neurons. For eye diseases that result in loss of retinal output, implants that interface with more central structures in the visual system are needed. The standard site for central visual prostheses under development is the visual cortex. This perspective discusses the technical and socioeconomic challenges faced by visual prostheses.

Keywords: neuroprostheses     vision     eye disease     restoration of function     rehabilitation    
