Resource Type: Journal Article (123)

Year: 2023 (13), 2022 (16), 2021 (17), 2020 (7), 2019 (12), 2018 (7), 2017 (15), 2016 (3), 2015 (2), 2014 (1), 2012 (2), 2011 (2), 2010 (2), 2009 (1), 2007 (3), 2006 (4), 2005 (4), 2004 (6), 2002 (1), 2001 (3)

Keywords: Deep learning (9), Computer vision (6), pattern recognition (6), Artificial intelligence (5), Emotion recognition (2), Optical flow (2), Random forest (2), Simultaneous localization and mapping (SLAM) (2), Speech recognition (2), Visual tracking (2), automatic target recognition (2), performance evaluation (2), 21st-Century Maritime Silk Road (1), 3D parametric model (1), 3D visual knowledge (1), AR model (1), Active vision (1), Adverse geology (1), Aerial manipulation (1)


Miniaturized five fundamental issues about visual knowledge Perspectives

Yun-he Pan,panyh@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2040000

Abstract: Cognitive psychology has long pointed out that an important part of human knowledge memory is visual knowledge, which is used for imagery thinking. Therefore, vision-based artificial intelligence (AI) is an unavoidable and significant topic for AI. Following the paper "On visual knowledge", this article discusses five related fundamental issues: (1) visual knowledge representation; (2) visual recognition; (3) simulation of visual imagery thinking; (4) learning of visual knowledge; (5) multiple knowledge representation. The unique advantages of visual knowledge are its capabilities for comprehensive generation of imagery, spatiotemporal evolution, and imagery display, which are exactly what symbolic knowledge and deep neural networks lack. Combining AI with computer-aided design, computer graphics, and computer vision technologies will provide an important foundational driving force for new developments of AI in creation, prediction, and human-machine fusion. Research on visual knowledge and multiple knowledge representation is the key to developing new visual intelligence, and also the key theory and technology for achieving important breakthroughs in AI 2.0. It is a barren, cold, damp, yet fertile "Great Northern Wilderness", and also a promising "no man's land" that deserves bold multidisciplinary exploration.

Keywords: Visual knowledge representation     Visual recognition     Visual imagery thinking simulation     Visual knowledge learning     Multiple knowledge representation    

Visual recognition of cardiac pathology based on 3D parametric model reconstruction Research Article

Jinxiao XIAO, Yansong LI, Yun TIAN, Dongrong XU, Penghui LI, Shifeng ZHAO, Yunhe PAN

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 9,   Pages 1324-1337 doi: 10.1631/FITEE.2200102

Abstract: Visual recognition of cardiac images is important for cardiac pathology diagnosis and treatment. Due to the limited availability of annotated datasets, traditional methods usually extract features directly from two-dimensional slices of three-dimensional (3D) heart images, followed by pathological classification. This process may not ensure overall anatomical consistency in the 3D heart. A new method for classification of cardiac pathology is therefore proposed based on 3D parametric model reconstruction. First, 3D heart models are reconstructed from multiple 3D volumes of cardiac imaging data at the end-systole (ES) and end-diastole (ED) phases. Next, based on these reconstructed 3D hearts, 3D parametric models are constructed through the statistical shape model (SSM), and the heart data are then augmented by varying the shape parameters of one 3D parametric model under 3D visual knowledge constraints. Finally, shape and motion features of the 3D heart models across the two phases are extracted to classify cardiac pathology. Comprehensive experiments on the automated cardiac diagnosis challenge (ACDC) dataset of the Statistical Atlases and Computational Modelling of the Heart (STACOM) workshop confirm the superior performance and efficiency of the proposed approach.

Keywords: 3D visual knowledge     3D parametric model     Cardiac pathology diagnosis     Data augmentation    
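The augmentation step in the abstract above varies the shape parameters of a statistical shape model. Below is a minimal Python sketch of that idea, assuming a PCA-based SSM built from vertex-corresponded 3D heart meshes; the function names, the clipping rule standing in for the paper's visual knowledge constraints, and the data layout are illustrative assumptions, not the authors' implementation.

import numpy as np

def build_ssm(shapes):
    """Build a PCA-based statistical shape model from a list of (N, 3) vertex arrays
    that are already in point-to-point correspondence (hypothetical preprocessing)."""
    X = np.stack([s.reshape(-1) for s in shapes])        # (M, 3N) data matrix
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt                                           # principal shape modes
    stddev = S / np.sqrt(len(shapes) - 1)                # per-mode standard deviation
    return mean, modes, stddev

def sample_augmented_shape(mean, modes, stddev, n_modes=5, limit=2.0, rng=None):
    """Draw a new plausible shape by perturbing the first n_modes shape parameters,
    clipped to +/- limit standard deviations (a crude stand-in for constraining the
    variation with prior knowledge)."""
    rng = np.random.default_rng() if rng is None else rng
    b = np.clip(rng.normal(size=n_modes), -limit, limit) * stddev[:n_modes]
    return (mean + b @ modes[:n_modes]).reshape(-1, 3)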

Three-dimensional shape space learning for visual concept construction: challenges and research progress Perspective

Xin TONG

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 9,   Pages 1290-1297 doi: 10.1631/FITEE.2200318

Abstract: Human beings can easily categorize three-dimensional (3D) objects with similar shapes and functions into a set of "visual concepts" and thereby learn "visual knowledge" of the surrounding 3D real world. Developing efficient methods to learn the computational representation of visual concepts and visual knowledge is a critical task in artificial intelligence (AI). A crucial step to this end is to learn the shape space spanned by all 3D objects that belong to one visual concept. In this paper, we present the key technical challenges and recent research progress in 3D shape space learning, and discuss the open problems and research opportunities in this area.

Keywords: Visual concept     Visual knowledge     3D geometry learning     3D shape space     3D structure    

A quantitative attribute-based benchmark methodology for single-target visual tracking Article

Wen-jing KANG, Chang LIU, Gong-liang LIU

Frontiers of Information Technology & Electronic Engineering 2020, Volume 21, Issue 3,   Pages 405-421 doi: 10.1631/FITEE.1900245

Abstract: In the past several years, various visual object tracking benchmarks have been proposed, and some of them have been widely used to evaluate numerous recently proposed trackers. However, most of the discussions focus on overall performance and cannot describe the strengths and weaknesses of the trackers in detail. Meanwhile, several benchmark measures that are often used in tests lack a convincing interpretation. In this paper, 12 frame-wise visual attributes that reflect different aspects of the characteristics of image sequences are collated, and a normalized quantitative formulaic definition is given to each of them for the first time. Based on these definitions, we propose two novel test methodologies, a correlation-based test and a weight-based test, which provide a more intuitive and clearer demonstration of the trackers' performance for each aspect. These methods were then applied to the raw results from one of the best-known tracking challenges, the Visual Object Tracking (VOT) Challenge 2017. The tests showed that most trackers did not perform well when the size of the target changed rapidly or drastically, and that even advanced deep-learning-based trackers did not fully solve the problem. The scale of the targets is not considered in the calculation of the center location error; nevertheless, in a practical test, the center location error is still sensitive to changes in the targets' size.

Keywords: Visual tracking     Performance evaluation     Visual attributes     Computer vision    
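As a rough illustration of the correlation-based test mentioned above, the sketch below pairs a frame-wise attribute value with a frame-wise accuracy measure over one sequence; the variable names and the choice of overlap as the accuracy measure are assumptions, not the benchmark's actual code.

import numpy as np

def attribute_accuracy_correlation(attribute_values, overlap_scores):
    """Pearson correlation between a frame-wise visual attribute (e.g., normalized
    scale change) and a frame-wise tracker accuracy measure (e.g., bounding-box
    overlap). A strongly negative value suggests the tracker degrades as the
    attribute intensifies."""
    a = np.asarray(attribute_values, dtype=float)
    o = np.asarray(overlap_scores, dtype=float)
    if a.std() == 0.0 or o.std() == 0.0:
        return 0.0                      # degenerate sequence: nothing to correlate
    return float(np.corrcoef(a, o)[0, 1])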

Visual commonsense reasoning with directional visual connections Research Articles

Yahong Han, Aming Wu, Linchao Zhu, Yi Yang,yahong@tju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2000722

Abstract: To boost research into cognition-level visual understanding, i.e., making an accurate inference based on a thorough understanding of visual details, visual commonsense reasoning (VCR) has been proposed. Compared with traditional visual question answering, which requires models to select correct answers, VCR requires models to select not only the correct answers but also the correct rationales. Recent research into human cognition has indicated that brain function or cognition can be considered as a global and dynamic integration of local neuron connectivity, which is helpful in solving specific cognition tasks. Inspired by this idea, we propose a directional connective network to achieve VCR by dynamically reorganizing the visual neuron connectivity that is contextualized using the meaning of questions and answers, and by leveraging the directional information to enhance the reasoning ability. Specifically, we first develop a GraphVLAD module to capture visual neuron connectivity and fully model visual content correlations. Then, a contextualization process is proposed to fuse sentence representations with visual neuron representations. Finally, based on the output of contextualized connectivity, we propose directional connectivity to infer answers and rationales, which includes a ReasonVLAD module. Experimental results on the VCR dataset and visualization analysis demonstrate the effectiveness of our method.

Keywords: Visual commonsense reasoning     Directional connective network     Visual neuron connectivity     Contextualized connectivity     Directional connectivity    

Visual knowledge: an attempt to explore machine creativity Perspectives

Yueting Zhuang, Siliang Tang,yzhuang@zju.edu.cn,siliang@zju.edu.cn

Frontiers of Information Technology & Electronic Engineering 2021, Volume 22, Issue 5,   Pages 615-766 doi: 10.1631/FITEE.2100116

Abstract: A question that has long perplexed the field of artificial intelligence is whether AI can be creative, or in other words, whether an algorithm's reasoning process can be creative. This paper examines the question of AI creativity from the perspective of the science of thinking. First, related research on imagery thinking reasoning is reviewed; then the focus turns to a special form of visual knowledge representation, the visual scene graph; finally, the construction of visual scene graphs and their potential applications are described in detail. All the evidence suggests that visual knowledge and visual thinking can not only improve the performance of current AI tasks, but also be used in the practice of machine creativity.

Keywords: Science of thinking     Imagery thinking reasoning     Visual knowledge representation     Visual scene graph    
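For readers unfamiliar with the visual scene graph mentioned above, it is commonly stored as a set of (subject, relation, object) triples over detected entities; the toy Python example below only illustrates that data structure and is not taken from the paper.

# A visual scene graph represented as subject-relation-object triples over detected entities.
scene_graph = {
    "entities": ["person", "horse", "hat", "field"],
    "triples": [
        ("person", "rides", "horse"),
        ("person", "wears", "hat"),
        ("horse", "stands_on", "field"),
    ],
}

# Simple query: everything directly related to "person".
related = [(r, o) for s, r, o in scene_graph["triples"] if s == "person"]
print(related)   # [('rides', 'horse'), ('wears', 'hat')]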

Visual Inspection Technology and its Application

Ye Shenghua,Zhu Jigui,Wang Zhong,Yang Xueyou

Strategic Study of CAE 1999, Volume 1, Issue 1,   Pages 49-52

Abstract:

Visual inspection, especially active and passive visual inspection based on the triangulation method, offers the advantages of non-contact measurement, high speed, and flexibility. It is an advanced inspection technology that satisfies the demands of modern manufacturing. This paper discusses the principle of visual inspection and examines several visual inspection systems that have been developed and applied; from different points of view, these systems demonstrate the wide application prospects of visual inspection.

Keywords: active visual inspection     passive visual inspection     inspection system     modern manufacturing    
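The triangulation principle behind the systems described above reduces, in the simplest rectified stereo case, to depth = focal length x baseline / disparity; the Python snippet below is a generic illustration of that relation, not a model of the paper's inspection systems.

def triangulation_depth(focal_length_px, baseline_m, disparity_px):
    """Classic triangulation relation for a rectified stereo pair: z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Example: f = 1200 px, baseline = 0.25 m, disparity = 30 px  ->  depth = 10.0 m
print(triangulation_depth(1200, 0.25, 30))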

On visual knowledge Perspective

Yun-he PAN

Frontiers of Information Technology & Electronic Engineering 2019, Volume 20, Issue 8,   Pages 1021-1025 doi: 10.1631/FITEE.1910001

Abstract: The concept of "visual knowledge" is proposed. Visual knowledge is a new form of knowledge representation, different from the knowledge representation methods used in artificial intelligence (AI) to date. A visual concept has elements such as a prototype and category structure, a hierarchical structure, and an action structure. Visual concepts can form visual propositions, including scene structures and dynamic structures, and visual propositions can form visual narratives. It is pointed out that reconstructing the achievements of computer graphics can realize visual knowledge representation, reasoning, and operation, and that reconstructing the achievements of computer vision can realize visual knowledge learning. Realizing the technologies of visual knowledge representation, reasoning, learning, and application will be one of the important directions for breakthroughs in AI 2.0.

Keywords: None    

Performance analysis of visual markers for indoor navigation systems Article

Gaetano C. LA DELFA,Salvatore MONTELEONE,Vincenzo CATANIA,Juan F. DE PAZ,Javier BAJO

Frontiers of Information Technology & Electronic Engineering 2016, Volume 17, Issue 8,   Pages 730-740 doi: 10.1631/FITEE.1500324

Abstract: The massive diffusion of smartphones, the growing interest in wearable devices and the Internet of Things, and the exponential rise of location based services (LBSs) have made the problem of localization and navigation inside buildings one of the most important technological challenges of recent years. Indoor positioning systems have a huge market in the retail sector and contextual advertising; in addition, they can be fundamental to increasing the quality of life for citizens if deployed inside public buildings such as hospitals, airports, and museums. Sometimes, in emergency situations, they can make the difference between life and death. Various approaches have been proposed in the literature. Recently, thanks to the high performance of smartphones' cameras, marker-less and marker-based computer vision approaches have been investigated. In a previous paper, we proposed a technique for indoor localization and navigation using both Bluetooth low energy (BLE) and a 2D visual marker system deployed on the floor. In this paper, we present a qualitative performance evaluation of three 2D visual marker systems, Vuforia, ArUco, and AprilTag, which are suitable for real-time applications. Our analysis focuses on a specific case study of visual markers placed on floor tiles, to improve the efficiency of our indoor localization and navigation approach by choosing the best visual marker system.

Keywords: Indoor localization     Visual markers     Computer vision    
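Marker detection of the kind compared in this study can be reproduced with off-the-shelf tools; the OpenCV sketch below detects ArUco markers in a single image (OpenCV >= 4.7 with the contrib aruco module). The image path and dictionary choice are placeholders, and this is a generic illustration rather than the authors' evaluation pipeline.

import cv2

# Detect 4x4 ArUco markers in one frame (requires opencv-contrib-python >= 4.7).
frame = cv2.imread("floor_tile.jpg")                     # placeholder image path
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
corners, ids, rejected = detector.detectMarkers(gray)

if ids is not None:
    print("Detected marker ids:", ids.ravel().tolist())
    cv2.aruco.drawDetectedMarkers(frame, corners, ids)   # overlay for visual checking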

Grasp Planning and Visual Servoing for an Outdoors Aerial Dual Manipulator Article

Pablo Ramon-Soria, Begoña C. Arrue, Anibal Ollero

Engineering 2020, Volume 6, Issue 1,   Pages 77-88 doi: 10.1016/j.eng.2019.11.003

Abstract:

This paper describes a system for grasping known objects with unmanned aerial vehicles (UAVs) equipped with dual manipulators, using an RGB-D camera. Aerial manipulation remains a very challenging task. This paper covers three principal aspects of this task: object detection and pose estimation, grasp planning, and in-flight grasp execution. First, an artificial neural network (ANN) is used to obtain clues regarding the object's position. Next, an alignment algorithm is used to obtain the object's six-dimensional (6D) pose, which is filtered with an extended Kalman filter. A three-dimensional (3D) model of the object is then used to estimate a ranked list of good grasps for the aerial manipulator. The object pose produced by the detection algorithm is used to update the trajectories of the arms toward the object. If the target poses are not reachable due to the UAV's oscillations, the algorithm switches to the next feasible grasp. This paper introduces the overall methodology and provides experimental results for each module, from both simulation and real experiments, together with a video showing the results.

Keywords: Aerial manipulation     Grasp planning     Visual servoing    
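The fallback behaviour described in the abstract, switching to the next feasible grasp when the UAV's oscillations make a target pose unreachable, can be summarized in a few lines; the interfaces below (ranked_grasps, arm, pose_estimator) are hypothetical and only sketch the control flow.

def execute_first_feasible_grasp(ranked_grasps, arm, pose_estimator):
    """Try candidate grasps in ranked order; skip any whose end-effector pose is
    currently unreachable and fall back to the next one."""
    for grasp in ranked_grasps:                          # ordered list of good grasps
        object_pose = pose_estimator.latest()            # filtered 6D object pose
        target = grasp.end_effector_pose(object_pose)    # arm pose needed for this grasp
        if not arm.is_reachable(target):                 # pushed out of range by oscillation
            continue                                     # switch to the next feasible grasp
        arm.move_to(target)
        return True
    return False                                         # no grasp feasible at this instant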

Visual Prostheses: Technological and Socioeconomic Challenges Perspective

John B. Troy

Engineering 2015, Volume 1, Issue 3,   Pages 288-291 doi: 10.15302/J-ENG-2015080

Abstract:

Visual prostheses are now entering the clinical marketplace. Such prostheses were originally targeted for patients suffering from blindness through retinitis pigmentosa (RP). However, in late July of this year, for the first time a patient was given a retinal implant in order to treat dry age-related macular degeneration. Retinal implants are suitable solutions for diseases that attack photoreceptors but spare most of the remaining retinal neurons. For eye diseases that result in loss of retinal output, implants that interface with more central structures in the visual system are needed. The standard site for central visual prostheses under development is the visual cortex. This perspective discusses the technical and socioeconomic challenges faced by visual prostheses.

Keywords: neuroprostheses     vision     eye disease     restoration of function     rehabilitation    

On visual understanding Perspective

Yunhe PAN

Frontiers of Information Technology & Electronic Engineering 2022, Volume 23, Issue 9,   Pages 1287-1289 doi: 10.1631/FITEE.2130000

Abstract:

1 Problems and development in the field of visual recognition

From the beginning of artificial intelligence (AI), pattern recognition has been an important aspect of the field. In recent years, the maturity of deep neural networks (DNNs) has significantly improved the accuracy of visual recognition. DNNs have been widely used in applications such as medical image classification, vehicle identification, and facial recognition, and have thus pushed the development of the AI industry to a new peak. However, there are currently critical defects in visual recognition based on DNN technology. For example, these networks usually require a very large amount of labeled training data, and have weak cross-domain transferability and task generalization. Their learning and reasoning processes are still hard to understand, which leads to unexplainable predictions. These challenges present an obstacle to the development of AI research and application. If we look at current visual recognition technology from a larger and broader perspective, we can see that the above defects are fundamental, because the currently used DNN model must be trained with a large amount of labeled visual data before it can be used for visual recognition. In essence, it is a classification process based on data statistics and pattern matching, so it is heavily dependent on the training sample distribution. However, for interpretability and transferability, visual classification alone is not good enough; visual understanding becomes indispensable.

2 Three-step model of visual understanding

Visual recognition is not equivalent to visual understanding. We propose that there are three steps in visual understanding, of which classification is only the first. After classification, one proceeds to the second step: visual parsing. In the process of visual parsing, the components of the visual object and their structural relationships are further identified and compared. Identification involves finding components and structures in visual data that correspond to the components and structures of known visual concepts. Parsing verifies the correctness of the classification results and establishes the structure of the visual object data. After completing visual parsing, one proceeds to the third step: visual simulation. In this step, predictive motion simulation and operations including causal reasoning are carried out on the structure of the visual objects to judge whether physical constraints in reality are reasonably met, so as to verify the previous recognition and parsing results. We can take a picture of a cat as an example to illustrate the modeling process of visual understanding. The process is as follows:

1. Recognition: It is a cat. Extract the visual concept of the cat and proceed to the next step; otherwise, stop here.

2. Parsing: Based on the structure contained in the visual concept, identify whether the cat's head, body, feet, tail, and their relationships fit the cat concept. If not, return to step 1 for re-identification; if yes, proceed to the next step.

3. Simulation: Simulate various activities of the cat to examine whether the cat's activities in various environments can be completed reasonably. If not, return to step 2; if yes, proceed to the next step.

4. End visual understanding: Incorporate the processed structured data into the knowledge about cats.

3 Characteristics of the three-step visual understanding model

To further understand the above-mentioned three-step visual understanding model, we discuss some of its characteristics:

1. The key step in visual understanding is visual parsing. This is an identification of the components contained in the object according to a conceptual structure based on the visual concept obtained by visual recognition. Parsing a visual object, in order from top to bottom, is a process of identifying and constructing visual data from the root of the concept tree to its branches and leaves.

2. Human visual parsing tasks are often aimed only at the main components of concepts. The main components have existing, commonly used names. For subsidiary parts that have not been described in language, such as the area between the cheekbones and chin of the face, only experts specialized in anatomy (such as doctors or artists) have professional concepts and memories. Therefore, visual parsing is a cross-media process that incorporates multiple kinds of knowledge, including vision and language.

3. Visual knowledge is essential for visual parsing and visual simulation, because the visual concept structure provides a reliable source for component identification and comparison. Parents and teachers play a large role in establishing visual knowledge. When they say to a child, "Look, this is a kitten. Kittens have pointed ears, round eyes, long whiskers, and four short legs. When they run fast and leap high, they can catch a mouse," they are guiding the child in constructing basic visual knowledge in long-term memory.

4. Visual data that have been understood have in effect been structured to form visual knowledge. Such visual knowledge can easily be incorporated into long-term memory. For example, when one sees a cat whose head is very small, or whose fur color and markings are unusual, or who has a particular gait, this information may be included in one's "cat" memory by expanding the concept of "cat". The category of a visual concept is very important, and its extent reflects the generality of the knowledge. In fact, it is not always useful to collect a large amount of sample data to train a DNN model. However, the more widely distributed and balanced the data are within a concept category, the better, because the robustness and generalization ability of a model trained on such sample data are stronger.

5. Visual information learned in this way can naturally be explained, because it carries deep structural cognition; it can also be used for transfer learning, because the semantic concepts have cross-media relevance. This semantic information can clearly indicate the reasonable direction of transferable recognition.

4 Advancing visual recognition to visual understanding

Visual understanding is important because it can potentially work with visual knowledge and multiple knowledge representation to open a new door to AI research. Visual understanding involves not only in-depth visual recognition, but also thorough learning and application of visual knowledge. AI researchers have been studying visual recognition for more than half a century. Speech recognition, a research task started in parallel with visual recognition, moved on to the analysis of words, sentences, and paragraphs quite early, and has successfully developed human-computer dialogue and machine translation, setting a well-known milestone.

Therefore, we suggest that it is necessary to advance visual recognition to visual understanding, and that this is an appropriate time to target this deeper form of visual intelligence.
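The three-step model above can be summarized as a simple control loop; the Python outline below only mirrors the flow described in the text, and recognize, parse, and simulate are hypothetical placeholders rather than an existing implementation.

# Hypothetical placeholders standing in for real recognition, parsing, and simulation models.
def recognize(image, knowledge_base): ...
def parse(image, concept): ...
def simulate(structure, concept): ...

def visual_understanding(image, knowledge_base):
    """Control-flow sketch of the three-step model: recognition -> parsing -> simulation."""
    concept = recognize(image, knowledge_base)           # step 1: classification
    while concept is not None:
        structure = parse(image, concept)                # step 2: visual parsing
        if structure is None:
            concept = recognize(image, knowledge_base)   # parsing failed: re-identify
            continue
        if not simulate(structure, concept):             # step 3: visual simulation
            continue                                     # implausible: return to parsing
        knowledge_base.incorporate(concept, structure)   # structured data joins memory
        return structure
    return None                                          # nothing recognized: stop here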

An easy-to-use evaluation framework for entity recognition and disambiguation systems Article

Hui CHEN, Bao-gang WEI, Yi-ming LI, Yong-huai LIU, Wen-hao ZHU

Frontiers of Information Technology & Electronic Engineering 2017, Volume 18, Issue 2,   Pages 195-205 doi: 10.1631/FITEE.1500473

Abstract: Entity recognition and disambiguation (ERD) is one of the key technologies for knowledge base population and information extraction. In recent years, many research results have emerged in this area and many ERD systems have been proposed, but for lack of a thorough comparative evaluation of these systems, the field remains rather unsettled. This paper presents a unified evaluation framework for ERD systems, designed to compare their effectiveness fairly. The framework is open source and can be extended with new systems, datasets, and evaluation mechanisms. Using it, we analyze and compare several publicly available ERD systems and draw some useful conclusions.

Keywords: Entity recognition and disambiguation     Evaluation framework     Information extraction    
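To make the idea of a unified, extensible evaluation framework concrete, the sketch below runs any number of pluggable linking systems over a shared gold-annotated dataset under a shared metric; the interfaces and the accuracy metric are illustrative assumptions, not the authors' open-source framework.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Mention:
    text: str              # surface form to be linked
    gold_entity: str       # gold-standard knowledge-base entry

def accuracy(predictions: List[str], golds: List[str]) -> float:
    """One prediction per mention, so micro precision, recall, and F1 all reduce to accuracy."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds) if golds else 0.0

def evaluate(systems: Dict[str, Callable[[str], str]],
             dataset: List[Mention],
             metric: Callable[[List[str], List[str]], float] = accuracy) -> Dict[str, float]:
    """Run every registered ERD system on the same dataset and score it with the same metric;
    new systems, datasets, and metrics plug in without changing this function."""
    golds = [m.gold_entity for m in dataset]
    return {name: metric([link(m.text) for m in dataset], golds)
            for name, link in systems.items()}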

Analysis of Operator's Visual Process Using a Cognitive Information Processing Model

Jin Yinhua,Li Zhenye,Gu Hui,Tang Yiping

Strategic Study of CAE 2007, Volume 9, Issue 5,   Pages 57-61

Abstract:

A cognitive information processing model has been developed. It consists of a perceptual processor, a cognitive processor, a motor processor, and short-term, working, and long-term memories. The model is installed on a PC to analyze the visual process of an operator monitoring the overview panel of a plant simulator. The visual process is determined by the characteristics of the panel information, operator factors, the parameters of the perceptual processor, and so on. The simulation results coincide qualitatively with observations of actual plant operations and simulator training. The model can be used to analyze the generation mechanism of various types of human errors.

Keywords: cognitive information processing model     perceptual processor     visual process     mental state     human errors    

The Current Situation of China’s Ophthalmology and Visual Science Bioengineering, and a Development Strategy

Xie Lixin,Zhou Qingjun,Xu Haifeng and Lin Ping

Strategic Study of CAE 2017, Volume 19, Issue 2,   Pages 100-105 doi: 10.15302/J-SSCAE-2017.02.017

Abstract:

With the largest blind population in the world, China is home to more than 12 million people who are blind. The most promising research direction in the bioengineering field for the treatment of blindness involves searching for bioengineering materials to restore visual function, particularly using stem cell and biochip technology. This paper introduces the state of bioengineering research in ophthalmology and visual science in China, and analyzes the main problems affecting current bioengineering research in the corneal and retinal areas. We also present strategies and recommendations for research and development directions, the approval system, achievement translation, and the construction of a research platform, based on the current situation in China.

Keywords: ophthalmology and visual science     bioengineering     current situation     strategy    
