Search | Engineering

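Five fundamental issues about visual knowledge [Perspectives]

Yunhe Pan (潘云鹤)

Frontiers of Information Technology & Electronic Engineering, 2021, Vol. 22, No. 5, pp. 615-766, doi: 10.1631/FITEE.2040000

Abstract: Cognitive psychology has long held that an important part of human knowledge memory is visual knowledge, which is used for imagery thinking. Vision-based artificial intelligence (AI) is therefore an unavoidable and significant topic for AI. Following the earlier paper "On visual knowledge," this paper discusses five related fundamental issues: (1) visual knowledge representation; (2) visual recognition; (3) simulation of visual imagery thinking; (4) learning of visual knowledge; and (5) multiple knowledge representation. The distinctive strengths of visual knowledge are its capacity for comprehensive image generation, for spatiotemporal evolution, and for image display. These are exactly what character-based knowledge and deep neural networks lack. Combining AI with computer-aided design, graphics, and vision technologies will provide an important fundamental driving force for new AI developments in creation, prediction, and human-machine integration. Research on visual knowledge and multiple knowledge representation is key to developing new visual intelligence, and is also key theory and technology for major breakthroughs in AI 2.0. This is a barren, cold, and damp yet fertile "Great Northern Wilderness" (北大荒), and also a promising "no-man's land" worth bold exploration through multidisciplinary collaboration.

Keywords: visual knowledge representation; visual recognition; simulation of visual imagery thinking; visual knowledge learning; multiple knowledge representation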

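Three-dimensional shape space learning for visual concept construction: challenges and research progress [Perspective]

Xin Tong (童欣)

Frontiers of Information Technology & Electronic Engineering, 2022, Vol. 23, No. 9, pp. 1290-1297, doi: 10.1631/FITEE.2200318

Abstract: Humans are adept at classifying real-world objects by shape or function, and at building in their minds a visual concept for each class of objects and visual knowledge of the surrounding real world (Pan, 2019). Pan (2021) pointed out that building computational representations of these visual concepts and visual knowledge is a key step toward the next generation of artificial intelligence. Learning the three-dimensional shape space of all objects under the same visual concept is a key step toward a computational representation of visual concepts.

Keywords: visual concept; visual knowledge; 3D geometry learning; 3D shape space; 3D structure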

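An evaluation system for single-object visual tracking algorithms based on quantitative attributes [Article]

Wen-jing KANG, Chang LIU, Gong-liang LIU

Frontiers of Information Technology & Electronic Engineering, 2020, Vol. 21, No. 3, pp. 405-421, doi: 10.1631/FITEE.1900245

Abstract: Visual tracking is one of the most active research topics in computer vision. In recent years, many advanced tracking algorithms and performance evaluation benchmarks have been released with great success. First, 12 inter-frame visual attributes reflecting different characteristics of image sequences are summarized, and normalized formulas for them are given quantitatively for the first time.

Keywords: visual tracking; performance evaluation; visual attributes; computer vision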

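Directed visual connections for visual commonsense reasoning [Research Articles]

Yahong Han (韩亚洪), Aming Wu (武阿明), Linchao Zhu (朱霖潮), Yi Yang (杨易)

Frontiers of Information Technology & Electronic Engineering, 2021, Vol. 22, No. 5, pp. 615-766, doi: 10.1631/FITEE.2000722

Abstract: To advance research on cognition-level understanding of visual content, that is, making precise inferences based on a deep understanding of visual details, the concept of visual commonsense reasoning has been proposed. Unlike traditional visual question answering, which only requires the model to answer a question correctly, visual commonsense reasoning requires the model both to answer correctly and to provide a corresponding explanation. By using the semantics of questions and answers to contextualize visual neurons and thereby dynamically reorganize neuron connections, and by using directional information to strengthen reasoning ability, the proposed method can effectively perform visual commonsense reasoning. Specifically, a GraphVLAD module is first developed to capture visual neuron connections that fully express the correlations within visual content. A contextualization model is then proposed to fuse visual and textual representations.

Keywords: visual commonsense reasoning; directed connective network; visual neuron connection; contextualized connection; directed connection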

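Visual knowledge: a preliminary exploration of intelligent creativity [Perspectives]

Yueting Zhuang (庄越挺), Siliang Tang (汤斯亮)

Frontiers of Information Technology & Electronic Engineering, 2021, Vol. 22, No. 5, pp. 615-766, doi: 10.1631/FITEE.2100116

Abstract: First, related research on imagery-thinking reasoning is reviewed; then a special form of visual knowledge representation, the visual scene graph, is introduced; finally, the problem of constructing visual scene graphs and their potential applications are described in detail. All evidence indicates that visual knowledge and visual thinking can not only improve the performance of current artificial intelligence tasks but can also be used in the practice of machine creativity.

Keywords: science of thinking; imagery-thinking reasoning; visual knowledge representation; visual scene graph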

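Vision inspection technology and its applications

Shenghua Ye (叶声华), Jigui Zhu (邾继贵), Zhong Wang (王仲), Xueyou Yang (杨学友)

Strategic Study of CAE, 1999, Vol. 1, No. 1, pp. 49-52

Abstract: Vision inspection technology, especially active and passive vision inspection based on triangulation, is non-contact, fast, and flexible. It is an advanced inspection method suited to the needs of modern manufacturing. The article describes the principles of vision inspection technology, discusses several practical vision inspection systems that have been developed, and shows from different angles the broad application prospects of vision inspection in modern manufacturing.

Keywords: active vision; passive vision; inspection system; modern manufacturing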

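On visual knowledge [Perspective]

Yun-he PAN

Frontiers of Information Technology & Electronic Engineering, 2019, Vol. 20, No. 8, pp. 1021-1025, doi: 10.1631/FITEE.1910001

Abstract: This paper proposes the concept of "visual knowledge." Visual knowledge is a new form of knowledge representation that differs from the knowledge representation methods used in artificial intelligence (AI) to date. Its visual concepts have elements such as prototype and category structure, hierarchical structure, and action structure. Visual concepts can form visual propositions, including scene structures and dynamic structures, and visual propositions can form visual narratives. The paper points out that reconstructing the achievements of computer graphics can realize the representation of visual knowledge together with its reasoning and operation, while reconstructing the achievements of computer vision can realize visual knowledge learning. Realizing technologies for visual knowledge representation, reasoning, learning, and application will be one of the important directions for breakthroughs in AI 2.0.

Keywords: None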

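Performance analysis of visual markers for indoor navigation systems [Article]

Gaetano C. LA DELFA, Salvatore MONTELEONE, Vincenzo CATANIA, Juan F. DE PAZ, Javier BAJO

Frontiers of Information Technology & Electronic Engineering, 2016, Vol. 17, No. 8, pp. 730-740, doi: 10.1631/FITEE.1500324

Abstract: In recent years, thanks to substantial improvements in smartphone cameras, both markerless and marker-based computer vision methods have been developed. In previous work, we proposed a technique for indoor localization and navigation that uses Bluetooth Low Energy and a system of 2D visual markers embedded in the floor. In this paper, we qualitatively evaluate the performance of three 2D visual markers suitable for real-time applications (Vuforia, ArUco markers, and AprilTag). The paper focuses on how the three markers perform under specific conditions when attached to floor tiles, and proposes criteria for selecting the best visual marker, providing technical support for our proposed indoor localization and navigation technique.

Keywords: indoor localization; visual markers; computer vision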

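Grasp planning and visual servoing for an outdoor aerial dual manipulator [Article]

Pablo Ramon-Soria, Begoña C. Arrue, Anibal Ollero

Engineering, 2020, Vol. 6, No. 1, pp. 77-88, doi: 10.1016/j.eng.2019.11.003

Abstract: This paper presents a system in which an unmanned aerial vehicle (UAV) with dual manipulators, equipped with an RGB-D camera, grasps known objects. Aerial manipulation remains a highly challenging task. The paper evaluates the task from three aspects: object detection and pose estimation, grasp planning, and in-flight grasp execution. An artificial neural network (ANN) is first used to obtain cues about the object's position. Next, an alignment algorithm is used to obtain the object's six-dimensional (6D) pose, which is filtered with an extended Kalman filter. A three-dimensional (3D) model of the object is then used to estimate an ordered list of good grasps that the aerial manipulator can achieve. The results of the detection algorithm (that is, the object's pose) are used to update the trajectory of the arms toward the object. If the target pose cannot be reached because of UAV oscillations, the algorithm switches to the next feasible grasp. The paper describes the overall approach, gives simulated and real experimental results for each module, and provides a video demonstrating the results.

Keywords: aerial manipulation; grasp planning; visual servoing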
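
The fallback behaviour described in this abstract (try the best grasp first, then move on to the next feasible one when the pose cannot be reached) can be illustrated with a small sketch. This is not the authors' implementation: `Grasp`, `select_grasp`, the `quality` score, and the `reachable` callback are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Grasp:
    """Hypothetical grasp candidate: a label and an assumed quality score."""
    name: str
    quality: float


def select_grasp(
    candidates: Iterable[Grasp],
    reachable: Callable[[Grasp], bool],
) -> Optional[Grasp]:
    """Return the highest-quality grasp that is currently reachable.

    Candidates are tried from highest to lowest quality. A candidate that
    cannot be reached (for example because the UAV is oscillating) is
    skipped and the next one is tried. Returns None if none is reachable.
    """
    for grasp in sorted(candidates, key=lambda g: g.quality, reverse=True):
        if reachable(grasp):
            return grasp
    return None


if __name__ == "__main__":
    grasps = [Grasp("top", 0.9), Grasp("side", 0.7), Grasp("handle", 0.5)]
    # Assume the top grasp is out of reach at this instant.
    chosen = select_grasp(grasps, reachable=lambda g: g.name != "top")
    print(chosen)
```

In a real system the reachability check would depend on the filtered object pose and the arm kinematics; here it is reduced to a callback so the selection logic stays visible.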

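Visual prostheses: technological and socioeconomic challenges [Perspective]

John B. Troy

Engineering, 2015, Vol. 1, No. 3, pp. 288-291, doi: 10.15302/J-ENG-2015080

Abstract: Visual prostheses have now entered the clinical marketplace. Initially, they were used to treat patients blinded by retinitis pigmentosa (RP). In late July 2015, a retinal prosthesis was used for the first time to treat dry age-related macular degeneration. For eye diseases in which retinal output is completely lost, a type of prosthesis that interfaces with the central visual pathways must be implanted. The representative central visual prosthesis currently under development is the visual cortical prosthesis. This paper discusses the technological and socioeconomic challenges facing visual prostheses.

Keywords: neural prosthesis; vision; eye disease; functional restoration; rehabilitation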

On visual understanding [Perspective]

Yunhe Pan (潘云鹤)

Frontiers of Information Technology & Electronic Engineering, 2022, Vol. 23, No. 9, pp. 1287-1289, doi: 10.1631/FITEE.2130000

Abstract:

1 Problems and development in the field of visual recognition

From the beginning of artificial intelligence (AI), pattern recognition has been an important aspect of the field. In recent years, the maturity of deep neural networks (DNNs) has significantly improved the accuracy of visual recognition. DNN has been widely used in applications such as medical image classification, vehicle identification, and facial recognition, and has thus promoted the development of the AI industry to a climax. However, there are currently critical defects in visual recognition based on DNN technology. For example, these networks usually require a very large amount of labeled training data, and have weak cross-domain transferability and task generalization. Their learning and reasoning processes are still hard to understand, which leads to unexplainable predictions. These challenges present an obstacle to the development of AI research and application.

If we look at the current visual recognition technology from a larger and broader perspective, we can find that the above defects are fundamental, because the currently used DNN model needs to be trained with a large amount of labeled visual data, and then used in the process of visual recognition. In essence, it is a classification process based on data statistics and pattern matching, so it is heavily dependent on training sample distribution. However, to have interpretability and transferability, visual classification is not good enough, while visual understanding becomes indispensable.

2 Three-step model of visual understanding

Visual recognition is not equivalent to visual understanding. We propose that there are three steps in visual understanding, of which classification is only the first. After classification, one proceeds to the second step: visual parsing. In the process of visual parsing, the components of the visual object and their structural relationship are further identified and compared. Identification involves finding components and structures in visual data that correspond to the components and structures of known visual concepts. Parsing verifies the correctness of the classification results and establishes the structure of visual object data. After completing visual parsing, one proceeds to the third step: visual simulation. In this step, predictive motion simulation and operations including causal reasoning are carried out on the structure of the visual objects to judge the rationality of meeting physical constraints in reality, so as to verify the previous recognition and parsing results.

We can take a picture of a cat as an example to illustrate the modeling process of visual understanding. The process is as follows:

1. Recognition: It is a cat. Extract the visual concept of the cat and proceed to the next step; otherwise, stop here.
2. Parsing: Based on the structure contained in the visual concept, identify whether the cat’s head, body, feet, tail, and their relationships are suitable for the cat concept. If not, return to step 1 for re-identification; if yes, proceed to the next step.
3. Simulation: Simulate various activities of the cat to investigate whether the cat’s activities in various environments can be completed reasonably. If not, return to step 2; if yes, proceed to the next step.
4. End visual understanding: Incorporate the processed structured data into the knowledge about cats.

3 Characteristics of the three-step visual understanding model

To further understand the above-mentioned three-step visual understanding model, we will further discuss some of its characteristics:

1. The key step in visual understanding is visual parsing. This is an identification of the components contained in the object according to a conceptual structure based on the visual concept, obtained by visual recognition. Parsing a visual object, in order from top to bottom, is a process of identifying and constructing visual data from the root of the concept tree to the branches and leaves.
2. Human visual parsing tasks are often aimed only at the main components of concepts. The main components have existing, commonly used names. For subsidiary parts that have not been described in language, such as the area between the cheekbones and chin of the face, only experts specialized in anatomy (such as doctors or artists) have professional concepts and memories. Therefore, visual parsing is a cross-media process that incorporates multiple knowledge including vision and language.
3. Visual knowledge is essential for visual parsing and visual simulation, because the visual concept structure provides a reliable source for component identification and comparison. Parents and teachers play a large role in establishing visual knowledge. When they say to a child, “Look, this is a kitten. Kittens have pointed ears, round eyes, long whiskers, and four short legs. When they run fast and leap high, they can catch a mouse,” they are guiding children in constructing basic visual knowledge in their long-term memory.
4. Visual data that have been understood have actually been structured to form visual knowledge. Such visual knowledge can easily be incorporated into long-term memory. For example, when one sees a cat whose head is very small, or whose fur color and markings are unusual, or who has a particular gait, this information may be included in one’s “cat” memory by expanding the concept of “cat”. The category of visual concepts is very important, and its extent reflects the general degree of knowledge. In fact, it is not always useful to collect a large amount of sample data to train a DNN model. However, the more widely distributed and balanced the data are within a concept category, the better, because the robustness and generalization ability of the model trained based on such sample data are stronger.
5. The learned visual information can naturally be explained, because it has deep structural cognition; it can also be used for transfer learning because the semantic concepts have cross-media relevance. This semantic information can clearly indicate the reasonable direction of transferable recognition.

4 Advancing visual recognition to visual understanding

Visual understanding is important, because it can potentially work with visual knowledge and multiple knowledge representation to open a new door to AI research. Visual understanding involves not only in-depth visual recognition, but also thorough learning and application of visual knowledge. AI researchers have been studying visual recognition for more than half a century. Speech recognition, a research task started in parallel with visual recognition, moved on to analysis of words, sentences, and paragraphs quite early, and has successfully developed human-computer dialogue and machine translation, setting a well-known milestone. Therefore, we suggest that it is necessary to advance visual recognition to visual understanding, and that this is an appropriate time to target this deeper visual intelligence behavior.
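
The three-step process in this abstract (recognition, then parsing, then simulation, with a failed step falling back to the previous one) can be sketched as a small control loop. This is an illustrative sketch only, not code from the paper: the function `understand`, its callback parameters, and the `max_steps` budget that prevents endless retries are all assumptions made for the example.

```python
from typing import Callable, Optional


def understand(
    image: object,
    recognize: Callable[[object], Optional[str]],
    parse: Callable[[object, str], Optional[dict]],
    simulate: Callable[[dict], bool],
    max_steps: int = 10,  # assumed retry budget, not part of the paper
) -> Optional[dict]:
    """Run recognition -> parsing -> simulation with fallbacks.

    Returns the structured result when all three steps succeed, or None
    when recognition fails or the step budget is exhausted.
    """
    step = 1
    concept: Optional[str] = None
    structure: Optional[dict] = None
    for _ in range(max_steps):
        if step == 1:
            # Step 1: recognition. Stop if no visual concept is found.
            concept = recognize(image)
            if concept is None:
                return None
            step = 2
        elif step == 2:
            # Step 2: parsing. On failure, go back to recognition.
            structure = parse(image, concept)
            step = 3 if structure is not None else 1
        else:
            # Step 3: simulation. On failure, go back to parsing.
            if simulate(structure):
                # Step 4: hand back structured data for the knowledge base.
                return {"concept": concept, "structure": structure}
            step = 2
    return None


if __name__ == "__main__":
    result = understand(
        image="cat.jpg",
        recognize=lambda img: "cat",
        parse=lambda img, concept: {"parts": ["head", "body", "feet", "tail"]},
        simulate=lambda structure: True,
    )
    print(result)
```

The callbacks stand in for whatever recognition, parsing, and simulation components a real system would use; only the control flow between the steps is taken from the text.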

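Analysis of operators' visual information processing based on a human information processing model

Yinhua Jin (金银花), Zhenye Li (李桢业), Hui Gu (古辉), Yiping Tang (汤一平)

Strategic Study of CAE, 2007, Vol. 9, No. 5, pp. 57-61

Abstract: This paper reports a research-oriented computer implementation of a human information processing model composed of three processors (perception, cognition, and motor) and of short-term, working, and long-term memory. The model was installed on a PC and used to simulate a production operator monitoring the computer screen of a boiler plant simulator. The simulation shows that visual information processing depends on display factors.

Keywords: human information processing model; perceptual processor; visual information processing; mental state; human operating error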

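Current status of and strategies for bioengineering research in ophthalmology and visual science in China

Lixin Xie (谢立信), Qingjun Zhou (周庆军), Haifeng Xu (徐海峰), Ping Lin (林萍)

Strategic Study of CAE, 2017, Vol. 19, No. 2, pp. 100-105, doi: 10.15302/J-SSCAE-2017.02.017

Abstract: This paper describes the current state of bioengineering research in ophthalmology and visual science in China, analyzes the main problems in corneal and retinal bioengineering research, and, in light of China's national conditions, proposes strategies and policy recommendations on research and development directions, the approval system, translation of research results, and research platform construction.

Keywords: ophthalmology and visual science; bioengineering; current status; countermeasures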

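Outlook for intelligent vision chips with integrated sensing, memory, and computing

Wen Pan (潘汶), Jiyuan Zheng (郑纪元), Lai Wang (汪莱), Yi Luo (罗毅)

Engineering, 2022, Vol. 14, No. 7, pp. 19-21, doi: 10.1016/j.eng.2022.01.009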

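Advances in computer vision-based inspection and monitoring of civil infrastructure [Review]

Billie F. Spencer Jr., Vedhus Hoskere, Yasutaka Narazaki

Engineering, 2019, Vol. 5, No. 2, pp. 199-222, doi: 10.1016/j.eng.2018.11.030

Abstract: Computer vision techniques, combined with acquisition by remote cameras and unmanned aerial vehicles (UAVs), offer promising non-contact solutions for civil infrastructure condition assessment. The ultimate goal of such a system is to automatically and robustly convert image or video data into actionable information. This paper gives an overview of recent advances in applying computer vision techniques to the condition assessment of civil infrastructure. In particular, relevant research in computer vision, machine learning, and structural engineering is presented. The assessment work is divided into two categories: inspection applications and monitoring applications. Finally, the paper identifies some key challenges that persist in achieving automated vision-based inspection and monitoring of civil infrastructure, and the work under way to address them.

Keywords: structural inspection and monitoring; artificial intelligence; computer vision; machine learning; optical flow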