Computing over Space: Status, Challenges, and Opportunities

Yaoqi Liu , Yinhe Han , Hongxin Li , Shuhao Gu , Jibing Qiu , Ting Li

Engineering, 2025, 54(11): 20–25. DOI: 10.1016/j.eng.2025.06.005
Views & Comments


1. Introduction

The rapid expansion of satellite constellations in recent years has resulted in the generation of massive amounts of data. This surge in data, coupled with diverse application scenarios, underscores the escalating demand for high-performance computing over space. Computing over space entails the deployment of computational resources on platforms such as satellites to process large-scale data under constraints such as high radiation exposure, restricted power consumption, and minimized weight.

For instance, computing over space can be applied to satellite remote sensing. As the ground resolution of remote-sensing images has improved from 10.0 to 0.3 m, the data volume for the same swath has increased by approximately 1000 times. However, the bandwidth of satellite–ground communications and the duration of the link while a satellite passes over a ground station are both limited [1]. As a result, onboard computing may save several days of transmission and processing time [2], a delay that is too long for specialized tasks such as emergency response. Computing over space can extract high-value information from huge amounts of data and significantly reduce the required transmission bandwidth and service time. Based on this concept, Prof. Deren Li from Wuhan University has proposed the Oriental Smart Eye (OSE) satellite constellation to provide real-time intelligent services for satellite remote-sensing information [3].
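
The factor of roughly 1000 follows from the fact that the number of pixels covering a fixed ground area scales with the inverse square of the ground resolution. The short Python sketch below reproduces that arithmetic; it assumes the swath, bit depth, and number of spectral bands stay the same, and the function name is illustrative only.

```python
def data_volume_ratio(old_resolution_m: float, new_resolution_m: float) -> float:
    """Ratio of pixel counts covering the same ground area at two resolutions.

    Assumes identical swath, bit depth, and band count (an illustrative
    simplification), so data volume is proportional to pixel count.
    """
    if old_resolution_m <= 0 or new_resolution_m <= 0:
        raise ValueError("resolutions must be positive")
    # Pixel count grows with the square of the linear resolution improvement.
    return (old_resolution_m / new_resolution_m) ** 2


ratio = data_volume_ratio(10.0, 0.3)
print(f"Data volume grows by about {ratio:.0f}x")
```

The result is on the order of 10^3, consistent with the figure quoted above.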

In addition to remote sensing, satellite communication is a critical application of space computing. Onboard deployment of the core network is crucial for realizing satellite networks, and the performance of space computing has always been a key limiting factor for tasks such as signal processing, multiplexing, traffic management, and resource allocation [4–6]. Therefore, Prof. Shangguang Wang from Beijing University of Posts and Telecommunications has proposed the Tiansuan Constellation and conducted experiments on the onboard deployment of the core network [7]. Communication satellites have numerous users, making network planning and optimization important. Prof. Ping Zhang has proposed semantic communication [8], which can improve communication efficiency by extracting and utilizing semantic information. Still, semantic communication will place even greater demands on onboard computing capability.

Furthermore, tasks such as autonomous collision avoidance for spacecraft [9] and robotic space exploration [10] require high-performance computing to complete complex algorithms such as artificial intelligence algorithms. To address this issue, Lumen Orbit has proposed the concept of the space data center [11]. By leveraging advantages such as solar power resources and low-temperature cooling in space, a space data center can reduce the comprehensive cost to about one twentieth of that on the ground [11].

While the demand for computing over space is continuously growing, the performance of computing over space is typically low. For example, the RAD5500 processors that are commonly used in space have a performance of only 0.9 giga floating-point operations per second (GFlops). In contrast, the NVIDIA A100, a commercial off-the-shelf (COTS) chip commonly used on the ground, already achieves a performance of 156 tera floating-point operations per second (TFlops). As shown in Fig. 1, the performance of the chips commonly used in satellites and the COTS chips on the ground consistently differ by three to four orders of magnitude, largely because of space radiation. The electronic components used in space usually require radiation hardening or radiation-resistant treatment to withstand the cumulative effects of radiation [12].

The onboard use of COTS devices, along with system-level hardening measures to alleviate the reliability deficiencies of such devices, is an important technical approach to meeting the increasing demand for computing over space. In 2003, Behr et al. [13] experimented with the use of COTS devices for computing over space. In 2021, Hewlett Packard Enterprise (HPE) and the National Aeronautics and Space Administration (NASA) collaborated to send the HPE Spaceborne Computer 2 to the International Space Station. This computer carried an NVIDIA T4 graphics processing unit (GPU), providing 65 tera operations per second (TOPS) of computing performance. Our team developed the Jiguang 1000 space intelligent computer, which is equipped with Cambricon neural processing unit (NPU) chips, achieving 32 TOPS of computing performance; the computer was launched aboard the Jilin-1 01A01 satellite in 2022. In 2024, Jiguang 1000-OSE was launched on the OSE 01 satellite. On this computing platform, we have implemented the inference of a visual large language model (VLLM). In addition to the aforementioned applications, there are many other applications of similar COTS devices within space computing systems, including but not limited to those outlined in Refs. [14–18].

Although the use of COTS devices has improved the performance of computing over space, there is still a significant gap in performance between current space computing systems and the most advanced systems on the ground. To further develop computing over space, it is necessary to address the following key issues: first, to design a computing architecture and fault-tolerance measures to ensure reliability; second, to design effective thermal control systems for high-heat-flux-density COTS devices in the vacuum space environment; and third, to develop intelligent applications to meet diverse scenario requirements. The next section discusses these challenges and provides possible solutions.

2. Key technologies for computing over space

2.1. The computing architecture

The architecture of space computing systems plays a critical role in determining both performance and reliability. Over the past decades, guided by the pursuit of higher reliability, enhanced performance, reduced power consumption, and minimized costs, the development of computing architecture over space can be divided into four phases, as shown in Fig. 2:

Phase 1: distributed embedded systems (DES), where the satellite system consists of independent embedded systems;

Phase 2: integrated electronic systems (IES), which centralize multiple systems on a single platform;

Phase 3: external intelligent systems (EIS), in which additional high-performance computing devices are added to IES, enabling the execution of complex algorithms such as artificial intelligence algorithms;

Phase 4: integrated intelligent systems (IIS), where EIS and IES are unified into an integrated intelligent system, with lower power consumption, smaller volume, and higher performance.

The widespread adoption of EIS and IIS heavily relies on the increased reliability of space computing systems. In the extreme environment of space, the effects of radiation create high failure risks. Since COTS components have inherent reliability limitations, it is essential to analyze potential failures and system reliability and to carry out targeted system fault-tolerant reinforcement design. However, the complexity of the computing architecture increases the difficulty of reliability analysis. As shown in Fig. 3, a space computing system may include multiple boards, such as main control boards, exchange boards, storage boards, computing boards, and so forth. Along with cold/hot redundant backup strategies, these boards form a complex interdependent system. To address the reliability analysis and reinforcement issues of a complex system, a hierarchical fault-tolerant theory model is necessary. Modeling methods including Monte Carlo simulation [19], vulnerability analysis [20], state transition models [21], reliability block diagrams [22], and fault tree analysis [23], in combination with simulation and testing methods including fault injection [24], irradiation experiments, and system testing experiments, can quantify complex uncertainty factors into probability curves, providing a theoretical basis for fault-tolerant mechanism design.
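
As a rough illustration of how two of the modeling methods named above can be combined, the following Python sketch evaluates a small reliability block diagram analytically and cross-checks it with a Monte Carlo simulation. The board topology (hot-redundant pairs of main control and computing boards in series with single exchange and storage boards) and every probability value are hypothetical, chosen only for illustration; they are not taken from Fig. 3 or from any measured data. Failures are assumed independent, and cold backup is not modeled.

```python
import math
import random

# Hypothetical probability that each board survives a given mission interval.
BOARD_RELIABILITY = {"main": 0.95, "exchange": 0.99, "storage": 0.98, "compute": 0.90}


def series(*blocks: float) -> float:
    """All blocks must survive."""
    return math.prod(blocks)


def parallel(*blocks: float) -> float:
    """At least one block must survive (hot redundancy, independent failures)."""
    return 1.0 - math.prod(1.0 - r for r in blocks)


def analytic_reliability(p: dict) -> float:
    """Closed-form reliability of the assumed block diagram."""
    main = parallel(p["main"], p["main"])
    compute = parallel(p["compute"], p["compute"])
    return series(main, p["exchange"], p["storage"], compute)


def monte_carlo_reliability(p: dict, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same quantity by sampling board survival at random."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        main_ok = [rng.random() < p["main"] for _ in range(2)]
        compute_ok = [rng.random() < p["compute"] for _ in range(2)]
        exchange_ok = rng.random() < p["exchange"]
        storage_ok = rng.random() < p["storage"]
        if any(main_ok) and exchange_ok and storage_ok and any(compute_ok):
            successes += 1
    return successes / trials


exact = analytic_reliability(BOARD_RELIABILITY)
estimate = monte_carlo_reliability(BOARD_RELIABILITY)
print(f"analytic: {exact:.4f}, Monte Carlo: {estimate:.4f}")
```

For a topology this small the closed form is enough; sampling becomes useful when interdependencies or backup switching make an analytic expression impractical.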

To incorporate fault-tolerant hardening methods, a collaborative fault-tolerant system should be established at different levels, such as the component, system architecture, operating system, and algorithm levels. For example, at the component level, methods such as instruction-level time redundancy [25] and multi-device redundancy [26] can be used to improve error-correction capability at the cost of significant performance loss. At the system architecture level, methods such as key module redundancy, cold/hot backup, and watchdog [27,28] can be used to increase tolerance to critical component failures. At the operating system level, technologies such as cloud native [29] or microkernel [30] can improve the availability of the operating system. Finally, at the algorithm level, redundancy can be applied to data [31] or neural network models [32] to reduce silent data errors.
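
The redundancy idea can be sketched in a few lines of Python. The example below repeats a computation three times and takes a majority vote, with one run deliberately corrupted by a simulated bit flip. It is a function-level analogy to time redundancy, not the instruction-level technique of Ref. [25]; all function names and the fault-injection hook are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Return the value produced by more than half of the runs."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority: uncorrectable disagreement")
    return value


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-event upset by inverting one bit."""
    return value ^ (1 << bit)


def checksum(data: bytes) -> int:
    """Toy workload: 16-bit additive checksum."""
    return sum(data) & 0xFFFF


def run_with_time_redundancy(
    func: Callable[[bytes], int],
    arg: bytes,
    runs: int = 3,
    faulty_run: Optional[int] = None,
) -> int:
    """Execute func several times and vote; optionally corrupt one run."""
    results = []
    for i in range(runs):
        result = func(arg)
        if i == faulty_run:
            result = flip_bit(result, 3)  # injected fault, for illustration only
        results.append(result)
    return majority_vote(results)


payload = b"telemetry frame"
voted = run_with_time_redundancy(checksum, payload, faulty_run=1)
print(voted == checksum(payload))
```

Running the workload three times roughly triples the execution time, which illustrates the performance cost mentioned above.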

2.2. The thermal control system

In the vacuum space environment, with its huge temperature differences, the heat-dissipation capacity of the thermal control system is crucial to the performance of computing systems. Electronic components have operational temperature ranges, with industrial-grade devices typically rated from –40 to 85 °C. Exceeding these limits can lead to performance degradation or even component failure [33]. Heat flux density measures the thermal power generated per unit area of a device; the higher the heat flux density is, the more challenging the heat dissipation becomes. Since computing performance, power consumption, and heat flux density are usually positively correlated, high-performance COTS devices exhibit high heat flux density. Moreover, compared with the environment of ground-based computing systems, the working environment of space computing is extremely harsh. The side of the satellite exposed to direct sunlight can reach temperatures of over 100 °C, while the temperature of the shaded side can drop to as low as –100 to –200 °C.

In space, common thermal control designs such as fins [34] are ineffective because there is no air convection in a vacuum. Common heat-transfer methods for space computing devices include solid conduction, heat pipes, and fluid loops. Among these, solid conduction is the most commonly used method; it relies on the natural properties of materials and structural design to conduct heat, and it supports a maximum heat flux density of around 20 W·cm⁻². Heat pipes consist of sealed pipes and internal working fluids that transfer heat through evaporation and condensation, with the fluid circulating through capillary action or gravity. Fluid loops achieve heat transfer through fluid convection; they can support a higher heat flux density but are limited in terms of weight, layout, micro-vibrations, and so forth [35]. As a result, fluid loops are usually used in large spacecraft such as the International Space Station [36] and the Shenzhou spacecraft [37]. Radiation-hardened (rad-hard) devices typically have low power consumption. For instance, the RAD750 has a power consumption of only 5 W. Although solid conduction is sufficient for such devices, it cannot support high-performance GPU devices. For example, the NVIDIA A100 has a power consumption of around 300 W and a chip area of approximately 8.26 cm², giving a heat flux density of about 36.3 W·cm⁻². Thus, fluid loops may become an important solution for future computing over space.
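
The heat flux figure for the A100 can be checked directly from the numbers given above. The following Python sketch performs that division and compares the result with the approximate 20 W·cm⁻² limit quoted for solid conduction; the function and constant names are illustrative.

```python
# Approximate upper limit for solid conduction, as quoted in the text.
SOLID_CONDUCTION_LIMIT_W_PER_CM2 = 20.0


def heat_flux_density(power_w: float, area_cm2: float) -> float:
    """Thermal power per unit chip area, in W/cm^2."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / area_cm2


def solid_conduction_sufficient(power_w: float, area_cm2: float) -> bool:
    """True if the chip stays within the quoted solid-conduction limit."""
    return heat_flux_density(power_w, area_cm2) <= SOLID_CONDUCTION_LIMIT_W_PER_CM2


a100_flux = heat_flux_density(300.0, 8.26)
print(f"A100 heat flux density: {a100_flux:.1f} W/cm^2")
print("Solid conduction sufficient:", solid_conduction_sufficient(300.0, 8.26))
```

At roughly 36 W·cm⁻², the A100 exceeds the solid-conduction limit by a wide margin, which is why the text turns to fluid loops.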

At present, due to considerations of reliability, weight, power consumption costs, and so forth, few space computing systems utilize fluid loops. However, fluid loops are becoming an almost essential approach to achieving the highest possible computing capability in space, and substantial research gaps remain on how to optimize them. To reduce the impact of fluid loops on reliability, we propose a hybrid passive–active cooling (HPAC) method, as shown in Fig. 4. The active cooling part is responsible for cooling high-power chips, while the passive cooling part is responsible for cooling low-power chips. The HPAC system ensures basic functionality even if the fluid loop fails.
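
The division of labor in HPAC can be illustrated with a small Python sketch. It splits chips between active and passive cooling by power draw and lists which chips remain usable if the fluid loop fails. The chip names, power values, and 20 W threshold are hypothetical, and the assumption that actively cooled chips are simply taken offline after a loop failure is a simplification for illustration, not a description of the system in Fig. 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chip:
    name: str
    power_w: float


def partition_cooling(chips: List[Chip], active_threshold_w: float) -> Tuple[List[Chip], List[Chip]]:
    """Split chips into (actively cooled, passively cooled) by power draw."""
    active = [c for c in chips if c.power_w > active_threshold_w]
    passive = [c for c in chips if c.power_w <= active_threshold_w]
    return active, passive


def operational_chips(chips: List[Chip], active_threshold_w: float, fluid_loop_ok: bool) -> List[Chip]:
    """Chips that can keep running given the state of the fluid loop."""
    active, passive = partition_cooling(chips, active_threshold_w)
    # Assumed policy: without the loop, only passively cooled chips stay on.
    return passive + active if fluid_loop_ok else passive


# Hypothetical board inventory.
chips = [Chip("main_control", 5.0), Chip("storage", 8.0), Chip("ai_accelerator", 60.0)]

nominal = [c.name for c in operational_chips(chips, 20.0, fluid_loop_ok=True)]
degraded = [c.name for c in operational_chips(chips, 20.0, fluid_loop_ok=False)]
print("nominal:", nominal)
print("degraded:", degraded)
```

In the degraded case the low-power control and storage functions remain available, which corresponds to the basic functionality that HPAC is meant to preserve.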

2.3. Applications

The use of high-performance COTS devices in space unlocks advanced data processing and analysis capabilities, making complex intelligent applications possible. For example, Jiguang 1000-OSE has implemented algorithms including object recognition, cloud image discrimination, and image compression, significantly increasing onboard data-utilization efficiency.

Future advancements in satellite application algorithms will improve data-fusion capabilities and operational efficiency, which are critical for time-sensitive applications such as disaster monitoring. Integrating large language models (LLMs) into satellite systems offers a promising solution for achieving intelligent information fusion and the natural language interpretation of human instructions. As illustrated in Fig. 5(a), the proposed multimodal VLLM architecture demonstrates the advantages of bidirectional natural language communication with ground operators and automated analysis of remote-sensing imagery.

Based on the VLLM architecture, we constructed Jiguang VLLM and conducted text question-answering experiments on the Jiguang 1000-OSE platform. To address the relatively weak onboard computing performance and the limited bandwidth of the satellite–ground communication link, we implemented optimization techniques including model distillation and parameter quantization [38]. As shown in Fig. 5(b), the experiment validated the feasibility of onboard VLLM inference.
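
To give a sense of what parameter quantization involves, the sketch below applies symmetric per-tensor 8-bit quantization to a short list of weights in plain Python. This is a generic textbook scheme shown for illustration; the quantization method actually used for Jiguang VLLM is not specified here, and the weight values are made up.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 8-bit representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, f"max round-trip error = {max_error:.4f}")
```

Storing each parameter in 8 bits instead of 32-bit floating point cuts model size by a factor of four, at the price of a rounding error bounded by half the scale factor per weight.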

Further advancements and refinements are anticipated in the future. For example, the integration of federated learning within satellite computing networks offers a pathway for satellites to collaboratively leverage distributed data resources while maintaining data privacy and security [39]. Moreover, the incorporation of emerging modules alongside customized, task-specific modules into configurable VLLM architectures has the potential to significantly increase the model’s adaptability and capacity for continuous evolution in dynamic and heterogeneous environments [40]. Such advancements are expected to pave the way for more robust, scalable, and adaptive intelligent systems, thereby expanding the potential applications of VLLM in complex and dynamic space computing scenarios.

3. Conclusions

In conclusion, high-performance computing over space represents a transformative capability, enabling real-time data processing and analysis across diverse fields such as remote sensing and communications. This paper highlighted the pivotal role of COTS devices in advancing high-performance space computing; addressed critical technical challenges, including system reliability, thermal control, and applications; and proposed potential solutions. In regard to computing architecture, the evolution of EIS and IIS relies heavily on the increased reliability of COTS-based space computing systems. By implementing reliability analyses and fault-tolerant methods at multiple levels, the overall system reliability can be enhanced. In regard to thermal control systems, the challenges of computing over space are significant due to the extreme temperature differentials in space and the lack of air convection in a vacuum. The HPAC method holds promise as an important solution to heat-dissipation issues. On the application front, by converting massive image data into high-value natural language text, the VLLM opens up new possibilities for rapid information services. As these technologies mature, computing over space is poised to revolutionize fields such as autonomous exploration and space-based data centers.

CRediT authorship contribution statement

Yaoqi Liu: Writing – original draft. Yinhe Han: Writing – original draft. Hongxin Li: Writing – original draft. Shuhao Gu: Software. Jibing Qiu: Writing – original draft. Ting Li: Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (2022YFB3902802), in part by the Beijing Natural Science Foundation (L241013), and in part by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA000000).

References

[1]

Giggenbach D, Horwath J, Epple B. Optical satellite downlinks to optical ground stations and high-altitude platforms. In: Proceedings of the 2007 16th IST Mobile and Wireless Communications Summit; 2007 Jul 1–15; Budapest, Hungary. New York City: IEEE; 2007. p. 1–4.

[2]

Zhang B, Wu Y, Zhao B, Chanussot J, Hong D, Yao J. Progress and challenges in intelligent remote sensing satellite systems. IEEE J Sel Top Appl Earth Obs Remote Sens 2022; 15:1814-1822.

[3]

Li D. From the Luojia series satellites to the Oriental Smart Eye constellation. Geomat Inf Sci Wuhan Univ 2023;48(10). Chinese.

[4]

Kodheli O, Lagunas E, Maturo N, Sharma SK, Shankar B, Montoya JFM. Satellite communications in the new space era: a survey and future challenges. IEEE Commun Surv Tutor 2020; 23(1):70-109.

[5]

Su Y, Liu Y, Zhou Y, Yuan J, Cao H, Shi J. Broadband LEO satellite communications: architectures and key technologies. IEEE Wirel Commun 2019; 26(2):55-61.

[6]

Zhou D, Sheng M, Li J, Han Z. Aerospace integrated networks innovation for empowering 6G: a survey and future challenges. IEEE Commun Surv Tutor 2023; 25(2):975-1019.

[7]

Wang S, Li Q, Xu M, Ma X, Zhou A, Sun Q. Tiansuan constellation: an open research platform. In: Proceedings of the 2021 IEEE International Conference on Edge Computing (EDGE); 2021 Sep 5–10; Chicago, IL, USA. New York City: IEEE; 2021. p. 94–101.

[8]

Zhang P, Xu W, Gao H, Niu K, Xu X, Qin X, et al. Toward wisdom-evolutionary and primitive-concise 6G: a new paradigm of semantic communication networks. Engineering 2022; 8:60-73.

[9]

Uriot T, Izzo D, Simões LF, Abay R, Einecke N, Rebhan S, et al. Spacecraft collision avoidance challenge: design and results of a machine learning competition. Astrodynamics 2022; 6:121-140.

[10]

Gao Y, Chien S. Review on space robotics: toward top-level science through space exploration. Sci Robot 2017; 2(7):eaan5074.

[11]

Ezra F, Adi O, Philip J. Why we should train AI in space. Lumen Orbit, Inc., Redmond 2024.

[12]

Gaillardin M, Raine M, Paillet P, Martinez M, Marcandella C, Girard S, et al. Radiation effects in advanced SOI devices: new insights into total ionizing dose and single-event effects. In: Proceedings of the 2013 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference; 2013 Oct 7–10; Monterey, CA, USA. New York City: IEEE; 2013. p. 1–2.

[13]

Behr P, Bärwald W. Fault tolerance and COTS: next generation of high performance satellite computers. In: Proceedings of the DASIA 2003; 2003 Jun 2–6; Prague, Czech Republic. Noordwijk: ESA Publications Division; 2003. p. 532.

[14]

Schmidt AG, Weisz G, French M, Flatley T, Villalpando CY. SpaceCubeX: a framework for evaluating hybrid multi-core CPU/FPGA/DSP architectures. In: Proceedings of the IEEE Aerospace Conference; 2017 Mar 4–11; Big Sky, MT, USA. New York City: IEEE; 2017. p. 1–10.

[15]

Wang M, Zhang Z, Zhu Y, Dong Z, Li Y. Embedded GPU implementation of sensor correction for on-board real-time stream computing of high-resolution optical satellite imagery. J Real-Time Image Process 2018; 15:565-581.

[16]

Adams C, Spain A, Parker J, Hevert M, Roach J, Cotton D. Towards an integrated GPU accelerated SoC as a flight computer for small satellites. In: Proceedings of the 2019 IEEE Aerospace Conference; 2019 Mar 2–9; Big Sky, MT, USA. New York City: IEEE; 2019. p. 1–7.

[17]

Buonauto N, Louie M, Aarestad J, Mital R, Mateik D, Sivilli R, et al. Satellite identification imaging for small satellites using NVIDIA. Utah State University, Logan, UT, USA. Logan 2017.

[18]

Ciardi R, Giuffrida G, Benelli G, Cardenio C, Maderna R. GPU@SAT: a general-purpose programmable accelerator for on board data processing and satellite autonomy. In: Proceedings of the International Conference on Applied Intelligence and Informatics; 2022 Sep 1–3; Reggio Calabria, Italy. Cham: Springer Nature Switzerland; 2022. p. 35–47.

[19]

Guo J, Monas L, Gill E. Statistical analysis and modelling of small satellite reliability. Acta Astronaut 2014; 98:97-110.

[20]

Cherezova N, Shibin K, Jenihhin M, Jutman A. Understanding fault-tolerance vulnerabilities in advanced SoC FPGAs for critical applications. Microelectron Reliab 2023; 146:115010.

[21]

Jiang L, Yang G, Li H, Hu W, Xu P. Reliability research and design of on-board computers of micro-satellite. J Syst Eng Electron 2009; 31(1):238-240.

[22]

Guo H, Yang X. A simple reliability block diagram method for safety integrity verification. Reliab Eng Syst Saf 2007; 92(9):1267-1273.

[23]

Van Breukelen ED, Hamann RJ, Overbosch EG. Qualitative fault tree analysis applied as a design tool in a low cost satellite design: method and lessons learned. In: Proceedings of the 57th International Astronautical Congress; 2006 Oct 2–6; Valencia, Spain. Reston: American Institute of Aeronautics and Astronautics; 2006.

[24]

Ziade H, Ayoubi RA, Velazco R. A survey on fault injection techniques. Int Arab J Inf Technol 2004; 1(2):171-186.

[25]

Rebaudengo M, Reorda MS, Violante M. A new software-based technique for low-cost fault-tolerant application. In: Proceedings of the Annual Reliability and Maintainability Symposium; 2003 Jan 27–30; Tampa, FL, USA. New York City: IEEE; 2003. p. 25–8.

[26]

Wang X, Sun HX. The new fault tolerant onboard computer for microsatellite missions. J China Univ Post Telecommun 2006; 13(1):6-9.

[27]

Fayyaz M, Vladimirova T. Fault-tolerant distributed approach to satellite on-board computer design. In: Proceedings of the 2014 IEEE Aerospace Conference; 2014 Mar 1–8; Big Sky, MT, USA. New York City: IEEE; 2014. p. 1–12.

[28]

Zhang Y, Zheng Y, Yang M, Li H, Jin Z. Design and implementation of the highly-reliable, low-cost housekeeping system in the ZDPS-1A pico-satellite. J Zhejiang Univ Sci 2012; 13(2):83-89.

[29]

Wang C, Zhang Y, Li Q, Zhou A, Wang S. Satellite computing: a case study of cloud-native satellites. In: Proceedings of the 2023 IEEE International Conference on Edge Computing and Communications; 2023 Jul 2–8; Chicago, IL, USA. New York City: IEEE; 2023. p. 262–70.

[30]

Gu J, Hua Z, Li M, Chen H. Innovations and applications of operating system security with a hardware–software co-design. Chin Sci Bull 2022; 67:3861-3871.

[31]

Roffe S, George AD. Evaluation of algorithm-based fault tolerance for machine learning and computer vision under neutron radiation. In: Proceedings of the 2020 IEEE Aerospace Conference; 2020 Mar 7–14; Big Sky, MT, USA. New York City: IEEE; 2020. p. 1–9.

[32]

Zhu J, Conde J, Gao Z, Reviriego P, Liu S, Lombardi F. Concurrent linguistic error detection (CLED) for large language models. 2024. arXiv: 2403.16393.

[33]

Ma C, Tu Y, Ren Y, Zhou S, Wang Z. Review of thermal characteristics simulation analysis of electronic components, IEEE, Dalian, China. New York City 2022, pp. 1-5

[34]

Mahajan R, Chiu C, Chrysler G. Cooling a microprocessor chip. Proc IEEE 2006; 94(8):1476-1486.

[35]

Liu Q, Huang L, Yi H. Current status and suggestions of satellite borne single-phase fluid loops. IOP Publishing 2022; 2403(1):012035.

[36]

Patel VP, Winton D, Ibarra TH. A selected operational history of the internal thermal control system (ITCS) for International Space Station (ISS). Marshall Space Flight Center, SAE Technical Paper. Huntsville 2004.

[37]

Huang J, Fan Y, Yu S, Yu X. On-orbit performance evaluation of single-phase fluid loop system for Shenzhou-7 spaceship. Spacecr Eng 2009; 18(4):37-43.

[38]

Xu X, Li M, Tao C, Shen T, Cheng R, Li J, et al. A survey on knowledge distillation of large language models. 2024. arXiv: 2402.13116.

[39]

Chen H, Xiao M, Pang Z. Satellite-based computing networks with federated learning. IEEE Wirel Commun 2022; 29(1):78-84.

[40]

Xiao C, Luo Y, Zhang W, Zhang P, Han X, Lin Y, et al. Variator: accelerating pre-trained models with plug-and-play compression modules. 2023. arXiv: 2310.15724.
