The goal of an autonomous machine is to achieve an objective by making decisions while negotiating a dynamic environment. Given complete knowledge of a system’s current state, artificial intelligence and machine learning can excel at this, and even outperform humans at certain tasks, for example when playing arcade and turn-based board games1. But beyond the idealized world of games, real-world deployment of automated machines is hampered by environments that can be noisy and chaotic, and which are not adequately observed. The difficulty of devising long-term strategies from incomplete data can also hinder the operation of independent AI agents in real-world challenges. Writing in Nature, Bellemare et al.2 describe a way forward by demonstrating that stratospheric balloons, guided by AI, can pursue a long-term strategy for positioning themselves about a location on the Equator, even when the buffeting winds are not precisely known.
Fixed-volume balloons, known as super-pressure balloons, are often used to carry out unmanned experiments in the upper atmosphere (Fig. 1). Station-keeping is the act of maintaining the position of such a balloon within a certain horizontal distance of a ground location (the station). This involves changing the balloon’s height to move it between regions in which winds blow in different directions — when the balloon is driven away from its station by winds at one height, it moves to a different height where the winds can blow it back again (Fig. 2).
Self-navigating balloons do one of two things to stay within range of their stations. When a balloon is outside its range, the onboard controller seeks winds pointing to within a small angle of the station. However, the balloon preferentially seeks out lighter winds when inside the target range and close to the station. Balloons that are more active in exploring winds above and below them are more likely to find suitable winds to help achieve station-keeping, but this comes at the expense of using battery power that might be needed for other tasks, such as relaying telecommunications or environmental monitoring. These competing factors need to be weighed up carefully.
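The two-mode behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the controller used on the balloons: the function names, the 50-kilometre range and the representation of winds as one (east, north) vector per altitude are all hypothetical, and the sketch simply picks the best-aligned layer rather than testing against a fixed angular threshold. It also ignores the battery cost of changing height.

```python
import math

def angle_to_station(offset_km, wind):
    """Angle (degrees) between a wind vector and the direction back to the station.

    offset_km is the balloon's (east, north) displacement from the station;
    wind is an (east, north) vector in m/s. Both conventions are assumptions.
    """
    to_station = (-offset_km[0], -offset_km[1])
    dot = to_station[0] * wind[0] + to_station[1] * wind[1]
    norm = math.hypot(*to_station) * math.hypot(*wind)
    if norm == 0.0:
        return 180.0  # calm layer, or balloon exactly on station: treat as worst case
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def choose_altitude(offset_km, layers, range_km=50.0):
    """Pick a target altitude from layers, a dict of altitude (km) -> wind vector.

    range_km is a hypothetical station-keeping radius.
    """
    if math.hypot(*offset_km) > range_km:
        # Out of range: prefer the wind that points most directly at the station.
        return min(layers, key=lambda alt: angle_to_station(offset_km, layers[alt]))
    # In range: prefer the lightest wind, to drift away as slowly as possible.
    return min(layers, key=lambda alt: math.hypot(*layers[alt]))
```

For a balloon 100 km east of its station, the sketch selects the layer with a westward wind; once the balloon is back inside the range, it switches to the layer with the slowest wind.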
A type of machine learning known as reinforcement learning can be used to train an artificial agent to make an optimal sequence of decisions. In the case of a super-pressure balloon, the decisions are whether to rise, fall or do nothing, and are based on a historical record of global winds3, local observed and forecast winds, and projected future flight paths. Crucially, the available wind data are sparse and do not fully constrain the flight controller’s decision-making.
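To make the idea of learning a sequence of decisions concrete, the following is a minimal sketch of one reinforcement-learning update for the three balloon actions. It uses textbook tabular Q-learning, which is a deliberately simpler stand-in and not the method of Bellemare et al., who trained a neural network. The state labels, the reward value and the learning-rate and discount parameters are hypothetical.

```python
ACTIONS = ("ascend", "descend", "stay")

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the new value.

    Moves Q(state, action) towards reward + gamma * max over a' of Q(next_state, a').
    q is a dict keyed by (state, action); missing entries count as 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def greedy_action(q, state):
    """Return the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

A reward of 1 could, for instance, be given for every time step spent within range of the station, so that actions leading to longer spells on station accumulate higher values. That choice of reward is an assumption made here for illustration.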
In their system, Bellemare et al. filled in the gaps by adding randomly generated ‘noise’ to the wind data, to better map out the range of winds that could plausibly occur, and to improve assessments of the variety of paths the balloon might take in the future. The resulting wind information and its statistical uncertainty, together with a small number of balloon-relevant parameters, were used to train a machine-learning system known as an artificial neural network, and ultimately improved decision-making time during flights compared with previously used control systems, using similar battery power.
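The noise-based gap filling can be illustrated with a small sketch that perturbs a wind forecast many times and then asks how often a given layer blows in a useful direction. The assumptions are loud ones: the noise here is independent Gaussian noise on each wind component, chosen for simplicity rather than taken from the study, and the function names, the spread `sigma` and the data layout are hypothetical.

```python
import random

def perturbed_winds(forecast, sigma=2.0, n_samples=100, seed=0):
    """Return n_samples noisy copies of forecast.

    forecast maps altitude (km) to an (east, north) wind in m/s.
    sigma is an assumed noise standard deviation in m/s.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        {alt: (u + rng.gauss(0.0, sigma), v + rng.gauss(0.0, sigma))
         for alt, (u, v) in forecast.items()}
        for _ in range(n_samples)
    ]

def fraction_blowing_west(samples, altitude):
    """Share of samples in which the wind at this altitude has a westward component."""
    return sum(1 for s in samples if s[altitude][0] < 0.0) / len(samples)
```

With a larger `sigma`, the fraction for a layer moves away from 0 or 1, which is one simple way to express how uncertain the controller should be about where a layer will carry the balloon.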
Earlier applications of reinforcement learning, which included playing classic board and arcade games, were trained using complete information sets1 — the same information that is available to human controllers4. These allowed like-for-like performance comparisons between humans and AI players. However, the challenge confronting Bellemare and colleagues was that incomplete knowledge of environmental winds not only makes it difficult to judge the optimal actions to take, but also makes forecasts of future states following these actions uncertain. These problems are further compounded by other practical uncertainties that don’t affect game controllers, such as those associated with internal balloon motions, power management and battery health. Bellemare and co-workers’ success therefore represents a big advance in the use of reinforcement learning for real-world applications.
Station-keeping performance is ultimately limited by the range of wind speeds and directions in the region surrounding the balloons (at heights of 15–20 kilometres, for the current study). The winds must also switch direction so that balloons can adjust their trajectory to stay within range of the station. These special conditions persist for months at a time only within the Equatorial stratosphere, where Bellemare and colleagues’ study was carried out. There, a slow procession of opposing winds peaks in strength near 30 km before descending and dissipating near 15 km, switching direction every 14 months or so5.
Such wind diversity also occurs elsewhere, but is less reliable and generally occurs beyond the range of heights at which a single super-pressure balloon can operate. During the flight campaign described in the current study, larger wind disturbances originating from high latitudes occurred in the tropical stratosphere, and probably assisted station-keeping. Bellemare and colleagues’ system might therefore struggle to achieve the same success at other locations. However, smaller, more rapid wind variations can also occur, including atmospheric waves of various types6, which a skilful controller could navigate to its advantage.
The advent of effective autonomous super-pressure balloons would open up a range of commercial and scientific applications for probing Earth’s atmosphere and that of other planets. Such balloons are already used to study small- and large-scale waves in the tropical stratosphere7, and to detect low-frequency sounds produced by the ocean8, lightning9 and earthquakes10. They have also been proposed for use in future explorations of Venus’s atmosphere11, to search for signs of active volcanism and chemical signatures of life12. Moreover, the ability to fix a balloon’s geographical position is crucial if balloons are to be used to build an aerial wireless network for telecommunications, which was an early objective of Project Loon, the owners of the balloons used in Bellemare and colleagues’ study.
Station-keeping a balloon for months at a time would allow long-term environmental monitoring, for example, of air quality over cities, of carbon fluxes from heat-stressed forests and of regions of thawing permafrost. Other applications include monitoring animal-migration routes and illicit trafficking of goods and people across borders. These applications will become increasingly relevant as the effects of climate change become more pronounced, as restrictions on movement are imposed by global events such as COVID-19, and as long-term climate-change mitigation involving aviation prompts the search for alternative platforms for making aerial observations.
Nature 588, 33-34 (2020)