
Closed-Loop Neural Operator-Based Observer of Traffic Density
arXiv
Published version: 10.48550/arXiv.2504.04873

Alice Harting, Division of Decision and Control Systems, Digital Futures, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden

Karl Henrik Johansson, Division of Decision and Control Systems, Digital Futures, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden

Matthieu Barreau, Division of Decision and Control Systems, Digital Futures, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden

Abstract

We consider the problem of traffic density estimation with sparse measurements from stationary roadside sensors. Our approach uses Fourier neural operators to learn macroscopic traffic flow dynamics from high-fidelity microscopic-level simulations. During inference, the operator functions as an open-loop predictor of traffic evolution. To close the loop, we couple the open-loop operator with a correction operator that combines the predicted density with sparse measurements from the sensors. Simulations with the SUMO software indicate that, compared to open-loop observers, the proposed closed-loop observer exhibits classical closed-loop properties such as robustness to noise and ultimate boundedness of the error. This shows the advantages of combining learned physics with real-time corrections, and opens avenues for accurate, efficient, and interpretable data-driven observers.

1 Introduction

Freeway congestion is a major issue in metropolitan areas, leading to increased travel times and excessive fuel consumption [1]. To address this, a core component of Intelligent Transportation Systems is freeway traffic control, which operates at the macroscopic level (ramp management and variable speed limits) or at the vehicle level (intelligent vehicle-based control).

Traffic control relies on measurements of state variables such as traffic density, which can be collected from stationary sensors (inductive loops) or mobile sensors (connected vehicles). However, these measurements are often sparse and noisy. To complete and denoise the state observation, estimation approaches combine data and prior knowledge of traffic dynamics [2].

Model-driven methods use physical models with calibrated parameters. Traffic state estimation then boils down to solving boundary-value problems of partial differential equations (PDEs) using numerical methods such as the Godunov scheme [3] or switching mode models [4]. To manage measurement noise and disturbance, it is common to use Kalman Filter approaches [5]. While these methods require little data and offer high interpretability, their accuracy is highly dependent on model choice and calibration [2].

Data-driven methods, on the other hand, learn a system model directly from historical data, thereby alleviating explicit PDE assumptions and requirements of well-posed boundary-value problems [2]. Classical methods include ARIMA variants, Gaussian process (GP) models, and state-space Markov models. The rise of deep learning for nonlinear function approximation has enabled more expressive models for traffic flow prediction [6]. In particular, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are often used as building blocks to capture spatiotemporal correlation. While deep learning approaches are expressive and require few assumptions, they are limited by high data demand, high computational complexity, and low interpretability. Thus, lightweight real-time prediction with explainable models remains an open challenge [7].

In pursuit of data-efficient and interpretable models, hybrid models combine data-driven methods with prior physical knowledge. For example, [8] reconstructs the traffic density state using physics-informed neural networks (PINNs) trained on offline data from sparse mobile sensors while regularizing with a continuous traffic PDE model. In [9], traffic models are brought into a GP-based architecture via latent force models and regularized kernel learning. However, these methods are restricted by explicit model assumptions on global traffic behavior. Furthermore, online estimation can be computationally demanding, since it requires retraining the network (PINNs) or performing matrix inversion with cubic complexity relative to the size of the dataset (GPs).

We identify the need for a data-driven observer of traffic density that encodes a learned governing model in an interpretable manner, offers lightweight real-time prediction, and integrates online measurements. For finite-dimensional nonlinear systems, there are learning-based solutions in the form of closed-loop data-driven observers where training is done offline and low-cost inference is performed online [10]. The natural extension to PDE systems would be to consider Fourier neural operators (FNOs) [11], which have emerged as a method to learn operators between infinite-dimensional function spaces from finite-dimensional data.

Notably, [12] successfully applied FNOs to learn a PDE solution operator for initial-boundary value problems and inverse problems of a first-order macroscopic model of traffic density flow. With this method, inference with new input data is significantly cheaper than with PINNs, since it is performed by a single forward pass through the operator network. However, the solver in this work is designed to generate either an offline reconstruction or open-loop prediction, without systematic integration of online data.

Motivated by this, we introduce a data-driven, closed-loop neural observer of traffic density flow from sparse measurements. Our contributions are leveraging FNOs to learn a prediction operator of traffic density flow from high-fidelity data, and integrating Luenberger observer theory to get a robust density estimate using online measurements. We statistically evaluate the performance of the observer and compare it to open-loop variants.

The paper is organized as follows. Section 2 gives a brief introduction to traffic flow theory and neural operators, concluding with the problem statement. Section 3 describes the methodology for designing three types of observers: open-loop, open-loop with reset, and closed-loop. Section 4 presents an extensive numerical evaluation of the proposed observers. Finally, Section 5 concludes the paper and outlines new research directions.

Notation: Let \( \mathcal{H}^{d}_{D}\triangleq H^2_{\text{per}}(D; \mathbb{R}^d)\) denote the Sobolev space of order two containing periodic functions with domain \( D\) and codomain dimension \( d\) . Further, we denote an operator with \( \mathcal{G}\) or \( \mathcal{N}\) . Finally, a Gaussian Process (GP) is denoted \( \mathcal{GP}\) , and the posterior process associated with data \( \mathcal{D}\) is denoted \( \mathcal{GP}_\mathcal{D}\) . We assume that the mean and covariance functions are chosen such that sample paths belong to \( \mathcal{H}^{d}_{D}\) almost surely.

2 Background and Problem Statement

We begin by briefly introducing basic traffic modeling theory, and then present neural operators in this context. This is followed by a formal statement of the research problem.

2.1 Modeling of Traffic Density Flow

Traffic density is defined as the normalized number of vehicles per space unit at location \( x\) and time \( t\) , \( \rho(x,t)\in[0,1]\) . Macroscopic models describe the dynamics of \( \rho\) directly. While they are fast and simple, they typically rely on extensive assumptions. In contrast, microscopic models track each vehicle and offer greater precision, but are more computationally intensive and less convenient to use in freeway traffic control [1].

2.1.1 Macroscopic scale

A fundamental continuous first-order macroscopic model for traffic density flow is given by the Lighthill–Whitham–Richards framework. It expresses conservation of vehicles through the one-dimensional continuity equation where the flux \( Q\in C^2([0,1])\) is uniquely determined by the local density [13]:

\[ \begin{equation} \begin{split} \frac{\partial \rho(x,t)}{\partial t} + \frac{\partial Q(\rho(x,t))}{\partial x} = 0&, ~ x \in \mathbb{R}, ~ t \geq 0,\\ \rho(\cdot, 0)=\rho_0&, ~ x\in\mathbb{R}. \end{split} \end{equation} \]

(1)

In general, classical solutions to the initial-value problem of the hyperbolic PDE (1) where \( Q\) is smooth may develop discontinuities in finite time even when the initial data \( \rho_0\) is smooth [14]. The weak solutions, on the other hand, are non-unique but can be filtered for physical relevance using an entropy condition associated with the conservation law (1). In fact, whenever \( Q\) is smooth and \( \rho_0\) is well-behaved, there exists a unique admissible (entropic) solution \( \rho(\cdot, t)\) that is continuous in time and locally well-behaved [14, Theorem 6.2.1, 6.2.2]. In this view, traffic density estimation is a well-defined problem. However, a key challenge for macroscopic models is to properly model the flow function \( Q\) .
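The Godunov scheme mentioned in the introduction can be sketched for (1) on a ring road. Here we assume the Greenshields flux \( Q(\rho)=\rho(1-\rho)\) , a common concrete choice that the text does not fix (it only requires \( Q\in C^2([0,1])\) ); the grid size and time step are likewise illustrative.

```python
import numpy as np

def greenshields_flux(rho):
    # Greenshields flux Q(rho) = rho * (1 - rho) in normalized units
    # (an assumed choice; the paper only requires Q in C^2([0,1])).
    return rho * (1.0 - rho)

def godunov_flux(rl, rr):
    # Godunov numerical flux for a concave Q with maximum at rho_c = 0.5:
    # minimum of Q over [rl, rr] if rl <= rr, maximum over [rr, rl] otherwise.
    fl, fr = greenshields_flux(rl), greenshields_flux(rr)
    if rl <= rr:
        return min(fl, fr)
    if rl > 0.5 > rr:          # interval contains the flux maximum
        return 0.25
    return max(fl, fr)

def lwr_step(rho, dx, dt):
    # One conservative update on a ring road (periodic boundary).
    rho_left = np.roll(rho, 1)
    F = np.array([godunov_flux(l, r) for l, r in zip(rho_left, rho)])  # left-edge fluxes
    return rho - dt / dx * (np.roll(F, -1) - F)
```

Under the CFL condition \( \Delta t \leq \Delta x / \max|Q'|\) , the scheme keeps the density in \( [0,1]\) and, on the ring, conserves total mass exactly up to floating-point error.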

2.1.2 Microscopic scale

To generate representations of higher fidelity without direct modeling of \( Q\) , traffic flow can instead be simulated using higher-order models at the vehicle level [15]. The open-source SUMO framework [16] integrates road network data, infrastructure like traffic lights, and stochastic demand to produce realistic vehicle flows. In particular, we will consider ring-road dynamics as implemented by [17]: first, the initialization process gradually introduces vehicles to the road to create the initial state; then, the higher-order microscopic dynamics are run in a loop. To convert the discrete distribution of vehicles to a continuous density function, a grid of vehicle-populated cells is convolved with a Gaussian filter [18]. In this way, microscopic models generate high-fidelity traffic representations with few restrictive assumptions. However, a key challenge is the lack of a compact representation of the dynamics that effectively integrates with macroscopic-level data and control strategies.

2.2 Fourier Neural Operators (FNOs)

To address this challenge, neural operators provide a framework for learning the governing dynamics on the macroscopic level. Generally, neural operators learn a mapping between infinite-dimensional function spaces, that is, spaces of functions with continuous domains, using finite-dimensional data. A special case is when the operator represents a forward solver for a hyperbolic PDE, such that the input function is the initial state and the output function is the solution at a final time \( T\) .

We introduce neural operators in the context of the traffic density flow arising from the higher-order, microscopic dynamics in SUMO simulations on a ring road \( \Omega=[0, L]\) . Consider pairs of functions

\[ (\zeta_0, \rho_T)\in\mathcal{H}^{n}_{\Omega}\times\mathcal{H}^{1}_{\Omega}, \]

where \( \zeta_0=[\rho(\cdot, 0), \rho(\cdot, -\Delta t), \rho(\cdot, -2\Delta t), ...]\) represents the discrete \( n^{\text{th}}\) -order initial state of step length \( \Delta t\) , and \( \rho_T= \rho(\cdot, T)\) represents the resulting traffic density at time \( T\) . Assume there exists an operator \( \mathcal{G}\) such that

\[ \mathcal{G}: \begin{array}[t]{rcl} \mathcal{H}^{n}_{\Omega}&\to&\mathcal{H}^{1}_{\Omega}, \\ \zeta_0&\mapsto&\rho_T, \end{array} \]

as can be motivated by the existence of a solution in the first-order case (1). Given observations of such pairs \( \{(\zeta_0^i, \rho_T^i)\}_{i=1}^N\) resulting from simulations initialized by \( \zeta_0^i\sim\mu_{\text{SUMO}}\) , the goal is to learn an approximate solution operator \( \mathcal{G}_{\theta}\approx\mathcal{G}\) where \( \mathcal{G}_{\theta}\) is parametrized by finite-dimensional vector \( \theta\) .

The approximate solution operator \( \mathcal{G}_{\theta}:\mathcal{H}^{n}_{\Omega}\to\mathcal{H}^{1}_{\Omega}\) can be modeled by a neural operator as defined in [19]:

\[ \begin{equation} \begin{split} v_1(x)=\mathcal{P}[\zeta_0](x),&\\ v_{M}(x)=\mathcal{L}_{M}\circ ...\circ\mathcal{L}_1[v_1](x),&\\ \rho_T(x)=\mathcal{Q}[v_{M}](x),~\forall x\in \Omega,& \end{split} \end{equation} \]

(2)

where \( \mathcal{P}\) is defined by a local transformation \( P:\mathbb{R}^{n} \times \Omega\to\mathbb{R}^{d_v}\) , \( \mathcal{P}[\zeta_0](x)\triangleq P(\zeta_0(x), x)\) , acting pointwise on function values to lift the input state to a higher-dimensional space of dimension \( d_v\) . Similarly, \( \mathcal{Q}\) is defined by a local transformation \( Q:\mathbb{R}^{d_v}\times\Omega\to\mathbb{R}\) , \( \mathcal{Q}[v](x)\triangleq Q(v(x),x)\) , projecting back to the physical space. Both are typically shallow neural networks. The intermediate spectral convolution layers \( \mathcal{L}_l:\mathcal{V}\to\mathcal{V}\) , where \( \mathcal{V}=\mathcal{V}(\Omega;\mathbb{R}^{d_v})\) , are structured as

\[ \begin{align*} \mathcal{L}_l[v](x)=\sigma\big(W_l[v](x)+\mathcal{K}_{\phi_l}[v](x)\big),~\forall x\in \Omega, \end{align*} \]

where \( \sigma\) is a local nonlinear activation function, and \( W_l\) is a local pointwise affine transformation. The key component for the infinite-dimensional setting is the non-local bounded linear operator \( \mathcal{K}_{\phi_l}\in\mathcal{L}(\mathcal{V}, \mathcal{V})\) , which acts in the function space.

FNOs, proposed by [11], select \( \mathcal{K}_{\phi_l}\) as a kernel integral operator such that

\[ \begin{equation} \mathcal{K}_{\phi_l}[v]\triangleq\kappa_{\phi_l} * v, \end{equation} \]

(3)

where \( \kappa_{\phi_l}:\bar{\Omega} \to\mathbb{R}^ {d_v\times d_v}\) is a periodic function. The choice (3) allows for efficient computation in the Fourier domain via the equivalent expression

\[ \begin{align} \mathcal{K}_{\phi_l}[v](x) = \mathcal{F}^{-1}\big(\mathcal{F}(\kappa_{\phi_l})(k)\cdot \mathcal{F}(v)(k)\big)(x) \end{align} \]

(4)

where \( \mathcal{F}\) is the Fourier transform and \( k\in\mathbb{Z}\) . The kernel can then be parametrized directly in the Fourier space as \( R_{\phi_l}\triangleq\mathcal{F}(\kappa_{\phi_l})\) . We will work with a uniform discretization of \( \Omega\) , thus \( \mathcal{F}\) may be replaced by the Fast Fourier Transform \( \hat{\mathcal{F}}\) .

Learning the parameters in the frequency domain, as opposed to the physical domain, enables a finite-dimensional representation of the operator while maintaining its ability to map between infinite-dimensional function spaces. The finite-dimensional representation is obtained by truncating the Fourier expansions to \( k_{max}\) frequency modes, such that \( \hat{\mathcal{F}}(v)\in\mathbb{C}^{k_{max}\times d_v}\) and \( R_{\phi_l}\in\mathbb{C}^{k_{max}\times d_v\times d_v}\) . Still, (4) is defined on a continuous domain, since \( \hat{\mathcal{F}}^{-1}\) can project on the basis \( e^{2\pi i\langle \cdot,k\rangle}\) for any \( x\in \Omega\) . In practice, this allows us to learn mappings between function spaces with continuous domains even though the data consists of function evaluations on discretized grids. The resulting architecture \( \mathcal{G}_{\theta}\) is fast and achieves state-of-the-art results for PDE problems [11].
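The truncated spectral convolution (3)-(4) can be sketched on a one-dimensional periodic grid with NumPy's real FFT. The mode count `k_max`, the channel width `d_v`, and the `tanh` activation are illustrative assumptions, not choices made by the paper.

```python
import numpy as np

def spectral_conv(v, R):
    # Kernel integral operator (3) evaluated in the Fourier domain (4).
    # v: real field of shape (n_grid, d_v) on a uniform periodic grid.
    # R: weights R_phi = F(kappa_phi) truncated to k_max modes, shape (k_max, d_v, d_v).
    k_max = R.shape[0]
    v_hat = np.fft.rfft(v, axis=0)                  # FFT along the spatial axis
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = np.einsum('kij,kj->ki', R, v_hat[:k_max])  # mode-wise mixing
    return np.fft.irfft(out_hat, n=v.shape[0], axis=0)

def fno_layer(v, W, b, R, sigma=np.tanh):
    # One layer L_l[v] = sigma(W v + K_phi v); tanh stands in for the
    # unspecified activation, and W, b form the pointwise affine map.
    return sigma(v @ W.T + b + spectral_conv(v, R))
```

Because only the lowest `k_max` modes are retained, the layer's weights are independent of the grid resolution: the same `R` can be applied to function samples on any sufficiently fine uniform grid, which is the discretization invariance discussed above.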

Similar to neural networks, FNOs are capable of approximating a large class of operators between infinite-dimensional spaces up to arbitrary accuracy. Below, we present the universal approximation theorem from [19] as applied to the setting of this paper.

Theorem 1

Let \( \mathcal{G}:\mathcal{H}^{n}_{\Omega} \to \mathcal{H}^{1}_{\Omega}\) be a continuous operator, and \( K\subset \mathcal{H}^{n}_{\Omega}\) a compact subset. Let \( \mathcal{G}_{\theta}:\mathcal{H}^{n}_{\Omega}\to \mathcal{H}^{1}_{\Omega}\) be defined as in (2) with the coefficients \( \theta\triangleq\{\theta_P, \theta_Q, \theta_{W_l}, R_{\phi_l}\}\) . Then, for every \( \epsilon>0\) there exists \( \theta\) such that \( \mathcal{G}_{\theta}\) , continuous as an operator, satisfies

\[ \sup_{a\in K} \int\limits_\Omega\|\mathcal{G}[a](x)-\mathcal{G}_{\theta}[a](x)\|^2dx\leq \epsilon. \]

2.3 Problem Statement

Following this discussion, we model the SUMO simulations as an \( n^{\text{th}}\) -order discrete-time dynamical system

\[ \begin{align} \mathcal{S}_{\text{SUMO}}[\bar{\zeta_0}]: \left\{ \begin{array}{l} \rho(\cdot, t+\Delta t) = \mathcal{G}[\zeta_t],~ t > 0, \\ \zeta_t=[\rho(\cdot, t), \rho(\cdot, t-\Delta t), \dots],\\ \zeta_{0}=\bar{\zeta_0}, \end{array}\right. \end{align} \]

(5)

where \( \rho(\cdot, t) \in \mathcal{H}^{1}_{\Omega}\) is the true density, \( \zeta_t\in\mathcal{H}^{n}_{\Omega}\) is the state at time \( t\) , \( \bar{\zeta_0}\) is the initial state, and \( \mathcal{G}\) represents the solution operator of the unknown dynamics.

The objective of this paper is to develop an algorithm in the form of a dynamical system that observes (5) in closed-loop by generating real-time predictions \( \hat{\rho}\) of the system state \( \rho\) every \( \Delta t\) time unit such that \( \| \rho - \hat{\rho}\|\) is minimized. To achieve this goal, we assume access to online measurements \( y(x,t)=\rho(x,t)+\epsilon\) with i.i.d. noise \( \epsilon\) at stationary, sparse sensor locations \( x\in\mathcal{M}\subset\Omega\) , collectively denoted \( \mathbf{y}_{\mathcal{M}}(t)\) . We also assume access to offline data from physically identical systems, \( \{\mathcal{S}_{\text{SUMO}}[\zeta_0^i]\}_{i\in\mathcal{I}}\) .

Remark 1

The problem is inherently ill-posed due to the incomplete observation of the initial state. Thus, our goal is to learn the most probable solution that remains physically consistent with the dynamics in the offline observations.

3 Methodology

To build a closed-loop observer of (5), the first step is to obtain a representation of the open-loop prediction, \( \mathcal{G}\) . Thus, our approach starts by learning an approximate solution operator \( \mathcal{G}_{\theta} \approx \mathcal{G}\) from offline SUMO data. We then compare two methods of iteratively applying \( \mathcal{G}\) to observe (5) in an open-loop manner, \( \hat{\mathcal{S}}^{ol}\) and \( \hat{\mathcal{S}}^{ol\text{-}r}\) . To close the loop, we finally design an algorithm, \( \hat{\mathcal{S}}^{cl}\) , that integrates sparse online measurements with recursive predictions using \( \mathcal{G}\) . The research problem is then addressed by using \( \hat{\mathcal{S}}^{cl}\) together with \( \mathcal{G}_{\theta}\) , resulting in a data-driven, closed-loop observer.

3.1 Solution Operator: Model Identification

Given the offline SUMO scenarios \( \{\mathcal{S}_{\text{SUMO}}[\zeta_0^i]\}_{i\in\mathcal{I}}\) , the goal is to learn a representation of the solution operator

\[ \mathcal{G}_k: \begin{array}[t]{rcl} \mathcal{H}^{n}_{\Omega}&\to&\mathcal{H}^{1}_{\Omega}, \\ \zeta_0&\mapsto&\rho_{k\Delta t}, \end{array} \]

which maps the input state \( \zeta_0\) to the solution function at \( k\) time steps later, \( \rho_{k\Delta t}\) . While both \( \zeta_0\) and \( \rho_{k\Delta t}\) are infinite-dimensional functions, SUMO outputs finite-dimensional representations by evaluating them on a uniformly discretized grid \( \mathcal{X}\subset\Omega\) . This yields \( \zeta_0\rvert_\mathcal{X}\in\mathbb{R}^{n\times \lvert\mathcal{X}\rvert}\) and \( \rho_{k\Delta t}\rvert_\mathcal{X}\in\mathbb{R}^{\lvert\mathcal{X}\rvert}\) . To bridge the gap between finite-dimensional data and function space learning, the approximate solution operator \( \mathcal{G}_{\theta}\) is modeled as an FNO.

With a target offset \( n_{out}\) for the observer predictions, \( \mathcal{G}_{\theta}\) is trained to predict the density over the entire horizon,

\[ \begin{equation} \mathcal{G}_{\theta}: \begin{array}[t]{rcl} \mathcal{H}^{n}_{\Omega} & \to & \mathcal{H}^{n_{out}}_{\Omega}, \\ \zeta_0 & \mapsto & [\hat{\rho}_{\Delta t}, \hat{\rho}_{2\Delta t}, \dots, \hat{\rho}_{n_{out}\Delta t}]. \end{array} \end{equation} \]

(6)

Including the horizon up to the target estimate \( \hat{\rho}_{n_{out}\Delta t}\) was empirically found to improve performance, and it reflects the setup in [20]. Thus, the corresponding learning objective is

\[ \begin{equation} \min_{\theta}\mathbb{E}_{a\sim\mu_{\text{SUMO}}}\bigg[\sum\limits_{k=1}^{n_{out}}\int\limits_\Omega \|\mathcal{G}_{k}[a](x)-\mathcal{G}_{\theta}^k[a](x)\|^2dx\bigg]. \end{equation} \]

(7)
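Restricted to a finite sample and the uniform grid, objective (7) reduces to a squared error summed over the horizon and averaged over samples. A minimal empirical sketch, where the operator and the batch layout are placeholders:

```python
import numpy as np

def horizon_loss(G_theta, zeta_batch, rho_true_batch):
    # Empirical form of (7), up to the constant grid-spacing factor of the
    # integral over Omega. G_theta maps a batch of input states
    # (batch, n, n_grid) to horizon predictions (batch, n_out, n_grid);
    # rho_true_batch holds the matching targets G_k[a] from the simulations.
    pred = G_theta(zeta_batch)
    sq_err = np.sum((pred - rho_true_batch) ** 2, axis=2)  # integral over the grid
    return np.mean(np.sum(sq_err, axis=1))                 # sum over k, mean over batch
```

In practice the expectation over \( \mu_{\text{SUMO}}\) is replaced by this batch mean, and the minimization over \( \theta\) is carried out by stochastic gradient descent.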

Fig. 1 illustrates a forward pass of a learned operator \( \mathcal{G}_{\theta}\) for \( n_{out}=100\) . We refer to this as open-loop prediction in the observer framework.

Figure 1. Open-loop prediction with solution operator \( \mathcal{G}_{\theta}\) . It predicts \( \hat{\rho}_{\Delta t},\hat{\rho}_{2\Delta t},...,\hat{\rho}_{n_{out}\Delta t}\) given an initial state \( \zeta=[\rho_{0}, \rho_{-\Delta t},\dots]\) . Here, \( n_{out}=100\) .

3.2 Open-Loop Observers

Given a representation of \( \mathcal{G}\) , the goal is now to develop the observer algorithm. We begin by considering standard autoregression, wherein past estimates serve as input to succeeding predictions through the predicted state estimate

\[ \hat{\zeta}_{t} = [\hat{\rho}(\cdot, t), \hat{\rho}(\cdot, t-\Delta t), \dots]\in\mathcal{H}^{n}_{\Omega}. \]

Thus, triggering an autoregressive rollout only requires an estimate of the initial state \( \hat{\zeta}_{0}\) . However, this is only partially observed in our problem, since measurements are restricted to the sparse sensor locations \( \mathcal{M}\subsetneq\Omega\) . To complete the initial state estimate, we interpolate between the sparse measurements \( \mathbf{y}_{\mathcal{M}}(t)\) with GP regression,

\[ \hat{\zeta}_{0}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(0), \mathbf{y}_{\mathcal{M}}(-\Delta t),\dots]} \in\mathcal{H}^{n}_{\Omega}. \]

Thereafter, the estimates \( \hat{\rho}(\cdot,t)\) are generated by shifting the autoregressive prediction window in increments of \( \Delta t\) .
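The GP interpolation of sparse ring-road measurements can be sketched via the posterior mean with the standard periodic (exp-sine-squared) kernel, which makes sample paths periodic as required; the kernel family and its hyperparameters are illustrative assumptions, since the paper only constrains sample paths to lie in \( \mathcal{H}^{n}_{\Omega}\) .

```python
import numpy as np

def gp_interpolate(x_sensors, y, x_grid, ell=0.5, noise=1e-4, period=1.0):
    # GP posterior mean on the ring with the periodic kernel
    # k(a, b) = exp(-2 sin^2(pi |a - b| / period) / ell^2).
    def k(a, b):
        d = np.abs(a[:, None] - b[None, :])
        return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell ** 2)
    K = k(x_sensors, x_sensors) + noise * np.eye(len(x_sensors))  # noisy Gram matrix
    return k(x_grid, x_sensors) @ np.linalg.solve(K, y)           # posterior mean
```

Stacking such interpolations for the measurement vectors \( \mathbf{y}_{\mathcal{M}}(0), \mathbf{y}_{\mathcal{M}}(-\Delta t),\dots\) yields a complete initial state estimate \( \hat{\zeta}_0\) on the grid.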

We found that using multi-step ahead prediction \( \mathcal{G}_{n_{d} + 1}\) with a delayed state estimate \( \hat{\zeta}_{t -n_{d}\Delta t}\) offset by \( n_{d}>n-1\) improved stability. This creates the open-loop observer \( \hat{\mathcal{S}}^{ol}\) :

\[ \begin{align} \hat{\mathcal{S}}^{ol}[\mathcal{G},\mathbf{y}_\mathcal{M}]: \left\{ \begin{array}{l} \hat{\rho}(\cdot, t+\Delta t) = \mathcal{G}_{n_{d} + 1}[\hat{\zeta}_{t -n_{d}\Delta t}], ~ ~ ~ \\ t = 0,\Delta t,\dots\\ \hat{\zeta}_{t -n_{d}\Delta t}=[\hat{\rho}(\cdot, t-n_{d}\Delta t), \dots],\\ \hat{\zeta}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t),\dots]}. \end{array} \right. \end{align} \]

(8)

The gap between the initial state and the first estimate, \( \hat{\rho}(\cdot, t),\)  \( t\in\{-n_{d}\Delta t + \Delta t,\dots, 0\}\) , is filled by interpolating the corresponding measurements with GP regression. The complete algorithm is outlined in Algorithm 1.


Algorithm 1 Open-Loop Observer \( \hat{\mathcal{S}}^{ol}\)
1.input : \( \mathcal{G}, \mathbf{y}_{\mathcal{M}}\)
2.output : \( \hat{\rho}\)
3.\( t = 0\)
4.\( \hat{\zeta}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t), \dots]},  x\in\mathcal{X}\)
5.\( \hat{\rho}(\cdot, t)\sim\mathcal{GP}_{\mathbf{y}_{\mathcal{M}}(t)},   -n_{d}\Delta t<t\leq0,  x\in\mathcal{X}\)
6.repeat
7.\( \hat{\rho}(\cdot, t+\Delta t) \gets \mathcal{G}_{n_{d}+1}[\hat{\zeta}_{t -n_{d}\Delta t}],  x\in\mathcal{X}\)
8.\( \hat{\zeta}_{t -n_{d}\Delta t + \Delta t}\gets [\hat{\rho}(\cdot, t -n_{d}\Delta t + \Delta t), \dots]\)
9.\( t\gets t+\Delta t\)
10.until end
11.return \( \hat{\rho}\)
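The autoregressive loop at the core of Algorithm 1 can be sketched generically. Here `G` stands in for the learned multi-step predictor \( \mathcal{G}_{n_d+1}\) and `zeta0` for the GP-interpolated initial window; both are placeholders for the quantities defined above.

```python
import numpy as np

def open_loop_rollout(G, zeta0, n_steps):
    # Purely autoregressive rollout: each prediction is fed back into the
    # delayed input window, with no use of online measurements.
    zeta = zeta0.copy()              # shape (n, n_grid): [rho_t, rho_{t-dt}, ...]
    preds = []
    for _ in range(n_steps):
        rho_next = G(zeta)                       # predict the next density profile
        preds.append(rho_next)
        zeta = np.vstack([rho_next, zeta[:-1]])  # shift the window by one step
    return np.array(preds)
```

With toy shift dynamics `G(zeta) = np.roll(zeta[0], 1)` the rollout transports a profile around the ring exactly, which also makes the observer's weakness concrete: any error in `zeta0` is propagated indefinitely, since no measurement ever re-enters the loop.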

While the purely autoregressive approach benefits from consistent predictions, it is sensitive to disturbance in the initial state. As reported in [21], autoregressive rollouts of neural operators suffer from unbounded out-of-distribution (OOD) error growth for long-term forecasts. Since we have access to online measurements, we compare \( \hat{\mathcal{S}}^{ol}\) with another open-loop observer that resets the input state for every new prediction based on recent measurements. Specifically, the data-based state estimate \( \hat{\zeta}^{\mathcal{D}}_{t}\) is obtained by interpolating the corresponding measurements \( [\mathbf{y}_{\mathcal{M}}(t), \mathbf{y}_{\mathcal{M}}(t-\Delta t),\dots]\) using GP regression,

\[ \begin{equation} \hat{\zeta}^{\mathcal{D}}_{t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(t), \mathbf{y}_{\mathcal{M}}(t-\Delta t),\dots]} \in\mathcal{H}^{n}_{\Omega}. \end{equation} \]

(9)

For comparability with (8), we consider a delayed state estimate \( \hat{\zeta}^{\mathcal{D}}_{t -n_{d}\Delta t}\) . The resulting dynamical system is denoted by an open-loop observer with reset \( \hat{\mathcal{S}}^{ol\text{-}r}\) , and defined as

\[ \begin{align} \hat{\mathcal{S}}^{ol\text{-}r}[\mathcal{G},\mathbf{y}_\mathcal{M}]: \left\{ \begin{array}{l} \hat{\rho}(\cdot, t+\Delta t) = \mathcal{G}_{n_{d} + 1}[\hat{\zeta}^{\mathcal{D}}_{t -n_{d}\Delta t}], ~ ~ ~ \\ t = 0, \Delta t,\dots\\ \hat{\zeta}^{\mathcal{D}}_{t -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(t - n_{d}\Delta t),\dots]},\\ \hat{\zeta}^{\mathcal{D}}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t),\dots]}. \end{array} \right. \end{align} \]

(10)

The implementation details are given in Algorithm 2.


Algorithm 2 Open-Loop Observer with Reset \( \hat{\mathcal{S}}^{ol\textrm{-}r}\)
1.input : \( \mathcal{G}, \mathbf{y}_{\mathcal{M}}\)
2.output : \( \hat{\rho}\)
3.\( t=0\)
4.\( \hat{\zeta}^{\mathcal{D}}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t),\dots]},  x\in\mathcal{X}\)
5.\( \hat{\rho}(\cdot, t)\sim\mathcal{GP}_{\mathbf{y}_{\mathcal{M}}(t)},   -n_{d}\Delta t<t\leq0,  x\in\mathcal{X}\)
6.repeat
7.\( \hat{\rho}(\cdot, t+\Delta t) \gets \mathcal{G}_{n_{d}+1}[\hat{\zeta}^{\mathcal{D}}_{t -n_{d}\Delta t}],  x\in\mathcal{X}\)
8.\( \hat{\zeta}^{\mathcal{D}}_{ t -n_{d}\Delta t + \Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(t-n_{d}\Delta t+\Delta t),\dots]}\)
9.\( t\gets t+\Delta t\)
10.until end
11.return \( \hat{\rho}\)

A limitation of \( \hat{\mathcal{S}}^{ol\text{-}r}\) is that each prediction relies primarily on a few recent data points. Because successive predictions are disconnected, the observer does not leverage the full measurement history.

3.3 Closed-Loop Observer

Combining the consistent predictions of \( \hat{\mathcal{S}}^{ol}\) with the online updates of \( \hat{\mathcal{S}}^{ol\text{-}r}\) leads to our main contribution: a closed-loop observer \( \hat{\mathcal{S}}^{cl}\) that integrates an autoregressive rollout of \( \mathcal{G}\) with online measurements using a correction operator \( \mathcal{N}\) . Analogous to a Kalman filter, \( \mathcal{N}\) adjusts the predicted state estimate using recent measurements before it is passed as input to the next prediction. Measurements are introduced via a Luenberger-type error operator \( \mathcal{E}\) , which computes the error estimate \( \hat{e}\) as the difference between the predicted state estimate \( \hat{\zeta}\) and the data-based estimate \( \hat{\zeta}^{\mathcal{D}}\) . The correction operator \( \mathcal{N}\) then processes \( \hat{\zeta}\) and \( \hat{e}\) to generate an updated state estimate \( \hat{\zeta}^{u}\in \mathcal{H}^{n}_{\Omega}\) , which is passed as input to the next prediction.

More precisely, we consider updates to a moving window of delayed extended state estimates,

\[ \begin{equation} \begin{array}{l} \displaystyle \hat{\zeta}^{+}_{t -n_{d}\Delta t} = [\hat{\rho}(\cdot, t - (n-1)\Delta t),\dots,~~~~\\ \hat{\rho}(\cdot, t-n_{d}\Delta t),\dots,~~~~\\ \hat{\rho}(\cdot, t - n_{d}\Delta t - (n-1)\Delta t)],\\ \end{array} \end{equation} \]

(11)

assuming \( n_{d}>n-1\) as in the previous section. Thus, \( \hat{\zeta}^{+}\) contains \( \hat{\zeta}\) and extends forward in time so that the window length matches the prediction horizon (6) with \( n_{out}=n_{d}+1\) , as in (8) and (10). The correction is applied to this window of estimates, after which the updated state estimate \( \hat{\zeta}^{u}\) is extracted as \( \hat{\zeta}^{u +}_{-n:-1}\) .

Thus, we define the Luenberger error operator \( \mathcal{E}\) as

\[ \mathcal{E}[\hat{\zeta}^{+}, \hat{\zeta}^{\mathcal{D} +}]\triangleq \hat{\zeta}^{+} - \hat{\zeta}^{\mathcal{D} +}\in \mathcal{H}^{n_{d} + 1}_{\Omega} \]

where \( \hat{\zeta}^{\mathcal{D} +}\) is the extended data-based estimate. The correction operator \( \mathcal{N}\) then processes the predicted state estimate \( \hat{\zeta}^{+}\) and the error \( \hat{e}^{+} = \mathcal{E}[\hat{\zeta}^{+}, \hat{\zeta}^{\mathcal{D} +}]\) to generate the updated state estimate \( \hat{\zeta}^{u +}\) ,

\[ \begin{equation} \mathcal{N}: \begin{array}[t]{ccl} \mathcal{H}^{n_d+1}_{\Omega} \times \mathcal{H}^{n_d+1}_{\Omega} & \to & \mathcal{H}^{n_d+1}_{\Omega}, \\ \hat{\zeta}^{+}, \hat{e}^{+} & \mapsto & \hat{\zeta}^{u +}. \end{array} \end{equation} \]

(12)

The correction update window \( \hat{\zeta}^{+}_{t -n_{d}\Delta t}\) is shifted in increments of \( \Delta t\) in parallel with the prediction input window \( \hat{\zeta}_{t -n_{d}\Delta t}\) , cf. (8). Since \( \hat{\zeta}^{u}_{t -n_{d}\Delta t}\subset\hat{\zeta}^{u +}_{t -n_{d}\Delta t}\) , the state estimates are updated before being passed as input to the next prediction. This yields the closed-loop observer \( \hat{\mathcal{S}}^{cl}\) ,

\[ \begin{align*} \hat{\mathcal{S}}^{cl}[\mathcal{G}, \mathcal{N},\mathbf{y}_\mathcal{M}]: \left\{ \begin{array}{l} \hat{\rho}(\cdot, t+\Delta t) = \mathcal{G}_{n_{d} + 1} [\hat{\zeta}^{u}_{t -n_{d}\Delta t}], ~ ~ ~ \\ t = 0, \Delta t,\dots\\ \hat{\zeta}^{u}_{t -n_{d}\Delta t}=\hat{\zeta}^{u +}_{-n:-1},\\ \hat{\zeta}^{u +}=\mathcal{N}\big[\hat{\zeta}^{+}, \hat{e}^{+}\big],\\ \hat{e}^{+} = \mathcal{E}[\hat{\zeta}^{+}, \hat{\zeta}^{\mathcal{D} +}],\\ \hat{\zeta}^{+} = [\hat{\rho}(\cdot, t-(n-1)\Delta t), \dots],\\ \hat{\zeta}^{\mathcal{D} +}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(t - (n-1)\Delta t),\dots]},\\ \hat{\zeta}^{u}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t),\dots]}. \end{array} \right. \end{align*} \]

The implementation details are outlined in Algorithm 3, and the system is illustrated in Fig. 2.


Algorithm 3 Closed-Loop Observer \( \hat{\mathcal{S}}^{cl}\)
1.input : \( \mathcal{G}, \mathcal{N}, \mathbf{y}_{\mathcal{M}}\)
2.output : \( \hat{\rho}\)
3.\( t=0\)
4.\( \hat{\zeta}^{u}_{ -n_{d}\Delta t}\sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(-n_{d}\Delta t),\dots]},  x\in\mathcal{X}\)
5.\( \hat{\rho}(\cdot, t)\sim\mathcal{GP}_{\mathbf{y}_{\mathcal{M}}(t)},   -n_{d}\Delta t<t\leq 0,  x\in\mathcal{X}\)
6.repeat Predict
7.\( \hat{\rho}(\cdot, t+\Delta t) \gets \mathcal{G}_{n_{d}+1}[\hat{\zeta}^{u}_{t -n_{d}\Delta t}],  x\in\mathcal{X}\)
8.\( \hat{\zeta}^{+} \gets [\hat{\rho}(\cdot, t-(n-1)\Delta t + \Delta t), \dots]\) Collect measurements
9.\( \hat{\zeta}^{\mathcal{D} +} \sim\mathcal{GP}_{[\mathbf{y}_{\mathcal{M}}(t - (n-1)\Delta t+\Delta t),\dots]}\) Correct
10.\( \hat{e}^{+}=\mathcal{E}[\hat{\zeta}^{+}, \hat{\zeta}^{\mathcal{D} +}]\)
11.\( \hat{\zeta}^{u +} \gets \mathcal{N}\big[\hat{\zeta}^{+}, \hat{e}^{+}\big]\) Extract next input
12.\( \hat{\zeta}^{u}_{t -n_{d}\Delta t + \Delta t}\gets \hat{\zeta}^{u +}_{-n:-1}\)
13.\( t\gets t+\Delta t\)
14.until end
15.return \( \hat{\rho}\)
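One predict-correct cycle of Algorithm 3 can be sketched with the learned correction operator \( \mathcal{N}_\psi\) replaced by a constant Luenberger gain acting on the error \( \hat{e}^{+}\) ; the gain, the simplified window bookkeeping, and the placeholder `G` are illustrative assumptions, not the paper's FNO-based correction.

```python
import numpy as np

def closed_loop_step(G, zeta_u, zeta_data, gain=0.5):
    # Predict with the (placeholder) operator G, form the Luenberger error
    # against the GP-interpolated measurement window zeta_data, and correct
    # the shifted window before it feeds the next prediction.
    rho_next = G(zeta_u)                               # predict
    zeta_plus = np.vstack([rho_next, zeta_u[:-1]])     # shifted window zeta^+
    e_plus = zeta_plus - zeta_data                     # error operator E
    zeta_u_plus = zeta_plus - gain * e_plus            # corrected estimate zeta^{u+}
    return rho_next, zeta_u_plus
```

The gain interpolates between the two open-loop observers: `gain=0` recovers the purely autoregressive rollout of \( \hat{\mathcal{S}}^{ol}\) , `gain=1` resets the window to the measurement-based estimate as in \( \hat{\mathcal{S}}^{ol\text{-}r}\) , and intermediate values blend prediction and data, which is what the learned \( \mathcal{N}_\psi\) does adaptively.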

Figure 2. Closed-loop observation with \( \hat{\mathcal{S}}^{cl}\) . The horizontal black lines indicate sensor locations where measurements are taken.

The correction operator \( \mathcal{N}\) is approximated using an FNO defined on a two-dimensional domain,

\[ \begin{equation} \mathcal{N}_\psi:\mathcal{H}^{1}_{\Omega\times\Gamma}\times \mathcal{H}^{1}_{\Omega\times\Gamma}\to \mathcal{H}^{1}_{\Omega\times\Gamma}, \end{equation} \]

(13)
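In terms of grid data, the mapping (13) acts on scalar-valued samples over the two-dimensional domain \( \Omega\times\Gamma\) . A minimal sketch of how such inputs would be arranged (array shapes are assumptions, not the paper's code):

```python
import numpy as np

nx, n_horizon = 123, 100   # grid sizes for Omega and Gamma (n_d + 1 steps)

# A vector-valued function on Omega with n_d + 1 output channels, sampled on a grid:
channels_1d = np.random.rand(n_horizon, nx)          # (n_d + 1, |X|)

# Re-indexed as a scalar-valued function on the 2D grid Omega x Gamma:
field_2d = channels_1d.T[..., np.newaxis]            # (|X|, n_d + 1, 1)

# The prediction and the innovation are then stacked along the codomain axis,
# so the combined codomain dimension is 2 rather than 2(n_d + 1):
innovation_2d = np.random.rand(nx, n_horizon, 1)
fno_input = np.concatenate([field_2d, innovation_2d], axis=-1)
```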

where \( \Gamma=[\Delta t, (n_{d} + 1)\Delta t]\) and the processed functions are translated in time to take values in this domain. By using an FNO processing scalar-valued functions on a two-dimensional domain (13) instead of an FNO processing vector-valued functions on a one-dimensional domain, as suggested by (12), the combined codomain dimension of the input functions is reduced from \( 2(n_{d} + 1)\) to just two. Furthermore, \( \Gamma\) reflects the choice of window size in (11). This choice enables an extension of the learning objective for open-loop prediction (7), such that \( \mathcal{N}_\psi\) is obtained as the minimizer of

\[ \begin{equation} \begin{array}{l} \displaystyle \min_{\psi}\,\mathbb{E}_{a\sim\mu_{\text{SUMO}}}\bigg[\sum\limits_{k=1}^{n_{d}+1}\int\limits_{\Omega} \|J_k[a](\psi; x)\|^2\,dx\bigg],\\[2mm] \displaystyle J_{k}[a](\psi; \cdot) \triangleq \mathcal{G}_{k}[a](\cdot) - \mathcal{N}_\psi\big[\mathcal{G}_{\theta}[a], \mathcal{E}[\mathcal{G}_{\theta}[a], \rho_{\mathcal{D}}[a]]\big](\cdot, k\Delta t), \end{array} \end{equation} \]

(14)

where \( \rho_{\mathcal{D}}[a]\) is a data-based estimate of the true density \( \mathcal{G}_{1:n_{d}+1}[a]\) using samples at the sensor locations, \( \{\mathcal{G}_{1:n_{d}+1}[a](x)\}_{x\in \mathcal{M}}\) . Comparing the objective function of \( \mathcal{N}_\psi\) (14) with that of the open-loop predictor \( \mathcal{G}_{\theta}\) (7), we observe that \( \mathcal{N}_\psi\) is trained to refine the prediction of \( \mathcal{G}_{\theta}\) . This refinement is obtained by incorporating measurements in a Luenberger-like manner through \( \mathcal{E}\) .
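A discretized version of objective (14) for a single sample \( a\) might be computed as follows. This is a sketch: `true_rollout` and `corrected_rollout` are hypothetical arrays standing for \( \mathcal{G}_{1:n_d+1}[a]\) and the output of \( \mathcal{N}_\psi\) evaluated on the grid.

```python
import numpy as np

def correction_loss(true_rollout, corrected_rollout, dx):
    """Riemann-sum discretization of objective (14): the squared L2(Omega)
    norm of the residual J_k, summed over the steps k = 1, ..., n_d + 1.

    true_rollout, corrected_rollout: arrays of shape (n_d + 1, |X|).
    dx: cell length used to approximate the integral over Omega.
    """
    residual = true_rollout - corrected_rollout
    return float(np.sum(residual ** 2) * dx)
```

In training, this per-sample loss would be averaged over simulations drawn from \( \mu_{\text{SUMO}}\) to approximate the expectation.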

4 Numerical Simulations and Discussion

Before presenting the experimental results, the implementation details are outlined. We then illustrate the observers on a single test example; analyze their performance in the noiseless, noisy, and OOD settings; and analyze the input-to-state stability.

4.1 Implementation Details

The SUMO simulations are generated on a circular road of length \( L=6.2\) km discretized into 123 cells \( \mathcal{X}\) with six equidistant sensors \( \mathcal{M}\) . The simulations are run for a total time of \( T=40\) minutes divided into time steps of length \( \Delta t=1\) s, following [17]. The training dataset contains 20 noiseless, independent simulations per average density \( \rho\in\{0.1, 0.2, \dots, 0.8\}\) , split into a total of 3360 non-overlapping input-output pairs \( \big(\zeta_0^j, [\rho^j_{k\Delta t}]_{k=1}^{n_{d} + 1}\big)\) . We found that \( n=10\) and \( n_{d}+1=100\) work well in this setting.
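The figure of 3360 pairs is consistent with these settings, as a quick back-of-the-envelope check shows (assuming each non-overlapping pair consumes \( n\) input steps plus \( n_{d}+1\) output steps of a simulation):

```python
# Sanity check of the dataset size.
T_steps = 40 * 60              # 40 minutes at dt = 1 s -> 2400 steps
n, horizon = 10, 100           # input window n and horizon n_d + 1
pairs_per_sim = T_steps // (n + horizon)   # non-overlapping pairs per simulation
n_sims = 20 * 8                # 20 simulations per average density in {0.1, ..., 0.8}
total_pairs = pairs_per_sim * n_sims
print(pairs_per_sim, total_pairs)   # -> 21 3360
```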

The test dataset consists of 10 simulations per average density \( \rho\in\{0.3, \dots, 0.8\}\) , focusing on the settings where congestion occurs. In addition, we test the observers on a noisy test set and a noiseless OOD test set. The noisy data is produced by adding i.i.d. Gaussian noise with standard deviation \( \sigma=0.1\) to the measurements. The OOD data is generated by varying parameters in SUMO to induce more traffic jams.
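The noisy test set can be reproduced along these lines (a sketch; the array shape and seed are assumptions, with 2400 time steps and six sensors matching the setup above):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
clean = rng.random((2400, 6))                       # hypothetical clean sensor readings
noisy = clean + rng.normal(0.0, 0.1, clean.shape)   # i.i.d. Gaussian noise, sigma = 0.1
```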

The approximate solution operator \( \mathcal{G}_{\theta}\) and correction operator \( \mathcal{N}_\psi\) share a similar FNO structure: the lifting function is a linear layer \( P\) of output dimension 16, and the projection function is a fully-connected neural network \( Q\) with one hidden layer of size 128. In between, \( \mathcal{G}_{\theta}\) has four spectral convolution layers \( \mathcal{L}_{l}\) , where the \( W_l\) are one-dimensional convolution layers of sizes \( [24, 24, 32, 32]\) and the Fourier expansions are truncated to \( k_{\max}\) modes \( [15, 12, 9, 9]\) . The correction operator \( \mathcal{N}_\psi\) has two spectral convolution layers \( \mathcal{L}_{l}\) , where the \( W_l\) are two-dimensional convolution layers of sizes \( [24, 32]\) and the Fourier expansions are truncated to \( k_{\max}\) modes \( [15, 9]\) . For both operators, we use GELU activation functions and pass the output through a sigmoid to impose \( \hat{\rho}\in[0, 1]\) . GP regression is implemented using a prior with zero mean function and a squared exponential kernel with length scale 1; we sample from the posterior once during training and take the posterior mean function during inference.
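The GP regression step can be sketched as follows. The sensor positions, readings, and coordinate normalization below are hypothetical; during inference the posterior mean would be evaluated on the 123-cell grid.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared exponential kernel, as used for the GP prior."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior_mean(x_sensors, y_sensors, x_query, jitter=1e-6):
    """Posterior mean of a zero-mean GP conditioned on the sensor readings."""
    K = rbf_kernel(x_sensors, x_sensors) + jitter * np.eye(len(x_sensors))
    K_star = rbf_kernel(x_query, x_sensors)
    return K_star @ np.linalg.solve(K, y_sensors)

# Hypothetical example: six equidistant sensors on a road of length 6.2 km,
# queried on a grid of 123 cells.
x_m = np.linspace(0.0, 6.2, 6)
y_m = np.array([0.2, 0.5, 0.4, 0.7, 0.3, 0.2])
x_grid = np.linspace(0.0, 6.2, 123)
rho_hat = gp_posterior_mean(x_m, y_m, x_grid)
```

During training one would instead draw a sample from the posterior, as described above.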

We use the Adam optimizer and train for 500 epochs. The training and testing were run on a laptop with an Intel(R) Core(TM) i7-1370P CPU @ 1.90 GHz and 14 processing cores. The code and experiments are available on GitHub for reproduction.

4.2 Results

A comparison between the open-loop observer \( \hat{\mathcal{S}}^{ol}\) , open-loop observer with reset \( \hat{\mathcal{S}}^{ol\text{-}r}\) , and closed-loop observer \( \hat{\mathcal{S}}^{cl}\) applied to a single test example is shown in Fig. 3. We observe that \( \hat{\mathcal{S}}^{ol}\) quickly diverges, while \( \hat{\mathcal{S}}^{ol\text{-}r}\) generates disconnected, volatile predictions. In contrast, \( \hat{\mathcal{S}}^{cl}\) maintains stability while producing consistent predictions over the spatiotemporal domain. This highlights the benefits of \( \hat{\mathcal{S}}^{cl}\) : the continuous calibration of the autoregressive inputs both improves the prediction estimates and counteracts instability. Equipped with a stable autoregressive mechanism, \( \hat{\mathcal{S}}^{cl}\) enables physically coherent predictions between the sparse measurements. We continue by exploring the performance across the test set in terms of accuracy, robustness, and input-to-state stability.

Traffic density estimation with observer variants. The true density is shown in the top left figure, where measurement locations are indicated in black. The density estimates are shown for the open-loop observer (top right), the open-loop observer with reset (bottom left), and the closed-loop observer (bottom right).
Figure 3. Traffic density estimation with observer variants. The true density is shown in the top left figure, where measurement locations are indicated in black. The density estimates are shown for the open-loop observer (top right), the open-loop observer with reset (bottom left), and the closed-loop observer (bottom right).

The prediction accuracy of the observers across the test set in the noiseless, noisy, and OOD settings is shown in Fig. 4. In the noiseless setting, the closed-loop observer \( \hat{\mathcal{S}}^{cl}\) generally performs better than the open-loop observer \( \hat{\mathcal{S}}^{ol}\) and the open-loop observer with reset \( \hat{\mathcal{S}}^{ol\text{-}r}\) , since the median error is lower. The poor performance of \( \hat{\mathcal{S}}^{ol}\) is explained by the instability caused by disturbance in the initial state. The improved performance of \( \hat{\mathcal{S}}^{cl}\) over \( \hat{\mathcal{S}}^{ol\text{-}r}\) can be explained by the fact that \( \hat{\mathcal{S}}^{cl}\) benefits from the full historical dataset, as previous corrections influence all subsequent predictions through the continuous chain of autoregression.

Moreover, \( \hat{\mathcal{S}}^{cl}\) is generally robust to noise, as the median error remains around the same level when the observer is applied to noisy data. This is not the case for \( \hat{\mathcal{S}}^{ol\text{-}r}\) , which sees a significant degradation in performance. This can be explained by the strong dependence of \( \hat{\mathcal{S}}^{ol\text{-}r}\) on a few recent measurements, rendering it vulnerable to noise. Instead, \( \hat{\mathcal{S}}^{cl}\) weighs the recent measurements against the prediction suggested by past measurements through the correction operator, resulting in a more accurate estimate. Furthermore, the accuracy of \( \hat{\mathcal{S}}^{cl}\) remains largely unchanged on the OOD dataset, suggesting that it is robust to distribution shifts. Overall, our simulations indicate that \( \hat{\mathcal{S}}^{cl}\) is both more accurate and more resilient to noise than \( \hat{\mathcal{S}}^{ol}\) and \( \hat{\mathcal{S}}^{ol\text{-}r}\) , as well as robust to distributional changes.

Robustness of the observers under noise and disturbance.
Figure 4. Robustness of the observers under noise and disturbance.

Finally, the input-to-state stability of the observers is analyzed. In Fig. 5, the evolution of the prediction accuracy is compared across the observers as they are rolled out. The error grows unboundedly for the open-loop observer \( \hat{\mathcal{S}}^{ol}\) , while it remains stable for the open-loop observer with reset \( \hat{\mathcal{S}}^{ol\text{-}r}\) , and gradually decreases for the closed-loop observer \( \hat{\mathcal{S}}^{cl}\) . By construction, \( \hat{\mathcal{S}}^{ol}\) is completely dependent on an imperfect estimate of the initial state, which gives rise to the unbounded growth of the error. Meanwhile, \( \hat{\mathcal{S}}^{ol\text{-}r}\) performs the same operation iteratively, so the error naturally remains stable. In contrast, \( \hat{\mathcal{S}}^{cl}\) cumulatively integrates data, resulting in an increasingly informed prediction manifested by a decreasing error. Our simulations thus indicate that \( \hat{\mathcal{S}}^{cl}\) exhibits the classical closed-loop property of ultimate boundedness of the error.

Evolution of prediction accuracy over the observation horizon.
Figure 5. Evolution of prediction accuracy over the observation horizon.

5 Conclusion

In this work, we introduced a data-driven, closed-loop observer for traffic density estimation using sparse stationary sensor measurements. Our method trains a Fourier neural operator to learn macroscopic traffic flow dynamics from microscopic simulations, and integrates it with a correction operator to update predictions with real-time data. Numerical simulations show that the closed-loop observer outperforms open-loop alternatives in terms of accuracy, robustness to noise, and stability over time, and indicate that the closed-loop observer is robust to distribution shifts.

This study highlights the potential of this kind of learning-based, closed-loop observer and opens several interesting research directions. Future work includes extending the observer to mobile sensors, leveraging traffic data with varying resolution to learn the dynamics, and conducting formal analyses of convergence and stability.

References

[1] A. Ferrara, S. Sacone, and S. Siri, Freeway Traffic Modelling and Control. Springer International Publishing, 2018.

[2] T. Seo, A. M. Bayen, T. Kusakabe, and Y. Asakura, “Traffic state estimation on highway: A comprehensive survey,'' Annual Reviews in Control, 2017.

[3] S. K. Godunov and I. Bohachevsky, “Finite difference method for numerical computation of discontinuous solutions of the equations of fluid dynamics,'' Matematičeskij sbornik, 1959.

[4] L. Muñoz, X. Sun, R. Horowitz, and L. Alvarez, “Traffic density estimation with the cell transmission model,'' in Proceedings of the American Control Conference, 2003.

[5] X. Sun, L. Muñoz, and R. Horowitz, “Highway traffic state estimation using improved mixture Kalman filters for effective ramp metering control,'' in 42nd IEEE Conference on Decision and Control, 2003.

[6] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: A deep learning approach,'' IEEE Transactions on Intelligent Transportation Systems, 2015.

[7] X. Yin, G. Wu, J. Wei, Y. Shen, H. Qi, and B. Yin, “Deep learning on traffic prediction: Methods, analysis, and future directions,'' IEEE Transactions on Intelligent Transportation Systems, 2022.

[8] M. Barreau, M. Aguiar, J. Liu, and K. H. Johansson, “Physics-informed learning for identification and state reconstruction of traffic density,'' in 60th IEEE Conference on Decision and Control, 2021.

[9] Y. Yuan, Z. Zhang, X. T. Yang, and S. Zhe, “Macroscopic traffic flow modeling with physics regularized Gaussian process: A new insight into machine learning applications in transportation,'' Transportation Research Part B: Methodological, 2021.

[10] M. U. B. Niazi, J. Cao, M. Barreau, and K. H. Johansson, “KKL,'' arXiv preprint arXiv:2501.11655, 2025.

[11] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,'' arXiv preprint arXiv:2010.08895, 2020.

[12] B. Thonnam Thodi, S. V. R. Ambadipudi, and S. E. Jabari, “Fourier,'' Transportation Research Part C: Emerging Technologies, 2024.

[13] M. J. Lighthill and G. B. Whitham, “On kinematic waves II. A theory of traffic flow on long crowded roads,'' Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 1955.

[14] C. M. Dafermos, Hyperbolic Conservation Laws in Continuum Physics. Springer Berlin Heidelberg, 2000.

[15] S. Krauß, “Microscopic modeling of traffic flow: Investigation of collision free vehicle dynamics,'' Hauptabteilung Mobilität und Systemtechnik des DLR Köln, 1998.

[16] P. Alvarez Lopez et al., “Microscopic traffic simulation using SUMO,'' in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018.

[17] M. Barreau, “SUMOsimulator,'' https://github.com/mBarreau/SUMOsimulator, 2025.

[18] R. Scheepens et al., “Composite density maps for multivariate trajectories,'' IEEE Transactions on Visualization and Computer Graphics, 2011.

[19] N. Kovachki, S. Lanthaler, and S. Mishra, “On universal approximation and error bounds for Fourier neural operators,'' Journal of Machine Learning Research, 2021.

[20] V. Gopakumar, A. Gray, J. Oskarsson, L. Zanisi, S. Pamela, D. Giles, M. Kusner, and M. P. Deisenroth, “Uncertainty quantification of surrogate models using conformal prediction,'' arXiv preprint arXiv:2408.09881, 2024.

[21] M. McCabe, P. Harrington, S. Subramanian, and J. Brown, “Towards stability of autoregressive neural operators,'' Transactions on Machine Learning Research, 2023.