First published on Wednesday, Jun 3, 2026 and last modified on Wednesday, Jun 3, 2026 by François Chaplais.

Mitigating Gradient Pathology in PINNs through Aligned Constraint

Yichen Luo, Department of Information Science and Engineering, KTH Royal Institute of Technology, Stockholm, Sweden
Peiyu Zhu, School of Advanced Manufacturing and Robotics, Peking University, Beijing, China
Dongxiao Hu, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China
Jia Wang, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China
Tailin Wu, Department of AI, School of Engineering, Westlake University, Hangzhou, China
Dapeng Lan, Techforgood AS, Oslo, Norway
Yu Liu, Techforgood AS, Oslo, Norway
Zhibo Pang, School of Advanced Manufacturing and Robotics, Peking University, Beijing, China

Keywords: Machine Learning, ICML

Abstract

Figure 2. (a) Function Space; (b) Parameter Space.

Table 1. Hessian-based analysis of the loss landscape under three different additive constants in both function and parameter space.
		Condition Number		Subspace Similarity
Additive constant		Func. Space	Param. Space	Func. Space	Param. Space
0	3.97e-7	\( \infty\)	\( 1.50 \times 10^{23}\)	\( -\)	\( -\)
1	1.07e-5	\( \infty\)	\( 5.97 \times 10^{22}\)	100.0%	50.8%
-1	5.24e-8	\( \infty\)	\( 1.05 \times 10^{24}\)	100.0%	44.8%

Figure 5. (a) Unique (Well-Posed); (b) Non-Unique (Ill-Posed).

Table 2. Experimental results on different PDEs with various backbone architectures. Stp: the first iteration step at which the relative \( L_2\) error falls below the threshold (lower is better). \( L_2\): mean relative \( L_2\) error with standard deviation at a fixed iteration \( T_\mathrm{min}\) (lower is better). The best results are highlighted in bold, and the second-best results are underlined. If any one of the five seeds does not reach the threshold within the fixed iteration budget \( T_\mathrm{max}\), the configuration is treated as unsuccessful (reported as ‘\( -\)’).
		MLP				PirateNets				PINNsFormer
		Heat	Pois.	NS	Helm.	Heat	Pois.	NS	Helm.	Heat	Pois.	NS	Helm.
PINN [1]	Stp	10127\( _{\pm 2839}\)	\( -\)	8005\( _{\pm 1191}\)	15375\( _{\pm 3583}\)	1141\( _{\pm 11}\)	\( -\)	8678\( _{\pm 5523}\)	3875\( _{\pm 2445}\)	1599\( _{\pm 99}\)	3197\( _{\pm 296}\)	389\( _{\pm 17}\)	\( -\)
	\( L_2\)	7.52e-3\( _{\pm 4\text{e-}3}\)	\( -\)	1.11e-2\( _{\pm 5\text{e-}3}\)	2.21e-2\( _{\pm 1\text{e-}2}\)	8.56e-3\( _{\pm 5\text{e-}3}\)	\( -\)	8.48e-2\( _{\pm 6\text{e-}2}\)	7.29e-3\( _{\pm 1\text{e-}2}\)	3.40e-2\( _{\pm 3\text{e-}2}\)	2.86e-1\( _{\pm 7\text{e-}2}\)	4.49e-3\( _{\pm 1\text{e-}3}\)	\( -\)
\( L^\infty\)-PINN [17]	Stp	\( -\)	12648\( _{\pm 6212}\)	4768\( _{\pm 552}\)	2801\( _{\pm 205}\)	7721\( _{\pm 2881}\)	\( -\)	5545\( _{\pm 5413}\)	1501\( _{\pm 595}\)	1912\( _{\pm 290}\)	3142\( _{\pm 231}\)	415\( _{\pm 97}\)	3435\( _{\pm 1058}\)
	\( L_2\)	\( -\)	5.07e-1\( _{\pm 2\text{e-}1}\)	8.02e-3\( _{\pm 3\text{e-}3}\)	1.34e-3\( _{\pm 4\text{e-}4}\)	1.01e-1\( _{\pm 9\text{e-}2}\)	\( -\)	9.12e-2\( _{\pm 7\text{e-}2}\)	4.92e-3\( _{\pm 4\text{e-}3}\)	1.41e-1\( _{\pm 1\text{e-}1}\)	2.00e-1\( _{\pm 2\text{e-}2}\)	1.38e-2\( _{\pm 1\text{e-}2}\)	1.97e-3\( _{\pm 5\text{e-}4}\)
SA-PINN [34]	Stp	8958\( _{\pm 2501}\)	8355\( _{\pm 3867}\)	6888\( _{\pm 2902}\)	8959\( _{\pm 2507}\)	4713\( _{\pm 1043}\)	9501\( _{\pm 3755}\)	8930\( _{\pm 5914}\)	4598\( _{\pm 3340}\)	1649\( _{\pm 227}\)	1888\( _{\pm 118}\)	330\( _{\pm 9}\)	\( -\)
	\( L_2\)	4.71e-3\( _{\pm 2\text{e-}3}\)	1.95e-1\( _{\pm 2\text{e-}1}\)	9.79e-3\( _{\pm 4\text{e-}3}\)	8.07e-3\( _{\pm 5\text{e-}3}\)	8.64e-2\( _{\pm 1\text{e-}1}\)	6.13e-1\( _{\pm 5\text{e-}1}\)	8.71e-2\( _{\pm 6\text{e-}2}\)	6.44e-3\( _{\pm 4\text{e-}3}\)	1.23e-2\( _{\pm 1\text{e-}3}\)	4.41e-2\( _{\pm 1\text{e-}2}\)	6.76e-3\( _{\pm 5\text{e-}3}\)	\( -\)
BRDR [21]	Stp	15213\( _{\pm 2175}\)	6137\( _{\pm 795}\)	5462\( _{\pm 605}\)	13131\( _{\pm 2505}\)	5245\( _{\pm 1058}\)	8276\( _{\pm 1326}\)	9427\( _{\pm 4083}\)	5831\( _{\pm 6922}\)	1941\( _{\pm 203}\)	2333\( _{\pm 214}\)	367\( _{\pm 13}\)	\( -\)
	\( L_2\)	5.58e-3\( _{\pm 2\text{e-}3}\)	8.51e-2\( _{\pm 7\text{e-}2}\)	8.66e-3\( _{\pm 3\text{e-}3}\)	2.05e-2\( _{\pm 1\text{e-}2}\)	5.51e-1\( _{\pm 3\text{e-}1}\)	5.20e-1\( _{\pm 3\text{e-}1}\)	9.26e-2\( _{\pm 6\text{e-}2}\)	6.81e-3\( _{\pm 1\text{e-}2}\)	1.67e-2\( _{\pm 1\text{e-}2}\)	8.65e-2\( _{\pm 4\text{e-}2}\)	6.25e-3\( _{\pm 1\text{e-}3}\)	\( -\)
DB-PINN [23]	Stp	6988\( _{\pm 329}\)	9451\( _{\pm 6738}\)	3981\( _{\pm 294}\)	11005\( _{\pm 1662}\)	12689\( _{\pm 6402}\)	\( -\)	6054\( _{\pm 4412}\)	2304\( _{\pm 1113}\)	1159\( _{\pm 132}\)	\( -\)	331\( _{\pm 36}\)	4882\( _{\pm 245}\)
	\( L_2\)	6.82e-2\( _{\pm 1\text{e-}2}\)	4.56e-1\( _{\pm 2\text{e-}1}\)	7.89e-3\( _{\pm 2\text{e-}3}\)	1.22e-2\( _{\pm 1\text{e-}2}\)	3.12e-1\( _{\pm 2\text{e-}1}\)	\( -\)	3.47e-2\( _{\pm 2\text{e-}2}\)	3.69e-3\( _{\pm 2\text{e-}3}\)	1.67e-3\( _{\pm 8\text{e-}3}\)	\( -\)	5.69e-3\( _{\pm 2\text{e-}3}\)	5.37e-3\( _{\pm 2\text{e-}3}\)
CAML (Ours)	Stp	1577\( _{\pm 130}\)	3868\( _{\pm 1932}\)	1269\( _{\pm 966}\)	1004\( _{\pm 450}\)	713\( _{\pm 496}\)	4880\( _{\pm 726}\)	2597\( _{\pm 748}\)	410\( _{\pm 401}\)	746\( _{\pm 115}\)	1936\( _{\pm 78}\)	201\( _{\pm 54}\)	1659\( _{\pm 1604}\)
	\( L_2\)	1.16e-3\( _{\pm 8\text{e-}5}\)	5.00e-3\( _{\pm 9\text{e-}4}\)	4.73e-3\( _{\pm 3\text{e-}4}\)	1.56e-4\( _{\pm 5\text{e-}5}\)	4.98e-3\( _{\pm 4\text{e-}3}\)	3.24e-2\( _{\pm 1\text{e-}2}\)	2.79e-2\( _{\pm 2\text{e-}2}\)	1.19e-3\( _{\pm 5\text{e-}4}\)	4.04e-3\( _{\pm 9\text{e-}4}\)	8.29e-2\( _{\pm 1\text{e-}2}\)	4.07e-3\( _{\pm 2\text{e-}3}\)	9.10e-4\( _{\pm 3\text{e-}4}\)
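
The two metrics defined in the caption of Table 2 can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names, the 1-based step indexing, and the use of the population standard deviation are assumptions made here.

```python
import numpy as np

def relative_l2(pred, ref):
    """Relative L2 error ||pred - ref||_2 / ||ref||_2."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))

def first_step_below(errors, threshold):
    """1-based index of the first step whose error is below threshold, else None."""
    for step, err in enumerate(errors, start=1):
        if err < threshold:
            return step
    return None

def summarize_stp(per_seed_errors, threshold):
    """Mean and std of Stp over seeds; None if any seed never reaches the threshold.

    Returning None mirrors the '-' entries: one failing seed marks the whole
    configuration as unsuccessful.
    """
    steps = [first_step_below(errs, threshold) for errs in per_seed_errors]
    if any(s is None for s in steps):
        return None
    return float(np.mean(steps)), float(np.std(steps))
```

Here `per_seed_errors` is a hypothetical list holding one error history per seed, truncated at \( T_\mathrm{max}\).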

Table 3. Full-process gradient cosine similarity \( \cos(\phi)\) on Heat and Poisson benchmarks. \( \cos(\phi)\) is only a diagnostic statistic, and should not be interpreted as a performance metric.
		MLP		PirateNets		PINNsFormer
		Heat	Pois.	Heat	Pois.	Heat	Pois.
PINN	[1]	5.89%	3.85%	1.44%	4.09%	5.13%	7.07%
\( L^\infty\) -PINN	[17]	11.98%	62.81%	40.22%	84.08%	54.68%	87.78%
SA-PINN	[34]	4.95%	9.73%	5.72%	11.18%	5.27%	6.98%
BRDR	[21]	4.57%	21.28%	6.56%	4.63%	4.02%	8.00%
DB-PINN	[23]	0.02%	3.19%	7.82%	10.01%	5.17%	16.32%
CAML	(Ours)	39.16%	52.75%	39.41%	31.58%	41.45%	55.50%
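
A cosine similarity between two loss gradients can be computed as in the sketch below. It assumes that \( \cos(\phi)\) compares the gradient of the PDE residual loss with the gradient of the boundary loss, both taken with respect to the network parameters; the table does not state this, nor how the per-step values are aggregated over the full process, so the function only covers a single step. The name `grad_cosine` and the `eps` guard are illustrative.

```python
import numpy as np

def grad_cosine(grads_a, grads_b, eps=1e-12):
    """Cosine of the angle between two gradients given as lists of per-layer arrays."""
    # Flatten every per-layer gradient and join them into one parameter vector.
    a = np.concatenate([np.ravel(g) for g in grads_a])
    b = np.concatenate([np.ravel(g) for g in grads_b])
    # eps avoids division by zero when one gradient vanishes.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

Values near 1 indicate that the two loss terms pull the parameters in similar directions, and negative values indicate conflicting directions.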

Figure 16. (a) Heat Benchmark; (b) Poisson Benchmark.

Table 4. Ablation study on Heat and Poisson benchmarks (MLP backbone). The definitions of Stp, \( L_2\), and \( \cos(\phi)\) are the same as in the previous experiments. Configurations that cannot achieve the expected accuracy within the maximum training budget \( T_\mathrm{max}\) are indicated by ‘\( -\)’.
	Heat			Poisson
	Stp	\( L_2\)	\( \cos(\phi)\)	Stp	\( L_2\)	\( \cos(\phi)\)
PINN	10127\( _{\pm 2839}\)	7.52e-3\( _{\pm 4\text{e-}3}\)	5.89%	\( -\)	\( -\)	3.85%
AC-Only	1491\( _{\pm 186}\)	1.15e-3\( _{\pm 6\text{e-}5}\)	36.12%	\( -\)	\( -\)	45.22%
DR-Only	8367\( _{\pm 991}\)	7.16e-3\( _{\pm 4\text{e-}3}\)	4.89%	9648\( _{\pm 3075}\)	4.63e-2\( _{\pm 3\text{e-}2}\)	14.62%
CAML	1577\( _{\pm 130}\)	1.16e-3\( _{\pm 8\text{e-}5}\)	39.16%	3868\( _{\pm 1932}\)	5.00e-3\( _{\pm 9\text{e-}4}\)	52.75%

Table 5. Comparison of CAML with different optimizers on the Heat benchmark (MLP backbone). The definitions of Stp, \( L_2\), and \( \cos(\phi)\) are the same as in the previous experiments. \( L_2\) is reported at both \( T_\mathrm{min}\) and \( T_\mathrm{max}\).
		Stp	\( L_2@T_{\min}\)	\( L_2@T_{\max}\)	\( \cos(\phi)\)
Adam	[36]	1577\( _{\pm 130}\)	1.16e-3\( _{\pm 8\text{e-}5}\)	1.12e-3\( _{\pm 4\text{e-}5}\)	39.16%
L-BFGS	[39]	55\( _{\pm 3}\)	6.93e-4\( _{\pm 3\text{e-}6}\)	6.93e-4\( _{\pm 2\text{e-}6}\)	26.00%
DCGD	[24]	991\( _{\pm 372}\)	1.15e-3\( _{\pm 8\text{e-}5}\)	1.08e-3\( _{\pm 3\text{e-}5}\)	36.37%
ConFIG	[25]	738\( _{\pm 238}\)	1.13e-3\( _{\pm 6\text{e-}5}\)	1.06e-3\( _{\pm 2\text{e-}5}\)	37.14%

Appendix

Algorithm 1 CAML: Constraint-Aligned Loss with Manifold Lifting

1. Input: PDE operator \( \mathcal{N}\), boundary operator \( \mathcal{B}\), network \( u_\theta\)

2. Parameters: weights \( w_\mathrm{res}, w_\mathrm{bc}\), delay schedule \( \lambda(t)\)

3. Initialize network parameters \( \theta\)

4. for training step \( t=1\) to \( T\) do

5. Sample interior points \( \{x_i\}\) and boundary points \( \{x_b\}\)

6. Compute residuals: \[ r_i = \mathcal{N}[u_\theta](x_i) - f(x_i), \quad s_b = \mathcal{B}[u_\theta](x_b) - g(x_b) \]

7. Cache all derivative terms \( \mathcal{D}[u_\theta](x_i)\) and \( \beta_b\, \nabla u_\theta(x_b)\cdot\mathbf{n}_b\)

8. Solve for the offset \( c\):

9. if zeroth-order terms are linear then

10. Compute \( c\) by closed-form weighted least squares

11. else

12. \( K \leftarrow K_{\mathrm{few}} \cdot \mathbb{I}(t<t_c) + K_{\mathrm{init}} \cdot \mathbb{I}(t=1)\)

13. Update \( c\) using \( K\) Newton steps on \( \mathcal{L}(c)\)

14. end if

15. Form the aligned residuals \( \bar r_i = r_i(u_\theta+c)\), \( \bar s_b = s_b(u_\theta+c)\), loading all derivative terms directly from the cache

16. \( \mathcal{L} = w_\mathrm{res}\lambda(t)\mathcal{L}_\mathrm{res}^\mathrm{alg} + w_\mathrm{bc}\mathcal{L}_\mathrm{bc}^\mathrm{alg}\)

17. Update \( \theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}\)

18. end for
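
For the linear branch of Algorithm 1 (steps 9 and 10), a closed-form offset can be sketched as below. The sketch assumes that shifting \( u_\theta\) by a constant \( c\) changes the residuals to \( r_i + a_i c\) and \( s_b + \alpha_b c\), where \( a_i\) and \( \alpha_b\) are the zeroth-order coefficients of the PDE and boundary operators; these symbols, the mean-squared objective, and the omission of \( \lambda(t)\) from the weights are assumptions made for illustration and may differ from the authors' derivation.

```python
import numpy as np

def closed_form_offset(r, a, s, alpha, w_res=1.0, w_bc=1.0):
    """Scalar c minimizing w_res*mean((r + a*c)^2) + w_bc*mean((s + alpha*c)^2).

    r, s      : residuals at interior / boundary points for the unshifted network
    a, alpha  : assumed zeroth-order coefficients (scalars or per-point arrays)
    """
    r, a, s, alpha = (np.asarray(v, dtype=float) for v in (r, a, s, alpha))
    # Setting the derivative of the quadratic objective in c to zero gives c = -num / den.
    num = w_res * np.mean(a * r) + w_bc * np.mean(alpha * s)
    den = w_res * np.mean(a * a) + w_bc * np.mean(alpha * alpha)
    if den == 0.0:
        return 0.0  # the constant does not enter any residual, so any c is optimal
    return float(-num / den)
```

As a sanity check under these assumptions, a PDE without a zeroth-order term (`a = 0`) and a Dirichlet boundary (`alpha = 1`) whose prediction is uniformly 2 too low yields `c = 2`, which cancels the boundary residual while leaving the cached derivative terms untouched.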

Table 6. Hyperparameters used for training the neural networks.
Category	Hyperparameter	Value
Network Architecture	Number of hidden layers	\( 3\)
	Neurons per hidden layer	\( 20\)
	Activation function	tanh
Training Setup	Optimizer	Adam
	Learning rate	\( 1\times10^{-3}\)
	Batch size	Full-batch
	Total training steps	\( 3000\)

Table 7. Hyperparameters used for training the neural networks.
Category	Hyperparameter	Value
Network Architecture	Number of hidden layers	\( 5\)
	Neurons per hidden layer	\( 80\)
	Activation function	tanh
Training Setup	Optimizer	Adam
	Learning rate	\( 1\times10^{-3}\)
	Batch size	Full-batch
	Total training steps	\( 6000\)
Sample Mode	Inner collocation points	\( 8000\)
	Boundary collocation points	\( 600\times 4\)

Figure 31. (a) Global Trajectory (Epoch 0-10000); (b) Local Trajectory (Epoch 3800-4000).

Table 8. Configuration of MLP across all benchmarks.
Configuration	Value
Number of hidden layers	\( 4\)
Neurons per hidden layer	\( 64\)
Activation function	tanh

Table 9. Configuration of PirateNets across all benchmarks.
Configuration	Value
Number of hidden layers	\( 3\)
Neurons per hidden layer	\( 32\)
Activation function	tanh

Table 10. Configuration of PINNsFormer across all benchmarks.
Configuration	Value
Number of attention layers in encoder	\( 2\)
Number of attention layers in decoder	\( 2\)
Head number in attention layer	\( 1\)
Neurons per hidden layer	\( 32\)
Activation function	WaveAct

Table 11. Hyperparameters of MLP across all benchmarks. Here, \( \eta\) is the learning rate, \( w_\mathrm{res}\) is the PDE residual loss weight, \( w_\mathrm{bc}\) is the boundary condition loss weight, \( T_\mathrm{min}\) is the minimum number of iterations, \( T_\mathrm{max}\) is the maximum number of iterations, and \( L_2^\mathrm{stop}\) is the precision threshold for terminating training.
Benchmark	\( \eta\)	\( w_\mathrm{res}\)	\( w_\mathrm{bc}\)	\( T_\mathrm{min}\)	\( T_\mathrm{max}\)	\( L_2^\mathrm{stop}\)
Heat	1.0e-3	1	5	6000	20000	2.0e-3
Poisson	1.0e-3	1	100	6000	20000	1.0e-2
NS	1.0e-3	1	100	6000	20000	5.0e-3
Helmholtz	1.0e-3	1	10	4000	20000	1.0e-3

Table 12. Hyperparameters of PirateNets across all benchmarks. Here, \( \eta\) is the learning rate, \( w_\mathrm{res}\) is the PDE residual loss weight, \( w_\mathrm{bc}\) is the boundary condition loss weight, \( T_\mathrm{min}\) is the minimum number of iterations, \( T_\mathrm{max}\) is the maximum number of iterations, and \( L_2^\mathrm{stop}\) is the precision threshold for terminating training.
Benchmark	\( \eta\)	\( w_\mathrm{res}\)	\( w_\mathrm{bc}\)	\( T_\mathrm{min}\)	\( T_\mathrm{max}\)	\( L_2^\mathrm{stop}\)
Heat	1.0e-3	1	1	6000	20000	1.0e-2
Poisson	1.0e-3	1	100	6000	20000	5.0e-2
NS	3.0e-3	1	100	6000	20000	1.0e-2
Helmholtz	1.0e-3	1	100	4000	20000	5.0e-3

Table 13. Hyperparameters of PINNsFormer across all benchmarks. Here, \( \eta\) is the learning rate, \( w_\mathrm{res}\) is the PDE residual loss weight, \( w_\mathrm{bc}\) is the boundary condition loss weight, \( T_\mathrm{min}\) is the minimum number of iterations, \( T_\mathrm{max}\) is the maximum number of iterations, and \( L_2^\mathrm{stop}\) is the precision threshold for terminating training.
Benchmark	\( \eta\)	\( w_\mathrm{res}\)	\( w_\mathrm{bc}\)	\( T_\mathrm{min}\)	\( T_\mathrm{max}\)	\( L_2^\mathrm{stop}\)
Heat	1.0e-3	1	1	2000	10000	1.0e-2
Poisson	1.0e-3	1	100	2000	10000	5.0e-2
NS	1.0e-3	1	100	2000	10000	1.0e-2
Helmholtz	1.0e-3	1	100	1500	10000	1.0e-3

Table 14. Additional hyperparameters of CAML across all benchmarks. Here, \( t_d\) and \( t_r\) are the delay step count and the ramp length of the delay-residual gating function, \( K_\mathrm{init}\) and \( K_\mathrm{few}\) are the numbers of initial and within-loop Newton steps for \( c\) in the non-linear PDE benchmark, and \( t_c\) is the iteration at which updates of \( c\) stop in the non-linear PDE benchmark.
Benchmark	\( t_d\)	\( t_r\)	\( K_\mathrm{init}\)	\( K_\mathrm{few}\)	\( t_c\)
Heat	25	50	\( -\)	\( -\)	\( -\)
Poisson	200	800	\( -\)	\( -\)	\( -\)
NS	25	50	10	2	1000
Helmholtz	25	50	\( -\)	\( -\)	\( -\)
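
The schedule hyperparameters in Table 14 can be illustrated with the sketch below. The gate shape is an assumption: the table only names a delay \( t_d\) and a ramp length \( t_r\), so a gate that is zero during the delay and then rises linearly to one is used here as one plausible form of \( \lambda(t)\). The Newton step count follows step 12 of Algorithm 1 as written. Both function names are hypothetical.

```python
def residual_gate(t, t_d, t_r):
    """Assumed gate: 0 for t <= t_d, linear ramp over t_r steps, then 1."""
    if t <= t_d:
        return 0.0
    if t_r <= 0 or t >= t_d + t_r:
        return 1.0
    return (t - t_d) / t_r

def newton_steps(t, k_init, k_few, t_c):
    """K = K_few * 1[t < t_c] + K_init * 1[t == 1], as in step 12 of Algorithm 1."""
    return k_few * int(t < t_c) + k_init * int(t == 1)
```

With the NS row of Table 14 (\( K_\mathrm{init}=10\), \( K_\mathrm{few}=2\), \( t_c=1000\)), `newton_steps` gives 12 steps at \( t=1\), 2 steps for \( 1 < t < 1000\), and 0 steps from \( t=1000\) onward.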

Table 15. The computational cost of CAML in linear PDEs. FP: the forward propagation process; AD: automatic differentiation of PDE residual and boundary conditions; AC: calculation of the additive constant \( c\) ; BP: the backward propagation process; Percentage: The proportion of time spent on calculating \( c\) in the total duration.
		FP	AD	AC	BP	Percentage
MLP	Single Iteration (ms)	0.1619	1.6322	0.1439	2.6541	3.13%
	Full Process (s)	1.1971	10.7513	0.9939	17.2778	3.29%
PirateNets	Single Iteration (ms)	0.7503	7.0681	0.1763	15.4657	0.75%
	Full Process (s)	6.1116	57.5850	1.1378	116.7246	0.63%
PINNsFormer	Single Iteration (ms)	2.3134	21.4542	0.1487	47.0843	0.21%
	Full Process (s)	6.8850	66.9518	0.4497	140.9999	0.21%
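
The Percentage column is consistent with reading it as AC divided by the sum of the four timed stages. This reading is inferred from the numbers rather than stated in the caption, and the helper name `ac_share` is illustrative.

```python
def ac_share(fp, ad, ac, bp):
    """Share of the additive-constant step in the total time, in percent."""
    return 100.0 * ac / (fp + ad + ac + bp)

# Single-iteration timings (ms) for the MLP row of Table 15.
mlp_single = ac_share(fp=0.1619, ad=1.6322, ac=0.1439, bp=2.6541)
print(f"{mlp_single:.2f}%")  # prints 3.13%
```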

Table 16. The computational cost of CAML in non-linear PDEs. FP: the forward propagation process; AD: automatic differentiation of PDE residual and boundary conditions; AC: calculation of the additive constant \( c\) ; BP: the backward propagation process; Percentage: The proportion of time spent on calculating \( c\) in the total duration.
			FP	AD	AC	BP	Percentage
\( K_{\mathrm{few}}=1\)	MLP	Single Iteration (ms)	0.1799	1.4494	1.0979	2.6280	20.50%
		Full Process (s)	1.2951	11.5819	3.9947	18.6597	11.24%
	PirateNets	Single Iteration (ms)	0.7328	8.0003	1.0322	14.8213	4.20%
		Full Process (s)	6.0822	55.2902	4.0421	112.3832	2.27%
	PINNsFormer	Single Iteration (ms)	2.5648	23.8952	1.8702	55.1467	2.24%
		Full Process (s)	7.2837	70.0029	0.6324	151.3379	0.28%
\( K_{\mathrm{few}}=2\)	MLP	Single Iteration (ms)	0.1895	1.4653	2.0220	2.7328	31.55%
		Full Process (s)	1.4321	12.8582	8.3121	21.5747	18.81%
	PirateNets	Single Iteration (ms)	0.7807	7.2948	2.3412	15.0520	9.19%
		Full Process (s)	6.1321	59.8055	9.3241	116.2348	4.87%
	PINNsFormer	Single Iteration (ms)	2.3818	21.5656	3.0841	47.1944	4.16%
		Full Process (s)	7.8837	71.2378	1.0790	155.3432	0.46%
\( K_{\mathrm{few}}=5\)	MLP	Single Iteration (ms)	0.1614	1.5289	5.0107	2.4561	54.72%
		Full Process (s)	1.3320	10.0022	19.7822	16.9772	41.13%
	PirateNets	Single Iteration (ms)	0.7361	8.2706	5.2069	15.1342	17.74%
		Full Process (s)	6.9929	57.9483	21.3250	118.1247	10.43%
	PINNsFormer	Single Iteration (ms)	2.3477	22.1690	6.0726	49.3347	7.60%
		Full Process (s)	6.4899	73.5383	2.7600	157.1937	1.15%
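
The per-iteration AC cost in Table 16 grows with \( K_\mathrm{few}\), which matches the picture of \( K\) Newton steps on the scalar objective \( \mathcal{L}(c)\). The sketch below illustrates such an update for a hypothetical residual of the form \( r_i(c) = d_i + g(u_i + c)\), where \( d_i\) stands for the cached terms that do not depend on \( c\) and \( g\) is a nonlinear zeroth-order term. This residual form, the mean-squared objective, and the curvature guard are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def newton_offset(c0, d, u, g, dg, d2g, num_steps):
    """Run num_steps Newton updates on L(c) = mean((d + g(u + c))^2).

    d        : cached residual part that does not depend on c (assumed form)
    u        : network outputs at the sample points
    g, dg, d2g : the nonlinear zeroth-order term and its first two derivatives
    """
    c = float(c0)
    for _ in range(num_steps):
        v = u + c
        r = d + g(v)
        grad = 2.0 * np.mean(r * dg(v))                   # L'(c)
        hess = 2.0 * np.mean(dg(v) ** 2 + r * d2g(v))     # L''(c)
        if hess <= 0.0:
            break  # guard: skip the update where the curvature is not positive
        c -= grad / hess
    return c
```

For example, with \( g(v) = v^3\) and cached terms constructed so that \( c = 0.5\) zeroes every residual, the iteration started at \( c = 0\) should approach 0.5 within a few tens of steps.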

Table 17. Comparison of CAML with a learnable output bias.
	Stp	\( L_2\)	\( \cos(\phi)\)
PINN	10127\( _{\pm 2839}\)	7.52e-3\( _{\pm 4\text{e-}3}\)	5.89%
PINN + Learnable Bias	9264\( _{\pm 1877}\)	6.53e-3\( _{\pm 2\text{e-}3}\)	16.13%
CAML (AC-Only)	1491\( _{\pm 186}\)	1.15e-3\( _{\pm 6\text{e-}5}\)	36.12%

Table 18. Sensitivity of CAML to the delay-residual schedule \( (t_d, t_r)\) on the Poisson benchmark (MLP backbone). The definitions of \( \text{Stp}\) and \( L_2\) are consistent with those in the main experiment. Configurations that cannot achieve the expected accuracy within the maximum training budget \( T_{\max}\) are indicated by ‘\( -\) ’.
\( t_d/t_r\)	0/0	40/160	80/320	120/480	160/640	200/800	800/3200
Stp	\( -\)	5521	4713	5595	4574	3868	1984
\( L_2\)	\( -\)	9.99e-3	6.25e-3	1.00e-2	6.98e-3	5.00e-3	2.98e-3

Table 19. Comparison of PINN and CAML (AC-Only) on the toy Poisson benchmark.
	Stp	\( L_2\)	\( \cos(\phi)\)	\( c\)
PINN	2472\( _{\pm 398}\)	3.96e-3\( _{\pm 9\text{e-}4}\)	19.88%	\( -\)
CAML (AC-Only)	2328\( _{\pm 221}\)	3.92e-3\( _{\pm 6\text{e-}4}\)	23.21%	3.16

Figure 34. (a) Ground Truth; (b) Prediction.

References

[1] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[2] Ehsan Haghighat, Maziar Raissi, Adrian Moure, Hector Gomez, and Ruben Juanes. A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Computer Methods in Applied Mechanics and Engineering, 379:113741, 2021.

[3] E. Samaniego, C. Anitescu, S. Goswami, V.M. Nguyen-Thanh, H. Guo, K. Hamdia, X. Zhuang, and T. Rabczuk. An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Computer Methods in Applied Mechanics and Engineering, 362:112790, 2020.

[4] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.

[5] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. NSFnets. Journal of Computational Physics, 426:109951, 2021.

[6] Shengze Cai, Zhicheng Wang, Sifan Wang, Paris Perdikaris, and George Em Karniadakis. Physics-Informed Neural Networks for Heat Transfer Problems. Journal of Heat Transfer, 143(6):060801, 2021.

[7] Youssef Haddout and Soufiane Haddout. Deep Physics-Informed Neural Networks for Stratified Forced Convection Heat Transfer in Plane Couette Flow: Toward Sustainable Climate Projections in Atmospheric and Oceanic Boundary Layers. Fluids, 10(12), 2025.

[8] Yuyao Chen, Lu Lu, George Em Karniadakis, and Luca Dal Negro. Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics Express, 28(8):11618–11633, 2020.

[9] Joowon Lim and Demetri Psaltis. MaxwellNet. APL Photonics, 2021.

[10] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W Mahoney. Characterizing possible failure modes in physics-informed neural networks. 35th International Conference on Neural Information Processing Systems (NeurIPS), 2021.

[11] Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021.

[12] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022.

[13] Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, and Madeleine Udell. Challenges in training PINNs: a loss landscape perspective. 41st International Conference on Machine Learning (ICML), 2024.

[14] Yesom Park, Changhoon Song, and Myungjoo Kang. Beyond derivative pathology of PINNs: Variable splitting strategy with convergence analysis. Journal of Machine Learning Research, 2024.

[15] Changhoon Song, Yesom Park, and Myungjoo Kang. How does PDE order affect the convergence of PINNs? 38th International Conference on Neural Information Processing Systems (NeurIPS), 2024.

[16] Jeremy Yu, Lu Lu, Xuhui Meng, and George Em Karniadakis. Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems. Computer Methods in Applied Mechanics and Engineering, 393:114823, 2022.

[17] Chuwei Wang, Shanda Li, Di He, and Liwei Wang. Is L2 physics-informed loss always suitable for training physics-informed neural network? 36th International Conference on Neural Information Processing Systems (NeurIPS), 2022.

[18] Apostolos F Psaros, Kenji Kawaguchi, and George Em Karniadakis. Meta-learning PINN loss functions. Journal of Computational Physics, 458:111121, 2022.

[19] Yiheng Du, Nithin Chalapathi, and Aditi S. Krishnapriyan. Neural Spectral Methods: Self-supervised learning in the spectral domain. 12th International Conference on Learning Representations (ICLR), 2024.

[20] Rui Zhang, Gordon P. Warn, and Aleksandra Radlińska. Physics-Informed Parallel Neural Networks with self-adaptive loss weighting for the identification of continuous structural systems. Computer Methods in Applied Mechanics and Engineering, 427:117042, 2024.

[21] Wenqian Chen, Amanda A. Howard, and Panos Stinis. Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks. Journal of Computational Physics, 542:114226, 2025.

[22] Bo Gao, Ruoxia Yao, and Yan Li. Physics-informed neural networks with adaptive loss weighting algorithm for solving partial differential equations. Computers and Mathematics with Applications, 181:216–227, 2025.

[23] Chenhong Zhou, Jie Chen, Zaifeng Yang, and Ching Eng Png. Dual-Balancing for Physics-Informed Neural Networks. 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025.

[24] Youngsik Hwang and Dong-Young Lim. Dual Cone Gradient Descent for Training Physics-Informed Neural Networks. 38th International Conference on Neural Information Processing Systems (NeurIPS), 2024.

[25] Qiang Liu, Mengyu Chu, and Nils Thuerey. ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks. 13th International Conference on Learning Representations (ICLR), 2025.

[26] Sifan Wang, Bowen Li, Yuhan Chen, and Paris Perdikaris. PirateNets. Journal of Machine Learning Research, 25(402):1–51, 2024.

[27] Luning Sun, Han Gao, Shaowu Pan, and Jian-Xun Wang. Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Computer Methods in Applied Mechanics and Engineering, 361:112732, 2020.

[28] Songming Liu, Zhongkai Hao, Chengyang Ying, Hang Su, Jun Zhu, and Ze Cheng. A unified hard-constraint framework for solving geometrically complex PDEs. 36th International Conference on Neural Information Processing Systems (NeurIPS), 2022.

[29] Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, and Bryan Kian Hsiang Low. PINNACLE. 12th International Conference on Learning Representations (ICLR), 2024.

[30] Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. 32nd International Conference on Neural Information Processing Systems (NeurIPS), 2018.

[31] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. 5th International Conference on Learning Representations (ICLR), 2017.

[32] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. Loss surfaces, mode connectivity, and fast ensembling of DNNs. 32nd International Conference on Neural Information Processing Systems (NeurIPS), 2018.

[33] Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. 9th International Conference on Learning Representations (ICLR), 2021.

[34] Levi D. McClenny and Ulisses M. Braga-Neto. Self-adaptive physics-informed neural networks. Journal of Computational Physics, 474:111722, 2023.

[35] Zhiyuan Zhao, Xueying Ding, and B. Aditya Prakash. PINN. 12th International Conference on Learning Representations (ICLR), 2024.

[36] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations (ICLR), 2015.

[37] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch. 33rd International Conference on Neural Information Processing Systems (NeurIPS), 2019.

[38] Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, and Balaji Lakshminarayanan. Adapting Auxiliary Losses Using Gradient Similarity. arXiv preprint arXiv:1812.02224, 2018.

[39] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989.

[40] Johannes Müller and Marius Zeinhofer. Achieving High Accuracy with PINNs via Energy Natural Gradient Descent. 40th International Conference on Machine Learning (ICML), 2023.

[41] Nima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang, and Charles X. Ling. Physics-informed neural networks: minimizing residual loss with wide networks and effective activations. 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024.

[42] N. Sukumar and Ankit Srivastava. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Computer Methods in Applied Mechanics and Engineering, 389:114333, 2022.

[43]

[44] Anima Anandkumar, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Nikola Kovachki, Zongyi Li, Burigede Liu, and Andrew Stuart. Neural Operator: Graph Kernel Network for Partial Differential Equations. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020. Workshop paper.

[45] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier Neural Operator for Parametric Partial Differential Equations. 9th International Conference on Learning Representations (ICLR), 2021.

[46] Fabricio Dos Santos, Tara Akhound-Sadegh, and Siamak Ravanbakhsh. Physics-Informed Transformer Networks. NeurIPS 2023 Workshop on The Symbiosis of Deep Learning and Differential Equations III, 2023. Workshop paper.

[47] Hailong Sheng and Chao Yang. PFNN. Journal of Computational Physics, 428:110085, 2021.

[48] Lu Lu, Raphaël Pestourie, Wenjie Yao, Zhicheng Wang, Francesc Verdugo, and Steven Johnson. Physics-Informed Neural Networks with Hard Constraints for Inverse Design. SIAM Journal on Scientific Computing, 43:B1105–B1132, 2021.

[49] Yuchen Xie, Yu Ma, and Yahui Wang. Automatic boundary fitting framework of boundary dependent physics-informed neural network solving partial differential equation with complex boundary conditions. Computer Methods in Applied Mechanics and Engineering, 414:116139, 2023.

[50] Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, and Mingsheng Long. Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers. 42nd International Conference on Machine Learning (ICML), 2025.

Table of contents

Abstract

1 Main result

1.1 Introduction

1.2 Related Work

1.2.1 General Optimization Strategies

1.2.2 Domain-Specific Hard Constraint Methods

1.2.3 Training Enhancements

1.3 Preliminaries and Problem Setup

1.4 Theoretical Analysis

1.4.1 Loss Landscape Perspective

1.4.1.1 PDE Residual Term

1.4.1.2 Boundary Condition Terms

1.4.2 Optimization Dynamics Analysis

1.4.2.1 Phase I: Rapid Descent into Loss Valley

1.4.2.2 Phase II: Meandering Within the Valley

1.5 Methodology

1.5.1 Aligned Constraints

1.5.2 Delay Factor for Residual Loss

1.6 Experiments and Results

1.6.1 Benchmarks and Experiments Setup

1.6.2 Results and Discussions

1.6.3 Ablation Study

1.6.4 Optimizer Selection

1.6.5 Applicability and Failure Discussion

1.7 Conclusion

Acknowledgements

Impact Statement

Author Contributions

A Theoretical Proof

A.1 Existence of Loss Valleys in PDE Residual Term

A.1.1 Non-Uniqueness of Solutions

A.1.2 Functional Non-Uniqueness and Flat Minima

A.1.3 Mapping Functional Valleys to Parameter Space

A.2 Structural Origin of Gradient Conflict in PINNs

A.2.1 Geometric Decomposition near the Low-Residual Manifold

A.2.2 Generic Gradient Conflict

A.3 Aligned Constraints and Enlarged Feasible Set

A.3.1 Enlargement of Sublevel Sets and Feasible Overlap

A.3.2 Intersection with the PDE Residual Manifold

B Methodological Supplement

B.1 Optimal Offset for Zeroth-Order Terms

B.1.1 Linear Case

B.1.2 Nonlinear Case

B.2 Pipeline

B.3 Discussion and Future Work

C Experimental Supplement

C.1 Loss Valley Visualization

C.1.1 Figure 1(a): Loss Landscape in Function Space

C.1.2 Figure 1(b): Loss Landscape in Parameter Space

C.1.3 Table 1: Data Statistics

C.2 Demonstration of Two-Phase Optimization Dynamics

C.2.1 Experiment Setup

C.2.2 Observed Two-Phase Behavior

C.3 Main Experimental Setup

C.3.1 Benchmarks

C.3.1.1 Heat.

C.3.1.2 Poisson.

C.3.1.3 Navier–Stokes (NS).

C.3.1.4 Helmholtz.

C.3.2 Model and Training Configuration

C.4 Computational Overhead

C.5 Difference with Learnable Bias

C.6 Parameter Sensitivity

C.7 Failure Experiment

C.8 Benchmark Visualization

References
