askadam.optmisation

Usage

obj = askadam;
out = obj.optimisation( data, mask, weights, parameters, fitting, FWDfunc, varargin);

I/O overview

Input	Description
data	(Masked) N-D (imaging) data
mask	(1-3)D signal mask applied on FWDfunc, NOTE this mask does NOT apply on data
weights	N-D weights, same dimension as ‘data’ (optional)
parameters	structure variable containing starting points of all model parameters to be estimated (optional)
fitting	structure contains fitting algorithm parameters
fitting.optimiser	Algorithm for parameter update, ‘adam’ (default) \| ‘sgdm’ \| ‘rmsprop’
fitting.model_params	1xM cell variable, name of the model parameters, e.g. {‘S0’,’R2star’}
fitting.lb	1xM numeric variable, fitting lower bound, same order as field ‘model_params’, e.g. [0.5, 0]
fitting.ub	1xM numeric variable, fitting upper bound, same order as field ‘model_params’, e.g. [2, 1]
fitting.isDisplay	boolean, display optimisation process in graphic plot
fitting.initialLearnRate	(initial) learn rate of Adam optimiser, default = 0.001
fitting.iteration	maximum number of optimisation iterations, default = 4000
fitting.tol	stop if total loss < tol, default = 1e-3
fitting.lambda	regularisation parameter(s), default = 0 (no regularisation)
fitting.regmap	model parameter(s) to which regularisation is applied
fitting.TVmode	mode for total variation (TV) regularisation, ‘2D’ (default) \| ‘3D’
fitting.lossFunction	loss function for data fidelity term, ‘L1’ (default) \| ‘L2’ \| ‘huber’ \| ‘mse’
fitting.randomness	randomness of starting point; 0 = fixed (default), 1 = fully random
fitting.debug	display extra messages and enable GPU memory tracking, default = false
FWDfunc	function handle for forward signal generation; output size must match size of ‘data’
varargin	additional input for FWDfunc other than ‘parameter’ and ‘mask’

Output	Description
out	structure contains optimisation result
out.final	output structure at final iteration
out.final.loss	total loss = loss_fidelity + loss_reg
out.final.loss_fidelity	loss of data consistency term
out.final.loss_reg	loss of regularisation term
out.final.(model_params{k})	estimated model parameter(s)
out.min	output structure at minimum loss iteration
out.min.(model_params{k})	estimated model parameter(s) at minimum loss iteration
out.final.memoryUsage	estimated GPU memory usage in GB (requires fitting.debug = true for full tracking)

Stopping criteria

askadam supports multiple stopping criteria that can be used independently or in combination. The optimisation terminates when any active criterion is satisfied.

Basic stopping criteria

These are always active.

Option	Default	Description
fitting.iteration	4000	Stop when the maximum number of iterations is reached
fitting.tol	1e-3	Stop when total loss falls below this threshold
fitting.convergenceValue	1e-8	Stop when the convergence signal falls below this threshold for `patienceConvergence` consecutive checks
fitting.patienceConvergence	5	Number of consecutive checks below `convergenceValue` required before stopping
fitting.patience	5	Shared default for all patience counters; individual patience values override this

Convergence model

Controls how the convergence signal is computed from the loss. Applies to the loss-based stopping criterion above.

Option	Default	Description
fitting.convergenceModel	‘ema’	Method for computing convergence signal from loss history. `'linear'`: slope of loss over last `convergenceWindow` iterations. `'ema'`: relative change in exponential moving average (EMA) of loss — more robust to short-term oscillations.
fitting.convergenceWindow	20	Number of iterations used to compute slope (`'linear'` model only)
fitting.emaDecay	0.95	EMA decay factor (`'ema'` model only); higher values smooth more aggressively

Robust convergence (v1.1)

When enabled, detects voxels that are not improving relative to the rest of the population and downweights their contribution to the gradient computation. The convergence signal is then computed on the main (non-outlier) population only, preventing a small number of stuck voxels from masking genuine convergence of the majority.

Outlier classification is based on two independent criteria, both of which must be satisfied for a voxel to be flagged:

Criterion A: the voxel has improved by less than outlierVoxelThres over the last outlierCheckWindow checks, while the median voxel has improved by more than outlierPopThres.
Criterion B: the voxel has improved by less than outlierInitThres relative to its own loss at initialisation, while the median voxel has improved by more than outlierInitPopThres.

Once flagged, a voxel remains downweighted for at least outlierMinFlagDuration checks before it can be reinstated, giving the downweighting time to take effect.

Note

Outlier downweighting applies to the data fidelity gradient only. TV regularisation gradients are unaffected. The outlier classification lags by one weightUpdateInterval because extractdata breaks the autodiff graph — this is intentional.

Option	Default	Description
fitting.robustConvergence	false	Enable robust convergence mode
fitting.outlierWeight	0.1	Gradient contribution of outlier voxels relative to main population (0-1)
fitting.weightUpdateInterval	5	Number of iterations between outlier mask and weight updates
fitting.outlierCheckWindow	5	Number of checks used to assess improvement in criterion A
fitting.outlierMinFlagDuration	5	Minimum number of checks a voxel remains flagged before reassessment
fitting.outlierVoxelThres	0.01	Criterion A: minimum fractional improvement required per voxel (1%)
fitting.outlierPopThres	0.05	Criterion A: minimum fractional improvement required for median voxel (5%)
fitting.outlierInitThres	0.05	Criterion B: minimum fractional improvement from initialisation per voxel (5%)
fitting.outlierInitPopThres	0.20	Criterion B: minimum fractional improvement from initialisation for median (20%)

Additional convergence signals (v1.1)

These optional signals provide additional stopping criteria independent of robustConvergence. Each is disabled by default (value = 0) and activates when set to a positive value. Each uses the same patience mechanism as the loss-based criterion.

Step norm (analogous to StepTolerance in lsqnonlin):

Stops when the relative norm of the parameter update step falls below threshold, indicating that parameters have effectively stopped moving:

\[\frac{\| \Delta\theta \|_2}{1 + \| \theta \|_2} < \texttt{convergenceStepTol}\]

Option	Default	Description
fitting.convergenceStepTol	0	Relative step norm threshold; 0 = disabled
fitting.patienceStep	5	Consecutive checks below threshold required before stopping

Gradient norm:

Stops when the raw gradient norm (before Adam correction) falls below threshold, indicating that the loss landscape is genuinely flat:

Option	Default	Description
fitting.convergenceGradTol	0	Gradient norm threshold; 0 = disabled
fitting.patienceGrad	5	Consecutive checks below threshold required before stopping

Note

The step norm and gradient norm signals are complementary. The step norm catches parameter stagnation; the gradient norm catches loss landscape flatness. With Adam, a small step norm does not necessarily imply a small gradient norm since Adam normalises gradients via its second moment estimate.

Example: enabling robust convergence with EMA

fitting.convergenceModel    = 'ema';       % use EMA-smoothed convergence signal
fitting.robustConvergence   = true;        % enable outlier-aware convergence
fitting.outlierWeight       = 0.1;         % outlier voxels contribute 10% gradient weight
fitting.weightUpdateInterval = 5;          % update outlier mask every 5 iterations

obj = askadam;
out = obj.optimisation(data, mask, weights, parameters, fitting, FWDfunc, varargin);