Deep Unfoldings for STELA
- Type of project: Research Project/Hauptseminar/Thesis (literature review + coding)
- Contact: eduardo.perez@tu-ilmenau.de
One of the currently popular approaches in Model-Based Deep Learning is so-called Deep Unfolding. In a deep unfolding, an iterative optimization algorithm is unfolded so that each of its iterations is represented by a layer of a neural network. This borrows from the observation that, in many iterative optimization algorithms, each iteration performs a combination of linear and non-linear operations, in much the same way a neural network layer does. However, such algorithms often enjoy performance guarantees and are much easier to interpret than a conventional neural network. Combining the two yields several advantages: classical algorithms can be made to converge in fewer iterations (or layers, if you prefer), and the use of a model means that training requires less data than an arbitrary neural network would need.
The classical example is sparse recovery with the Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), where each iteration of the classical ISTA algorithm is a combination of a gradient step and a soft-thresholding step. These can be interpreted as an affine transformation followed by a non-linear activation function, exactly as is common in dense neural networks. We can turn these iterations into the layers of a network and allow the matrices and vectors in the affine transformations to be trainable parameters. Additionally, the gradient step size and the soft-thresholding parameter, both of which are hyperparameters in ISTA, can be treated as learnable parameters as well. This leaves us with LISTA: a neural network that implements the ISTA algorithm, with considerable flexibility in the choice of trainable parameters. Do we want the matrices and vectors to be trainable or not? If so, should they differ from layer to layer? How about the hyperparameters: trainable or not? The same for each layer or not? Lots of flexibility!
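To make the construction concrete, here is a minimal sketch of one ISTA iteration written in the affine-plus-activation form that LISTA builds on (the variable names `W1`, `W2`, `theta`, and `ista_layer` are illustrative, not taken from the literature). The weights below are fixed to their classical ISTA values; in LISTA, they would become trainable parameters, shared across layers or not.
import numpy as np

def soft(v, theta):
    # soft-thresholding: the non-linear "activation" of the unfolded network
    return np.sign(v)*np.maximum(np.abs(v) - theta, 0.0)

def ista_layer(x, y, W1, W2, theta):
    # one ISTA iteration == one layer: affine transformation + non-linearity
    return soft(W1@y + W2@x, theta)

# classical ISTA choice of the "weights" (these are what LISTA would learn)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))  # dictionary matrix, m = 20, n = 50
L = np.linalg.norm(A, 2)**2        # Lipschitz constant of the gradient of 0.5*||A@x - y||^2
W1 = A.T/L                         # input weights
W2 = np.eye(50) - (A.T@A)/L        # recurrent weights
theta = 0.1/L                      # threshold = sparsity parameter / L

x_true = np.zeros(50); x_true[3] = 2.0  # 1-sparse ground truth
y = A@x_true
x = np.zeros(50)
for _ in range(100):               # 100 iterations, i.e., a 100-layer "network"
    x = ista_layer(x, y, W1, W2, theta)
# x should now be close to x_true, up to the usual shrinkage bias of soft-thresholding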
In this research topic, we want to take a look at sparse recovery using a deep unfolding of the Soft-Thresholding with Exact Line search Algorithm (STELA). This algorithm has the advantage of having fewer hyperparameters: in particular, the step sizes are chosen by the exact line search within the algorithm itself, leaving only the sparsity parameter. As a side note, however, keep in mind that the optimizer used to train the network implementing the deep unfolding has its own hyperparameters, including its own step size. Don't mix up the two!
This topic requires the student to become familiar with sparse recovery, the STELA algorithm, and deep unfoldings, so that a deep unfolding of STELA can be designed and tested on synthetic data. As one of the core applications dealt with by the SigMaSense group is ultrasound, this reference may be helpful in getting familiar with the signal model and sparse recovery. Additionally, a toy implementation of the classical STELA algorithm is shown below. In this example, STELA is used to recover the time delays of a known pulse shape in a measured signal, a common task in sparse recovery. This task is also known as sparse deconvolution, since delaying and scaling a pulse shape can be written as the convolution of a sparse spike train (whose non-zero entries encode the delays and amplitudes) with the known pulse shape.
import numpy as np
from numpy import newaxis as nax
from scipy.linalg import toeplitz
import matplotlib.pyplot as plt
def soft_thresholding(v, l):
r"""
apply the soft-thresholding operator to the vector v with sparsity parameter l
Parameters
----------
v : 1d numpy array of floats
input vector to be soft-thresholded
l : float
soft-thresholding parameter; larger values yield sparser solutions
Returns
-------
soft_v : 1d numpy array of floats
soft-thresholded vector, same size as `v`
"""
thresholded = np.abs(v) - l
thresholded[thresholded < 0] = 0
return thresholded*np.sign(v)
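# quick illustration of the operator: entries with magnitude at most l are set to
# zero, the rest are shrunk towards zero by l, e.g.
# soft_thresholding(np.array([-2.0, 0.5, 3.0]), 1.0) -> array([-1., 0., 2.])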
def STELA(y, A, x0, l, iters):
r"""
apply `iters` iterations of the STELA algorithm to the data `y` with dictionary
matrix `A` and sparsity parameter `l`, using `x0` as an initial guess of the sparse
solution to y = Ax.
Parameters
----------
y : 1d numpy array of floats, size m
data to be fit
A : 2d numpy array of floats, size m x n
dictionary matrix such that y = Ax
x0 : 1d numpy array of floats, size n
initial guess of the sparse solution to y = Ax
    l : float
        relative soft-thresholding parameter in the interval [0, 1]; larger values
        yield sparser solutions.
iters : int
the algorithm will be halted after this number of iterations
Returns
-------
x_hat : 1d numpy array of floats, size n
estimate of a sparse solution to y = Ax
"""
    lambd = l*np.amax(np.abs(A.T@y)) # heuristic absolute threshold; l >= 1 gives the all-zero solution
    x_t = x0 # current guess of the solution, size n
    Bx_t = x0 # soft-thresholded Jacobi iterate from point x_t, size n
    ABx_tx_t = np.zeros(len(y)) # forward mapping of the update direction, size m
    g_t = 0 # step size from the exact line search; 0 makes the first residual update a no-op
    norms = np.sum(A**2, axis=0) # squared 2-norms of the columns of A, size n
    r_t = A@x0 - y # residual A@x_t - y at the initial point, size m
for t in range(iters):
r_t = r_t + g_t*ABx_tx_t # update of residual
Bx_t = norms*x_t - A.T@r_t # Jacobi iteration
Bx_t = soft_thresholding(Bx_t, lambd)/norms # soft-thresholding and rescaling
l1_Bx_t = np.sum(np.abs(Bx_t)) # 1-norm of current Bx_t
l1_x_t = np.sum(np.abs(x_t)) # 1-norm of previous x_t
ABx_tx_t = A@(Bx_t - x_t) # forward mapping of the difference Bx_t - x_t
        # exact line search: optimal step size along the direction Bx_t - x_t
        g_t = -(r_t.dot(ABx_tx_t) + lambd*(l1_Bx_t - l1_x_t))/(ABx_tx_t.dot(ABx_tx_t))
        g_t = np.clip(g_t, 0, 1) # clip to [0, 1] so the update is a convex combination
        x_t = x_t + g_t*(Bx_t - x_t) # move towards Bx_t by the step size g_t
return x_t
# let's try it out on a signal that looks somewhat like ultrasound data
fs = 40e6 # sampling frequency
fc = 4e6 # center frequency
Nt = 256 # number of samples in the time domain
t = (np.arange(Nt)/fs)[:, nax] # time axis
alpha = fc**2 # bandwidth factor (the larger this is, the shorter the pulse duration)
amplitudes = np.array([1.4, -2.1, 0.3]) # echo amplitudes
delays = (np.array([50.3, 84.9, 150.4])/fs)[nax, :] # echo delays in seconds (off-grid, i.e., non-integer samples)
signal = (np.cos(2*np.pi*fc*(t - delays))*np.exp(-alpha*(t - delays)**2))@amplitudes # sum of delayed, scaled pulses
# and now let's try and fish out the time delays using classical STELA
Nt_long = 2*Nt - 1 # length of a two-sided time axis for the pulse
t_long = (np.arange(Nt_long) - (Nt - 1))/fs # time axis centered at zero
pulse_long = np.cos(2*np.pi*fc*t_long)*np.exp(-alpha*t_long**2) # pulse shape centered at t = 0
A = toeplitz(pulse_long[Nt-1:], pulse_long[:Nt][::-1]) # dictionary matrix: column j is the pulse delayed by j samples
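# illustrative sanity check: A implements convolution with the pulse shape, so A@x
# matches the central part of np.convolve(pulse_long, x) for any x (here, one spike)
x_test = np.zeros(Nt)
x_test[10] = 1.0
assert np.allclose(A@x_test, np.convolve(pulse_long, x_test)[Nt-1:2*Nt-1])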
x0 = np.zeros(Nt) # all-zero initial guess
l = 0.01 # relative sparsity parameter (see the docstring of STELA)
iters = 500 # number of STELA iterations
x_hat = STELA(signal, A, x0, l, iters)
t = t[:, 0]*1e6 # time axis in microseconds for plotting
delays = delays[0, :]*1e6 # ground-truth delays in microseconds
plt.plot(t, signal)
plt.plot(t, x_hat, linewidth=2)
markerline, _, _ = plt.stem(delays, amplitudes, basefmt=" ", markerfmt="s", linefmt="g--")
markerline.set_markerfacecolor('none')
plt.ylabel("Amplitude")
plt.xlabel(r"Time $[\mathrm{\mu s}]$")
plt.legend(("measured signal", "sparse recovery (STELA)", "ground truth"))
plt.show()
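As a rough illustration of where the unfolding itself could start, the sketch below (an assumption-laden starting point, not a finished design) repackages the STELA iteration as a stack of layers, each carrying its own sparsity parameter. The forward pass is written in plain NumPy with fixed, hand-picked values, reusing `soft_thresholding`, `A`, `signal`, and `x0` from the script above; in an actual deep unfolding, the per-layer parameters would be made trainable with an automatic-differentiation framework.
def stela_layer(x_t, r_t, A, norms, lambd_t):
    # one STELA iteration viewed as a network layer acting on the pair (x_t, r_t),
    # where the residual r_t = A@x_t - y is carried along instead of being recomputed
    Bx_t = soft_thresholding(norms*x_t - A.T@r_t, lambd_t)/norms # thresholded Jacobi step
    d = Bx_t - x_t # update direction
    Ad = A@d # forward mapping of the update direction
    g_t = -(r_t.dot(Ad) + lambd_t*(np.sum(np.abs(Bx_t)) - np.sum(np.abs(x_t))))/Ad.dot(Ad)
    g_t = np.clip(g_t, 0, 1) # exact line search step, clipped to [0, 1]
    return x_t + g_t*d, r_t + g_t*Ad

# per-layer sparsity parameters: fixed and identical here, but exactly the kind of
# quantity a deep unfolding of STELA could expose as trainable (per layer or shared)
lambdas = np.full(20, 0.01*np.amax(np.abs(A.T@signal)))
norms = np.sum(A**2, axis=0)
x_t, r_t = x0, A@x0 - signal
for lambd_t in lambdas:
    x_t, r_t = stela_layer(x_t, r_t, A, norms, lambd_t)
# since all lambdas are equal here, this reproduces 20 classical STELA iterations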