📈 Bregman Learning

Our Bregman learning framework aims at training sparse neural networks in an inverse scale space manner, starting with very few parameters and gradually adding only relevant parameters during training. We train a neural network parametrized by weights $\theta$ using the simple baseline algorithm (a minimal code sketch follows the list below)

$$v \leftarrow v - \tau \hat\nabla \mathcal{L}(\theta),$$

$$\theta \leftarrow \mathrm{prox}_{\delta J}(\delta v),$$

where

  • $\mathcal{L}(\theta)$ denotes a loss function with stochastic gradient $\hat\nabla \mathcal{L}(\theta)$,
  • $J$ is a sparsity-enforcing functional, e.g., the $\ell_1$-norm,
  • $\mathrm{prox}_{\delta J}$ is the proximal operator of $\delta J$.
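
A minimal PyTorch sketch of this update for a single parameter tensor, assuming $J(\theta) = \lambda \|\theta\|_1$ so that the proximal operator is soft thresholding (the function and variable names below are illustrative and not the repository's API):

```python
import torch

def soft_shrink(x, lam):
    # proximal operator of lam * ||.||_1, i.e., soft thresholding
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

def linbreg_step(v, grad, tau=0.1, delta=1.0, lam=1e-3):
    """One baseline LinBreg update for a single parameter tensor.

    v    -- subgradient variable (same shape as the parameter tensor)
    grad -- stochastic gradient of the loss at the current parameters
    tau  -- step size; delta -- elastic-net parameter;
    lam  -- weight of the l1 functional J(theta) = lam * ||theta||_1
    """
    v = v - tau * grad                            # gradient step on v
    theta = soft_shrink(delta * v, delta * lam)   # theta = prox_{delta J}(delta v)
    return theta, v
```

With `lam = 0` the prox is the identity, so the iteration reduces to plain SGD with step size `tau * delta`, matching the remark below.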

Our algorithm is based on linearized Bregman iterations [2] and is a simple extension of stochastic gradient descent, which is recovered by choosing $J = 0$. We also provide accelerations of our baseline algorithm using momentum and Adam [3].
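
To illustrate the momentum variant, one can keep a momentum buffer on the stochastic gradient and otherwise perform the same subgradient/prox update; this is only a sketch of the idea (reusing `soft_shrink` from above), not necessarily the exact form implemented in the package:

```python
def linbreg_momentum_step(v, m, grad, tau=0.1, beta=0.9, delta=1.0, lam=1e-3):
    # heavy-ball style momentum buffer on the stochastic gradient
    m = beta * m + grad
    # same subgradient/prox update as in the baseline step
    v = v - tau * m
    theta = soft_shrink(delta * v, delta * lam)
    return theta, v, m
```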

The variable $v$ is a subgradient of the elastic net functional

$$J_\delta(\theta) = J(\theta) + \frac{1}{2\delta} \|\theta\|_2^2$$

evaluated at $\theta$, and encodes which parameters are non-zero.
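
Concretely, for $J(\theta) = \lambda \|\theta\|_1$ the proximal step zeroes exactly the entries with $|v_i| \le \lambda$, so the current support can be read off $v$. A small helper for monitoring sparsity during training might look like this (illustrative, not part of the package):

```python
def fraction_active(v, lam=1e-3):
    # fraction of parameters the prox keeps non-zero for J = lam * ||.||_1
    return (v.abs() > lam).float().mean().item()
```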

🎲 Initialization

We use a sparse initialization strategy: each parameter is initialized to a non-zero value only with a small probability. The variance of the non-zero entries is chosen to avoid vanishing or exploding gradients, generalizing Kaiming (He) and Xavier initialization.
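
A sketch of this idea for a ReLU layer, assuming the standard Kaiming variance is rescaled by $1/p$ so that the layer's output variance matches the dense case (the exact scaling used in the repository may differ):

```python
import torch

def sparse_kaiming_init(out_features, in_features, p=0.05):
    # each weight is non-zero only with probability p; the non-zero entries
    # get variance (2 / in_features) / p so the overall variance matches
    # dense Kaiming initialization for ReLU activations
    std = (2.0 / in_features) ** 0.5
    mask = (torch.rand(out_features, in_features) < p).float()
    weight = torch.randn(out_features, in_features) * (std / p ** 0.5)
    return mask * weight
```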

🔬 Experiments

The different experiments can be executed as Jupyter notebooks in the notebooks folder.

Classification

Multilayer Perceptron

In this experiment we consider the MNIST classification task using a simple multilayer perceptron. We compare the LinBreg optimizer to standard SGD and proximal descent. The respective notebook can be found at MLP-Classification.

Convolutions and Group Sparsity

In this experiment we consider the Fashion-MNIST classification task using a simple convolutional net. The experiment can be executed as a notebook via the file ConvNet-Classification.

ResNet

In this experiment we consider the CIFAR10 classification task using a ResNet. The experiment can be executed as a notebook via the file ResNet-Classification.

NAS

This experiment implements the neural architecture search as proposed in [4].

The corresponding notebooks are DenseNet and Skip-Encoder.

☝️ Miscellaneous

The notebooks will throw errors if the datasets cannot be found. You can change the default configuration `'download': False` to `'download': True` in order to automatically download the necessary dataset and store it in the appropriate folder.

If you want to run the code on your CPU, you should replace `'use_cuda': True, 'num_workers': 4` with `'use_cuda': False, 'num_workers': 0` in the configuration of the notebook.
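
For reference, the relevant excerpt of a notebook configuration for a CPU-only run with automatic dataset download would then contain (only the keys discussed above are shown):

```python
conf = {
    'download': True,     # fetch the dataset automatically if it is missing
    'use_cuda': False,    # run on the CPU
    'num_workers': 0,     # no extra worker processes for data loading
}
```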

📝 References

[1] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. “A Bregman Learning Framework for Sparse Neural Networks.” Journal of Machine Learning Research 23.192 (2022): 1-43. https://www.jmlr.org/papers/v23/21-0545.html

[2] Wotao Yin, Stanley Osher, Donald Goldfarb, Jérôme Darbon. “Bregman Iterative Algorithms for $\ell_1$-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences 1.1 (2008): 143-168.

[3] Diederik Kingma, Jimmy Lei Ba. “Adam: A Method for Stochastic Optimization.” arXiv preprint arXiv:1412.6980 (2014). https://arxiv.org/abs/1412.6980

[4] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. “Neural Architecture Search via Bregman Iterations.” arXiv preprint arXiv:2106.02479 (2021). https://arxiv.org/abs/2106.02479
