Bregman Learning
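Our Bregman learning framework aims at training sparse neural networks in an inverse scale space manner, starting with very few parameters and gradually adding only relevant parameters during training. We train a neural network $f_\theta$ parametrized by weights $\theta$ using the simple baseline algorithm

$$
\begin{aligned}
v &\gets v - \tau \hat{\nabla} \mathcal{L}(\theta),\\
\theta &\gets \mathrm{prox}_{\delta J}(\delta v),
\end{aligned}
$$

where
- $\mathcal{L}$ denotes a loss function with stochastic gradient $\hat{\nabla} \mathcal{L}$,
- $J$ is a sparsity-enforcing functional, e.g., the $\ell_1$-norm,
- $\mathrm{prox}_{\delta J}$ is the proximal operator of $\delta J$,
- $\tau > 0$ is a step size and $\delta > 0$ is a scaling parameter.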
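The iteration can be illustrated with a minimal NumPy sketch; this is not the optimizer used in the notebooks. It assumes $J = \lambda\|\cdot\|_1$, whose proximal operator is elementwise soft thresholding, and replaces the stochastic gradient by the exact gradient of a toy loss $\mathcal{L}(\theta) = \frac{1}{2}\|\theta - b\|^2$. The function names and all parameter values are hypothetical.

```python
import numpy as np


def soft_threshold(z, thresh):
    """Elementwise soft thresholding, the prox of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def linbreg_step(v, grad, tau, delta, lam):
    """One step of the baseline iteration, assuming J = lam * ||.||_1.

    v: subgradient variable, grad: gradient of the loss at the current theta,
    tau: step size, delta: scaling parameter, lam: l1 strength (hypothetical).
    """
    v = v - tau * grad
    # prox_{delta J}(delta v) for J = lam * ||.||_1 is soft thresholding at delta * lam.
    theta = soft_threshold(delta * v, delta * lam)
    return v, theta


# Toy loss L(theta) = 0.5 * ||theta - b||^2 with exact gradients (assumption).
b = np.array([3.0, 0.0, -1.0, 0.2])
v = np.zeros_like(b)
theta = np.zeros_like(b)
history = []
for _ in range(200):
    v, theta = linbreg_step(v, theta - b, tau=0.1, delta=1.0, lam=1.0)
    history.append(theta.copy())

# Iteration index at which each entry first becomes non-zero (None: never).
first_active = [next((k for k, t in enumerate(history) if t[i] != 0), None)
                for i in range(b.size)]
print(first_active)
print(history[-1])   # close to b after 200 steps
```

Starting from $v = 0$ and $\theta = 0$, the entry of $b$ with the largest magnitude becomes non-zero first (after 4 steps), the smaller entries follow later, and the entry where $b$ is zero is never activated. On this toy problem, that is the inverse scale space behavior described above.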
Our algorithm is based on linearized Bregman iterations [2] and is a simple extension of stochastic gradient descent, which is recovered by choosing $J = 0$. We also provide accelerations of our baseline algorithm using momentum and Adam [3].
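The reduction to SGD follows from the update rule: the proximal operator of the zero functional is the identity, so $\theta = \delta v$ and the update becomes $\theta \gets \delta(v - \tau\hat{\nabla}\mathcal{L}(\theta)) = \theta - \delta\tau\hat{\nabla}\mathcal{L}(\theta)$, i.e. SGD with learning rate $\delta\tau$. The sketch below checks this numerically on a hypothetical least-squares loss with exact gradients; the helper names are illustrative only.

```python
import numpy as np


def linbreg_step_without_regularizer(v, grad, tau, delta):
    """Baseline iteration with J = 0, where prox_{delta J} is the identity."""
    v = v - tau * grad
    theta = delta * v
    return v, theta


def sgd_step(theta, grad, lr):
    """Plain gradient descent step."""
    return theta - lr * grad


# Toy least-squares loss L(theta) = 0.5 * ||A theta - b||^2 (assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 5))
b = rng.standard_normal(10)
tau, delta = 0.01, 2.0

theta_init = rng.standard_normal(5)
v = theta_init / delta          # consistent start: theta = delta * v
theta_lb = theta_init.copy()
theta_sgd = theta_init.copy()
for _ in range(50):
    v, theta_lb = linbreg_step_without_regularizer(
        v, A.T @ (A @ theta_lb - b), tau, delta)
    theta_sgd = sgd_step(theta_sgd, A.T @ (A @ theta_sgd - b), lr=tau * delta)

print(np.max(np.abs(theta_lb - theta_sgd)))   # only floating-point rounding differences
```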
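The variable $v$ is a subgradient of the elastic net functional

$$
J_\delta(\theta) = J(\theta) + \frac{1}{2\delta}\|\theta\|^2
$$

at $\theta$, i.e., $v \in \partial J_\delta(\theta)$, and thereby stores the information about which parameters are non-zero.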
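As a small illustration, assume $J = \lambda\|\cdot\|_1$. Then $v \in \partial J_\delta(\theta)$ means $v - \theta/\delta \in \lambda\,\partial\|\theta\|_1$, so $\theta_i \neq 0$ exactly when $|v_i| > \lambda$. The hypothetical helper below checks this subgradient condition for $\theta = \mathrm{prox}_{\delta J}(\delta v)$ and reads the support off $v$; the values of $v$, $\delta$ and $\lambda$ are made up.

```python
import numpy as np


def soft_threshold(z, thresh):
    """Elementwise soft thresholding, the prox of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def is_elastic_net_subgradient(v, theta, delta, lam, tol=1e-12):
    """Check v in dJ_delta(theta) for J_delta = lam*||.||_1 + ||.||^2 / (2*delta)."""
    r = v - theta / delta           # must lie in lam * d||theta||_1
    active = theta != 0
    ok_active = np.allclose(r[active], lam * np.sign(theta[active]), atol=tol)
    ok_inactive = np.all(np.abs(r[~active]) <= lam + tol)
    return bool(ok_active and ok_inactive)


delta, lam = 0.5, 1.0                             # hypothetical parameter values
v = np.array([2.5, -0.3, 1.0, -1.7])
theta = soft_threshold(delta * v, delta * lam)    # theta = prox_{delta J}(delta v)

support_from_v = np.abs(v) > lam
print(theta)                                      # non-zero only at indices 0 and 3
print(is_elastic_net_subgradient(v, theta, delta, lam))
```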
Initialization
We use a sparse initialization strategy in which each parameter is initialized non-zero only with a small probability. The variance of the non-zero parameters is chosen to avoid vanishing or exploding gradients, generalizing Kaiming-He or Xavier initialization.
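One way to realize such an initialization is sketched below under several assumptions: a dense layer with weight shape `(fan_out, fan_in)`, ReLU-style Kaiming-He scaling, and NumPy instead of the notebooks' framework. A Bernoulli mask with density $p$ selects the non-zero weights, and the variance of those weights is scaled by $1/p$ so that every weight has overall variance $2/n_{\text{in}}$, as in Kaiming-He initialization. The function name and the exact scaling are hypothetical and not taken from the repository.

```python
import numpy as np


def sparse_kaiming_init(fan_in, fan_out, density, rng):
    """Hypothetical sparse Kaiming-He style initialization of a dense layer.

    Each weight is non-zero with probability `density`. Non-zero values are
    drawn from N(0, 2 / (density * fan_in)), so every weight has variance
    density * 2 / (density * fan_in) = 2 / fan_in, as in Kaiming-He init.
    """
    mask = rng.random((fan_out, fan_in)) < density
    std = np.sqrt(2.0 / (density * fan_in))
    values = rng.normal(0.0, std, size=(fan_out, fan_in))
    return mask * values


rng = np.random.default_rng(0)
W = sparse_kaiming_init(fan_in=1000, fan_out=500, density=0.01, rng=rng)
print(np.mean(W != 0))   # close to the density 0.01
print(W.var())           # close to 2 / fan_in = 0.002
```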
Experiments
The different experiments can be executed as Jupyter notebooks in the notebooks folder.
Classification
       
Multilayer Perceptron
In this experiment we consider the MNIST classification task using a simple multilayer perceptron. We compare the LinBreg optimizer to standard SGD and proximal descent. The respective notebook can be found at MLP-Classification.
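For comparison, and assuming that "proximal descent" refers to the standard proximal gradient method, that optimizer applies the proximal operator directly to a gradient step on $\theta$. A minimal sketch with $J = \lambda\|\cdot\|_1$ and the toy loss $\mathcal{L}(\theta) = \frac{1}{2}\|\theta - b\|^2$ with exact gradients (both hypothetical choices, not the MNIST setup of the notebook):

```python
import numpy as np


def soft_threshold(z, thresh):
    """Elementwise soft thresholding, the prox of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def prox_grad_step(theta, grad, tau, lam):
    """Proximal gradient step for L + lam * ||.||_1 with step size tau."""
    return soft_threshold(theta - tau * grad, tau * lam)


b = np.array([3.0, 0.0, -1.0, 0.2])
theta = np.zeros_like(b)
for _ in range(200):
    # theta - b is the gradient of 0.5 * ||theta - b||^2
    theta = prox_grad_step(theta, theta - b, tau=0.1, lam=1.0)

print(theta)   # approximately soft_threshold(b, 1.0)
```

Here the iterates converge to the minimizer of $\mathcal{L} + \lambda\|\cdot\|_1$, which is `soft_threshold(b, 1.0)`, i.e. $(2, 0, 0, 0)$: small entries are removed and the surviving entry is shrunk by $\lambda$.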
       
Convolutions and Group Sparsity
In this experiment we consider the Fashion-MNIST classification task using a simple convolutional net. The experiment can be executed via the notebook ConvNet-Classification.
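Group sparsity replaces the $\ell_1$-norm by a sum of group norms $J(\theta) = \lambda\sum_g \|\theta_g\|_2$, whose proximal operator shrinks each group as a whole and sets it to zero when its norm falls below the threshold. The sketch below assumes that each group is one $k_h \times k_w$ kernel of a convolution weight with shape `(out_channels, in_channels, kh, kw)`; the grouping used in the notebook may differ. Within the baseline iteration it would be applied to $\delta v$ with threshold $\delta\lambda$. The function name is hypothetical.

```python
import numpy as np


def group_soft_threshold(weight, thresh):
    """Prox of thresh * sum_g ||W_g||_2, assuming one group per kh x kw kernel.

    `weight` has shape (out_channels, in_channels, kh, kw). Each kernel is
    scaled by max(0, 1 - thresh / ||kernel||_2), so kernels with norm at most
    `thresh` become exactly zero.
    """
    norms = np.sqrt(np.sum(weight ** 2, axis=(2, 3), keepdims=True))
    # Guard against division by zero for kernels that are already zero.
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * weight


w = np.zeros((2, 1, 2, 2))
w[0] = 1.0    # kernel with norm 2.0
w[1] = 0.1    # kernel with norm 0.2
out = group_soft_threshold(w, thresh=0.5)
print(out[0, 0])   # every entry shrunk to 0.75
print(out[1, 0])   # whole kernel set to zero
```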
ResNet
In this experiment we consider the CIFAR10 classification task using a ResNet. The experiment can be executed via the notebook ResNet-Classification.
NAS
       
This experiment implements the neural architecture search approach proposed in [4].
The corresponding notebooks are DenseNet and Skip-Encoder.
Miscellaneous
The notebooks will throw errors if the datasets cannot be found. You can change the default configuration 'download':False to 'download':True in order to automatically download the necessary dataset and store it in the appropriate folder.
If you want to run the code on your CPU, replace 'use_cuda':True, 'num_workers':4 with 'use_cuda':False, 'num_workers':0 in the configuration of the notebook.
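For illustration, the two configuration changes amount to the following edits of a Python dictionary. The variable name `conf` and the surrounding structure are assumptions, since the notebooks may organize their configuration differently; the comment on `num_workers` assumes the value is forwarded to a PyTorch `DataLoader`, where `0` means data is loaded in the main process.

```python
# Hypothetical configuration dictionary with the default values named above.
conf = {'download': False, 'use_cuda': True, 'num_workers': 4}

# Let the notebook download missing datasets automatically.
conf['download'] = True

# Run on the CPU: disable CUDA and use no worker subprocesses for data loading
# (assuming num_workers is forwarded to a PyTorch DataLoader).
conf.update({'use_cuda': False, 'num_workers': 0})

print(conf)
```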
References
[1] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "A Bregman Learning Framework for Sparse Neural Networks." Journal of Machine Learning Research 23.192 (2022): 1-43. https://www.jmlr.org/papers/v23/21-0545.html
[2] Wotao Yin, Stanley Osher, Donald Goldfarb, Jerome Darbon. "Bregman Iterative Algorithms for $\ell_1$-Minimization with Applications to Compressed Sensing." SIAM Journal on Imaging Sciences 1.1 (2008): 143-168.
[3] Diederik Kingma, Jimmy Lei Ba. "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980 (2014). https://arxiv.org/abs/1412.6980
[4] Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger. "Neural Architecture Search via Bregman Iterations." arXiv preprint arXiv:2106.02479 (2021). https://arxiv.org/abs/2106.02479
