- jsleb333/quadboost: QuadBoost, a machine learning boosting algorithm based on the quadratic loss

This project aims to implement a performant boosting algorithm, based on the quadratic loss, for classifying images. Its inherently multiclass nature makes it a good alternative to classical boosting algorithms that rely on simple class reductions. The project is written in Python and tries to follow the philosophy of the scikit-learn project.

This package provides not only two versions of the QuadBoost algorithm, but also a complete framework to boost any multiclass weak learner. To that end, callbacks to customize the training are provided, as well as a variety of weak learners and output encodings.

To run the minimal working examples, you need a dataset. The package has integrated support for the MNIST and CIFAR-10 datasets. The datasets module provides resources to easily handle them. In the file ./quadboost/datasets/datasets.py, you will find the functions _generate_mnist_dataset and _generate_cifar10_dataset. These automatically download the datasets, create an MNISTDataset or CIFAR10Dataset object, and save it to the specified directory (./quadboost/data/ by default).

Alternatively, if you already have MNIST or CIFAR-10 downloaded, you can create the file ./quadboost/datasets/datasets_path.py containing a dict of the form:

path_to = {'mnist':'path/to/mnist/raw/',
           'cifar10':'path/to/cifar10/raw'}

and the functions should create the dataset objects without downloading.

The datasets.py file provides the classes MNISTDataset and CIFAR10Dataset, which handle the datasets and can center and/or reduce them if desired. These classes can pickle the dataset, which makes it faster to load in subsequent uses. The datasets are required to run the minimal working examples.
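To illustrate what centering, reducing and pickling a dataset can look like, here is a minimal sketch. The class name ToyDataset and its methods get_data, save and load are hypothetical and are not the package's API; the sketch also assumes that "center" means subtracting the global mean and "reduce" means dividing by the global standard deviation.

```python
import pickle

import numpy as np


class ToyDataset:
    """Hypothetical stand-in for a dataset object (not the quadboost API)."""

    def __init__(self, X, Y):
        self.X = np.asarray(X, dtype=np.float64)
        self.Y = np.asarray(Y)
        # Global statistics, computed once and reused by get_data.
        self.mean = self.X.mean()
        self.std = self.X.std()

    def get_data(self, center=False, reduce=False):
        """Return (X, Y), optionally centered (zero mean) and reduced (unit std)."""
        X = self.X
        if center:
            X = X - self.mean
        if reduce:
            X = X / (self.std + 1e-9)  # small constant avoids division by zero
        return X, self.Y

    def save(self, path):
        """Pickle the raw arrays so later runs can skip the parsing step."""
        with open(path, "wb") as file:
            pickle.dump({"X": self.X, "Y": self.Y}, file)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as file:
            arrays = pickle.load(file)
        return cls(arrays["X"], arrays["Y"])
```

Storing only the raw arrays keeps the pickle independent of where the class is defined; the statistics are cheap to recompute on load.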

This project relies on the following Python libraries:

scikit-learn
numpy
matplotlib
pytorch (used in ./quadboost/weak_learner/random_convolution.py)
torchvision (used in ./quadboost/weak_learner/random_convolution.py)
scikit-image (optional, used in ./quadboost/mnist_ideals/ideal_preprocessing.py)
tblib (optional, used in ./quadboost/utils/multiprocessing_utils.py)
colorama (optional, used in ./quadboost/utils/timed.py)

The file quadboost.py provides an implementation of a general QuadBoost algorithm, along with two specific implementations (QuadBoost.MH and QuadBoost.MHCR). A main() function with a minimal working example is also provided.
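For readers unfamiliar with boosting on a quadratic loss, the following is a generic sketch of the idea, assuming labels are encoded as +1/-1 one-hot vectors and each round fits a weak learner to the current residuals. It is not the implementation found in quadboost.py, and the function names (fit_stump, stump_predict, boost, predict) are hypothetical.

```python
import numpy as np


def fit_stump(X, R):
    """Pick the (feature, threshold) split whose two mean vectors best fit R.

    Assumes at least one feature takes two or more distinct values.
    """
    best = None
    for j in range(X.shape[1]):
        # Dropping the largest value guarantees a non-empty right side.
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            mean_l, mean_r = R[left].mean(axis=0), R[~left].mean(axis=0)
            err = ((R[left] - mean_l) ** 2).sum() + ((R[~left] - mean_r) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, mean_l, mean_r)
    return best[1:]


def stump_predict(stump, X):
    j, thr, mean_l, mean_r = stump
    return np.where((X[:, j] <= thr)[:, None], mean_l, mean_r)


def boost(X, y, n_classes, n_rounds=10):
    """Additively fit stumps to the residuals of the encoded labels."""
    Y = -np.ones((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0        # +1/-1 one-hot encoding
    F = np.zeros_like(Y)                 # current ensemble output
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, Y - F)      # least-squares fit of the residuals
        F += stump_predict(stump, X)
        stumps.append(stump)
    return stumps


def predict(stumps, X):
    scores = sum(stump_predict(stump, X) for stump in stumps)
    return scores.argmax(axis=1)


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 2, 2])
model = boost(X, y, n_classes=3, n_rounds=10)
print(predict(model, X))
```

Because the outputs are vectors, a single weak learner handles all classes at once, with no reduction to binary subproblems.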

The module weak_learner provides some weak learners to be used with QuadBoost, such as a MulticlassDecisionStump and a MulticlassDecisionTree based on the former. Also included is a RandomConvolution feature extractor that wraps around a weak learner. All weak learners can be used standalone. A _WeakLearnerBase parent class is provided to facilitate the implementation of other weak learners that can easily be passed to the QuadBoost algorithm.
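As a rough idea of what a scikit-learn-style weak learner interface can look like, here is a minimal sketch. The names WeakLearnerSketch and MajorityClassLearner, and the fit/predict/evaluate methods, are assumptions made for illustration; the actual interface of _WeakLearnerBase may differ.

```python
from abc import ABC, abstractmethod

import numpy as np


class WeakLearnerSketch(ABC):
    """Hypothetical parent class: subclasses only implement fit and predict."""

    @abstractmethod
    def fit(self, X, Y):
        """Train on (X, Y) and return self, as scikit-learn estimators do."""

    @abstractmethod
    def predict(self, X):
        """Return one predicted label per row of X."""

    def evaluate(self, X, Y):
        """Accuracy, shared by every subclass."""
        return float(np.mean(self.predict(X) == np.asarray(Y)))


class MajorityClassLearner(WeakLearnerSketch):
    """Trivial learner that always predicts the most frequent training label."""

    def fit(self, X, Y):
        labels, counts = np.unique(Y, return_counts=True)
        self.majority_ = labels[counts.argmax()]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


learner = MajorityClassLearner().fit(np.zeros((4, 2)), [1, 1, 0, 1])
print(learner.evaluate(np.zeros((4, 2)), [1, 1, 0, 1]))
```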

The file label_encoder.py provides an implementation of LabelEncoder and classes that inherit from it. These LabelEncoder classes can transform a set of labels into vectors encoding the classes, such as one-hot or all-pairs encodings. The class provides methods to encode and decode labels, and supports custom encodings. Many examples of such custom encodings are presented in the encodings.json file, such as idealized MNIST characters or mean Haar-transformed pictures.
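The encode/decode round trip can be sketched as follows, assuming a +1/-1 one-hot encoding and decoding by nearest codeword. The class OneHotSketch and its method names are hypothetical and do not reproduce the LabelEncoder API.

```python
import numpy as np


class OneHotSketch:
    """Hypothetical encoder mapping each label to a +1/-1 one-hot vector."""

    def __init__(self, labels):
        self.labels = sorted(set(labels))
        self.index = {label: i for i, label in enumerate(self.labels)}
        # Row i is the codeword of label i: +1 on the diagonal, -1 elsewhere.
        self.encoding_matrix = 2 * np.eye(len(self.labels)) - 1

    def encode_labels(self, Y):
        return self.encoding_matrix[[self.index[y] for y in Y]]

    def decode_labels(self, encoded):
        # Choose the label whose codeword is closest in squared distance.
        diff = np.asarray(encoded)[:, None, :] - self.encoding_matrix[None, :, :]
        nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
        return [self.labels[i] for i in nearest]


encoder = OneHotSketch(["a", "b", "c"])
codes = encoder.encode_labels(["b", "a"])
print(encoder.decode_labels(codes))
```

Nearest-codeword decoding would carry over unchanged to a custom encoding: only the rows of the encoding matrix would differ.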

The module data_preprocessing provides scripts to preprocess MNIST to extract features. The current version only implements a 2D Haar wavelet transform on features.
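For reference, one level of a 2D Haar wavelet transform can be sketched in a few lines of numpy. This is a generic, orthonormal version that assumes even image dimensions (as with 28x28 MNIST images); the function names haar_1d and haar_2d are hypothetical, and the scripts in data_preprocessing may use a different normalization or number of levels.

```python
import numpy as np


def haar_1d(x, axis):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    x = np.moveaxis(x, axis, 0)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass: scaled pairwise sums
    diff = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass: scaled pairwise differences
    return np.moveaxis(np.concatenate([avg, diff]), 0, axis)


def haar_2d(image):
    """Apply the 1D transform along the rows axis, then along the columns axis."""
    image = np.asarray(image, dtype=np.float64)
    return haar_1d(haar_1d(image, axis=0), axis=1)


coefficients = haar_2d(np.ones((4, 4)))
print(coefficients)
```

On a constant image, all the energy ends up in the top-left (approximation) block and the detail coefficients are zero.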

The boosting algorithm relies on callbacks at each step of the iteration. Callbacks are handled by a CallbacksManagerIterator, which calls the appropriate functions on beginning of iteration, beginning of step, end of step, end of iteration and on exception exit. Callbacks include BreakCallbacks, which can end the iteration on various conditions; ModelCheckpoint and CSVLogger, which save the model and the logs, respectively; and Progression, which outputs formatted information on the training steps.
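The hook sequence described above can be sketched with a generator. All names here (Callback, BreakIteration, BreakOnMaxStep, Recorder, iterate_with_callbacks) are hypothetical and only illustrate the pattern; they are not the API of CallbacksManagerIterator.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op by default."""
    def on_iteration_begin(self): pass
    def on_step_begin(self): pass
    def on_step_end(self): pass
    def on_iteration_end(self): pass
    def on_exception_exit(self, exception): pass


class BreakIteration(Exception):
    """Raised by a callback to end the iteration cleanly."""


class BreakOnMaxStep(Callback):
    def __init__(self, max_step):
        self.max_step, self.step = max_step, 0

    def on_step_begin(self):
        if self.step >= self.max_step:
            raise BreakIteration
        self.step += 1


class Recorder(Callback):
    """Appends the name of each hook it receives to a list."""
    def __init__(self, events): self.events = events
    def on_iteration_begin(self): self.events.append("iteration_begin")
    def on_step_begin(self): self.events.append("step_begin")
    def on_step_end(self): self.events.append("step_end")
    def on_iteration_end(self): self.events.append("iteration_end")


def iterate_with_callbacks(callbacks):
    """Yield step numbers, firing the hooks around each step."""
    for callback in callbacks:
        callback.on_iteration_begin()
    step = 0
    try:
        while True:
            for callback in callbacks:
                callback.on_step_begin()
            yield step  # the training step runs in the caller's loop body
            for callback in callbacks:
                callback.on_step_end()
            step += 1
    except BreakIteration:
        for callback in callbacks:
            callback.on_iteration_end()
    except Exception as exception:
        # Only exceptions raised by the callbacks themselves reach this point.
        for callback in callbacks:
            callback.on_exception_exit(exception)
        raise


events = []
for step in iterate_with_callbacks([BreakOnMaxStep(2), Recorder(events)]):
    pass  # one boosting round would go here
print(events)
```

Using an exception to signal the break lets any callback stop the loop without the loop knowing which conditions exist.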
