google-ai-edge/ai-edge-quantizer


A quantizer for advanced developers to quantize converted LiteRT models. It aims to help advanced users achieve optimal performance on resource-demanding models (e.g., GenAI models).

Build status: Unit Tests (Linux), Nightly Release, Nightly Colab

  • Python versions: 3.9, 3.10, 3.11, 3.12
  • Operating systems: Linux, macOS
  • TensorFlow: tf-nightly

Nightly PyPI package:

pip install ai-edge-quantizer-nightly

The quantizer requires two inputs:

  1. An unquantized source LiteRT model (FP32 data type, in the FlatBuffers format with the .tflite extension)
  2. A quantization recipe (details below)

and outputs a quantized LiteRT model that's ready for deployment on edge devices.

In a nutshell, the quantizer works according to the following steps:

  1. Instantiate a Quantizer class. This is the user's entry point to the quantizer's functionality.
  2. Load a desired quantization recipe (details in subsection).
  3. Quantize (and save) the model. This is where most of the quantizer's internal logic works.
from ai_edge_quantizer import quantizer
from ai_edge_quantizer import recipe

qt = quantizer.Quantizer("path/to/input/tflite")
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())
qt.quantize().export_model("/path/to/output/tflite")

Please see the getting started colab for a quick-start guide covering these three steps, and the selective quantization colab for more details on advanced features.

Please refer to the LiteRT documentation for ways to generate LiteRT models from JAX, PyTorch, and TensorFlow. The input source model must be an FP32 (unquantized) model in the FlatBuffers format with the .tflite extension.

The user specifies, via AI Edge Quantizer's API, a quantization recipe to apply to the source model. The quantization recipe encodes all the information on how a model is to be quantized, such as the number of bits, data type, symmetry, and scope name.

Essentially, a quantization recipe is a collection of commands of the following form:

“Apply Quantization Algorithm X on Operator Y under Scope Z with ConfigN”.

For example:

"Uniformly quantize the FullyConnected op under scope 'dense1/' with INT8 symmetric with Dynamic Quantization".

All unspecified ops are kept as FP32 (unquantized). The scope of an operator in TFLite is defined as the output tensor name of the op, which preserves the hierarchical model information from the source model (e.g., scope in TF). The best way to obtain a scope name is to visualize the model with Model Explorer.
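To make the "Algorithm X on Operator Y under Scope Z with ConfigN" structure concrete, here is a JSON-style sketch of one recipe entry for the FullyConnected example above. The field names (`regex`, `operation`, `algorithm_key`, `op_config`, and so on) are assumptions modeled on the configuration table later in this document; consult OpQuantizationRecipe in recipe_manager.py for the authoritative schema.

```python
import json

# Hypothetical sketch of a single recipe entry; field names are
# assumptions, not the verified schema (see recipe_manager.py).
recipe_entry = {
    "regex": ".*/dense1/.*",                      # Scope Z: match ops by output tensor name
    "operation": "FULLY_CONNECTED",               # Operator Y
    "algorithm_key": "min_max_uniform_quantize",  # Algorithm X (assumed key name)
    "op_config": {                                # ConfigN
        "weight_tensor_config": {
            "num_bits": 8,
            "symmetric": True,
            "granularity": "CHANNELWISE",
            "dtype": "INT",
        },
        "compute_precision": "INTEGER",  # dynamic quantization: integer compute,
        "explicit_dequantize": False,    # activations quantized on the fly
    },
}

# A full recipe is a list of such entries, serializable as JSON.
print(json.dumps([recipe_entry], indent=2))
```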

The simplest way to get started is to use one of the existing recipes in recipe.py. This is demonstrated in the getting started colab example.

Please refer to the LiteRT deployment documentation for ways to deploy a quantized LiteRT model.

There are many ways the user can configure and customize the quantization recipe beyond using a template in recipe.py. For example, the user can configure the recipe to achieve these features:

  • Selective quantization (exclude selected ops from being quantized)
  • Flexible mixed scheme quantization (mixture of different precision, compute precision, scope, op, config, etc)
  • 4-bit weight quantization

The selective quantization colab shows some of these more advanced features.

For specifics of the recipe schema, please refer to OpQuantizationRecipe in recipe_manager.py.

For advanced usage involving mixed quantization, the following APIs may be useful:

  • Use Quantizer.load_quantization_recipe() in quantizer.py to load a custom recipe.
  • Use Quantizer.update_quantization_recipe() in quantizer.py to extend or override specific parts of the recipe.
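Conceptually, a loaded recipe is an ordered list of entries, and updating it appends entries that take precedence for the ops they match. The toy resolver below is illustrative only (it assumes last-match-wins semantics and is not the library's implementation), but it sketches how an appended entry can selectively exclude one scope from an otherwise-global recipe:

```python
import re

# Toy model of recipe resolution (illustrative only; not the library's
# implementation). Each entry says: for ops of type `operation` whose
# scope matches `regex`, use `config`. We assume later entries override
# earlier matches, mirroring how extending a recipe overrides parts of it.
def resolve(recipe_entries, op_name, scope):
    config = None  # unmatched ops stay FP32 (unquantized)
    for entry in recipe_entries:
        if entry["operation"] == op_name and re.fullmatch(entry["regex"], scope):
            config = entry["config"]
    return config

# Base recipe: dynamically quantize every FULLY_CONNECTED op.
base = [{"regex": ".*", "operation": "FULLY_CONNECTED",
         "config": "DYNAMIC_WI8_AFP32"}]

# Selective quantization: additionally exclude ops under scope 'dense1/'.
updated = base + [{"regex": "dense1/.*", "operation": "FULLY_CONNECTED",
                   "config": None}]

print(resolve(updated, "FULLY_CONNECTED", "dense2/out"))   # DYNAMIC_WI8_AFP32
print(resolve(updated, "FULLY_CONNECTED", "dense1/out"))   # None (kept as FP32)
```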

The table below outlines the allowed configurations for available recipes.

| Config | | DYNAMIC_WI8_AFP32 | DYNAMIC_WI4_AFP32 | STATIC_WI8_AI16 | STATIC_WI4_AI16 | STATIC_WI8_AI8 | STATIC_WI4_AI8 | WEIGHTONLY_WI8_AFP32 | WEIGHTONLY_WI4_AFP32 |
|---|---|---|---|---|---|---|---|---|---|
| activation | num_bits | None | None | 16 | 16 | 8 | 8 | None | None |
| | symmetric | None | None | TRUE | TRUE | [TRUE, FALSE] | [TRUE, FALSE] | None | None |
| | granularity | None | None | TENSORWISE | TENSORWISE | TENSORWISE | TENSORWISE | None | None |
| | dtype | None | None | INT | INT | INT | INT | None | None |
| weight | num_bits | 8 | 4 | 8 | 4 | 8 | 4 | 8 | 4 |
| | symmetric | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | [TRUE, FALSE] | [TRUE, FALSE] |
| | granularity | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] | [CHANNELWISE, TENSORWISE] |
| | dtype | INT | INT | INT | INT | INT | INT | INT | INT |
| explicit_dequantize | | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| compute_precision | | INTEGER | INTEGER | INTEGER | INTEGER | INTEGER | INTEGER | FLOAT | FLOAT |
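The recipe names above follow a consistent pattern: a mode (DYNAMIC, STATIC, or WEIGHTONLY), a weight spec (WI8 = int8 weights, WI4 = int4), and an activation spec (AFP32 = float32 activations, AI16/AI8 = int16/int8). A small helper (illustrative only; the pattern is inferred from the table, not an official API) makes the convention explicit:

```python
import re

# Decode a recipe name such as "STATIC_WI4_AI16" into its parts.
# Illustrative only: the naming pattern is inferred from the table above.
def decode(name):
    m = re.fullmatch(r"(DYNAMIC|STATIC|WEIGHTONLY)_W(I\d+)_A(FP\d+|I\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized recipe name: {name}")
    mode, weights, activations = m.groups()

    def spec(s):
        # "I8" -> "int8", "FP32" -> "float32"
        return ("float" if s.startswith("FP") else "int") + re.sub(r"\D", "", s)

    return {"mode": mode, "weights": spec(weights), "activations": spec(activations)}

print(decode("DYNAMIC_WI8_AFP32"))  # {'mode': 'DYNAMIC', 'weights': 'int8', 'activations': 'float32'}
print(decode("STATIC_WI4_AI16"))    # {'mode': 'STATIC', 'weights': 'int4', 'activations': 'int16'}
```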

Operators Supporting Quantization

The following operators support quantization; which of the recipe configurations above apply varies per operator:

  • FULLY_CONNECTED
  • CONV_2D
  • BATCH_MATMUL
  • EMBEDDING_LOOKUP
  • DEPTHWISE_CONV_2D
  • AVERAGE_POOL_2D
  • RESHAPE
  • SOFTMAX
  • TANH
  • TRANSPOSE
  • GELU
  • ADD
  • CONV_2D_TRANSPOSE
  • SUB
  • MUL
  • MEAN
  • RSQRT
  • CONCATENATION
  • STRIDED_SLICE
  • SPLIT
  • LOGISTIC
  • SLICE
  • SELECT_V2
  • SUM

About

AI Edge Quantizer: flexible post training quantization for LiteRT models.
