MLIR Operator Quantization

Published: 2023/11/28

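This document outlines the design of the MLIR quantization system. The term "quantization" is highly overloaded; here it refers to a fairly narrow set of techniques for converting floating-point computations into corresponding, plausible variants expressed in integer math for inference, as historically supported by low-bit-depth inference engines such as TFLite, various accelerator hardware, and many DSPs.
Much of this design is inspired by the approach taken in a prior paper, with many extensions and adaptations folded in. It specifically documents the positions MLIR has taken on the topic and is not a general reference.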
• Uniform quantization
o Fixed point values
o Affine values
o Relation
o Converting between real and fixed point or affine
• Usage within MLIR
• Quantization Dialect
o Quantized type
o Quantized type conversion operations
o Instrumentation and constraint operations
• Integration with simulated quantization at training time
• TFLite native quantization
o General algorithm
Uniform quantization
The primary quantization mechanism supported by MLIR is a scheme that can express fixed point and affine transformations via uniformly spaced points on the real number line.

Further, the scheme can be applied:
• per-layer : Applying to every value within the target type.
• per-axis (also called per-channel) : Applying individually to each index along a specific axis of a tensor type.
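The difference between the two granularities can be sketched in a few lines of Python. This is only an illustration: the `quantize` helper, the example tensor, and the chosen scales are assumptions made for the sketch and are not part of MLIR.

```python
# Hypothetical helper: map one real value to the nearest scaled integer.
def quantize(value, scale):
    return round(value / scale)

# A 2x3 "tensor" of real values; axis 0 indexes the rows (channels).
tensor = [
    [0.5, -1.0, 0.25],
    [10.0, -20.0, 5.0],
]

# Per-layer: a single scale is shared by every value in the tensor.
per_layer_scale = 0.25
per_layer = [[quantize(v, per_layer_scale) for v in row] for row in tensor]

# Per-axis (per-channel) along axis 0: each row gets its own scale.
per_axis_scales = [0.25, 5.0]
per_axis = [
    [quantize(v, s) for v in row]
    for row, s in zip(tensor, per_axis_scales)
]

print(per_layer)  # [[2, -4, 1], [40, -80, 20]]
print(per_axis)   # [[2, -4, 1], [2, -4, 1]]
```

With a per-axis scale, the second row fits in the same small integer range as the first, which is the usual reason to prefer per-channel scales when channels differ widely in magnitude.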
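Fixed point values
Fixed point values are a real number divided by a scale. We call the result of dividing the real number by the scale the scaled value.
$real\_value = scaled\_value * scale$
The scale can be interpreted as the distance, in real units, between neighboring scaled values. For example, if the scale is $\pi$, then fixed point values with this scale can only represent multiples of $\pi$, and nothing in between. The maximum rounding error when converting an arbitrary real number to a fixed point value with a given $scale$ is $\frac{scale}{2}$.
Continuing the previous example, when $scale = \pi$, the maximum rounding error is $\frac{\pi}{2}$.
Multiplication can be performed on scaled values with different scales, using the same algorithm as multiplication of real values (note that the product scaled value has $scale_{product} = scale_{left\ operand} * scale_{right\ operand}$).
Addition can be performed on scaled values, so long as they have the same scale, using the same algorithm as addition of real values. This makes it convenient to represent scaled values on a computer as signed integers and to perform arithmetic on those signed integers, because the results will be correct scaled values.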
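The following Python sketch illustrates these properties of fixed point values. The function names `to_fixed_point` and `to_real` and all numeric values are assumptions chosen for the example; Python floats are double precision, which does not matter for the point being made.

```python
import math

def to_fixed_point(real_value, scale):
    """Return the scaled (integer) value nearest to real_value / scale."""
    return round(real_value / scale)

def to_real(scaled_value, scale):
    """real_value = scaled_value * scale"""
    return scaled_value * scale

# Rounding error: with scale = pi, only multiples of pi are representable.
scale = math.pi
x = 4.0
q = to_fixed_point(x, scale)            # 4 / pi is about 1.27, so q == 1
error = abs(to_real(q, scale) - x)      # about 0.86, below pi / 2
assert error <= scale / 2

# Addition: operands share a scale, so the integers are added directly.
a = to_fixed_point(0.75, 0.25)          # 3
b = to_fixed_point(1.5, 0.25)           # 6
sum_real = to_real(a + b, 0.25)         # 2.25

# Multiplication: scales may differ; the product scale is their product.
c = to_fixed_point(1.5, 0.5)            # 3
d = to_fixed_point(0.75, 0.25)          # 3
prod_real = to_real(c * d, 0.5 * 0.25)  # 1.125
```

The sum and product computed on the integers match the real-valued results 0.75 + 1.5 and 1.5 * 0.75 exactly, because the example values are exact multiples of their scales.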
Affine values
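Mathematically speaking, affine values are the result of adding a real-valued zero point to a scaled value. Alternatively (and equivalently), subtracting a zero point from an affine value yields a scaled value:
$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$
Essentially, affine values are the scaled values shifted by some constant amount. Arithmetic (i.e., addition, subtraction, multiplication, division) cannot, in general, be performed directly on affine values; they must first be converted to the equivalent scaled values.
As alluded to above, the motivation for using affine values is to represent more efficiently the real values that will actually be encountered during computation. Frequently, those real values are not symmetric around the real zero. We also assume that the real zero is encountered during computation and should therefore be representable.
In this case, it is inefficient to store scaled values represented by signed integers, because some of the signed integers will never be used. In effect, the bit patterns corresponding to those signed integers are wasted.
To represent the real zero exactly with an integral-valued affine value, the zero point must be an integer between the minimum and maximum affine value (inclusive). For example, given an affine value represented by an 8-bit unsigned integer, we have $0 \leq zero\_point \leq 255$. This is important because, in convolution-like operations of deep neural networks, we frequently need to zero-pad inputs and outputs, so zero must be exactly representable or the result will be biased.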
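As a small illustration, the sketch below shows how a zero point lets an unsigned 8-bit storage type cover an asymmetric real range while still representing the real zero exactly. The constants `SCALE` and `ZERO_POINT` are arbitrary values assumed for the example.

```python
SCALE = 0.5
ZERO_POINT = 20  # must be an integer in [0, 255] for uint8 storage

def affine_to_real(affine_value):
    """real_value = (affine_value - zero_point) * scale"""
    return (affine_value - ZERO_POINT) * SCALE

real_min = affine_to_real(0)        # -10.0
real_max = affine_to_real(255)      # 117.5
zero = affine_to_real(ZERO_POINT)   # exactly 0.0

print(real_min, real_max, zero)
```

All 256 bit patterns are used for the range [-10.0, 117.5]. A signed scaled representation with the same scale would need to reach 117.5 on both sides of zero, leaving most of the negative codes unused.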
Relation
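Real values, fixed point values, and affine values are related by the following equation, which shows how to convert one type of number to another:
$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$
Note that computers generally store mathematical values using a finite number of bits. Thus, while the above conversions are exact, storing the result in a finite number of bits generally requires rounding the result of the conversion (this applies to both cases: storing as floating point and storing as fixed point). A full discussion of rounding behavior is outside the scope of this document; unless otherwise stated, it is safe to assume that rounding follows the IEEE754 default of RNE, round to nearest with ties to even (where hardware permits).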
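For readers unfamiliar with RNE, the short Python example below shows the tie-breaking behavior. Python's built-in `round` happens to use round-half-to-even for floats, which is why it is used in this sketch; the input values are arbitrary.

```python
# Ties (values exactly halfway between two integers) go to the even neighbor.
ties = [0.5, 1.5, 2.5, 3.5]
rounded_ties = [round(v) for v in ties]

# Non-tie values simply go to the nearest integer.
non_ties = [1.2, 1.7]
rounded_non_ties = [round(v) for v in non_ties]

print(rounded_ties)      # [0, 2, 2, 4]
print(rounded_non_ties)  # [1, 2]
```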
Converting between real and fixed point or affine
To convert a real value to a fixed point value, we must know the scale. To convert a real value to an affine value, we must know the scale and the zero point.
Real to affine
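To convert an input tensor of real-valued elements (usually represented in a floating point format, frequently single precision) to a tensor of affine elements represented by an integral type (e.g. 8-bit unsigned integer), the following conversion can be performed (it is not required that all representable values of the integral type are used):
$affine\_value_{uint8\ or\ uint16} = clampToTargetSize(roundToNearestInteger(\frac{real\_value_{Single}}{scale_{Single}})_{sint32} + zero\_point_{uint8\ or\ uint16})$
In the above, we assume that $real\_value$ is a Single, $scale$ is a Single, $roundToNearestInteger$ returns a signed 32-bit integer, and $zero\_point$ is an unsigned 8-bit or 16-bit integer.
Note that the bit depths and number of fixed point values above are indicative of common types on typical hardware; the scheme is not constrained to particular bit depths, nor does it require that the entire range of an N-bit integer be used.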
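A minimal Python sketch of the real-to-affine conversion follows. It assumes uint8 storage by default, uses Python's double-precision floats rather than Single, and the function names and sample values are invented for the illustration.

```python
def clamp_to_target_size(value, qmin, qmax):
    """Saturate an integer to the storage range [qmin, qmax]."""
    return max(qmin, min(qmax, value))

def real_to_affine(real_values, scale, zero_point, qmin=0, qmax=255):
    """Round real / scale to the nearest integer, add the zero point, clamp."""
    return [
        clamp_to_target_size(round(v / scale) + zero_point, qmin, qmax)
        for v in real_values
    ]

quantized = real_to_affine(
    [-1.0, 0.0, 0.3, 1.0, 100.0], scale=0.25, zero_point=128
)
print(quantized)  # [124, 128, 129, 132, 255]
```

The value 0.3 is rounded to the nearest representable point (0.25), and 100.0 lies outside the representable range, so it saturates at the maximum storage value.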
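Affine to real
To convert an output tensor of affine elements represented by uint8 or uint16 to a tensor of real-valued elements (usually represented in a floating point format, frequently single precision), the following conversion can be performed:
$real\_value_{Single} = roundToNearestFloat((affine\_value_{uint8\ or\ uint16} - zero\_point_{uint8\ or\ uint16})_{sint32})_{Single} * scale_{Single}$
In the above, we assume that the result of the subtraction is in 32-bit signed integer format, and that $roundToNearestFloat$ returns a Single.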
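The reverse direction can be sketched the same way. As before, this is an assumption-laden illustration: Python integers and double-precision floats stand in for the sint32 and Single types, and the function name and inputs are made up for the example.

```python
def affine_to_real(affine_values, scale, zero_point):
    """Subtract the zero point as an integer, convert to float, then scale."""
    return [float(q - zero_point) * scale for q in affine_values]

dequantized = affine_to_real(
    [124, 128, 129, 132, 255], scale=0.25, zero_point=128
)
print(dequantized)  # [-1.0, 0.0, 0.25, 1.0, 31.75]
```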
Affine to fixed point
When the affine and fixed point scales are the same, subtract the zero point from the affine value to get the equivalent fixed point value.
$scaled\_value = affine\_value_{non-negative} - zero\_point_{non-negative}$
Fixed point to affine
When the affine and fixed point scales are the same, add the zero point to the fixed point value to get the equivalent affine value.
$affine\_value_{non-negative} = scaled\_value + zero\_point_{non-negative}$
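Because arithmetic is performed on scaled values rather than affine values, a typical pattern is to strip the zero point, compute, and add it back. The sketch below shows this for an addition of two affine values that share a scale and zero point; the function names and numbers are assumptions for the example, and no clamping to the storage range is shown.

```python
def affine_to_fixed_point(affine_value, zero_point):
    """scaled_value = affine_value - zero_point (same scale assumed)."""
    return affine_value - zero_point

def fixed_point_to_affine(scaled_value, zero_point):
    """affine_value = scaled_value + zero_point (same scale assumed)."""
    return scaled_value + zero_point

ZERO_POINT = 128
a_affine, b_affine = 132, 124            # represent +4 and -4 scaled units

a_scaled = affine_to_fixed_point(a_affine, ZERO_POINT)   # 4
b_scaled = affine_to_fixed_point(b_affine, ZERO_POINT)   # -4

# Add in the scaled domain, then convert the result back to affine form.
sum_affine = fixed_point_to_affine(a_scaled + b_scaled, ZERO_POINT)  # 128

# Adding the affine values directly would count the zero point twice.
naive_sum = a_affine + b_affine          # 256, not a valid result
```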
Usage within MLIR
The quantization system being developed within MLIR has several components:
• Quantization dialect containing:
o A family of QuantizedTypes which represent the mapping between expressed values (typically of a floating point computer type) and storage values (typically of an integral computer type).
o Type conversion ops for converting between types based on a QuantizedType and its expressed and storage sub-types.
o Instrumentation ops for assigning instrumentation points within the computation where runtime statistics may help guide the quantization process.
• Integration with simulated quantization at training time
• TFLite native quantization
o The TFLite op-set natively supports uniform-quantized variants.
o Passes and tools exist to convert directly from the TensorFlow dialect to the TFLite quantized operation set.
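Not every application of quantization will use all of these facilities. Specifically, the TensorFlow to TensorFlow Lite conversion uses the QuantizedTypes but has its own operations for type conversion and for expressing the supporting math.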
Quantization Dialect
Quantized type
TODO: Flesh this section out.
• QuantizedType base class
• UniformQuantizedType
Quantized type conversion operations
• qcast : Convert from an expressed type to QuantizedType
• dcast : Convert from a QuantizedType to its expressed type
• scast : Convert between a QuantizedType and its storage type
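The relationship between the three casts can be modeled in plain Python. The sketch below is not the MLIR API: the `UniformQuantizedType` and `QuantizedValue` classes are hypothetical stand-ins that only carry a scale, a zero point, and a storage range, and `scast` is split into two helper functions (one per direction) for clarity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UniformQuantizedType:
    """Hypothetical stand-in for a uniform quantized type (not the MLIR class)."""
    scale: float
    zero_point: int
    storage_min: int = 0
    storage_max: int = 255

@dataclass(frozen=True)
class QuantizedValue:
    """A storage integer tagged with the quantized type that interprets it."""
    storage: int
    qtype: UniformQuantizedType

def qcast(expressed, qtype):
    """Expressed (float) value -> quantized value."""
    raw = round(expressed / qtype.scale) + qtype.zero_point
    clamped = max(qtype.storage_min, min(qtype.storage_max, raw))
    return QuantizedValue(clamped, qtype)

def dcast(q):
    """Quantized value -> expressed (float) value."""
    return (q.storage - q.qtype.zero_point) * q.qtype.scale

def scast_to_storage(q):
    """Quantized value -> raw storage integer; the bits are unchanged."""
    return q.storage

def scast_from_storage(storage, qtype):
    """Raw storage integer -> quantized value; the bits are unchanged."""
    return QuantizedValue(storage, qtype)

qtype = UniformQuantizedType(scale=0.25, zero_point=128)
q = qcast(1.0, qtype)
print(scast_to_storage(q))  # 132
print(dcast(q))             # 1.0
```

In this model `qcast` and `dcast` change the numeric representation, while the two `scast` helpers only reinterpret the same integer with or without its quantization parameters.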
Instrumentation and constraint operations
• const_fake_quant : Emulates the logic of the historic TensorFlow fake_quant_with_min_max_args operation.
• stats_ref : Declares that statistics should be gathered at this point with a unique key and made available to future passes of the solver.
• stats : Declares inline statistics (per layer and per axis) for the point in the computation. stats_ref ops are generally converted to stats ops once trial runs have been performed.
• coupled_ref : Declares points in the computation to be coupled from a type inference perspective based on a unique key.
Integration with simulated quantization at training time
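TensorFlow has historically used the tf.quantization.fake_quant_* family of operations to simulate the effect of quantization at training time.
As originally implemented, TensorFlow Lite was the primary user of such operations at inference time. When quantized inference was enabled, if every eligible tensor passed through an appropriate fake_quant node (the rules for which tensors can have fake_quant applied are somewhat involved), then TensorFlow Lite would use the attributes of the fake_quant operations to decide how to convert the graph to use kernels from its quantized operations subset.
In MLIR-based quantization, fake_quant_* operations are handled by converting them to a sequence of qcast (quantize) followed by dcast (dequantize), with an appropriate UniformQuantizedType as the target of the qcast operation.

This allows subsequent compiler passes to preserve the knowledge that quantization was simulated in a certain way, while giving the compiler the flexibility to move the casts as it simplifies the computation and converts it to a form based on integral arithmetic.
It also allows partial quantization, where the parts of the computation that cannot be reduced to integral operations are still carried out in floating point, with appropriate conversions at the boundaries.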
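The quantize-then-dequantize decomposition can be illustrated numerically. The sketch below assumes the scale and zero point are already known (the TensorFlow fake_quant operations derive them from min/max attributes, which is not modeled here) and assumes uint8 storage; `fake_quant` is a hypothetical helper, not a TensorFlow or MLIR function.

```python
def fake_quant(x, scale, zero_point, qmin=0, qmax=255):
    """Simulate quantization in floating point: quantize, then dequantize."""
    # Quantize step (the role qcast plays): round, shift, and clamp.
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    # Dequantize step (the role dcast plays): shift back and rescale.
    return (q - zero_point) * scale

inputs = [0.0, 0.3, 1.0, 100.0]
simulated = [fake_quant(x, scale=0.25, zero_point=128) for x in inputs]
print(simulated)  # [0.0, 0.25, 1.0, 31.75]
```

The output stays in floating point but only takes values that the quantized type can represent, which is what lets training observe the rounding and saturation that integer inference will introduce.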
TFLite native quantization
TODO: Flesh this out
General algorithm

Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes. (or tf.FakeQuant) Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
Hardcode logic/propagation needs to happen here.
Run TF constant folding.
In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
Run a quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).
