TensorRT INT8 Quantization Code
Calibration file
1 Introduction
When using TensorRT's implicit quantization, we need to generate a calibration cache file, which is then used when building an engine from an ONNX model. If we use trtexec to convert an ONNX file into an implicitly quantized engine specific to our GPU, we need to pass the corresponding parameters; see A.2.1.2 "Serialized Engine Generation" and A.2.1.4 "Commonly Used Command-line Flags" in the TensorRT documentation.
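A minimal sketch of such a trtexec invocation (the file names here are placeholders; the flags follow the trtexec sections of the TensorRT documentation referenced above):

```shell
# Build an implicitly quantized INT8 engine from an ONNX model,
# using a previously generated calibration cache.
# model.onnx, calib.cache, and model_int8.engine are placeholder names.
trtexec --onnx=model.onnx \
        --int8 \
        --calib=calib.cache \
        --saveEngine=model_int8.engine
```

Without `--calib`, trtexec would run its own calibration (or fall back to dummy dynamic ranges), so supplying the cache keeps the quantization reproducible across builds.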
QAT
Appendix:
For Q/DQ documentation and details on how TensorRT processes Q/DQ nodes, please refer to the developer guide: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work-with-qat-networks
TensorRT also provides a tool for doing PTQ and QAT in PyTorch: https://github.com/NVIDIA/TensorRT/blob/release/8.5/tools/pytorch-quantization/examples/torchvision/classification_flow.py
In addition, our team developed a sample showing how to get the best performance on YOLOv7: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat
The guide on the best Q/DQ node placement is here: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/doc/Guidance_of_QAT_performance_optimization.md
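The pytorch-quantization workflow linked above can be sketched roughly as follows (a hedged outline, not the full classification_flow.py example; the model choice and file name are placeholders, and the calibration/fine-tuning steps are elided):

```python
# Sketch of QAT with NVIDIA's pytorch-quantization toolkit.
import torch
import torchvision
from pytorch_quantization import quant_modules, quant_nn

# Replace torch.nn layers with quantized equivalents so that
# Q/DQ (fake-quantization) nodes are inserted automatically.
quant_modules.initialize()

model = torchvision.models.resnet50(pretrained=True)

# ... calibrate amax ranges on sample data, then fine-tune (QAT),
#     as shown in the linked classification_flow.py example ...

# Export fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear
# so TensorRT can consume them.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_qat.onnx", opset_version=13)
```

The exported ONNX can then be built into an engine with trtexec; since the Q/DQ nodes carry the scales, no calibration cache is needed for a QAT model.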
Polygraphy
Linux shell redirection
Linux scheduled tasks (crontab)
GPU environment setup on Ubuntu
The NVIDIA driver is the lowest-level software that interacts directly with the hardware; CUDA depends on the driver, cuDNN depends on CUDA, and TensorRT's accelerated model inference ultimately depends on all of these underlying components.
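Each layer of this stack can be checked from the command line (a sketch; the cuDNN header path assumes a standard install under /usr/local/cuda, and the dpkg query assumes a Debian/Ubuntu package install):

```shell
nvidia-smi                        # driver version and highest CUDA version it supports
nvcc --version                    # installed CUDA toolkit version
grep -A 2 'define CUDNN_MAJOR' \
    /usr/local/cuda/include/cudnn_version.h   # cuDNN version (cudnn.h on cuDNN < 8)
dpkg -l | grep -i tensorrt        # installed TensorRT packages
```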
Finding the graphics cards
The following commands list the integrated graphics card and the NVIDIA graphics card, respectively.
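For example (output depends on the machine):

```shell
lspci | grep -i 'vga'      # VGA-compatible controllers, including integrated graphics
lspci | grep -i 'nvidia'   # all NVIDIA devices on the PCI bus
```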