[Inference] Append attn FP8 quant (#9328)
* add fp8 gen files to gitignore

* append_attn support fp8 quant

* Unified FP8 Network

* include cuda_fp8.h

* simplify qwen2 network and FusedBlockMultiTransformerFP8

* simplify llama network and code check

* check fp8 params

* code check

* check

* default config for fp8 gemm
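
The commit messages above describe adding FP8 (E4M3) quantization support to the append-attention path. As a rough illustration of the general technique, not the actual PaddleNLP kernel code, the following sketch shows per-tensor FP8 scaling: compute a scale from the tensor's absolute maximum, then scale and clamp values into the representable E4M3 range. All function names here are hypothetical.

```python
# Illustrative sketch of per-tensor FP8 (E4M3) quantization scaling.
# Hypothetical names; not taken from the PaddleNLP source.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def compute_scale(amax: float) -> float:
    """Per-tensor scale mapping the observed absolute max onto the FP8 range."""
    return amax / E4M3_MAX if amax > 0 else 1.0


def quantize(x: float, scale: float) -> float:
    """Divide by the scale, then clamp into the FP8 E4M3 range."""
    y = x / scale
    return max(-E4M3_MAX, min(E4M3_MAX, y))


def dequantize(q: float, scale: float) -> float:
    """Recover an approximation of the original value."""
    return q * scale


# Example: a tensor whose absolute max is 896.0 gets scale 2.0,
# so 896.0 maps to 448.0 (the FP8 max) and dequantizes back exactly.
scale = compute_scale(896.0)
print(scale, quantize(896.0, scale))  # prints: 2.0 448.0
```

In a real kernel these steps run on-device using the conversion types from `cuda_fp8.h` (which the commit includes), with the scale typically tracked alongside the quantized tensor for the subsequent FP8 GEMM.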
ckl117 authored Nov 4, 2024
1 parent 582ff5e commit 5217a3b
Showing 32 changed files with 1,656 additions and 1,722 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -129,3 +129,6 @@ FETCH_HEAD
 csrc/third_party/
 dataset/
 output/
+
+# gen codes
+autogen/
