[Inference] Append attn FP8 quant (#9328)
* add fp8 gen files to gitignore

* append_attn support fp8 quant

* Unified FP8 Network

* include cuda_fp8.h

* simplify qwen2 network and FusedBlockMultiTransformerFP8

* simplify llama network and code check

* check fp8 params

* code check

* check

* default config for fp8 gemm
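
The commit messages above describe adding FP8 (E4M3) quantization support to the append-attention path. As a rough illustration of the general technique, not the actual PaddleNLP kernel code, the following sketch shows per-tensor FP8 scaling: compute a scale from the tensor's absolute maximum, then scale and clamp values into the representable E4M3 range. All function names here are hypothetical.

```python
# Illustrative sketch of per-tensor FP8 (E4M3) quantization scaling.
# Hypothetical names; not taken from the PaddleNLP source.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def compute_scale(amax: float) -> float:
    """Per-tensor scale mapping the observed absolute max onto the FP8 range."""
    return amax / E4M3_MAX if amax > 0 else 1.0


def quantize(x: float, scale: float) -> float:
    """Divide by the scale, then clamp into the FP8 E4M3 range."""
    y = x / scale
    return max(-E4M3_MAX, min(E4M3_MAX, y))


def dequantize(q: float, scale: float) -> float:
    """Recover an approximation of the original value."""
    return q * scale


# Example: a tensor whose absolute max is 896.0 gets scale 2.0,
# so 896.0 maps to 448.0 (the FP8 max) and dequantizes back exactly.
scale = compute_scale(896.0)
print(scale, quantize(896.0, scale))  # prints: 2.0 448.0
```

In a real kernel these steps run on-device using the conversion types from `cuda_fp8.h` (which the commit includes), with the scale typically tracked alongside the quantized tensor for the subsequent FP8 GEMM.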
ckl117 authored Nov 4, 2024
1 parent 582ff5e commit 5217a3b
Showing 32 changed files with 1,656 additions and 1,722 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -129,3 +129,6 @@ FETCH_HEAD
 csrc/third_party/
 dataset/
 output/
+
+# gen codes
+autogen/
