[Hackathon 7th] Fundable project 5. PaddlePaddle reproduction of a cutting-edge document multimodal large model #833

Open
MqLeet opened this issue Nov 25, 2024 · 1 comment

MqLeet commented Nov 25, 2024

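GOT-OCR2.0 is a multimodal large model dedicated to general OCR tasks, released by StepFun and the University of Chinese Academy of Sciences. It has 0.6B parameters and uses a pipeline of vision encoder + input embedding layer + decoder. We need to keep up with and enrich the cross-modal text-image models in PaddleMIX, improving them in terms of models, training, and inference.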
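
To make the data flow of the vision encoder + input embedding layer + decoder pipeline concrete, here is a minimal pure-Python sketch. It is only a toy illustration, not the GOT-OCR2.0 or PaddleMIX implementation: `IMG_PAD`, `vision_encoder`, and `build_decoder_inputs` are hypothetical names, the "encoder" is a trivial per-row statistic, and the placeholder-splicing scheme is an assumption about how image tokens and text embeddings are typically merged before a decoder.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical placeholder id marking the slots where image tokens are inserted.
IMG_PAD = -1


def vision_encoder(image: List[List[float]]) -> List[Vector]:
    """Toy stand-in for a vision encoder: one 2-dim token per image row,
    holding the row's mean and max."""
    return [[sum(row) / len(row), max(row)] for row in image]


def build_decoder_inputs(
    image: List[List[float]],
    token_ids: List[int],
    embedding_table: Dict[int, Vector],
) -> List[Vector]:
    """Build the sequence a decoder would consume.

    Text token ids are looked up in the embedding table (the toy "input
    embedding layer"); each IMG_PAD slot is filled with the next image token.
    """
    image_tokens = iter(vision_encoder(image))
    inputs: List[Vector] = []
    for token_id in token_ids:
        if token_id == IMG_PAD:
            inputs.append(next(image_tokens))
        else:
            inputs.append(embedding_table[token_id])
    return inputs


# Tiny example: a 2x2 "image" and a prompt with two image slots.
image = [[0.0, 1.0], [2.0, 4.0]]
embedding_table = {7: [0.1, 0.2], 8: [0.3, 0.4]}
decoder_inputs = build_decoder_inputs(image, [7, IMG_PAD, IMG_PAD, 8], embedding_table)
print(decoder_inputs)
```

In a real model the decoder would then generate the OCR text autoregressively from this mixed sequence; that step is omitted here because it depends on the actual decoder weights.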

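| Task | Details | Status |
| --- | --- | --- |
| Reproduce the GOT-OCR2.0 base model, mainly the base components it depends on | BlipImageEvalProcessor | done |
| | ImageEncoderViT | done |
| | GOTQwenModel | done |
| | GOTQwenForCausalLM | done |
| Build the GOT-OCR2.0 inference pipeline | got_ocr2_0_infer | done |
| Provide the corresponding Paddle model weights | model.safetensors | done |
| Support and align GOT-OCR2.0 post-training | TBD | --- |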

MqLeet commented Nov 25, 2024
