kosmos2torch 0.0.1
Kosmos2.5
My implementation of KOSMOS-2.5 from Microsoft Research, based on the paper "KOSMOS-2.5: A Multimodal Literate Model".
Paper Link
Appreciation
Lucidrains
Agorians
Install
Dataset Strategy
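The table below summarizes the pre-training datasets used in the paper "KOSMOS-2.5: A Multimodal Literate Model", with metadata and source links. "Text + Layout" data pairs each image with its text and the spatial positions of that text. "Text + Markdown" data pairs each image with text structured as markdown.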
| Dataset | Modality | # Samples | Domain | Source |
|---|---|---|---|---|
| IIT-CDIP | Text + Layout | 27.6M pages | Scanned documents | Link |
| arXiv papers | Text + Layout | 20.9M pages | Research papers | Link |
| PowerPoint slides | Text + Layout | 6.2M pages | Presentation slides | Web crawl |
| General PDF | Text + Layout | 155.2M pages | Diverse PDF files | Web crawl |
| Web screenshots | Text + Layout | 100M pages | Webpage screenshots | Link |
| README | Text + Markdown | 2.9M files | GitHub README files | Link |
| DOCX | Text + Markdown | 1.1M pages | Word documents | Web crawl |
| LaTeX | Text + Markdown | 3.7M pages | Research papers | Link |
| HTML | Text + Markdown | 6.3M pages | Webpages | Link |
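To get a feel for how the data splits between the two modalities, the Python sketch below encodes the table rows and sums the sizes per modality. The `DATASETS` list and both helper functions are hypothetical, written only for illustration; they are not part of kosmos2torch. The size-proportional mixture weights are likewise an assumption, not the sampling recipe from the paper. Note also that README is counted in files while every other row is counted in pages, so the totals mix units.

```python
from collections import defaultdict

# Rows copied from the dataset table: (dataset, modality, size in millions, unit).
# README is counted in files; every other row is counted in pages.
DATASETS = [
    ("IIT-CDIP", "Text + Layout", 27.6, "pages"),
    ("arXiv papers", "Text + Layout", 20.9, "pages"),
    ("PowerPoint slides", "Text + Layout", 6.2, "pages"),
    ("General PDF", "Text + Layout", 155.2, "pages"),
    ("Web screenshots", "Text + Layout", 100.0, "pages"),
    ("README", "Text + Markdown", 2.9, "files"),
    ("DOCX", "Text + Markdown", 1.1, "pages"),
    ("LaTeX", "Text + Markdown", 3.7, "pages"),
    ("HTML", "Text + Markdown", 6.3, "pages"),
]


def totals_by_modality(rows):
    """Sum sample counts (in millions) per modality, rounded to 0.1M."""
    totals = defaultdict(float)
    for _name, modality, millions, _unit in rows:
        totals[modality] += millions
    return {modality: round(total, 1) for modality, total in totals.items()}


def sampling_weights(rows):
    """Hypothetical size-proportional mixture weights (NOT the paper's recipe)."""
    total = sum(millions for _, _, millions, _ in rows)
    return {name: millions / total for name, _, millions, _ in rows}


if __name__ == "__main__":
    print(totals_by_modality(DATASETS))
    weights = sampling_weights(DATASETS)
    print(f"General PDF share: {weights['General PDF']:.1%}")
```

Run as a script, this prints `{'Text + Layout': 309.9, 'Text + Markdown': 14.0}` followed by `General PDF share: 47.9%`, which shows that layout-annotated pages make up the large majority of the listed data.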
License
MIT
Citations
@misc{2309.11419,
Author = {Tengchao Lv and Yupan Huang and Jingye Chen and Lei Cui and Shuming Ma and Yaoyao Chang and Shaohan Huang and Wenhui Wang and Li Dong and Weiyao Luo and Shaoxiang Wu and Guoxin Wang and Cha Zhang and Furu Wei},
Title = {Kosmos-2.5: A Multimodal Literate Model},
Year = {2023},
Eprint = {arXiv:2309.11419},
}