Ai2 Dolma: 3T token open corpus for language model pretraining (2023)
Posted 7 hours ago by tosh
1 point
https://allenai.org/blog/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64
0 comments