Ai2 Dolma: 3T token open corpus for language model pretraining (2023)

  • Posted 7 hours ago by tosh
  • 1 point
https://allenai.org/blog/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64

0 comments