You can also search for any domain in the C4 dataset using the index hosted by the Allen Institute for AI.

National Center for Biotech Info (papers)

A huge ‘thank you!’ to Drs Jesse Dodge and Maarten Sap from the Allen Institute for AI for the revised chart in the C4 paper.

PLoS – Public Library of Science (papers)

What is in Common Crawl?
Common Crawl includes (C4, cleaned/filtered, sorted by most tokens):

C4 (Filtered Common Crawl) contents with Wikipedia removed for dedup…

GPT-3 is sometimes misspelt as: GPT3, GPT 3, GPT three, GTP-3, GTP3, GTP 3, GTP three.

What is in the Pile v1?
The Pile v1 contains (sorted by most tokens/effective size):

What is in GPT-3?
GPT-3 contains (sorted by most tokens/effective size):

Note: Text is provided here for indexing only; please see the Google sheet above for formatting as intended.

Permissions: Yes, you can use these visualizations anywhere; please leave the citation intact.