Key Factors: Data in ChatGPT

ChatGPT Data | Explanation
Data Cutoff | January 2022
Why Not Current | Update Lag
Data Size | 570GB of Text
Data Sources | Web Pages, Books, etc.

All about ChatGPT’s data

  • Latest data comes from: January 2022
  • Why not current data: It takes time to gather and clean data and then train the model, so there is a lag between real-world events and updates to ChatGPT's training data.
  • How much training data: ChatGPT was trained on roughly 570GB of text data.
  • Where ChatGPT gets its data from: A large dataset that includes text from websites, books, Wikipedia, and other sources.

Asking ChatGPT how it collects data

Other web pages say ChatGPT’s most recent data is from 2021.

However, that is wrong: those guides are now outdated.

