How to Block ChatGPT From Using Your Website Content

There is concern about the lack of an easy way to opt out of having one’s content used to train large language models (LLMs) like ChatGPT. There is a way to do it, but it’s neither straightforward nor guaranteed to work.

How AIs Learn From Your Content

Large language models (LLMs) are trained on data drawn from a wide variety of sources. Many of these datasets are open source and are freely used for training AIs.

Examples of the kinds of sources used:

Wikipedia
Government court records
Books
Emails
Crawled websites

There are portals and websites that give away datasets containing vast amounts of information. One of these portals is hosted by Amazon, which offers thousands of datasets at the Registry of Open Data on AWS.

Datasets Used to Train ChatGPT

ChatGPT is based on GPT-3.5, also known as InstructGPT. The datasets used to train GPT-3.5 are the same used for GPT-3. The major difference betwe...