ChatterBot Corpus

This is a corpus of dialog data that is included in the chatterbot module.

Corpus language availability

Corpus data is user contributed, but it is also not difficult to create one if you are familiar with the language. This is because each corpus is just a sample of various input statements and their responses for the bot to train itself with.

To explore what languages and sets of corpora are available, check out the chatterbot/corpus/data directory in the repository.

Note

If you are interested in contributing a new language corpus, or adding content to an existing language in the corpus, please feel free to submit a pull request on ChatterBot’s GitHub page. Contributions are welcomed!

Exporting your chat bot’s database as a training corpus

Now that you have created your chat bot and sent it out into the world, perhaps you are looking for a way to share what it has learned with other chat bots? ChatterBot’s training module provides methods that allow you to export the content of your chat bot’s database as a training corpus that can be used to train other chat bots.

chatbot = ChatBot('Export Example Bot')
chatbot.trainer.export_for_training('./export.json')

Here is an example:

# -*- coding: utf-8 -*-
from chatterbot import ChatBot

'''
This is an example showing how to create an export file from
an existing chat bot that can then be used to train other bots.
'''

chatbot = ChatBot(
    'Export Example Bot',
    trainer='chatterbot.trainers.ChatterBotCorpusTrainer'
)

# First, lets train our bot with some data
chatbot.train('chatterbot.corpus.english')

# Now we can export the data to a file
chatbot.trainer.export_for_training('./my_export.json')
corpus
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts. They are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory [1].
[1]https://en.wikipedia.org/wiki/Text_corpus