pyconversations.tokenizers

class pyconversations.tokenizers.BaseTokenizer(name)

The abstract Tokenizer class.

abstract tokenize(s)

Splits a string into tokens.

Parameters

s (str) – The string to tokenize

Returns

list(str) – A list of tokens

Raises

NotImplementedError – Subclasses must implement this method
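
To illustrate the contract described above, here is a minimal, self-contained sketch of an abstract tokenizer and one subclass. It does not import pyconversations: `SketchBaseTokenizer` and `LowercaseWhitespaceTokenizer` are hypothetical stand-ins, and the real BaseTokenizer may be implemented differently.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchBaseTokenizer(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self, name: str) -> None:
        # The documented constructor takes a single `name` argument.
        self.name = name

    @abstractmethod
    def tokenize(self, s: str) -> List[str]:
        """Split a string into a list of tokens."""
        raise NotImplementedError


class LowercaseWhitespaceTokenizer(SketchBaseTokenizer):
    """Hypothetical subclass: lowercases the input, then splits on whitespace."""

    def __init__(self) -> None:
        super().__init__("lowercase-whitespace")

    def tokenize(self, s: str) -> List[str]:
        return s.lower().split()


tokens = LowercaseWhitespaceTokenizer().tokenize("Hello World")
print(tokens)  # ['hello', 'world']
```

In this sketch a subclass only has to supply `tokenize`; everything that accepts a tokenizer can then rely on the same `tokenize(s)` call returning a list of strings.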

class pyconversations.tokenizers.DefaultTokenizer

A tokenizer that uses Python’s built-in str.split method.

tokenize(s)

Splits a string into tokens.

Parameters

s (str) – The string to tokenize

Returns

list(str) – A list of tokens
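
Because this class is described in terms of str.split, it may help to see how that built-in behaves. The snippet below uses plain Python only. Whether DefaultTokenizer calls split with or without a separator is not stated here, so treating the no-argument form as the relevant one is an assumption.

```python
text = "Hello,  world!\nHow are you?"

# With no arguments, str.split breaks on runs of whitespace
# (spaces, tabs, newlines) and never yields empty strings.
tokens = text.split()
print(tokens)  # ['Hello,', 'world!', 'How', 'are', 'you?']

# With an explicit separator, consecutive separators produce empty strings.
spaced = "a  b".split(" ")
print(spaced)  # ['a', '', 'b']
```

Note that punctuation stays attached to neighboring words in both cases, which is the main practical difference from a linguistically aware tokenizer.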

class pyconversations.tokenizers.LambdaTokenizer(func)

A tokenizer that wraps a user-supplied function, such as a lambda.

tokenize(s)

Splits a string into tokens.

Parameters

s (str) – The string to tokenize

Returns

list(str) – A list of tokens
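
The wrapping idea can be sketched without the library. In the example below, `SketchLambdaTokenizer` is a hypothetical stand-in that simply delegates `tokenize` to the callable passed at construction; it assumes the callable takes a string and returns a list of strings, and it is not the library's implementation.

```python
import re
from typing import Callable, List


class SketchLambdaTokenizer:
    """Hypothetical stand-in: delegates tokenize() to a wrapped callable."""

    def __init__(self, func: Callable[[str], List[str]]) -> None:
        # Store the callable; it is assumed to map str -> list of str.
        self._func = func

    def tokenize(self, s: str) -> List[str]:
        return self._func(s)


# Wrap a lambda that extracts runs of word characters.
word_tokenizer = SketchLambdaTokenizer(lambda s: re.findall(r"\w+", s))
tokens = word_tokenizer.tokenize("Hello, world!")
print(tokens)  # ['Hello', 'world']
```

A wrapper of this kind lets any existing splitting function be used wherever a tokenizer object with a `tokenize(s)` method is expected.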

class pyconversations.tokenizers.NLTKTokenizer

An NLTK-based tokenizer.

tokenize(s)

Splits a string into tokens.

Parameters

s (str) – The string to tokenize

Returns

list(str) – A list of tokens

class pyconversations.tokenizers.PartitionTokenizer(space=True, charset=None)

A custom tokenizer based on Partitioner by Jake Ryland Williams.

Notes

For more information, see https://github.com/jakerylandwilliams/partitioner

tokenize(s)

Splits a string into tokens.

Parameters

s (str) – The string to tokenize

Returns

list(str) – A list of tokens