Conversation Class

The Conversation class is the basic storage container for all UniMessage objects. It can be imported with:

from pyconversations.convo import Conversation

Instantiation

A Conversation is created using its constructor, which can take two optional parameters: posts (a dictionary of UniMessage objects) and convo_id (a string identifier):

convo = Conversation(convo_id='tutorial_convo')

Alternatively, if you have a JSON representation of a Conversation (for example, the output of an earlier to_json() call), you can instantiate a Conversation from it with the static method:

convo = Conversation.from_json(json_data)  # json_data: a previously exported JSON representation

Conversation Arithmetic

To make combining collections of posts easy, the Conversation class overloads the addition operator:

all_posts = convo_a + convo_b  # the union of the posts in the two conversations
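The union semantics can be illustrated with a small stand-in class. This is only a sketch, not the library's implementation: ToyConversation and its dictionary-of-posts layout are hypothetical, and it assumes that posts sharing a UID count as the same post, with the right-hand operand winning on a collision.

```python
class ToyConversation:
    """Hypothetical stand-in for Conversation: posts keyed by UID."""

    def __init__(self, posts=None, convo_id=None):
        self.posts = dict(posts or {})
        self.convo_id = convo_id

    def __add__(self, other):
        # Union of both post dictionaries; a UID present in both appears once.
        # Assumption: on a UID collision the right-hand post is kept.
        merged = {**self.posts, **other.posts}
        return ToyConversation(posts=merged)


convo_a = ToyConversation({"1": "hello", "2": "hi there"})
convo_b = ToyConversation({"2": "hi there", "3": "goodbye"})
all_posts = convo_a + convo_b

print(sorted(all_posts.posts))  # ['1', '2', '3']
```

Neither operand is modified in this sketch; the sum is a new object.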

Adding and removing posts

Posts are added to a conversation by supplying a UniMessage object:

convo.add_post(post_object)

Likewise, by supplying the UID of a post within the conversation, a post can be removed:

convo.remove_post(uid)

User-Information Redaction

For Twitter and Reddit, user-regex strings are available that allow for the redaction of usernames and user mentions (within message text). This can be done in a way that retains all unique users, assigning them an integer ID:

convo.redact()

Here, the first author identified is mapped to USER0 and the Nth user is mapped to USER(N-1). This retains anonymized unique-user information, so you can still track which users said what and to whom. Redaction can be performed per Conversation, or several Conversations can first be merged into one large Conversation so that user IDs are assigned consistently across the whole corpus.

Alternatively, if unique user information is unnecessary or undesired, one can map all author names and user mentions to a USER token with:

convo.redact(assign_ints=False)
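The two redaction modes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the redact_posts helper, the (author, text) tuple layout, and the simple @mention pattern are all hypothetical, and the regexes the library actually uses for Twitter and Reddit may differ.

```python
import re

# Assumption: a simple Twitter-style mention pattern, for illustration only.
MENTION = re.compile(r"@(\w+)")


def redact_posts(posts, assign_ints=True):
    """Redact authors and @mentions in a list of (author, text) pairs."""
    user_ids = {}

    def token(name):
        if not assign_ints:
            return "USER"
        # The first name seen gets 0, the next new name gets 1, and so on.
        if name not in user_ids:
            user_ids[name] = len(user_ids)
        return f"USER{user_ids[name]}"

    redacted = []
    for author, text in posts:
        new_author = token(author)
        # Keeping the leading "@" is a choice made by this sketch.
        new_text = MENTION.sub(lambda m: "@" + token(m.group(1)), text)
        redacted.append((new_author, new_text))
    return redacted


example = [("alice", "hello @bob"), ("bob", "hi @alice and @carol")]

print(redact_posts(example))
# [('USER0', 'hello @USER1'), ('USER1', 'hi @USER0 and @USER2')]
print(redact_posts(example, assign_ints=False))
# [('USER', 'hello @USER'), ('USER', 'hi @USER and @USER')]
```

Because the name-to-integer mapping is built once per call, running it over a merged collection keeps each user's ID consistent across all of its posts.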

Alternate Representations

To convert a Conversation into a JSON representation that can be reloaded later, use:

convo.to_json()

If desired, one can also get a networkx.Graph representation of the Conversation with:

convo.as_graph()

Splicing Conversations

A Conversation’s posts can be filtered using:

sub = convo.filter(**args)

which produces a new Conversation object from the posts within convo that meet the criteria specified in args.

Specifically, .filter() accepts the following keyword arguments:

by_langs – a set of language codes desired
min_chars – a minimum length of text
before – a maximum datetime criterion
after – a minimum datetime criterion
by_tags – a set of post tags
by_platform – a set of platform names
by_author – a single author name
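How such criteria might combine can be sketched with a standalone predicate. Everything here is hypothetical (the ToyPost dataclass, the keep function, the sample posts). The sketch also assumes that all supplied criteria must hold at once, that the datetime bounds are exclusive, and that a post matches by_tags when it carries at least one requested tag; the library's behavior may differ on each of these points.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ToyPost:
    """Hypothetical stand-in for a UniMessage."""
    uid: str
    text: str
    author: str
    created_at: datetime
    lang: str = "en"
    platform: str = "Twitter"
    tags: set = field(default_factory=set)


def keep(post, by_langs=None, min_chars=0, before=None, after=None,
         by_tags=None, by_platform=None, by_author=None):
    """Return True when the post satisfies every criterion that was supplied."""
    if by_langs is not None and post.lang not in by_langs:
        return False
    if len(post.text) < min_chars:
        return False
    # Assumption: both datetime bounds are exclusive.
    if before is not None and post.created_at >= before:
        return False
    if after is not None and post.created_at <= after:
        return False
    # Assumption: one shared tag is enough to match.
    if by_tags is not None and not (post.tags & by_tags):
        return False
    if by_platform is not None and post.platform not in by_platform:
        return False
    if by_author is not None and post.author != by_author:
        return False
    return True


posts = [
    ToyPost("1", "hello world", "alice", datetime(2021, 1, 1)),
    ToyPost("2", "hola", "bob", datetime(2021, 6, 1), lang="es"),
    ToyPost("3", "a longer english reply", "alice", datetime(2021, 9, 1)),
]

recent_english = [p.uid for p in posts
                  if keep(p, by_langs={"en"}, after=datetime(2021, 3, 1))]
print(recent_english)  # ['3']
```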

Additionally, some common splits are made available through:

convo.segment()  # Splits into a list of disjoint (non-cross-referencing) sub-conversations

convo.get_parents(uid)  # post(s) that UID is replying to

convo.get_children(uid)  # post(s) that reply to UID

convo.get_before(uid)  # all posts that occurred in the conversation before UID

convo.get_after(uid)  # all posts that occurred in the conversation after UID

convo.get_ancestors(uid)  # all posts on the path from UID's parent(s) to the root post

convo.get_descendants(uid)  # all posts in UID's child sub-tree(s)

When these do not suffice, .filter() is your go-to for custom filter operations.
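The reply-structure splits listed above come down to walking reply links. The following sketch shows the idea on a toy mapping from each UID to the UIDs it replies to. The function names and data are hypothetical and only mirror the concepts; they are not the library's implementation.

```python
from collections import deque

# Hypothetical reply structure: UID -> set of UIDs it replies to.
reply_to = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b"},
    "x": set(),
    "y": {"x"},
}


def parents(uid):
    """UIDs that uid replies to."""
    return set(reply_to[uid])


def children(uid):
    """UIDs that reply to uid."""
    return {u for u, targets in reply_to.items() if uid in targets}


def reachable(uid, step):
    """Collect every UID reachable from uid by repeatedly applying step."""
    seen, queue = set(), deque(step(uid))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(step(current))
    return seen


def ancestors(uid):
    return reachable(uid, parents)


def descendants(uid):
    return reachable(uid, children)


def segments():
    """Group UIDs into disjoint sets with no reply links between them."""
    remaining, groups = set(reply_to), []
    while remaining:
        seed = remaining.pop()
        # Follow links in both directions to find the whole connected group.
        linked = reachable(seed, lambda u: parents(u) | children(u)) | {seed}
        groups.append(linked)
        remaining -= linked
    return groups


print(sorted(ancestors("d")))    # ['a', 'b']
print(sorted(descendants("a")))  # ['b', 'c', 'd']
print(sorted(sorted(g) for g in segments()))  # [['a', 'b', 'c', 'd'], ['x', 'y']]
```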

Other Methods and Properties

Conversations have several other methods that may prove useful.

.get_sources() - Returns the set of source posts (posts that do not reply to any other)
.time_order() - Returns a list of the UIDs of posts in time-sorted order
.text_stream() - Returns a list of the text fields of the posts in time-sorted order (i.e., collapses the DAG structure into a linear ordering)
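What these three methods compute can be sketched on a toy dictionary. The tuple layout and the standalone functions are hypothetical stand-ins, chosen only to show the time-sorted linearization and the notion of a source post.

```python
from datetime import datetime

# Hypothetical posts: UID -> (created_at, text, UIDs replied to).
toy_posts = {
    "2": (datetime(2021, 1, 1, 12, 5), "first reply", {"1"}),
    "1": (datetime(2021, 1, 1, 12, 0), "source post", set()),
    "3": (datetime(2021, 1, 1, 12, 9), "second reply", {"1"}),
}


def time_order(posts):
    """UIDs sorted by creation time, earliest first."""
    return sorted(posts, key=lambda uid: posts[uid][0])


def text_stream(posts):
    """Post texts in the same time-sorted order, ignoring reply structure."""
    return [posts[uid][1] for uid in time_order(posts)]


def sources(posts):
    """UIDs of posts that do not reply to any other post."""
    return {uid for uid, (_, _, targets) in posts.items() if not targets}


print(time_order(toy_posts))   # ['1', '2', '3']
print(text_stream(toy_posts))  # ['source post', 'first reply', 'second reply']
print(sources(toy_posts))      # {'1'}
```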

Finally, Conversations have two underlying properties:

.convo_id - A string identifier for the conversation (potentially auto-generated from source IDs)
.posts - The underlying dictionary of stored post objects