pyconversations.reader

The pyconversations.reader sub-module contains classes that read data from disk into a universal conversation format. ConvoReader can and should be used wherever possible (especially after conversion into the universal format); the other readers are provided as examples of how to extend a basic reader to handle other file formats, such as those returned by an API.

class pyconversations.reader.BNCReader

A custom Reddit Reader generated for the data format from “Before Name-calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation” (Habernal et al., 2018).

Notes

See: https://www.aclweb.org/anthology/N18-1036/

static iter_read(path_pattern, ld=True, rd=False)

Function for creating a conversation reading iterator. Will read and parse part of a file/directory, yielding segments as queried.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether to activate language detection (Default: True)
rd (bool) – Whether to use the secondary Reddit parser (RedditPost.parse_rd) instead of the primary one (RedditPost.parse_raw) (Default: False)

Raises

NotImplementedError –

static read(path_pattern, ld=True)

Reads the entire archive of posts from this dataset. Posts that violate rule 2 of the r/ChangeMyView subreddit are tagged with AH=1; all other posts are tagged with AH=0.

Parameters

path_pattern (str) – The path to the directory containing the data
ld (bool) – Whether to activate language detection (Default: True)

Returns

list(Conversation) – A list of all parsed and segmented disjoint Conversations within this dataset

class pyconversations.reader.BaseReader

Abstract Reader class. Defines the two functions that Readers may implement to read from disk.

abstract static iter_read(path_pattern)

Function for creating a conversation reading iterator. Will read and parse part of a file/directory, yielding segments as queried.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –

static read(path_pattern)

Function for reading an entire file/directory of conversations.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –
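The two-function interface described above can be sketched without the library installed. The classes below are stand-ins written for illustration: SketchBaseReader and LineReader are hypothetical names, and the line-per-item text format is an assumption, not something pyconversations defines.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseReader(ABC):
    """Stand-in for an abstract reader interface (not the library class)."""

    @staticmethod
    @abstractmethod
    def iter_read(path_pattern):
        """Lazily yield parsed items found at path_pattern."""
        raise NotImplementedError

    @staticmethod
    def read(path_pattern):
        """Return every parsed item at once."""
        raise NotImplementedError


class LineReader(SketchBaseReader):
    """Hypothetical reader: each non-empty line of each *.txt file is one item."""

    @staticmethod
    def iter_read(path_pattern):
        for path in sorted(Path(path_pattern).glob("*.txt")):
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if line:
                        yield line

    @staticmethod
    def read(path_pattern):
        # Eager variant built on top of the lazy iterator.
        return list(LineReader.iter_read(path_pattern))
```

Because both methods are static, a call such as LineReader.read("some/dir") needs no instance. A reader that supports only one mode can leave the other raising NotImplementedError, which matches the Raises entries listed throughout this page.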

class pyconversations.reader.ChanReader

Reader class for reading and converting raw 4chan data.

static iter_read(path_pattern, ld=True)

Function for iteratively reading an entire file/directory of conversations. Currently expects a path_pattern that points to a directory of JSON files enumerated from 00 to 99.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether or not language detection should be activated. (Default: True)

Yields

2-tuple(int, Conversation) – The index of the chunk (in 0..99) that the Conversation originated from, followed by the Conversation segment itself.
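A minimal sketch of the chunked iteration pattern described above, yielding (chunk, item) pairs. The file names (00.json through 99.json) and the assumption that each file holds a JSON array are illustrative guesses; iter_chunks is a hypothetical helper, and the real reader yields Conversation objects rather than raw records.

```python
import json
from pathlib import Path


def iter_chunks(path_pattern):
    """Yield (chunk_index, record) pairs from files named 00.json .. 99.json."""
    for chunk in range(100):
        # Zero-pad the index so chunk 7 maps to "07.json" (assumed naming).
        chunk_path = Path(path_pattern) / f"{chunk:02d}.json"
        if not chunk_path.exists():
            continue  # this sketch tolerates missing chunks
        with chunk_path.open(encoding="utf-8") as handle:
            for record in json.load(handle):
                yield chunk, record
```

Returning the chunk index alongside each item lets a caller write results back out in the same 100-way partition without holding the whole dataset in memory.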

static read(path_pattern, ld=True)

Function for reading an entire file/directory of conversations.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether or not language detection should be activated. (Default: True)

Raises

NotImplementedError –

class pyconversations.reader.ConvoReader

Universal Conversation reader. After parsing raw files into the universal format, one can save them to disk and reload them using this reader class.

static iter_read(path_pattern)

Function for creating a conversation reading iterator. Will read and parse part of a file/directory, yielding conversations as queried.

Parameters: path_pattern (str) – The path to a directory containing Conversation data. The pattern *.json is appended to this path.
Yields: Conversation – A conversation, read from disk.
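The path handling described above can be sketched with the standard library alone. This is an illustration under two assumptions: that the pattern is appended by plain string concatenation, and that each file stores one JSON object per line. iter_json_records is a hypothetical helper that yields plain dictionaries, whereas the real reader yields Conversation objects.

```python
import json
from glob import glob


def iter_json_records(path_pattern):
    """Lazily yield one parsed JSON object per line from every matching file."""
    # Append '*.json' to the given path, as the parameter description states.
    for filename in sorted(glob(path_pattern + "*.json")):
        with open(filename, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)
```

If the concatenation assumption holds, a directory path needs its trailing separator ("data/" rather than "data"); otherwise the pattern matches sibling files whose names merely start with "data".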

static read(path_pattern)

Function for reading an entire file/directory of conversations.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –

class pyconversations.reader.QuoteReader

A reader specifically designed to read JSON archives of quote tweets.

static iter_read(path_pattern, ld=True)

Function for creating a conversation reading iterator. Will read and parse part of a file/directory, yielding segments as queried.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether to activate language detection (Default: True)

Raises

NotImplementedError –

static read(path_pattern, ld=True)

Reads an entire directory of quote tweet JSON Lines files, segments them into disjoint conversations, and returns the conversations.

Parameters

path_pattern (str) – The path to the directory
ld (bool) – Whether to activate language detection (Default: True)

Returns

list(Conversation) – A list of disjoint conversations
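One way to picture segmentation into disjoint conversations is as finding connected components over reply or quote links. The sketch below uses a union-find structure for this. It is not the library's implementation: the field names "id" and "reply_to" are hypothetical, and it returns lists of post IDs rather than Conversation objects.

```python
from collections import defaultdict


def segment_disjoint(posts):
    """Group post IDs into disjoint conversations (connected components)."""
    parent = {}

    def find(node):
        # Locate the component representative, halving the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for post in posts:
        root = find(post["id"])
        for target in post.get("reply_to", []):
            # Merge the referenced post's component into this post's component.
            parent[find(target)] = root

    groups = defaultdict(list)
    for post in posts:
        groups[find(post["id"])].append(post["id"])
    return sorted(sorted(ids) for ids in groups.values())
```

Posts that reference a target missing from the archive still form a component of their own, so no post is dropped.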

class pyconversations.reader.RawFBReader

Reader for raw FB data.

static iter_read(path_pattern, ld=True)

Given a path_pattern that points to a directory containing raw FB data in the form of path_pattern/PAGES/RAW_DATA.json, this function will iteratively read the files and produce Conversational data.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether or not language detection should be activated. (Default: True)

Yields

2-tuple(str, Conversation) – The name of the page (as parsed) and an associated Conversation from that page

Raises

ValueError – If a JSON file is encountered that isn’t named as one of: post, comments, replies, attach, react, scrape
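The directory layout and the ValueError condition described above can be sketched as a directory walk. This is an illustration only: it assumes the file stem must equal one of the listed names exactly, which may differ from how the real reader matches names. iter_page_files and KNOWN_KINDS are hypothetical.

```python
from pathlib import Path

# The file kinds listed in the Raises entry above.
KNOWN_KINDS = {"post", "comments", "replies", "attach", "react", "scrape"}


def iter_page_files(path_pattern):
    """Yield (page_name, kind, path) for each JSON file one level below the root."""
    page_dirs = sorted(p for p in Path(path_pattern).iterdir() if p.is_dir())
    for page_dir in page_dirs:
        for json_path in sorted(page_dir.glob("*.json")):
            kind = json_path.stem  # assumed: the stem is the kind itself
            if kind not in KNOWN_KINDS:
                raise ValueError(f"Unrecognized raw data file: {json_path}")
            yield page_dir.name, kind, json_path
```

Failing fast on an unknown file name surfaces a malformed dump early instead of silently skipping part of a page's data.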

static read(path_pattern)

Function for reading an entire file/directory of conversations.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –

class pyconversations.reader.RedditReader

General Reddit raw data reader.

static iter_read(path_pattern, ld=True, rd=False)

This iterative reading function assumes that the path it will be pointed towards contains raw Reddit comments and submissions, sorted/chunked by the month they were created.

Parameters

path_pattern (str) – The path to the directory containing the data
ld (bool) – Whether to activate language detection (Default: True)
rd (bool) – Whether to use the secondary Reddit parser (RedditPost.parse_rd) instead of the primary one (RedditPost.parse_raw) (Default: False)

Yields

list(Conversation) – A chunk of Conversations, as parsed

static read(path_pattern)

Function for reading an entire file/directory of conversations.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –

class pyconversations.reader.ThreadsReader

This is a custom Twitter “Threads” Reader. It may be deprecated in favor of newer Twitter reply functionality.

static iter_read(path_pattern, ld=True)

Function for creating a conversation reading iterator. Will read and parse part of a file/directory, yielding segments as queried.

Parameters

path_pattern (str) – The path to file or directory containing Conversation data
ld (bool) – Whether to activate language detection (Default: True)

Yields

2-tuple(str, list(Conversation)) – The string ID of the threaded discussion and a list of the disjoint Conversations identified within it

static read(path_pattern)

Function for reading an entire file/directory of conversations.

Parameters: path_pattern (str) – The path to file or directory containing Conversation data
Raises: NotImplementedError –