bp_text.pool

This module implements functionality for a (text) pool.

A text pool is a collection of annotated/tokenized, text-holding objects (e.g. PdfFiles, TxtFiles) and can be generated from a BibTexDatabase object. Its main purpose is to facilitate interacting with a corpus of texts and the metadata provided by the BibTex entries.

Created: 2025-04-25 Author: Ruben Philipp <me@rubenphilipp.com>

Last modified: 22:36:31 Mon Apr 28 2025 CEST

Functions

cycle_data(ob)

This data getter function returns the index of the next data item and advances the stored index for the following call.

random_data(ob)

This data getter function returns a random data item index.

Classes

Pool([data])

Implementation of the Pool class.

PoolItem(key[, meta, data, ...])

This class implements a PoolItem.

class bp_text.pool.Pool(data={})[source]

Bases: object

Implementation of the Pool class.

A (text) Pool is a collection of annotated/tokenized, text-holding objects (e.g. PdfFiles, TxtFiles) and can be generated from a BibTexDatabase object. Its main purpose is to facilitate interacting with a corpus of texts and the metadata provided by the BibTex entries.

The data of the pool is a dict of PoolItem objects. The keys of the dict are typically (e.g. when the Pool is created from a BibTexDatabase) citation keys.

Parameters:

data (dict) – A dict with an initial set of PoolItem objects.

__init__(data={})[source]
property data

Getter/setter for the data of the Pool.

This is a dict with PoolItem objects. Keys are usually citekeys.

get(key)[source]

Get a PoolItem from the pool by its key.
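
The keyed lookup described here can be sketched with minimal stand-in classes. This is only an illustration, assuming the pool keeps its items in a plain dict keyed by citekey. SketchPool, SketchItem and the sample key and metadata are hypothetical and are not taken from bp_text.pool. How the real get() treats a missing key is not stated in this documentation.

```python
# Hypothetical stand-ins for illustration only; not the bp_text.pool classes.
class SketchItem:
    def __init__(self, key, meta=None, data=None):
        self.key = key
        self.meta = meta if meta is not None else {}
        self.data = data if data is not None else []


class SketchPool:
    def __init__(self, data=None):
        # Assumption: items live in a dict keyed by citation key.
        self.data = data if data is not None else {}

    def get(self, key):
        # Assumption: a plain dict lookup; a missing key raises KeyError here.
        return self.data[key]


# Made-up sample entry.
item = SketchItem("doe2020", meta={"title": "An Example"}, data=["doe2020.pdf"])
pool = SketchPool({item.key: item})
found = pool.get("doe2020")
print(found.meta["title"])
```

The stand-ins use None instead of {} or [] as default arguments so that separate instances do not share one mutable object. That is a choice made for this sketch, not a statement about the library.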

class bp_text.pool.PoolItem(key, meta={}, data=[], default_get_data_func=None)[source]

Bases: object

This class implements a PoolItem.

PoolItems are containers for metadata (e.g. retrieved from a BibTeX entry in a BibTexDatabase) and text-holding objects (stored in the data attribute), e.g. PdfFile objects.

They are meant to be placed into a Pool.

Parameters:
  • key (string) – A (unique) key. This is most likely a BibTeX citekey.

  • meta (dict) – A dict holding metadata, most likely derived from a BibTeX entry.

  • data (list) – A list holding one or more text-holding objects (e.g. a PdfFile).

  • default_get_data_func (int, function or None) – The default selector used by get_data() to retrieve a data object. This is either an integer index into the data attribute of the PoolItem, or a function that takes the PoolItem as its argument and returns the index of the element of data to retrieve. Default = None, in which case get_data() always returns the first element of data.

__init__(key, meta={}, data=[], default_get_data_func=None)[source]
property data

Getter/setter for the data list.

property default_get_data_func
get_data(index=0)[source]

This function returns a single data object from the data list instead of the data list itself. The index argument specifies which element is returned: it is either a zero-based integer index or a function that takes the PoolItem as its argument and returns the index of the element of data to retrieve. By default, the first item of the data list is returned. Two predefined getter functions, random_data() and cycle_data(), can be passed as index.

Example:

# This example passes a function instead of an integer. The
# pre-defined `cycle_data` function cycles through the data.
# NB: `pitm` here is a `PoolItem` object.
from bp_text.pool import cycle_data

pitm.get_data(cycle_data)

Parameters:

index (int or function) – Either a zero-based integer index to an element in data, or a function that takes the PoolItem as its argument and returns the index of the element of data to retrieve. Default = 0.
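
The int-or-function behaviour of index can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: select and last_item are hypothetical helpers, and a plain object with a data list stands in for a PoolItem.

```python
from types import SimpleNamespace


def select(item, index=0):
    """Return one element of item.data, chosen by an int or a getter function."""
    if callable(index):
        # A getter function receives the item and returns an index.
        index = index(item)
    return item.data[index]


def last_item(item):
    # Hypothetical getter: always picks the final element.
    return len(item.data) - 1


# Stand-in for a PoolItem holding three text objects (made-up names).
pitm = SimpleNamespace(data=["a.pdf", "b.txt", "c.pdf"])

first = select(pitm)             # default index 0
second = select(pitm, 1)         # explicit integer index
last = select(pitm, last_item)   # index computed by a function
```

Any callable that maps the item to a valid index fits this pattern, which is presumably how random_data() and cycle_data() plug into get_data().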

property key

Getter/setter for the key.

property meta

Getter/setter for the meta dict.

property next_data

Getter/setter for next_data (int). This is the zero-based index of the next element to be retrieved from the data list when using get_data().

bp_text.pool.cycle_data(ob)[source]

This data getter function returns the index of the next data item and advances the stored index for the following call. See PoolItem.get_data() for details.
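
A cycling getter of this kind might look like the sketch below. It assumes that the counter is the item's next_data attribute and that it wraps around at the end of the list; neither detail is confirmed by this documentation. cycle_sketch is a hypothetical name, and a plain object stands in for a PoolItem.

```python
from types import SimpleNamespace


def cycle_sketch(ob):
    """Return the current next_data index and advance it, wrapping at the end."""
    current = ob.next_data
    # Assumption: the counter wraps around once the end of the list is reached.
    ob.next_data = (current + 1) % len(ob.data)
    return current


# Stand-in for a PoolItem with three data objects and a next_data counter.
pitm = SimpleNamespace(data=["a", "b", "c"], next_data=0)
picked = [pitm.data[cycle_sketch(pitm)] for _ in range(4)]
```

Four successive calls visit the three elements in order and then start again from the first.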

bp_text.pool.random_data(ob)[source]

This data getter function returns a random data item index. See PoolItem.get_data() for details.
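
A random getter could be sketched as below. This is an assumption about the general shape, not the module's code: random_sketch and its optional rng parameter are hypothetical, and a plain object stands in for a PoolItem.

```python
import random
from types import SimpleNamespace


def random_sketch(ob, rng=random):
    """Return a uniformly chosen valid index into ob.data."""
    return rng.randrange(len(ob.data))


# Stand-in for a PoolItem with three data objects.
pitm = SimpleNamespace(data=["a", "b", "c"])
rng = random.Random(0)  # seeded only to make the sketch repeatable
indices = [random_sketch(pitm, rng) for _ in range(20)]
```

Every returned value is a valid zero-based index into data, so it can be used directly to pick an element.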