bp_text.language

This module contains functionality dealing with languages (e.g. detection).

Created: 2025-03-27 Author: Ruben Philipp <me@rubenphilipp.com>

Last modified: 15:36:10 Mon Apr 28 2025 CEST

Module Attributes

DEFAULT_LANGUAGES

The default languages used by the detection algorithm.

Functions

convert_langcode(langcode[, inFormat, outFormat])

Convert a language code to a different form.

iso_639_1_to_langid(iso_code[, fallback])

The inverse of langid_to_iso_639_1().

langid_to_iso_639_1(langid[, fallback])

Standardize (i.e. convert to ISO-639-1) a Bib(La)TeX langid (see BibLaTeX documentation for a list of supported langids, https://ctan.org/pkg/biblatex).

Classes

LanguageDetector([languages])

A language detector.

bp_text.language.DEFAULT_LANGUAGES = [Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH]

The default languages used by the detection algorithm.

class bp_text.language.LanguageDetector(languages=[Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH])

Bases: object

A language detector.

Parameters:

languages (list of lingua.Language objects) – The languages (cf. lingua.Language) the detector should consider. Default = DEFAULT_LANGUAGES

__init__(languages=[Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH])

property detector

This is the actual detector object to use for detection.

Example:

from bp_text import language

detector = language.LanguageDetector().detector
detector.detect_language_of("Hallo Welt")

property languages

update()

Update the instance.

bp_text.language.convert_langcode(langcode, inFormat='langid', outFormat='iso_code_639_1')

Convert a language code to a different form. This could for example be used to translate a Bib(La)TeX langid to an ISO-639-3 code (or vice versa).

Possible in- and out-formats are:

  • langid: a Bib(La)TeX langid (e.g. ngerman or english, cf. BibLaTeX documentation at https://ctan.org/pkg/biblatex)

  • iso_code_639_1: an ISO-639-1 code (e.g. DE or EN)

  • iso_code_639_3: an ISO-639-3 code (e.g. DEU or ENG)

Example:

from bp_text.language import convert_langcode

convert_langcode("ngerman")
# => 'de'
convert_langcode("en", "iso_code_639_1", "langid")
# => 'english'

Parameters:
  • langcode (string) – The language code to translate.

  • inFormat (string) – The input format (i.e. of langcode). Must be one of the possible in/out formats (see above). Default = “langid”

  • outFormat (string) – The output format (i.e. of the return value). Must be one of the possible in/out formats (see above). Default = “iso_code_639_1”

Returns:

The converted langcode in the outFormat.

Return type:

string
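
The conversion between the three formats can be pictured as a lookup in a table that stores every language once per format. The following is a minimal sketch under that assumption, not the module's actual implementation: the table _ROWS, its entries, the lowercase spelling of the codes, and the function name convert_langcode_sketch are all hypothetical.

```python
# Hypothetical lookup table: each row holds one language in all three formats.
# The rows and the lowercase spelling are assumptions made for illustration.
_ROWS = [
    {"langid": "english", "iso_code_639_1": "en", "iso_code_639_3": "eng"},
    {"langid": "ngerman", "iso_code_639_1": "de", "iso_code_639_3": "deu"},
    {"langid": "french", "iso_code_639_1": "fr", "iso_code_639_3": "fra"},
]


def convert_langcode_sketch(langcode, in_format="langid",
                            out_format="iso_code_639_1"):
    """Translate langcode from in_format to out_format; None if unknown."""
    for fmt in (in_format, out_format):
        if fmt not in _ROWS[0]:
            raise ValueError(f"unknown format: {fmt!r}")
    # Assumption: matching is case-insensitive, so "DE" and "de" are equal.
    needle = langcode.lower()
    for row in _ROWS:
        if row[in_format] == needle:
            return row[out_format]
    return None


print(convert_langcode_sketch("ngerman"))
print(convert_langcode_sketch("en", "iso_code_639_1", "langid"))
```

Keeping one row per language means a new format only adds a key to each row, and every pair of formats stays convertible without writing a dedicated function per direction.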

bp_text.language.iso_639_1_to_langid(iso_code, fallback='english')

The inverse of langid_to_iso_639_1(). Converts an ISO-639-1 code to a BibLaTeX langid.

Parameters:
  • iso_code (string) – The ISO-639-1 code (e.g. “de”) to convert.

  • fallback (string) – The langid to return if the translation table has no entry for iso_code. Should be a valid BibLaTeX langid (cf. BibLaTeX documentation). Default = “english”
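
Because several BibLaTeX langids (for instance german and ngerman) share one ISO-639-1 code, the inverse mapping has to pick one langid per code. The sketch below shows one way to do that; it is an illustration under stated assumptions, not the module's code. The table excerpt, the "first entry wins" rule, and the names build_inverse and iso_to_langid_sketch are hypothetical.

```python
# Hypothetical excerpt of a langid -> ISO-639-1 table (illustration only).
_LANGID_TO_ISO = {
    "english": "en",
    "american": "en",
    "ngerman": "de",
    "german": "de",
    "french": "fr",
}


def build_inverse(table):
    """Invert the table; the first langid listed for a code is kept."""
    inverse = {}
    for langid, iso_code in table.items():
        # setdefault leaves an existing entry untouched, so later
        # duplicates such as "american" do not overwrite "english".
        inverse.setdefault(iso_code, langid)
    return inverse


_ISO_TO_LANGID = build_inverse(_LANGID_TO_ISO)


def iso_to_langid_sketch(iso_code, fallback="english"):
    """Return the langid for iso_code, or fallback if there is none."""
    return _ISO_TO_LANGID.get(iso_code.lower(), fallback)


print(iso_to_langid_sketch("de"))
print(iso_to_langid_sketch("xx"))
```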

bp_text.language.langid_to_iso_639_1(langid, fallback='en')

Standardize (i.e. convert to ISO-639-1) a Bib(La)TeX langid (see the BibLaTeX documentation for a list of supported langids, https://ctan.org/pkg/biblatex). If no match is found, the function prints a notice and returns the fallback argument. If fallback is explicitly set to None, it returns None and prints a notice.

Example:

from bp_text.language import langid_to_iso_639_1

langid_to_iso_639_1("ngerman")
# => 'de'

Parameters:
  • langid (string) – The langid to standardize/parse.

  • fallback (string) – Any string (preferably an ISO-639-1 code) the function falls back to if no matching translation is found. Default = “en”
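
The fallback behaviour described above can be sketched as follows. This is only an illustration of the documented contract (known langid, unknown langid with a fallback, unknown langid with fallback set to None); the table excerpt, the wording of the notice, and the name langid_to_iso_sketch are assumptions and do not come from the module.

```python
# Hypothetical excerpt of a langid -> ISO-639-1 table (illustration only).
_LANGID_TO_ISO = {
    "english": "en",
    "american": "en",
    "german": "de",
    "ngerman": "de",
    "french": "fr",
}


def langid_to_iso_sketch(langid, fallback="en"):
    """Return the ISO-639-1 code for langid, or fallback if unknown."""
    code = _LANGID_TO_ISO.get(langid)
    if code is not None:
        return code
    # No match: print a notice and hand back the fallback, which may be None.
    print(f"No ISO-639-1 code found for langid {langid!r}; "
          f"falling back to {fallback!r}.")
    return fallback


print(langid_to_iso_sketch("ngerman"))
print(langid_to_iso_sketch("klingon"))
print(langid_to_iso_sketch("klingon", fallback=None))
```

Passing fallback=None lets a caller distinguish "no translation available" from a real code, at the cost of having to handle None downstream.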