bp_text.language
This module contains functionality dealing with languages (e.g. detection).
Created: 2025-03-27 Author: Ruben Philipp <me@rubenphilipp.com>
Last modified: 15:36:10 Mon Apr 28 2025 CEST
Module Attributes
DEFAULT_LANGUAGES | The default languages used by the detection algorithm.
Functions
convert_langcode | Convert a language code to a different form.
iso_639_1_to_langid | The inverse of langid_to_iso_639_1().
langid_to_iso_639_1 | Standardize (i.e. convert to ISO-639-1) a Bib(La)TeX langid (see the BibLaTeX documentation for a list of supported langids, https://ctan.org/pkg/biblatex).
Classes
LanguageDetector | A language detector.
- bp_text.language.DEFAULT_LANGUAGES = [Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH]
The default languages used by the detection algorithm.
- class bp_text.language.LanguageDetector(languages=[Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH])
Bases: object
A language detector.
- Parameters:
languages (list of lingua.Language objects) – A list of all languages (cf. lingua.Language) the detector should consider. Default = DEFAULT_LANGUAGES
- __init__(languages=[Language.ENGLISH, Language.GERMAN, Language.FRENCH, Language.SPANISH, Language.ITALIAN, Language.DUTCH, Language.PORTUGUESE, Language.POLISH, Language.SWEDISH])
- property detector
The detector object that performs the actual detection.
Example:
detector = language.LanguageDetector().detector
detector.detect_language_of("Hallo Welt")
- property languages
- bp_text.language.convert_langcode(langcode, inFormat='langid', outFormat='iso_code_639_1')
Convert a language code to a different form. This can be used, for example, to translate a Bib(La)TeX langid to an ISO-639-3 code (or vice versa).
Possible in- and out-formats are:
- langid: a Bib(La)TeX langid (e.g. ngerman or english; cf. the BibLaTeX documentation at https://ctan.org/pkg/biblatex)
- iso_code_639_1: an ISO-639-1 code (e.g. DE or EN)
- iso_code_639_3: an ISO-639-3 code (e.g. DEU or ENG)
Example:
convert_langcode("ngerman") # => 'de'
convert_langcode("en", "iso_code_639_1", "langid") # => 'english'
- Parameters:
langcode (string) – The language code to translate.
inFormat (string) – The input format (i.e. of langcode). Must be one of the possible in/out formats (see above). Default = “langid”
outFormat (string) – The output format (i.e. of the return value). Must be one of the possible in/out formats (see above). Default = “iso_code_639_1”
- Returns:
The converted langcode in the outFormat.
- Return type:
string
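The lookup that convert_langcode() performs can be pictured as a single table with one column per format. The following is a minimal sketch under that assumption, not the module's actual implementation: the table name LANGUAGE_TABLE and its three rows are hypothetical (a real langid table is much larger), and the case-insensitive matching and the None result for unknown codes are assumptions.

```python
# Hypothetical conversion table: one row per language, one column per format.
# Illustrative excerpt only; not bp_text's actual data.
LANGUAGE_TABLE = [
    {"langid": "english", "iso_code_639_1": "en", "iso_code_639_3": "eng"},
    {"langid": "ngerman", "iso_code_639_1": "de", "iso_code_639_3": "deu"},
    {"langid": "french", "iso_code_639_1": "fr", "iso_code_639_3": "fra"},
]
FORMATS = ("langid", "iso_code_639_1", "iso_code_639_3")


def convert_langcode(langcode, inFormat="langid", outFormat="iso_code_639_1"):
    """Sketch: find the row whose inFormat column matches, return its outFormat column."""
    if inFormat not in FORMATS or outFormat not in FORMATS:
        raise ValueError(f"formats must be one of {FORMATS}")
    needle = langcode.lower()  # assumption: matching is case-insensitive
    for row in LANGUAGE_TABLE:
        if row[inFormat] == needle:
            return row[outFormat]
    return None  # assumption: unknown codes yield None


print(convert_langcode("ngerman"))                          # de
print(convert_langcode("en", "iso_code_639_1", "langid"))   # english
print(convert_langcode("DEU", "iso_code_639_3", "langid"))  # ngerman
```

Keeping all formats in one table means every conversion direction uses the same lookup loop; the parameter names inFormat and outFormat simply follow the documented signature.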
- bp_text.language.iso_639_1_to_langid(iso_code, fallback='english')
The inverse of langid_to_iso_639_1(). Converts an ISO-639-1 code to a BibLaTeX langid.
- Parameters:
iso_code (string) – The ISO-639-1 code (e.g. “de”) to convert.
fallback (string) – The fallback language to return if no translation is found in the translation table. Should be a valid BibLaTeX langid (cf. the BibLaTeX documentation). Default = “english”
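Several langids can map to the same ISO-639-1 code (BibLaTeX accepts both german and ngerman, for instance), so inverting langid_to_iso_639_1() has to pick one langid per code. The sketch below shows one way to do that, assuming a dictionary-based table; the table contents, the "first listed langid wins" rule, and the case-insensitive matching are hypothetical and not taken from bp_text.

```python
# Hypothetical excerpt of a langid -> ISO-639-1 table (not bp_text's actual data).
LANGID_TO_ISO = {
    "english": "en",
    "american": "en",
    "ngerman": "de",
    "german": "de",
    "french": "fr",
}

# Invert the table. Several langids share one ISO code, so keep the first
# langid listed for each code (dicts preserve insertion order).
ISO_TO_LANGID = {}
for langid, iso in LANGID_TO_ISO.items():
    ISO_TO_LANGID.setdefault(iso, langid)


def iso_639_1_to_langid(iso_code, fallback="english"):
    """Sketch: return a BibLaTeX langid for iso_code, or fallback if none is known."""
    return ISO_TO_LANGID.get(iso_code.lower(), fallback)


print(iso_639_1_to_langid("de"))  # ngerman
print(iso_639_1_to_langid("EN"))  # english
print(iso_639_1_to_langid("xx"))  # english (fallback)
```

Ordering the forward table so that the preferred langid comes first makes the inverse deterministic without a separate reverse table to maintain.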
- bp_text.language.langid_to_iso_639_1(langid, fallback='en')
Standardize (i.e. convert to ISO-639-1) a Bib(La)TeX langid (see the BibLaTeX documentation for a list of supported langids, https://ctan.org/pkg/biblatex). If no match is found, the function prints a notice and returns fallback; if fallback is set to None, it returns None.
Example:
langid_to_iso_639_1("ngerman") # => 'de'
- Parameters:
langid (string) – The langid to standardize/parse.
fallback (string) – Any string (preferably an ISO-639-1 code) the function should fall back to if no matching translation is found. Default = “en”
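The fallback behaviour described above can be sketched as follows. This is an illustrative reconstruction, not the module's code: the table excerpt is hypothetical, and the wording of the notice and printing it to standard error are assumptions (the documentation only says that a notice is printed).

```python
import sys

# Hypothetical excerpt of a langid -> ISO-639-1 table (not bp_text's actual data).
LANGID_TO_ISO = {"english": "en", "ngerman": "de", "german": "de", "french": "fr"}


def langid_to_iso_639_1(langid, fallback="en"):
    """Sketch: return the ISO-639-1 code for langid, else print a notice and return fallback."""
    iso = LANGID_TO_ISO.get(langid.lower())
    if iso is not None:
        return iso
    # No match: notify, then return fallback (None if the caller passed fallback=None).
    print(f"No ISO-639-1 code known for langid {langid!r}; "
          f"returning {fallback!r}.", file=sys.stderr)
    return fallback


print(langid_to_iso_639_1("ngerman"))        # de
print(langid_to_iso_639_1("klingon"))        # en (plus a notice on stderr)
print(langid_to_iso_639_1("klingon", None))  # None (plus a notice on stderr)
```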