icupy 0.19.0
icupy
Python bindings for ICU4C using pybind11.
Changes from ICU4C
Naming Conventions
Renamed functions, methods, and C++ enumerators to conform to PEP 8.
Function Names:
Use lower_case_with_underscores style.
Method Names:
Use lower_case_with_underscores style.
Also, use one leading underscore only for protected methods.
C++ Enumerators: Use UPPER_CASE_WITH_UNDERSCORES style without a leading "k". (e.g., kDateOffset → DATE_OFFSET)
APIs whose names match Python reserved words: Append a trailing underscore. (e.g., with() → with_())
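The renaming rules above can be sketched in a few lines of Python. This only illustrates the convention; it is not the code icupy uses to produce its bindings, and the helper names to_python_name and to_enumerator_name are hypothetical.

```python
import keyword
import re

# Zero-width position between a lowercase letter or digit and an uppercase letter.
_WORD_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])')


def to_python_name(cpp_name: str) -> str:
    """Convert an ICU4C function or method name to lower_case_with_underscores."""
    name = _WORD_BOUNDARY.sub('_', cpp_name).lower()
    # Names that collide with Python reserved words get a trailing underscore.
    return name + '_' if keyword.iskeyword(name) else name


def to_enumerator_name(cpp_name: str) -> str:
    """Convert a C++ enumerator such as kDateOffset to UPPER_CASE_WITH_UNDERSCORES."""
    # Drop the leading "k" only when it is a prefix (followed by an uppercase letter).
    if len(cpp_name) > 1 and cpp_name[0] == 'k' and cpp_name[1].isupper():
        cpp_name = cpp_name[1:]
    return _WORD_BOUNDARY.sub('_', cpp_name).upper()


print(to_python_name('createTimeZone'))    # → create_time_zone
print(to_python_name('with'))              # → with_
print(to_enumerator_name('kDateOffset'))   # → DATE_OFFSET
```

When looking for an ICU4C API in icupy, applying these rules to the C++ name by hand is usually enough to guess the Python name.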
Error Handling
Unlike the C/C++ APIs, which report failures through a UErrorCode output parameter, icupy raises the icupy.icu.ICUError exception when an error code indicates a failure.
You can access the icu::ErrorCode object from ICUError.args[0].
For example:
from icupy import icu
try:
    ...
except icu.ICUError as e:
    print(e.args[0])  # → icupy.icu.ErrorCode
    print(e.args[0].get())  # → icupy.icu.UErrorCode
Examples
icu::UnicodeString with error callback
from icupy import icu
cnv = icu.ucnv_open('utf-8')
action = icu.UCNV_TO_U_CALLBACK_ESCAPE
context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_C)
icu.ucnv_set_to_ucall_back(cnv, action, context)
utf8 = b'\x61\xfe\x62' # Impossible bytes
s = icu.UnicodeString(utf8, -1, cnv)
str(s) # → 'a\\xFEb'
action = icu.UCNV_TO_U_CALLBACK_ESCAPE
context = icu.ConstVoidPtr(icu.UCNV_ESCAPE_XML_DEC)
icu.ucnv_set_to_ucall_back(cnv, action, context)
s = icu.UnicodeString(utf8, -1, cnv)
str(s) # → 'a&#254;b'
icu::UnicodeString with user callback
from icupy import icu
def _to_callback(
    _context: object,
    _args: icu.UConverterToUnicodeArgs,
    _code_units: bytes,
    _length: int,
    _reason: icu.UConverterCallbackReason,
    _error_code: icu.UErrorCode,
) -> icu.UErrorCode:
    if _reason == icu.UCNV_ILLEGAL:
        _source = ''.join(['%{:02X}'.format(x) for x in _code_units])
        icu.ucnv_cb_to_uwrite_uchars(_args, _source, len(_source), 0)
        _error_code = icu.U_ZERO_ERROR
    return _error_code
cnv = icu.ucnv_open('utf-8')
action = icu.UConverterToUCallbackPtr(_to_callback)
context = icu.ConstVoidPtr(None)
icu.ucnv_set_to_ucall_back(cnv, action, context)
utf8 = b'\x61\xfe\x62' # Impossible bytes
s = icu.UnicodeString(utf8, -1, cnv)
str(s) # → 'a%FEb'
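For readers more familiar with Python's own codecs, the user callback above plays roughly the same role as a custom error handler registered with codecs.register_error. The sketch below uses only the standard library, with no ICU involved; the handler name percent_escape is made up for this illustration.

```python
import codecs


def percent_escape(exc: UnicodeError):
    """Replace undecodable bytes with %XX, as _to_callback does for UCNV_ILLEGAL."""
    if isinstance(exc, UnicodeDecodeError):
        bad_bytes = exc.object[exc.start:exc.end]
        replacement = ''.join('%{:02X}'.format(b) for b in bad_bytes)
        # Return the replacement text and the position where decoding resumes.
        return replacement, exc.end
    raise exc


codecs.register_error('percent_escape', percent_escape)

utf8 = b'\x61\xfe\x62'  # 0xFE never appears in well-formed UTF-8
text = utf8.decode('utf-8', errors='percent_escape')
print(text)  # → a%FEb
```

In both cases the handler receives the offending bytes, writes a substitute, and lets decoding continue.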
icu::DateFormat
from icupy import icu
tz = icu.TimeZone.create_time_zone('America/Los_Angeles')
fmt = icu.DateFormat.create_instance_for_skeleton('yMMMMd', icu.Locale.get_english())
fmt.set_time_zone(tz)
dest = icu.UnicodeString()
s = fmt.format(0, dest)
str(s) # → 'December 31, 1969'
icu::MessageFormat
from icupy import icu
fmt = icu.MessageFormat(
    "At {1,time,::jmm} on {1,date,::dMMMM}, "
    "there was {2} on planet {0,number}.",
    icu.Locale.get_us(),
)
tz = icu.TimeZone.get_gmt()
subfmts = fmt.get_formats()
subfmts[0].set_time_zone(tz)
subfmts[1].set_time_zone(tz)
date = 1637685775000.0 # 2021-11-23T16:42:55Z
obj = icu.Formattable(
    [
        icu.Formattable(7),
        icu.Formattable(date, icu.Formattable.IS_DATE),
        icu.Formattable(icu.UnicodeString('a disturbance in the Force')),
    ]
)
dest = icu.UnicodeString()
s = fmt.format(obj, dest)
str(s) # → 'At 4:42 PM on November 23, there was a disturbance in the Force on planet 7.'
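ICU represents dates as UDate values: milliseconds since 1970-01-01T00:00:00Z, stored as a double. The date literal in the MessageFormat example and the 0 passed to format() in the DateFormat example are both such values. A small standard-library sketch for converting between datetime and UDate follows; to_udate and from_udate are hypothetical helpers, not part of icupy.

```python
from datetime import datetime, timedelta, timezone


def to_udate(dt: datetime) -> float:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError('expected a timezone-aware datetime')
    return dt.timestamp() * 1000.0


def from_udate(udate: float, tz: timezone = timezone.utc) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware datetime."""
    return datetime.fromtimestamp(udate / 1000.0, tz)


print(to_udate(datetime(2021, 11, 23, 16, 42, 55, tzinfo=timezone.utc)))  # → 1637685775000.0

# Assumption: a fixed UTC-8 offset stands in for America/Los_Angeles in winter.
pst = timezone(timedelta(hours=-8))
print(from_udate(0, pst).date())  # → 1969-12-31
```

The second result shows why formatting 0 in the America/Los_Angeles time zone yields December 31, 1969 rather than January 1, 1970.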
icu::number::NumberFormatter
from icupy import icu
fmt = icu.number.NumberFormatter.with_().unit(icu.MeasureUnit.get_meter()).per_unit(icu.MeasureUnit.get_second())
print(fmt.locale(icu.Locale.get_us()).format_double(3000).to_string()) # → '3,000 m/s'
print(fmt.locale(icu.Locale.get_france()).format_double(3000).to_string()) # → '3 000 m/s'
print(fmt.locale('ar').format_double(3000).to_string()) # → '٣٬٠٠٠ م/ث'
icu::BreakIterator
from icupy import icu
text = icu.UnicodeString('In the meantime Mr. Weston arrived with his small ship.')
bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en'))
bi.set_text(text)
list(bi) # → [20, 55]
# filter based on common English language abbreviations
bi = icu.BreakIterator.create_sentence_instance(icu.Locale('en@ss=standard'))
bi.set_text(text)
list(bi) # → [55]
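The boundaries returned above are offsets into the text, so turning them into sentence strings is a matter of slicing. A minimal sketch, where split_at is a hypothetical helper: ICU offsets count UTF-16 code units, so they line up with Python string indices only when the text contains no characters outside the Basic Multilingual Plane, as is the case here.

```python
from typing import Iterable, List


def split_at(text: str, boundaries: Iterable[int]) -> List[str]:
    """Cut text into segments, each ending at the next boundary offset."""
    segments = []
    start = 0
    for end in boundaries:
        segments.append(text[start:end])
        start = end
    return segments


text = 'In the meantime Mr. Weston arrived with his small ship.'
print(split_at(text, [20, 55]))  # → ['In the meantime Mr. ', 'Weston arrived with his small ship.']
print(split_at(text, [55]))      # → ['In the meantime Mr. Weston arrived with his small ship.']
```

The first call shows the false break after "Mr." that the en@ss=standard locale suppresses.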
icu::IDNA (UTS #46)
from icupy import icu
uts46 = icu.IDNA.create_uts46_instance(icu.UIDNA_NONTRANSITIONAL_TO_ASCII)
dest = icu.UnicodeString()
info = icu.IDNAInfo()
uts46.name_to_ascii(icu.UnicodeString('faß.ExAmPlE'), dest, info)
info.get_errors() # → 0
str(dest) # → 'xn--fa-hia.example'
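As a cross-check that does not need ICU, the ACE label in the result above can be reproduced with Python's built-in punycode codec. The sketch also shows that Python's built-in idna codec behaves differently: it implements the older IDNA 2003 rules, which map 'ß' to 'ss' instead of preserving it.

```python
label = 'faß'

# An ACE label is the 'xn--' prefix followed by the Punycode form of the label.
ace = 'xn--' + label.encode('punycode').decode('ascii')
print(ace)  # → xn--fa-hia

# The built-in codec implements IDNA 2003 (RFC 3490), not UTS #46.
legacy = 'faß.example'.encode('idna')
print(legacy)  # → b'fass.example'
```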
For more examples, see tests.
Installation
Prerequisites
Python >=3.8
ICU4C (ICU - The International Components for Unicode) (>=70 recommended)
C++17 compatible compiler (see supported compilers)
CMake >=3.7
Installing prerequisites
Windows
Install the following dependencies.
Python >=3.8
Pre-built ICU4C binary package (>=70 recommended)
Visual Studio 2015 Update 3 or newer (Visual Studio 2019 or newer recommended)
CMake >=3.7
Note: Add CMake to the system PATH.
Linux
To install dependencies, run the following command:
Ubuntu/Debian:
sudo apt install g++ cmake libicu-dev python3-dev python3-pip
Fedora:
sudo dnf install gcc-c++ cmake icu libicu-devel python3-devel
If your system's ICU is out of date, consider building ICU4C from source or installing a pre-built ICU4C binary package.
Building icupy from source
Configuring environment variables:
Windows:
Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu).
For example, if the ICU is located in C:\icu4c:
set ICU_ROOT=C:\icu4c
or in PowerShell:
$env:ICU_ROOT = "C:\icu4c"
To verify settings using icuinfo (64-bit):
%ICU_ROOT%\bin64\icuinfo
or in PowerShell:
& $env:ICU_ROOT\bin64\icuinfo
Linux:
If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables.
For example, if the ICU is located in /usr/local:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
To verify settings using pkg-config:
$ pkg-config --cflags --libs icu-uc
-I/usr/local/include -L/usr/local/lib -licuuc -licudata
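The same checks can be scripted before starting a build. The sketch below uses only the standard library and queries pkg-config for the icu-uc module, as in the command above; the function names and report keys are hypothetical, and a missing tool is reported without raising an error.

```python
import shutil
import subprocess
import sys
from typing import Optional


def icu_version_from_pkg_config() -> Optional[str]:
    """Return the ICU version reported by pkg-config, or None if unavailable."""
    if shutil.which('pkg-config') is None:
        return None
    result = subprocess.run(
        ['pkg-config', '--modversion', 'icu-uc'],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def check_prerequisites() -> dict:
    """Collect a quick report on the build prerequisites."""
    return {
        'python_ok': sys.version_info >= (3, 8),
        'cmake_found': shutil.which('cmake') is not None,
        'icu_version': icu_version_from_pkg_config(),
    }


for name, value in check_prerequisites().items():
    print(f'{name}: {value}')
```

If icu_version is None even though ICU is installed, PKG_CONFIG_PATH most likely does not include the directory that contains icu-uc.pc.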
Installing from PyPI:
pip install icupy
Optionally, you can set CMake environment variables to control the build.
For example, using the Ninja build system and Clang:
CMAKE_GENERATOR=Ninja CXX=clang++ pip install icupy
Alternatively, install the development version from the Git repository:
pip install git+https://github.com/miute/icupy.git
Usage
Configuring environment variables:
Windows:
Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu).
For example, if the ICU is located in C:\icu4c:
set ICU_ROOT=C:\icu4c
or in PowerShell:
$env:ICU_ROOT = "C:\icu4c"
Linux:
If ICU is installed in a non-standard location, set the LD_LIBRARY_PATH environment variable.
For example, if the ICU is located in /usr/local:
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
Using icupy:
import icupy.icu as icu
# or
from icupy import icu
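If the ICU shared libraries cannot be found at import time, the import fails. A defensive sketch that reports which environment variable to check, assuming the failure surfaces as an ImportError; load_icu is a hypothetical helper, not part of icupy.

```python
import importlib
import os
import sys


def load_icu():
    """Import icupy.icu, printing a hint about the ICU location on failure."""
    try:
        return importlib.import_module('icupy.icu')
    except ImportError as exc:
        # Which variable matters depends on the platform, as described above.
        variable = 'ICU_ROOT' if sys.platform == 'win32' else 'LD_LIBRARY_PATH'
        current = os.environ.get(variable, '<not set>')
        print(f'Could not import icupy.icu: {exc}', file=sys.stderr)
        print(f'Check {variable} (currently {current})', file=sys.stderr)
        return None


icu = load_icu()
```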
License
This project is licensed under the MIT License.