reagex 0.1.2
The goal of reagex (from “readable regular expression”)
is to suggest a way of writing complex regular expressions with
many capturing groups in a readable way.
At the moment, it contains just one very simple function
(called reagex) and a utility function, but any function
that could help write readable patterns is welcome.
Note: publishing this ridiculously small project is an excuse to familiarize myself
with Python packaging, DevOps tools and the entire workflow behind publishing
an open-source project.
The project template was generated using https://github.com/ionelmc/cookiecutter-pylibrary/,
which is obviously overkill for a “one-function project”.
Free software: BSD 2-Clause License
Usage
The core function reagex is just a wrapper around str.format and works
in the same way. See the example:
```python
import re
from pprint import pprint

from reagex import reagex

# A sloppy pattern for an Italian address (just to show how it works)
pattern = reagex(
    '{_address}, {postcode} {city} {province}',
    # groups starting with "_" are non-capturing
    _address=reagex(
        '{street} {number}',
        street='(via|contrada|c/da|c[.]da|piazza|p[.]za|p[.]zza) [a-zA-Z]+',
        number='snc|[0-9]+',
    ),
    postcode='[0-9]{5}',
    city='[A-Za-z]+',
    province='[A-Z]{2}',
)
matcher = re.compile(pattern)
match = matcher.fullmatch('via Roma 123, 12345 Napoli NA')
pprint(match.groupdict())  # pprint sorts the keys and wraps long dicts
# prints:
# {'city': 'Napoli',
#  'number': '123',
#  'postcode': '12345',
#  'province': 'NA',
#  'street': 'via Roma'}
```
Groups whose names start with '_' are non-capturing; all the others are named
capturing groups.
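Since reagex is described as a thin wrapper around str.format, its behaviour can be approximated in a few lines. The following is a hypothetical sketch called reagex_sketch, not the package's actual source: it assumes that every keyword argument is wrapped in (?:...) when its name starts with "_" and in (?P<name>...) otherwise, and that the template is then filled in with str.format. The real function may handle positional arguments or format specs differently.

```python
import re


def reagex_sketch(pattern, **group_patterns):
    """Hypothetical approximation of reagex (not the real implementation).

    Names starting with "_" become non-capturing groups (?:...),
    all other names become named capturing groups (?P<name>...).
    The wrapped sub-patterns are substituted into `pattern` via str.format.
    """
    groups = {}
    for name, sub_pattern in group_patterns.items():
        if name.startswith('_'):
            groups[name] = '(?:{})'.format(sub_pattern)
        else:
            groups[name] = '(?P<{}>{})'.format(name, sub_pattern)
    return pattern.format(**groups)


date = reagex_sketch(
    '{day}/{month}/{year}',
    day='[0-9]{2}',
    month='[0-9]{2}',
    year='[0-9]{4}',
)
print(date)  # (?P<day>[0-9]{2})/(?P<month>[0-9]{2})/(?P<year>[0-9]{4})
print(re.fullmatch(date, '16/12/2018').groupdict())
```

One consequence of going through str.format: braces inside the group patterns passed as keyword arguments need no escaping (which is why postcode='[0-9]{5}' works in the example), but a literal brace written directly in the template must be doubled, e.g. '{day}-[0-9]{{2}}'. This holds for the sketch and presumably for reagex as well.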
Why not…
Why not just use re.VERBOSE?
I think reagex is easier to write and to read:
- with reagex, you first describe the structure of the pattern in terms of groups,
  then you provide a pattern for each group;
  with re.VERBOSE you have to define the groups in the exact position where they
  must be matched, so to get the high-level structure of the pattern you may need
  to read multiple lines at the same indentation level;
- with re.VERBOSE you just write one big string; with reagex the group names are
  Python keyword arguments, so you get syntax highlighting, which helps readability;
- whitespace doesn’t need any special treatment;
- “{group_name}” is nicer than “(?P<group_name>...)”.
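For comparison, here is one way the address pattern from the Usage example could be written by hand with re.VERBOSE. This is our own translation, not taken from the reagex documentation. In verbose mode unescaped whitespace outside a character class is ignored, so every literal space has to be written as "[ ]" (or "\ "), and each named group sits inline at the position where it must match.

```python
import re

verbose_pattern = re.compile(r"""
    (?:                               # address (non-capturing)
        (?P<street>
            (via|contrada|c/da|c[.]da|piazza|p[.]za|p[.]zza)[ ][a-zA-Z]+
        )
        [ ]                           # literal space
        (?P<number>snc|[0-9]+)
    )
    ,[ ]
    (?P<postcode>[0-9]{5})
    [ ]
    (?P<city>[A-Za-z]+)
    [ ]
    (?P<province>[A-Z]{2})
""", re.VERBOSE)

match = verbose_pattern.fullmatch('via Roma 123, 12345 Napoli NA')
print(match.groupdict())
```

Both versions compile to equivalent patterns; the difference is only in how the structure is laid out for the reader.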
Installation
pip install reagex
Documentation
https://python-reagex.readthedocs.io/
Development
Possible improvements:
- make some meaningful use of the format_spec
  in {group_name:format_spec};
- add utility functions like repeated to help write
  common patterns in a readable way.
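The repeated helper is only listed as a possible improvement, so its signature and semantics are not defined anywhere. The following is a hypothetical sketch, assuming it takes an item pattern, a separator pattern and optional minimum/maximum counts (min_count and max_count are our own parameter names) and returns a plain pattern string, which could then be passed as a group value to reagex like any other sub-pattern.

```python
import re


def repeated(pattern, separator='', min_count=1, max_count=None):
    """Hypothetical helper (not part of reagex): match `pattern` between
    min_count and max_count times, with `separator` between consecutive items.

    Both `pattern` and `separator` are regular expressions.
    max_count=None means "no upper bound".
    """
    if min_count < 1:
        raise ValueError('min_count must be at least 1')
    if max_count is not None and max_count < min_count:
        raise ValueError('max_count must not be smaller than min_count')
    item = '(?:{})'.format(pattern)
    upper = '' if max_count is None else str(max_count - 1)
    # The first item is mandatory; every further item is preceded by the
    # separator, so the tail repeats between min_count-1 and max_count-1 times.
    return '{item}(?:{sep}{item}){{{low},{high}}}'.format(
        item=item, sep=separator, low=min_count - 1, high=upper)


# A comma-separated list of numbers, with an optional space after each comma
numbers = repeated('[0-9]+', ', ?')
print(numbers)  # (?:[0-9]+)(?:, ?(?:[0-9]+)){0,}
print(bool(re.fullmatch(numbers, '1, 22,333')))  # True
```

Wrapping the item in a non-capturing group keeps the quantifier applied to the whole item even when the item contains alternation.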
Testing
To run all the tests:
tox
Note: to combine the coverage data from all the tox environments, run:
Windows:
set PYTEST_ADDOPTS=--cov-append
tox
Other:
PYTEST_ADDOPTS=--cov-append tox
Changelog
0.1.2 (2018-12-16)
Fixed a small mistake in the example (which is shown on PyPI, so a release
was necessary to update the PyPI page).
0.1.1 (2018-12-12)
Minor fixes and changes to the documentation.
0.1.0 (2018-12-08)
First release on PyPI.