GrPPHATI

GrPPHATI is a Grounded Pipeline LoPHAT Implementation.
Overview
The grounded pipeline, introduced in [1], is a method for building stable, topological descriptors of weighted digraphs.
GrPPHATI is a Python library for implementing descriptors derived from this pipeline.
In particular, GrPPHATI provides default, optimised implementations of grounded persistent path homology (GrPPH) and grounded persistent directed flag complex homology (GrPdFlH).
GrPPHATI builds the boundary matrix in Python, converts it into a sparse format and then employs LoPHAT to perform the persistence calculation in Rust.

Note
Due to its focus on GrPPH, this library only computes homology in degree 1.

Installation
To get started straight away, save a copy of this Google Colab notebook to your drive.
To install, simply run
pip install grpphati

Basic Usage
The quickest way to get started is to use the default implementation of GrPPH or GrPdFlH.
All descriptors implemented in grpphati expect a networkx.DiGraph as input with the edge-weights stored in the weight attribute.
# examples/readme_1.py
from grpphati.pipelines.grounded import GrPPH, GrPdFlH
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(0, 1), (0, 2), (1, 3), (2, 3)], weight=3)
grpph_bar = GrPPH(G).barcode
grpdflh_bar = GrPdFlH(G).barcode

print(grpph_bar)
print(grpdflh_bar)

$ python examples/readme_1.py
[[0, 3]]
[[0, 6]]

Optimisations
By default, all pipelines parallelise the computation over weakly connected components.
In addition, GrPPH iteratively removes appendage edges before starting the computation.
The pipeline GrPdFlH does not use this optimisation as it may result in a different barcode (e.g. [Example A.13, 1]).
Thanks to [Theorem 4.21, 1], it is possible to parallelise the computation of GrPPH over wedge components.
The pipeline GrPPH_par_wedge implements this optimisation.
If you expect that your input has a wedge decomposition, it is highly recommended to use this pipeline since it (a) parallelises the computation and (b) significantly reduces the number of allowed 2-paths.
For example, the following script computes the GrPPH of the wedge of two complete digraphs, each on 50 nodes.
The output is from a laptop with a Ryzen 5 3500U @2.1GHz.
# examples/readme_2.py
import networkx as nx
import time
from grpphati.pipelines.grounded import (
    GrPPH,
    GrPPH_par_wedge,
)


def timed(f):
    tic = time.time()
    output = f()
    toc = time.time()
    elapsed = toc - tic
    return (output, elapsed)


N = 50
G_1 = nx.relabel_nodes(
    nx.complete_graph(N, create_using=nx.DiGraph), lambda x: (x, 1) if x > 0 else x
)
G_2 = nx.relabel_nodes(
    nx.complete_graph(N, create_using=nx.DiGraph), lambda x: (x, 2) if x > 0 else x
)
G_wedge = nx.compose(G_1, G_2)
print(f"{G_wedge.number_of_nodes()} nodes in wedge graph")

(out, elap) = timed(lambda: GrPPH(G_wedge))
print("Serial:")
print(f"Size of barcode = {len(out.barcode)}")
print(f"Time elapsed = {elap}s")

print("Parallel over wedges:")
(out, elap) = timed(lambda: GrPPH_par_wedge(G_wedge))
print(f"Size of barcode = {len(out.barcode)}")
print(f"Time elapsed = {elap}s")

$ python examples/readme_2.py
99 nodes in wedge graph
Serial:
Size of barcode = 4802
Time elapsed = 10.496055603027344s
Parallel over wedges:
Size of barcode = 4802
Time elapsed = 2.702505111694336s

Standard pipelines
For ease of comparison, we also provide implementations of the standard pipelines in grpphati.pipelines.standard.
The pipeline PPH combines path homology with the shortest-path filtration (PPH_par_wedge parallelises over wedges).
The pipeline PdFlH combines directed flag complex homology with the shortest-path filtration.
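For example, the standard descriptors can be compared on the graph from the first example. This is a minimal sketch; it assumes the standard pipelines are invoked in the same way as the grounded ones above, i.e. they accept a nx.DiGraph and return a result with a barcode attribute.

# A minimal sketch; invocation style is assumed to mirror GrPPH/GrPdFlH.
import networkx as nx
from grpphati.pipelines.standard import PPH, PdFlH

G = nx.DiGraph()
G.add_edges_from([(0, 1), (0, 2), (1, 3), (2, 3)], weight=3)

print(PPH(G).barcode)    # persistent path homology
print(PdFlH(G).barcode)  # persistent directed flag complex homology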
grpphati_rs
If you are computing GrPPH on large networks, consider using grpphati_rs, which computes the basis of the boundary matrix in parallel, in Rust.
Advanced Usage
GrPPHATI is in fact a library for implementing instances of the grounded pipeline.
The main interface is the function make_grounded_pipeline in the grpphati.pipelines.grounded module.
The required arguments are a filtration map and a homology class, which are explained further in the following sections.
Additionally, you can specify the PH backend and provide an optimisation strategy.
The function returns a 'pipeline': a callable which accepts a nx.DiGraph and returns a result object carrying the barcode.
The function make_standard_pipeline in grpphati.pipelines.standard has the same interface and returns the standard pipeline.
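For instance, a GrPPH-like pipeline can be rebuilt from its parts. This sketch uses ShortestPathFiltration and RegularPathHomology, which also appear in the Eirene example below; that the default backend and optimisation choices reproduce GrPPH exactly is an assumption.

# A minimal sketch: rebuilding a GrPPH-like pipeline via make_grounded_pipeline.
import networkx as nx
from grpphati.pipelines.grounded import make_grounded_pipeline
from grpphati.filtrations import ShortestPathFiltration
from grpphati.homologies import RegularPathHomology

my_pipeline = make_grounded_pipeline(ShortestPathFiltration, RegularPathHomology)

G = nx.DiGraph()
G.add_edges_from([(0, 1), (0, 2), (1, 3), (2, 3)], weight=3)
print(my_pipeline(G).barcode)  # expected to match GrPPH(G).barcode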
Filtrations
A filtration map should accept a nx.DiGraph and return a grpphati.filtrations.Filtration.
To specify a custom filtration, subclass Filtration and implement all of its abstract methods.
The main methods describing the filtration are node_iter and edge_iter which return an iterator of (node/edge, entrance_time) tuples.
See the implementation of ShortestPathFiltration for an illustrative example.
Note: if filtration represents the filtration t ↦ F_t G, then
filtration.ground(G) should return a Filtration representing the filtration t ↦ G ∪ F_t G.
A default implementation of this method is provided.
If V(F_t G) ⊆ V(G) for all t, then ProperGroundedFiltration provides a more efficient iterator over nodes.
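As a purely illustrative sketch, a trivial filtration in which everything enters at time 0 might look like the following. Only the two methods named above are shown; any further abstract methods of Filtration would also need implementing.

# A hypothetical filtration: every node and edge enters at time 0.
from grpphati.filtrations import Filtration

class ConstantFiltration(Filtration):
    def __init__(self, G):
        self.G = G

    def node_iter(self):
        # Iterator of (node, entrance_time) tuples
        return ((node, 0) for node in self.G.nodes)

    def edge_iter(self):
        # Iterator of (edge, entrance_time) tuples
        return ((edge, 0) for edge in self.G.edges)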
Homology
To specify your homology theory, implement a subclass of grpphati.homologies.Homology.
The key methods to implement are those of the form get_<k>_cells(self, filtration).
For a given k, the method should return an iterator of columns in the boundary matrix.
For most digraph homology theories, you should only need to implement get_two_cells.
In order to implement your homology theory, you may need to implement new column types.
These column types should be subclasses of grpphati.column.Column.
A column object should encapsulate its entrance_time, as this will be needed for sorting the columns of the boundary matrix.
However, you should implement __eq__ and __hash__ so that columns representing the same basis element are equal, regardless of entrance time.
This allows convert_to_sparse to look up the index of each column when sparsifying the boundary matrix.
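The following sketch illustrates the contract; the class and attribute names are hypothetical, not grpphati's actual API. Equality and hashing depend only on the underlying basis element, never on the entrance time.

# A hypothetical column type. Equality/hashing ignore entrance_time, so two
# columns for the same cell born at different times compare equal, as
# convert_to_sparse requires.
from grpphati.column import Column

class MyTwoCell(Column):
    def __init__(self, cell, entrance_time):
        self.cell = cell                    # identifies the basis element
        self.entrance_time = entrance_time  # used when sorting columns

    def __eq__(self, other):
        return isinstance(other, MyTwoCell) and self.cell == other.cell

    def __hash__(self):
        return hash(self.cell)  # consistent with __eq__; entrance_time ignored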
Optimisations
An optimisation should accept a pipeline (as constructed via make_grounded_pipeline) and return a new pipeline implementing the optimisation.
For illustrative examples, see the contents of grpphati.optimisations.
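For instance, an optimisation that pre-processes the input graph can be written as a simple wrapper. This is a hypothetical sketch, not one of grpphati's optimisations; whether a given pre-processing step preserves the barcode must be checked case by case.

# A hypothetical optimisation: drop self-loops before running the pipeline.
# NB: one must verify that the pre-processing does not change the barcode
# of the descriptor in question.
import networkx as nx

def drop_self_loops(pipeline):
    def optimised_pipeline(G):
        H = nx.DiGraph(G)  # work on a copy
        H.remove_edges_from(nx.selfloop_edges(H))
        return pipeline(H)
    return optimised_pipeline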
Backends
By default, GrPPHATI uses LoPHAT to do the core persistence computation.
An alternative backend, relying on Eirene.jl [3], is also provided.
Here is a rough guide to setting this up:

Install Julia 1.6 (needed for Eirene compatibility). In the following, assume the Julia 1.6 binary is at /path/to/julia; it is important that the filename is julia.
Install grpphati with the optional [eirene] feature flag, ideally in a virtual environment.
Install PyCall for Julia from within your virtual environment, e.g. if using hatch

$ hatch run python
>>> import julia
>>> julia.install(julia="/path/to/julia")


Make your own pipelines via the make_grounded_pipeline interface, using the EireneBackend. For example:

# examples/eirene_simple_ex.py
import networkx as nx
from grpphati.pipelines.grounded import make_grounded_pipeline
from grpphati.homologies import RegularPathHomology
from grpphati.filtrations import ShortestPathFiltration
from grpphati.backends import EireneBackend
from grpphati.optimisations import all_optimisations_serial
from pprint import pprint

pipe = make_grounded_pipeline(
    ShortestPathFiltration,
    RegularPathHomology,
    backend=EireneBackend(
        runtime_path="/path/to/julia"
    ),
    optimisation_strat=all_optimisations_serial,
)

G = nx.DiGraph()
G.add_edges_from(
    [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (6, 3, {"weight": 10})]
)
result = pipe(G)
print(result.barcode)
pprint(result.reps)

$ python examples/eirene_simple_ex.py
[[0.0, 1.0], [0.0, 10.0]]
[[Edge (0, 2) :: 0, Edge (0, 1) :: 0, Edge (1, 3) :: 0, Edge (2, 3) :: 0],
[Edge (5, 6) :: 0, Edge (4, 5) :: 0, Edge (3, 4) :: 0, Edge (6, 3) :: 0]]


Warning
Julia code cannot be called in parallel via PyJulia.
As such, when using this backend, the parallel optimisations must be configured with n_jobs=1 (or avoided in favour of their serial counterparts, such as all_optimisations_serial above).

When using EireneBackend you may notice it takes a while for the backend to initialise.
This is because Julia has to boot up and compile Eirene.
To speed this up, you can additionally provide EireneBackend with a sysimage.
This image should have Eirene pre-compiled and the necessary bootstrapping for compatibility with PyJulia.
To build such a system image:

Clone this repository and install hatch.
Install juliaup.
Move into the scripts directory and run ./build_so.sh.
The system image is now in the root of the project, named patched_sys.so.
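The image can then be handed to the backend when constructing a pipeline. The keyword name below is a hypothetical sketch, not grpphati's confirmed interface; consult the example file referenced next for the real usage.

# A hypothetical sketch: the exact keyword for supplying the system image is
# an assumption here.
from grpphati.backends import EireneBackend

backend = EireneBackend(
    runtime_path="/path/to/julia",
    sysimage="patched_sys.so",  # hypothetical parameter name
)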

See examples/eirene_ex.py for example usage.
TODO

Implement non-regular path homology
Improve performance of sparse matrix construction?
Write unit tests
Choose benchmark datasets
Benchmark different reductions
Switch to and benchmark bit_tree_pivot_column representation
Benchmark optimisations
Expand hypothesis test coverage - test combinations of backends + optimisations
Add type hints
Add docstrings
Separate out entrance times?
Rephrase optimisations with decorators?
Benchmark Eirene vs PHAT
Setup Eirene notebook
Update to latest LoPHAT version

References
[1]
Chaplin, T., Harrington, H.A. and Tillmann, U., 2022.
Grounded persistent path homology: a stable, topological descriptor for weighted digraphs.
arXiv preprint arXiv:2210.11274.
[2]
Bauer, U., Kerber, M., Reininghaus, J. and Wagner, H., 2017.
PHAT – Persistent Homology Algorithms Toolbox. Journal of Symbolic Computation, 78, pp. 76-90.
github: xoltar/phat
[3]
Henselman, G. and Ghrist, R., 2016.
Matroid filtrations and computational persistent homology.
arXiv preprint arXiv:1606.00199.
github: Eetion/Eirene.jl
