Last updated:
0 purchases
probstructs 0.2.8
Probabilistic Structures
Probstructs is easy to use Python wrapper for C++ library probstructs . It supports Exponential Histograms, Count Min Sketch (CM-Sketch), and Exponential Count Min Sketch (ECM-Sketch).
Installation
With pip:
pip install probstructs
From source:
pip install .
Classes
CountMinSketch
Count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions.
C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#countminsketch
from probstructs import CountMinSketch
cm_sketch = CountMinSketch(100, 4)
cm_sketch.inc("aaa", 1)
cm_sketch.inc("bbb", 5)
cm_sketch.inc("aaa", 2)
print(cm_sketch.get("aaa"))
# 3
print(cm_sketch.get("bbb"))
# 5
print(cm_sketch.get("ccc"))
# 0
cm_sketch = CountMinSketch(width=100, depth=4)
cm_sketch.inc(key="bbb", delta=5)
print(cm_sketch.get(key="bbb"))
# 5
ExponentialHistorgram
Exponential histogram (EH) is a probabilistic data structure that serves as a frequency counter for specific elements in the last N elements from stream..
C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialhistorgram
from probstructs import ExponentialHistorgram
eh = ExponentialHistorgram(1)
eh.inc(1, 1)
print(eh.get(1, 1))
# 1
eh.inc(1, 1)
print(eh.get(1, 1))
# 2
eh.inc(2, 1)
print(eh.get(1, 2))
# 1
eh = ExponentialHistorgram(window=1)
eh.inc(tick=1, delta=1)
print(eh.get(window=1, tick=1))
# 1
eh.inc(tick=1, delta=1)
print(eh.get(window=1, tick=1))
# 2
eh.inc(tick=2, delta=1)
print(eh.get(window=1, tick=2))
# 1
ExponentialCountMinSketch
Exponential count-min sketch (ECM-Sketch) combines CM-Sketch with EH to count number of different elements in the last N elements in the stream.
C++ documentation: https://probstructs.readthedocs.io/en/latest/classes.html#exponentialcountminsketch
from probstructs import ExponentialCountMinSketch
ecm_sketch = ExponentialCountMinSketch(100, 4, 8)
ts = 0
ecm_sketch.inc("aaa", ts, 1)
ecm_sketch.inc("bbb", ts, 4)
ecm_sketch.inc("ccc", ts, 8)
print(ecm_sketch.get("aaa", 4, ts))
# 1
print(ecm_sketch.get("bbb", 4, ts))
# 4
print(ecm_sketch.get("ccc", 4, ts))
# 8
print(ecm_sketch.get("ddd", 4, ts))
# 0
ecm_sketch = ExponentialCountMinSketch(width=100, depth=4, window=8)
ts = 0
ecm_sketch.inc(key="aaa", tick=ts, delta=1)
ecm_sketch.inc(key="bbb", tick=ts, delta=4)
ecm_sketch.inc(key="ccc", tick=ts, delta=8)
print(ecm_sketch.get(key="aaa", window=4, tick=ts))
# 1
print(ecm_sketch.get(key="bbb", window=4, tick=ts))
# 4
print(ecm_sketch.get(key="ccc", window=4, tick=ts))
# 8
print(ecm_sketch.get(key="ddd", window=4, tick=ts))
# 0
Changelog
0.2.0
Introduce named parameters
Update documentation to contain examples
0.1.0
Initial version
For personal and professional use. You cannot resell or redistribute these repositories in their original state.
There are no reviews.