jskiner 0.1.1

Creator: bradpython12

Description:

JSkiner
This is a Python JSON Schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).
Installation
pip install jskiner

Usage
Checking the JSON Schema of a Large .jsonl File
jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
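
The --split option takes a batch size, which suggests the large .jsonl file is processed in batches of lines. A minimal sketch of that idea in plain Python follows. The helper name iter_jsonl_batches is hypothetical, and this is not jskiner's actual implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_jsonl_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size non-empty JSON lines (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Strip newlines and drop blank lines so each item is one JSON document.
    cleaned = (line.strip() for line in lines if line.strip())
    while True:
        batch = list(islice(cleaned, batch_size))
        if not batch:
            return
        yield batch
```

An open file object can be passed as lines, so the whole file never has to sit in memory at once. Each batch is a list of JSON strings.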

Checking the JSON Schema for a Folder of JSON Files
jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximated_size_of_the_cuckoo_filter> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
For --cuckoo-size, the recommended value is about 10 times the current JSON file count.
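
The 10X sizing recommendation for the cuckoo filter can be computed from the folder itself. The sketch below is one way to do that. The function name recommended_cuckoo_size is hypothetical, it counts only files ending in .json directly inside the folder, and the minimum of 1 is an assumption added so that an empty folder does not produce a zero size.

```python
from pathlib import Path


def recommended_cuckoo_size(json_dir: str, factor: int = 10) -> int:
    """Return factor times the number of .json files directly inside json_dir."""
    json_count = sum(
        1 for p in Path(json_dir).iterdir() if p.is_file() and p.suffix == ".json"
    )
    # Assumption: never return 0, so an empty folder still yields a usable size.
    return max(1, json_count * factor)
```

The result can be supplied as the value of --cuckoo-size.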

Inferring the Schema in Python
from jskiner import InferenceEngine
cpu_cnt = 16  # number of CPU cores for the engine to use
engine = InferenceEngine(cpu_cnt)
# Each element is one JSON document encoded as a string.
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema



Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
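
To make the output above easier to read, here is a toy pure-Python illustration of how each input string maps to a type name and how the distinct names form a union. It is not jskiner's implementation. The names Int, Float, Non and Record come from the output above, while Bool, Str and Array are assumed labels used only for this sketch.

```python
import json


def toy_type_name(json_string: str) -> str:
    """Label one JSON document with a schema-like type name (toy example)."""
    value = json.loads(json_string)
    if value is None:
        return "Non"
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Bool"  # assumed label
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "Str"  # assumed label
    if isinstance(value, list):
        return "Array"  # assumed label
    return "Record"  # the only remaining json.loads result is a dict


def toy_union(json_strings):
    """Collect the distinct type names across all inputs, sorted for stable output."""
    return sorted({toy_type_name(s) for s in json_strings})
```

Calling toy_union on the same four strings gives the four names Float, Int, Non and Record, matching the members of the Union shown above. Unlike the real engine, this toy does not describe the fields inside a record.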


Calculating the Union of a List of Schemas
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema



Optional(Atomic(Int()))


Using the | Operator between Two Schemas
from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema



Optional(Atomic(Int()))
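
The result above shows that combining a type with Non collapses into an Optional. One way such a | operator could be expressed in Python is through the __or__ method, sketched below. ToyAtomic, ToyOptional and ToyUnion are hypothetical stand-ins written for this example. They are not jskiner's classes, and the real behaviour may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOptional:
    inner: "ToyAtomic"


@dataclass(frozen=True)
class ToyUnion:
    members: frozenset


@dataclass(frozen=True)
class ToyAtomic:
    name: str  # e.g. "Int", "Float" or "Non"

    def __or__(self, other):
        if not isinstance(other, ToyAtomic):
            return NotImplemented
        if self == other:
            return self  # T | T is just T
        # T | Non (in either order) collapses to Optional(T).
        if self.name == "Non":
            return ToyOptional(other)
        if other.name == "Non":
            return ToyOptional(self)
        # Two different non-null types form a plain union.
        return ToyUnion(frozenset({self, other}))
```

With these toy classes, ToyAtomic("Int") | ToyAtomic("Non") evaluates to ToyOptional(ToyAtomic("Int")), mirroring the shape of the result shown above.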


TODO:

Enable inference from a folder of JSON files
Enable ignoring of existing JSON files using a cuckoo filter
Enable adding a starting schema file
Enable batch-by-batch processing of large .jsonl files
Fix: make sure repr escapes special characters
Auto-format using Black
Enable sampling of JSON files
Debug: show the input that causes a panic (alter the panic string / alter reduce.py exception logging)
Fix: add a UnionRecord schema object
Enable direct inference from an online API (to avoid repeated downloads of JSON)
Enable regex to represent patterned FieldSet

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
