libpy-simdjson 0.4.0

Creator: bradpython12

libpy simdjson


Python bindings for simdjson using libpy.
Requirements

OS: macOS > 10.15, Linux.
Compiler: GCC >= 9 or Clang >= 10 (the code is C++17).
Python: libpy >= 0.2.3, NumPy.

Install
pip install libpy-simdjson
Note: The installation of libpy (required by libpy_simdjson) uses the python executable to determine information about your environment. If you are not using a virtual environment, or if python does not point to the Python installation you want to use (check with which python and python --version), you must point to your Python executable using the PYTHON environment variable, e.g. PYTHON=python3 make or PYTHON=python3 pip3 install libpy. Additionally, make sure that your CC and CXX environment variables point to the correct compilers.
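The PYTHON, CC and CXX settings in the note above are easy to get wrong. As a hedged sanity check before building, this standard-library-only snippet reports which interpreter is running, what a bare python on PATH resolves to, and whether those variables are set. The helper name build_environment_report is hypothetical; it is not part of libpy or libpy_simdjson.

```python
import os
import shutil
import sys


def build_environment_report() -> dict:
    """Collect the settings the libpy install note cares about (hypothetical helper).

    It only reads the environment; it does not change anything.
    """
    return {
        # The interpreter actually running this code.
        "running_executable": sys.executable,
        "running_version": ".".join(map(str, sys.version_info[:3])),
        # What a bare `python` on PATH resolves to (None if not found).
        "python_on_path": shutil.which("python"),
        # Variables the install note asks you to check (None if unset).
        "PYTHON": os.environ.get("PYTHON"),
        "CC": os.environ.get("CC"),
        "CXX": os.environ.get("CXX"),
    }


if __name__ == "__main__":
    for key, value in build_environment_report().items():
        print(f"{key}: {value}")
```

If running_executable and python_on_path disagree, that is the situation where the note recommends setting PYTHON explicitly.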
Usage
from pathlib import Path
import libpy_simdjson as json

doc = json.load(Path("twitter.json"))
# or json.load(b"twitter.json")
# or json.load("twitter.json")
# we also support `loads` for strings.
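load accepts a pathlib.Path, a bytes path, or a str path. If you write a wrapper of your own that should accept the same three forms, the standard library's os.fsdecode plus pathlib covers the conversion. This is a sketch with a hypothetical helper name (read_json_bytes), not libpy_simdjson's own code:

```python
import os
from pathlib import Path
from typing import Union

PathArg = Union[str, bytes, os.PathLike]


def read_json_bytes(path: PathArg) -> bytes:
    """Read a file given a Path, str, or bytes path (hypothetical helper)."""
    # os.fsdecode calls os.fspath first, then decodes bytes paths with the
    # filesystem encoding; str paths come back unchanged.
    return Path(os.fsdecode(path)).read_bytes()
```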

doc is an Object. Objects behave like Python dicts, with some additional methods.
isinstance(doc, json.Object)

True

We can grab keys, get the length, grab items, and access specific keys:
len(doc)

2

doc.keys()

[b'statuses', b'search_metadata']

doc[b'search_metadata'].items()

[(b'completed_in', 0.087),
(b'max_id', 505874924095815700),
(b'max_id_str', b'505874924095815681'),
(b'next_results',
b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1'),
(b'query', b'%E4%B8%80'),
(b'refresh_url',
b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1'),
(b'count', 100),
(b'since_id', 0),
(b'since_id_str', b'0')]

If you ever want an actual Python dictionary, use as_dict:
doc[b'search_metadata'].as_dict()

{b'completed_in': 0.087,
b'max_id': 505874924095815700,
b'max_id_str': b'505874924095815681',
b'next_results': b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1',
b'query': b'%E4%B8%80',
b'refresh_url': b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1',
b'count': 100,
b'since_id': 0,
b'since_id_str': b'0'}
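Keys and string values come back as bytes (b'query', b'%E4%B8%80'). If downstream code expects str, one option is to decode the result of as_dict recursively. A minimal sketch, assuming the content is UTF-8 (which JSON requires for data exchanged between systems); decode_bytes is a hypothetical helper, not part of the library:

```python
from typing import Any


def decode_bytes(value: Any, encoding: str = "utf-8") -> Any:
    """Recursively convert bytes keys and values inside dicts and lists to str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_bytes(item, encoding) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


example = {b"count": 100, b"query": b"%E4%B8%80", b"since_id_str": b"0"}
print(decode_bytes(example))
# {'count': 100, 'query': '%E4%B8%80', 'since_id_str': '0'}
```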

However, we also support JSON Pointer syntax via at_pointer. This is much faster if you know what you're looking for:
doc.at_pointer(b"/statuses/50/created_at")

b'Sun Aug 31 00:29:04 +0000 2014'

doc.at_pointer(b"/statuses/50/text").decode()

'RT @Ang_Angel73: 逢坂「くっ…僕の秘められし右目が…!」\n一同「……………。」'
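at_pointer takes an RFC 6901 JSON Pointer: a slash-separated path where each token is an object key or an array index, with ~1 standing for / and ~0 for ~ inside a key. To make those rules concrete, here is a pure-Python resolver over ordinary json.loads output. It only illustrates the pointer syntax and is not how libpy_simdjson implements at_pointer; note also that the sketch uses str pointers and keys, whereas the library examples use bytes.

```python
import json
import re
from typing import Any

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (dicts and lists)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape ~1 -> "/" first, then ~0 -> "~" (the order RFC 6901 requires).
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            # Array tokens must be decimal indices without leading zeros.
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]  # IndexError if out of range
        elif isinstance(current, dict):
            current = current[token]  # KeyError if the member is missing
        else:
            raise KeyError(f"cannot descend into a scalar with token {token!r}")
    return current


doc = json.loads('{"statuses": [{"text": "hi"}, {"created_at": "Sun Aug 31"}], "a/b": {"~x": 1}}')
print(resolve_pointer(doc, "/statuses/1/created_at"))  # Sun Aug 31
print(resolve_pointer(doc, "/a~1b/~0x"))  # 1
```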

Let's look at statuses:
statuses = doc[b'statuses']

statuses is an Array. Arrays behave like Python lists, with some additional methods.
Note: statuses and doc share a single parser instance. We cannot parse a new document while these objects are alive (though we can create new parsers via libpy_simdjson.Parser.load).
isinstance(statuses, json.Array)

True
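The shared-parser note describes a lifetime rule: a new document cannot be parsed while Objects or Arrays from the previous one are still alive. The toy model below shows one way such a rule can be checked, by tracking live views with weak references. ToyParser and ToyView are hypothetical names, and this does not reflect how libpy_simdjson enforces the rule internally; it also relies on CPython's reference counting to drop views as soon as they are deleted.

```python
import json
import weakref
from typing import Any


class ToyView:
    """Stand-in for a proxy object that borrows data from its parser."""

    def __init__(self, parser: "ToyParser", value: Any) -> None:
        self._parser = parser
        self.value = value

    def child(self, key: Any) -> "ToyView":
        # Children share the same parser, like doc and doc[b'statuses'].
        return self._parser._make_view(self.value[key])


class ToyParser:
    def __init__(self) -> None:
        # Weak references, so tracking a view does not keep it alive.
        self._live_views = weakref.WeakSet()

    def _make_view(self, value: Any) -> ToyView:
        view = ToyView(self, value)
        self._live_views.add(view)
        return view

    def parse(self, data: bytes) -> ToyView:
        if len(self._live_views) > 0:
            raise RuntimeError("cannot reparse while views of the previous document are alive")
        return self._make_view(json.loads(data))


parser = ToyParser()
doc = parser.parse(b'{"statuses": [1, 2, 3]}')
statuses = doc.child("statuses")  # shares the parser with doc
# Calling parser.parse(...) at this point would raise RuntimeError.
del doc, statuses
fresh = parser.parse(b"{}")  # allowed once the old views are gone
```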

Arrays support length, indexing, and iteration:
len(statuses)

100

statuses[0][b'text'].decode()

'@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖'

for status in statuses:
    # this is a bad example but you get the picture
    if status[b'id'] % 2 == 0:
        print(status[b"text"].decode())
        break
else:
    print("no even ids?")

@aym0566x

名前:前田あゆみ
第一印象:なんか怖っ!
今の印象:とりあえずキモい。噛み合わない
好きなところ:ぶすでキモいとこ😋✨✨
思い出:んーーー、ありすぎ😊❤️
LINE交換できる?:あぁ……ごめん✋
トプ画をみて:照れますがな😘✨
一言:お前は一生もんのダチ💖
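The loop above relies on Python's for ... else: the else branch runs only when the loop finishes without hitting break. An equivalent way to find the first matching element is next() with a generator expression and a default value. The sketch below runs on plain dicts with made-up sample data; with libpy_simdjson you would iterate the Array and use bytes keys such as b'id' instead.

```python
from typing import Iterable, Optional


def first_even_id(statuses: Iterable[dict]) -> Optional[dict]:
    """Return the first status with an even id, or None if there is none."""
    # next() stops at the first match (like `break`); the default plays the role of `else`.
    return next((status for status in statuses if status["id"] % 2 == 0), None)


# Made-up sample data standing in for parsed statuses.
sample = [{"id": 1, "text": "odd id"}, {"id": 2, "text": "even id"}]
match = first_even_id(sample)
print(match["text"] if match is not None else "no even ids?")  # even id
```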

If you need to, you can convert an Array to a list using as_list:
statuses.as_list()[1][b'metadata']

{b'result_type': b'recent', b'iso_language_code': b'ja'}

However, just as with Objects, we support JSON Pointers via at_pointer, which is much faster:
statuses.at_pointer(b"/33/created_at")

b'Sun Aug 31 00:29:06 +0000 2014'
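Pointers are plain strings, so when the keys come from variables it helps to build them with the RFC 6901 escaping applied (~ becomes ~0, / becomes ~1). A small sketch; make_pointer is a hypothetical helper, and it returns bytes to match the b"..." pointers used in the examples above.

```python
from typing import Iterable, Union

Token = Union[str, bytes, int]


def make_pointer(tokens: Iterable[Token]) -> bytes:
    """Join path components into an escaped RFC 6901 JSON Pointer (as bytes)."""
    parts = []
    for token in tokens:
        if isinstance(token, int):
            token = str(token)  # array indices are written in decimal
        if isinstance(token, str):
            token = token.encode("utf-8")
        # Escape "~" first so the "~" introduced by "~1" is not escaped again.
        parts.append(b"/" + token.replace(b"~", b"~0").replace(b"/", b"~1"))
    return b"".join(parts)


print(make_pointer(["statuses", 50, "created_at"]))  # b'/statuses/50/created_at'
print(make_pointer([33, b"a/b~c"]))  # b'/33/a~1b~0c'
```

The resulting bytes have the same shape as the pointers passed to at_pointer above.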

Benchmarks
Note: unlike most other Python JSON parsers, libpy_simdjson by design avoids converting to native Python types for as long as possible, giving you Object and Array objects instead. libpy lets you work with these proxy objects as if they were actual Python objects, without paying the cost of object conversion until it is actually needed. Because the C++ simdjson library is so efficient, converting to Python objects is by far the slowest part of parsing, so we strive to do it as late as possible and on as few fields as possible.
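The design idea in the note above is to hand back lightweight views and convert to Python objects only on request. The pure-Python toy below mimics that interface shape (dict-like and list-like access plus as_dict/as_list) on top of an already-parsed structure, so it captures none of simdjson's speed and its "conversion" is just a copy. LazyObject and LazyArray are hypothetical names; unlike the real library, the toy uses str keys and its keys() returns a view rather than a list.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Iterator


def _wrap(value: Any) -> Any:
    """Wrap containers in views; return scalars unchanged."""
    if isinstance(value, dict):
        return LazyObject(value)
    if isinstance(value, list):
        return LazyArray(value)
    return value


def _materialize(value: Any) -> Any:
    """Recursively copy into plain dicts and lists (the expensive step)."""
    if isinstance(value, dict):
        return {key: _materialize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_materialize(item) for item in value]
    return value


class LazyObject(Mapping):
    """Read-only, dict-like view over a parsed JSON object."""

    def __init__(self, raw: dict) -> None:
        self._raw = raw

    def __getitem__(self, key: Any) -> Any:
        return _wrap(self._raw[key])  # nested containers stay as views

    def __iter__(self) -> Iterator[Any]:
        return iter(self._raw)

    def __len__(self) -> int:
        return len(self._raw)

    def as_dict(self) -> dict:
        return _materialize(self._raw)


class LazyArray(Sequence):
    """Read-only, list-like view over a parsed JSON array."""

    def __init__(self, raw: list) -> None:
        self._raw = raw

    def __getitem__(self, index: Any) -> Any:
        if isinstance(index, slice):
            return LazyArray(self._raw[index])
        return _wrap(self._raw[index])

    def __len__(self) -> int:
        return len(self._raw)

    def as_list(self) -> list:
        return _materialize(self._raw)


doc = LazyObject({"search_metadata": {"count": 100}, "statuses": [{"id": 1}]})
print(isinstance(doc["search_metadata"], LazyObject))  # True
print(doc["search_metadata"].as_dict())  # {'count': 100}
print(doc["statuses"].as_list())  # [{'id': 1}]
```

Reading doc["statuses"][0]["id"] walks the views without copying anything; only as_dict and as_list build new Python containers.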
See the (still WIP) "overhead over Python dict access" benchmarks for the cost of object conversion.
For comparison, we also benchmark a full conversion to Python objects (libpy_simdjson_as_py_obj), though this is very much not the intended use case.

------------------------------------------------------------ benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/canada.json': 8 tests -------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path0-libpy_simdjson-loads] 3.5037 (1.0) 10.9591 (1.0) 4.3319 (1.0) 0.6166 (1.0) 4.2607 (1.0) 0.2292 (1.0) 11;15 230.8454 (1.0) 162 1
test_benchmark_load[path0-pysimdjson-parse] 3.6885 (1.05) 11.1368 (1.02) 4.4029 (1.02) 0.7765 (1.26) 4.2611 (1.00) 0.5017 (2.19) 8;5 227.1254 (0.98) 118 1
test_benchmark_load[path0-pysimdjson_as_py_obj-loads] 13.5282 (3.86) 37.4092 (3.41) 24.2264 (5.59) 6.5177 (10.57) 27.0374 (6.35) 11.9384 (52.08) 16;0 41.2773 (0.18) 40 1
test_benchmark_load[path0-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 13.5544 (3.87) 44.5382 (4.06) 22.4503 (5.18) 7.1879 (11.66) 25.0067 (5.87) 12.0174 (52.43) 12;0 44.5427 (0.19) 35 1
test_benchmark_load[path0-orjson-loads] 16.1693 (4.61) 37.2228 (3.40) 25.2505 (5.83) 6.8427 (11.10) 27.4105 (6.43) 12.5579 (54.79) 19;0 39.6032 (0.17) 41 1
test_benchmark_load[path0-ujson-loads] 22.0310 (6.29) 45.6815 (4.17) 32.3445 (7.47) 7.0874 (11.49) 35.0020 (8.22) 12.6422 (55.15) 12;0 30.9171 (0.13) 27 1
test_benchmark_load[path0-python_json-loads] 49.6505 (14.17) 72.4977 (6.62) 62.0533 (14.32) 7.2277 (11.72) 64.0998 (15.04) 12.7639 (55.68) 8;0 16.1152 (0.07) 19 1
test_benchmark_load[path0-rapidjson-loads] 50.3836 (14.38) 76.2291 (6.96) 61.1768 (14.12) 7.7522 (12.57) 63.4637 (14.90) 12.2982 (53.65) 6;0 16.3461 (0.07) 17 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



--------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/citm_catalog.json': 8 tests ---------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path3-libpy_simdjson-loads] 869.1500 (1.0) 5,930.8430 (2.49) 1,123.0447 (1.0) 203.5337 (1.52) 1,112.6140 (1.0) 81.2050 (1.02) 29;35 890.4365 (1.0) 722 1
test_benchmark_load[path3-pysimdjson-parse] 875.8530 (1.01) 2,384.6440 (1.0) 1,127.0993 (1.00) 133.4820 (1.0) 1,119.4185 (1.01) 79.8040 (1.0) 72;38 887.2333 (1.00) 786 1
test_benchmark_load[path3-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 6,227.8560 (7.17) 23,751.5410 (9.96) 10,109.4607 (9.00) 5,149.2095 (38.58) 7,506.5770 (6.75) 1,209.3185 (15.15) 27;29 98.9172 (0.11) 120 1
test_benchmark_load[path3-orjson-loads] 6,445.2870 (7.42) 23,814.3440 (9.99) 10,414.0837 (9.27) 4,872.0612 (36.50) 7,897.2230 (7.10) 2,016.7280 (25.27) 26;26 96.0238 (0.11) 114 1
test_benchmark_load[path3-pysimdjson_as_py_obj-loads] 7,940.2930 (9.14) 24,984.5340 (10.48) 11,900.6726 (10.60) 5,518.1670 (41.34) 8,878.9950 (7.98) 3,686.7190 (46.20) 21;21 84.0289 (0.09) 90 1
test_benchmark_load[path3-ujson-loads] 8,323.1930 (9.58) 23,228.4760 (9.74) 12,104.0124 (10.78) 5,363.4790 (40.18) 9,344.6530 (8.40) 802.3722 (10.05) 19;20 82.6172 (0.09) 83 1
test_benchmark_load[path3-python_json-loads] 12,697.9430 (14.61) 31,279.0080 (13.12) 16,691.7376 (14.86) 5,163.8324 (38.69) 14,104.4160 (12.68) 2,390.4543 (29.95) 17;17 59.9099 (0.07) 75 1
test_benchmark_load[path3-rapidjson-loads] 14,025.9210 (16.14) 29,509.2470 (12.37) 17,785.5409 (15.84) 5,061.8217 (37.92) 15,456.7050 (13.89) 1,756.3833 (22.01) 7;7 56.2254 (0.06) 35 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



--------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/github_events.json': 8 tests ---------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path2-libpy_simdjson-loads] 25.4010 (1.0) 1,960.0230 (1.28) 36.2258 (1.0) 24.5956 (1.0) 34.6020 (1.0) 2.9000 (1.04) 72;1589 27.6046 (1.0) 12642 1
test_benchmark_load[path2-pysimdjson-parse] 25.8010 (1.02) 1,528.8010 (1.0) 37.9300 (1.05) 26.3332 (1.07) 37.3020 (1.08) 2.8010 (1.0) 189;1856 26.3644 (0.96) 16103 1
test_benchmark_load[path2-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 142.1090 (5.59) 3,436.0130 (2.25) 207.5540 (5.73) 88.5276 (3.60) 204.8130 (5.92) 17.6010 (6.28) 58;289 4.8180 (0.17) 4034 1
test_benchmark_load[path2-orjson-loads] 175.7120 (6.92) 5,736.6740 (3.75) 254.9315 (7.04) 162.0033 (6.59) 239.3160 (6.92) 20.8005 (7.43) 26;226 3.9226 (0.14) 2872 1
test_benchmark_load[path2-pysimdjson_as_py_obj-loads] 224.7150 (8.85) 2,283.7510 (1.49) 321.5699 (8.88) 72.4103 (2.94) 324.0220 (9.36) 26.0020 (9.28) 79;180 3.1097 (0.11) 2632 1
test_benchmark_load[path2-ujson-loads] 301.7190 (11.88) 7,409.5770 (4.85) 375.2140 (10.36) 180.4753 (7.34) 363.8240 (10.51) 29.0010 (10.35) 18;175 2.6651 (0.10) 2269 1
test_benchmark_load[path2-python_json-loads] 330.5200 (13.01) 2,521.5590 (1.65) 459.9910 (12.70) 86.1825 (3.50) 455.8290 (13.17) 38.7277 (13.83) 119;146 2.1740 (0.08) 1909 1
test_benchmark_load[path2-rapidjson-loads] 380.6250 (14.98) 2,082.2340 (1.36) 533.6130 (14.73) 91.0874 (3.70) 529.9340 (15.32) 43.7030 (15.60) 76;95 1.8740 (0.07) 1709 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




----------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/mesh.json': 8 tests ------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path4-libpy_simdjson-loads] 994.5520 (1.0) 4,240.8210 (1.06) 1,202.4742 (1.0) 194.6273 (1.04) 1,190.2620 (1.0) 93.9050 (1.31) 80;42 831.6187 (1.0) 772 1
test_benchmark_load[path4-pysimdjson-parse] 1,027.3570 (1.03) 4,001.3210 (1.0) 1,257.5495 (1.05) 187.9555 (1.0) 1,253.6190 (1.05) 71.9545 (1.0) 37;50 795.1973 (0.96) 804 1
test_benchmark_load[path4-pysimdjson_as_py_obj-loads] 2,782.4560 (2.80) 16,753.7350 (4.19) 3,907.4828 (3.25) 2,316.1314 (12.32) 3,456.5430 (2.90) 264.9645 (3.68) 9;18 255.9192 (0.31) 252 1
test_benchmark_load[path4-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 2,936.7520 (2.95) 19,099.6830 (4.77) 4,066.0482 (3.38) 2,579.4705 (13.72) 3,459.0780 (2.91) 291.3400 (4.05) 10;21 245.9390 (0.30) 207 1
test_benchmark_load[path4-orjson-loads] 3,848.3120 (3.87) 16,809.8280 (4.20) 4,853.2711 (4.04) 2,387.4255 (12.70) 4,348.8370 (3.65) 282.5400 (3.93) 8;13 206.0466 (0.25) 203 1
test_benchmark_load[path4-ujson-loads] 4,224.0310 (4.25) 27,094.9610 (6.77) 5,393.1933 (4.49) 2,651.7927 (14.11) 4,906.3150 (4.12) 368.6210 (5.12) 7;13 185.4189 (0.22) 196 1
test_benchmark_load[path4-python_json-loads] 7,946.7190 (7.99) 21,914.9540 (5.48) 9,216.2188 (7.66) 2,458.6174 (13.08) 8,729.1600 (7.33) 425.9220 (5.92) 4;8 108.5044 (0.13) 108 1
test_benchmark_load[path4-rapidjson-loads] 8,958.0800 (9.01) 26,303.9020 (6.57) 10,523.4513 (8.75) 2,727.4495 (14.51) 10,081.1370 (8.47) 748.3645 (10.40) 4;4 95.0259 (0.11) 101 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


---------------------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/twitter.json': 8 tests ----------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path1-libpy_simdjson-loads] 329.3220 (1.0) 3,357.5250 (2.67) 440.8892 (1.0) 84.7542 (1.65) 441.1300 (1.0) 37.4280 (1.22) 64;75 2,268.1435 (1.0) 1637 1
test_benchmark_load[path1-pysimdjson-parse] 348.7250 (1.06) 1,258.3910 (1.0) 451.6477 (1.02) 51.2426 (1.0) 450.8825 (1.02) 30.8030 (1.0) 374;155 2,214.1149 (0.98) 2006 1
test_benchmark_load[path1-libpy_simdjson_as_py_obj-libpy_simdjson_as_py_obj] 2,206.2470 (6.70) 14,217.4510 (11.30) 2,759.9331 (6.26) 1,386.3121 (27.05) 2,551.0705 (5.78) 312.4220 (10.14) 2;3 362.3276 (0.16) 74 1
test_benchmark_load[path1-orjson-loads] 2,639.8850 (8.02) 15,075.4730 (11.98) 3,420.6474 (7.76) 1,657.3556 (32.34) 3,183.0750 (7.22) 215.1150 (6.98) 6;15 292.3423 (0.13) 270 1
test_benchmark_load[path1-ujson-loads] 3,304.1320 (10.03) 18,557.3880 (14.75) 4,286.1597 (9.72) 1,868.0725 (36.46) 4,021.4820 (9.12) 283.4200 (9.20) 5;14 233.3091 (0.10) 239 1
test_benchmark_load[path1-pysimdjson_as_py_obj-loads] 3,319.1400 (10.08) 16,355.6780 (13.00) 4,237.6133 (9.61) 1,642.9272 (32.06) 3,982.8870 (9.03) 306.1198 (9.94) 6;16 235.9819 (0.10) 259 1
test_benchmark_load[path1-python_json-loads] 4,154.6810 (12.62) 17,041.0680 (13.54) 5,474.9570 (12.42) 1,520.8511 (29.68) 5,262.9090 (11.93) 484.4330 (15.73) 6;8 182.6498 (0.08) 190 1
test_benchmark_load[path1-rapidjson-loads] 5,184.5590 (15.74) 19,217.2170 (15.27) 6,598.9679 (14.97) 1,949.8404 (38.05) 6,315.0330 (14.32) 682.2923 (22.15) 5;7 151.5388 (0.07) 149 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------------- benchmark 'Random attribute access': 2 tests --------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_at[python_json-loads] 800.0000 (1.0) 1,560,881.0000 (6.79) 1,190.4413 (1.0) 6,373.7290 (3.79) 1,100.0000 (1.0) 100.0000 (1.0) 122;3966 840.0246 (1.0) 88496 1
test_benchmark_at[libpy_simdjson-loads] 1,300.0000 (1.63) 229,812.0000 (1.0) 1,848.6849 (1.55) 1,683.4805 (1.0) 1,800.0000 (1.64) 400.0000 (4.00) 143;234 540.9251 (0.64) 57801 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------------------- benchmark 'Random list access': 2 tests ----------------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_list_access[python_json-loads] 700.0000 (1.0) 258,213.0000 (1.0) 1,061.7637 (1.0) 1,165.4286 (1.0) 1,000.0000 (1.0) 200.0000 (1.0) 570;5297 941.8292 (1.0) 149232 1
test_benchmark_list_access[libpy_simdjson-loads] 700.0000 (1.00) 1,340,168.0000 (5.19) 10,658.8974 (10.04) 9,211.5111 (7.90) 10,301.0000 (10.30) 9,300.0000 (46.50) 4637;493 93.8183 (0.10) 81961 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
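The legend's definitions can be reproduced from raw per-round timings. The sketch below uses the standard library's statistics module (Python 3.8+ for quantiles); the timings are made up, and because pytest-benchmark computes quartiles with its own routine, the IQR and outlier counts from this sketch may differ slightly from the tables. The Outliers column pairs the two counts in the legend's order, as "stddev;iqr".

```python
import statistics
from typing import Dict, List


def summarize(timings: List[float]) -> Dict[str, float]:
    """Summarize per-round timings (seconds) using the definitions in the legend."""
    mean = statistics.mean(timings)
    stddev = statistics.stdev(timings)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(timings, n=4)
    iqr = q3 - q1
    return {
        "mean": mean,
        "median": statistics.median(timings),
        "stddev": stddev,
        "iqr": iqr,
        # OPS: operations per second, computed as 1 / Mean.
        "ops": 1.0 / mean,
        # Rounds more than 1 standard deviation from the mean.
        "stddev_outliers": sum(abs(t - mean) > stddev for t in timings),
        # Rounds more than 1.5 * IQR below Q1 or above Q3.
        "iqr_outliers": sum(t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr for t in timings),
    }


# Made-up timings in seconds, purely for illustration.
stats = summarize([0.0040, 0.0041, 0.0042, 0.0043, 0.0110])
print(f"OPS: {stats['ops']:.1f}  Outliers: {stats['stddev_outliers']};{stats['iqr_outliers']}")
```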

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
