grobid2json 0.0.1
Grobid2Json
This package extracts the code that parses Grobid XML into JSON from the s2orc-doc2json project and publishes it as a standalone PyPI package.
✨ Features
Convert the XML files produced by Grobid into JSON format.
📦 Installation
pip install grobid2json
🤯 Usage
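from bs4 import BeautifulSoup
from grobid2json import convert_xml_to_json

file_path = "test.xml"
# Read the Grobid output as bytes so BeautifulSoup can detect the encoding.
with open(file_path, "rb") as f:
    xml_data = f.read()
# The "xml" parser requires lxml to be installed.
soup = BeautifulSoup(xml_data, "xml")
# Use the file name up to the first dot as the paper ID.
paper_id = file_path.split("/")[-1].split(".")[0]
paper = convert_xml_to_json(soup, paper_id, "")
json_data = paper.as_json()
print(json_data)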
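To convert a whole folder of Grobid output rather than a single file, the usage above can be wrapped in a small loop. The following is a minimal sketch, not part of the package: the helper names `paper_id_from_path`, `grobid_to_dict`, and `convert_directory` are hypothetical, and it assumes that `as_json()` returns a JSON-serializable dict, as it does in s2orc-doc2json.

```python
import json
from pathlib import Path
from typing import Callable, List


def paper_id_from_path(path: Path) -> str:
    # Same rule as the usage example: file name up to the first dot.
    return path.name.split(".")[0]


def grobid_to_dict(xml_data: bytes, paper_id: str) -> dict:
    # Imported lazily so the other helpers work without these packages installed.
    from bs4 import BeautifulSoup
    from grobid2json import convert_xml_to_json

    soup = BeautifulSoup(xml_data, "xml")
    # Assumption: as_json() returns a JSON-serializable dict.
    return convert_xml_to_json(soup, paper_id, "").as_json()


def convert_directory(
    xml_dir,
    out_dir,
    convert: Callable[[bytes, str], dict] = grobid_to_dict,
) -> List[Path]:
    # Hypothetical helper: converts every *.xml file in xml_dir and
    # writes one <paper_id>.json file per input into out_dir.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        paper_id = paper_id_from_path(xml_path)
        data = convert(xml_path.read_bytes(), paper_id)
        target = out / f"{paper_id}.json"
        target.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_directory("grobid_output", "json_output")` would then write one JSON file per XML file; both directory names are placeholders. The converter is passed in as a parameter so that the file handling can be tested without Grobid output at hand.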
🔗 Links
Credits
s2orc-doc2json - https://github.com/allenai/s2orc-doc2json
📝 License
This project is licensed under the Apache License 2.0.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.