serdemol2 0.2.4
serde_mol2
Python/Rust module for mol2 format (de)serialization
Installation
Install from PyPI (requires Python >= 3.8):
pip install serde-mol2
After that:
-> python3
Python 3.9.5 (default, Jun 4 2021, 12:28:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import serde_mol2
>>> m = serde_mol2.read_file('example.mol2')
>>> m
[<builtins.Mol2 object at 0x7f6da9ebcae0>]
Or use the command-line binary:
-> serde-mol2 -h
serde-mol2 0.2.2
CSC - IT Center for Science Ltd. (Jaroslaw Kalinowski <[email protected]>)
USAGE:
serde-mol2 [OPTIONS]
OPTIONS:
-a, --append Append to mol2 files when writing rather than truncate
-c, --compression <COMPRESSION> Level of compression for BLOB data, 0 means no compression
[default: 3]
--comment <COMMENT> Comment to add/filter to/by the molecule comment field
--desc <DESC> Description to add/filter to/by entries when writing to the
database
--filename-desc Add filename to the desc field when adding a batch of files
to the database
-h, --help Print help information
-i, --input <INPUT_FILE>... Input mol2 file
--limit <LIMIT> Limit the number of structures retrieved from the database.
Zero means no limit. [default: 0]
--list-desc List available row descriptions present in the database
--no-shm Do not try using shm device when writing to databases
-o, --output <OUTPUT_FILE> Output mol2 file
--offset <OFFSET> Offset when limiting the number of structures retrieved from
the database. Zero means no offset. [default: 0]
-s, --sqlite <SQLITE_FILE> Sqlite database file
-V, --version Print version information
Usage a.k.a. quick function reference
class Mol2
Mol2.to_json()
Return a JSON string for a Mol2 object.
Mol2.as_string()
Return a mol2 string for a Mol2 object.
Mol2.write_mol2( filename, append=False )
Write Mol2 object to a mol2 file.
Mol2.serialized()
Return the Mol2 object in serialized Python form.
Functions
write_mol2( list, filename, append=False )
list is a list of Mol2 objects. The function writes all structures in the list into a mol2 file named filename, appending to the file instead of truncating it when append=True.
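A minimal sketch of a read/convert/write round trip using only calls listed in this reference (read_file, Mol2.to_json, write_mol2). The helper names are hypothetical, and the module is accepted as an optional sm argument purely so the sketch can be exercised without the package installed; in normal use you would simply import serde_mol2. It assumes to_json() returns valid JSON text, as its description states.

```python
import json
from typing import Any, List, Optional


def mol2_to_json_objects(path: str, sm: Optional[Any] = None) -> List[Any]:
    """Read every structure in a mol2 file and return each one as parsed JSON."""
    if sm is None:
        import serde_mol2 as sm  # lazy import; `sm` is a hypothetical hook for testing
    structures = sm.read_file(path)  # list of Mol2 objects
    return [json.loads(m.to_json()) for m in structures]


def copy_first_n(src: str, dst: str, n: int, sm: Optional[Any] = None) -> int:
    """Write the first n structures of src into dst (dst is truncated, not appended)."""
    if sm is None:
        import serde_mol2 as sm
    structures = sm.read_file(src)[:n]
    sm.write_mol2(structures, dst, append=False)
    return len(structures)
```

For example, copy_first_n('example.mol2', 'first10.mol2', 10) would keep only the first ten structures.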
db_insert( list, filename, compression=3, shm=True )
Insert a vector of structures into a database, appending if the database already exists.
Input:
list: vector of structures
filename: path to the database
compression: compression level
shm: should we try to use the database out of a temporary location?
read_db_all( filename, shm=False, desc=None, comment=None, limit=0, offset=0 )
Read all structures from a database and return as a vector
Input:
filename: path to the database
shm: should we try and use the database out of a temporary location?
desc: return only entries containing desc in the desc field
comment: return only entries containing comment in the molecule comment
limit: limit the number of structures retrieved from the database; zero means no limit
offset: offset when limiting the number of structures retrieved from the database; zero means no offset
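The limit and offset parameters make it possible to walk a large database in fixed-size pages instead of loading everything at once. A minimal sketch, assuming that repeated calls return entries in a stable order (the reference does not state this); iter_db_pages and its reader hook are hypothetical names, the hook existing only so the logic can be tested without the package.

```python
from typing import Any, Callable, Iterator, List, Optional


def iter_db_pages(db_path: str, page_size: int = 1000, desc: Optional[str] = None,
                  reader: Optional[Callable[..., List[Any]]] = None) -> Iterator[List[Any]]:
    """Yield structures from the database in pages of at most page_size entries."""
    if page_size <= 0:
        # limit=0 would mean "no limit", i.e. everything in a single call
        raise ValueError("page_size must be positive")
    if reader is None:
        from serde_mol2 import read_db_all as reader  # lazy import; swappable hook
    offset = 0
    while True:
        page = reader(db_path, desc=desc, limit=page_size, offset=offset)
        if not page:
            return
        yield page
        if len(page) < page_size:  # a short page means the end was reached
            return
        offset += page_size
```

Each page is an ordinary list, so it can be processed and discarded before the next one is fetched.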
read_db_all_serialized( filename, shm=True, desc=None, comment=None, limit=0, offset=0 )
Read all structures from a database and return as a vector, but
keep structures in a serialized python form rather than binary.
Input:
filename: path to the database
shm: should we try and use the database out of a temporary location?
desc: return only entries containing desc in the desc field
comment: return only entries containing comment in the molecule comment
limit: limit the number of structures retrieved from the database; zero means no limit
offset: offset when limiting the number of structures retrieved from the database; zero means no offset
read_file_to_db( filename, db-filename, compression=3, shm=True , desc=None, comment=None )
Convenience function. Read structures from a mol2 file and write directly to the database.
Input:
filename: path to the mol2 file
db-filename: path to the database
compression: compression level
shm: should we use the database out of a temporary location?
desc: add this description to structures read
comment: add this comment to the molecule comment field
read_file_to_db_batch( filenames, db-filename, compression=3, shm=True, desc=None, comment=None )
Convenience function. Read structures from a set of files directly into the database.
Input:
filenames: vector of paths to mol2 files
db-filename: path to the database
compression: compression level
shm: should we use the database out of a temporary location?
desc: add this description to structures read
comment: add this comment to the molecule comment field
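If, as the SHM notes below suggest, each write call may stage the database in /dev/shm, then loading many files with one read_file_to_db_batch call should avoid repeating that copy for every file. A minimal sketch that collects the *.mol2 files of a directory and loads them in one batch; ingest_mol2_dir and the loader hook are hypothetical names. The database path is passed positionally because the documented name db-filename is not a valid Python keyword argument.

```python
import glob
import os
from typing import Any, Callable, List, Optional


def ingest_mol2_dir(directory: str, db_path: str, desc: Optional[str] = None,
                    compression: int = 3,
                    loader: Optional[Callable[..., Any]] = None) -> List[str]:
    """Load every *.mol2 file in a directory into one database with a single batch call."""
    if loader is None:
        from serde_mol2 import read_file_to_db_batch as loader  # lazy; swappable hook
    paths = sorted(glob.glob(os.path.join(directory, "*.mol2")))  # deterministic order
    if paths:
        # compression and desc are documented keyword parameters
        loader(paths, db_path, compression=compression, desc=desc)
    return paths
```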
read_file( filename, desc=None, comment=None )
Read a mol2 file and return a vector of structures
Input:
filename: path to the mol2 file
desc: add this description to structures read
comment: add this comment to the molecule comment field
read_file_serialized( filename, desc=None, comment=None )
Read a mol2 file and return a vector of structures, but as
serialized Python structures rather than in binary form.
Input:
filename: path to the mol2 file
desc: add this description to structures read
comment: add this comment to the molecule comment field
desc_list( filename, shm=False )
List unique entry descriptions found in a database.
Input:
filename: path to a database
shm: should we use the database out of a temporary location?
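desc_list and the desc filter of read_db_all can be combined to summarize a database. A minimal sketch, assuming desc_list returns a list of strings; count_by_desc and its hooks are hypothetical. Because the desc filter is documented as matching entries that contain the given string, a description that is a substring of another (for example "ligands" and "ligands_v2") is counted under both. The sketch also loads every matching structure, so it only suits modest databases.

```python
from typing import Any, Callable, Dict, List, Optional


def count_by_desc(db_path: str,
                  list_descs: Optional[Callable[..., List[str]]] = None,
                  reader: Optional[Callable[..., List[Any]]] = None) -> Dict[str, int]:
    """Count the structures matched by each description found in the database."""
    if list_descs is None or reader is None:
        import serde_mol2  # lazy import; both hooks can be swapped for testing
        list_descs = list_descs or serde_mol2.desc_list
        reader = reader or serde_mol2.read_db_all
    # desc matching is "contains", so overlapping descriptions are double-counted
    return {d: len(reader(db_path, desc=d)) for d in list_descs(db_path)}
```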
Notes
Compression
Compression applies to all sections other than MOLECULE. Those sections contain multiple rows, so they are stored in the database in binary form (BLOB). Since a BLOB is not human-readable anyway, it makes sense to apply at least some compression. The algorithm currently used is zstd, and the default compression level here is 3. Note that in zstd itself a level of 0 selects the default compression level, whereas in this module a level of 0 means no compression.
At the time of writing, the overhead of (de)compressing the data is negligible compared to the I/O and CPU cost of reading, writing and parsing.
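The level convention can be illustrated with a small sketch. It uses the standard library's zlib as a stand-in for zstd (which needs a third-party package), and the one-byte marker is a hypothetical framing choice, not the module's actual storage format. The point is only that level 0 bypasses the compressor entirely instead of being passed to it, where zstd would interpret it as "default level".

```python
import zlib


def pack_blob(data: bytes, level: int = 3) -> bytes:
    """Encode a section BLOB: level 0 stores it raw, level > 0 compresses it."""
    if level < 0:
        raise ValueError("compression level must be >= 0")
    if level == 0:
        return b"\x00" + data  # marker 0: stored uncompressed
    # zlib only accepts levels up to 9 (zstd goes higher), so clamp
    return b"\x01" + zlib.compress(data, min(level, 9))


def unpack_blob(blob: bytes) -> bytes:
    """Reverse pack_blob by dispatching on the marker byte."""
    marker, payload = blob[:1], blob[1:]
    if marker == b"\x00":
        return payload
    if marker == b"\x01":
        return zlib.decompress(payload)
    raise ValueError("unknown BLOB marker")
```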
SHM
When writing to the database, rows are written one at a time, and on shared filesystems writing like that is very slow. With the shm functionality enabled, the module tries to copy the database to /dev/shm and use it there, essentially performing all operations in memory. However, this means that the file in the original location should not be used by other processes in the meantime, as it will be overwritten at the end.
Another problem with working in /dev/shm is that a database that is too big can exhaust the available space, so make sure your database fits into the available memory.
In the future there will be an option to choose a different TMPDIR than /dev/shm, for example one that points to a fast NVMe storage.
By default, shm is used only when writing to the database, as reading does not seem to be affected as much.
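The stage-in/stage-out pattern described above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not serde_mol2's internal code (which is not documented here); tmp_root is a hypothetical parameter standing in for the planned TMPDIR option. As the notes warn, the original file is overwritten at the end, so nothing else should write to it meanwhile. If the body raises, this sketch leaves the original untouched.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def staged_sqlite(db_path: str, tmp_root: Optional[str] = None) -> Iterator[sqlite3.Connection]:
    """Work on a copy of db_path in fast temporary storage, copying it back on success."""
    if tmp_root is None:
        tmp_root = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
    workdir = tempfile.mkdtemp(dir=tmp_root)
    work_path = os.path.join(workdir, os.path.basename(db_path))
    try:
        if os.path.exists(db_path):
            shutil.copy2(db_path, work_path)  # stage in
        conn = sqlite3.connect(work_path)
        try:
            yield conn
            conn.commit()
        finally:
            conn.close()
        # Stage out: copy next to the original, then rename over it, so a
        # process opening the file never sees a half-written copy.
        staging = db_path + ".staging"
        shutil.copy2(work_path, staging)
        os.replace(staging, db_path)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```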
Limitations
The biggest limitation at the moment is that only the following sections are read:
MOLECULE
ATOM
BOND
SUBSTRUCTURE
All other sections are currently dropped silently.
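Because dropped sections are not reported, it can be worth checking an input file before converting it. A minimal standalone sketch (not part of serde_mol2) that scans for @<TRIPOS> section headers, the standard Tripos mol2 section marker, and counts those outside the supported list above:

```python
import re
from collections import Counter

# Sections read by this module, per the list above; anything else is dropped.
SUPPORTED_SECTIONS = {"MOLECULE", "ATOM", "BOND", "SUBSTRUCTURE"}
_SECTION_HEADER = re.compile(r"@<TRIPOS>(\w+)")


def dropped_sections(mol2_text: str) -> Counter:
    """Count the section headers in mol2 text that would be ignored on reading."""
    dropped = Counter()
    for line in mol2_text.splitlines():
        match = _SECTION_HEADER.match(line.strip())
        if match:
            name = match.group(1).upper()
            if name not in SUPPORTED_SECTIONS:
                dropped[name] += 1
    return dropped
```

For example, dropped_sections(open('example.mol2').read()) returns an empty Counter when nothing would be lost.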