rdata 0.11.2

Creator: bradpython12

Description:

rdata

Read R datasets from Python.
The package rdata offers a lightweight way to import R datasets/objects stored
in the “.rda” and “.rds” formats into Python.
Its main advantages are:

It is a pure Python implementation, with no dependencies on the R language or
related libraries.
Thus, it can be used anywhere Python is supported, including the web
using Pyodide.
It attempts to support all R objects that can be meaningfully translated.
As opposed to other solutions, you are not limited to importing dataframes or
data with a particular structure.
It allows users to easily customize the conversion of R classes to Python
ones.
Does your data use custom R classes?
Worry no longer, as it is possible to define custom conversions to the Python
classes of your choosing.
It has a permissive license (MIT). Unlike packages that depend on R libraries
and thus must adhere to the GPL license, rdata can be used as a dependency in
MIT, BSD, or even closed-source projects.



Installation
rdata is on PyPI and can be installed using pip:
pip install rdata
It is also available for conda using the conda-forge channel:
conda install -c conda-forge rdata

Installing the development version
The current version from the develop branch can be installed with:
pip install git+https://github.com/vnmabus/rdata.git@develop
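
After installing, one quick way to check which version ended up in the active environment is to query the package metadata from Python. This is a minimal sketch using only the standard library; the helper name installed_version is introduced here for illustration and is not part of rdata.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints the installed rdata version, or None if rdata is not installed.
print(installed_version("rdata"))
```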



Documentation
The documentation of rdata is hosted on ReadTheDocs.


Examples
Examples of use are available on ReadTheDocs.


Simple usage

Read an R dataset
The usual way to read an R dataset is:
import rdata

converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_vector.rda")
converted
which results in
{'test_vector': array([1., 2., 3.])}
Under the hood, this is equivalent to the following code:
import rdata

parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
converted = rdata.conversion.convert(parsed)
converted
This consists of two steps:

First, the file is parsed using the function rdata.parser.parse_file.
This provides a literal description of the file contents as a hierarchy of
Python objects representing the basic R objects. This step is unambiguous and
always the same.
Then, each object must be converted to an appropriate Python object. In this
step there are several possible choices of the most appropriate Python type
for a given R object. Thus, we provide a default rdata.conversion.convert
routine, which tries to select Python objects that preserve most of the
information in the original R object. For custom R classes, it is also
possible to specify conversion routines to Python objects.
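
When debugging a conversion it can help to keep the intermediate parsed object instead of discarding it. The sketch below wraps the two calls shown above in a small helper that returns both results. The helper name parse_then_convert and the early path check are assumptions added for illustration; only rdata.parser.parse_file and rdata.conversion.convert come from the text above.

```python
from pathlib import Path


def parse_then_convert(path):
    """Run the parse and convert steps separately and return both results."""
    path = Path(path)
    if not path.is_file():
        # Fail early with a clear error before touching the parser.
        raise FileNotFoundError(path)

    # Imported lazily so the path check above works even without rdata.
    import rdata

    # Step 1: literal description of the file as basic R objects.
    parsed = rdata.parser.parse_file(path)
    # Step 2: default conversion of those objects to Python objects.
    converted = rdata.conversion.convert(parsed)
    return parsed, converted
```

Calling parse_then_convert(rdata.TESTDATA_PATH / "test_vector.rda") would then give access to both the parsed hierarchy and the converted result.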



Convert custom R classes
The basic convert routine only constructs a SimpleConverter object and calls
its convert method. All arguments of convert are passed directly to the
SimpleConverter initialization method.
It is possible, although not trivial, to write a custom Converter object to
change the way in which the basic R objects are transformed to Python objects.
However, a more common situation is that one does not want to change how basic
R objects are converted, but instead wants to provide conversions for specific
R classes. This can be done by passing a dictionary to the SimpleConverter
initialization method, containing as keys the names of R classes and as
values callables that convert an R object of that class to a Python object.
By default, the dictionary used is DEFAULT_CLASS_MAP, which can convert
commonly used R classes such as data.frame and factor.
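
To make the idea of a class map concrete, here is a toy model of dictionary-based dispatch written in plain Python. It is not rdata's implementation: ToyRObject, TOY_DEFAULT_CLASS_MAP and toy_convert are hypothetical names invented for this sketch, and the real SimpleConverter handles many more cases. The sketch only shows how a mapping from R class names to callables can select a conversion, and how a user-supplied entry overrides a default one.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyRObject:
    """Hypothetical stand-in for a parsed R object (not rdata's real type)."""
    value: Any
    attributes: dict = field(default_factory=dict)


# Toy default map: R class name -> callable(value, attributes).
TOY_DEFAULT_CLASS_MAP = {
    # R factor codes are 1-based indices into the "levels" attribute.
    "factor": lambda obj, attrs: [attrs["levels"][i - 1] for i in obj],
}


def toy_convert(r_obj, class_map=TOY_DEFAULT_CLASS_MAP):
    """Convert a ToyRObject, dispatching on its R class names."""
    for class_name in r_obj.attributes.get("class", []):
        constructor = class_map.get(class_name)
        if constructor is not None:
            return constructor(r_obj.value, r_obj.attributes)
    # No registered class: fall back to a plain list of the values.
    return list(r_obj.value)


weekday = ToyRObject([2, 1, 2], {"class": ["factor"], "levels": ["mon", "tue"]})
print(toy_convert(weekday))  # ['tue', 'mon', 'tue']

# Override only the "factor" entry, keeping every other default.
custom_map = {**TOY_DEFAULT_CLASS_MAP, "factor": lambda obj, attrs: tuple(obj)}
print(toy_convert(weekday, custom_map))  # (2, 1, 2)
```

Merging the defaults with `**` before adding an override mirrors the pattern used with DEFAULT_CLASS_MAP: conversions for all other classes stay untouched.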
As an example, here is how we would implement a conversion routine for the
factor class to bytes objects, instead of the default conversion to Pandas
Categorical objects:
import rdata

def factor_constructor(obj, attrs):
    # R factor codes are 1-based indices into the "levels" attribute.
    values = [bytes(attrs['levels'][i - 1], 'utf8')
              if i >= 0 else None for i in obj]

    return values

# Start from the default conversions and override only the "factor" class.
new_dict = {
    **rdata.conversion.DEFAULT_CLASS_MAP,
    "factor": factor_constructor
}

converted = rdata.read_rda(
    rdata.TESTDATA_PATH / "test_dataframe.rda",
    constructor_dict=new_dict,
)
converted
which has the following result:
{'test_dataframe':   class  value
1  b'a'      1
2  b'b'      2
3  b'b'      3}
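
The constructor can also be exercised on its own, without reading a file, which is a convenient way to test a custom conversion. In this sketch plain Python lists and a plain dict stand in for the objects rdata would pass in; that substitution is an assumption made for illustration, as is treating a negative code as a missing entry (which is what the `i >= 0` check implies).

```python
def factor_constructor(obj, attrs):
    """Map 1-based factor codes to their level names, encoded as bytes."""
    return [
        bytes(attrs["levels"][i - 1], "utf8") if i >= 0 else None
        for i in obj
    ]


# Hand-made inputs standing in for the codes and attributes of a factor.
levels = {"levels": ["a", "b"]}

print(factor_constructor([1, 2, 2], levels))  # [b'a', b'b', b'b']
print(factor_constructor([1, -1], levels))    # [b'a', None]
```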



Additional examples
Additional examples illustrating the functionalities of this package can be
found in the ReadTheDocs documentation.

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
