intake-sqlite 0.2.0

Creator: rpa-with-ash

Last updated:

Add to Cart

Description:

intakesqlite 0.2.0

SQLite Driver for Intake Data Catalogs







This package provides a (very) thin wrapper around the more general intake-sql driver, which can be used to generate Intake
data catalogs from SQL databases.
The intake-sql driver takes an SQL Alchemy database URL and uses it to
connect to and extract data from the database. This works with just fine with
SQLite databases, but only when the database
file is stored locally and can be referenced with a simple path.
For example this path:
/home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite
would correspond to this SQL Alchemy database URL:
sqlite:///home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite
But you can’t access a remote SQLite DB this way.


Why access a remote SQLite DB?

SQLite databases are great standalone, standardized containers for relational data,
that can be accessed using a huge variety of tools on almost any computer platform.
They are even accepted as an archival format by the US Library of Congress!
Data evolves over time, and it’s often useful to have easy access to several
different versions of it, and to know exactly which version you’re working with.
Cloud object storage is extremely cheap and convenient, and makes it easy to
publish and store historical file-based data releases.
Managing your own bespoke local filesystem hierarchy filled with data – and
coordinating with colleagues so that everyone is using the same filesystem
organizational scheme – is a pain.
Intake catalogs can provide easy access to metadata and let you manage data versions
just like software versions. Installing a new version of the data catalog points you
at the new version of the data.
The overhead and cost associated with setting up and maintaining a database that uses
a client-server model is relatively large compared to distributing a few files that
change infrequently, are essentially read-only resources, and only take up a few
gigabytes of space.



How does it work?
Rather than using an SQL Alchemy database URL to reference the SQLite DB, this intake
driver takes a local path or a remote URL, like:

../pudl-work/sqlite/pudl.sqlite
https://global-power-plants.datasettes.com/global-power-plants.db
s3://cloudy-mc-cloudface-databucket/v1.2.3/mydata.db

For local paths, it resolves the path and prepends sqlite:// before handing it off
to intake-sql to do all the hard work.
For remote URLs it uses fsspec
to cache a local copy
of the database, and then gives intake-sql a database URL that points to the cached
copy.
import intake_sqlite

gpp_cat = intake_sqlite.SQLiteCatalog(
urlpath="https://global-power-plants.datasettes.com/global-power-plants.db",
storage_options={"simplecache": {"cache_storage": "/home/zane/.cache/intake"}},
)

list(gpp_cat)

# ['global-power-plants',
# 'global-power-plants_fts',
# 'global-power-plants_fts_config',
# 'global-power-plants_fts_data',
# 'global-power-plants_fts_docsize',
# 'global-power-plants_fts_idx']


About Catalyst Cooperative
Catalyst Cooperative is a small group of data
wranglers and policy wonks organized as a worker-owned cooperative consultancy.
Our goal is a more just, livable, and sustainable world. We integrate public
data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating
climate change and improving electric utility regulation in the United States.

Contact Us

For general support, questions, or other conversations around the project
that might be of interest to others, check out the
GitHub Discussions
If you’d like to get occasional updates about our projects
sign up for our email list.
Want to schedule a time to chat with us one-on-one? Join us for
Office Hours
Follow us on Twitter: @CatalystCoop
More info on our website: https://catalyst.coop
For private communication about the project or to hire us to provide customized data
extraction and analysis, you can email the maintainers:
pudl@catalyst.coop

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Customer Reviews

There are no reviews.