catalystcoop.cheshire 0.4.1

Cheshire: a Python Template Repository for Catalyst

This template repository helps make new Python projects easier to set up and more
uniform. It contains a lot of infrastructure surrounding a minimal Python package named
cheshire (the cat who isn’t entirely there…).

Create a new repository from this template

Choose a name for the new package that you are creating.
The name of the repository should be the same as the name of the new Python package
you are going to create. E.g. a repository at catalyst-cooperative/cheshire should
be used to define a package named cheshire.
Fork this template repository to create a new Python project repo.
See these instructions.
Clone the new repository to your development machine.
Run pre-commit install in the newly cloned repository to install the
pre-commit hooks defined in .pre-commit-config.yaml.
Create the cheshire conda environment by running conda env create or
(preferably) mamba env create in the top level of the repository.
Activate the new conda environment with conda activate cheshire.
Run tox from the top level of the repository to verify that everything is working
correctly.

Rename the package and distribution
Once you know that your forked version of the cheshire package is working as
expected, you should update the package and distribution names in your new repo to
reflect the name of your new package. The package name is determined by the name of
the directory under src/ which contains the source code, and is the name you’ll use
to import the package for use in a program, script, or notebook. E.g.:
import cheshire
The distribution name is the name that is used to install the software using a
program like pip, conda, or mamba. It is often identical to the package
name, but can also contain a prefix namespace that indicates the individual or
organization responsible for maintaining the package. See
PEP 423 for more on Python package naming
conventions. We are using the catalystcoop namespace for the packages that we
publish, so our pudl package becomes catalystcoop.pudl in the
Python Package Index (PyPI) or on conda-forge. Similarly the cheshire package
becomes the catalystcoop.cheshire distribution. The distribution name is determined
by the project.name defined in pyproject.toml. E.g.:
pip install catalystcoop.cheshire
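The two names also behave differently at runtime: imports use the package name, while installed-package metadata is keyed by the distribution name. The following is a minimal sketch using only the standard library. The helper name installed_version is hypothetical and is not part of the template.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Metadata lookups use the distribution name (what pip installs),
# whereas `import cheshire` uses the package name (the directory under src/).
print(installed_version("catalystcoop.cheshire"))
```

If the distribution is not installed in the active environment, the call simply returns None.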
The package and distribution names are referenced in many of the files in the template
repository, and they all need to be replaced with the name of your new package. You can
use grep -r to search recursively through all of the files for the word cheshire
at the command line, or use the search-and-replace functionality of your IDE / text
editor. The name of the package directory under src/ will also need to be changed.
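If you would rather script the search than run grep -r, something like the sketch below reports which files still mention the old name. The function name find_mentions and the list of skipped directories are assumptions made for illustration, and the sketch only reports matches without rewriting anything.

```python
from pathlib import Path


def find_mentions(root, needle="cheshire", skip_dirs=(".git", ".tox")):
    """Map each readable text file under root to its number of matching lines."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root)
        # Ignore directories themselves and anything inside a skipped directory.
        if not path.is_file() or any(part in skip_dirs for part in relative.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        count = sum(needle in line for line in text.splitlines())
        if count:
            hits[relative.as_posix()] = count
    return hits
```

Calling find_mentions(".") from the top level of the repository returns a dictionary of relative paths and match counts. Remember that directory names such as src/cheshire need renaming too, which this sketch does not check.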

Supply any required tokens, e.g. for CodeCov
Rename the src/cheshire directory to reflect the new package name.
Search for cheshire and replace it as appropriate everywhere. Sometimes
this will be with a distribution name like catalystcoop.cheshire
(the package as it appears for pip or PyPI) and sometimes this will be the
importable package name (the name of the directory under src/, e.g. cheshire).
Create the new project / package at Read The Docs.

What this template provides

Python Package Skeleton

The src directory contains the code that will be packaged and deployed on the user
system. That code is in a directory with the same name as the package.
Using a separate src directory helps avoid accidentally importing the package when
you’re working in the top level directory of the repository.
A simple python module (dummy.py), and a separate module providing a command line
interface to that module (cli.py) are included as examples.
Any files in the src/package_data/ directory will also be packaged and deployed.
What files are included in or excluded from the package on the user’s system is
controlled by the MANIFEST.in file and some options in the call to setup() in
setup.py.
The CLI is deployed using project.scripts defined in pyproject.toml.
We use setuptools_scm to obtain the package’s version directly from git tags,
rather than storing it in the repository and manually updating it.
README.rst is read in and used for the package’s long_description. This is
what is displayed on the PyPI page for the package.
By default we create at least three sets of “extras” – additional optional package
dependencies that can be installed in special circumstances: dev, docs, and
tests. The packages listed there are used in development, building the docs, and
running the tests (respectively) but aren’t required for a normal user who is just
installing the package from pip or conda. These are defined under the
project.optional-dependencies section of pyproject.toml.
Python has recently evolved a more diverse community of build and packaging tools.
Which flavor is being used by a given package is indicated by the contents of
pyproject.toml. That file also contains configuration for a few other tools,
including black and ruff.
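To make the command line interface item above more concrete, here is a sketch of the kind of module a project.scripts entry could point at. It is not the template's actual cli.py. The option name, the greeting, and the function names are all hypothetical. An entry of the form some_command = "cheshire.cli:main" in pyproject.toml (the command name is also hypothetical) would make main() runnable from the shell after installation.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(description="Example CLI (hypothetical).")
    parser.add_argument("--name", default="Alice", help="Who to greet.")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    """Entry point; the return value becomes the process exit code."""
    args = parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0
```

Accepting an optional argv makes the entry point easy to call from tests without touching sys.argv.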

Pytest Testing Framework

A skeleton pytest testing setup is included in the
tests/ directory.
Tests are split into unit and integration categories.
Session-wide test fixtures, additional command line options, and other pytest
configuration can be added to tests/conftest.py.
Exactly what pytest commands are run during continuous integration is controlled by Tox.
Pytest can also be run manually without using Tox, but will use whatever your
personal python environment happens to be, rather than the one specified by the
package. Running pytest on its own is a good way to debug new or failing tests
quickly, but we should always use Tox and its virtual environment for actual testing.
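As an illustration of what can go in tests/conftest.py, the sketch below registers one extra command line option and exposes it through a session-wide fixture. The option name --run-slow and the fixture name run_slow are made up for this example and are not defined by the template.

```python
import pytest


def pytest_addoption(parser):
    """Register an extra command line option (the name is hypothetical)."""
    parser.addoption(
        "--run-slow",
        action="store_true",
        default=False,
        help="Also run tests that are slow.",
    )


@pytest.fixture(scope="session")
def run_slow(request) -> bool:
    """Session-wide fixture exposing the option's value to tests."""
    return request.config.getoption("--run-slow")
```

A test can then take run_slow as an argument and skip itself when the flag was not given.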

Test Coordination with Tox

We define several different test environments for use with Tox in tox.ini.
Tox is used to run pytest in an isolated Python
virtual environment.
We also use Tox to coordinate running the code linters, building the documentation,
and releasing the software to PyPI.
The default Tox environment is named ci and it will run the linters, build the
documentation, run all the tests, and generate test coverage statistics.
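Named Tox environments live in sections of tox.ini called [testenv:NAME]. The sketch below lists them with the standard library's configparser. The embedded tox.ini text is an invented stand-in, so apart from ci its environment names and commands are assumptions rather than the template's real configuration.

```python
import configparser


def list_tox_envs(ini_text: str) -> list:
    """Return the names of all [testenv:NAME] sections, in file order."""
    # Interpolation is disabled so that "%" in commands is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    prefix = "testenv:"
    return [s[len(prefix):] for s in parser.sections() if s.startswith(prefix)]


# Illustrative content only; see the repository's tox.ini for the real thing.
EXAMPLE_TOX_INI = """
[tox]
envlist = ci

[testenv:linters]
commands = ruff check .

[testenv:docs]
commands = sphinx-build -W -b html docs docs/_build/html

[testenv:ci]
commands = pytest
"""

print(list_tox_envs(EXAMPLE_TOX_INI))
```

Any of the listed names can be passed to tox -e to run just that environment.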

Git Pre-commit Hooks

A variety of sanity checks are defined as git pre-commit hooks – they run any time
you try to make a commit, to catch common issues before they are saved. Many of these
hooks are taken from the excellent pre-commit project.
The hooks are configured in .pre-commit-config.yaml.
For them to run automatically when you try to make a commit, you must install the
pre-commit hooks in your cloned repository first. This only has to be done once.
These checks are run as part of our CI, and the CI will fail if the pre-commit hooks
fail.
We also use the pre-commit.ci service to run the same
checks on any code that is pushed to GitHub, and to apply standard code formatting
to the PR in case it hasn’t been run locally prior to being committed.

Code Formatting & Linting
To avoid the tedium of meticulously formatting all the code ourselves, and to ensure a
standard style of formatting and syntactical idioms across the codebase, we use the
black and ruff code formatters, which run as pre-commit hooks. These can be
integrated directly into your text editor or IDE with the appropriate plugins. The
formatters are included in .pre-commit-config.yaml. The ruff linter / formatter
has a huge array of configuration options and different kinds of checks it can run,
which are defined under the tool.ruff section of pyproject.toml.
We also have a custom hook that clears Jupyter notebook outputs prior to committing.
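The custom hook itself is not reproduced in this README. To show what clearing notebook outputs amounts to, here is an approximation that works on the notebook's JSON directly. It assumes the standard notebook layout, where code cells carry outputs and execution_count fields. The function name is hypothetical and the real hook may work differently.

```python
import json


def clear_notebook_outputs(notebook_json: str) -> str:
    """Return the notebook JSON with outputs and execution counts removed."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(notebook, indent=1) + "\n"
```

Stripping outputs keeps diffs small and avoids committing large embedded images or data by accident.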

Code & Documentation Linters
To catch errors before commits are made, and to ensure uniform formatting across the
codebase, we also use linters outside of ruff. They don’t change the code or
documentation files, but they will raise an error or warning when something doesn’t
look right so you can fix it.

doc8 checks the style of our reStructuredText documentation files.
mypy does static type checking
and ensures that our code uses type annotations.
pre-commit has a collection of built-in checks that use
pygrep to search Python files for
common problems like blanket # noqa annotations, as well as language agnostic
problems like accidentally
checking large binary files into the repository or having unresolved merge conflicts.
hadolint checks Dockerfiles for errors
and violations of best practices. It runs as a pre-commit hook.
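For readers new to type annotations, here is a small annotated function of the kind a static type checker can verify. The function is purely illustrative and does not come from the template.

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# A call such as mean("abc") is a type error that a static checker can
# report before the code is ever run.
print(mean([1.0, 2.0, 3.0]))
```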

Test Coverage

We use Tox and the pytest coverage
plugin to measure and record what percentage of our codebase is being tested, and to
identify which modules, functions, and individual lines of code are not being
exercised by the tests.
When you run tox or tox -e ci (which is equivalent) a summary of the test
coverage will be printed at the end of the tests (assuming they succeed). The full
details of the test coverage are written to coverage.xml.
There are some configuration options for this process set in the .coveragerc file
in the top level directory of the repository.
When the tests are run via the tox-pytest workflow in GitHub Actions, the test
coverage data from the coverage.xml output is uploaded to a service called
CodeCov that saves historical data about our test
coverage, and provides a nice visual representation of the data – identifying which
subpackages, modules, and individual lines of code are being tested. For example, here are
the results
for the cheshire repo.
The connection to CodeCov is configured in the .codecov.yml YAML file.
In theory, we should be able to automatically turn CodeCov on for all of our GitHub
repos, and it should Just Work, but in practice we’ve had to turn it on in the GitHub
configuration one-by-one. Open source repositories are also supposed to be able to
upload to the CodeCov site without requiring authentication, but this also hasn’t
worked, so thus far we’ve needed to request a new token for each repository. This
token is stored in .codecov.yml.
Once it’s enabled, CodeCov also adds a couple of test coverage checks to any pull
request, to alert us if a PR reduces overall test coverage (which we would like to
avoid).
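If you want to inspect coverage.xml yourself rather than through CodeCov, the report is plain XML. The sketch below assumes the Cobertura-style layout that coverage.py writes, with a line-rate attribute on the root element and on each class element. The sample document and its numbers are invented for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_coverage(xml_text: str):
    """Return (overall line rate, {filename: line rate}) from a coverage report."""
    root = ET.fromstring(xml_text)
    overall = float(root.get("line-rate", "0"))
    per_file = {
        cls.get("filename"): float(cls.get("line-rate", "0"))
        for cls in root.iter("class")
    }
    return overall, per_file


# Invented sample data, not real results for the cheshire repo.
SAMPLE_REPORT = """<coverage line-rate="0.75">
  <packages>
    <package name="cheshire">
      <classes>
        <class filename="src/cheshire/dummy.py" line-rate="1"/>
        <class filename="src/cheshire/cli.py" line-rate="0.5"/>
      </classes>
    </package>
  </packages>
</coverage>"""

overall, per_file = summarize_coverage(SAMPLE_REPORT)
print(overall, per_file)
```

Rates are fractions between 0 and 1, so multiply by 100 for a percentage.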

Documentation Builds

We build our documentation using Sphinx.
Standalone docs files are stored under the docs/ directory, and the Sphinx
configuration is there in conf.py as well.
We use Sphinx AutoAPI to
convert the docstrings embedded in the python modules under src/ into additional
documentation automatically.
The top level documentation index simply includes this README.rst. The
LICENSE.txt and CODE_OF_CONDUCT.md files are similarly referenced. The only
standalone documentation file under docs/ right now is the release_notes.rst.
Unless you’re debugging something specific, the docs should always be built using
tox -e docs as that will lint the source files using doc8 and rstcheck,
and wipe previously generated documentation to build everything from scratch. The docs
are also rebuilt as part of the normal Tox run (equivalent to tox -e ci).
If you add something to the documentation generation process that needs to be cleaned
up after, it should be integrated with the Sphinx hooks. There are some examples of
how to do this at the bottom of docs/conf.py in the “custom build operations”
section. For example, this is how we automatically regenerate the data dictionaries
based on the PUDL metadata whenever the docs are built, ensuring that the docs stay up
to date.

Documentation Publishing

We use the popular Read the Docs service to host our
documentation.
When you open a PR, push to dev or main, or tag a release, the associated
documentation is automatically built on Read the Docs.
There’s some minimal configuration stored in the .readthedocs.yml file, but
setting up this integration for a new repository requires some setup on the Read the
Docs site.
Create an account on Read the Docs using your GitHub identity, go to “My Projects”
under the dropdown menu in the upper righthand corner, and click on “Import a
Project.” It should list the repositories that you have access to on GitHub. You may
need to click on the Catalyst Cooperative logo in the right hand sidebar.
It will ask you for a project name – this will become part of the domain name for the
documentation page on RTD and should be the same as the distribution name, but with
dots and underscores replaced with dashes. E.g. catalystcoop-cheshire or
catalystcoop-pudl-catalog.
Under Advanced Settings, make sure you
enable builds on PRs.
This will add a check ensuring that the documentation has built successfully on RTD
for any PR in the repo.
Under the Builds section for the new project (repo) you’ll need to tell it which
branches you want it to build, beyond the default main branch.
Once the repository is connected to Read the Docs, an initial build of the
documentation from the main branch should start.
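The project naming rule described above (dots and underscores become dashes) is easy to get wrong by hand, so here is a one-function sketch of it. The function name is hypothetical, and the second distribution name in the usage example is inferred from the catalystcoop-pudl-catalog project name rather than stated in this README.

```python
import re


def rtd_project_name(distribution_name: str) -> str:
    """Replace dots and underscores in a distribution name with dashes."""
    return re.sub(r"[._]", "-", distribution_name)


print(rtd_project_name("catalystcoop.cheshire"))
print(rtd_project_name("catalystcoop.pudl_catalog"))
```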

Dependabot
We use GitHub’s Dependabot
to automatically update the allowable versions of packages we depend on. This applies
to both the Python dependencies specified in setup.py and to the versions of the
GitHub Actions that we employ. The Dependabot
behavior is configured in .github/dependabot.yml.

GitHub Actions
Under .github/workflows are YAML files that configure the GitHub Actions associated with the repository. We use GitHub
Actions to:

Run continuous integration using tox on several different
versions of Python.
Build a Docker container directly and push it to Docker Hub using the
docker-build-push action.
Release a new version of the package on PyPI when a version tag is pushed.
Automatically merge bot PRs from pre-commit.ci and Dependabot.

About Catalyst Cooperative
Catalyst Cooperative is a small group of data
wranglers and policy wonks organized as a worker-owned cooperative consultancy.
Our goal is a more just, livable, and sustainable world. We integrate public
data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating
climate change and improving electric utility regulation in the United States.

Contact Us

For general support, questions, or other conversations around the project
that might be of interest to others, check out the
GitHub Discussions.
If you’d like to get occasional updates about our projects,
sign up for our email list.
Want to schedule a time to chat with us one-on-one? Join us for
Office Hours.
Follow us on Twitter: @CatalystCoop
More info on our website: https://catalyst.coop
For private communication about the project or to hire us to provide customized data
extraction and analysis, you can email the maintainers:
pudl@catalyst.coop