GitLocker: The Coding Marketplace

Description:

opendatapipeline 0.2.1

TODO:

Add A LOT more PRINT statements
Add comments
Add documentation (README and docs site)

The latter will be necessary once we move to Dockerfiles and Actions

Add tests 😅

including CLI tests

Use arcgis package for geocoding

Use batch geocoding (had a problem with the token... can we register as an anonymous user?)
- [x] Use Socrata package (register API key) for data fetching from datasets published on Socrata
Use requests package for data fetching from datasets published on OData
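
A minimal sketch of what OData fetching with requests could look like, assuming the source exposes an OData v4 JSON feed (records under `value`, next page under `@odata.nextLink`). The function names are hypothetical and not part of opendatapipeline; the fetcher is injectable so the paging logic can be checked without a network call.

```python
from typing import Callable, Iterator, Optional


def _get_json(url: str) -> dict:
    """Default fetcher: one GET request, parsed as JSON."""
    import requests  # imported lazily so the paging logic can be tested offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_odata_records(
    url: str, get_json: Callable[[str], dict] = _get_json
) -> Iterator[dict]:
    """Yield every record of an OData v4 JSON feed, following server-side paging."""
    next_url: Optional[str] = url
    while next_url:
        page = get_json(next_url)
        # OData v4 JSON puts the records of the current page under "value".
        yield from page.get("value", [])
        # The server adds "@odata.nextLink" only when more pages remain.
        next_url = page.get("@odata.nextLink")
```

Usage would be along the lines of `rows = list(iter_odata_records(dataset_url))`, where `dataset_url` comes from config.yaml (an assumption about how the config is laid out).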

Use GitHub Python package to keep config.yaml updated after successful runs

Can also use it to update the JS data files at the end of the analysis (see below)
Just used requests and the GitHub API directly
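
For reference, a sketch of how a config.yaml update through the GitHub REST API with plain requests might look. It assumes the "create or update file contents" endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`), which needs the file's current blob `sha` and the new content base64-encoded. The function names, commit message, and parameters are illustrative, not the project's actual code.

```python
import base64

API_ROOT = "https://api.github.com"


def build_update_payload(new_text: str, current_sha: str, message: str) -> dict:
    """Build the JSON body for updating an existing file via the contents API."""
    return {
        "message": message,  # commit message
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        "sha": current_sha,  # blob sha of the file being replaced
    }


def update_config(owner: str, repo: str, token: str, new_text: str,
                  path: str = "config.yaml") -> str:
    """Replace `path` in the repo with `new_text`; return the new commit sha."""
    import requests  # imported lazily; only needed for the real network call

    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    # Fetch the current file first to learn its blob sha.
    current = requests.get(url, headers=headers, timeout=30)
    current.raise_for_status()
    payload = build_update_payload(
        new_text, current.json()["sha"], "Update config.yaml after successful run"
    )
    response = requests.put(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["commit"]["sha"]
```

Because each update creates a commit, local clones fall behind after every successful run.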

These should be very small and generated by pandas analysis of the data

Results should be in a GitHub release (data files) (can zip them)

Use the GitHub CLI (gh) in a bash script because it is pre-installed in Actions
We can then just use the Octokit JS package to point to the links of the files; clicking a link downloads the file
Then build a web page to enable file downloads and show some graphs (basic --> records over time for each dataset)
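
A rough sketch of how the "records over time" data file could be produced and serialized as JSON. It uses only the standard library rather than pandas to stay self-contained; the function names and the monthly granularity are assumptions.

```python
import json
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def records_per_month(dates: Iterable[date]) -> Dict[str, int]:
    """Count records per calendar month, keyed as 'YYYY-MM' in ascending order."""
    counts = Counter(d.strftime("%Y-%m") for d in dates)
    return dict(sorted(counts.items()))


def to_datafile(per_dataset: Dict[str, Dict[str, int]]) -> str:
    """Serialize {dataset name: {month: count}} as JSON text for the frontend."""
    return json.dumps(per_dataset, indent=2, sort_keys=True)
```

The resulting file stays small because it holds one number per dataset per month, whatever the size of the raw data.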

What charting framework to use?
Need an action to update the frontend codebase with the new data

Store in JSON format
- [ ] Add website to Socrata key

Make a container to run the whole pipeline (so no downloads for users)

Host on GHCR

MAKE OUR OWN UNIQUE IDENTIFIERS FOR ALL DATASETS COMBINED

SAME COLUMN NAME IN ALL DATASETS, THEN DON'T HAVE TO PROVIDE IDENTIFIER COLUMN IN config.yaml
Also allows for better merging of datasets (i.e. records + drugs + geo)
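
One possible way to generate such identifiers, sketched under assumptions: the ID is derived from the dataset name plus the row's full content, so no per-dataset identifier column is needed. The namespace string, the column name `record_uid`, and the function names are hypothetical. Note that two fully identical rows in the same dataset would receive the same ID.

```python
import json
import uuid
from typing import Dict, List

# Hypothetical namespace and shared column name (not from the project).
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "opendatapipeline")
ID_COLUMN = "record_uid"


def make_uid(dataset: str, row: Dict) -> str:
    """Deterministic ID from the dataset name and the row's content."""
    # sort_keys makes the ID independent of column order;
    # default=str covers values such as dates that JSON cannot encode.
    canonical = json.dumps(row, sort_keys=True, default=str)
    return str(uuid.uuid5(NAMESPACE, f"{dataset}:{canonical}"))


def add_uids(dataset: str, rows: List[Dict]) -> List[Dict]:
    """Return copies of the rows with the shared ID column added."""
    return [{**row, ID_COLUMN: make_uid(dataset, row)} for row in rows]
```

Since the IDs are deterministic, re-running the pipeline on unchanged data yields the same values, which keeps merges across records, drugs, and geo outputs stable.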
- [ ] Do we want to publish a Web API as well?
- [ ] Then we would need a DB

No Windows support due to drug extraction tool usage

I think, if my math is right, we can do ~20 minutes / day of actions... (2,000 minutes per month limit for free)
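
A quick check of the budget arithmetic, assuming the 2,000-minute limit quoted above and a 31-day month as the worst case. Dividing the limit evenly gives roughly 64 minutes per day, so ~20 minutes per day would use about 620 minutes per month. The names below are illustrative.

```python
MONTHLY_LIMIT_MINUTES = 2_000  # free-tier limit quoted in the note above
DAYS_IN_MONTH = 31             # worst case: longest month


def daily_budget(limit: int = MONTHLY_LIMIT_MINUTES, days: int = DAYS_IN_MONTH) -> float:
    """Minutes available per day if the monthly limit is spread evenly."""
    return limit / days


def monthly_usage(minutes_per_day: float, days: int = DAYS_IN_MONTH) -> float:
    """Minutes consumed in a month when running `minutes_per_day` every day."""
    return minutes_per_day * days


if __name__ == "__main__":
    print(f"Even daily budget: {daily_budget():.1f} min")
    print(f"Usage at 20 min/day: {monthly_usage(20):.0f} of {MONTHLY_LIMIT_MINUTES} min")
```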
*** Make a note: it is very important to PULL often to stay up to date with the CONFIG