opendatapipeline 0.2.1
TODO:
Add A LOT more PRINT statements
Add comments
Add documentation (README and docs site)
The latter will be necessary once we move to Dockerfiles and Actions
Add tests 😅
including CLI tests
Use arcgis package for geocoding
Use batch geocoding (had problem with Token... can register as anonymous user?)
- [x] Use Socrata package (register API key) for data fetching from datasets published on Socrata
Use requests package for data fetching from datasets published on odata
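A minimal sketch of how the Socrata fetch could page through a dataset. It assumes a client with a sodapy-style `get(dataset_identifier, limit=..., offset=...)` method. The helper name `fetch_all_rows`, the page size, and the domain, token, and dataset id in the usage comment are all hypothetical.

```python
from typing import Any, Dict, Iterator, List, Protocol


class SupportsGet(Protocol):
    """Any client exposing a sodapy-style ``get`` method (assumption)."""

    def get(self, dataset_identifier: str, **kwargs: Any) -> List[Dict[str, Any]]:
        ...


def fetch_all_rows(
    client: SupportsGet, dataset_id: str, page_size: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield every row of a dataset by paging with limit/offset."""
    offset = 0
    while True:
        page = client.get(dataset_id, limit=page_size, offset=offset)
        if not page:
            break  # an empty page means there is nothing left
        yield from page
        if len(page) < page_size:
            break  # a short page is the last page
        offset += page_size


# Usage with sodapy (domain, token and dataset id are placeholders):
# from sodapy import Socrata
# client = Socrata("data.example.gov", "YOUR_APP_TOKEN")
# rows = list(fetch_all_rows(client, "abcd-1234"))
```

Keeping the client as a parameter means the paging logic can be tested with a fake client, without network access or an API key.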
Use github python package to keep config.yaml updated after successful runs
Can also use to update JS datafiles at end of analysis (see below)
Just used requests and the API directly
These should be very small and generated by pandas analysis of the data
results should be in a github release (data files) (can zip them)
Use GH CLI in bash script because pre-installed in Actions
We can then just use the Octokit JS package to point to the links of the files; clicking a link will download the file
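For the config.yaml update done with requests against the API directly, the request could be assembled roughly like this. This is a sketch, not the pipeline's actual code. It targets the GitHub REST "create or update file contents" endpoint. The helper name `build_contents_update` and the owner/repo values in the usage comment are hypothetical.

```python
import base64
from typing import Dict, Optional, Tuple

API_ROOT = "https://api.github.com"


def build_contents_update(
    owner: str,
    repo: str,
    path: str,
    new_text: str,
    message: str,
    current_sha: Optional[str] = None,
    branch: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and JSON body for a PUT to the contents endpoint."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}"
    body = {
        "message": message,
        # The API expects the file content Base64-encoded.
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
    }
    if current_sha is not None:
        body["sha"] = current_sha  # required when updating an existing file
    if branch is not None:
        body["branch"] = branch
    return url, body


# Usage (owner, repo and token are placeholders):
# import requests
# url, body = build_contents_update("some-owner", "some-repo", "config.yaml",
#                                   new_yaml_text, "Update config after run",
#                                   current_sha=sha_from_previous_get)
# headers = {"Authorization": f"Bearer {token}",
#            "Accept": "application/vnd.github+json"}
# requests.put(url, headers=headers, json=body).raise_for_status()
```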
then web page to enable file downloads and show some graphs (basic --> records over time for each dataset)
What charting framework to use?
Need an action to update the frontend codebase with the new data
Store in JSON format
- [ ] add website to socrata key
Make a container to run the whole pipeline (so no downloads for users)
Host on GHCR
MAKE OUR OWN UNIQUE IDENTIFIERS FOR ALL DATASETS COMBINED
SAME COLUMN NAME IN ALL DATASETS, THEN DON'T HAVE TO PROVIDE IDENTIFIER COLUMN IN config.yaml
Also allows for better merging of datasets (i.e. records + drugs + geo)
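One possible way to generate our own identifiers without an identifier column in config.yaml is to hash each record's content together with the dataset name. This is only a sketch. The column name `uid`, the helper names, and the 16-character truncation are assumptions. Two rows with identical content in the same dataset would get the same id.

```python
import hashlib
import json
from typing import Any, Dict, List

UID_COLUMN = "uid"  # hypothetical column name shared by all datasets


def make_uid(dataset_name: str, row: Dict[str, Any]) -> str:
    """Derive a deterministic id from the dataset name and row content."""
    # Ignore an existing uid so that re-running gives the same result.
    content = {k: v for k, v in row.items() if k != UID_COLUMN}
    # sort_keys makes the id independent of column order.
    canonical = json.dumps(content, sort_keys=True, default=str)
    digest = hashlib.sha256(f"{dataset_name}\n{canonical}".encode("utf-8"))
    return digest.hexdigest()[:16]


def add_uids(dataset_name: str, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows with the shared uid column filled in."""
    return [{**row, UID_COLUMN: make_uid(dataset_name, row)} for row in rows]
```

Because the id depends only on the dataset name and the row content, derived tables (records, drugs, geo) can carry the same `uid` and be merged on it.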
- [ ] DO we want to publish a Web API as well?
- [ ] Then we would need a DB
No Windows support due to drug extraction tool usage
I think, if my math is right, we can do ~65 minutes / day of Actions (2,000 free minutes per month / ~30 days)
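A quick check of the daily Actions budget. It assumes the 2,000 free minutes are spread evenly over a 31-day month (worst case) and that jobs run on runners billed at a 1x multiplier. The function name and defaults are illustrative.

```python
def daily_minutes(
    monthly_minutes: int = 2000,  # free monthly limit from the note above
    days_in_month: int = 31,      # assumption: longest month as worst case
    multiplier: int = 1,          # assumption: runner billed at 1x
) -> int:
    """Whole minutes of runner time available per day."""
    return monthly_minutes // (days_in_month * multiplier)


print(daily_minutes())                  # 64
print(daily_minutes(days_in_month=30))  # 66
```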
*** Make a note: it is very important to PULL often to stay up to date with the CONFIG