PyGemPick is the culmination of Joseph Marsilla’s research project
under Dr. Avi Chakrabartty. This module contains functions that enable
filtering, detection, and modeling of immunogold particles on TEM
micrographs. The main goal of the project was to create an open-source,
batch gold-particle picking module, built in Python, that can detect
gold particles regardless of the amount of counterstaining present in
the IGEM (Immunogold Electron Microscopy) micrograph.

pyGemPick has three main dependencies that must be installed before use:
1. OpenCV (cv2)
2. Pandas (pd)
3. Numpy (np)

I suggest creating a new Anaconda environment from the terminal, into
which you can install all the modules required for your project. If you
have trouble installing OpenCV, use the solution outlined
here: (install using
Pandas and Numpy can also be installed from any terminal using *pip
install pandas numpy*
The project will be updated in the upcoming weeks with tutorials on how
to use the functions provided in pygempick. This module was built to
help researchers who are developing theranostic solutions (therapeutic
as well as diagnostic innovations) for patients with rare protein
misfolding diseases such as ATTR amyloidosis, Alzheimer’s,
Frontotemporal Dementia (FTD) and Amyotrophic Lateral Sclerosis (also
known as ALS or Lou Gehrig’s disease) using novel immunogold diagnostic
techniques.
*NEW:* This update contains 11 supplementary documents that will help
you use the module. They cover image compression, image picking with
single and duplicate filtering, statistical analysis, and separation
and efficiency tests that assess the algorithm’s usability.
Sample image data will be provided and will be located in the DATA

pip install pygempick

> import pygempick.core as py
> import pygempick.modeling as mod
> import pygempick.spatialstats as spa

Note: the numpy, pandas and opencv dependencies must be installed before installing pygempick.
For more information, visit the GitHub repository!



A function that takes an original large-scale electron micrograph
image and compresses it such that 1 px = approximately one
nanometer. The exact pixel dimensions for a 3.1x compression are
given below.


background equalization provided by solution presented

*py.hclap_filt(p, image, noise)*

New High-Contrast Laplace Filter.
Applies a HCLAP
Takes an odd scaling parameter p > 5 and a regular compressed image.
If noise == ‘yes’, a median blur is applied after the filter.

*py.hlog_filt(p, image, noise)*

New High-Contrast Laplace of Gaussian Filter.
Applies a HCLOG
to each image to produce a single binary image as an output.
Takes odd or even scaling parameters of 18 or greater.
The input image is the output of py.compress.
If noise == ‘yes’, a median blur is applied after the filter.

*py.dog_filt(p, image)*

Difference of Gaussian (DOG) filter.
Input p is an odd number that determines the size of the DOG kernel.
The input image is the output of py.compress.
If noise == ‘yes’, a median blur is applied after the filter.
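
To make the role of the odd kernel size p more concrete, here is a minimal sketch of a 1-D Difference of Gaussian kernel in plain Python. It is not the code inside py.dog_filt: the function names and the two sigma values are assumptions chosen only for illustration.

```python
import math

def gaussian_kernel_1d(size, sigma):
    """Return a normalised 1-D Gaussian kernel with an odd number of taps."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = size // 2
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def dog_kernel_1d(size, sigma_narrow, sigma_wide):
    """Difference of Gaussians: narrow Gaussian minus wide Gaussian."""
    narrow = gaussian_kernel_1d(size, sigma_narrow)
    wide = gaussian_kernel_1d(size, sigma_wide)
    return [n - w for n, w in zip(narrow, wide)]

# Hypothetical values: a 9-tap kernel with sigmas of 1 and 2 pixels.
kernel = dog_kernel_1d(size=9, sigma_narrow=1.0, sigma_wide=2.0)
```

Because both Gaussians are normalised, the kernel sums to roughly zero, so flat background regions give almost no response while small blobs stand out.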

*py.bin_filt(p, image)*

Smart Binary Filtering. Uses the average gray pixel intensity
value to determine the starting threshold.
Takes an odd scaling parameter p; the input image is the output of py.compress.
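
The idea of starting the threshold at the average gray level can be sketched in a few lines of plain Python. This is only an illustration: the function name is hypothetical, the image is a tiny nested list, and how py.bin_filt uses the scaling parameter p is not reproduced here. It also assumes gold particles appear darker than the background.

```python
def mean_threshold(gray):
    """Binarise a grayscale image (rows of 0-255 ints) around its mean intensity."""
    pixels = [value for row in gray for value in row]
    start = sum(pixels) / len(pixels)  # average gray level = starting threshold
    # Assumption: particles are dark, so pixels below the threshold become 0.
    binary = [[0 if value < start else 255 for value in row] for row in gray]
    return start, binary

# Toy 3x3 "micrograph" with one dark particle in the centre.
gray = [
    [200, 210, 205],
    [198,  20, 202],
    [207, 204, 199],
]
start, binary = mean_threshold(gray)
```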

*Note:* TEM micrograph filtering using simple binary thresholding
was first completed in 2003 with one of the first gold particle
picking algorithms

*New: key_filt(keypoints1, keypoints2)*

Allows you to scan detected keypoints and eliminate duplicates, so that
you can detect particles with more than one filter. Returns the updated
keypoints1 with the duplicate keypoints removed, along with the number
of duplicates.
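
As a rough sketch of what duplicate elimination between two detections can look like, the snippet below compares plain (x, y) centres instead of OpenCV keypoint objects. The function name, the variable names and the 2 px tolerance are assumptions for illustration, not the behaviour of key_filt itself.

```python
import math

def drop_duplicates(points1, points2, tol=2.0):
    """Remove from points1 every (x, y) centre within tol pixels of a centre in points2."""
    kept = []
    duplicates = 0
    for x1, y1 in points1:
        if any(math.hypot(x1 - x2, y1 - y2) <= tol for x2, y2 in points2):
            duplicates += 1          # already found by the other filter
        else:
            kept.append((x1, y1))    # unique to the first filter
    return kept, duplicates

# Hypothetical centres found by two different filters on the same image.
hclap_points = [(10.0, 10.0), (50.0, 42.0), (90.5, 13.0)]
hlog_points = [(10.5, 9.5), (200.0, 150.0)]
kept, duplicates = drop_duplicates(hclap_points, hlog_points)
```

The kept centres can then be merged with the second list to get one combined set of picks without double counting.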

*py.pick(image, minAREA, minCIRC, minCONV, minINER, minTHRESH)*

The input image is a binary image from one of the above filters. Next,
you have to set the parameters to optimize OpenCV’s Simple Blob Detector.
Detects immunogold particles on the filtered binary image by
optimizing picking across 4 main parameters using OpenCV’s simple
blob detector.
Picking has to be optimized for each set separately, on a per-class or
per-trial basis.

Gold Particle Picking Parameters
1. minArea = lowest area in pixels of a detected gold particle (20 px**2)
2. minCirc = lowest circularity value of a detected gold particle (0.78 is a square)
3. minConv = lowest convexity value, where convexity is defined as (area of the gold particle / area of its convex hull)
4. minINER = minimum inertia ratio (filters gold particles based on their elliptical properties; 1 is a perfect circle)
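
The circularity and convexity values above can be checked by hand. The sketch below uses the usual blob-detector definition of circularity, 4π·area / perimeter², and shows where the value of roughly 0.78 for a square comes from. The function names are hypothetical helpers, not part of pygempick.

```python
import math

def circularity(area, perimeter):
    """4*pi*area / perimeter**2: 1.0 for a perfect circle, lower for other shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def convexity(area, hull_area):
    """Area of the shape divided by the area of its convex hull."""
    return area / hull_area

# A 10 px square: area 100, perimeter 40 -> pi / 4, about 0.785.
side = 10.0
square = circularity(area=side * side, perimeter=4 * side)

# A circle of radius 5 px -> 1.0 (up to floating point rounding).
radius = 5.0
circle = circularity(area=math.pi * radius ** 2, perimeter=2 * math.pi * radius)
```

Setting minCirc just above 0.78 therefore rejects square-like blobs while still accepting round particles.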

*py.snapshots(folder, keypoints, gray_img, i)*

folder = folder location where snapshots will be saved, keypoints
= the keypoints detected by the py.pick function, gray_img =
compressed grayscale image, i = image number.
Takes a compressed grayscale image and uses the detected
keypoints as markers to take a snapshot within a 100 px radius
of each gold particle’s position. Researchers use this to analyze
the morphological properties of protein aggregates.

*mod.draw(n, test_number, noise, images)*

Function that draws test micrograph sets to be used in subsequent
efficiency or separation tests.
1. test_number: 1 draws only circles; 2 draws both circles and ellipses.
2. noise: if == 'yes', randomly distributed Gaussian noise will be drawn
according to mu1, sig1.
3. images: the number of images in the set. Used together with n, the number of
particles detected in the actual set, to calculate the particle density of the model.


Uses a compressed grayscale image from
and returns the intensity histogram and related bins position w/


Let p be a range of integers from [1, x] and let image be a
grayscale image produced after compressing the original image and
converting it to grayscale using OpenCV’s function
cv2.cvtColor(orig_img, cv2.COLOR_RGB2GRAY).
Completes the separation test for single-filter comparison.

*New mod.septest2(p, image, hlogkey)*

Let p be a range of integers from [1, x] and let image be a
grayscale image produced after compressing the original image and
converting it to grayscale using OpenCV’s function
cv2.cvtColor(orig_img, cv2.COLOR_RGB2GRAY).
hlogkey = the keypoints of the detected image filtered with the HLOG
filter. This ensures faster particle detection, since we aren’t running
the same filtering step more than once!
Completes the separation test for *dual high-contrast filter


Data is the input from a csv created by sta.bin2csv;
the file is in the format ‘pcf-dr#-error.csv’.
The function was initially created to plot graphs for image sets with
varying concentrations of AB aggregates in solution

Output: built to produce one graph, with a fitted curve for the
positive control(s).
The equation is fitted to the probability distribution for Complete Spatial
Randomness of the distribution of IGEM particles across EM


a = width of the image in pixels
b = height of the image in pixels
r is the distance of the donut from which correlation was

Function taken from work by Philimonenko et al
that was used as a window covariogram to correct Ripley’s K
for boundary conditions.
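
For readers unfamiliar with window covariograms, here is a minimal sketch of one common closed form: the isotropised set covariance of an a × b rectangle, which holds for 0 ≤ r ≤ min(a, b). Whether pygempick uses exactly this expression is an assumption, and the function name is hypothetical.

```python
import math

def rect_covariogram(a, b, r):
    """Isotropised set covariance of an a x b rectangle, valid for 0 <= r <= min(a, b)."""
    if not 0 <= r <= min(a, b):
        raise ValueError("closed form only holds for 0 <= r <= min(a, b)")
    return a * b - (2.0 * r / math.pi) * (a + b) + (r * r) / math.pi

# Hypothetical 1000 x 800 px image.
full = rect_covariogram(1000, 800, 0)    # equals the image area at r = 0
edge = rect_covariogram(1000, 800, 50)   # smaller: some donuts cross the border
```

Dividing pair counts by this value instead of the plain image area compensates for particles near the border, whose donuts partly fall outside the image.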

*spa.pcf(r, N, p0, p1)*

r is the radius of the donut taken with bin width dr.
N is the degree
PCF (Pair Correlation Function) is the probability
distribution of a CSR-related process that we will use to fit our
normalized version of
This is a Python-based solution to Philimonenko’s Statistical
Evaluation of Colocalization Patterns in Immunogold Labeling
The PCF distribution for calculating the colocalization of
immunogold particles on transmission electron micrographs is
represented here.

*spa.record_kp(i, keypoints, data)*

i is the image number counter
keypoints is the list of keypoints of Gold particles detected by
data is an empty pandas dataframe.

This function records the x,y positions of the keypoints detected in
each image. Run it in a for loop to add the results for each image to
the dataframe, which can then be exported to a csv for easy access
(completed in spa.bin2csv).

The function takes a list of file locations from glob.glob (and asks
for the filtering parameters), then outputs a csv of the x and y
coordinates of the keypoints for every image in images. (For example,
row 1 contains the x coordinates of the keypoints in image 1 and row 2
contains the y coordinates in image 1, etc.)
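
The row layout described above (an x row followed by a y row for each image) can be sketched with the standard library alone. This is not the pygempick implementation, which works with a pandas dataframe; the function name and the sample coordinates are made up for illustration.

```python
import csv
import io

def keypoints_to_rows(keypoints_per_image):
    """Turn [[(x, y), ...], ...] into alternating x-rows and y-rows, one pair per image."""
    rows = []
    for points in keypoints_per_image:
        rows.append([x for x, _ in points])  # row of x coordinates
        rows.append([y for _, y in points])  # row of y coordinates
    return rows

# Hypothetical detections for two images.
detections = [
    [(12.0, 40.5), (88.2, 19.0)],  # image 1
    [(5.5, 7.25)],                 # image 2
]
rows = keypoints_to_rows(detections)

buffer = io.StringIO()             # stands in for a real file on disk
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```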

images is a set of images from a folder, gathered using the glob.glob() function.
The output records the keypoint positions found in each image and
outputs a pandas df with the detected keypoint centers in (x,y) pixel

*spa.csv2pcf(data, dr)*

Takes the filename data of a csv produced by bin2csv() and
outputs the non-normalized, scale-invariant k (cross-correlation) and
pcf (pair-correlation) statistical data from the spatial
distribution of the particles on each micrograph (determines
whether the null hypothesis of CSR, Complete Spatial Randomness,
is upheld or rejected). Analyzed by bin2csv. Example output
provided in docs.
dr is the donut width as defined by Philimonenko et al., 2000
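
To show how the donut width dr enters a pair-correlation estimate, here is a naive sketch in plain Python. It counts point pairs per donut and divides by the count expected under CSR, without any boundary correction. It is not the algorithm in spa.csv2pcf; the function names and sample points are assumptions.

```python
import math
from itertools import combinations

def pair_counts(points, dr, r_max):
    """Count unordered point pairs whose separation falls in each ring [k*dr, (k+1)*dr)."""
    bins = [0] * int(math.ceil(r_max / dr))
    for (x1, y1), (x2, y2) in combinations(points, 2):
        k = int(math.hypot(x1 - x2, y1 - y2) // dr)
        if k < len(bins):
            bins[k] += 1
    return bins

def naive_pcf(points, dr, r_max, area):
    """Ring counts divided by the count expected under CSR (no edge correction)."""
    n = len(points)
    pairs = n * (n - 1) / 2.0
    result = []
    for k, count in enumerate(pair_counts(points, dr, r_max)):
        ring_area = math.pi * (((k + 1) * dr) ** 2 - (k * dr) ** 2)
        result.append(count / (pairs * ring_area / area))
    return result

# Three hypothetical particle centres; separations are 3, 7 and about 7.6 px.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 7.0)]
counts = pair_counts(points, dr=5.0, r_max=10.0)
```

Values near 1 in the normalised result are what CSR predicts; values well above 1 at small r suggest clustering at that distance.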

*spa.keypoints2pcf(data_set, dr)*

Input is a folder with CSV files of keypoints for different tests. You
need to know the image number and the average number of particles
detected in each set
(example: data_set =
dr is the donut width as defined by Philimonenko et al., 2000
on immunogold particle colocalization and spatial statistics.
output: pcf-dr{}-error.csv - columns dr (sampling radius), pcf
(pair correlation function),
dpcf (propagated uncertainty in pcf)


