Many of our projects compare a protein’s expression in the nucleus with its expression in the cytoplasm. Many proteins show activation upon translocation from the cytoplasm to the nucleus. Below are some example steps that we perform to measure the nuclear-to-cytoplasmic ratio on a per-cell basis. There are many variations on these approaches. The examples below come entirely from our own in-house software.
Below are representative images from five whole slide images of a protein that is expressed primarily in the cytoplasm. We have deliberately chosen a difficult example, because the staining pattern varies widely from slide to slide. We will train on one slide and then apply the result to all the other slides without adjusting parameters between slides.
Let’s walk through the steps coded in this example, using one of the five slides to develop the parameters and then applying the result, with no tuning of parameters, to the other four slides. We implement the algorithm in C++ for speed and efficiency.
First, we will run color deconvolution (from Ruifrok and Johnston’s seminal 2001 paper on color deconvolution). Notice that there are two possible approaches to separating the target from non-target tissue: use either the morphological differences in the nuclei in the hematoxylin channel, or the differences in staining in the DAB channel. We could use either, or a hybrid, but in this example we will use the latter, the differential staining pattern in the DAB stain.
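To make the color deconvolution step concrete, here is a minimal per-pixel sketch in Python (chosen for readability; it is not our in-house C++ code). It converts RGB to optical density and unmixes it with the inverse of a stain matrix. The hematoxylin and DAB vectors are commonly cited default values and are an assumption here; in practice they should be estimated for the stain and scanner in use. All function names are hypothetical.

```python
import math

# Commonly cited H-DAB stain vectors (optical-density directions in R, G, B).
# ASSUMPTION: illustrative defaults, not values calibrated for these slides.
HEMATOXYLIN = (0.650, 0.704, 0.286)
DAB = (0.268, 0.570, 0.776)


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product, used to build a residual channel orthogonal to both stains."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def invert_3x3(m):
    """Invert a 3x3 matrix given as three row tuples (adjugate / determinant)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("stain vectors are linearly dependent")
    return (
        ((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
        ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
        ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det),
    )


def build_unmixing_matrix():
    """Rows of the stain matrix are hematoxylin, DAB, residual; return its inverse."""
    h = normalize(HEMATOXYLIN)
    d = normalize(DAB)
    residual = normalize(cross(h, d))
    return invert_3x3((h, d, residual))


def optical_density(rgb, background=255.0):
    """Beer-Lambert: OD = -log10(I / I0). Intensities are clamped to avoid log(0)."""
    return tuple(-math.log10(max(ch, 1.0) / background) for ch in rgb)


def deconvolve_pixel(rgb, unmix):
    """Return (hematoxylin, DAB, residual) stain amounts for one RGB pixel."""
    od = optical_density(rgb)
    return tuple(sum(od[k] * unmix[k][j] for k in range(3)) for j in range(3))
```

Calling deconvolve_pixel on every pixel and keeping the second component gives a DAB image of the kind used in the following steps.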
Now we run a filter to smooth the DAB image slightly and then apply Otsu’s method globally to find two classes: DAB-positive target tissue and DAB-negative non-target tissue. We use a number of statistics-based thresholding methods, but Otsu’s method (named after Nobuyuki Otsu) works best in examples like these, where we cannot make any assumptions about the images in the experiment, especially the percentage of the image covered by target and non-target tissue.
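As a rough illustration of the thresholding step, the following Python sketch implements a global Otsu threshold on a flat list of DAB values by maximizing the between-class variance over a histogram. The smoothing filter is omitted, and the function name and the bin count are assumptions made for this example rather than details of our in-house code.

```python
def otsu_threshold(values, bins=256):
    """Return a threshold that maximizes between-class variance (Otsu's method).

    Values strictly greater than the returned threshold belong to the
    upper (DAB-positive) class.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # flat image: nothing to separate

    # Build a histogram over the observed range.
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0        # pixel count in the lower class
    sum0 = 0.0    # sum of bin indices in the lower class
    best_var, best_bin = -1.0, 0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_bin = between, t

    # Threshold at the upper edge of the best bin.
    return lo + (best_bin + 1) * width
```

Because the threshold is derived from the histogram of each image, no assumption is needed about how much of the image is target tissue, which is the property that matters in this example.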
The next step is to find nuclei and filter them using the DAB mask prepared above. This is shown below. There are many ways to find nuclei; this particular implementation does not involve any hardcoded intensity thresholds, but instead looks for objects of approximately the size and shape of the nuclei shown.
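One simple way to express "objects of roughly this size and shape" is to label connected components in a binary candidate mask and keep those whose area and bounding-box shape are plausible for a nucleus. The Python sketch below does exactly that. It is a simplified stand-in, not our in-house detector, and the expected area, tolerance, and shape limits are hypothetical parameters introduced for illustration.

```python
from collections import deque


def label_components(mask):
    """4-connected labelling of a binary mask (list of lists of 0/1).

    Returns (labels, components), where components[i] holds the pixel
    coordinates of the object with label i + 1.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                label = len(components) + 1
                labels[r][c] = label
                pixels = []
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = label
                            queue.append((ny, nx))
                components.append(pixels)
    return labels, components


def looks_like_nucleus(pixels, expected_area, area_tolerance=0.5,
                       min_extent=0.6, min_aspect=0.5):
    """Accept an object whose area and bounding-box shape resemble a nucleus.

    ASSUMPTION: all limits here are illustrative, not tuned values.
    """
    area = len(pixels)
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    extent = area / (height * width)                  # fill of the bounding box
    aspect = min(height, width) / max(height, width)  # 1.0 for a square box
    size_ok = abs(area - expected_area) <= area_tolerance * expected_area
    return size_ok and extent >= min_extent and aspect >= min_aspect
```

In this sketch, components that pass looks_like_nucleus keep their label and the rest are set to zero, giving a labelled nuclei image for the cytoplasm step.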
Now we want to identify cytoplasm and then determine the level of protein expression, based on DAB, in these cells. In this case the cell membranes are not differentially stained (or are left unstained), so we approximate the cytoplasm in two ways: first with a hybrid propagation/watershed approach, and second with a defined-distance approach (which gives a donut-like appearance around each nucleus).
Notice that this method performs remarkably well, although there is some concern in the southeast corner, where there were not enough nuclei to assign cytoplasm effectively. To address this, let’s look at a second approach, which defines cytoplasm as the region within a fixed circular distance of a nucleus. This should generate results similar to the propagation/watershed hybrid, and in our in-house image analysis services we often run both approaches to make sure that the differences between them are minimal.
Finally, we apply the propagation/watershed hybrid to the other four whole slide images. The results from the same representative regions are shown below. We can then compute statistics with differential ratios between nuclei and cytoplasm across the target tissue on each slide. The two approaches should yield similar results.
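The defined-distance ("donut") idea can be sketched as a breadth-first expansion from every nucleus at once: each background pixel within a maximum distance is assigned to the nucleus that reaches it first. The Python below is a minimal illustration and not our in-house implementation. It uses 4-connected steps, so distance is city-block rather than truly circular (a Euclidean distance transform would be closer to a circle), and the function name and max_distance parameter are assumptions.

```python
from collections import deque


def cytoplasm_rings(nuclei_labels, max_distance):
    """Assign background pixels within max_distance of a nucleus to that nucleus.

    nuclei_labels: list of lists, 0 for background, a positive id per nucleus.
    Returns a label image of the same size containing only the cytoplasm
    ring of each nucleus (nuclear pixels are set to 0).
    """
    rows, cols = len(nuclei_labels), len(nuclei_labels[0])
    owner = [row[:] for row in nuclei_labels]
    dist = [[0 if v else None for v in row] for row in nuclei_labels]

    # Seed the search with every nuclear pixel so all rings grow in lockstep.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if nuclei_labels[r][c])
    while queue:
        y, x = queue.popleft()
        if dist[y][x] == max_distance:
            continue  # ring is already at full width here
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                owner[ny][nx] = owner[y][x]
                queue.append((ny, nx))

    # Keep only the ring: drop the nuclear pixels themselves.
    return [[0 if nuclei_labels[r][c] else owner[r][c] for c in range(cols)]
            for r in range(rows)]
```

Where two nuclei are closer than twice max_distance, the rings meet roughly halfway, which is also the behavior one would expect from a propagation-style split. In sparse regions the ring simply stops at max_distance instead of spreading across empty tissue.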
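The per-cell statistic itself is simple once nuclei and cytoplasm share labels. As a hedged sketch (the function name, the use of mean intensity, and the handling of cells without cytoplasm are all assumptions made for illustration), the ratio could be computed in Python like this:

```python
from statistics import mean


def nuclear_cytoplasmic_ratios(dab, nuclei_labels, cytoplasm_labels):
    """Return {cell_id: mean nuclear DAB / mean cytoplasmic DAB}.

    dab, nuclei_labels and cytoplasm_labels are same-sized lists of lists.
    A cell is skipped if it has no cytoplasm pixels or if its mean
    cytoplasmic DAB is not positive.
    """
    nuc, cyt = {}, {}
    for r, row in enumerate(dab):
        for c, value in enumerate(row):
            if nuclei_labels[r][c]:
                nuc.setdefault(nuclei_labels[r][c], []).append(value)
            elif cytoplasm_labels[r][c]:
                cyt.setdefault(cytoplasm_labels[r][c], []).append(value)

    ratios = {}
    for label, nuclear_values in nuc.items():
        if label not in cyt:
            continue
        cyt_mean = mean(cyt[label])
        if cyt_mean > 0:
            ratios[label] = mean(nuclear_values) / cyt_mean
    return ratios
```

Running this once with the propagation/watershed cytoplasm and once with the defined-distance cytoplasm gives two sets of per-cell ratios that can be compared directly, for example by their medians per slide.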







