Project 1: Colorizing the Prokudin-Gorskii photo collection

Evan Huang

Approach:

I experimented with several alignment metrics and algorithm parameters. The metrics I tried included L2 distance, normalized cross-correlation (NCC), structural similarity (SSIM), and phase cross-correlation. Structural similarity and phase cross-correlation were the most effective on single-scale images, but were often much too computationally expensive for the larger .tif files. My solution was to use structural similarity only on the base layer of the image pyramid, that is, on the coarsest (and thus smallest) image in the stack. On the larger layers I used less computationally demanding metrics. I first used NCC there, but the results were often blurry, so I switched back to structural similarity restricted to a 500x500 window in the center of the image to keep the computation manageable (I refer to this as SSIM_small). All images are cropped by a constant amount before any processing is done.
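As a rough illustration of the cheaper metrics and of the center-window trick behind SSIM_small, here is a minimal NumPy sketch. The function names (`l2_distance`, `ncc`, `center_window`) are hypothetical and not taken from the project code. SSIM itself is not reimplemented here; a library routine such as scikit-image's `structural_similarity` could be applied to the cropped windows instead.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean (L2) distance between two equally sized images; lower is better."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; higher is better."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant image has zero variance, so the score is defined as 0 here.
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def center_window(img, size=500):
    """Return the central size x size window (or the whole image if smaller)."""
    h, w = img.shape
    top = max((h - size) // 2, 0)
    left = max((w - size) // 2, 0)
    return img[top:top + size, left:left + size]
```

Scoring `center_window(a)` against `center_window(b)` instead of the full images keeps the cost of an expensive metric fixed regardless of the scan resolution.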

My pyramid algorithm recursively downsamples the input image by a factor of 0.8 until the image is smaller than 300x300 pixels. On this smallest image, it searches a [-30, 30] pixel displacement window for the shift that maximizes SSIM (disregarding a 30-pixel border). Starting from the optimal displacement found at this layer, each following layer searches a 2-pixel window to maximize either NCC or SSIM_small. I experimented with many different window sizes and numbers of layers and found that this combination gave a good balance of accuracy and speed. Processing all of the images took under 10 minutes in total, even with edge detection.
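The coarse-to-fine search described above can be sketched as follows. This is a simplified stand-in under several assumptions, not the project's actual code: it uses NCC at every layer (the write-up uses SSIM on the base layer), resizes with nearest-neighbour sampling, reads "smaller than 300x300" as both sides being under 300 pixels, treats the "2-pixel window" as a search of plus or minus 2 pixels, and ignores a margin at every layer so that pixels wrapped around by `np.roll` never enter the score. All function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation; higher means better aligned."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def downsample(img, factor=0.8):
    """Nearest-neighbour resize by `factor` (an assumed resampler)."""
    h, w = img.shape
    new_h = max(int(round(h * factor)), 1)
    new_w = max(int(round(w * factor)), 1)
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return img[np.ix_(rows, cols)]

def best_shift(ref, mov, center, radius, score=ncc):
    """Try every shift within `radius` of `center`; return the best (dy, dx)."""
    cy, cx = center
    # Ignore a border at least as wide as the largest shift tried.
    # At the base layer (center 0, radius 30) this is a 30-pixel border.
    margin = radius + max(abs(cy), abs(cx))
    h, w = ref.shape
    inner = (slice(margin, h - margin), slice(margin, w - margin))
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(mov, (dy, dx), axis=(0, 1))
            s = score(ref[inner], shifted[inner])
            if s > best_score:
                best, best_score = (dy, dx), s
    return best

def pyramid_align(ref, mov, factor=0.8, min_size=300,
                  coarse_radius=30, fine_radius=2):
    """Return the (dy, dx) shift of `mov` that best aligns it to `ref`."""
    if max(ref.shape) < min_size:
        # Base layer: exhaustive search around zero displacement.
        return best_shift(ref, mov, (0, 0), coarse_radius)
    coarse = pyramid_align(downsample(ref, factor), downsample(mov, factor),
                           factor, min_size, coarse_radius, fine_radius)
    # Scale the coarse estimate up to this layer, then refine locally.
    guess = (int(round(coarse[0] / factor)), int(round(coarse[1] / factor)))
    return best_shift(ref, mov, guess, fine_radius)
```

Because each layer is only 1.25 times larger than the one below it, the scaled-up estimate is off by at most a pixel or two, which is why a small refinement window can suffice at the finer layers.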

I also used edge detection to improve the similarity metrics, applying a Sobel filter to the relevant layers. This helped some of the images but made little difference in others.
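A minimal sketch of a Sobel edge map, written with plain NumPy so that it is self-contained. The name `sobel_magnitude` and the edge-replicating padding are assumptions for illustration; a library filter (for example scikit-image's `sobel`) would serve the same purpose. The idea is to score the gradient-magnitude maps of two channels instead of their raw intensities.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated borders)."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

An edge map is unchanged when a constant is added to the image, so a channel that is uniformly brighter or darker than another still produces the same edges. That is one plausible reason the edge-based scores behave differently from the raw-intensity scores on some images, though the write-up does not state the cause.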

Results for SSIM with/without edge detection

Image               SSIM_small                  SSIM_small w/ edge detection
                    R             G             R             G
Cathedral           (12, 3)       (5, 2)        (12, 3)       (5, 2)
Church              (26, -7)      (23, 4)       (28, -2)      (24, 4)
Emir                (17, -325)    (47, 18)      (88, 34)      (50, 23)
Harvesters          (102, 6)      (57, 18)      (102, 15)     (49, 18)
Icon                (61, 23)      (40, 18)      (76, 24)      (40, 18)
Lady                (86, 1)       (47, 7)       (97, 9)       (49, 1)
Melons              (158, 8)      (81, 10)      (158, 14)     (74, 6)
Monastery           (3, 2)        (-3, 2)       (3, 2)        (-3, 2)
Onion Church        (88, 37)      (50, 26)      (95, 37)      (50, 26)
Sculpture           (120, -27)    (33, -10)     (121, -27)    (34, -10)
Self Portrait       (149, 34)     (76, 26)      (151, 32)     (66, 19)
Three Generations   (97, 11)      (52, 16)      (91, 6)       (52, 16)
Tobolsk             (6, 3)        (3, 3)        (6, 3)        (3, 2)
Train               (74, 22)      (42, 7)       (66, 34)      (44, 3)

Emir, SSIM_small: The red channel is still skewed.
Emir, SSIM_small w/ edge detection: Using edge detection fixed the red channel's issue.

Results for NCC with/without edge detection

Image               NCC                         NCC w/ edge detection
                    R             G             R             G
Cathedral           (12, 3)       (5, 2)        (12, 3)       (5, 2)
Church              (27, -17)     (24, 1)       (28, -3)      (24, 4)
Emir                (15, -325)    (43, 3)       (94, 39)      (50, 23)
Harvesters          (102, 7)      (58, 16)      (102, 14)     (60, 17)
Icon                (63, 23)      (40, 17)      (78, 23)      (42, 17)
Lady                (88, 1)       (47, 8)       (97, 10)      (49, 9)
Melons              (158, 2)      (81, 10)      (158, 12)     (74, 0)
Monastery           (3, 2)        (-3, 2)       (3, 2)        (-3, 2)
Onion Church        (88, 37)      (51, 26)      (96, 37)      (51, 26)
Sculpture           (121, -26)    (33, -10)     (121, -26)    (33, -17)
Self Portrait       (151, 33)     (76, 26)      (152, 32)     (63, 15)
Three Generations   (91, 11)      (54, 14)      (91, 8)       (54, 12)
Tobolsk             (6, 3)        (3, 3)        (6, 3)        (3, 2)
Train               (76, 27)      (44, 6)       (66, 35)      (43, 8)

Emir, NCC: The red channel is very skewed here. I suspect it is due to the repeated pattern on the shirt inflating SSIM, since I found this issue when I directly tested the base (coarsest) layer.
Emir, NCC w/ edge detection: Using edge detection fixed the red channel's issue.