Images of the Russian Empire:
Colorizing the Prokudin-Gorskii Photo Collection

UC Berkeley
CS 180 Fall 2024 Project 1

Project Overview

Sergei Mikhailovich Prokudin-Gorskii (Сергей Михайлович Прокудин-Горский) was an early pinoeer in photography. In 1907, he began a project to document the Russian Empire using color photography. Specifically, he took photographs of the same scene with a red, a green and a blue filter. However, matching the three images requires some nontrivial effort. Since the specifics of his camera is unknown and the objects might be dynamic, we cannot simply align the images together. In this project, I use a multiscale pyramid approach that matches the RBG images based on the NCC similarity score. For Bells & Whistles, I experimented with using the edges (detected through the Canny edge detector) instead of the raw pixels to improve the matching. My final implementation successfully aligns the images and takes about a minute for the large .tif images.

Raw Data: Glass Plates

The Library of Congress purchased and digitized the raw glass plates from Prokudin-Gorskii's collection. The plates are vertically oriented, with blue filter on the top, green filter in the middle, and red filter on the bottom. These plates are the raw data that I used. Below are a few examples.

Raw glass plates, from left to right: Lady, Emir, Icon

Brute Force Matching

For smaller (~300 x 300) images, brute force matching is feasible. When doing brute force matching, I first cut out the raw plates into BGR parts and crop 10% of each edge respectively (to prevent any effect the edges have). Afterwards, I heuristically align the R and G channels to the B channel. To do this, I define a search window (e.g. 40), shift the offsets of the R channel relative to the B channel from negative to positive search window in both height and width, and calculate their respective similarity score. The choice of the similarity score significantly affects the matching quality, and I find that the Normalized Cross Correlation (NCC) is better than L2 norm. Then, I find the offsets that maximize the similarity score and apply that to aligh the R channel to the B channel. The same procedure is applied to align the G channel to the B channel. For visualizations, I keep the common areas of the aligned and unaligned images before heuristically cutting out 5% of each edge. The unaligned images are simply stacking the BGR channels together.

Illustration of the pipeline, only matching red to green is shown for clarity
Cathedral, up is unaligned, down is aligned with brute force method
Monastery, up is unaligned, down is aligned with brute force method
Tobolsk, up is unaligned, down is aligned with brute force method

Gaussian Pyramid Approach

Brute force method works well for small images. However, for large images with thousands of rows and columns (so the optimal alignment offsets are in the order of magnitude of 100), brute force solution with a large search window is extremely slow. Instead, I adopt a hierarchical approach that uses a Gaussian pyramid. The basic idea is to resize down the image (by first running through a 5x5 Gaussian filter and then downsample) until the image is small enough to use the brute force method. Then, from the smallest to the largest resized image, we use the optimal alignment offsets of the previously downsized image to initialize a brute force search at the current image level. At each level, we fix a small search window size so the runtime is managebale (this is acceptable since we are simply refining the alignment, and if the previous alignment is accurate we only need the search window size to be the same as the downswample ratio). The hyperparameters involved are the downsample threshold (when to stop resizing, I take 256x256), the fixed search window size (I take 10), and the downsample ratio at each step (I take 4). I use NCC for the similarity score for brute force search. Below are a few examples of the aligned images using the pyramid approach. We can see that although the method works well for Lady and Icon, it works poorly for Emir. The average runtime is about 5 minutes.

Lady, up is unaligned, down is aligned with gaussian pyramid method
Emir, up is unaligned, down is aligned with gaussian pyramid method
Icon, up is unaligned, down is aligned with gaussian pyramid method

Bells and Whistles: Canny Edge Detector

As shown in the Emir photo above, the Gaussian Pyramid result using raw pixels as matching features is not perfect. For Bells and Whistles, I choose to use Canny Edge Detector to proprocess each of the BGR channels before calculating the NCC score and aligning the images. As its name suggests, the Canny Edge Detector detects edges of the image, assigning edges to 1 and non-edges to 0. Experiment shows that using edges as featues improves the alignment quality. Another advantage of using edges is that edge features are sparse, so if the numerical linear algebra library we use supports sparse matrix operation, the runtime will be a bit faster. Below is a visualization of the pipeline and also a few examples.

Illustration of the pipeline, only matching red to green is shown for clarity
Lady, up is unaligned, middle is aligned with guassian pyramid method using raw pixel features, down is aligned with gaussian pyramid method with edge features
Emir, up is unaligned, middle is aligned with guassian pyramid method using raw pixel features, down is aligned with gaussian pyramid method with edge features
Icon, up is unaligned, middle is aligned with guassian pyramid method using raw pixel features, down is aligned with gaussian pyramid method with edge features

Gallery of Aligned Photo

This is a gallery of all the images aligned using raw pixels and edges features with a pyramid approach, as described in the previous section. The optimal offsets (of the R and G channel with respect to the B channel) and the runtime (on a Intel I7 13700K 64GB RAM Ubuntu 20.04 workstation) are also listed. We can see that the average runtime is about one minute for large photos.

Textile (own choosing), unaligned
Textile (own choosing), aligned with raw pixel features
offsets: R: [-134, -34], G: [-71, -26]
runtime: 61.7s
Textile (own choosing), aligned with edge features
offsets: R: [-133, -32], G: [-73, -27]
runtime: 49.6s
Gruppa (own choosing), unaligned
Gruppa (own choosing), aligned with raw pixel features
offsets: R: [-114, -15], G: [-33, -8]
runtime: 57.9s
Gruppa (own choosing), aligned with edge features
offsets: R: [-114, -13], G: [-33, -7]
runtime: 56.9s
Kurmy (own choosing), unaligned
Kurmy (own choosing), aligned with raw pixel features
offsets: R: [-116, 38], G: [-25, 18]
runtime: 64.0s
Kurmy (own choosing), aligned with edge features
offsets: R: [-115, 38], G: [-25, 18]
runtime: 56.9s
Cathedral, unaligned
Cathedral, aligned with raw pixel features
offsets: R: [-12, -3], G: [-5, -2]
runtime: 1.70s
Cathedral, aligned with edge features
offsets: R: [-12, -3], G: [-5, -2]
runtime: 0.26s
Church, unaligned
Church, aligned with raw pixel features
offsets: R: [-58, 4], G: [-25, -4]
runtime: 346.8s
Church, aligned with edge features
offsets: R: [-58, 4], G: [-24, -4]
runtime: 53.9s
Emir, unaligned
Emir, aligned with raw pixel features
offsets: R: [-196, 404], G: [-49, -24]
runtime: 319.8s
Emir, aligned with edge features
offsets: R: [-107, -40], G: [-49, -24]
runtime: 53.9s
Harvesters, unaligned
Harvesters, aligned with raw pixel features
offsets: R: [-124, -14], G: [-60, -17]
runtime: 344.9s
Harvesters, aligned with edge features
offsets: R: [-124, -14], G: [-60, -18]
runtime: 54.0s
Icon, unaligned
Icon, aligned with raw pixel features
offsets: R: [-89, -23], G: [-41, -17]
runtime: 354.3s
Icon, aligned with edge features
offsets: R: [-90, -23], G: [-39, -16]
runtime: 52.2s
Lady, unaligned
Lady, aligned with raw pixel features
offsets: R: [-117, -12], G: [-55, -9]
runtime: 415.9s
Lady, aligned with edge features
offsets: R: [-120, -13], G: [-56, -10]
runtime: 54.5s
Melons, unaligned
Melons, aligned with raw pixel features
offsets: R: [-116, 38], G: [-82, -11]
runtime: 64.0s
Melons, aligned with edge features
offsets: R: [-179, -13], G: [-80, -10]
runtime: 349.3s
Monastery, unaligned
Monastery, aligned with raw pixel features
offsets: R: [-3, -2], G: [3, -2]
runtime: 1.87s
Monastery, aligned with edge features
offsets: R: [-3, -2], G: [3, -2]
runtime: 0.27s
Onion Church, unaligned
Onion Church, aligned with raw pixel features
offsets: R: [-108, -36], G: [-51, -27]
runtime: 350.0s
Onion Church, aligned with edge features
offsets: R: [-107, -35], G: [-52, -24]
runtime: 55.0s
Sculpture, unaligned
Sculpture, aligned with raw pixel features
offsets: R: [-140, 27], G: [-33, 11]
runtime: 424.0s
Sculpture, aligned with edge features
offsets: R: [-140, 27], G: [-33, 11]
runtime: 55.3s
Self Portrait, unaligned
Self Portrait, aligned with raw pixel features
offsets: R: [-176, -37], G: [-79, -29]
runtime: 350.3s
Self Portrait, aligned with edge features
offsets: R: [-175, -37], G: [-77, -29]
runtime: 53.3s
Three Generations, unaligned
Three Generations, aligned with raw pixel features
offsets: R: [-112, -11], G: [-53, -14]
runtime: 404.0s
Three Generations, aligned with edge features
offsets: R: [-111, -8], G: [-55, -12]
runtime: 53.6s
Tobolsk, unaligned
Tobolsk, aligned with raw pixel features
offsets: R: [-7, -3], G: [-3, -3]
runtime: 1.68s
Tobolsk, aligned with edge features
offsets: R: [-7, -3], G: [-3, -3]
runtime: 0.26s
Train, unaligned
Train, aligned with raw pixel features
offsets: R: [-87, -32], G: [-43, -6]
runtime: 352.6s
Train, aligned with edge features
offsets: R: [-85, -29], G: [-41, 0]
runtime: 54.0s