Sergei Mikhailovich Prokudin-Gorskii (Сергей Михайлович Прокудин-Горский) was an early pioneer in photography. In 1907, he began a project to document the Russian Empire using color photography. Specifically, he photographed each scene three times: through a red, a green, and a blue filter. However, combining the three exposures into a single color image requires some nontrivial effort. Since the specifics of his camera are unknown and objects in the scene might have moved between exposures, we cannot simply stack the images on top of each other. In this project, I use a multiscale pyramid approach that aligns the RGB channels based on the normalized cross-correlation (NCC) similarity score. For Bells & Whistles, I experimented with using edges (detected with the Canny edge detector) instead of raw pixels to improve the matching. My final implementation successfully aligns the images and takes about a minute for the large .tif images.
The Library of Congress purchased and digitized the raw glass plates from Prokudin-Gorskii's collection. Each plate is vertically oriented, with the blue-filter exposure on top, the green-filter exposure in the middle, and the red-filter exposure on the bottom. These plates are the raw data that I used. Below are a few examples.
For smaller (~300 x 300) images, brute-force matching is feasible. I first cut the raw plate into its B, G, and R parts and crop 10% from each edge (so that the plate borders do not affect the score). Afterwards, I align the R and G channels to the B channel by exhaustive search. To do this, I define a search window (e.g. 40 pixels), shift the R channel relative to the B channel by every offset from the negative to the positive window size in both height and width, and calculate the similarity score for each offset. The choice of similarity score significantly affects the matching quality, and I find that Normalized Cross Correlation (NCC) works better than the L2 norm. Then, I find the offset that maximizes the similarity score and apply it to align the R channel to the B channel. The same procedure is applied to align the G channel to the B channel. For visualization, I keep only the area common to the aligned and unaligned images and then heuristically cut a further 5% from each edge. The unaligned images are obtained by simply stacking the BGR channels together.
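The brute-force search described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the project's actual code: the function names (`split_plate`, `crop_border`, `ncc`, `brute_force_align`) are hypothetical, and shifting with `np.roll` before cropping is an assumption made here so that wrapped-around pixels fall inside the cropped border (which holds as long as the window is no larger than the crop margin).

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into (B, G, R) thirds, top to bottom."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_border(img, frac=0.10):
    """Drop `frac` of the height/width from each edge."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def brute_force_align(channel, reference, window=40):
    """Return the (dy, dx) shift of `channel` that maximizes NCC with `reference`."""
    ref = crop_border(reference)
    best_score, best_offset = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Assumption: cyclic shift, then crop away the wrapped border.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_border(shifted), ref)
            if score > best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset
```

With channels `b, g, r = split_plate(plate)`, calling `brute_force_align(r, b)` and `brute_force_align(g, b)` would give the two offsets. The cost grows with the square of the window size, which is why this only suits small images.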
The brute-force method works well for small images. However, for large images with thousands of rows and columns (where the optimal alignment offsets are on the order of 100 pixels), a brute-force search with a large window is extremely slow. Instead, I adopt a hierarchical approach that uses a Gaussian pyramid. The basic idea is to repeatedly shrink the image (by first applying a 5x5 Gaussian filter and then downsampling) until it is small enough for the brute-force method. Then, from the smallest to the largest image, I use the optimal offsets found at the previous, coarser level (scaled up by the downsample ratio) to initialize a brute-force search at the current level. At each level, the search window is fixed to a small size so the runtime stays manageable. This is acceptable because each level only refines the alignment: one pixel at the coarser level corresponds to a ratio-sized block of pixels at the current level, so if the previous alignment is accurate, the window only needs to be about as large as the downsample ratio. The hyperparameters involved are the downsample threshold (when to stop resizing; I use 256x256), the fixed search window size (I use 10), and the downsample ratio at each step (I use 4). I use NCC as the similarity score for the brute-force search. Below are a few examples of images aligned using the pyramid approach. Although the method works well for Lady and Icon, it works poorly for Emir. The average runtime is about 5 minutes.
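A coarse-to-fine search of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the helper names are hypothetical, the 5x5 Gaussian is approximated by a separable binomial kernel, and the same window size is used at every level, including the coarsest.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # 5-tap binomial, roughly Gaussian

def blur5(img):
    """Separable 5x5 blur with edge padding (assumed stand-in for the Gaussian filter)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 2, mode="edge")
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))

def downsample(img, ratio):
    """Blur, then keep every `ratio`-th pixel along each axis."""
    return blur5(img)[::ratio, ::ratio]

def crop_border(img, frac=0.10):
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(channel, reference, center, window):
    """Brute-force NCC search over offsets within `window` of `center`."""
    ref = crop_border(reference)
    best_score, best = -np.inf, center
    for dy in range(center[0] - window, center[0] + window + 1):
        for dx in range(center[1] - window, center[1] + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_border(shifted), ref)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(channel, reference, ratio=4, min_size=256, window=10):
    """Align at the coarsest level first, then refine at each finer level."""
    if max(channel.shape) <= min_size:
        return search(channel, reference, (0, 0), window)
    coarse = pyramid_align(downsample(channel, ratio),
                           downsample(reference, ratio),
                           ratio, min_size, window)
    # A coarse offset of k pixels corresponds to k * ratio pixels at this level.
    center = (coarse[0] * ratio, coarse[1] * ratio)
    return search(channel, reference, center, window)
```

The defaults mirror the hyperparameters quoted above (ratio 4, threshold 256, window 10). Each level evaluates only (2 * window + 1)^2 offsets, so the total work is dominated by the NCC evaluations at full resolution rather than by the size of the true offset.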
As shown in the Emir photo above, the Gaussian pyramid result using raw pixels as matching features is not perfect. For Bells and Whistles, I use the Canny edge detector to preprocess each of the BGR channels before calculating the NCC score and aligning the images. As its name suggests, the Canny edge detector detects the edges in an image, marking edge pixels as 1 and non-edge pixels as 0. Experiments show that using edges as features improves the alignment quality. Another advantage of using edges is that edge features are sparse, so if the numerical linear algebra library in use supports sparse matrix operations, the runtime could be somewhat faster. Below is a visualization of the pipeline and also a few examples.
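In code, matching on edge maps instead of raw pixels can be sketched as below. To keep the example dependency-free, the edge detector here is a simple gradient-magnitude threshold, which is only a crude stand-in for Canny (it has no smoothing, non-maximum suppression, or hysteresis); a library implementation such as `skimage.feature.canny` or `cv2.Canny` would take its place in practice. The function names and the `rel_thresh` parameter are hypothetical, and border cropping is omitted for brevity.

```python
import numpy as np

def edge_map(img, rel_thresh=0.2):
    """Binary edge map (1 = edge, 0 = non-edge) from thresholded gradient magnitude.

    Simplified stand-in for a Canny detector, used here for illustration only.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > rel_thresh * mag.max()).astype(float)

def ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(features, ref_features, window):
    """Return the (dy, dx) shift of `features` that maximizes NCC with `ref_features`."""
    best_score, best = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = ncc(np.roll(features, (dy, dx), axis=(0, 1)), ref_features)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def align_on_edges(channel, reference, window=10):
    """Same search as before, but scored on edge maps rather than intensities."""
    return search(edge_map(channel), edge_map(reference), window)
```

One way to see why this can help: if the same object appears bright in one channel and dark in another, raw-pixel NCC is low (or even negative) at the correct offset, while the object's outline still lines up in both edge maps.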
This is a gallery of all the images aligned using raw-pixel and edge features with the pyramid approach, as described in the previous section. The optimal offsets (of the R and G channels with respect to the B channel) and the runtime (on an Intel i7-13700K workstation with 64 GB RAM running Ubuntu 20.04) are also listed. The average runtime is about one minute for large photos.