Matthew Petroff

Discernibility of (Rainbow) Colormaps
Mon, 26 Aug 2019 01:57:08 +0000

Earlier this month, the Turbo rainbow colormap was released and publicized on the Google AI Blog. This colormap attempts to mitigate the banding issues in the existing Jet rainbow colormap while retaining the advantages of its high contrast; note that Turbo is not perceptually uniform, so care should be used where high accuracy is required, particularly for local differences. What particularly caught my attention was the fact that the author attempted to address the color vision deficiency-related shortcomings of Jet. I am of the opinion that the creation of a colorblind-friendly rainbow colormap probably isn’t possible, since the confusion axes of color vision deficiencies become problematic once hue becomes the primary discriminator in a colormap instead of lightness;[1] this made me a bit suspicious of the claim and prompted further investigation on my part.

While the author’s attempt to consider color vision deficiencies in the creation of the colormap is laudable, it was unfortunately based on what I feel is a flawed analysis. Depth images visualized using the colormap were fed into an online color vision deficiency simulator, and the results were evaluated qualitatively by individuals with normal color vision; however, this particular simulator is, as best I can tell, based on an outdated technique from a 1988 paper[2] instead of the more recent and accurate approach of Machado et al. (2009).[3] Below, I attempt what I feel to be a more accurate and quantitative analysis, which shows that Turbo isn’t really colorblind-friendly, despite the attempt to make it so.

Since rainbow colormaps are best suited for quickly judging values, their most important property is that colors in non-adjacent sections of the colormap are not confused.[4] To evaluate this quantitatively, I devised the following metric. For each color in the colormap, the perceptual distance in CAM02-UCS[5] is calculated to every other color in the colormap. The weighted average of these perceptual distances is then taken, with the squares of the distances between the colors’ locations in the colormap used as weights. Thus, similar colors in distant locations in the colormap are penalized. For color vision deficiencies, the method of Machado et al. (2009) is used to adjust the colors before the perceptual distance is calculated, as I did for randomly generating color sets and as was done in the development of Cividis;[6] a severity of 100 was used, corresponding to deuteranopia, protanopia, and tritanopia.
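The metric described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the notebook: it assumes the colors have already been converted to CAM02-UCS coordinates (and, for color vision deficiencies, passed through a simulation beforehand), and the function name `discernibility` is hypothetical.

```python
import math


def discernibility(colors):
    """Weighted-average perceptual distance for each color in a colormap.

    `colors` is a sequence of at least two (J', a', b') tuples, assumed to
    already be in CAM02-UCS. The weight for a pair is the square of the
    distance between the two colors' positions in the colormap, so similar
    colors that are far apart in the colormap pull the average down.
    """
    n = len(colors)
    result = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(n):
            if i == j:
                continue
            delta_e = math.dist(colors[i], colors[j])  # Euclidean distance
            weight = (i - j) ** 2  # squared location distance
            weighted_sum += weight * delta_e
            weight_total += weight
        result.append(weighted_sum / weight_total)
    return result
```

For three colors equally spaced along a straight line, such as `[(0, 0, 0), (1, 0, 0), (2, 0, 0)]`, the middle color gets the lowest value, since it is closest to the others.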

We will start our evaluation with rainbow colormaps, first considering Jet, the new Turbo colormap, and Matplotlib’s existing Rainbow colormap, which also attempts to address some of Jet’s shortcomings. In the plot legends, the abbreviations “Norm,” “Deut,” “Prot,” and “Trit” are used for normal color vision, deuteranopia, protanopia, and tritanopia, respectively. Higher perceptual distance, ΔE, is better, as are smoother and more consistent discernibility lines.

Discernibility plot of Jet colormap

Discernibility plot of Turbo colormap

Discernibility plot of Rainbow colormap

The discernibility lines for Turbo and Rainbow are much smoother than those for Jet, since both mitigate Jet’s significant banding issues. Although Jet’s banding issues are generally considered problematic, I, as a colorblind individual, find the banding to sometimes be a redeeming quality, since it makes it easier for me to match part of an image to the colorbar or other parts of the image. For normal color vision, Turbo’s discernibility line is smooth and fairly flat, a significant improvement over Jet, and a minor improvement over Rainbow, although Turbo arguably looks better. However, the discernibility lines for various color vision deficiencies are not nearly as uniform, for either Turbo or Rainbow. This means that for colorblind individuals some parts of the colormaps are considerably more difficult to discern than others, making data plotted with them liable to misinterpretation. Thus, while Turbo and Rainbow improve upon some of Jet’s shortcomings, neither is colorblind-friendly.

Next, we will consider cyclic rainbow colormaps. The classic, and severely flawed, version is the HSV colormap, and the improved version is Sinebow; with regard to non-cyclic rainbow colormaps, these are analogous to Jet and Turbo, respectively. Twilight, a perceptually uniform cyclic colormap, is also considered.

Discernibility plot of HSV colormap

Discernibility plot of Sinebow colormap

Discernibility plot of Twilight colormap

In evaluating the metric for these colormaps, their cyclic nature was taken into consideration in the colormap location distance calculation. Sinebow’s discernibility lines are much smoother than HSV’s, but neither does well for color vision deficiencies. Twilight is much more consistent and colorblind-friendly, although at the expense of average discernibility.
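One plausible way to account for the cyclic nature in the location distance is to measure the shorter way around the ring. The post does not give the exact formula, so the helper below (with the hypothetical name `cyclic_location_distance`) is only an assumption about how this could be done.

```python
def cyclic_location_distance(i, j, n):
    """Shortest distance between positions i and j on a ring of n colors."""
    direct = abs(i - j)
    return min(direct, n - direct)
```

With 256 colors, the first and last entries are then treated as neighbors (distance 1) instead of being 255 steps apart.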

Now, we will consider two perceptually uniform linear colormaps: Viridis, the Matplotlib default, and Cividis, a derivative designed with color vision deficiencies in mind.

Discernibility plot of Viridis colormap

Discernibility plot of Cividis colormap

The “V” shape of the metric for these colormaps is expected, since for a linear colormap, the center is closest to the greatest number of other colors. Note that the discernibility of Cividis, which was optimized with color vision deficiencies in mind, is the most consistent between normal color vision and various color vision deficiencies, although Viridis is also okay in this regard, and both are considerably better than any of the rainbow colormaps previously presented.

Finally, diverging colormaps will be evaluated. Here, we consider Matplotlib’s Coolwarm colormap and Peter Kovesi’s Blue-Gray-Yellow colormap.

Discernibility plot of Coolwarm colormap

Discernibility plot of Blue-Gray-Yellow colormap

These show a “V” shape, similar to linear colormaps, although this is less pronounced in Coolwarm. The Blue-Gray-Yellow colormap is linearly increasing in lightness and perceptually uniform, so its discernibility profile is much closer to that of perceptually uniform linear colormaps.

In summary, while Turbo does ameliorate many of the issues with Jet, neither Turbo nor any of the other rainbow colormaps evaluated here is colorblind-friendly, at least per the metric evaluated. It is likely not possible to construct a rainbow colormap with such a property, unlike for linear, diverging, and cyclic colormaps. The Jupyter notebook used to evaluate the colormaps and produce the plots is available.

  1. It probably is possible to create a colorblind-friendly rainbow colormap for a particular type of color vision deficiency. However, creating such a colormap that simultaneously works for multiple types of color vision deficiencies as well as for normal color vision is what is likely impossible.  

  2. G. W. Meyer and D. P. Greenberg, “Color-defective vision and computer graphics displays,” in IEEE Computer Graphics and Applications, vol. 8, no. 5, pp. 28-40, Sept. 1988. doi:10.1109/38.7759  

  3. G. M. Machado, M. M. Oliveira, and L. A. F. Fernandes, “A Physiologically-based Model for Simulation of Color Vision Deficiency,” in IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1291-1298, Nov.-Dec. 2009. doi:10.1109/TVCG.2009.113  

  4. When differences between adjacent colors are important, a perceptually uniform colormap should be used.  

  5. M. R. Luo and C. Li, “CIECAM02 and Its Recent Developments,” in Advanced Color Image Processing and Analysis, C. Fernandez-Maloigne, Ed. New York, NY: Springer, 2013. doi:10.1007/978-1-4419-6190-7_2  

  6. J. R. Nuñez, C. R. Anderton, and R. S. Renslow, “Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data,” in PLoS ONE, vol. 13, no. 7, p. e0199239, Aug. 2018. doi:10.1371/journal.pone.0199239  

Pannellum 2.5
Sun, 14 Jul 2019 01:58:28 +0000

Pannellum 2.5 has now been released. As with Pannellum 2.4, this was a rather incremental release. The most noteworthy change is that equirectangular panoramas will now be automatically split into two textures if they are too big for a given device, which means images up to 8192 px across, covering all consumer panoramic cameras, now have widespread support. There has also been a significant improvement in the rendering quality on certain mobile devices (the fragment shaders now use highp precision), and support for partial panoramas has improved. Finally, there is an assortment of more minor improvements and bug fixes. See the changelog for full details. Pannellum also now has a Zenodo DOI (and a specific DOI for each new release).

Preliminary Color Cycle Order Ranking Results
Mon, 10 Jun 2019 13:58:56 +0000

Last month, I presented a preliminary analysis of ranking color sets using responses collected in the Color Cycle Survey. Now, I extend this analysis to look at color ordering within a given color set. For this analysis, the same artificial neural network architecture was used as before, except that batch normalization, with a batch size of 2048, was applied after the two Gaussian dropout layers. Determining ordering turned out to be a slightly more difficult problem, in part because the data cannot be augmented, since the ordering, obviously, matters. However, due to the way the survey is structured, with the user picking the best of four potential orderings, there are three pairwise data points per response. The same set of responses was used, ignoring the additional responses collected since the previous analysis was performed (there are now ~10k total responses).
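To illustrate how one best-of-four response yields three pairwise data points, here is a minimal sketch. The function name and the data layout (a list of four orderings plus the index of the chosen one) are assumptions made for illustration, not the survey's actual format.

```python
def response_to_pairs(options, chosen_index):
    """Turn one best-of-four response into (preferred, other) pairs."""
    preferred = options[chosen_index]
    return [
        (preferred, other)
        for k, other in enumerate(options)
        if k != chosen_index
    ]
```

Each of the three rejected orderings is paired with the chosen one, so a single response contributes three training pairs.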

➡ Click Here to Take Color Cycle Survey ⬅

To maximize the information gleaned from the survey responses, the network was trained in four steps. The process started with a single network and ended with a conjoined network, as before, except the single network underwent three stages of training instead of one. First, the color set responses—the responses that were used in the previous analysis—were used to train the network for 50 epochs, to learn color representations. Next, the ordering responses were used with the data augmented with all possible cyclic shifts to train the network for an additional 50 epochs, to learn internal cycle orderings. Then, the non-augmented ordering responses were used to train the network for another 100 epochs, to learn the ideal starting color. Finally, the last layer of the network was replaced, as before, to make a conjoined network, and the new network was trained for a final 100 epochs, again with the non-augmented ordering responses.

As with the previous analysis, an ensemble of 100 network instantiations was trained, and the average and standard deviation of the scores were computed. The accuracy for the ordering was a bit worse than for the color sets, with an accuracy of 56% on the training data and an accuracy of 54% on the test data. Since the ideal ordering depends on the specific color set used, the highest ranked color set from the previous analysis was used in this evaluation. The error band from the trained ensemble for this color set was larger than the error band from the set ranking analysis. While the model could be evaluated for any color set, it is likely more accurate for color sets that were ranked highly in the previous analysis, since the Color Cycle Survey only asks the user about the preferred ordering of the user’s preferred color set, so data are not collected on poorly-liked color sets.
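The ensemble averaging mentioned above can be sketched as follows. This is only an illustration: the nested-list layout, the use of the population standard deviation, and the convention that a higher score is better are all assumptions, and `rank_with_uncertainty` is a hypothetical name.

```python
import statistics


def rank_with_uncertainty(ensemble_scores):
    """Return (index, mean, std) tuples sorted from best to worst.

    `ensemble_scores[m][k]` is the score that model instance m assigned to
    candidate k (a color set or an ordering).
    """
    n_candidates = len(ensemble_scores[0])
    summary = []
    for k in range(n_candidates):
        values = [scores[k] for scores in ensemble_scores]
        summary.append((k, statistics.mean(values), statistics.pstdev(values)))
    return sorted(summary, key=lambda row: row[1], reverse=True)
```

The standard deviation per candidate is what would be drawn as the error band around the mean score.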

Ranked Color Cycle Ordering Visualization

The trained network shows a clear preference for blue / purple as the first color instead of green / yellow; as many existing color cycles start with blue, this seems reasonable. The network also seems fairly confident in picking the third color, since it’s the same for the top fifteen orderings, but there’s more variation in the second color.

Preliminary Color Cycle Set Ranking Results
Wed, 15 May 2019 21:30:17 +0000

Since I launched my color cycle survey in December, it has collected ~9.7k responses across ~800 user sessions. Although the responses are not as numerous as I’d like, there’s currently enough data for preliminary analysis. The data are split between sets of six, eight, and ten colors with ratios of approximately 2:2:1; there are fewer ten-color color set responses as I disabled that portion of the survey months ago, to more quickly record six- and eight-color color set responses. So far, I’ve focused on analyzing the set ranking of the six-color color sets, for which there are ~4k responses, using artificial neural networks. The gist of the problem is to use the survey’s pair-wise responses to train a neural network such that it can rank 10k previously-generated color sets; these color sets each have a minimum perceptual distance between colors, both with and without color vision deficiency simulations applied.

➡ Click Here to Take Color Cycle Survey ⬅

As inputs with identical structure are being compared, a network architecture that is invariant to input order, i.e., one that produces identical output for inputs (A, B) and (B, A), is desirable. Conjoined neural networks[1] satisfy this property; they consist of two identical neural networks with shared weights, the outputs of which are combined to produce a single result. In this case, each network takes a single color set as input and produces a single scalar output, a “score” for the input color set. The two scores are then compared, with the better scoring color set of the input pair chosen as the preferred set; put more concretely, the difference of the two scores is computed and used to calculate binary cross-entropy during network training. The architecture of the network appears in the figure below and contains 2077 trainable parameters.
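As a rough sketch of how a score difference can feed a binary cross-entropy loss, consider the following. Passing the difference through a logistic function to obtain a probability is an assumption made here for illustration; the post only states that the difference is used to calculate the binary cross-entropy, and both function names are hypothetical.

```python
import math


def preference_probability(score_a, score_b):
    """Probability that set A is preferred, from the score difference."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pair_loss(score_a, score_b, a_was_chosen):
    """Binary cross-entropy for one survey pair."""
    p = preference_probability(score_a, score_b)
    return -math.log(p) if a_was_chosen else -math.log(1.0 - p)
```

Under this formulation, swapping the two inputs together with the label leaves the loss unchanged, which reflects the order invariance described above.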

Artificial Neural Network Architecture Diagram

Each color set consists of six colors, which are each encoded in the perceptually-uniform CAM02-UCS colorspace, with J encoding the lightness and a and b encoding the chromaticity. The first two layers of the network are used to fit an optimal encoding to each of the color inputs; this is achieved by using a pair of three-neuron fully-connected layers for each of the six colors, with network weights shared between each sub-layer. The outputs of these color-encoding layers are then concatenated and fed to two more fully-connected layers, consisting of thirty-six neurons each. A final fully-connected layer consisting of a single neuron is then used to produce a single scalar output. The entire network is then duplicated for the second color set being compared, and the difference between the two outputs is computed. Exponential linear unit (ELU) activation functions are used on the interior layers, and a sigmoid activation function is used on the final layer of each network.
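The stated count of 2077 trainable parameters can be checked against this description with a little arithmetic. The sketch below assumes that every fully-connected layer has a bias term and that the shared color-encoding weights are counted once; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


# Two three-neuron layers, shared by all six colors.
color_encoder = dense_params(3, 3) + dense_params(3, 3)
# Six concatenated three-value encodings feed two 36-neuron layers.
set_layers = dense_params(6 * 3, 36) + dense_params(36, 36)
# Single-neuron output layer producing the score.
output_layer = dense_params(36, 1)

total_params = color_encoder + set_layers + output_layer
print(total_params)  # 2077
```

Since the second network in the conjoined pair shares all of its weights with the first, it adds no further parameters.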

The colors in each color set are ordered by hue, then chromaticity, then lightness. This is a sensible ordering, but since hue is cyclic, the starting color is fairly arbitrary. Thus, before training the network, the data are augmented by performing cyclic shifts on the ordering of the six colors in each set. As this augmentation is performed on each of the two color sets in each survey response pair, the total training and test data set sizes are augmented by a factor of thirty-six. Prior to data augmentation, the survey response data are split, with 80% used as the training set and 20% used as the test set. In order to reduce overfitting, Gaussian dropout is used on both of the 36-neuron layers, with a rate of 0.4; L2 kernel regularizers are used on all layers, with a penalty of 0.001. The network was implemented using Keras, with the TensorFlow backend, and trained using a binary cross-entropy loss and the Nesterov Adam (Nadam) optimizer, using default optimizer parameters.
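The cyclic-shift augmentation can be sketched as below. The function names are hypothetical, and colors are represented by arbitrary list items, since only the reordering matters here.

```python
def cyclic_shifts(colors):
    """All rotations of a color list, preserving the cyclic order."""
    return [colors[k:] + colors[:k] for k in range(len(colors))]


def augment_pair(set_a, set_b):
    """Every combination of rotations for the two sets in a response pair."""
    return [(a, b) for a in cyclic_shifts(set_a) for b in cyclic_shifts(set_b)]
```

With six colors per set, each set has six rotations, so one response pair becomes 6 × 6 = 36 augmented pairs, matching the factor quoted above.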

Unfortunately, training this network proved to be problematic, with it often converging into a local minimum with a loss of 0.6931 ≈ −ln(0.5) = ln(2); the network was learning to ignore the inputs and always produce the same output, resulting in an output of zero from the conjoined network. Previous work with conjoined networks did not run into this problem, since either higher dimensionality output was used to compute a similarity metric[2] or non-binary training data were used.[3] To resolve this issue, the output comparison was removed as well as the last fully-connected layer of each network; this was replaced with a single-neuron fully-connected layer with sigmoid activation, joining the two existing networks into a single network with a single output. As this is no longer a conjoined architecture but instead a single network, the input order matters, so the data were additionally augmented such that both orderings of each survey response pair would be used, doubling the number of training and test pairs.
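To see where the value 0.6931 comes from, note that a network that ignores its inputs effectively predicts a probability of p = 0.5 for every pair, whatever the label y ∈ {0, 1}. The binary cross-entropy per sample is then:

```latex
\mathcal{L} = -\left[ y \ln p + (1 - y) \ln (1 - p) \right]
            = -\ln(0.5) = \ln 2 \approx 0.6931
            \quad \text{for } p = 0.5
```

This is the same loss as random guessing, which is why it signals that nothing useful has been learned.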

With this change, the network could be successfully trained. However, this new network only worked with pair-wise data, which was troublesome. The 10k color sets to be ranked can be paired close to fifty million ways, which grows to more than three billion inputs to evaluate once the data augmentation is applied. The conjoined network, however, requires only 60k evaluations for the ranking, since a single instance of the network, without the output comparison, can be used to directly score a given color set. Thus, a hybrid approach was devised. The single-output non-conjoined network was first trained for fifty epochs. Its last layer was then removed, and the change to the original conjoined network was undone, but the existing training weights were kept. This partially pre-trained conjoined network was then trained for an additional fifty epochs. Due to the pre-training, the conjoined network no longer became stuck in the local minimum, allowing the advantages of the conjoined network to be reaped, while avoiding the training dilemma.
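The evaluation counts quoted above can be reproduced with a short calculation. The breakdown of the "more than three billion" figure into 36 cyclic-shift combinations times two input orderings is my reading of the text, so treat it as an assumption.

```python
import math

n_sets = 10_000
n_colors = 6

unordered_pairs = math.comb(n_sets, 2)  # close to fifty million
shifts_per_pair = n_colors * n_colors  # cyclic shifts of both sets
orderings_per_pair = 2  # (A, B) and (B, A)

pairwise_inputs = unordered_pairs * shifts_per_pair * orderings_per_pair
conjoined_evaluations = n_sets * n_colors  # one per cyclic shift of each set

print(unordered_pairs)  # 49995000
print(pairwise_inputs)  # 3599640000
print(conjoined_evaluations)  # 60000
```

The difference of nearly five orders of magnitude is what makes scoring each color set directly so much more practical than evaluating every pair.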

Since the training data only very sparsely cover the space of possible pairings and since the network does not always train consistently well, I decided it was best to train an ensemble of model instances. To this end, I trained the model 100 times, chose the best fifty instances as determined by the metric training accuracy + test accuracy - abs(training accuracy - test accuracy), calculated scores for each of the 10k color sets using these fifty trained model instances, and averaged the resulting scores for each color set. For both the training and test sets, the average accuracy was 58%. While considerably better than guessing randomly, it does seem a bit low at first glance. However, many of the color sets are similar and aesthetic preference is subjective, so perfect accuracy isn’t possible. To approximate an upper limit on achievable accuracy, I created a modified version of the color cycle survey that always presents the same six-color color sets in the same order and then entered 100 responses on each of two consecutive days; 83 / 100 of my answers were consistent for the color set preference between the two days. Thus, I think 80% is a conservative upper limit on possible accuracy; including aesthetic preference differences between individuals, I think ~70% is a more practical upper limit for achievable accuracy.
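The instance selection and averaging described above might look something like this. The dictionary layout for a trained instance is hypothetical, as are the function names; only the selection metric itself is taken from the text.

```python
def selection_metric(train_acc, test_acc):
    """Reward high accuracy, penalize a gap between training and test."""
    return train_acc + test_acc - abs(train_acc - test_acc)


def average_best_scores(instances, keep=50):
    """Average per-set scores over the `keep` best model instances.

    Each instance is a dict with 'train_acc', 'test_acc', and 'scores'
    (one score per color set); this layout is an assumption.
    """
    ranked = sorted(
        instances,
        key=lambda m: selection_metric(m["train_acc"], m["test_acc"]),
        reverse=True,
    )
    best = ranked[:keep]
    n_sets = len(best[0]["scores"])
    return [sum(m["scores"][k] for m in best) / len(best) for k in range(n_sets)]
```

Algebraically, the metric equals twice the smaller of the two accuracies, so it favors instances whose weaker accuracy is highest.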

A few variants of the network were evaluated, such as increasing or decreasing the number of layers or the size of the layers, as well as changing the activation functions. Adding additional layers or increasing the size of the existing layers did not appear to have an effect on the accuracy; removing one each of the color encoding and set encoding layers led to at most a marginal decrease in accuracy. Using rectified linear unit (ReLU) activations on the interior layers led to marginally decreased accuracy. Adjusting the Gaussian dropout rate by 0.1 or 0.2 had little effect, and Gaussian dropout seems to work slightly better than standard dropout. Originally, a hue-chromaticity-luminance representation was used for the color inputs, as is used to sort the input color order, but this noticeably decreased accuracy; I suspect that the cyclic nature of hue values was the source of this reduced accuracy.

In addition to making the results more stable, this ensemble also allows for estimating the uncertainty between training runs; the plot below shows the average color set scores as a function of rank, with a 1-sigma error band.

This shows that, according to the model, while the best color sets are definitely better than the worst color sets, color sets that are close in ranking are not necessarily any better or worse than the hundreds of color sets with similar rankings. Given the sparsity of the input data, this result is not surprising. The results can also be evaluated qualitatively; the figure below shows the fifteen lowest ranked color sets on the left and the fifteen highest ranked color sets on the right.

Ranked Color Sets Visualization

To my eye, the best color sets definitely look better than the worst color sets. The worst sets appear to be darker, more saturated, and generally a bit garish; note that the lightness and color distance limits applied when the color sets were generated excluded the vast majority of truly awful color sets for this evaluation. I find the highest-ranked color set, as well as many of the other highly-ranked color sets, to be quite pleasant; some of the other highly-ranked color sets contain blueish purplish colors that I find to be a bit over-saturated, so there’s definitely still room for improvement.

I hope that this post convincingly shows the validity of the data-driven premise on which the color cycle survey is based. It was certainly a relief to me when I was first able to get test accuracy results consistently above 50%, since it meant there wasn’t an egregious mistake in the survey code; seeing consistent color set rankings between training runs gave further relief, since it showed that the concept was working as I had hoped. Moving forward, I plan to next consider color cycle ordering for the six-color color sets. The initial plan is to use the same network architecture but to train it with the color cycle ordering responses (three pairs per response); the trained network could then be used to determine an optimal ordering by ranking the 720 possible six-color cycle orderings for a given color set and choosing the highest-ranked ordering. Once I have a workable cycle ordering analysis technique, I’ll apply both the set choice and cycle ordering analyses to the eight-color color set data, which will hopefully be straightforward.
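The ordering search mentioned above, ranking all 720 permutations of a six-color set, is small enough to do exhaustively. In the sketch below, `score` is a hypothetical callable standing in for the trained network, and `best_ordering` is likewise an illustrative name.

```python
from itertools import permutations


def best_ordering(colors, score):
    """Return the highest-scoring ordering of `colors`.

    `score` takes a tuple of colors and returns a number, with higher
    assumed to be better.
    """
    return max(permutations(colors), key=score)


n_orderings = sum(1 for _ in permutations(range(6)))
print(n_orderings)  # 720
```

Since 6! = 720, evaluating every ordering with a small network is inexpensive; the same brute-force approach gets costly for larger sets (8! = 40320 and 10! = 3628800).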

Another interesting avenue to pursue would be to try to create a single network that can handle various sized color cycles, as this would allow all of the survey results to be used at once and would allow the results to be generalized beyond the number of colors used in the survey; however, I’m not yet sure how to approach this. An additional thought is to devise a metric that combines the network-derived score with some sort of color-nameability criterion, probably derived from the xkcd color survey, and use that to rank the color sets, favoring colors that can more easily be named, instead of just using the network-derived score directly. As I mentioned at the beginning of this post, I’d really like more data with which to improve the analysis; with increased confidence from these preliminary results, I’ll try to further promote the color cycle survey.

If you haven’t yet taken the color cycle survey (or even if you have), please consider taking it.

  1. Bromley, Jane, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. “Signature verification using a ‘Siamese’ time delay neural network.” In Advances in neural information processing systems, pp. 737-744. 1994. 

  2. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. “Siamese neural networks for one-shot image recognition.” In ICML deep learning workshop, vol. 2. 2015. 

  3. Burges, Christopher, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N. Hullender. “Learning to rank using gradient descent.” In Proceedings of the 22nd International Conference on Machine learning (ICML-05), pp. 89-96. 2005. doi:10.1145/1102351.1102363 

Hilbert Curve Cake
Tue, 02 Apr 2019 03:37:36 +0000

Three years ago, I entered an Ashley Book of Knots Cake into the Johns Hopkins University Sheridan Libraries’ third annual Edible Book Festival. For this year’s contest, I figured I could apply my 3D-printed Hilbert curve microwave absorber research to craft a cake for Hans Sagan’s Space-Filling Curves book[1] on the eponymous topic. Thus began an endeavor involving thermoplastic, silicone, and sugar.

Hilbert curve cake

I saw two ways to make a cake shaped as a Hilbert curve: using an appropriately shaped baking mold or painstakingly carving the appropriate shape out of a baked cake, with the former being the logical path to pursue. This raised the question: how does one create such a mold? Baking molds are generally either metal or silicone, with silicone having the distinct advantage of being much easier to work with for such a shape, since it can be cast at room temperature. Thus, one needs to create a mold with which to cast the silicone baking mold. Fortunately, 3D-printing is well suited for this, and I already had experience 3D-printing Hilbert curve geometries.

Starting from my existing Hilbert curve solid models, I designed a two-part mold for a third-order geometric approximation of the Hilbert curve. Compared to a single-part mold, a two-part mold allows for a thinner silicone wall thickness, which reduces silicone material usage and makes it easier to turn the mold inside-out, a necessary step in removing the eventual cake from the baking mold. This mold was printed from PETG (for no particular reason besides having it around) on a Lulzbot TAZ 6 printer; as the mold is rather large, a printer with a large build volume is necessary. A 1.2 mm nozzle was used to reduce the printing time, a single wall extrusion[2] and 10% infill were used to reduce material usage, and a raft was printed below the part to aid with removal from the printer’s bed. When generating the G-code, a solid layer was added just below the Hilbert curve geometry to ensure that it would print correctly with the low infill percentage.
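For readers unfamiliar with what a third-order approximation looks like, the path it traces can be generated with the standard index-to-coordinate conversion for the Hilbert curve. This is only an illustration of the 2D curve on an 8 × 8 grid, not the code used to build the solid models; the function names are hypothetical.

```python
def d2xy(n, d):
    """Map index d along the curve to (x, y) on an n-by-n grid.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_points(order):
    """Grid points visited by the Hilbert curve approximation of this order."""
    n = 2 ** order
    return [d2xy(n, d) for d in range(n * n)]
```

A third-order curve visits all 64 cells of the 8 × 8 grid exactly once, with each step moving to an adjacent cell.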

Hilbert curve cake plastic mold top

Hilbert curve cake plastic mold bottom

Once the mold was assembled, 1 kg of food-safe silicone[3] was mixed, vacuum degassed, poured into the mold, and allowed to cure.

Assembled Hilbert curve cake plastic mold

Hilbert curve cake plastic mold filled with silicone

While I had hoped that the two-part plastic mold would allow the silicone mold to be easily removed once it had cured, this was an incredibly naive notion. After all attempts to carefully disassemble the plastic mold and remove the cured silicone failed, I ended up smashing the plastic mold to bits in order to free the silicone mold.[4]

Remnants of Hilbert curve cake plastic mold after removing cured silicone

I then thoroughly washed the silicone mold and was finally ready to begin baking. To increase my chances of success, I decided to use a recipe meant for a Bundt pan, a lemon pound cake.[5] The recipe is below, courtesy of my mother.

2.5 cups sugar
1 cup butter
4 eggs
1 teaspoon vanilla extract
0.5 teaspoon lemon extract
1 cup milk
1 tablespoon lemon juice
3 cups flour
0.5 teaspoon baking soda

In mixing bowl, cream together sugar and butter until light and fluffy. Add eggs one at a time, beating well after each. Stir in vanilla and lemon extracts. Mix lemon juice into milk. Thoroughly sift together the flour and baking soda. Add flour mixture to creamed mixture alternately with milk solution, beating well after each addition. Pour into greased mold.

The silicone baking mold was placed into an 8″ square cake pan for support[6] and thoroughly greased with shortening; while silicone baking molds don’t need to be greased, in theory, I decided it was best to do so if I were to have any chance of removing the complicated cake geometry from the mold in one piece.

Greased Hilbert curve cake mold

Filling Hilbert curve cake mold with batter

Hilbert curve cake mold filled with batter

The cake was then baked at 350°F for 135 minutes, with a pan of water also in the oven to try to prevent the top crust from hardening too much. As the mold does not have a hole in the center like a Bundt pan does, this baking time is considerably longer than what was specified by the original recipe. I learned this the hard way, since my first attempt came out under-baked, ruining the curve geometry. Once the cake was removed from the oven, I allowed it to cool for half an hour before placing it in the freezer for seven hours; I reasoned that a frozen cake would be the least likely to break apart while attempting to remove it from the mold. Once I removed the frozen cake from the freezer, I was able to flip it over and slowly and carefully turn the silicone mold inside-out to remove the cake in one piece, starting from the corners where the curve does not end.[7]

Baked Hilbert curve cake in mold

Removing Hilbert curve cake from mold

I then flipped the cake back over and sliced the bottom flat.

Slicing bottom off of Hilbert curve cake

Hilbert curve cake before decoration

Finally, the cake was ready for its finishing touch, a lemon glaze, mixed from a ratio of one cup confectioner’s sugar to two tablespoons lemon juice. I carefully and methodically brushed on four coats of glaze using a silicone basting brush, completing the cake.

Hilbert curve cake

Hilbert curve cake

Hilbert curve cake at Edible Book Festival

Sadly, I didn’t win anything, just like last time, but there was stiff competition from many excellent cakes. The model files for the mold are available.

  1. H. Sagan, Space-Filling Curves (Springer-Verlag, 1994). ISBN: 9780387942650. DOI: 10.1007/978-1-4612-0871-6

  2. This led to some gaps in the part’s wall, which allowed some silicone to leak into the interior of the mold, making removal more difficult. 

  3. Smooth-On Smooth-Sil 940 

  4. The single wall extrusion helped here, as did the single bottom layer. 

  5. I also like the taste. 

  6. By happenstance, it fit almost perfectly; I had sized the mold based on silicone and available print volumes. 

  7. One of the ends of the curve did break off partially, but it was easy to reattach. 
