Matthew Petroff
https://mpetroff.net

## Figure Caption Color Indicators

Published Sat, 23 Nov 2019 18:36:42 +0000 at https://mpetroff.net/2019/11/figure-caption-color-indicators/

Earlier this year, I became aware of a feature in GitHub-flavored Markdown that displays a colored square inline when HTML color codes are surrounded by backticks, e.g., #1f77b4. Although I only recently became aware of this feature, it dates back to at least 2017 and is similar to a feature that Slack has had since at least 2014. When I saw this inline color presentation, I immediately thought of its applicability to figure captions, particularly in academic papers; as a colorblind individual, I sometimes find it challenging to match colors referenced in figure captions to features in the figures themselves, due to difficulties with naming colors. Thus, I added similar annotations to figure captions in my recently submitted paper, Two-year Cosmology Large Angular Scale Surveyor (CLASS) Observations: A First Detection of Atmospheric Circular Polarization at Q Band:

Fig. 2. Frequency dependence of polarized atmospheric signal at zenith for the CLASS observing site, both for circular polarization ($|V|$, shown in blue) and linear polarization ($\sqrt{Q^2+U^2}$, shown in orange). The light gray bands indicate CLASS observing frequencies, with the lowest frequency band corresponding to the Q-band telescope.

Fig. 5. Example binned azimuth profiles are shown…angle cut. The profile in blue is from a zenith angle of 43.9° and a boresight rotation angle of −45°, the profile in orange is from a zenith angle of 46.7° and a boresight rotation angle of 0°, and the profile in red is from a zenith angle of 52.8° and a boresight rotation angle of +45°.

The first caption refers to a line plot, while the second caption refers to a scatter plot with best fit lines. These examples, as well as underlining examples elsewhere in this post, display best in a browser that supports changing the underline thickness via the text-decoration-thickness CSS property. At the time of writing, this includes Firefox 70+ and Safari 12.2+ but does not include any version of Chrome; however, browser underlining support is still inferior to the underline rendered by $\LaTeX$, so the reader is encouraged to view the figures in the paper.

While the primary purpose of these annotations is to improve accessibility for individuals with color vision deficiencies, they are also helpful when a paper is printed or displayed in grayscale. For example, it is much easier to distinguish blue and orange in grayscale with the annotations than without.

As this was an experiment, I included two different methods for visualizing the color: a thick colored underline under the color name and a colored square following it. Since the colors are referring to solid lines in the plot, the underlines make sense because they match the plot features, e.g., a solid blue line. Likewise, a dotted underline might make sense for a dotted blue line, although it is more difficult to discern the color of the dotted line than the solid line. I am undecided as to whether or not including the colored square is a good idea. While it adds an additional visual cue, the main reason I included it was to increase the chances of at least one of the indicators making it past the editors and into the final published paper; as the paper is currently under review, it remains to be seen if either indicator survives the publication process.

For scatter plots, however, colored shapes make perfect sense. A scatter plot with red squares (■), blue diamonds (◆), and orange circles (●) should include such shapes in the figure caption when the caption refers to the corresponding points. I am undecided as to whether or not the color names in such cases should be underlined, just as I am undecided as to whether or not line plots should include a colored square. Although I have not seen any color indicators, for either lines or scatter points, in the scientific literature, the use of shapes in figure captions is not a new practice. I have found examples dating from the mid-1950s through the early 2000s. The closest example I have found is in a 1997 paper¹ that refers to a symbol with both its name and a graphical representation:

Fig. 5. Couette-Taylor experiments. Logarithmic…number. The black triangles (▲) are the results obtained with smooth cylinders, and the open ones (△) correspond to those obtained with the ribbed ones. The crosses (×) show for comparison…and Swinney [8].

Other examples include a 1967 paper² (and a 1968 paper³) that uses graphical representations inline instead of symbol names:

Fig. 13. Additional…symmetries. Points marked with ■ are the excess…nuclei, points marked with □ the excess…N = Z. The points ▼ show the differences…larger Z-values. The points △ are the differences…for even-Z–odd-N nuclei.

and a 1955 paper⁴ that puts the figure legend inline in the figure caption:

Fig. 1. (p,n) cross sections in millibarns. ○—measured total…isotope; □—partial…isotope; ×—observed…estimate. Curves…of r0. The dotted bands indicate…energy.

There are other examples, e.g., this 1960 paper,⁵ that put the legend on separate lines at the end of the caption, but doing so isn’t really the same idea. There are also papers that treated line styles in the same manner as scatter plot symbols, such as this 1962 paper:⁶

Fig. 1. Counting rate…in pulses per cm$^2$ sec. Maximum…is indicated by broken lines (– – –). The zone…has been shaded.

These examples should not be considered by any means exhaustive, since searching for this sort of thing is extremely difficult.⁷ In particular, while I don’t know of any prior publications that include color indicators, this does not mean that they do not exist. If anyone reading this is aware of any such examples, or of other interesting figure caption indicators, please let me know.

Adoption of visual color indicators such as the ones presented here would be a significant accessibility improvement, but it would require buy-in from both publishers and authors. The chances of success are unclear but would certainly be improved with advocacy.

### Implementation

The $\LaTeX$ color annotation command was defined as

% Black square (\blacksquare is provided by amssymb)
\usepackage{amssymb}

% Define color
\usepackage{xcolor}
\definecolor{tab:blue}{RGB}{31, 119, 180}

% Color underlines with breaks for descenders, based on:
% https://tex.stackexchange.com/a/75406
% https://tex.stackexchange.com/a/24771
% https://tex.stackexchange.com/a/321235
\usepackage{soul}
\usepackage[outline]{contour}
\newcommand \colorindicator[2]{%
\begingroup%
\setul{0.25ex}{0.4ex}%
\contourlength{0.2ex}%
\setulcolor{#1}%
\ul{{\phantom{#2}}}\llap{\contour{white}{#2}} \textcolor{#1}{\tiny{$\blacksquare$}}%
\endgroup%
}


and used with \colorindicator{tab:blue}{blue}. For HTML, this CSS

.color-underline {
text-decoration-line: underline;
text-decoration-style: solid;
text-decoration-thickness: 0.2em;
text-decoration-skip-ink: auto;
}
.blue {
text-decoration-color: #1f77b4;
}
.blue-square::after {
content: "\202f\25a0";
position: relative;
display: inline-block;
color: #1f77b4;
}


was used with <span class="color-underline blue blue-square">blue</span> to produce blue. A production implementation would probably involve a symbol web font to improve and normalize the symbol appearance and possibly a better way to draw underlines.

1. Cadot, O., Y. Couder, A. Daerr, S. Douady, and A. Tsinober. “Energy injection in closed turbulent flows: Stirring through boundary layers versus inertial stirring.” Physical Review E 56, no. 1 (1997): 427. doi:10.1103/PhysRevE.56.427

2. Haque, Khorshed Banu, and J. G. Valatin. “An investigation of the separation energies of lighter nuclei.” Nuclear Physics A 95, no. 1 (1967): 97-114. doi:10.1016/0375-9474(67)90154-6

3. Aydin, C. “The spectral variations of CU Virginis (HD 124224).” Memorie della Societa Astronomica Italiana 39 (1968): 721. bibcode:1968MmSAI..39..721A

4. Blosser, H. G., and T. H. Handley. “Survey of (p, n) reactions at 12 MeV.” Physical Review 100, no. 5 (1955): 1340. doi:10.1103/PhysRev.100.1340

5. Evans, D. S., G. V. Raynor, and R. T. Weiner. “The lattice spacings of thorium-lanthanum alloys.” Journal of Nuclear Materials 2, no. 2 (1960): 121-128. doi:10.1016/0022-3115(60)90039-8

6. Vernov, S. N., E. V. Gorchakov, Yu I. Logachev, V. E. Nesterov, N. F. Pisarenko, I. A. Savenko, A. E. Chudakov, and P. I. Shavrin. “Investigations of radiation during flights of satellites, space vehicles and rockets.” Journal of the Physical Society of Japan Supplement 17 (1962): 162. bibcode:1962JPSJS..17B.162V

7. I found most of the above examples by performing full-text searches in NASA ADS for terms such as “black diamond” or “filled square” and looking through hundreds of results to find the few instances that included both the search terms and the symbols.

## Discernibility of (Rainbow) Colormaps

Published Mon, 26 Aug 2019 01:57:08 +0000 at https://mpetroff.net/2019/08/discernibility-of-rainbow-colormaps/

Earlier this month, the Turbo rainbow colormap was released and publicized on the Google AI Blog. This colormap attempts to mitigate the banding issues in the existing Jet rainbow colormap, while retaining the advantages of its high contrast; note that Turbo is not perceptually uniform, so care should be used where high accuracy is required, particularly for local differences. What particularly caught my attention was the fact that the author attempted to address the color vision deficiency-related shortcomings of Jet. I am of the opinion that the creation of a colorblind-friendly rainbow colormap probably isn’t possible, since the confusion axes of color vision deficiencies become problematic once hue becomes the primary discriminator in a colormap instead of lightness;¹ this made me a bit suspicious of the claim and prompted further investigation on my part. While the author’s attempt to consider color vision deficiencies in the creation of the colormap is laudable, it was unfortunately based on what I feel is a flawed analysis. Depth images visualized using the colormap were fed into an online color vision deficiency simulator, and the results were evaluated qualitatively by individuals with normal color vision; however, this particular simulator is, as best I can tell, based on an outdated technique from a 1988 paper² instead of the more recent and accurate approach of Machado et al. (2009).³ Below, I attempt what I feel to be a more accurate and quantitative analysis, which shows that Turbo isn’t really colorblind-friendly, despite the attempt to make it so.

Since rainbow colormaps are best suited for quickly judging values, their most important property is that colors in non-adjacent sections of the colormap are not confused.⁴ To evaluate this quantitatively, I devised the following metric. For each color in the colormap, the perceptual distance in CAM02-UCS⁵ to every other color in the colormap is calculated. The weighted average of the perceptual distances is then taken, with the squares of the color location distances in the colormap used as weights. For color vision deficiencies, the method of Machado et al. (2009) is used to adjust the colors before the perceptual distance is calculated, as I did for randomly generating color sets and as was done in the development of Cividis;⁶ a severity of 100 was used, indicating deuteranopia, protanopia, and tritanopia. Thus, similar colors in distant locations in the colormap are penalized.
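The metric described above can be sketched in a few lines of Python. This is only an illustration, not the code from the notebook: the function name `discernibility` is hypothetical, and the input is assumed to already be a list of CAM02-UCS coordinates, with any color vision deficiency simulation applied beforehand.

```python
import math


def discernibility(ucs_colors):
    """Weighted-average perceptual distance for each color in a colormap.

    ucs_colors: sequence of (J', a', b') tuples, assumed to be CAM02-UCS
    coordinates sampled in order along the colormap.
    """
    scores = []
    for i, color_i in enumerate(ucs_colors):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, color_j in enumerate(ucs_colors):
            if i == j:
                continue
            # Weight is the squared distance between colormap locations,
            # so similar colors far apart in the colormap pull the score down.
            weight = (i - j) ** 2
            weighted_sum += weight * math.dist(color_i, color_j)
            weight_total += weight
        scores.append(weighted_sum / weight_total)
    return scores
```

For an evenly spaced three-color ramp such as `[(0, 0, 0), (1, 0, 0), (2, 0, 0)]`, the middle color receives the lowest score, since it is nearest to the other two.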

We will start with rainbow colormaps for our evaluation of colormaps by this metric, first considering Jet, the new Turbo colormap, and Matplotlib’s existing Rainbow colormap, which also attempts to address some of Jet’s shortcomings. In the plot legends, the abbreviations “Norm,” “Deut,” “Prot,” and “Trit” are used for normal color vision, deuteranopia, protanopia, and tritanopia, respectively. Higher perceptual distance, ΔE, is better, as are smoother and more consistent discernibility lines.

The discernibility lines for Turbo and Rainbow are much smoother than those for Jet, since both mitigate Jet’s significant banding issues. Although Jet’s banding issues are generally considered problematic, I, as a colorblind individual, find the banding to sometimes be a redeeming quality, since it makes it easier for me to match part of an image to the colorbar or other parts of the image. For normal color vision, Turbo’s discernibility line is smooth and fairly flat, a significant improvement over Jet, and a minor improvement over Rainbow, although Turbo arguably looks better. However, the discernibility lines for various color vision deficiencies are not nearly as uniform, for either Turbo or Rainbow. This means that for colorblind individuals some parts of the colormaps are considerably more difficult to discern than others, making data plotted with them liable to misinterpretation. Thus, while Turbo and Rainbow improve upon some of Jet’s shortcomings, neither is colorblind-friendly.

Next, we will consider cyclic rainbow colormaps. The classic, and severely flawed, version is the HSV colormap, and the improved version is Sinebow; with regard to non-cyclic rainbow colormaps, these are analogous to Jet and Turbo, respectively. Twilight, a perceptually uniform cyclic colormap, is also considered.

In evaluating the metric for these colormaps, their cyclic nature was taken into consideration in the colormap location distance calculation. Sinebow’s discernibility lines are much smoother than HSV’s, but neither does well for color vision deficiencies. Twilight is much more consistent and colorblind-friendly, although at the expense of average discernibility.
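One plausible way to account for the cyclic nature in the location distance is to take the shorter way around the colormap. The helper below is a sketch under that assumption; the name `location_distance` and its signature are hypothetical and not taken from the notebook.

```python
def location_distance(i, j, n, cyclic=False):
    """Distance between positions i and j in a colormap with n entries."""
    d = abs(i - j)
    if cyclic:
        # In a cyclic colormap, going around the other way may be shorter.
        d = min(d, n - d)
    return d
```

With 256 entries, the first and last colors are 255 steps apart in a linear colormap but only one step apart in a cyclic one, so they are weighted as near neighbors rather than as distant colors.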

Now, we will consider two perceptually uniform linear colormaps: Viridis, the Matplotlib default, and Cividis, a derivative designed with color vision deficiencies in mind.

The “V” shape of the metric for these colormaps is expected, since for a linear colormap, the center is closest to the greatest number of other colors. Note that the discernibility of Cividis, which was optimized with color vision deficiencies in mind, is the most consistent between normal color vision and various color vision deficiencies, although Viridis is also okay in this regard, and both are considerably better than any of the rainbow colormaps previously presented.

Finally, diverging colormaps will be evaluated. Here, we consider Matplotlib’s Coolwarm colormap and Peter Kovesi’s Blue-Gray-Yellow colormap.

These show a “V” shape, similar to linear colormaps, although this is less pronounced in Coolwarm. The Blue-Gray-Yellow colormap is linearly increasing in lightness and perceptually uniform, so its discernibility profile is much closer to that of perceptually uniform linear colormaps.

In summary, while Turbo does ameliorate many of the issues with Jet, neither Turbo nor any of the other rainbow colormaps evaluated here are colorblind-friendly, at least per the metric evaluated. It is likely that it is not possible to construct a rainbow colormap with such a property, unlike for linear, diverging, and cyclic colormaps. The Jupyter notebook used to evaluate the colormaps and produce the plots is available.

1. It probably is possible to create a colorblind-friendly rainbow colormap for a particular type of color vision deficiency. However, creating a colormap that simultaneously works for multiple types of color vision deficiencies as well as for normal color vision is likely impossible.

2. G. W. Meyer and D. P. Greenberg, “Color-defective vision and computer graphics displays,” in IEEE Computer Graphics and Applications, vol. 8, no. 5, pp. 28-40, Sept. 1988. doi:10.1109/38.7759

3. G. M. Machado, M. M. Oliveira, and L. A. F. Fernandes, “A Physiologically-based Model for Simulation of Color Vision Deficiency,” in IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1291-1298, Nov.-Dec. 2009. doi:10.1109/TVCG.2009.113

4. When differences between adjacent colors are important, a perceptually uniform colormap should be used.

5. Luo M.R., Li C. (2013) CIECAM02 and Its Recent Developments. In: Fernandez-Maloigne C. (eds) Advanced Color Image Processing and Analysis. Springer, New York, NY. doi:10.1007/978-1-4419-6190-7_2

6. J. R. Nuñez, C. R. Anderton, and R. S. Renslow. “Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data,” in PLoS ONE vol. 13, no. 7, pp. e0199239, Aug. 2018. doi:10.1371/journal.pone.0199239

## Pannellum 2.5

Published Sun, 14 Jul 2019 01:58:28 +0000 at https://mpetroff.net/2019/07/pannellum-2-5/

Pannellum 2.5 has now been released. As with Pannellum 2.4, this was a rather incremental release. The most noteworthy change is that equirectangular panoramas will now be automatically split into two textures if too big for a given device, which means images up to 8192 px across, covering all consumer panoramic cameras, now have widespread support. There has also been a significant improvement of the rendering quality on certain mobile devices (the fragment shaders now use highp precision), and support for partial panoramas has improved. Finally, there is an assortment of more minor improvements and bug fixes. See the changelog for full details. Pannellum also now has a Zenodo DOI (and a specific DOI for each new release).

## Preliminary Color Cycle Order Ranking Results

Published Mon, 10 Jun 2019 13:58:56 +0000 at https://mpetroff.net/2019/06/preliminary-color-cycle-order-ranking-results/

Last month, I presented a preliminary analysis of ranking color sets using responses collected in the Color Cycle Survey. Now, I extend this analysis to look at color ordering within a given color set. For this analysis, the same artificial neural network architecture as before was used, except that batch normalization, with a batch size of 2048, was used after the two Gaussian dropout layers. Determining ordering turned out to be a slightly more difficult problem, in part because the data cannot be augmented, since the ordering, obviously, matters. However, due to the way the survey is structured, with the user picking the best of four potential orderings, there are three pairwise data points per response. The same set of responses was used, ignoring the additional responses collected since the previous analysis was performed (there are now ~10k total responses).
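To make the "three pairwise data points per response" concrete, the sketch below expands one best-of-four response into (preferred, rejected) pairs. The function name and the representation of an ordering are hypothetical; the survey's actual data format is not described here.

```python
def response_to_pairs(orderings, chosen_index):
    """Turn one best-of-four response into (preferred, rejected) pairs.

    orderings: the four candidate orderings shown to the user.
    chosen_index: index of the ordering the user picked.
    """
    preferred = orderings[chosen_index]
    return [
        (preferred, other)
        for k, other in enumerate(orderings)
        if k != chosen_index
    ]
```

Each response thus says only that the chosen ordering beat the other three; it carries no information about how the three rejected orderings compare with one another.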

To maximize the information gleaned from the survey responses, the network was trained in four steps. The process started with a single network and ended with a conjoined network, as before, except the single network underwent three stages of training instead of one. First, the color set responses—the responses that were used in the previous analysis—were used to train the network for 50 epochs, to learn color representations. Next, the ordering responses were used with the data augmented with all possible cyclic shifts to train the network for an additional 50 epochs, to learn internal cycle orderings. Then, the non-augmented ordering responses were used to train the network for another 100 epochs, to learn the ideal starting color. Finally, the last layer of the network was replaced, as before, to make a conjoined network, and the new network was trained for a final 100 epochs, again with the non-augmented ordering responses.

As with the previous analysis, an ensemble of 100 network instantiations was trained, and the average and standard deviation of the scores were computed. The accuracy for the ordering was a bit worse than for the color sets, with an accuracy of 56% on the training data and an accuracy of 54% on the test data. Since the ideal ordering depends on the specific color set used, the highest ranked color set from the previous analysis was used in this evaluation. The error band from the trained ensemble for this color set was larger than the error band from the set ranking analysis. While the model could be evaluated for any color set, it is likely more accurate for color sets that were ranked highly in the previous analysis, since the Color Cycle Survey only asks the user about the preferred ordering of the user’s preferred color set, so data are not collected on poorly-liked color sets.

The trained network shows a clear preference for blue / purple as the first color instead of green / yellow; as many existing color cycles start with blue, this seems reasonable. The network also seems fairly confident in picking the third color, since it’s the same for the top fifteen orderings, but there’s more variation in the second color.

## Preliminary Color Cycle Set Ranking Results

Published Wed, 15 May 2019 21:30:17 +0000 at https://mpetroff.net/2019/05/preliminary-color-cycle-set-ranking-results/

Since I launched my color cycle survey in December, it has collected ~9.7k responses across ~800 user sessions. Although the responses are not as numerous as I’d like, there’s currently enough data for preliminary analysis. The data are split between sets of six, eight, and ten colors with ratios of approximately 2:2:1; there are fewer ten-color color set responses as I disabled that portion of the survey months ago, to more quickly record six- and eight-color color set responses. So far, I’ve focused on analyzing the set ranking of the six-color color sets, for which there are ~4k responses, using artificial neural networks. The gist of the problem is to use the survey’s pair-wise responses to train a neural network such that it can rank 10k previously-generated color sets; these color sets each have a minimum perceptual distance between colors, both with and without color vision deficiency simulations applied.

As inputs with identical structure are being compared, a network architecture that is invariant to input order, i.e., one that produces identical output for inputs (A, B) and (B, A), is desirable. Conjoined neural networks¹ satisfy this property; they consist of two identical neural networks with shared weights, the outputs of which are combined to produce a single result. In this case, each network takes a single color set as input and produces a single scalar output, a “score” for the input color set. The two scores are then compared, with the better scoring color set of the input pair chosen as the preferred set; put more concretely, the difference of the two scores is computed and used to calculate binary cross-entropy during network training. The architecture of the network appears in the figure below and contains 2077 trainable parameters.
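The score comparison can be illustrated with a small sketch. How the score difference is mapped to a probability is not spelled out above, so passing it through a logistic function is an assumption made here, and both function names are hypothetical.

```python
import math


def pair_probability(score_a, score_b):
    """Probability that set A is preferred, from the score difference.

    Assumption: the difference is squashed with a logistic function.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def binary_cross_entropy(label, p):
    """Loss for one pair; label is 1.0 if A was chosen and 0.0 if B was."""
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```

Under this assumption, swapping the two inputs together with the label leaves the loss unchanged, and two equal scores give a probability of 0.5 and a loss of ln 2 ≈ 0.6931.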

Each color set consists of six colors, which are each encoded in the perceptually-uniform CAM02-UCS colorspace, with J encoding the lightness and a and b encoding the chromaticity. The first two layers of the network are used to fit an optimal encoding to each of the color inputs; this is achieved by using a pair of three-neuron fully-connected layers for each of the six colors, with network weights shared between each sub-layer. The outputs of these color-encoding layers are then concatenated and fed to two more fully-connected layers, consisting of thirty-six neurons each. A final fully-connected layer consisting of a single neuron is then used to produce a single scalar output. The entire network is then duplicated for the second color set being compared, and the difference between the two outputs is computed. Exponential linear unit (ELU) activation functions are used on the interior layers, and a sigmoid activation function is used on the final layer of each network.
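As a sanity check on the layer description, the trainable parameter count can be reproduced with simple arithmetic. The sketch assumes that every fully-connected layer has a bias term and that the shared color-encoding weights are counted once; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


# Two three-neuron layers, shared across the six colors, so counted once.
color_encoder = dense_params(3, 3) + dense_params(3, 3)
# Six encoded colors (6 * 3 values) feed two thirty-six-neuron layers.
set_layers = dense_params(6 * 3, 36) + dense_params(36, 36)
# Single-neuron output layer producing the score.
output_layer = dense_params(36, 1)

total = color_encoder + set_layers + output_layer
print(total)  # 2077
```

The total matches the 2077 trainable parameters quoted for the network, which suggests the assumptions about biases and weight sharing are reasonable.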

The colors in each color set are ordered by hue, then chromaticity, then lightness. This is a sensible ordering, but since hue is cyclic, the starting color is fairly arbitrary. Thus, before training the network, the data are augmented by performing cyclic shifts on the ordering of the six colors in each set. As this augmentation is performed on each of the two color sets in each survey response pair, the total training and test data set sizes are augmented by a factor of thirty-six. Prior to data augmentation, the survey response data are split, with 80% used as the training set and 20% used as the test set. In order to reduce overfitting, Gaussian dropout is used on both of the 36-neuron layers, with a rate of 0.4; L2 kernel regularizers are used on all layers, with a penalty of 0.001. The network was implemented using Keras, with the TensorFlow backend, and trained using binary cross-entropy loss and the Nesterov Adam optimizer, using default optimizer parameters.
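The cyclic-shift augmentation can be sketched as follows. The function names are hypothetical, and colors are represented as arbitrary list elements rather than CAM02-UCS triples to keep the example short.

```python
def cyclic_shifts(colors):
    """All rotations of a list of colors, starting with the original order."""
    return [colors[k:] + colors[:k] for k in range(len(colors))]


def augment_pair(set_a, set_b):
    """Every combination of cyclic shifts of the two color sets in a pair."""
    return [(a, b) for a in cyclic_shifts(set_a) for b in cyclic_shifts(set_b)]
```

For two six-color sets, `augment_pair` yields 6 × 6 = 36 pairs, which is the factor of thirty-six mentioned above.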

Unfortunately, training this network proved to be problematic, with it often converging into a local minimum with a loss of 0.6931 ≈ −ln(0.5); the network was learning to ignore the inputs and always produce the same output, resulting in an output of zero from the conjoined network. Previous work with conjoined networks did not run into this problem, since either higher dimensionality output was used to compute a similarity metric² or non-binary training data were used.³ To resolve this issue, the output comparison was removed as well as the last fully-connected layer of each network; this was replaced with a single-neuron fully-connected layer with sigmoid activation, joining the two existing networks into a single network with a single output. As this is no longer a conjoined architecture but instead a single network, the input order matters, so the data were additionally augmented such that both orderings of each survey response pair would be used, doubling the number of training and test pairs.

With this change, the network could be successfully trained. However, this new network only worked with pair-wise data, which was troublesome. The 10k color sets to be ranked can be paired close to fifty million ways, which grows to more than three billion inputs to evaluate once the data augmentation is applied. The conjoined network, however, requires only 60k evaluations for the ranking, since a single instance of the network, without the output comparison, can be used to directly score a given color set. Thus, a hybrid approach was devised. The single-output non-conjoined network was first trained for fifty epochs. Its last layer was then removed, and the change to the original conjoined network was undone, but the existing training weights were kept. This partially pre-trained conjoined network was then trained for an additional fifty epochs. Due to the pre-training, the conjoined network no longer became stuck in the local minimum, allowing the advantages of the conjoined network to be reaped, while avoiding the training dilemma.
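The evaluation counts quoted above can be checked with a few lines of arithmetic. The sketch reads "more than three billion" as including both the thirty-six cyclic-shift combinations and the two input orders, and reads the 60k figure as one scoring pass per cyclic shift of each set; both readings are my assumptions.

```python
import math

n_sets = 10_000
n_shifts = 6  # cyclic shifts of a six-color set

pairs = math.comb(n_sets, 2)               # unordered pairs of color sets
pairwise_inputs = pairs * n_shifts**2 * 2  # shifts of both sets, both input orders
conjoined_inputs = n_sets * n_shifts       # one scoring pass per shifted set

print(pairs)             # 49995000
print(pairwise_inputs)   # 3599640000
print(conjoined_inputs)  # 60000
```

The conjoined network therefore needs roughly sixty thousand times fewer evaluations to produce a full ranking.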

Since the training data only very sparsely cover the space of possible pairings and since the network does not always train consistently well, I decided it was best to train an ensemble of model instances. To this end, I trained the model 100 times, chose the best fifty instances as determined by the metric training accuracy + test accuracy - abs(training accuracy - test accuracy), calculated scores for each of the 10k color sets using these fifty trained model instances, and averaged the resulting scores for each color set. For both the training and test sets, the average accuracy was 58%. While considerably better than guessing randomly, it does seem a bit low at first glance. However, many of the color sets are similar and aesthetic preference is subjective, so perfect accuracy isn’t possible. To approximate an upper limit on achievable accuracy, I created a modified version of the color cycle survey that always presents the same six-color color sets in the same order and then entered 100 responses on each of two consecutive days; 83 / 100 of my answers were consistent for the color set preference between the two days. Thus, I think 80% is a conservative upper limit on possible accuracy; including aesthetic preference differences between individuals, I think ~70% is a more practical upper limit for achievable accuracy.
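The instance selection and score averaging can be sketched as below. The dictionary layout used to describe a trained instance is hypothetical and chosen only for illustration.

```python
def selection_metric(train_acc, test_acc):
    """Reward high accuracy and penalize a gap between training and test."""
    return train_acc + test_acc - abs(train_acc - test_acc)


def ensemble_scores(instances, keep=50):
    """Average per-set scores over the best `keep` model instances.

    instances: list of dicts with "train_acc", "test_acc", and "scores"
    (one score per color set); this layout is an assumption.
    """
    ranked = sorted(
        instances,
        key=lambda m: selection_metric(m["train_acc"], m["test_acc"]),
        reverse=True,
    )
    best = ranked[:keep]
    n_sets = len(best[0]["scores"])
    return [sum(m["scores"][i] for m in best) / len(best) for i in range(n_sets)]
```

Note that the metric simplifies to twice the smaller of the two accuracies, so it favors instances whose weaker accuracy is as high as possible.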

A few variants of the network were evaluated, such as increasing or decreasing the number of layers or the size of the layers, as well as changing the activation functions. Adding additional layers or increasing the size of the existing layers did not appear to have an effect on the accuracy; removing one each of the color encoding and set encoding layers only led to at most a marginal decrease in accuracy. Using rectified linear unit (ReLU) activations on the interior layers led to marginally decreased accuracy. Adjusting the Gaussian dropout rate by 0.1 or 0.2 had little effect, and Gaussian dropout seems to work slightly better than standard dropout. Originally, a hue-chromaticity-luminance representation was used for the color inputs, as is used to sort the input color order, but this had noticeably decreased accuracy; I suspect that the cyclic nature of hue values was the source of this reduced accuracy.

In addition to making the results more stable, this ensemble also allows for estimating the uncertainty between training runs; the plot below shows the average color set scores as a function of rank, with a 1-sigma error band.

This shows that, according to the model, while the best color sets are definitely better than the worst color sets, color sets that are close in ranking are not necessarily any better or worse than the hundreds of color sets with similar rankings. Given the sparsity of the input data, this result is not surprising. The results can also be evaluated qualitatively; the figure below shows the fifteen lowest ranked color sets on the left and the fifteen highest ranked color sets on the right.

To my eye, the best color sets definitely look better than the worst color sets. The worst sets appear to be darker, more saturated, and generally a bit garish; note that the lightness and color distance limits applied when the color sets were generated excluded the vast majority of truly awful color sets for this evaluation. I find the highest-ranked color set, as well as many of the other highly-ranked color sets, to be quite pleasant; some of the other highly-ranked color sets contain blueish purplish colors that I find to be a bit over-saturated, so there’s definitely still room for improvement.

I hope that this post convincingly shows the validity of the data-driven premise on which the color cycle survey is based. It was certainly a relief to me when I was first able to get test accuracy results consistently above 50%, since it meant there wasn’t an egregious mistake in the survey code; seeing consistent color set rankings between training runs gave further relief, since it showed that the concept was working as I had hoped. Moving forward, I plan to next consider color cycle ordering for the six-color color sets. The initial plan is to use the same network architecture but to train it with the color cycle ordering responses (three pairs per response); the trained network could then be used to determine an optimal ordering by ranking the 720 possible six-color cycle orderings for a given color set and choosing the highest-ranked ordering. Once I have a workable cycle ordering analysis technique, I’ll apply both the set choice and cycle ordering analyses to the eight-color color set data, which will hopefully be straightforward.
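The planned ordering search, ranking all 720 orderings of a six-color set and taking the best, can be sketched with the standard library. The `score` callable stands in for the trained network and is purely hypothetical.

```python
from itertools import permutations


def best_ordering(color_set, score):
    """Return the highest-scoring ordering of a color set.

    score: callable mapping an ordered tuple of colors to a number; a
    placeholder for whatever the trained network would provide.
    """
    return max(permutations(color_set), key=score)


# A six-color set has 6! = 720 possible orderings.
n_orderings = sum(1 for _ in permutations(range(6)))
```

An exhaustive search is cheap at this size; for larger color sets the number of orderings grows factorially, so this brute-force approach would eventually need to be replaced.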

Another interesting avenue to pursue would be to try to create a single network that can handle various sized color cycles, as this would allow all of the survey results to be used at once and would allow the results to be generalized beyond the number of colors used in the survey; however, I’m not yet sure how to approach this. An additional thought is to devise a metric that combines the network-derived score with some sort of color-nameability criterion, probably derived from the xkcd color survey, and use that to rank the color sets, favoring colors that can more easily be named, instead of just using the network-derived score directly. As I mentioned at the beginning of this post, I’d really like more data with which to improve the analysis; with increased confidence from these preliminary results, I’ll try to further promote the color cycle survey.

If you haven’t yet taken the color cycle survey (or even if you have), please consider taking it: https://colorcyclesurvey.mpetroff.net/

1. Bromley, Jane, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. “Signature verification using a ‘Siamese’ time delay neural network.” In Advances in neural information processing systems, pp. 737-744. 1994.

2. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. “Siamese neural networks for one-shot image recognition.” In ICML deep learning workshop, vol. 2. 2015.

3. Burges, Christopher, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N. Hullender. “Learning to rank using gradient descent.” In Proceedings of the 22nd International Conference on Machine learning (ICML-05), pp. 89-96. 2005. doi:10.1145/1102351.1102363
