Preliminary Color Cycle Set Ranking Results

Since I launched my color cycle survey in December, it has collected ~9.7k responses across ~800 user sessions. Although the responses are not as numerous as I’d like, there is enough data for a preliminary analysis. The data are split between sets of six, eight, and ten colors in ratios of approximately 2:2:1; there are fewer ten-color responses because I disabled that portion of the survey months ago in order to record six- and eight-color responses more quickly. So far, I’ve focused on analyzing the set ranking of the six-color color sets, for which there are ~4k responses, using artificial neural networks. The gist of the problem is to use the survey’s pair-wise responses to train a neural network that can rank 10k previously generated color sets; these color sets each have a minimum perceptual distance between colors, both with and without color vision deficiency simulations applied.

As inputs with identical structure are being compared, a network architecture that is invariant to input order, i.e., one that produces equivalent results for inputs (A, B) and (B, A), is desirable. Conjoined neural networks [1] satisfy this property; they consist of two identical neural networks with shared weights, the outputs of which are combined to produce a single result. In this case, each network takes a single color set as input and produces a single scalar output, a “score” for the input color set. The two scores are then compared, with the better-scoring color set of the input pair chosen as the preferred set; more concretely, the difference of the two scores is computed and used to calculate the binary cross-entropy loss during network training. The architecture of the network appears in the figure below and contains 2077 trainable parameters.

Artificial Neural Network Architecture Diagram

Each color set consists of six colors, each encoded in the perceptually uniform CAM02-UCS colorspace, with J encoding the lightness and a and b encoding the chromaticity. The first two layers of the network fit an optimal encoding to each of the color inputs; this is achieved by using a pair of three-neuron fully-connected layers for each of the six colors, with network weights shared between each sub-layer. The outputs of these color-encoding layers are then concatenated and fed to two more fully-connected layers, consisting of thirty-six neurons each. A final fully-connected layer consisting of a single neuron is then used to produce a single scalar output. The entire network is then duplicated for the second color set being compared, and the difference between the two outputs is computed. Exponential linear unit (ELU) activation functions are used on the interior layers, and a sigmoid activation function is used on the final layer of each network.
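
To make the architecture concrete, below is a minimal Keras sketch of a matching conjoined network. This is my reconstruction, not the survey’s actual code; in particular, the Lambda layer mapping the score difference (which lies in [−1, 1]) onto a [0, 1] probability for the loss is an assumption. Since a Dense layer applied to a (6, 3)-shaped input operates on the last axis, weight sharing across the six colors comes for free, and this layout reproduces the 2077 trainable parameters quoted above (the regularization described below is omitted here).

```python
from tensorflow import keras
from tensorflow.keras import layers

N_COLORS = 6  # each color is a (J, a, b) triplet in CAM02-UCS

def make_scoring_network():
    """Score a single color set: shared per-color encoders, then set encoders."""
    inp = keras.Input(shape=(N_COLORS, 3))
    # Two three-neuron color-encoding layers; Dense operates on the last
    # axis, so its weights are shared between the six colors
    x = layers.Dense(3, activation="elu")(inp)
    x = layers.Dense(3, activation="elu")(x)
    x = layers.Flatten()(x)  # concatenate the six encoded colors
    x = layers.Dense(36, activation="elu")(x)  # set-encoding layers
    x = layers.Dense(36, activation="elu")(x)
    score = layers.Dense(1, activation="sigmoid")(x)  # scalar set score
    return keras.Model(inp, score)

scorer = make_scoring_network()  # one network, applied to both inputs
set_a = keras.Input(shape=(N_COLORS, 3))
set_b = keras.Input(shape=(N_COLORS, 3))
diff = layers.Subtract()([scorer(set_a), scorer(set_b)])
# Assumed detail: shift the score difference into [0, 1] so it can be
# treated as P(set A preferred) by the binary cross-entropy loss
prob = layers.Lambda(lambda d: 0.5 * (d + 1.0))(diff)
model = keras.Model([set_a, set_b], prob)
model.compile(optimizer=keras.optimizers.Nadam(),  # Nesterov Adam
              loss="binary_crossentropy", metrics=["accuracy"])
```

Note that, with this construction, a network that ignores its inputs produces a constant probability of 0.5 and a loss of −ln(0.5) ≈ 0.6931, the local minimum discussed below.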

The colors in each color set are ordered by hue, then chroma, then lightness. This is a sensible ordering, but since hue is cyclic, the starting color is fairly arbitrary. Thus, before training the network, the data are augmented by performing cyclic shifts on the ordering of the six colors in each set. As this augmentation is performed on each of the two color sets in each survey response pair, the total training and test data set sizes are augmented by a factor of thirty-six. Prior to data augmentation, the survey response data are split, with 80% used as the training set and 20% used as the test set. To reduce overfitting, Gaussian dropout is used on both of the 36-neuron layers, with a rate of 0.4, and L2 kernel regularizers are used on all layers, with a penalty of 0.001. The network was implemented using Keras, with the TensorFlow backend, and trained using binary cross-entropy and the Nesterov Adam optimizer with default optimizer parameters.
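
As a rough sketch of the augmentation (again my reconstruction; each color set is assumed to be stored as a 6 × 3 array of (J, a, b) rows):

```python
import numpy as np

def augment_pair(set_a, set_b, label):
    """Yield all 6 x 6 = 36 cyclic-shift combinations of one response pair."""
    for i in range(len(set_a)):
        for j in range(len(set_b)):
            # np.roll cyclically shifts the color order within each set
            yield np.roll(set_a, i, axis=0), np.roll(set_b, j, axis=0), label
```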

Unfortunately, training this network proved to be problematic, with it often converging to a local minimum with a loss of 0.6931 ≈ −ln(0.5); the network was learning to ignore the inputs and always produce the same output, resulting in an output of zero from the conjoined network. Previous work with conjoined networks did not run into this problem, since either a higher-dimensional output was used to compute a similarity metric [2] or non-binary training data were used [3]. To resolve this issue, the output comparison was removed, along with the last fully-connected layer of each network; these were replaced with a single-neuron fully-connected layer with sigmoid activation, joining the two existing networks into a single network with a single output. As this is no longer a conjoined architecture but instead a single network, the input order matters, so the data were additionally augmented such that both orderings of each survey response pair would be used, doubling the number of training and test pairs.
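
Continuing the sketch from above, the workaround joins the two branches’ 36-neuron encodings with a single sigmoid neuron (again a hedged reconstruction, not the original code):

```python
# Trunk: everything up to, but not including, each branch's final scoring
# neuron; keras.Model reuses the existing layer objects, so the weights
# remain shared with `scorer`.
trunk = keras.Model(scorer.input, scorer.layers[-2].output)
merged = layers.Concatenate()([trunk(set_a), trunk(set_b)])
joined = layers.Dense(1, activation="sigmoid")(merged)  # order-sensitive
pretrain_model = keras.Model([set_a, set_b], joined)
pretrain_model.compile(optimizer=keras.optimizers.Nadam(),
                       loss="binary_crossentropy", metrics=["accuracy"])
```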

With this change, the network could be successfully trained. However, this new network only works with pair-wise data, which is troublesome. The 10k color sets to be ranked can be paired in close to fifty million ways, which grows to more than three billion inputs to evaluate once the data augmentation is applied. The conjoined network, however, requires only 60k evaluations for the ranking, since a single instance of the network, without the output comparison, can be used to directly score a given color set. Thus, a hybrid approach was devised. The single-output non-conjoined network was first trained for fifty epochs. Its last layer was then removed, and the change to the original conjoined network was undone, but the existing trained weights were kept. This partially pre-trained conjoined network was then trained for an additional fifty epochs. Due to the pre-training, the conjoined network no longer became stuck in the local minimum, allowing the advantages of the conjoined network to be reaped while avoiding the training difficulties.
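
The resulting hybrid procedure, continuing from the sketches above (A_train, B_train, y_train, and candidate_sets are assumed names for the augmented survey pairs, their labels, and the cyclically shifted color sets to rank):

```python
# Phase 1: train the joined, order-sensitive network for fifty epochs
pretrain_model.fit([A_train, B_train], y_train, epochs=50)
# Phase 2: the conjoined model shares the now pre-trained trunk layers;
# only its final scoring neuron starts from scratch
model.fit([A_train, B_train], y_train, epochs=50)
# Ranking needs just one branch evaluation per shifted set, rather than
# one evaluation per pair: 10k sets x 6 cyclic shifts = 60k rows
scores = scorer.predict(candidate_sets)
```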

Since the training data only very sparsely cover the space of possible pairings and since the network does not always train consistently well, I decided it was best to train an ensemble of model instances. To this end, I trained the model 100 times, chose the best fifty instances as determined by the metric training accuracy + test accuracy − abs(training accuracy − test accuracy), calculated scores for each of the 10k color sets using these fifty trained model instances, and averaged the resulting scores for each color set. For both the training and test sets, the average accuracy was 58%. While considerably better than guessing randomly, this does seem a bit low at first glance. However, many of the color sets are similar, and aesthetic preference is subjective, so perfect accuracy isn’t possible. To approximate an upper limit on achievable accuracy, I created a modified version of the color cycle survey that always presents the same six-color color sets in the same order and then entered 100 responses on each of two consecutive days; 83/100 of my answers were consistent for the color set preference between the two days. Thus, I think 80% is a conservative upper limit on possible accuracy; accounting for aesthetic preference differences between individuals, I think ~70% is a more practical upper limit for achievable accuracy.
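
The ensemble selection and score averaging are simple to state in code (a sketch with assumed bookkeeping: runs stands for the 100 training runs, each recording its accuracies and its scores for the 10k color sets):

```python
import numpy as np

def selection_metric(run):
    """Reward overall accuracy, penalize the train/test gap (overfitting)."""
    return (run["train_acc"] + run["test_acc"]
            - abs(run["train_acc"] - run["test_acc"]))

best_50 = sorted(runs, key=selection_metric, reverse=True)[:50]
set_scores = np.mean([run["scores"] for run in best_50], axis=0)
ranking = np.argsort(set_scores)[::-1]  # highest-scoring color set first
```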

A few variants of the network were evaluated, such as increasing or decreasing the number or size of the layers, as well as changing the activation functions. Adding layers or increasing the size of the existing layers did not appear to affect the accuracy; removing one each of the color-encoding and set-encoding layers led to at most a marginal decrease in accuracy. Using rectified linear unit (ReLU) activations on the interior layers led to marginally decreased accuracy. Adjusting the Gaussian dropout rate by 0.1 or 0.2 had little effect, and Gaussian dropout seems to work slightly better than standard dropout. Originally, a hue-chroma-lightness representation was used for the color inputs, the same representation used to sort the input color order, but this noticeably decreased accuracy; I suspect the cyclic nature of hue values was the cause.

In addition to making the results more stable, this ensemble also allows the uncertainty between training runs to be estimated; the plot below shows the average color set scores as a function of rank, with a 1-sigma error band.

Color Set Scores vs. Rank Plot

This shows that, according to the model, while the best color sets are clearly better than the worst color sets, color sets that are close in ranking are not necessarily any better or worse than the hundreds of color sets with similar rankings. Given the sparsity of the input data, this result is not surprising. The results can also be evaluated qualitatively; the figure below shows the fifteen lowest-ranked color sets on the left and the fifteen highest-ranked color sets on the right.

Ranked Color Sets Visualization

To my eye, the best color sets definitely look better than the worst color sets. The worst sets appear to be darker, more saturated, and generally a bit garish; note that the lightness and color distance limits applied when the color sets were generated excluded the vast majority of truly awful color sets from this evaluation. I find the highest-ranked color set, as well as many of the other highly-ranked color sets, to be quite pleasant; some of the other highly-ranked color sets contain bluish-purple colors that I find a bit over-saturated, so there’s definitely still room for improvement.

I hope that this post convincingly shows the validity of the data-driven premise on which the color cycle survey is based. It was certainly a relief to me when I was first able to get test accuracy results consistently above 50%, since it meant there wasn’t an egregious mistake in the survey code; seeing consistent color set rankings between training runs gave further relief, since it showed that the concept was working as I had hoped. Moving forward, I plan to next consider color cycle ordering for the six-color color sets. The initial plan is to use the same network architecture but to train it with the color cycle ordering responses (three pairs per response); the trained network could then be used to determine an optimal ordering by ranking the 720 possible six-color cycle orderings for a given color set and choosing the highest-ranked ordering. Once I have a workable cycle ordering analysis technique, I’ll apply both the set choice and cycle ordering analyses to the eight-color color set data, which will hopefully be straightforward.
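
Under that plan, choosing an ordering reduces to scoring all 6! = 720 permutations with the trained ordering network and keeping the best one, along these lines (a sketch; ordering_scorer is the assumed, yet-to-be-trained network):

```python
from itertools import permutations

import numpy as np

def best_ordering(color_set, ordering_scorer):
    """Rank all 720 orderings of a six-color set and return the best."""
    candidates = np.array(list(permutations(color_set)))  # 720 x 6 x 3
    scores = ordering_scorer.predict(candidates).ravel()
    return candidates[np.argmax(scores)]
```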

Another interesting avenue to pursue would be to try to create a single network that can handle color cycles of various sizes, as this would allow all of the survey results to be used at once and would allow the results to be generalized beyond the number of colors used in the survey; however, I’m not yet sure how to approach this. An additional thought is to devise a metric that combines the network-derived score with some sort of color-nameability criterion, probably derived from the xkcd color survey, and use that to rank the color sets, favoring colors that can more easily be named, instead of just using the network-derived score directly. As I mentioned at the beginning of this post, I’d really like more data with which to improve the analysis; with increased confidence from these preliminary results, I’ll try to further promote the color cycle survey.

If you haven’t yet taken the color cycle survey (or even if you have), please consider taking it: https://colorcyclesurvey.mpetroff.net/


  1. Bromley, Jane, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. “Signature verification using a ‘Siamese’ time delay neural network.” In Advances in Neural Information Processing Systems, pp. 737–744. 1994.

  2. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. “Siamese neural networks for one-shot image recognition.” In ICML Deep Learning Workshop, vol. 2. 2015.

  3. Burges, Christopher, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N. Hullender. “Learning to rank using gradient descent.” In Proceedings of the 22nd International Conference on Machine Learning (ICML-05), pp. 89–96. 2005. doi:10.1145/1102351.1102363


Hilbert Curve Cake

Three years ago, I entered an Ashley Book of Knots Cake into the Johns Hopkins University Sheridan Libraries’ third annual Edible Book Festival. For this year’s contest, I figured I could apply my 3D-printed Hilbert curve microwave absorber research to craft a cake for Hans Sagan’s Space-Filling Curves book [1] on the eponymous topic. Thus began an endeavor involving thermoplastic, silicone, and sugar.

Hilbert curve cake


  1. H. Sagan, Space-Filling Curves (Springer-Verlag, 1994). ISBN: 9780387942650. doi:10.1007/978-1-4612-0871-6


3D-Printed Tea Bag Holder

When readily available containers do not come in the desired form factor, 3D printing can be quite useful. In this case, I wanted a tea bag holder that fit on a small ledge, allowed the tea bag labels to be read, and allowed the tea bags to be easily removed. Although there are some similar products available commercially that would fit the space, they either seemed a bit flimsy or looked to be a tight fit around the tea bags, which would make them more difficult to remove. Thus, I designed and 3D-printed a modular holder that can be stacked. The holder was printed out of PLA, and the design files are available.

Tea Bag Holder


Color Cycle Survey

A previous post about randomly generating color sets with a minimum perceptual distance addresses the technical aspects of generating sets of colors that are visually distinct for those with normal color vision as well as for those with color vision deficiencies. However, it does not address the aesthetic aspect, which I will start to address here. To create an aesthetically pleasing color cycle, an ordered set of colors for visualizing categorical data, two aspects need to be addressed: the colors that are used and the order in which they are used. While one could take an ontological approach by trying to define a set of rules that make a pleasing color cycle, as is done by I Want Hue [1], such a method is error-prone and substantially biased toward the personal preferences of the drafter of the rules. Instead of using ontologies, an alternative approach that is gaining traction in many fields is to infer a pattern from a large data set using machine learning techniques. This is the approach I wish to pursue.

To this end, I’ve created a Color Cycle Survey. After presenting the user with an introduction, a colorblindness questionnaire, and directions, the primary survey starts. In it, the user is presented with two color sets and is asked to choose the one that is, in the user’s opinion, more aesthetically pleasing. Then, four orderings of the chosen set are displayed, and the user again makes a selection to taste. This basic process is then repeated over and over, with sets of either six, eight, or ten colors. For the choice of color set, each set is presented ordered by hue, since this makes the two sets easier to compare than if they were randomly ordered (or ordered by RGB values). Only two sets are presented, to make for an easier choice. Additionally, a line plot or scatter plot rendering is shown with each color set. For the choice of ordering, four orderings are presented; multiple orderings are easier to compare than sets, since the colors are all the same. I considered asking the user to order the colors to taste, instead of presenting possible orderings, but I decided that while such an approach yields more information per response, it takes much longer and requires more effort, so each user would likely respond many fewer times. Thus, I went with the simpler approach.

If all goes well, I’ll amass a sizable data set from the survey, which will remain available for at least a few months. Once I have data to experiment with, I’ll work out the exact analysis method. In addition to generating an “optimal” color cycle, it would also be interesting to create a model that allows for additional constraints, such as being able to choose the first color or the exact number of colors in the cycle. Once anonymized, I’ll release the survey data under a permissive license, probably CC BY 4.0 (I’m open to suggestions). Any generated color cycles will be released into the public domain via the CC0 public domain dedication.


  1. I Want Hue takes a fairly rudimentary approach to color vision deficiency simulation, which I find lacking; I personally have difficulty differentiating colors in many of the sets it generates, even when its colorblind mode is turned on. It also doesn’t really address the ordering of the color set into a cycle.


Randomly Generating Color Sets with a Minimum Perceptual Distance

Earlier this year, I released a color cycle picker that enforces a minimum perceptual distance between colors, including color vision deficiency simulations, with the goal of creating a better color cycle to replace the “category 10” color palette used by default in Matplotlib and other data visualization packages. While the picker works well for what it was designed for, allowing a user to create a color cycle, it requires user intervention to create color sets or cycles [1]. The basic technique, which performs color vision deficiency simulations [2] for various types of deficiencies, enforces a minimum perceptual difference between the simulated colors in the perceptually uniform CAM02-UCS [3] color space (with each type of deficiency treated separately), and enforces a minimum lightness distance (for grayscale), is still valid for the random generation of color sets; it just needs to be extended to randomly sample the color space.
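
A minimal sketch of that distance test, using the colorspacious library, follows; the thresholds and the exact structure of the checks are illustrative placeholders, not the values used to generate the actual sets:

```python
import itertools

import numpy as np
from colorspacious import cspace_convert

# Placeholder severity and deficiency types for the CVD simulations
CVD_SPACES = [{"name": "sRGB1+CVD", "cvd_type": t, "severity": 100}
              for t in ("deuteranomaly", "protanomaly", "tritanomaly")]

def set_is_distinct(rgb_colors, min_dist=20, min_light_dist=10):
    """Check pairwise CAM02-UCS distances, with and without CVD simulation."""
    variants = [np.asarray(rgb_colors)]
    variants += [np.clip(cspace_convert(rgb_colors, s, "sRGB1"), 0, 1)
                 for s in CVD_SPACES]
    for colors in variants:
        jab = cspace_convert(colors, "sRGB1", "CAM02-UCS")
        for c1, c2 in itertools.combinations(jab, 2):
            if np.linalg.norm(c1 - c2) < min_dist:  # perceptual distance
                return False
            if abs(c1[0] - c2[0]) < min_light_dist:  # lightness (J') only
                return False
    return True
```

Random generation then amounts to drawing candidate colors and rejecting sets that fail a test of this form.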

A few different color sets


  1. A color set doesn’t have a defined order, while a color cycle does. 

  2. G. M. Machado, M. M. Oliveira, and L. A. F. Fernandes, “A Physiologically-based Model for Simulation of Color Vision Deficiency,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 1291–1298, Nov.–Dec. 2009. doi:10.1109/TVCG.2009.113

  3. M. R. Luo and C. Li, “CIECAM02 and Its Recent Developments,” in Advanced Color Image Processing and Analysis, C. Fernandez-Maloigne, Ed. Springer, New York, NY, 2013. doi:10.1007/978-1-4419-6190-7_2

Posted in , | Tagged , , , , , | Leave a comment