A bit over a year and a half ago, I wrote Nugacious, which provides random quantity comparisons. However, I’ve found many of the comparisons to be a bit too random, involving things I had never heard of. I finally got around to mitigating this by weighting potential comparisons by popularity.

The quantity data Nugacious uses comes from DBpedia, which is extracted from Wikipedia. Since each data point is linked to a Wikipedia page and Wikipedia keeps page view statistics, a popularity can be inferred for each data point. I integrated this data by downloading three months of Wikipedia page view statistics,[1] extracting the view counts, and associating a view count with each data point. Nugacious’ matching code was then modified to weight its selection of close matches and random matches by these counts; one non-weighted random match is still returned for each.

Nugacious’ code is available on GitHub.
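
To make the idea concrete, here is a minimal sketch of the two steps: summing view counts per page, then picking a comparison with probability proportional to its count. This is not Nugacious’ actual code. The function names are hypothetical, the input is assumed to be whitespace-separated lines of the form `project title views bytes` (my reading of the hourly page view dump format), and adding 1 to each weight is my own choice so that pages with no recorded views can still be picked.

```python
import random
from collections import Counter


def sum_view_counts(lines, project="en"):
    """Sum views per page title from lines shaped like 'project title views bytes'.

    The line format is an assumption; adjust the parsing to the real dump files.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue  # skip malformed lines and other language editions
        try:
            counts[parts[1]] += int(parts[2])
        except ValueError:
            continue  # skip lines whose view field is not an integer
    return counts


def pick_weighted(candidates, view_counts, rng=random):
    """Pick one candidate with probability proportional to (views + 1)."""
    weights = [view_counts.get(title, 0) + 1 for title in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical sample data, for illustration only.
sample = [
    "en Eiffel_Tower 900 12345",
    "en Eiffel_Tower 100 2345",
    "en Obscure_Hill 3 456",
    "de Eiffelturm 50 999",
]
counts = sum_view_counts(sample)
print(counts["Eiffel_Tower"])  # 1000
print(pick_weighted(["Eiffel_Tower", "Obscure_Hill"], counts))
```

With these sample counts, the well-known page is chosen far more often than the obscure one, but the obscure one is never ruled out entirely.
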
[1] This was ~190 GB.