An Efficient Method of Serving Many Small Files

Certain types of web content, such as map tiles and image pyramids1, require serving many small files, but this can often be inconvenient. Transferring a large number of small files over SCP or SFTP is very slow, and managing these files on disk can also be unpleasant. The transfer problem can be solved by adding the files to a tarball before transferring them, transferring the tarball, and untarring them once transfered. However, this is an extra step, and it doesn’t address the file management problem. Mapbox devised a solution for this problem for map tiles with their MBTiles format. This container format stores map tiles in a SQLite database, and a server implementation is then used to serve tiles directly from the container. This cleanly solves the aforementioned transfer and storage problems, but it is not general purpose and only works for map tiles. This specialization allows for additional optimizations such as deduplicating identical map tiles, but it means it can’t be used for storing image pyramids or other uses.

The use of a SQLite database container format can also work as a general purpose solution, provided a general purpose database key is used. Thus, I propose a general purpose “FilesDB” format as a generic solution. This format consists of an SQLite database containing a files table, which in turn contains a filename column of type text and a data column of type blob. A directory of files is stored in the format by storing each files in the directory in the database with the file’s path relative to the the base directory2 as the filename and the file’s contents as the data. As a proof of concept, I wrote a Python script for generating a .filesdb container and a rudimentary server for serving the files from the format in Go. These, along with a basic specification document, are available from a repository on GitHub.

  1. Used by Pannellum’s multires format, for example. 

  2. Using Unix directory separators (/

This entry was posted in and tagged , , , . Bookmark the permalink.

One Response to An Efficient Method of Serving Many Small Files

  1. Brian says:

    The other issue that you run into when transferring many small files via a tarball is the code page for text files. Tarballs are the equivalent of FTPing text files in binary mode. Nothing good can come of it.

Leave a Reply

Your email address will not be published. Required fields are marked *