This post was written by Kimberly Kowal of the British Library and was originally published here. Reused with kind permission.
Without looking, you can’t know what’s there. That was our experience when locating maps among the one million British Library images released into the public domain. We had not guessed that 50,000 images of maps were lurking there. So how were they singled out?
Answer: with the help of our friends (the crowd!) using several methods.
A dedicated team of volunteers looked at individual images and applied the tag “map” on Flickr. The work was organised using a synoptic index on Wikimedia Commons, which provided a systematic way of going through each volume and tracking shared progress. Over 29,000 map images were identified in this way.
To kick-start the effort, the British Library and Wikimedia UK hosted a one-day event for volunteers. Between working sessions, the 30 participants enjoyed tours and talks from speakers representing online mapping efforts, including OpenStreetMap and Stroly. The day’s activities were captured in Gregory Marler’s engaging write-up, Lost in Piles of Maps, and in a series of photographs from ATR Creative.
Ongoing crowd activity
The bulk of the work took place online over the next two months. Guided and coordinated by wiki tools built by J.heald, 51 volunteers worked through the collection book by book, often focusing on geographic areas that interested them. Together, they made short work of a huge task: 28% of the books were completed within the first 72 hours, 60% had been reviewed within the first 20 days, and after five weeks over 20k new maps had been found in 93% of the source volumes.
But surely maps can be identified automatically? To some extent, yes: well before the organised effort just described, one user had produced algorithm-guided tags for this image set, adding well over 15k map tags.
By the end of December 2014, every image in every book had been reviewed, and between the manual and automatic tagging, over 50k maps had been found. Since then, we have been cleaning up the data (reviewing rogue tags, rotating images, splitting maps and removing duplicates) to produce a final dataset. Next step: georeferencing.