By Jason Evans, National Wikimedian at the National Library of Wales
Imagine a world in which anyone could use an open citation database to support free knowledge, with rich information about every citable source.
Any Wikipedian or Wikipedia advocate will tell you that one of the great strengths of Wikipedia is its citations. In fact, a Wikipedia article is only as strong as its citations. They provide evidence for the statements made in an article but they also provide a gateway to reliable secondary sources for deeper learning.
In recent years Wikipedia has been overtaken as the fastest growing Wikimedia project by Wikidata – a linked open database of facts – or the Wikipedia of data, if you like. Wikidata has grown at a tremendous rate, as people and institutions use it as a hub for their data, joining up the world’s open data in an interconnected web. Quite organically, it began to act as a platform for sharing bibliographic and citation data, to the point that 40% of Wikidata’s 60 million items now describe academic papers and articles.
The emergence of Wikidata has led to the growth of the WikiCite movement which aims, broadly speaking, to harness the power of Wikidata to create open structured data for all citations used in Wikipedia.
This was my first WikiCite conference, and what became clear to me from day one was that this is very much a project still exploring its scope and trying to understand its place in the Wikimedia family of projects. But already there is a growing community of librarians, Wikimedians and data scientists keen to explore the potential of the overarching concept.
The potential benefits of WikiCite are varied and wide reaching, and they serve separate communities in different ways. For example, since Wikidata items can be labelled and described in hundreds of languages, any structured citations on Wikipedia become multilingual, which has clear benefits for smaller language communities. Structured citations would also make it much easier for us to analyze the diversity and quality of citations being used in Wikipedia projects. They would allow us to map works which cite other works, or pick out retracted papers, making it easier to manage the relevance and quality of citations across multiple languages.
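To make the benefits described above a little more concrete, here is a minimal sketch in Python of what a structured citation might look like compared with a plain string of text. It is an illustration only and does not reflect how Wikidata or Wikipedia actually store citations: the `CitedWork` class, its field names, the identifiers `W1` and `W2`, and the titles are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CitedWork:
    """Hypothetical stand-in for a structured item describing a source."""
    item_id: str                 # made-up identifier, not a real Wikidata ID
    labels: Dict[str, str]       # language code -> title in that language
    cites: List[str] = field(default_factory=list)  # ids of works this one cites
    retracted: bool = False

    def label(self, lang: str, fallback: str = "en") -> str:
        # Return the title in the requested language, else the fallback
        # language, else the bare identifier.
        return self.labels.get(lang, self.labels.get(fallback, self.item_id))


def retracted_citations(article_refs: List[str], works: Dict[str, CitedWork]) -> List[str]:
    # Pick out the references of an article that point at retracted works.
    return [ref for ref in article_refs if works[ref].retracted]


def cited_by(target: str, works: Dict[str, CitedWork]) -> List[str]:
    # Map which works cite the given work.
    return sorted(w.item_id for w in works.values() if target in w.cites)


# Invented sample data: one work labelled in English and Welsh,
# and one retracted follow-up that cites it.
works = {
    "W1": CitedWork("W1", {"en": "Example paper", "cy": "Papur enghreifftiol"}),
    "W2": CitedWork("W2", {"en": "Follow-up paper"}, cites=["W1"], retracted=True),
}
```

With data in this shape, `works["W1"].label("cy")` gives the Welsh title, `retracted_citations(["W1", "W2"], works)` flags the retracted reference, and `cited_by("W1", works)` lists the works citing it. None of this is possible when a citation is only a string of text.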
Approximately 1% of Wikipedia users click on a citation when they read a Wikipedia article, and this rises to 30% or more for more academic topics such as mathematics and engineering. And whilst these might seem like low numbers, 1% is still around 76 million clicks a month. So structured citations, in a standardised format that links to deeper data about a work (hopefully facilitating access to a digital copy of the work or providing details of physical holdings), will certainly add value to the current system, in which citations are essentially strings of textual information.
Implementing this kind of fundamental change to Wikipedia across multiple language editions presents huge technical and social challenges in itself. It has therefore been proposed that any conversion to structured citations should start small, on smaller Wikidata-friendly language versions of Wikipedia, before tackling English Wikipedia, with its nearly 6 million articles.
However, the WikiCite vision is even bigger and more ambitious.
Imagine Wikidata items for every citation on Wikipedia, and then consider the added value of a massive centralised, or ‘federated’ bibliographic commons, where individuals, institutions and organisations can give access to bibliographic corpora, ranging from collections of niche scientific papers to a country’s entire publishing output – a library catalogue for the sum of all human knowledge. That may sound implausible, but Wikipedia didn’t become the 5th largest website in the world by dreaming small.
As you can imagine, this larger ambition has a few potential issues, which is why it is currently referred to as ‘the moonshot option’. There are questions around the technical ability to host, manage and maintain all this data in a standardised and centralised way. And if you decentralise the data to multiple instances of Wikibase (the platform which powers Wikidata), then how do you ensure that all these databases retain the semantic structure required for consistent and seamless communication between instances?
Another important question which came out of this conference is: how do we ensure that any development is inclusive of other languages and cultures? Done properly, this initiative should make it possible to have a greater diversity of sources on Wikipedia. For years, the use of Western sources to inform readers about non-Western concepts, languages and societies has been a bugbear for Wikipedia.
In Wales, we have already embarked on a project to share the ‘Sum of all Welsh Literature’ via Wikidata, in a bid to encourage the use of Welsh publications to cite articles about Wales, its people and culture. And we heard of similar projects getting under way in other parts of the world. In Sweden, for example, the local Wikimedia chapter are working with the National Library to openly share data for around 700,000 works from the Swedish Bibliography.
Many challenges lie ahead, but it’s clear from the diversity of people and projects at this conference that WikiCite is very much already happening.