The first Wikimedia + Education conference

Participants in the Wikimedia Education conference – image by Jon Urbe-Foku CC BY-SA 4.0

By Jason Evans, National Wikimedian for Wales.

In April 2019 the Basque Wikimedians User Group hosted the first Wikimedia + Education conference in Donostia. Using Wikipedia and other Wikimedia projects in education is nothing new. There is a vibrant and well-established community already engaged in a diverse range of projects, from Wiki Clubs in primary schools to accredited Wikipedia-based modules in universities. It’s hard to believe, then, that this is the first official global gathering of Wikimedians and educators involved in this work.

In its early days Wikipedia was shunned by educators – pooh-poohed as inconsistent and unreliable. But Wikipedia has long established itself as the go-to source of information for millions of people in hundreds of languages, especially our young digital natives. In Wales, for example, each time we ask school children, between 80% and 90% say they often use Wikipedia to help with their school work.

Gradually, then, educators are beginning to realise that rather than ignore the gigantic free encyclopaedia in the room, they, with their students, can actually benefit from contributing to Wikipedia. This allows them to teach a whole raft of skills, such as research, digital literacy, collaboration and critical thinking. Contributing to Wikipedia also allows young people to feel like they are contributing something to society. Wikipedia gives their school work real-world value. Rather than writing an essay, which is simply marked and filed away in a drawer, students can make a lasting contribution to collective knowledge in their language, which is accessible by anyone, anywhere in the world.

One thing that struck me at this conference was the diversity of participants. Education in Wikipedia is definitely not an English or even Western-centric concept. Often education project facilitators, be they Wikimedia chapters and groups, universities, cultural institutions or even local governments, are motivated by the desire to increase content in a given language, and to increase the use of that language in the classroom. Activities in the Basque Country and Catalonia, areas keen to protect and promote their own unique language and cultural identity, are good examples.

There was a wide range of activities presented at the conference, with Robin Owain flying the flag for Wales with a presentation on progress at home. He also presented a fantastic video produced by Aaron Morris, Wikipedian in Residence with Menter Mon, highlighting his recent work with Welsh primary schools.

University-level education activities were well represented at the conference, with Wikipedia-based assignments proving increasingly popular in universities around the world. In the UK, Edinburgh University has led the way with this work, and in Ireland Maynooth University has found that Wikipedia contribution is hugely popular amongst students. The practice is also well established in North America and many European countries.

Some universities drive participation with the help of a Wikipedian in Residence, or through training librarians. In Serbia, universities have appointed Wiki ambassadors, and the Catalans have a group of dedicated volunteers who coordinate projects within universities.

In Wales, higher education has been slow to embrace Wikipedia as a teaching tool. Individual lecturers at Swansea and Aberystwyth Universities have begun to explore the possibilities, but there is definitely great potential for more engagement. Where we have had increasing success in Wales is in secondary schools, thanks to the work of Aaron Morris. A number of schools are now teaching students digital competencies and Welsh language skills through Wikipedia editing as part of the Welsh Baccalaureate. Schools have even started forming Wiki Clubs, and primary schools are also teaching their children about Wikipedia.

Presentations from educators in Argentina, Armenia, France, Catalonia, the Basque Country and others show that Wales is not alone in engaging younger children with Wikipedia editing. Katherine Maher, in her keynote, pointed out that many of the movement’s most valuable contributors today began editing when they were thirteen or even younger. I tweeted her quote and had replies from editors who were as young as 8 years old when they made their first Wikipedia edit.

For many teachers, Wikipedia is merely a vehicle for effectively teaching a range of skills, which they would need to teach regardless. But for the Wikimedia movement and those in local governments with a mandate for supporting the growth of a language or culture, teaching Wikipedia can be seen as a long term investment in young people – instilling the notion that they can play an active role in the future of their language by contributing information rather than simply consuming it. This also helps build up the digital presence of a language which is essential for further investment in online infrastructure, by the likes of Microsoft and Google.

Armenian Wiki-clubs have been hugely successful, with more than 30 clubs active around the country. Each club has its own trained coordinators, and children contribute content they consider interesting, such as cartoons, films and music. Clubs also allow children to contribute through simple tasks to Wiktionary, Commons and other Wikimedia projects – which is a great way of lowering the barriers to entry. Club coordinators are responsible for checking the quality of all student contributions. With only a small community of editors on the Welsh Wikipedia, the ability to manage and correct large amounts of new content from younger people would definitely need consideration here. Training new trainers, be they teachers or community leaders, and producing more documentation and guidelines would be essential in replicating any such project at scale.

Participants at the Wikimedia Education Conference 2019 – image by Maialen Andres-Foku CC BY-SA 4.0

LiAnna Davis of the Wiki Education Foundation raised another issue which deserves consideration. Increasingly our young people are consuming knowledge through video rather than through reading. This might not be a bad thing, but it’s not something Wikipedia is very good at, especially in smaller languages. Should we be considering the creation of open video content as part of educational projects? At the very least this would complement and add value to a Wikipedia article, so I think it’s definitely something to consider.

From a Welsh perspective the approach of the Basque community is probably the most inspiring and the most relevant to our ambitions for the Welsh language Wikipedia. There are actually at least 10 active education programmes in the Basque Country. Some are small but valuable programmes, such as the work by Mondragon University to rewrite the lead of Wikipedia articles related to citizenship based on the perceptions of school children – an exercise they call ‘Politics through Participation’. There is the Txikipedia children’s Wikipedia project – the only children’s Wikipedia in the world to sit within a language’s main Wikipedia – as well as an interesting community project aimed at encouraging locals to write about their local area. However, it’s the work of creating content in universities and secondary schools which aligns best with our ambition to raise the standard of the Welsh Wikipedia for all, including the high percentage of young people who use it to find information for school work.

University students in the Basque country editing Wikipedia – image by Xabier Cañas CC BY-SA 4.0

Basque Wikimedians took the school syllabus for primary and secondary schools and, with the help of subject specialists, used it to build a list of over 1800 articles which were vital for children’s education. They then partnered with Basque language universities to develop a program for students to create content relevant to secondary school pupils, and they partnered with secondary schools to write content relevant to primary schools.

In Wales the government has a long term strategy to grow the number of Welsh speakers to 1 million by 2050, and Welsh Wikipedia (along with Wikidata) is now officially recognised as an important part of that strategy. Implementing a strategy similar to the Basque Country would help the government achieve targets around digital competencies and the Welsh language in schools, whilst at the same time educating children about the topics being added to Wikipedia, and the output of this work becomes part of the open knowledge ecosystem in the Welsh language, where all Welsh speakers stand to benefit.

As we look to build on recent success in Wales, this conference has provided valuable insight into the incredible work already happening all around the world, from the perspective of educators and Wikipedians.

See more images from the Wikimedia Education conference 2019 on Wikimedia Commons.

A call to action: Wikidata-fy your Commons photos

By Martin Poulter, Wikimedian in Residence at the Bodleian Libraries, Oxford

The speed at which Wikidata is acquiring descriptions of paintings, sculptures and other museum holdings is impressive, but there is much further to go. It’s ironic that at the same time, we already have an enormous art database hiding in plain sight.

The Commons index of Indian art may be the largest digital collection of Indian art ever created. Its collection of Cities of France in art may be the biggest such index in the world. No institution has collected as many pictures of astrolabes as are in the Commons astrolabe category. The superlatives go on.

The problem is that what we have on Commons isn’t yet structured data: it’s not possible to get all images matching a chosen set of criteria; just the criteria that are pre-baked into the category system. Meanwhile, there are items from cultural institutions which are described in Wikidata and also have an image in Commons, but the two are not yet linked up. It’s crazy but it’s just the result of the order in which our platforms developed.

Structured data is coming to Commons in the longer term, but there are things we can all start right now. We can use the existing structured database – Wikidata – to improve the content and findability of photos on Commons.

The secret weapon is the Art Photo template:

  • It imports data about the artwork from Wikidata, just needing the relevant Wikidata identifier (a Q followed by some digits)
  • It keeps distinct the properties of the photograph and of the object in the photograph.
  • It has fields for photo date and photographer, distinct from the date and authorship of the item. If you take a photograph in 2019 of a statue from the 9th century, we want to avoid the ambiguity of a single date field.

The resulting entries are more multilingual, more detailed, have links to further information, and will automatically draw updates from Wikidata. Unit conversions (e.g. between inches and centimeters) are done automatically. In other words, we take some dull work from human beings and make the computer do it instead. All this while decreasing the amount of wiki-code on Commons!

Before:

Image CC BY-SA 4.0 Wikimedia Commons

After:

Image CC BY-SA 4.0 Wikimedia Commons.

Before:

Image CC BY-SA 4.0 Wikimedia Commons.

After:

Image CC BY-SA 4.0 Wikimedia Commons.

Before:

Image CC BY-SA 4.0 Wikimedia Commons.

After:

Image CC BY-SA 4.0 Wikimedia Commons.

If you’ve photographed a museum exhibit or piece of public art and shared the image on Commons, or if you’re interested in art works of a certain kind, I urge you to take a look through those photos and to search Wikidata to see if it describes what you photographed.

If the item does not yet exist on Wikidata, it’s surprisingly easy to add it. Take the Wikidata tours to learn about the interface.

What are WikiJournals?

This article was jointly authored by Thomas Shafee and Jack Nunn from the WikiJournals board, and edited by John Lubbock of Wikimedia UK.

The WikiJournals are a new group of peer-reviewed, open-access academic journals which are free to publish in. The twist is that articles published in them are integrated into Wikipedia. At the moment, there are three: WikiJournal of Medicine, WikiJournal of Science and WikiJournal of Humanities.

WikiJournals are also highly unusual for academic journals, as they’re free for both readers and authors!

What WikiJournals hope to achieve:

The aim of these journals is to generate new, high-quality peer-reviewed articles, which can form part of Wikipedia. As well as new articles, submissions can include existing Wikipedia pages, which are then subjected to the exact same rigour as any other submission.

The hope is that this new way of publishing peer-reviewed content will encourage academics, researchers, students and other experts to get involved in the process of creating and reviewing high-quality content for the Wikimedia project. It also allows participants a way of putting their contributions on their CV with an easily definable output (including DOI links and listing in indexes like Google Scholar).

When an article gets through the peer review process, there are two copies. The Journal copy can now be reliably cited and stays the same as a ‘version of record’ alongside the public reviewer comments. The Wikipedia version is free to evolve in the normal Wikipedia way as people update it over time, and is linked to the Journal article.

Since 2014, articles have been published on massive topics like Radiocarbon Dating and niche topics like Æthelflæd. They’ve also published meta analyses, original research, case studies, teaching material, diagrams and galleries!

 

Some submissions are written from scratch. Others are adapted from existing Wikipedia material. The journal editors invite academic peer reviewers to publicly comment. If published, suitable material is integrated back into Wikipedia to improve the encyclopedia.

How to get involved!

If this sounds like the sort of thing that you’d like to get involved in, support, or just spread the word on, there’s plenty of ways to contribute!

School projects

So here’s an example for a teacher. You have a class of 30 keen students who would normally all write an essay on a subject, have it read once, and then never see it again. An alternative could be to have students in groups of 5 each choose a section of a neglected Wikipedia article to update and overhaul (there are millions of stub and start class articles to choose from). Each group writes a section of the article, then proofreads the other groups’ sections (WikiEdu has a great dashboard for this). Once the article is up to scratch, it’s submitted to the relevant WikiJournal, which reaches out to experts in the topic to give in-depth feedback on what can be improved. If you and your students are able to fully address those comments then the article can be published, and you and your students have just generated a new Wikipedia article read by thousands, and an academic article to put on their CVs!

Teachers who would consider using this method as an assessed class exercise can ask for advice from Wikimedia UK. We think that this workflow offers a useful alternative to simply having students write parts of Wikipedia articles in class, which may be harder to assess, and doesn’t provide a final product as tangible as a published journal article.

Academic outreach

The current priorities for the WikiJournals are to expand and improve representation on their editorial boards, and to invite article submissions. If you would like to volunteer in these roles, we encourage you to talk to the WikiJournal organisers.

If you are based in a UK academic institution on a course that overlaps strongly with Wikimedia UK’s strategic priorities, you can also email education@wikimedia.org.uk to talk to us about providing advice on using WikiJournals as part of your course.

Individuals

The journals always welcome new submissions. Whether they’re written by a professor or a student, all go through the same process. You could get a team together to submit a brand new article. Or maybe you could overhaul and submit an existing Wikipedia page. You could even help translate an existing article.

They have a public discussion forum (typical for a wiki, unusual for a journal!) where you can share ideas for improvements, other projects they could reach out to or point out gaps in Wikipedia’s content where they could invite researchers to write an article.

Each journal has a Twitter and Facebook account (@WikiJMed, @WikiJSci and @WikiJHum) so feel free to chat with them there. You can even suggest social media posts or accounts to follow. Not into social media? Maybe put a poster in your university tearoom.

Reviewing Draft Articles and the Battle Against Misplaced Templates

This article is by Wikipedia administrator User:TheSandDoctor

Early this past December, I was reviewing article submissions on Wikipedia and noticed that some included templates (pages created to be included in other pages) that were inappropriate for pages that are not yet articles. This got me thinking: how widespread is this misconception? A search found over 500 affected drafts, roughly one percent of the 42,939 drafts. While this is indeed a small percentage, that is still over 500 drafts which may result in confused new editors.

42,939 drafts present in the Draft namespace as of 30 December 2018. Generated using Quarry Beta (report link). Photo: TheSandDoctor.

The English Wikipedia consists of 32 namespaces, different sets of Wikipedia pages whose names begin with a particular reserved word recognized by the MediaWiki software. While it would be cumbersome to list them all here, it is worth mentioning the Draft, Main/Article (“mainspace”), and Wikipedia namespaces. The draft namespace is somewhat special as, unlike the others, it is not indexed by most search engines, including Google. This allows it to be a place where editors can develop article drafts that are not yet ready for indexing, may not yet have demonstrated adequate notability, or are notable works in progress. When a draft is deemed ready by an editor with sufficient user rights and experience, it can then be moved to the “main” article namespace, where pages most readers are familiar with are located. Another path by which a draft may make its way to the mainspace is through Articles for Creation (“AfC”), a peer review process in which experienced registered editors can either help create an article submitted by an anonymous editor or decline the article because it is unsuitable for Wikipedia.

An article may be unsuitable for Wikipedia for a number of reasons. AfC submissions can be declined based on having insufficient content, consisting of vandalism or personal attacks, posting copyrighted material, not asserting notability, and most often for not being properly sourced. We also do not accept new articles where a page on the topic already exists, even if under another name.

Articles for Creation welcome page

Articles for Creation is consistently backlogged with sometimes more than 2,000 drafts awaiting review. Any editor whose Wikipedia account is at least 90 days old, has made 500 edits to articles, has good understanding of the various notability guidelines, and agrees to review solely on a volunteer basis may apply to become a reviewer. It is important to note that despite the constant backlog, submissions must be carefully reviewed for whether or not they meet the criteria. However, in the case of those which meet any of the quick-fail criteria, the total time investment is much lower.

In order to increase the number of submissions reviewed while maintaining review quality, the number of active reviewers must increase. If you are interested in helping out and have had an account for more than 90 days with over 500 total edits, you can find out more and how to apply here.

(Bottom) An example of an unreferenced template placed on a draft.

By their very nature, the pages within the draft namespace are designed to be separate. They are not permitted to be active members of categories, nor are they permitted to be linked to from existing articles, as doing so would defeat the purpose of the draft namespace: a workshop or incubator of sorts for new articles. Templates such as “Underlinked” or “Orphan” are designed for articles, not drafts, so their presence on a draft could confuse new contributors about the draft’s purpose and give them incorrect information.

“Underlinked” refers to having too few incoming links from existing articles while “Orphan”, in the Wikipedia context, refers to having no incoming links from articles at all, in essence being orphaned from the rest of the encyclopedia. In articles, both of these are issues that could affect a page’s visibility and discoverability, but for drafts that is the entire point. They are not ready for the spotlight that is indexing; if they were, they should not be in the namespace to begin with or should have been moved out of it already.

I have worked on three bots so far, each with vastly different purposes and two of the three performing multiple tasks. Unlike the others, TheSandBot is the most multipurpose. As was previously written about for Wikimedia UK, the bot’s first task was a temporary one moving articles and other pages.

With this in mind and armed with the statistics (one percent is still far too high), I got to work on the second task for TheSandBot. The sole purpose of this task is to look for templates which should not be in drafts and remove them if present. So far, the bot only looks for the following four templates and any of their aliases (too many to list), but more could be added with minimal effort.

{{orphan}}
{{uncategorized}}
{{underlinked}}
{{unreferenced}}
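
The removal step described above could be sketched roughly as follows. This is not TheSandBot's actual code, which the post does not show: the function name and the regular expression are illustrative assumptions, only the four template names listed above are handled (no aliases), and templates whose parameters contain nested templates are deliberately left alone.

```python
import re

# The four template names listed in the post. Aliases are omitted here
# because the post does not enumerate them.
ARTICLE_ONLY_TEMPLATES = ("orphan", "uncategorized", "underlinked", "unreferenced")

# Matches e.g. {{orphan}} or {{Unreferenced|date=December 2018}}, plus one
# trailing newline. Parameters containing braces are not matched (assumption:
# simple, un-nested template calls only).
_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:" + "|".join(ARTICLE_ONLY_TEMPLATES) + r")\s*(?:\|[^{}]*)?\}\}[ \t]*\n?",
    re.IGNORECASE,
)


def strip_article_only_templates(wikitext):
    """Return the draft wikitext with the article-only templates removed."""
    return _TEMPLATE_RE.sub("", wikitext)


draft = (
    "{{Orphan|date=December 2018}}\n"
    "{{unreferenced}}\n"
    "'''Foo''' is a thing.{{citation needed}}"
)
cleaned = strip_article_only_templates(draft)
```

In this sketch other templates, such as {{citation needed}} or {{unreferenced section}}, are untouched because only exact template names are matched.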

As of 17 December 2018, the task has been approved and is now set as an automatic cron job, running daily at 03:00 UTC (3am UTC, 7pm Pacific, 10pm EST). This task is different than all the rest that I have worked on. Whereas my others, while lacking a fixed start or end date, were temporary, this is my first task without an end date at all. So long as there is a need and this task is not shut off, it will run at that time for the rest of time.
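
The time conversion quoted above can be checked with a short sketch. It assumes fixed winter offsets (UTC-8 for Pacific, UTC-5 for Eastern), which hold in December; during daylight saving time the local times would each be an hour later. The function and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (assumption: no daylight saving in effect).
PACIFIC_STANDARD = timezone(timedelta(hours=-8))
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def local_run_times(run_utc):
    """Convert a UTC run time to Pacific and Eastern standard time."""
    return {
        "Pacific": run_utc.astimezone(PACIFIC_STANDARD),
        "Eastern": run_utc.astimezone(EASTERN_STANDARD),
    }


# 03:00 UTC on the approval date mentioned in the post.
run = datetime(2018, 12, 17, 3, 0, tzinfo=timezone.utc)
times = local_run_times(run)
```

Note that both local times fall on the previous calendar day: the 03:00 UTC run on 17 December is the evening of 16 December in North America.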

I am not too sure where my bot work will take me next, but I am definitely excited for the future possibilities. Maybe I will finally be able to take over the Good Article review clerking duties from Legobot, which the current operator wishes to partially retire. Either that or I might find something else that needs fixing. Something always needs fixing on a project the size of Wikipedia. One thing I do know for sure though is that, at over 270,000 combined edits, my bots are closing in on having performed 300,000. This will most likely happen later this year.

_______________________

If you’re a developer working on the technical side of Wikimedia projects, there is a community of developers in the UK you can get help and advice from. Wikimedia UK will be running Wikidata meetup events every couple of months in London, and the best way to find out how to get involved in improving the Wikimedia projects is by talking to other developers. We also encourage Wikimedians to write for our blog about their work to encourage others to get involved. So get in touch if you have ideas!

 

How Wikipedia infiltrated academia

Martin Poulter, Wikimedian in Residence at the Bodleian Libraries Oxford – image by Jwslubbock CC BY-SA 4.0

By John Lubbock, Communications Coordinator of Wikimedia UK

It was about 2007 when Wikipedia hit the mainstream. Millions of students were using one website to get an introduction to their new course subjects, and many, of course, were not particularly careful about their use of copy and paste.

In a way, Wikipedia was the victim of its own success. It expanded rapidly and gained a place in the public consciousness before the community and organisations that support it had a chance to catch up. Wikipedia is supported by a network of charities, with the Wikimedia Foundation – which owns Wikipedia – based in San Francisco, and local Wikimedia chapters set up in other countries where big editing communities existed and had begun to organise themselves. Wikimedia UK wasn’t formally established until 2009.

By that time, many in academia had already formed a bad opinion of Wikipedia, with the perception of it as unreliable and lacking academic rigour resulting in it being discounted as a useful tool. Underneath this, some educators saw it as a competitor: what would be the use in professors if all the facts were freely available?

But by 2016, journal articles were being published which looked at the remaining obstacles to using Wikipedia in academia, with writer Piotr Konieczny noting that:

“Wikipedia is not our foe but rather an ally—a new and, perhaps, somewhat uncouth ally—but an ally nonetheless, and one that I will argue that educators should embrace more wholeheartedly for the good of our students and the wider society.”

When the UK Wikimedia chapter was founded in 2009, its community had to consider what the local organisation should do; what was its purpose? There are a number of English-speaking countries with local Wikimedia chapters, and with 3 million articles the English language Wikipedia was already the biggest of the nearly 260 language versions of the encyclopaedia, so the charity instead looked to form partnerships with the UK’s world renowned cultural institutions.

To do this, an entirely new kind of role was created, called a ‘Wikimedian in Residence’. In 2010, Wikimedia contributor Liam Wyatt began the first ever residency by organising a volunteer placement as Wikimedian in Residence at the British Museum. In 2011, the University of Bristol hosted a Wikimedia ambassador, and the British Library followed suit with an appointment from 2012-2013.

Since then the charity has collaborated with a very wide range of institutions including Bodleian Libraries, British Library, Cancer Research UK, National Library of Scotland, National Library of Wales, Royal Society, York Museums Trust, Wellcome Library, and the University of Edinburgh. Many other universities and cultural sector organisations have run smaller-scale projects and events supported by our charity and the volunteer community in the UK.

Wikimedians in Residence are often tasked with training staff members at their host institution to use Wikipedia (and its sister projects like Wikidata) as part of their work as an academic, curator, librarian or researcher. With time and increased exposure, the cynicism towards Wikipedia has turned into a realisation of its importance as a communication tool read by hundreds of millions of people every month.

Wikipedia monthly statistics (English version only) from Wikimedia Foundation stats page.

In a recent report on the PLOS science and medicine blog, Rice University’s Kaden Hazzard noted that,

“Wikipedia pages on physics have a huge impact. The numbers speak for themselves. The page “Quantum computing” is viewed in excess of 3,000 times every day. “Nanotechnology” is viewed in excess of 2,000 times per day. Even a topic like “Monte Carlo method” is viewed 2,000 times per day. I could teach every semester for my entire lifetime and not reach as many students as these Wikipedia pages reach in a single day.”

Wellcome Library Wikimedian in Residence Alice White training secondary school pupils to edit Wikipedia at Imperial College London in 2018 – image by Jwslubbock CC BY-SA 4.0

Educators are coming to realise two things: firstly, that there is no point ignoring Wikipedia, and secondly that used correctly it can complement traditional study methods in useful and new ways.

There are now lots of case studies of universities using Wikipedia in and out of the classroom, such as a professor who got his students to improve pages on Islamic art at the University of Texas at Austin, the Wikipedia editing club at Dundee dentistry school, WikiEdu’s Classroom Program, the Medieval History MA at Sheffield University, the Welsh Baccalaureate including editing of Welsh Wikipedia, and the Women’s Classical Committee Wikipedia workshops, to name just a few.

Back in 2013, the Guardian ran a piece entitled ‘Should university students use Wikipedia?’ Use in what way, I would ask? The article demonstrates a number of misconceptions about Wikipedia that still exist to some extent, and conflates ‘using’ Wikipedia with citing it, but does still show that attitudes were slowly changing.

Wikipedia is a summary of secondary sources, and not what we would call an ‘academic level’ source. We don’t want people to cite it, but that doesn’t mean they shouldn’t use it. Abstinence-only Wikipedia education doesn’t work, so the important thing is to get them to use it in the right way, as a starting point for research which can teach important writing, IT, and critical thinking skills.

The important question for us as the Wikimedia community in the UK now is how to encourage course convenors at universities to use the Wikimedia projects as part of their courses. Educators like Carl Gombrich, who runs the BASc Arts and Sciences combined degree at UCL, and Stefan Lutschinger, who lectures in Digital Publishing at the University of Middlesex, have both been using Wikimedia projects to help teach students digital skills, and we hope to see many more universities following this trend.

Wikipedia is not just a free encyclopaedia, it’s about free content that anybody can reuse, remix, and consume in any way they want to. Creating openly licensed materials that can help people educate themselves for free, and which artists, journalists, and others can use in their work is vital to keep creativity flourishing. In this, Wikimedia UK has the same aims as universities, and we provide valuable resources that they can use to educate their students.

We haven’t debased education, we’ve democratised it. Perhaps there are some universities who revel in their status as ivory towers, and will never work with us. But I don’t think that’s the majority of them, and as they are increasingly staffed by digital natives, I think that the future is promising for an ever deeper partnership between Wikimedia and education.

New Year Wikimedia ideas from Magnus Manske

Magnus Manske presenting at Wikidata Lab VIII 2018 – image by Mike Peel CC BY-SA 4.0

By Magnus Manske, MediaWiki developer and longtime Wikimedian

The new year is just over two weeks old, but the WikiVerse already celebrated a joyous event: Wikipedia’s 18th birthday! 50 million seems to be the number of the day – 50M articles across Wikipedia editions, 50M files on Commons, 50M items on Wikidata. But all this free content does not appear in a few big strokes, it comes from millions of uploads and edits. Your work, our work, build this vast repository of knowledge, one small action at a time. I would like to use this opportunity to share with you a few of those actions I have been involved with in these first few days of the year.

My hope is to inspire you to look at new areas of our project, to take a leap and follow an interesting tangent, but above all, to remember that every edit, every cited reference, every vandalism revert adds to the Sum Of All Knowledge, and that it will be valuable to someone, some day, some place.

Images

An image from the Stadtarchiv Linz am Rhein – image CC BY-SA 4.0

Sometimes, knowledge is already present in our project, it is just cleverly hidden, and begs to be released. For example, Wikidata has many items about people, some of them with an image. Wikidata also has items about paintings, and some of these have an image as well, but they might not have a “depicts” statement.

But if the image of the painting is the same one used for a person, it is likely (though not guaranteed!) that the person depicted in the painting is that person. A simple SPARQL query shows us about a thousand such item pairs. And even if the image is not of the person (for example, sometimes a painting by a painter sneaks into the item as a painting of the painter), it can be an opportunity to remove a wrong image from the item about the person.
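
A query along these lines might look like the sketch below. It is not the author's query, which is not reproduced in this post; it is one plausible formulation using the Wikidata properties P18 (image), P31 (instance of) and P180 (depicts), and the items Q5 (human) and Q3305213 (painting). The helper name and the LIMIT are illustrative assumptions, and an unrestricted version of this query may well time out on the public query service.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Pairs of (person, painting) items that share the same image file, where the
# painting does not yet have a "depicts" statement for that person.
SHARED_IMAGE_QUERY = """
SELECT ?person ?painting ?image WHERE {
  ?person   wdt:P31 wd:Q5 ;          # instance of: human
            wdt:P18 ?image .         # image
  ?painting wdt:P31 wd:Q3305213 ;    # instance of: painting
            wdt:P18 ?image .         # the same image
  FILTER NOT EXISTS { ?painting wdt:P180 ?person }
}
LIMIT 100
"""


def build_query_url(query=SHARED_IMAGE_QUERY):
    """Build a GET URL for the Wikidata Query Service asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = build_query_url()
```

The URL can be opened in a browser or fetched with any HTTP client; each result row is a candidate pair that still needs a human check before a “depicts” statement is added.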

Similarly, over 1700 Wikidata items use an image of a church, but another “church item” uses it as well, often revealing either a wrong image use, or a duplicate item.
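The shared-image check described above can be sketched in a few lines of Python. This is only an illustration over a small in-memory list, not the SPARQL query mentioned earlier; the item IDs, file names, type labels and both helper functions are made up for the example.

```python
from collections import defaultdict


def find_shared_images(items):
    """Group items by image file and keep the images used by more than one item.

    `items` is an iterable of (item_id, item_type, image_name) tuples.
    """
    by_image = defaultdict(list)
    for item_id, item_type, image_name in items:
        by_image[image_name].append((item_id, item_type))
    return {image: users for image, users in by_image.items() if len(users) > 1}


def person_painting_pairs(items):
    """Return (person_id, painting_id, image) candidates for a "depicts" statement."""
    pairs = []
    for image, users in find_shared_images(items).items():
        people = [item_id for item_id, item_type in users if item_type == "person"]
        paintings = [item_id for item_id, item_type in users if item_type == "painting"]
        pairs.extend((person, painting, image) for person in people for painting in paintings)
    return pairs


# Hypothetical sample data, not real Wikidata items.
SAMPLE_ITEMS = [
    ("Q-person-1", "person", "Portrait_A.jpg"),
    ("Q-painting-1", "painting", "Portrait_A.jpg"),
    ("Q-person-2", "person", "Photo_B.jpg"),
    ("Q-church-1", "church", "Church_C.jpg"),
    ("Q-church-2", "church", "Church_C.jpg"),
]
```

Every pair found this way is only a candidate: as noted above, a shared image may equally point to a wrong image or a duplicate item, so each one still needs a human check.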

User:Christoph_Braun has used my Flickr2Commons tool to upload over a thousand historical images released under a free license by the Linz am Rhein city archive to Commons. You can help put these pictures to good use, by finding Wikidata items (and by extension, often Wikipedia pages) without an image, by coordinates or by category. If you want to add free images to Wikidata items, but don’t want to go hunting for them, the FileCandidates tool has hundreds of thousands of prepared possible image-to-item matches waiting for you. And if you would like to add more “depicts” statements to items, topicMatcher is there for you (also offering “main subject” and “named after”).

Mix’n’Match

Mix’n’Match is one of my more popular tools, especially with the authority control data fans. It has passed 50 million entries recently, most of which are waiting to be matched to a Wikidata item. To cut down on the number of entries that need the “human touch” to be matched, I have various helper scripts running in the background to automatically match entries to items, when it is reasonably safe to do so.

One of these helper scripts uses the name, birth, and death year for biographical entries to find a match on Wikidata. Since entries are imported from many different sources, getting metadata (such as birth/death dates) for an entry is not standardized. I had already written bespoke code to extract such dates from the entry descriptions for several catalogs, but this year I sat down and systematically checked all ~2000 catalogs for date information in their entries, and extracted it where possible. The fact that one finds dates ranging from plain years, through ISO format, to free-text French requires individual code for every single catalog with dates. This is now complete, as of a few days ago. Initial runs led to over ten thousand new matches with Wikidata. Of course, all those matches are turned into Wikidata statements as well, where the catalog has an associated property.
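To give a flavour of what such date extraction involves, here is a minimal Python sketch. It is not the Mix’n’Match code: the three patterns, the sample descriptions and the `extract_years` helper are assumptions made for illustration, and a real catalog would need its own rule, as described above.

```python
import re

# Each pattern captures a birth year and a death year. These three formats are
# illustrative assumptions, not the formats of any specific catalog.
PATTERNS = [
    # ISO dates, e.g. "1850-03-12/1921-11-02"
    re.compile(r"(\d{4})-\d{2}-\d{2}\D+?(\d{4})-\d{2}-\d{2}"),
    # Free-text French, e.g. "né en 1850 à Lyon, mort en 1921 à Paris"
    re.compile(r"\bnée?\b.*?(\d{4}).*?\b(?:morte?|décédée?)\b.*?(\d{4})", re.IGNORECASE),
    # Plain years, e.g. "(1850–1921)" or "1850-1921"
    re.compile(r"\b(\d{4})\s*[-–]\s*(\d{4})\b"),
]


def extract_years(description):
    """Return (birth_year, death_year) as ints, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(description)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None
```

The patterns are tried in order, with the most specific format first, so that an ISO date is not misread as a plain year range. Descriptions that match nothing return `None` and are left for a human to look at.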

In a similar fashion, I have code to extract third-party identifiers (e.g. VIAF) from descriptions or web pages of entries. These can then be used to match entries to items, or to add those identifiers as statements to an already matched item. Matching on such identifiers requires them to be present in Wikidata, so adding such statements on Wikidata proper directly helps Mix’n’Match (and everyone, really). If you want to give it a try, this list has over 1000 items that likely have a GND (and probably VIAF) identifier, but whose identifiers are missing from Wikidata.

Some catalogs are easier to match to Wikidata than others. Entries with ambiguous names and no description are hard. Biographical entries with a description, birth, and death date are much better. Taxonomic entries with Latin species names are easiest, as we have a Wikidata property for those, and plenty of species to match to. Usually, automated matching can get >90% for these. However, this new catalog about fossil plants has fewer than 3% matches. A new area to be imported and curated on Wikidata!

Matching Mix’n’Match entries to items helps Wikidata only if there is a property associated with the Mix’n’Match catalog. Likewise, it is helpful to link from a property to the associated Mix’n’Match catalog(s). I have created a new status page that shows missing links and inconsistencies. This complements my reports on individual catalogs. All of these reports are updated regularly.

This and that

Most Wikipedia articles have an associated Wikidata item. However, newly created articles are often not immediately linked to Wikidata, via a new or an existing item. These “Wikidata-orphaned” articles can be found “by wiki”, for example English Wikipedia. It is a constant battle to prune that list manually, even with a game for that purpose. The number of such orphaned articles shows a curious pattern of “mass-matching” and slow build-up. Some investigation shows that a Wikidata user regularly creates new items for all orphaned articles, across many wikis. While this links the articles to Wikidata, it potentially creates a lot of duplicate items. Worse, since these items are blank (apart from the site link to the article, and a title), automated duplicate detection is hard.

To get a handle on the issue, I found all blank items created by that user in that fashion.

That list amounts to over 745K (yes, 3/4 of a million) blank items. For convenience, I have created a PagePile for them. Please do note that this is a “snapshot”, so some of these items will receive statements, or be merged with other items, over time.
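As a rough illustration of what “blank” means here, the following Python sketch filters entities that have a site link but no statements. It assumes simplified entity dictionaries using the key names of Wikidata’s JSON format (`labels`, `descriptions`, `aliases`, `claims`, `sitelinks`); the `is_blank_item` helper, the extra checks on descriptions and aliases, and the sample data are assumptions for the example, not the query actually used to build the list.

```python
def is_blank_item(entity):
    """True if the entity has a site link but no statements, descriptions or aliases."""
    return (
        bool(entity.get("sitelinks"))
        and not entity.get("claims")
        and not entity.get("descriptions")
        and not entity.get("aliases")
    )


# Hypothetical, heavily simplified entities (not real Wikidata items).
SAMPLE_ENTITIES = [
    {
        "id": "Q-blank",
        "labels": {"en": "Some new article"},
        "claims": {},
        "sitelinks": {"enwiki": "Some new article"},
    },
    {
        "id": "Q-filled",
        "labels": {"en": "Another article"},
        "claims": {"P31": ["Q5"]},
        "sitelinks": {"enwiki": "Another article"},
    },
]

blank_ids = [entity["id"] for entity in SAMPLE_ENTITIES if is_blank_item(entity)]
```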

Structured Data is coming to Commons!

For starters, there are multi-lingual file descriptions, but statements should follow during the course of this year. Since this is using Wikibase (the same technology underlying Wikidata), it will use (more-or-less) the same API. I have now prepared QuickStatements to run on Commons; however, the API on Commons is not quite ready yet. Once the API is functional, you should be able to edit Commons MediaInfo data via QuickStatements, just as you can edit Wikidata items now.

I had a few reports on PetScan dying for certain queries. It turns out that using a huge category tree (say, >30K sub-categories) will cause the MySQL server to shrug, taking PetScan with it. I have re-written some of the PetScan code to run several smaller chunks of such a query instead. It seems to work well, but please report any strange results to me.
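The general idea of splitting one huge query into several smaller ones can be sketched as follows. This is not PetScan’s code: `run_query` stands in for whatever function actually talks to the database, and the chunk size of 1000 is an arbitrary assumption.

```python
from itertools import islice


def chunked(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def pages_in_categories(category_ids, run_query, chunk_size=1000):
    """Run `run_query` once per chunk of categories and merge the results.

    `run_query` is a hypothetical callable that takes a list of category IDs
    and returns an iterable of page IDs.
    """
    pages = set()
    for chunk in chunked(category_ids, chunk_size):
        pages.update(run_query(chunk))
    return pages
```

Collecting the results in a set means a page that sits in categories from two different chunks is still only reported once.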

I hope this little tour has given you some ideas or motivation for work on our project. Happy new year everyone, and may your edits not be reverted!

Follow Magnus on Twitter.

#WikipediaDay – Wikipedia turns 18

Wikipedia birthday cakes made for Wikipedia’s 16th birthday – image by Beko CC BY-SA 4.0

By John Lubbock, Wikimedia UK Communications Coordinator

January 15 is the anniversary of the day on which Wikipedia was launched in 2001. I first got involved with Wikipedia in 2011, when I volunteered at a party organised by a friend of mine for Wikipedia’s 10th anniversary. Although 18 is a coming of age in many countries, it doesn’t have quite the same ring to it as a 10th or 20th anniversary, so there’s no big party this year. We are marking it on social media anyway with the hashtag #WikipediaDay, and asking people to send us messages about why they value Wikipedia, why they think others should value it, and what they would say to encourage someone to become a Wikipedia editor.

We’ve also released a video interview with Wikipedia co-founder Jimmy Wales, which is on our YouTube channel, as well as on Wikimedia Commons, where you can download it to reuse however you want.

We’d love to hear how everybody else is celebrating Wikipedia Day, and what you are looking forward to doing or working on with any of the Wikimedia projects this year. There are lots of important Wikimedia events coming up this year, and we hope to work with more academic and cultural institutions than ever before to grow Wikipedia and help people use it in an effective way. The Structured Data on Commons project will hopefully be completed, which will lead to big improvements on Commons, and there will be lots of work to promote and document Wikidata as it continues to evolve into an important project in its own right. So send us a message on social media and tell us what you’re doing and what you’re looking forward to!

Using bots to change the landscape of Wikipedia

A robot by Banksy in New York – image by Scott Lynch CC BY-SA 2.0

This post has been written by User:TheSandDoctor, an admin on English Wikipedia. An original version of this article appeared on Medium.

A Request for Comment (RfC) is a process for requesting outside input concerning disputes, policies, guidelines or article content. As an admin on the English Wikipedia, I deal with these kinds of bureaucratic issues regularly.

For a bot task to be approved on the English Wikipedia, a request, called a Bot Request For Approval (BRFA), must be filed. If there is judged to be sufficient need for the task, a member of the Bot Approvals Group, the body which provides oversight on bots, will generally request a trial. If the trial goes to plan, the task is usually approved within a couple of days of the trial’s completion. If there are issues, the submitter(s) resolve them and notify the reviewing member(s), potentially followed by a new trial. If the retrial goes according to plan, the task will most likely be approved shortly thereafter.

After a successful Request for Comment, I knew it was time to get to work on my next Wikipedia bot. Little did I realize at the time that this would be the most controversial task that I had filed to date, and that it would trigger an unprecedented series of events I never predicted, culminating in the rare re-opening of a Request for Comment. The change that resulted in this series of events? Moving the year an election or other referendum took place from the end to the front of the page name. For example,

United States presidential election, 2016 would become 2016 United States presidential election, and Electoral fraud and violence during the Turkish general election, June 2015 would be renamed Electoral fraud and violence during the June 2015 Turkish general election, with the old titles remaining as valid redirects so as to avoid breaking any incoming links.
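The simple form of this rename, where the date is a suffix after the final comma, can be sketched in Python as below. This is an illustration only, not TheSandBot’s actual code: the `year_first_title` helper and its regular expression are assumptions, and titles in which the election name is embedded in a longer phrase (such as the “Electoral fraud and violence during…” example) would need separate handling.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Matches a trailing ", 2016" or ", June 2015" style date suffix.
DATE_SUFFIX = re.compile(rf"^(?P<name>.+), (?P<date>(?:(?:{MONTHS}) )?\d{{4}})$")


def year_first_title(title):
    """Move a trailing date suffix to the front of the title.

    Titles without such a suffix are returned unchanged.
    """
    match = DATE_SUFFIX.match(title)
    if match is None:
        return title
    return f"{match.group('date')} {match.group('name')}"
```

Because titles that are already in the new form have no trailing date suffix, running the function twice gives the same result as running it once.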

It was October 17, 2018 and the approval request opened as countless others I had filed in the past had, with routine questions being asked by a volunteer Bot Approvals Group member, in this case the user named SQL. It was at this point that there were some indications that this would not go as smoothly as I had previously experienced. It was slightly unusual when the normally quiet and routine process began to attract more attention from editors and other members of the Bot Approvals Group, who began to express concerns regarding the RfC itself. In particular, concerns were expressed that there was not enough participation within the original Request for Comment and that it was inadequately advertised at the various relevant noticeboards watched by editors who may be affected by the proposed article naming convention change. By October 20th, the unprecedented happened. The decision was made to reopen the Request for Comment, and the discussion kicked off once again, with the bot approval request taking a temporary backseat. The reopening of a Request for Comment is a fairly unusual measure that, while possible, is seldom done or deemed necessary.

Following the RfC’s reopening, there was thorough discussion on both sides of the debate, which lasted an additional 31 days. On November 20th, 2018, the findings of the original close were confirmed. The consensus was that the naming convention was to be updated as proposed and, as a direct side effect, the bot task which I had submitted was given a renewed life. The upholding of the initial close, this time with clearer support, effectively cleared the way for a trial run. It was decided on the task’s discussion page that roughly 150 articles would be renamed in the trial of my task approval request. The task to move the pages to correspond with the updated naming conventions was approved on November 27th, following the successful completion of the trial and after leaving a few days’ holding time for any further comments or technical concerns.

From November 27th until early December, TheSandBot enacted the consensus achieved by the Request for Comment, moving (renaming) over 43,000 election-related pages within a couple of days.

When a page is moved (renamed), MediaWiki, the wiki software which Wikipedia uses, creates a redirect from the old title to the new one. This is done to avoid breaking any links to the older title: instead of receiving the equivalent of an HTTP 404 error, readers who visit the old link are simply redirected to the new location. Move operations have either two or four parts, each of which takes one edit. In the former case, the two parts are a redirect page creation and the move itself, so two edits are performed for every ‘move’. The latter case is slightly more complicated, but the actions are doubled. Taking advantage of this property, I was able to save time and reduce the size of the task script. As a consequence, although approximately 21,000 articles were moved, the logs record 43,000 moves and over 86,000 edits within that time frame (see figures above/below).

From left to right: total number of edits over the account lifetime, further statistics regarding the edits made within the past year.

An example of the four edits per page move mentioned above. N signifies a page creation, m signifies a minor edit, which page moves are considered automatically by the software

With the successful completion of all the specified page moves, it is the end for that particular task. Now it is time for me to move on to different ones, like the recently approved task removing article-specific templates from drafts. There is always more work to do within the largest online encyclopedia that is Wikipedia.

Find out more about TheSandDoctor’s work at thesanddoctor.com.

WikiCite conference 2018

Group photo – WikiCite 2018 (can you spot Jason?) – image by Satdeep Gill, Wikimedia Commons CC BY-SA 4.0

By Jason Evans, National Wikimedian at the National Library of Wales

Imagine a world in which anyone could use an open citation database to support free knowledge, with rich information about every citable source.

Any Wikipedian or Wikipedia advocate will tell you that one of the great strengths of Wikipedia is its citations. In fact, a Wikipedia article is only as strong as its citations. They provide evidence for the statements made in an article but they also provide a gateway to reliable secondary sources for deeper learning.

In recent years Wikipedia has been overtaken as the fastest growing Wikimedia project by Wikidata – a linked open database of facts – or the Wikipedia of data, if you like. Wikidata has grown at a tremendous rate, as people and institutions use it as a hub for their data, joining up the world’s open data in an interconnected web. Quite organically, it began to act as a platform for sharing bibliographic and citation data, to the point that 40% of Wikidata’s 60 million items now describe academic papers and articles.

Watch a video about Wikicite from the 2017 Wikidata convention in Berlin

The emergence of Wikidata has led to the growth of the WikiCite movement which aims, broadly speaking, to harness the power of structured data to create open structured data for all citations used in Wikipedia.

This was my first WikiCite conference, and what became clear to me from day one was that this is very much a project still exploring its scope and trying to understand its place in the Wikimedia family of projects. But already there is a growing community of librarians, Wikimedians and data scientists keen to explore the potential of the overarching concept.

Potential benefits of WikiCite are varied and wide reaching, and they serve separate communities in different ways. For example, since Wikidata items can be labelled and described in hundreds of languages, any structured citations on Wikipedia become multilingual, which has clear benefits for smaller language communities. And structured citations would make it much easier for us to analyze the diversity and quality of citations being used in Wikipedia projects. It would allow us to map works which cite other works, or pick out retracted papers, making it easier to manage the relevance and quality of citations across multiple languages.

Approximately 1% of Wikipedia users click on a citation when they read a Wikipedia article, and this rises to 30% or more for more academic topics such as mathematics and engineering. And whilst these might seem like low numbers, 1% is still around 76 million clicks a month. So structured citations, in a standardised format that links to deeper data about a work (hopefully facilitating access to a digital copy of the work or providing details of physical holdings), will certainly add value to the current system for citations, which essentially consist of strings of textual information.

Implementing this kind of fundamental change to Wikipedia across multiple language editions presents huge technical and social challenges in itself. As such, it has been proposed that any conversion to structured citations should start small, on smaller Wikidata-friendly language versions of Wikipedia, before tackling English Wikipedia, with its nearly 6 million articles.

However, the WikiCite vision is even bigger and more ambitious.

Participants at Wikicite 2018 – image by DarTar, Wikimedia Commons CC BY-SA 4.0

Imagine Wikidata items for every citation on Wikipedia, and then consider the added value of a massive centralised, or ‘federated’ bibliographic commons, where individuals, institutions and organisations can give access to bibliographic corpora, ranging from collections of niche scientific papers to a country’s entire publishing output – a library catalogue for the sum of all human knowledge. That may sound implausible, but Wikipedia didn’t become the 5th largest website in the world by dreaming small.

As you can imagine, this larger ambition has a few potential issues, which is why it is currently referred to as ‘the moonshot option’. There are questions around the technical ability to host, manage and maintain all this data in a standardised and centralised way. And if you decentralise the data to multiple instances of Wikibase (the platform which powers Wikidata), then how do you ensure that all these databases retain the semantic structure required for consistent and seamless communication between instances?

Wikicite presentation on gender diversity by Rosie Stephenson-Goodknight – Wikimedia Commons CC BY-SA 4.0

Another important question which comes out of this conference is: how do we ensure that any development is inclusive of other languages and cultures? Done properly, this initiative should make it possible to have a greater diversity in sources on our Wikipedia. For years, the use of Western sources to inform readers about non-Western concepts, languages and societies has been a bugbear for Wikipedia.

In Wales, we have already embarked on a project to share the ‘Sum of all Welsh Literature’ via Wikidata, in a bid to encourage the use of Welsh publications to cite articles about Wales, its people and culture. And we heard of similar projects getting under way in other parts of the world. In Sweden, for example, the local Wikimedia chapter are working with the National Library to openly share data for around 700,000 works from the Swedish Bibliography.

Many challenges lie ahead, but it’s clear from the diversity of people and projects at this conference that WikiCite is very much already happening.

To find out more about the project, check out the WikiCite wiki page.

University College London undergraduates will create their own course text using Wikibooks

UCL Arts and Sciences undergraduates working with Wikimedia UK – image by Carl Gombrich, with permission.

Professor Carl Gombrich, Programme Director for UCL’s new interdisciplinary course, Arts and Sciences (BASc), approached Wikimedia UK early this year to talk about his interest in using a Wikimedia element in the Approaches to Knowledge module of the degree.

This semester, the course began and 150 students are now working on creating chapters for an Open Educational Resources book which will be constructed by the students on Wikibooks, and then published by UCL Press, the Open Access publisher that UCL has recently established.

After initially discussing the use of Wikipedia itself as the basis for the course, it was decided that it would be hard to assess the contributions of a large number of students using Wikipedia. Contributions are more likely to get deleted, and the students would likely be looking at improving only a small number of quite core Wikipedia pages related to epistemology. So it was decided to have them collaboratively create a book on Wikibooks, so that students could still gain an insight into how open source platforms like the Wikimedia projects function.

UCL is interested in what working with Wikimedia projects can teach students in terms of research and academic skills, and the media literacy which comes with a deeper understanding of the guidelines for Wikimedia projects. They also liked the idea of being able to make a textbook and the meta-approach of people creating knowledge about knowledge.

Dr Richard Nevell has been helping as a volunteer, and Wikimedian Katie Chan held a training session for staff on Wikipedia and Wikibooks before the course began. Hannah Evans gave an opening lecture for the course before an initial workshop where students got into teams to decide what subject area they would work on.

The groups could choose from:

  • Knowledge and imperialism
  • Knowledge and truth
  • Knowledge and evidence

The groups will write chapters of 1200 words. These will all go on Wikibooks, and the best ones will be collected into a book which will also be published by UCL Press. The project will also tie into a UCL education conference on April 1, 2019, where students will be presenting the work they are doing.

Wikimedia UK is now working with many different universities across the country, and you can read more about what different courses are doing with Wikimedia projects on our website.