3000 new articles added to the Welsh Wicipedia

Volunteers at the National Library of Wales have been translating important health articles from English into Welsh – image by Jason Evans CC BY-SA 4.0

By Jason Evans, National Wikipedian for Wales

Improving health related content on the Welsh Wicipedia

The Welsh language Wicipedia is the most viewed website in the Welsh language, and articles about health related issues are among the most frequently viewed. And yet only around 2% of Welsh articles cover this subject, compared to more than 6% in English.

Welsh speakers deserve access to quality health information in their native tongue, but currently hugely important topics such as cancer, mental health and medical treatments have very little coverage. So, in July 2017 the National Library of Wales, with Welsh Government funding and Wikimedia UK support, embarked on a 9 month project to improve this content. The project was called Wici-Iechyd (Wiki-Health).

A series of edit-a-thons and translation projects has already led to the creation of over 250 handwritten articles. Many have been translated from English articles prepared for use in other languages by the WikiMed project. Others are derived from text released on an open licence by project partners including the British Lung Foundation, WJEC and the mental health information service Meddwl.org.

The big news this month is the creation of 2700 articles about human genes. The articles were created using information from Wikidata and PubMed and images from Wikimedia Commons. Since all articles about genes follow a similar format it was possible to generate and upload the 2700 articles en masse. The articles include information about the location and structure of the genes as well as synonyms. All include a bibliography with the 5 most recent publications about each gene. Wikimedia UK were involved in producing a Wikidata Infobox which pulls in an array of data, images and citations. Naturally, time was also spent ensuring Wikidata had Welsh labels for items which were likely to be called on by the infobox.
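As a rough illustration of why a shared format makes bulk creation possible, the sketch below fills one fixed template per gene. The `GeneRecord` class, its field names and the English wording are hypothetical; the actual Wici-Iechyd templates, data pipeline and upload tooling are not described in this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical container for facts gathered from Wikidata and PubMed."""
    symbol: str
    chromosome: str
    synonyms: list[str] = field(default_factory=list)
    publications: list[str] = field(default_factory=list)  # assumed newest first


def render_article(gene: GeneRecord, max_refs: int = 5) -> str:
    """Fill a single fixed template with the data for one gene."""
    lines = [f"{gene.symbol} is a human gene located on chromosome {gene.chromosome}."]
    if gene.synonyms:
        lines.append("Synonyms: " + ", ".join(gene.synonyms) + ".")
    lines.append("Bibliography:")
    # Keep only the most recent publications, as the post describes.
    for title in gene.publications[:max_refs]:
        lines.append(f"* {title}")
    return "\n".join(lines)


# Made-up example data, not a real gene record.
example = GeneRecord(
    symbol="EXAMPLE1",
    chromosome="17",
    synonyms=["EX1"],
    publications=[f"Paper {n}" for n in range(1, 8)],
)
article = render_article(example)
```

Running the same function over a list of records would yield one article body per gene, which is the sense in which the articles could be produced together rather than written one by one.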

Members of the Royal College of Nursing improving content at an edit-a-thon in Cardiff – image by Jason Evans CC BY-SA 4.0

It is hoped that many future improvements to health related content will link to these articles about genes giving a greater depth of information on the subject.

This upload alone represents a 2.8% increase in the total article count for Welsh Wikipedia. However, with more articles being prepared on diseases, drugs and medical pioneers, we could see close to a 5% increase by the end of the project. It is likely that health related content as a percentage of the total article count will be comparable to, or better than, the ratio in the much larger English Wikipedia.
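The percentages above can be turned back into approximate article counts. The short calculation below is only a back-of-envelope sketch: it assumes the 2.8% figure refers to the 2700 gene articles measured against the article count before the upload, and the function name is made up for illustration.

```python
def implied_article_count(new_articles: int, percent_increase: float) -> float:
    """Article count before the upload, implied by a stated percentage increase."""
    return new_articles / (percent_increase / 100)


# 2700 new articles described as a 2.8% increase.
base = implied_article_count(2700, 2.8)

# Number of new articles that a 5% increase on the same base would represent.
needed_for_5_percent = base * 0.05

print(round(base), round(needed_for_5_percent))
```

Under that assumption the wiki held roughly 96,000 articles before the upload, so a 5% increase would mean something in the region of 4,800 new articles over the life of the project.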

The project is funded until the end of March, but it is hoped that Wici-Iechyd will continue to thrive as a Wiki project on the Welsh Wicipedia.

US Second World War propaganda films migrated to Commons

Victor Grigas, a video producer and storyteller who has worked with the Wikimedia Foundation for a number of years, posted on the Wikimedia Video Production House Facebook group yesterday that he had migrated Frank Capra’s Second World War films from YouTube to Commons so they can be used on Wikipedia.

The Why We Fight series of films was made by Frank Capra in response to Leni Riefenstahl’s Nazi propaganda film, Triumph of the Will. Capra described Riefenstahl’s film as ‘a psychological weapon aimed at destroying the will to resist’. Capra later wrote in his 1971 autobiography,

‘I sat alone and pondered. How could I mount a counterattack against Triumph of the Will; keep alive our will to resist the master race? I was alone; no studio, no equipment, no personnel.’

All content made by the US government is Public Domain by default, and Grigas found the videos on the US National Archives YouTube channel.

Under the US Copyright Act 1976, “a work prepared by an officer or employee” of the federal government “as part of that person’s official duties” is not entitled to domestic copyright protection under U.S. law and is therefore in the public domain.

Grigas used the Video2Commons tool to migrate the files from YouTube. There is quite a lot of US government public domain video on YouTube, which you can search through Creative Commons’ search site. Although low resolution versions at 320p already existed on Commons, the transfer means there are now high quality ones available.

“I just saw the low-resolution versions on Wikipedia and thought that these films might have a better transfer out there and I was right. I saw these films in film school and they were enormously influential, I mean they copy elements of them in Star Wars. So I thought I should improve these articles”, Grigas said.

If you find any good public domain video online and add it to Commons for use on Wikipedia, why not tell us about it?

A guide to the past: hillforts and Wikimedia

Barbury Castle in Wiltshire is one of more than 4,000 prehistoric hillforts in Britain and Ireland. Photo by Geotrekker72, licensed CC BY-SA 4.0.

What is the Atlas of Hillforts?

Hillforts are enormous archaeological sites dotted around Britain and Ireland. They are some of the most impressive remains from prehistory. Just five years ago the best guess for how many there might be was ‘likely … over 4000’, but now thanks to the efforts of the University of Oxford and the University of Edinburgh we know there are 4,147 and have a wealth of information about them at our fingertips.

Back in 2013, archaeologists at Oxford and Edinburgh teamed up to work on the Atlas of Hillforts. Their four-year mission was to identify every single hillfort in Britain and Ireland and their key features. This had never been done before, and as Oxford’s Prof. Gary Lock said, it would allow archaeologists to “shed new light on why they were created and how they were used”.

Some hillforts like Maiden Castle are well known and archaeologists have examined them for decades, but these give us only a postage-stamp-size glimpse of the huge overall picture. There are thousands of hillforts in Britain and Ireland, so if you want to understand them it’s important to have foundational information such as how many there are, where they can be found, and to build on that by adding information on what type of site it is. The more information there is, the more analysis you can do. That’s what the Atlas set out to achieve.

When the project was under development, Wikimedia UK was supporting a Wikimedian in Residence (WIR) at the British Library, Andrew Gray. He talked to the people involved in the project and suggested using Wikipedia to share the results of the project. After all, they were going to create a free-to-access online database. Perhaps the information could be used to update Wikipedia’s various lists of hillforts?

Fast forward to the summer of 2017 when the Atlas launched. At this point Wikimedia UK was supporting a WIR at the University of Oxford, Martin Poulter. His work includes helping researchers use the Wikimedia projects to increase their impact, and he worked with the Atlas of Hillforts project to share information from their database on Wikidata. Together they selected a set of information from the Atlas which Martin then uploaded to Wikidata.

Why is this project important?

It contains a huge amount of information: details of investigations at each site, a bibliography of related sources, even what kind of dating evidence there is. If you are writing about hillforts today – whether as an academic or for Wikipedia – it would be a very good idea to start by going to the Atlas of Hillforts to see what information it has on a site and what other sources of information it signposts.

For example, here is the record for Mellor hillfort in Greater Manchester. It includes any alternative names, its reference number for the Historic Environment Record (HER), a grid reference, and a summary of the site. It also gives details of nine sources you can explore for more information, and tells you when it was investigated (geophysical survey in 1998 and excavated between 1998 and 2009). It tells you what kind of dating evidence there is, and you might notice that here it doesn’t have information on how many entrances the hillfort had and what shape they were. That’s because the site has been largely destroyed, as mentioned in the summary. That gives a Wikipedia editor a lot of information to work with.

Creating an atlas like this is a crucial way to share information; it creates a gold standard for information in the field and because it is much easier to find information about a site, it’s easier to stay up to date, make comparisons with other sites, and spend more time analysing this information and pushing forward our understanding.

Map of hill forts in the British Isles, created in the Wikidata Query Service using data shared by the Atlas of Hillforts. Image created by Martin Poulter, licensed CC0.
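For readers curious how a map like this can be produced, here is a minimal sketch that builds a SPARQL query for the Wikidata Query Service. `P31` (instance of), `P279` (subclass of) and `P625` (coordinate location) are standard Wikidata properties and `#defaultView:Map` is the query service's map view directive. The function name is made up, and the class item ID is passed in by the caller because the exact item and filters used for the map above are not given here.

```python
def hillfort_map_query(class_qid: str, limit: int = 5000) -> str:
    """Build a query listing items of a given class that have coordinates.

    class_qid is the Wikidata item ID for the class of site to plot;
    the caller must look up the real ID for "hillfort".
    """
    return f"""#defaultView:Map
SELECT ?site ?siteLabel ?coords WHERE {{
  ?site wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P625 ?coords .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


# "Q0" is a placeholder, not the real item ID.
query = hillfort_map_query("Q0", limit=100)
print(query)
```

Pasting the printed text into the query service, with the placeholder replaced by the real item ID, should plot each matching site at its recorded coordinates.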

Why is this useful for Wikipedia?

The information from the Atlas can be used to update lists, as initially hoped, as well as to create visualisations for Wikipedia, and it can be used by editors to update and create articles. The English Wikipedia’s pre-existing content on hillforts was seen by 5,299 people a day in June 2017. Since the information is in Wikidata, it can be used in different language Wikipedias. The appeal of Wikimedia isn’t just the reach of the project, but the fact that in Wikimedia Commons it has a database of free-to-use images. There are nearly 3,600 media files of hillforts on Commons, which complement the Atlas, which only has vertical aerial photos from Google Maps.

Most importantly, the Atlas is a very high quality resource which will benefit Wikipedia’s editors and readers. It is likely to be used again and again and shape how people understand these prehistoric sites.

For more technical information on how the data from the Atlas was added to Wikidata, see Martin Poulter’s blog post on the Bodleian’s website from October.

Talking to Creative Commons’ Ryan Merkley about CC Search and Structured Data on Commons

Creative Commons’ Ryan Merkley and Wikimedia Foundation Exec Director Katherine Maher at Mozfest 2017 – Image by Jwslubbock CC BY-SA 4.0

CC Search beta was launched in February. This new tool incorporates ‘list-making features, and simple, one-click attribution to make it easier to credit the source of any image you discover.’ Its developer, Liza Daly, describes it as ‘a front door to the universe of openly licensed content.’

As a small organisation, Creative Commons did not have the resources to start by indexing all of the 1.1 billion Openly Licensed works that it estimates are available in the Commons. Liza Daly decided to start with a representative sample of about 1% of the known Commons content online, and decided to select about 10 million images rather than a cross-section of all media types, due to the fact that a majority of CC content is images.

One issue they encountered was in making sure that all the content they would include was CC licensed, where a provider (like Flickr) hosted content that was both CC and commercially licensed. They also decided to defer the use of material from Wikimedia Commons, saying that,

‘Wikimedia Commons represents a large and rich corpus of material, but rights information is not currently well-structured. The Wikimedia Foundation recently announced that a $3 million grant from the Sloan Foundation will be applied to work on this problem, but that work has just begun.’

The Wikimedia Foundation understands that the resources available through Wikimedia Commons are not as accessible as they could potentially be as a result of the ad hoc nature of much of the metadata attached to the files people have uploaded. For example, one common query is ‘Why can’t I search Commons by date’. The problem here is ‘which date?’ Is it the stated date that the photo was taken (which could be incorrect) or the date that the file was created, which could be different?
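The ‘which date?’ problem can be made concrete with a small sketch. The record type and field names below are hypothetical and are not the Structured Data on Commons schema; the point is only that a date search becomes well defined once each kind of date is stored in its own named field.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MediaDates:
    """Hypothetical record keeping the two dates apart."""
    title: str
    stated_capture_date: Optional[date]  # what the uploader says; may be wrong
    file_created_date: Optional[date]    # when the digital file was made


def search_by_date(files, field_name: str, start: date, end: date):
    """Return titles whose chosen date field falls within [start, end]."""
    hits = []
    for f in files:
        value = getattr(f, field_name)
        if value is not None and start <= value <= end:
            hits.append(f.title)
    return hits


# Made-up example: a 1944 photograph scanned in 2015.
files = [MediaDates("Old photo scan", date(1944, 6, 1), date(2015, 3, 9))]
by_capture = search_by_date(files, "stated_capture_date", date(1940, 1, 1), date(1949, 12, 31))
by_file = search_by_date(files, "file_created_date", date(1940, 1, 1), date(1949, 12, 31))
```

The same file matches a search for the 1940s on one field and not on the other, which is why a single unlabelled "date" is hard to search on.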

This is why Structured Data is so important. The $3m grant that the WMF has received to implement structured data on Commons, in a similar way to how it’s structured on Wikidata, will allow for much better searching and indexing of media files.

CC Search wants to make CC content more discoverable, regardless of where it is hosted online. To do this, they decided to import the metadata from the selected works that they are currently indexing: title, creator name, any known tags or descriptions. This data will link directly back to the original source so you can view and download the media. It seems that in its current, unstructured state, Wikimedia Commons is not very good for systematically importing this kind of metadata.
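To show why clean metadata matters for a feature like one-click attribution, here is a hedged sketch of an indexed record and a credit line built from it. The class, the `licence` field and the output format are assumptions made for illustration and are not CC Search's actual data model; the URL is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class IndexedWork:
    """Hypothetical index entry holding the metadata fields mentioned above."""
    title: str
    creator: str
    source_url: str          # link back to the original source
    tags: tuple = ()
    licence: str = "CC BY-SA 4.0"  # assumed field; not listed in the post


def attribution(work: IndexedWork) -> str:
    """Build a credit line from structured fields."""
    return f'"{work.title}" by {work.creator}, {work.licence}, via {work.source_url}'


work = IndexedWork(
    title="Barbury Castle",
    creator="Geotrekker72",
    source_url="https://example.org/barbury-castle",  # placeholder URL
    tags=("hillfort", "Wiltshire"),
)
credit = attribution(work)
```

If the creator or licence is buried in free text rather than held in a field of its own, a function like this has nothing reliable to read from, which is the gap structured data is meant to close.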

It seems that Creative Commons is even looking at the possibility of using some kind of blockchain-like ledger system to record reuse of CC licensed works so that reuse can be tracked. However, this remains a longer term goal.

I asked Creative Commons CEO Ryan Merkley some questions about how the project had been progressing since its announcement and how it might work.

WMUK: How much progress has been made on CC search since the start of 2017? Have you indexed many more than the original 10 million media items?

RM: CC has hired a Director of Product Engineering, Paola Villarreal to lead the project. We’re staffing up the team, with a Data Engineer starting soon. In addition, we’ll be pushing a series of enhancements, including adding new content, by the end of the year.

WMUK: Will you have to wait until the end of the Structured Data on Commons project to index Wikimedia content? Or does the tool only require basic metadata categories like Title, Creator, Description, Category Tags, meaning it would be possible to start this before the end of the project?

RM: We’re happy to work with the Wikimedia Commons community on the project. In our initial conversations, we mutually decided to wait until some of that work was further along. We want to make sure our work is complementary.

WMUK: Is it still an ultimate ambition to use some kind of blockchain architecture to record reuse? Or is that potentially a goal that would require more resources than will likely be available for the foreseeable future?

RM: Not necessarily. There’s a lot of interesting work going on with the blockchain and distributed ledger projects. What’s most important to us is a complete, updated, and enhanced catalog of works and metadata that is fast and accessible.

WMUK: Can you explain how ledger entries would be created when someone reused a CC licensed work?

RM: The tools to track remix don’t exist right now. It’s something we’re really interested in, and our community wants as well. It will require new tools, and collaboration with platforms and creators.

There are so many incredible applications possible for all the data on Wikimedia Commons, and we hope that after the content is structured properly, it will become a valuable source which can be searched along with other CC content online using Creative Commons’ CC Search tool. Like a lot of the changes we would like to see in the way the Wikimedia products work, this will likely take some time, but we are hopeful that the wait will be worth it.

Wikipedia over Tor? Alec Muffett experiments with an Onion Wikipedia site

Alec Muffett at Mozfest 2016 – image by Jwslubbock

Alec Muffett, a director of the Open Rights Group and an ex-Facebook, now Deliveroo software engineer, has created a Wikipedia Onion site which can only be accessed through the Tor browser.

Wikimedians have long asked to be able to browse and edit Wikipedia through Tor, a network (usually accessed via the Tor Browser) which routes your traffic through multiple relays and hides your IP address, making you much harder to track online. However, debate within the community has for years been centred on whether or not this would encourage vandalism.

One proposed solution would be to only allow editing through Tor for email verified, signed in accounts. The onion site could also be set up as a read-only access mechanism. This would be a valuable start, but it would miss the point that a lot of people would like to edit more securely and anonymously. Vandalism could happen through Tor, of course, but then it already does happen through “IP” editing when a person is not signed-in.
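The proposed rule can be written down as a small decision function. This is a simplified sketch of the proposal as described above, not Wikipedia's actual permission logic, and the function and parameter names are made up for illustration.

```python
def may_edit(via_tor: bool, signed_in: bool, email_verified: bool) -> bool:
    """Sketch of the proposal: Tor edits only from verified, signed-in accounts."""
    if not via_tor:
        # Outside Tor, signed-out "IP" editing is already possible.
        return True
    return signed_in and email_verified


# A read-only onion site would correspond to returning False for every Tor request.
assert may_edit(via_tor=True, signed_in=True, email_verified=True)
assert not may_edit(via_tor=True, signed_in=False, email_verified=False)
```

Real-world editing is also subject to blocks and page protection, which this sketch leaves out.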

Muffett noted in a discussion in the Wikipedia Weekly Facebook group that Facebook frequently blocked people from using their site over Tor until 2013, when it decided to change its approach. “Now Facebook recognises that ~1 million people access it over Tor, and that they are a valuable readership.” He also argued that Cluebot, which identifies and reverts Wikipedia vandalism, would help address vandalism over Tor as well.

There has been ongoing discussion about editing via Tor since 2007/8, which you can read more about here and here – click on the Talk/Discussion tabs on the top left to see what people have said about the subject.

While you can already view Wikipedia through Tor (but not edit it), browsing via Tor is somewhat slower, because of the way it routes traffic through multiple servers and the way that exit nodes on the network can affect the browsing experience. Muffett says that having a Wikipedia presence directly on the Tor network itself (via an Onion site) would have the advantages of adding ‘speed, surety, trust’.

Another Wikimedian in the Wikipedia Weekly discussion disagrees, and argues that aside from vandalism, editing over Tor would make Sockpuppetry (one user controlling multiple accounts) easier. He stated that ‘It is fundamentally a technical problem in the sense that the tools and processes that Wikimedia communities have come up with to fight malicious behavior in the last 16 years don’t work anymore if you can obtain easily several unrelated and untraceable identities.’

Muffett says that the Facebook onion had several clear benefits:

1) A better and safer experience for people accessing Wikipedia over Tor: no interference by exit nodes, no bandwidth-contention for exit nodes, no use of exit nodes at all.

2) Being “a good neighbour” – accessing Wikipedia as a Tor hidden service frees up traffic that would consume scarce exit-node bandwidth.

3) “a peace offering” – people (continue to) use Facebook over Tor; 3 years ago [Facebook] saw 500,000/month, more recently ~1 million users. Muffett, who used to work for Facebook, says that “we found (through measurement and assessment) that people using Facebook over Tor were ordinary folk wanting to do ordinary things. Especially in times of political crisis. Providing a metaphorical “olive branch” showed that we value their use of the site.”

4) Discretion & Trust. Onion Sites are considered to be about “anonymity”, but really they offer two more features: discretion (eg: your employer or ISP cannot see what you are browsing, not even what site) and trust (if you access facebookcorewwwi.onion you are *definitely* connected to Facebook, and cannot be tricked into connecting to an unsafe fake site.)
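The trust property in point 4 comes from the way onion addresses are formed. For the older 16-character ("version 2") addresses such as the Facebook one quoted above, the name is derived from the service's public key, so only the holder of the matching private key can answer for it. The sketch below shows that derivation with dummy bytes standing in for a real DER-encoded RSA key. Version 2 addresses have since been retired in favour of longer version 3 addresses, which are built differently.

```python
import base64
import hashlib


def onion_v2_address(public_key_der: bytes) -> str:
    """Derive a legacy v2 onion address from a DER-encoded RSA public key.

    The address is the base32 encoding of the first 80 bits (10 bytes)
    of the SHA-1 digest of the key.
    """
    digest = hashlib.sha1(public_key_der).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"


# Dummy bytes for illustration only; this is not a real key.
address = onion_v2_address(b"not-a-real-public-key")
print(address)
```

Because the name is tied to the key, a fake site cannot claim the same address without the corresponding private key.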

Muffett concludes that “The code is free and libre. I am doing it because it’s worth doing.”

How the .onion service works – Image via Alec Muffett

After launching the .onion site and generating quite a lot of exaggerated tech press about how there’s now a ‘Dark Web version’ of Wikipedia, Muffett’s idea attracted some interest. Unfortunately, some of that interest appeared in the form of people trying to overload the site with bad “Denial of Service”-style requests.

“This experience is a microcosm of my experiences at Facebook – people attempting to flood and break a website for unknown reasons, possibly “for the lulz”, possibly for actual malicious reasons. It’s a mitigable risk, and in fact is greatly simplified by publishing the site over Tor which stops the more mundane forms of network attack such as flooding.”

Muffett said that the attacks the service experienced in its first few days helped inspire improvements to round off the code’s rough edges. He hopes that demonstrating that it is possible and desirable to create a .onion service for Wikipedia will encourage people in the community to discuss and reconsider whether to allow it as an official service.

“The simplest way to demonstrate what a Tor Onion site would look like, is to do it. The technology exists (“Enterprise Onion Toolkit”, EOTK) and is solid enough for the New York Times to use… yet it will only improve. The only thing necessary is to deploy it, which is trivial”, he says.

If you would like to get involved in the discussion or help out with the project, you can read the Phabricator discussion on the topic and find the EOTK code on Github.

Help reduce the #GenderGap and win prizes in the Women in Red World Contest

Women in Red world logo – image by Susan Barnum CC BY-SA 4.0

WikiProject Women in Red is holding a biographical article creation contest throughout the month of November. They aim to create 2000 new biographical articles by the end of the month on women from every country and occupation on the planet as part of a project effort to increase diversity and the percentage of women biographies on Wikipedia (which is just 17.15% in relation to men at present).

The total prize fund is over $4500 (over £3000), and Wikimedia UK are offering Amazon voucher prizes valued at £250 (top prize £150) for any Wikimedian who writes the most satisfactory new articles on British women which are rated Start Class (1.5k bytes) or better.

UK Wikimedians may also create articles on women from any country and compete for the prizes for women of different continents and occupations. Work done on British women may also count towards the European prize for most article creations, of which WMF are offering $200 in prizes.

To take part in the contest, Wikimedians should enter their names in the participants section of the contest page and check out the list of missing biographies of women from the Oxford Dictionary of National Biography (ODNB) which you can see here. You can find missing biographies specifically of British women here. Newly created articles should be added to the bottom of the contest page. If you are competing for prizes, also list your entries in the United Kingdom section on the page for Europe during the contest and on the prize claims page for most new British biographies at the end of the contest.

User:Dr. Blofeld, the contest organiser, says ‘we’ll accept any UK women bios, but the emphasis is really on those notable missing dictionary entries, particularly the ODNB and the Welsh Dictionary of Biography. In just a week, over 600 articles have been produced worldwide already, but at present not many editors are doing British entries. Here is a chance to significantly increase our proportion of British women biographies and target really notable missing articles. Even if you only have time to create one or two entries, everything counts’.

So we need the UK Wikimedia community to get involved and contribute more new biographies of notable women. Let’s get editing!

WikiFeed project to create custom newsfeeds from Wikimedia data

Image by Jwslubbock CC BY-SA 4.0

Fako Berkers and Edward Saperia have been working together on a project called “WikiFeed”. It’s a framework that allows you to create custom algorithmic newsfeeds using data from Wikipedia and Wikidata.

These open algorithms could be used to discover news stories in niche areas, suggest new collaborative approaches to editorial policy, and probably other things its designers haven’t thought of yet!

Saperia told us that he was thinking about how we consume news, and that while the Wikipedia homepage is not generally thought of as news, its In The News section is probably one of the most viewed news platforms online. He said ‘News is in the news right now. Choosing headlines is a political act. I was interested in whether you could approach editorial in an open, collaborative way.’

You can see more information about the project on its Wikipedia project page here. You can see an example of an algorithmic feed here: Recently Edited Women WikiFeed shows articles about women, ranked by which have had the most recent edits.
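A feed of this kind can be approximated in a few lines. The sketch below is not WikiFeed's actual code: it reads "most recent edits" as "the most edits within a recent time window" and works on a plain list of (title, timestamp) pairs rather than live Wikipedia data. The function name and the seven-day window are assumptions.

```python
from datetime import datetime, timedelta


def rank_by_recent_edits(edits, now: datetime, window: timedelta = timedelta(days=7)):
    """Rank article titles by the number of edits made within the window."""
    counts = {}
    for title, timestamp in edits:
        if timedelta(0) <= now - timestamp <= window:
            counts[title] = counts.get(title, 0) + 1
    # Most edits first; ties broken alphabetically so the order is stable.
    return sorted(counts, key=lambda title: (-counts[title], title))


# Made-up edit log for illustration.
now = datetime(2017, 11, 10)
edits = [
    ("Article A", datetime(2017, 11, 9)),
    ("Article B", datetime(2017, 11, 8)),
    ("Article B", datetime(2017, 11, 9)),
    ("Article C", datetime(2017, 9, 1)),  # outside the window, so ignored
]
feed = rank_by_recent_edits(edits, now)
```

Swapping the scoring rule, for example weighting newer edits more heavily, gives a different feed from the same data, which is the kind of open editorial choice the project is exploring.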

It’s still in a very early stage, but for the first time next weekend (11-12 Nov) its developers are inviting people to come round to Newspeak House and play with it. Remote participation is also possible and there will be two sessions, on Saturday and Sunday, from 1-4pm.

Sign up to the event page here.

Wiki Loves Monuments 2017 winners announced!

1st Prize – Derelict West Pier on Brighton seafront by Matthew Hoser CC BY-SA 4.0

Wiki Loves Monuments is the world’s biggest photographic competition and takes place every September. Participants take photos of historic places, including buildings and archaeological sites.

Wiki Loves Monuments encourages photographers around the world to upload photos of heritage monuments to Commons so that they can be used to illustrate Wikipedia. Images from Wiki Loves Monuments in the UK have been seen nearly 14 million times in October.

This year, over 14,000 photos were submitted to Wiki Loves Monuments in the UK. The prizes are sponsored by Wikimedia UK and Archaeology Scotland, with a top prize of £250. The winning photos’ subjects range from prehistory right through to the 1930s. The overall winner was a photo of Brighton’s derelict West Pier by Matthew Hoser, who said:

“I have been lucky enough to travel quite a lot over the past few years of studying in the UK, and so when I recently heard about the Wiki Loves Monuments photography competition I jumped at the chance to get involved for the first time. This country has such rich and varied history, so taking photos of the amazing sights around Britain is a real pleasure. I am so glad to be able to share my photos with the Wikimedia community, and hopefully to make people eager to get out and see more of the UK for themselves!”

Second prizewinner, Paul Stümke, took an atmospheric photo of Glenfinnan Viaduct in Scotland, also winner of the Archaeology Scotland sponsored best photograph from Scotland. He said:

“I have not taken part before in WLM but I have seen last year’s winners. I liked the idea and since me and some friends travelled around Scotland from August to September by bicycle I was able to capture some stunning landscapes, famous monuments and other things that seemed worth photographing. When I edited the photographs back home I saw the advertisement for this year´s contest and thought to myself, why not participate? This is a great way to get some of my pictures out to the world.”

The winners of the Special Prize for Scotland (sponsored by Archaeology Scotland) and Wales depict the Smailholm Tower by Keith Proven and Craig y Mor by Sterim64 respectively.

All photos on Commons are shared on open licences, such as Creative Commons Attribution-ShareAlike 4.0. CC licenses allow others to use the images for free as long as they attribute the author. Wikimedia UK encourages people to publish free content which anyone can use in a classroom, journalistic articles, art, on Wikipedia or for any other purpose without worrying about its copyright restrictions.

Here is the full list of winners:

1st Prize

Derelict West Pier on Brighton seafront by Matthew Hoser

2nd Prize

Glenfinnan Viaduct at Loch Shiel by Paul Stümke

3rd Prize

De La Warr Pavilion Art Deco building on Bexhill seafront by Oliver Tookey

Highly Commended

Smailholm Tower near Kelso, Scotland by Keith Proven
Martello tower at Felixstowe ferry by Tony Lockhart
Westminster and Big Ben by Farruk Ahmed Bhuiyan
Perch Rock Lighthouse portrait by Mark Warren
Smithfield Market ceiling by Stevekeiretsu
Balcombe Viaduct by Matthew Hoser
Avebury South West quarter looking North East in snow by Paul Adams

Special Prize for best photo from Scotland and Wales:

Smailholm Tower by Keith Proven

Craig y Mor by Sterim64

Meet our new Membership and Fundraising Officer

I’m Katie Crampton and I’ve started as Wikimedia UK’s Membership, Fundraising and Operations Assistant, and I’d like to take this opportunity to introduce myself.

Wikimedia’s cause of providing free, unbiased information to all is admirable, and definitely something I can get behind. Having worked for a charity tackling socio-economic disadvantage through education, enabling easy access to information factors highly in Wikimedia’s appeal.

I started my career as a Copywriter for a digital marketing agency, and have since worked at a children’s charity as described above. I hope to bring to the role my experience of fundraising, and engage with Wikimedia UK’s supporters and members to ensure a close relationship between the charity and our community.

It’ll be great to hear from Wikipedia’s volunteers, and I look forward to seeing your work in action.

See you soon!

Wiki Project Med Foundation launches Wikipedia-hosting mini WiFi computer to distribute medical information

IIAB – image by James Heilman CC BY-SA 4.0

In 2017, the proportion of the world’s population with access to the internet passed the 50% mark. Within the Wikimedia movement, it’s easy to take for granted that most people have easy access to the internet, but this is still not the case for many people.

To address this lack of access to Wikipedia, groups like Kiwix have been working on creating offline versions of Wikipedia for some time, and the Human Rights Foundation have been smuggling USB drives with Korean Wikipedia into North Korea for a few years now. Now a new project is addressing the lack of access to medical information.

The Offline Distribution System for Medical Content is a collaboration with Internet-in-a-Box. They have created mini Raspberry Pi-based computers which generate a WiFi signal that up to 32 people can connect to at any time. It also functions as an app store where you can download and install offline Wikipedia medical apps.

Video – Bridging the digital divide in South America

This initial version contains all of Wikipedia’s healthcare content in English, Spanish, and Arabic. It also contains WikEM, content from Practical Action in English and Spanish, and HealthPhone videos.

The device is being sold for the costs of the hardware plus shipping (£30 / $40).

James Heilman, MD, a special adviser to the project, said in a press release: “We believe this device has a significant potential to benefit the more than 4 billion people globally without reliable Internet access. We are working to develop further versions with other languages and types of content. If you would like to join in this effort or wish to know more please reach out.”

To see an online example:

http://medbox.iiab.me/home/

For how to purchase:

https://meta.wikimedia.org/wiki/Internet-in-a-Box/Buy

For how to make your own:

https://meta.wikimedia.org/wiki/Internet-in-a-Box/DIY