Science Source seeks to improve reliable referencing on Wikipedia’s medical articles

  • August 30, 2018

Science Source is a new project by ContentMine that builds on its earlier WikiFactMine project, which aimed to ‘make Wikidata the primary resource for identifying objects in bioscience’. This time, the team wants to automate the process of scanning biomedical research for statements that could improve Wikipedia’s medical articles. They summarise the approach on the grant page for the project:

Once papers are transferred to standard HTML to get them to the same format for text mining they can be tagged, e.g. tagging all the cancers in red, the diseases in blue. With a decent visualisation you can see when the red and blue are close together – this is called a co-occurrence. Now you can ask a human to decide what the sentence is saying – is the cancer known to be treated by this drug, or resistant? This is the relationship you’re looking to find in wikidata terms.
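To make the idea of a co-occurrence concrete, here is a minimal Python sketch. It is not ContentMine's pipeline: the two word lists, the sample text and the function name find_cooccurrences are all invented for illustration, and a real system would draw its vocabularies from Wikidata rather than from hard-coded sets. It simply flags sentences that mention both a disease and a drug, leaving the question of what the sentence actually says to a human.

```python
import re

# Toy vocabularies, invented for illustration only. A real pipeline would
# use much larger dictionaries, for example ones derived from Wikidata.
DISEASES = {"leprosy", "tuberculosis"}
DRUGS = {"dapsone", "rifampicin"}


def find_cooccurrences(text):
    """Return (sentence, diseases, drugs) for sentences that mention both."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        diseases = sorted(words & DISEASES)
        drugs = sorted(words & DRUGS)
        if diseases and drugs:
            hits.append((sentence, diseases, drugs))
    return hits


# Made-up sample text standing in for a mined paper.
sample = (
    "Leprosy is commonly treated with dapsone and rifampicin. "
    "Tuberculosis remains widespread. "
    "Some strains of tuberculosis are resistant to rifampicin."
)

for sentence, diseases, drugs in find_cooccurrences(sample):
    print(diseases, drugs, "->", sentence)
```

The sketch only finds candidate sentences. Deciding whether a hit means "treated by" or "resistant to" is the step the project hands to a human reviewer.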

Charles Matthews, Wikimedian in Residence at ContentMine, described the project as “an industrial scale version of the Find tool. Ctrl F gone mad.”

The first part of the project is to identify around 30,000 Open Access articles as a sample to work on. To do this, the team needs the help of the Wikimedia community, and especially of medical professionals who know which articles in the literature would be most useful. Right now the list has only around 3,000 to 4,000 articles and is dominated by studies on infectious diseases.

Wikidata query visualised as a bubble chart showing breakdown of the Focus List and the diversity of subject areas represented.

In particular, the project is interested in neglected diseases. These are diseases that attract little research aimed at eradicating them, because they often affect people in the world’s poorest countries. One example is leprosy, which is nowhere near being wiped out (unlike Guinea worm disease). A single-country study of leprosy that is specific about treatment, including which drugs are used in that country, would be useful as a reference. You can watch a video on Commons explaining neglected diseases here.

This means that there’s a systematic bias in the medical literature: rich people get more research on their diseases, and it’s important that we don’t simply reproduce this bias on Wikipedia.

So Science Source is developing a Focus List of medical literature with a concentration on neglected diseases, and needs the help of medical professionals to suggest good research papers in this area.

To add an article to this list, you need to add a particular statement to the Wikidata item for the research article. The workflow is as follows:

  1. Find the DOIs of the papers you think should be on this list. For example, pick the top three papers in a particular field and note their DOIs.
  2. Go to the Resolver tool on the Wikidata page for the project and use it to find the Wikidata item for each paper.
  3. Add a statement to the item using the property ‘on focus list of Wikimedia project’ (P5008), with ScienceSource (Q55439927) as the value.
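For anyone handling more than a few papers, the steps above can be sketched in Python. This is an illustration under stated assumptions, not part of the project's documented workflow: it assumes you look up items through the Wikidata Query Service using the DOI property (P356) instead of the Resolver tool, and that you add statements in bulk with QuickStatements in its tab-separated format. The function names are hypothetical, the DOI is a placeholder, and Q4115189 is the Wikidata Sandbox item, used here so the example does not point at a real paper. The code only builds text to paste into those tools and makes no edits itself.

```python
FOCUS_LIST_PROPERTY = "P5008"      # 'on focus list of Wikimedia project'
SCIENCE_SOURCE_ITEM = "Q55439927"  # ScienceSource
DOI_PROPERTY = "P356"              # Wikidata's DOI property


def doi_lookup_query(doi):
    """Build a SPARQL query that finds the Wikidata item with this DOI."""
    # Assumption: Wikidata conventionally stores DOIs in upper case.
    normalised = doi.strip().upper().replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:{DOI_PROPERTY} "{normalised}" . }}'


def focus_list_statement(qid):
    """Build one tab-separated QuickStatements line for step 3."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"{qid}\t{FOCUS_LIST_PROPERTY}\t{SCIENCE_SOURCE_ITEM}"


# Placeholder DOI and the Wikidata Sandbox item, for illustration only.
print(doi_lookup_query("10.1000/example"))
print(focus_list_statement("Q4115189"))
```

The first line of output can be pasted into the Wikidata Query Service to find the item, and the second is the statement from step 3 in a form suitable for a batch edit.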

Science Source runs on wiki, so anyone can take part, whereas WikiFactMine was an API that required contributors to be developers. So we need the Wikimedia community to help out, and especially medical professionals who are also Wikimedians.

If you know of any medical professionals who would be interested in helping with this important project to improve the quality and reliability of Wikipedia’s medical articles, please tell them about it, or get in touch with Wikimedia UK or ContentMine for more information about how you can help. There’s also the ‘Facto Post’ mailing list, which you can sign up for on wiki to get updates about the project.
