Science Source is a new project by ContentMine that builds on their previous work on WikiFactMine, which aimed to ‘make Wikidata the primary resource for identifying objects in bioscience’. This time, they want to automate the process of scanning biomedical research for statements that would be useful for improving Wikipedia’s medical articles. They summarise the approach on the grant page for the project:
The ScienceSource platform will be a collaborative MediaWiki site. It will collect and convert up to 30,000 of the most useful Open Access medical and bioscience articles.
We will work with two Wikimedia communities (Wiki Med and WikiJournal) to develop machine-assisted human-reviewing. The wiki platform will facilitate the decision-making process, driven by the human reviewers.
Articles will be annotated with terms in WikiFactMine (WFM) dictionaries. In this project, those dictionaries will include, for example, diseases, drugs, genes. This not only means that the useful terms are highlighted, but they are also linked to entries in Wikidata and therefore to any relationship that is described in Wikidata. Thus “aspirin” links to d:Q18216 with synonyms, disease targets, chemistry, etc.
Once papers have been converted to standard HTML, so that they share a single format for text mining, they can be tagged, for example with all the cancers in red and the drugs in blue. With a decent visualisation you can see when the red and blue are close together: this is called a co-occurrence. Now you can ask a human to decide what the sentence is saying. Is the cancer known to be treated by this drug, or is it resistant? This is the relationship you’re looking to express in Wikidata terms.
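To make the idea of a co-occurrence concrete, here is a minimal sketch in Python. It is not ContentMine's code: the tiny dictionary, the function names and the sample text are all invented for illustration, and the only Wikidata ID used is the one for aspirin quoted above. It simply flags sentences where a drug term and a cancer term appear together, which is the point at which a human reviewer would be asked what the sentence actually says.

```python
import re

# Hypothetical mini-dictionary: term -> (category, Wikidata ID).
# Only the aspirin ID comes from the text above; the other is left unset on purpose.
DICTIONARY = {
    "aspirin": ("drug", "Q18216"),
    "colorectal cancer": ("cancer", None),
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence):
    # Return every dictionary term that appears as a whole word or phrase.
    lowered = sentence.lower()
    found = []
    for term, (category, qid) in DICTIONARY.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, category, qid))
    return found


def co_occurrences(text):
    # Keep only sentences that mention at least one drug and one cancer.
    hits = []
    for sentence in split_sentences(text):
        terms = find_terms(sentence)
        categories = {category for _, category, _ in terms}
        if {"drug", "cancer"} <= categories:
            hits.append((sentence, terms))
    return hits


# Invented sample text, not taken from any real paper.
sample = (
    "Aspirin is widely used for pain relief. "
    "Regular aspirin use was studied in patients with colorectal cancer."
)

for sentence, terms in co_occurrences(sample):
    print(sentence)
    for term, category, qid in terms:
        print("  ", category, term, qid)
```

The sketch stops at finding the sentence. Deciding what relationship it expresses is left to the reviewer, as described above.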
Charles Matthews, Wikimedian in Residence at ContentMine, described the project as “an industrial scale version of the Find tool. Ctrl F gone mad.”
The first part of the project is to identify around 30,000 Open Access articles as a sample to work on. To do this, they need the help of the Wikimedia community, and especially medical professionals who know which articles in the literature would be most useful. Right now the list only has around 3,000 to 4,000 articles and is dominated by studies on infectious diseases.
In particular, the project is interested in looking at neglected diseases. These are diseases that attract little research aimed at eradicating them, because they mostly affect people in the world’s poorest countries. One example is leprosy, which is nowhere near being wiped out (unlike Guinea worm disease). A single-country study of leprosy that is specific about treatment, and about which drugs are used to treat it in that country, would be useful as a reference. There is also a video on Commons explaining neglected diseases.
This means that there’s a systematic bias in the medical literature – rich people get more research on their diseases, and it’s important that we don’t simply reproduce this bias on Wikipedia.
So Science Source is developing a Focus List of medical literature with a concentration on neglected diseases, and needs the help of medical professionals to suggest good research papers in this area.
In order to add articles to this list, you need to add a particular property to the Wikidata item for the research article. The workflow is as follows:
Find the DOIs of papers you think should be part of this list. For example, pick the top three papers in a particular field and note their DOIs.
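Once you have a DOI, you need to find the matching Wikidata item before you can edit it. One way to do that is a SPARQL query on the DOI property (P356). The sketch below only builds the query text, which you could paste into the Wikidata Query Service. The function names and the example DOI are made up for illustration, and the uppercasing step assumes the usual Wikidata convention of storing DOIs in upper case.

```python
def normalize_doi(raw):
    # Strip common prefixes so that only the bare DOI remains.
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Assumption: DOIs are stored in upper case on Wikidata.
    return doi.upper()


def doi_lookup_query(raw_doi):
    # Build a SPARQL query that finds the item whose DOI (P356) matches.
    doi = normalize_doi(raw_doi)
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = doi.replace("\\", "\\\\").replace('"', '\\"')
    return 'SELECT ?item WHERE { ?item wdt:P356 "%s" . }' % escaped


# Placeholder DOI, not a real paper.
print(doi_lookup_query("https://doi.org/10.1234/example.5678"))
```

If the query returns no item, the paper does not yet have a Wikidata entry under that DOI and would need one before it can be added to the list.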
Science Source runs on a wiki where anyone can participate, whereas WikiFactMine was an API that only developers could use. So we need the Wikimedia community to help out, and especially medical professionals who are also Wikimedians.
If you know of any medical professionals who would be interested in helping with this important project to improve the quality and reliability of Wikipedia’s medical articles, please tell them about it, or get in touch with Wikimedia UK or ContentMine for more information about how you can help. There’s also the ‘Facto Post’ mailing list, which you can sign up for on wiki to get updates about the project.