Wikisym 2011 Report

From Wikimedia UK

The Road to Digital Common-Pool Resources for All

With sustained interest from the corporate, civic and academic sectors, the wiki spirit has evolved from the simple idea of open collaboration into an array of real-world applications that use the content and tools surrounding the common resource of the global Wikipedia project.

This is a report written by two researchers from the Oxford Internet Institute, Han-Teng Liao and Dr. Mark Graham. With a travel award generously provided by Wikimedia UK, the authors attended Wikisym 2011 and visited the Wikimedia Foundation. The aim of the report is to provide a narrative that selects some of the work by researchers and practitioners to highlight the global and public impact of the Wikipedia project.


Launched in 2001, the Wikipedia project has attracted volunteers around the world who contribute content, computer code and financial support to maintain and grow "the free encyclopedia that anyone can edit". Celebrating its tenth anniversary this year, Wikipedia as a global project has a broad and deep impact beyond being just an "online encyclopedia." It has triggered the establishment of various institutions, hosted more data, inspired many research ideas, and served as the basis on which various partnerships have formed to sustain and increase the positive impact of Wikipedia ideals and practices (loosely called the "wiki-spirit") around the world. A conference dedicated to wiki research and practice, WikiSym (International Symposium on Wikis and Open Collaboration), has also been one of the most important annual venues for participants to reflect on, exchange and experiment with new ideas and experiences surrounding wikis and the open collaborative wiki spirit.

Institutions and Participants

The continuing interest from civic, corporate and academic institutions in Wikipedia was confirmed by the three keynote speeches.

The opening keynote speaker was the CEO of Creative Commons, Ms. Cathy Casserly, who was previously the Director of the Open Education Resource (OER) Initiative at The William and Flora Hewlett Foundation, another civic organization, which invests more than $100 million "to harness the efficiency and effectiveness of knowledge sharing worldwide." Creative Commons is a civic organization that has been the key partner for the global Wikipedia project in content copyright (or, to be precise, copyleft) licensing.

The invited keynote speaker for WikiViz (the Wiki-Visualization contest), Dr. Jeffrey Heer, is a Stanford professor whose research aims at making sense of large data collections by investigating the perceptual, cognitive, and social factors involved. Heer's academic work in building visualization techniques and tools helps data analysts from different sectors.

The closing keynote speaker, Dr. Ed Chi, is a Staff Research Scientist at Google who uses Wikipedia as one of the major cases of Social Computing for what he called "Model-Driven Research". As one of the major people working on Google Plus, he compared online social systems such as Google Plus, Twitter, Delicious, and Wikipedia with the model he proposed.

In addition to the above sample of keynote speakers and their institutions, which have used Wikipedia as a source of inspiration, data and innovation, the Wikimedia Foundation has also in recent years expanded its staff to engage the global community and modernize its data and technical infrastructure. Since the 2011 Wikisym meeting was hosted in Mountain View, near the office of the Wikimedia Foundation in San Francisco, many staff members of the Wikimedia Foundation were present at Wikisym, including Mani Pande, Nimish Gautam, and Dario Taraborelli.

With the generous support of Wikimedia UK and the Wikimedia Foundation, the authors of this report were given the opportunity to visit the Wikimedia Foundation, present their own research and have a series of conversations with staff from different departments, including Ms. Sue Gardner, the Executive Director of Wikimedia. A more detailed summary of the Wikimedia Foundation visit is presented later in this report.

The wide interest in understanding wiki research and practice was also reflected by the institutions that sent delegates to or sponsored Wikisym 2011. In addition to academic institutions from all over the world and across disciplines, companies such as Yandex (the dominant search engine company in Russia) and a New York-based company that aggregates wiki Q&A and a free online dictionary had employees attending. A list of self-submitted contact information can be found in the participant list. The authors of this report were the only UK-based participants at the conference.

Data: Missing, Comparing, Visualizing

After ten years of exponential growth, the global Wikipedia project is no longer a start-up textual project, and the vast amount of social interaction with its textual and non-textual data need not be limited to web page viewing and editing. An important trend for both researchers and the broader community of Wikipedians, confirmed during the Wikisym event and the Wikimedia Foundation visit, is that there are increasing demands for, and tangible benefits to, engaging with the data that Wikipedia and its sister projects have accumulated in the past ten years. While the encyclopedia content remains an important and significant part of the global Wikipedia project, the documented editing trails, the collected traffic data, and different ways of engaging with the data sets (which include content pages, talk pages, community pages, web use logs, donations etc.) will continue to be critical for the ongoing growth of Wikipedia-related activities. In short, a better understanding of the data can help to identify what is missing, compare what is different, visualize what has happened and address what can be done.

Missing Data and Missing Values

Arguably, no project at the conference engaged with Wikipedia data in a more compelling way than the winner of the WikiViz 2011 contest. A data challenge issued jointly by Wikisym and the Wikimedia Foundation, the WikiViz 2011 competition aimed to visualize the impact of Wikipedia beyond the scope of its own community. The winner, Ms. Jen Lowe, presented her work titled "A Thousand Fibers Connect Us: Wikipedia's Global Reach" at Wikisym 2011. Combining the open data provided by the Wikimedia Foundation and the World Bank, her interactive visualization allows users to explore and identify what is missing in the global participation in Wikipedia projects.

Ms. Jen Lowe combined open-source tools (R for data cleaning and Processing for visualization) to let users and researchers explore the readership of different Wikipedia language versions by country, and to compare countries with high or low levels of internet access. Her goal is to connect the world and the world of Wikipedia. Her passionate speech highlighted the power of visualization to persuade, not just to present the data as is. She made a case for visualizing missing values and missing connections, and for their potential to highlight what needs to be done:

"I think that visualization is amazing for its ability to force us to see what's missing; to see the missing values in a collection of data. ... I find that visualization trains my mind to notice what's missing ... The more I do visualization work, the more I notice who's missing, not just globally, but personally."

WikiViz 2011: Screenshot of the winning entry


Jen Lowe, Wikiviz winning entry presentation, Wikisym 2011

Jen Lowe presenting her visualization at WikiSym

Missing Participation: Gender Imbalances

An important part of the conference was the attention paid to gender imbalances in the encyclopedia, in terms of both content and editors. Two papers in particular demonstrated that gender imbalances not only exist, but also significantly influence the types of information that exist in Wikipedia (the papers were titled 'An Exploration of Wikipedia's Gender Imbalance' and 'Gender Differences in Wikipedia Editing').

An excellent example of these imbalances (also raised by Jen Lowe) is the Wikipedia article on Feodor Vassilyev. His wife holds the record for the most children born to a single woman, and yet it is Mr. Vassilyev and not Mrs. Vassilyev who is deemed notable enough to have a Wikipedia article. Given that the Vassilyevs lived in the eighteenth century, the masculinist bias in the historical record is perhaps not surprising. What is more important, however, is for contemporary information creators on Wikipedia to become aware of such biases and actively work not to reproduce them. In other words, the issue is not just a lack of female editors, but also the gender biases embedded in the ways in which we discuss and represent subjects in Wikipedia.

Shaping Participation: Factors of Language and Geography

Some of the most revealing and fascinating work discussed at the conference was produced by Paolo Massa. Two of his tools should be of great use to the Wikipedia community.

The first, Manypedia, allows anyone to do a side-by-side comparison of the same article in different language versions of Wikipedia. This tool will be invaluable for people writing about topics on which radically different opinions can congeal around linguistic practices (such as articles about the Middle East in the Arabic, English, and Hebrew Wikipedias). Such a tool can potentially go a long way towards working around the confirmation biases and assumptions embedded in much of what is written in the encyclopedia.

The second, Wikitrip, allows people to peek into the background of editors of particular Wikipedia pages. By typing in the name of any article, the tool displays a map of the location of all anonymous edits to that page and a breakdown of the genders of editors. While Wikipedia in theory allows edits from anyone, anywhere, this tool is crucially important for demonstrating the very real way in which participation in the encyclopedia can come from a very select group.

The Visit to Wikimedia Foundation

A highlight of our trip to Wikisym was the opportunity to visit the Wikimedia headquarters in San Francisco. After exchanging ideas with Wikimedia staff who worked in different areas such as community development, global outreach and technical support, both Mark Graham and Han-Teng Liao gave talks on their ongoing Wikipedia-related research. The Executive Director of Wikimedia, Ms. Sue Gardner, was present throughout the talks.

Mark's talk on "Wiki-related research" focused on the vast inequalities in representation in the encyclopedia. Not only are some parts of the world covered by much denser layers of representation in Wikipedia (see for instance the figure below), but it is likely that even the parts of the world that are poorly represented have many articles created by editors in Europe and North America. This work is part of an ongoing project in East Africa, North Africa, and the Middle East to examine issues of participation, representation and voice in the Arabic, English, French, Hebrew, Persian and Swahili Wikipedias.

Geotagged articles in English Wikipedia

Geotagged articles in English Wikipedia. More maps are available from Mark Graham's blog.

Han-Teng's work discussed the similarities and differences in editing, content and reception between Chinese Wikipedia and its major competitor Baidu Baike, and why they matter for Chinese-language Internet users. For example, the webometric data shows that Chinese Wikipedia is the most visible website for almost all Chinese-speaking regions except for the users of Yahoo! and Baidu in mainland China, whereas Baidu Baike is visible for Google across all regions and relatively less visible for Yahoo! in Taiwan. Both are much more visible than all the Chinese government websites and Taiwanese educational websites combined. Chinese Wikipedia and Baidu Baike dominate the search engine result pages (SERPs) alternately in different Chinese-speaking regions, as shown in the network graph below, where the sizes of both arrows and nodes indicate the estimated click-through rates (CTR) for the top-ten most visible websites. Han-Teng Liao further examined the role of user-generated encyclopedias as the major sites for "public goods" content provision, and their cultural and political implications for shaping an online public space for those who speak the same language.


The Most Visible Websites: Based on the top-ten search results of a sample of 2500 search terms across nine Chinese-language search engine environments

  • Note. The nine search engine environments listed cover the major search engines Baidu, Google and Yahoo! across up to four regions (CN: mainland China; HK: Hong Kong; SG: Singapore; TW: Taiwan). The search keywords for mainland China and Singapore are in simplified Chinese characters, while those used for Hong Kong and Taiwan are in traditional (orthodox) Chinese characters, so as to match the user profile accordingly.

Conclusion: Public Data Intermediary and Public Media

The 2009[1] Nobel Memorial Prize in Economic Sciences winner, Elinor Ostrom, has provided solid research and theoretical resources on the potential benefits of the economic and sustainable governance of the commons, or common-pool resources (CPR). Her examples include forests, fisheries, oil fields, grazing lands, and irrigation systems, where the way humans interact with ecosystems to maintain long-term sustainable resources is crucial. Increasing evidence from online data has pointed to the fact that the Wikipedia projects have accumulated some of the largest digital common-pool resources for all people in the world. Wikipedia is also one of the few major global platforms with high web visibility that is not owned by a commercial company. The work that Wikipedia and its sister projects are doing will thus undoubtedly shape how new types of global public knowledge are produced and become fixed. It is from this angle that we find that the ongoing efforts made by the research community at Wikisym, the global outreach program run by the Wikimedia Foundation and its local partners such as the Qatar Foundation, and the upcoming wiki data and tool modernization project "Wikidata" proposed by Wikimedia Germany in Berlin, may make some important differences in how the Wikipedia project can ultimately benefit much of humankind.