Tuesday, 22 May 2012

Monmouth becomes the world's first Wiki Town



My weekend was spent with a crowd of Wikipedia enthusiasts enjoying the smart town of Monmouth. It has a long history: it is the site of what may be the earliest Roman fort in Wales (Blestium), it has an 11th-century Norman castle, and it is the birthplace of King Henry V. It has become smarter still now that the town council has decided to become the first town in Wales to roll out free wifi. At the same time, it wants to ensure there is relevant and fun open knowledge content about the town, so it has partnered with Wikimedia UK to establish a worldwide volunteer network (my buzzword is "e-volunteers") to write interesting Wikipedia articles about everything of interest in many, many languages.

Around the town, most visibly in shop windows and on public information and direction signs, there are beautifully made plaques showing the name of an attraction and a two-dimensional barcode (a QR code). Anyone with a smartphone or tablet can scan these using applications like Google Goggles and be whisked off to a Wikipedia article about the relevant cultural attraction or profession. The really clever bit is that if your mobile device is set to a language other than English (such as Welsh, Spanish or Hungarian), the free QRpedia service automatically directs you to an article in your device's preferred language. In terms of tourism, this last bit is very interesting indeed. Rather than the council or tourist board going to the lavish expense of producing brochures in many languages to help visitors who speak neither English nor Welsh enjoy the history of the area, they can now rely on volunteers around the planet collaborating to produce engaging and constantly updated information. For me, even more exciting is that some of those tourists will update the articles (or photographs) in their own language while on holiday, as part of their own exploration of Welsh culture and history.
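The language switch can be pictured as the service reading the languages a phone's browser says it prefers and picking the best available match. Below is a minimal sketch, assuming (this post does not say so) that the preference arrives in the standard HTTP Accept-Language header. The ARTICLES table of interlanguage links and both function names are hypothetical; this is not QRpedia's actual code.

```python
# Hypothetical interlanguage links for one plaque (illustrative titles only).
ARTICLES = {
    "en": "en.wikipedia.org/wiki/Monmouth_Castle",
    "cy": "cy.wikipedia.org/wiki/Castell_Trefynwy",
    "es": "es.wikipedia.org/wiki/Castillo_de_Monmouth",
}


def parse_accept_language(header: str) -> list[str]:
    """Return primary language codes from an Accept-Language header, best first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        primary = tag.strip().split("-")[0].lower()  # "es-MX" -> "es"
        # Higher quality first; ties keep the order the browser sent them in.
        ranked.append((-quality, position, primary))
    codes = []
    for _, _, code in sorted(ranked):
        if code not in codes:
            codes.append(code)
    return codes


def choose_article(header: str, articles: dict[str, str], fallback: str = "en") -> str:
    """Pick the article in the most preferred language that exists, else the fallback."""
    for code in parse_accept_language(header):
        if code in articles:
            return articles[code]
    return articles[fallback]
```

A Welsh-language phone sending "cy-GB,cy;q=0.9,en;q=0.8" would get the Welsh entry, while a Hungarian one falls back to English because this hypothetical table has no Hungarian article.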

On Saturday morning I had fun working with other Wikipedians to meet members of the public in Monmouth Library. The library (which happens to be housed in a delightfully restored historic building) has quietly got on with its own experiment of adding QR codes to books and to the tops of shelves, so that the public can use their mobile devices to read Wikipedia articles on author biographies and topic areas (such as the history of Monmouthshire). While I was there, John Cummings (our Wikimedia "ambassador" for Monmouth) was demonstrating to the librarians a mobile application that uses GPS to overlay the world around you with pointers to Wikipedia articles and images.
Monmouth Library outreach session. CC-BY-SA Rock Drum

The afternoon had a similar event in Monmouth Museum, though I had to drag myself away to take part in the more ceremonial side of things at the historic Shire Hall. There, in my role as Chair of Wikimedia UK, I got to meet a couple of the local councillors (Stuart and Giles), the Mayor (Gerry; I noted that the office of Mayor of Monmouth dates back 750 years, quite a responsibility) and the lovely boys and girls who happen to have fun managing the libraries, museums and cultural heritage of the area. As we always seem to discover when we Wikimedians get a chance to chat with GLAM professionals (that's jargon for Galleries, Libraries, Archives and Museums), we share the same values and mission to deliver free and open access to the world's knowledge. After a signing ceremony and posing for the cameras, I had some quiet chats over cheese and biscuits to swap stories and enthusiasm. [You can see some videos from the event at http://commons.wikimedia.org/wiki/Category:Monmouthpedia_Day]

Dr Old, who has helped run the Regimental Museum for the last two decades, told me about being a new convert. One of our experienced ambassadors (Harry) had spent the previous afternoon helping the museum's volunteers put up their first QR codes inside the exhibition, making it "mobile interactive" for the first time, and running an in-house training session answering their questions about how to edit and improve Wikipedia articles for themselves. I mentioned our programme over the coming year on World War I and II history, and how valuable the archives held in regimental museums such as his are as a treasure trove for future digitisation. Once digitised, they will enable people in other countries with a family connection to, or an interest in, these events to research and discover new things via the internet for the first time.

Another enthusiast was Linda Tomos, director of CyMAL (Museums Archives and Libraries Wales, part of the Welsh Assembly Government), who is responsible for the Welsh cultural programme and its funding. She was interested to hear about our progress with the British Library and The National Archives, and she had some lovely case studies to consider on how open knowledge principles should apply to her projects for the 3D digitisation of Welsh historic artefacts and the mapping and documentation of preserved historic buildings. As Linda is heavily involved in the funding side, I suggested we work together on promoting a set of shared values for access, preservation and openness that might become expected criteria for funding public projects. It's the sort of thing our on-wiki collaboration works well for, so I have knocked up a stub at Open Knowledge manifesto for Wales for anyone interested to help develop.


Back home in leafy London on Monday, I got a call from a journalist I had bumped into on Saturday, asking for a chat at Bush House. My radio interview about Monmouthpedia went out on the 22nd on the BBC World Service's World Update programme. Considering the massive international press interest in Monmouth becoming the first Wiki Town, it will be fascinating to find out who gets to be the first Wiki City, Wiki State or Wiki Country. I will be happy to volunteer for the judging panel; just remember that I'm an unpaid volunteer, so someone had better offer me a cup of tea and a biscuit. :-)

Wednesday, 2 November 2011

UK copyright is about to change, did you know?

I joined the parliamentary information technology forum on 1st November for a discussion of the Hargreaves Review and to hear some expert evidence for and against it in the context of data mining.[1][2] I attended as a non-expert, but as a board member of Wikimedia UK I was interested in judging its possible impact on our future activities.

Hargreaves was commissioned by the Prime Minister to examine the issues with intellectual property (IP) and recommend changes, which the government has now committed to adopting.[3] My layman's understanding is that Hargreaves' analysis was from an almost entirely economic perspective. It focuses on how a future trading marketplace for digital IP licensing could encourage UK business, on making it easier (supposedly guaranteed) to identify the owner of any possible copyright, and on dealing with orphan works and simplifying license terms.

Possibly good things:
  • Though the recommendations avoid an equivalent of the USA's fair use, the changes would provide a UK exception for data mining, allowing entire transient copies of works to be made for automated analysis; at present this is in most cases considered to break copyright. This would enable significant areas of research which are currently hampered in the UK by the need to negotiate specific data-mining deals with publishers. The CIO of Nottingham University gave a good example of the currently impossible task: having already paid £5 million to publishers for access rights to academic databases, the university then has to try to negotiate separate terms and additional charges with each publisher for the unclear area of data mining.
  • Simplifying license terms would benefit everyone, particularly as the recommendation is that contract terms should not be allowed to override copyright exceptions; for example, JSTOR's contract terms limiting users to non-systematic use would no longer be enforceable in the UK where they conflicted with the new exceptions.

Possibly bad things:
  • A number of publishers spoke out against the report, including the CEO of the Publishers Association. Their concerns include that allowing exceptions for data mining would introduce a risk of their databases being insecurely mirrored in other countries, and that the changes would reduce the benefits of their acting as a "maitre d'" for access to copyright material.
  • From a Wikimedia cultural perspective, the formation of a digital trading market would tend to default to allowing non-commercial use only, making more material impossible to use on our projects. In the long term it would also reduce the likelihood that digital collections could be used under a "no known copyright" rationale, as such material would more likely be exchanged on the basis of speculative future monetization, ensuring it always has a declared copyright owner.
  • Hargreaves proposes a system for dealing with suspected orphan works, in which any work not found on the national database would be licensed for reuse as an orphan work. Unfortunately, and somewhat bizarrely considering there is no copyright claim, the report suggests a "nominal" charge for use; presumably this would have the unintended consequence that no UK orphan works could ever be used on Wikimedia projects.

    Conclusion
    There was time for social chat after the main meeting and I got to meet some interesting folks from the Pirate Party[5] as well as copyright experts. As the recommendations are just that, it is hard to say how firmly they will be adopted or implemented. In the case of Wikimedia, we can already side-step many issues in terms of how UK law might affect our projects. However, if cultural institutions (such as the BFI and British Library) default to using the suggested "research use only" restrictions for digital archives, this may impose arbitrary restrictions that lock out the reuse we would like to see, such as hosting on Wikimedia Commons, Wikisource, etc. If the recommendations turn into firm proposals, we may need to help some of our partners consider the impact of changes to their policies on long-term public access and open knowledge.

    Links
    1. http://www.pictfor.com/sample-page/
    2. http://www.pictfor.com/2011/10/hargreaves%e2%80%99-review-%e2%80%93-data-analytics-text-and-data-mining/
    3. http://www.ipo.gov.uk/ipresponse
    4. http://www.ipo.gov.uk/ipreview.htm (The full report is available for download here)
    5. http://en.wikipedia.org/wiki/Pirate_Parties_International
    6. Event: Hargreaves’ Review – Data Analytics / Text and Data Mining 01/11/11

    Friday, 1 July 2011

    Establishing the Wikimedia GLAM e-volunteer network



    The background

    In the UK we have seen rapid growth in the number of relationships with museums and other institutions. Starting with the British Museum engaging Liam Wyatt as the first ever "Wikimedian in Residence", we have evolved the concept of a "GLAM Ambassador", and institutions are starting to discover the benefits of engaging with a network of Wikipedian/Wikimedian "e-volunteers".


    Institutions such as museums and archives have found the concept of a Wikimedian in Residence easy to understand, as it provides a single point of contact delivering a project with defined outcomes and a budget. Within a matter of months the concept has become highly popular with several major institutions in the USA, and similar positions (such as “Wikimedia Outreach Ambassador”) have been created and are being proposed in the UK.


    The same institutions have had mixed success in engaging with e-volunteers. As a resource, e-volunteers are hard to characterize, so it can be tricky to identify the most effective ways to communicate with them.


    A quick taxonomy of e-volunteers

    Type | Characterization | Communication
    Volunteer with a laptop | Engaged with collections or archives and a repeat visitor who is prepared to help. | Visitor events and notices
    Community seeker | Enjoys meeting others to talk about the collections and collaborate on improving their presence on the internet. | Visitor events, editathons, word of mouth, notices on institution websites or Wikipedia or email discussion groups and forums, press interest
    High contribution Wikimedian | Passionate about contributing to Wikipedia and other Wikimedia projects for a wide range of content or technology improvement. | Wikimedia discussion groups, (Wikimedia Chapter) wiki-meets and conferences
    Freebie hunter | Enjoys coming to an event as a reason to see the collections and find out more, but may only make limited longer-term contributions. | Visitor or new contributor outreach events and notices, press interest
    Flyby contributor | May regularly provide quick fixes for vandalism or content errors, and may be a high contribution Wikimedian, but is less likely to be interested in events or editing prose. | Wikimedia discussion groups, press interest
    Lurker | Keeps an eye on the topic and may comment, but is hesitant to contribute and unlikely to attend events. | Wikimedia discussion groups, press interest
    Reader | The general public who are interested in the article, even if only for a brief time. | Press releases, being on the first page of a Google search

    UK approach

    Engaging the public
    In 2010, the British Museum's behind-the-scenes and workshop events demonstrated that if you arrange interesting-sounding events at well-known institutions, people will turn up. Of the roughly 30 people who came to these events, approximately one third could be classed as one of the first three (most useful) types of contributor in the table above. Based on later editathons and other types of event, a rule of thumb is to plan for 20% to 30% of those who take part in a widely advertised public event to carry on contributing and helping as e-volunteers in the long term.
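The 20% to 30% rule of thumb turns into simple planning arithmetic for event organizers. A small sketch follows; the function names are hypothetical, and the percentages are the post's rough figures rather than measured constants. Whole-number percentages and integer division keep the results exact (expected numbers are rounded down, attendee targets rounded up).

```python
def expected_volunteers(attendees: int, low_pct: int = 20, high_pct: int = 30) -> tuple[int, int]:
    """Rough (low, high) range of attendees likely to keep contributing long term."""
    return attendees * low_pct // 100, attendees * high_pct // 100


def attendees_needed(target: int, low_pct: int = 20, high_pct: int = 30) -> tuple[int, int]:
    """Attendees to aim for to end up with `target` long-term volunteers: (optimistic, cautious)."""
    # Ceiling division on integers: -(-a // b) == ceil(a / b).
    optimistic = -(-target * 100 // high_pct)
    cautious = -(-target * 100 // low_pct)
    return optimistic, cautious


print(expected_volunteers(30))  # (6, 9)
print(attendees_needed(10))     # (34, 50)
```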

    A key observation is that these events are "as cheap as chips": there are few costs for a host institution besides ensuring there is free wifi and providing a suitable space for volunteers to plug in their own laptops and talk.

    Forming a community hub
    Once we know who the long-term contributors are, they can be considered the core of this particular e-volunteer community, and many have added their names as members of Wikipedia collaboration pages to ensure they don't miss out on future activities or requests for help from curators.

    Self organizing
    In June 2011 the UK GLAM task force was started to enable more experienced volunteers to help support activities and communications. Last month saw the first GLAM networking event (GLAMcamp London) which was an opportunity for current and future Wikimedia GLAM Ambassadors to share stories of successes and workshop ideas for how UK partnerships with cultural organizations can be established and successfully maintained. In parallel, we are busy organizing relationships in Scotland so that we can sustain a community of e-volunteers with a distinct Scottish identity. The intention is for the Wikimedia UK Chapter to be able to offer a little help when needed with supporting infrastructure (particularly communications) and managing some of the formal aspects of organizational relationships but for GLAM projects to be led and implemented by Wikimedia e-volunteers without the need for any direct project management or central authority.

    Conclusion

    We are only in our first year, so we are still feeling our way around the e-volunteer concept, in particular balancing what can be expected of volunteers against what ought to be put in place by host institutions or Wikimedia UK (such as official press releases or the logistics for larger events). A number of case studies are "banked", and these are a solid basis for understanding which types of event have worked and whether they delivered the outcomes we might expect, such as improved article quality, new images or an increase in new contributors. Sharing experience and finding consistent ways of handling relationships between GLAM institutions and our diverse community of volunteers are areas needing more work, but we are rapidly improving and professionalizing.

    By April 2012 (the time of the next Wikimedia UK annual general meeting) I hope to report a diverse range of productive and enjoyable relationships with institutions and, more importantly, a flourishing network of self-organizing e-volunteer communities (hubs) across all the countries of the UK. The impact for currently resource-starved but much-loved museums, galleries, archives and libraries will be significant and, in some cases, essential.



    Saturday, 4 June 2011

    User Story: Gamification and Wikimedia Commons mass upload

    Gamification is the use of game play in non-game applications.

    This is a user story of the near future for Wikimedia Commons, based on a mini-brainstorm chat in the pub with Tom Morris and Shimgray immediately after the British Library English & Drama editathon. This was also in the context of the recent workshop with selected Wellcome Trust researchers on how to release mass image donations and some of the known issues and discussions on Commons for previous mass donations.

    It is June 2012, and three types of mass donation over the last 6 months have led the Wikimedia Foundation, two major public bodies and two Chapter-coordinated volunteer groups to radically rethink the possible applications of Commons and, for the first time, to form direct partnerships with external digital libraries. All three have gamification programmes.

    1. Philately and crowd-sourcing the catalogue

    A large archive has donated a set of high-quality scans of philatelic artefacts on CD-ROM. These contain an estimated 350,000 different objects, including many known to be unique to the archive. The archive does not have the resources to catalogue these artefacts fully, but they are hierarchically organized into several main groups, including pre-1955 postage stamps from African countries, Cinderella stamps from many countries dating from 1860 to 1968, and a wide collection of airmail stamps. They were selected as being out of copyright, but this cannot be guaranteed for every image because the catalogue is incomplete.

    The volunteer teams have access to a hosted staging area for the images, and new e-volunteers from the philatelic community are encouraged to apply to help, though because of possible copyright issues registration is required. A total of 40 participants are gradually working through the collection, applying a standard set of categorization tags using a visual front end that seamlessly draws on the live Commons category structure and gives participants feedback on their progress, on the quality of their classification work (based on sample expert checks) and on the progress of the programme as a whole. Images whose copyright status is not obvious are flagged for further expert review.
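The feedback loop described above could be computed along these lines. A minimal sketch, assuming that "quality" means the share of expert-sampled images where the expert agreed with the participant's tags; the Participant record and its field names are hypothetical, and the 350,000 total is the story's own estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    name: str
    tagged: int   # images this participant has categorized
    checked: int  # how many of those an expert has sampled
    agreed: int   # sampled images where the expert agreed with the tags

    def quality(self) -> Optional[float]:
        """Agreement rate on expert-checked images, or None if nothing is checked yet."""
        if self.checked == 0:
            return None
        return self.agreed / self.checked


def programme_progress(participants: list[Participant], total_images: int) -> float:
    """Fraction of the whole collection categorized so far (capped at 1.0)."""
    done = sum(p.tagged for p in participants)
    return min(done / total_images, 1.0)


team = [Participant("volunteer-a", 1200, 40, 36), Participant("volunteer-b", 800, 0, 0)]
for p in team:
    print(p.name, p.quality())
print(programme_progress(team, 350_000))
```

Returning None for an unchecked participant keeps "no feedback yet" distinct from a genuinely poor score.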

    By April 2012 over 80,000 images have been released to the public on Commons, and the same images and metadata support the archive's public catalogue (which also includes the copyrighted images not released on Commons). Based on the current burn-down rate, all 350,000 images will be fully catalogued by the end of July, and the archive is now considering planning a collaborative digitization programme for a further 200,000 objects.

    The wider philatelic community has already recognized Commons as a reference resource and an external organization has used the Commons API to produce a simplified front end which can be used as an integrated part of their established on-line catalogue.

    2. Political history and identification

    A small cultural archive has co-funded a volunteer programme to digitize a number of political history artefacts and upload them to Commons. Though the majority of these date from the last 70 years, they are being released on a no-known-copyright basis. The programme is ongoing, and 3,000 high-quality images have been released out of an estimated 40,000.

    A key classification challenge has been identifying a number of professional photographs donated from a magazine archive covering the 1970s to the 1990s. They mostly show people interviewed by the magazine, so all had consent for release, but it is unknown whether the photographs were used in the final print, and they carry no identification or dates.

    Images are uploaded to Commons on the same day they are scanned. Using a system of barnstars, the wider community has been encouraged to compete to classify the backlog of new images, and an optional user script has been created to help compare them with related images already on Commons and to cross-match them automatically against candidate matches found by TinEye on the internet.

    The attribution on every Commons image page links back to the donor archive's website, and as a result the archive has seen a large increase in access requests as the images have gradually been put to use in nearly all of the 300 available language editions of Wikipedia. The archive has put forward a funding plan for a paid intern placement to support the ongoing digitization and sharing programme.

    3. Medical research and funware

    A large research body has established an image-sharing programme in partnership with four other image services, including Commons (in practice the Wikimedia Foundation and the UK chapter), to provide an image clearing service for mass donations of research images. All five organizations have access to the image staging area, and planned mass donations must meet minimal upload criteria: known copyright status (some images may be restricted to non-commercial use), EXIF data with complete originator information, and associated "at the bench" description data for the majority of uploads. Images hosted on the clearing service are expected to persist for at least a year at a time before being deleted.

    Any or all of the five participating organizations are free to host the images. Because some donations are very large (over 10,000 images in a batch), each major donation type has required Commons community discussion and planning, though only fully copyright-free images are considered for Commons. Not all of the donations have been judged likely to be of general educational value.
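The upload criteria above amount to a per-image check plus a batch-level check. A minimal sketch under stated assumptions: the license labels, the Donation record and the function names are hypothetical; it assumes the originator is recorded in the standard EXIF Artist tag; and "majority" is taken to mean more than half of the batch.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical license labels; the story does not name specific licenses.
COPYRIGHT_FREE = {"public-domain", "cc0"}
OTHER_KNOWN = {"cc-by", "cc-by-sa", "cc-by-nc", "cc-by-nc-sa"}


@dataclass
class Donation:
    filename: str
    license: Optional[str]  # None = copyright status unknown
    exif: dict = field(default_factory=dict)
    bench_description: str = ""


def meets_minimal_criteria(d: Donation) -> bool:
    """Known copyright status plus originator information in the EXIF data."""
    known_status = d.license in COPYRIGHT_FREE or d.license in OTHER_KNOWN
    return known_status and bool(d.exif.get("Artist"))


def batch_ready(batch: list[Donation]) -> bool:
    """Every image meets the minimal criteria and most carry a bench description."""
    if not batch or not all(meets_minimal_criteria(d) for d in batch):
        return False
    described = sum(1 for d in batch if d.bench_description.strip())
    return described * 2 > len(batch)


def commons_candidates(batch: list[Donation]) -> list[Donation]:
    """Only fully copyright-free images go forward for Commons."""
    return [d for d in batch if meets_minimal_criteria(d) and d.license in COPYRIGHT_FREE]
```

Keeping the clearing-service check separate from the Commons filter mirrors the story: a non-commercial image can still be shared by the other four services even though Commons will not take it.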

    Separately from the main Commons site, a funware rating system (cheezburger), which can run on mobile devices, has been developed to help break down categories containing over 400 images. As well as helping to tag images with suitable subcategories, players can subjectively compare images for interest and quality. As a result of this work, the main Commons interface includes options for these categories to display recommended images from larger sets. This has been successful enough to be extended beyond the medical research image sets to gamify all large categories on Commons, and has proved particularly popular with a range of "soft" categories such as "1960s" and "cats".
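The story leaves open how players' pairwise "which is better?" choices become a list of recommended images. One common technique, swapped in here purely for illustration, is an Elo-style rating: each comparison moves the preferred image up and the other down, more so when the result is a surprise. The function names, the K-factor of 32 and the starting rating of 1500 are assumptions, not part of any real Commons feature.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability (under the Elo model) that image A is preferred over image B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def rank_images(comparisons: list[tuple[str, str]], k: float = 32.0, start: float = 1500.0) -> list[str]:
    """Turn (preferred, other) pairs into a best-first list of images."""
    ratings: dict[str, float] = defaultdict(lambda: start)
    for winner, loser in comparisons:
        surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
        ratings[winner] += k * surprise  # bigger gain for an unexpected win
        ratings[loser] -= k * surprise   # the other image drops by the same amount
    return sorted(ratings, key=lambda name: ratings[name], reverse=True)


votes = [("cat1.jpg", "cat2.jpg"), ("cat1.jpg", "cat3.jpg"), ("cat2.jpg", "cat3.jpg")]
print(rank_images(votes)[:2])  # ['cat1.jpg', 'cat2.jpg']
```

Taking the top few names from the ranked list gives the "recommended images" for a large category.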

    Thursday, 2 June 2011

    WMF in 2016 - a User Story

    Here's a sci-fi user story, with the Wikimedia Foundation in mind, to illustrate the importance of a verified backup process and archive policy. It's not a prediction of the future!

    In 2016 San Francisco has a major earthquake, and the WMF's servers and operational facilities are damaged beyond repair. The emergency hot switchover to Hong Kong is delayed by an ongoing DoS attack from Eastern European countries. The switchover eventually appears successful, and data is synchronized with Hong Kong for the next 3 weeks. At the end of those 3 weeks, amid a massive raft of escalating complaints about disappearing images, it is realized that the images are vanishing as local data caches expire. The DoS attack covered the tracks of a passive data worm that only activates during backup cycles, and the loss is irrecoverable because backups older than 2 weeks are automatically deleted. Due to the lack of an operational archive strategy, it is estimated that the majority of digital assets (including over 80 million photographs) have been permanently lost, and estimates for a 60% partial reconstruction from remaining cache snapshots and independent global archive sites run to over 2 years of work.
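The failure in this story comes down to a retention window (2 weeks) that is shorter than the time it took to notice the problem (3 weeks). A toy model makes the point. It assumes one backup per day, numbered relative to the switchover (day 0), and that the worm corrupts every backup from the day it activates; the function name and parameters are hypothetical.

```python
def clean_backups_available(backup_days, infection_day, detection_day, retention_days):
    """Backups that are both uncorrupted and not yet purged when the loss is noticed."""
    return [
        day
        for day in backup_days
        if day < infection_day                     # taken before the worm began corrupting backups
        and detection_day - day <= retention_days  # still inside the retention window
    ]


daily_backups = range(-30, 22)  # one backup a day, from 30 days before the switchover to day 21

# Story settings: worm active from the switchover, noticed after 3 weeks, 2-week retention.
print(clean_backups_available(daily_backups, infection_day=0, detection_day=21, retention_days=14))  # []

# Same events with a 90-day retention policy: the newest clean backup is from the day before.
print(max(clean_backups_available(daily_backups, infection_day=0, detection_day=21, retention_days=90)))  # -1
```

In this model, any retention window shorter than the detection delay leaves nothing clean to restore, however often backups are taken.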

    Wednesday, 1 June 2011

    Mass uploading your Flickr photos to Wikimedia Commons

    Why would I want to?
    • Wikimedia Commons is the multimedia repository that shares images, audio and video with Wikipedia in 270+ languages, plus sister projects such as Wikiversity and Wikisource. Many people and organizations would like some or all of the photos they are happy to share freely on Flickr to be available on, and used by, Wikipedia. A number of large institutions already use "Flickr Commons" to share images from their digital collections, and if these are in the public domain or only require attribution, they are candidates for easy mass upload to Wikimedia Commons.
    • Many individuals have interesting collections of images on Flickr from their hobbies or holidays. With a bit of work (making sure the titles are in plain English, as these become the file names on Wikimedia Commons, tidying up Flickr tags and adding a few good descriptions), these would make great additions to Wikimedia Commons.
    • Once uploaded, you have the benefit of other "e-volunteers" helping by improving descriptions, adding categories, using the images on Wikipedia and elsewhere, and even digitally correcting poorly exposed or old, marked photographs (if they are interesting enough!)
    Do I have to do this myself?
    Example: the standard Commons Upload Wizard partway through uploading several images at once from my hard disk.
    How?
    • If you are uploading a manageable number of images (say, fewer than 50), the built-in upload features of Wikimedia Commons are excellent for loading a small batch of files from your hard disk, which you can then describe, categorize and suitably rename before committing to final publication. Try it out at http://commons.wikimedia.org/wiki/Special:UploadWizard and check the general help available. For very large numbers (say, more than 1,000) you may want to raise a request at COM:BATCH, even if you feel technically capable; without a bit of planning and discussion you might find yourself spending many tedious hours correcting license details or categorization errors that you only noticed after the upload.
    • If you are in the middle ground (a couple of hundred images on a Flickr stream), check out Flickr Mass in the case study below. It avoids any retyping of the details and tags you have on Flickr and makes some pretty good guesses for you during the process.
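The rule of thumb in these two bullets can be written down as a small decision function. The thresholds (50 and 1,000) are the post's own rough figures; the function name and the handling of a mid-sized set that is not on Flickr are assumptions for illustration.

```python
def suggest_upload_route(image_count: int, on_flickr: bool) -> str:
    """Rough guide to the upload routes mentioned above, using the post's thresholds."""
    if image_count < 50:
        return "Special:UploadWizard"
    if image_count > 1000:
        return "COM:BATCH request"
    if on_flickr:
        return "Flickr Mass"
    # Assumption: the post does not cover a mid-sized set that is not on Flickr;
    # several wizard-sized batches is one possible option.
    return "Special:UploadWizard, in several batches"


for count, flickr in [(20, False), (250, True), (5000, True)]:
    print(count, flickr, "->", suggest_upload_route(count, flickr))
```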
    Case study for Flickr Mass tool
    Last weekend I spent an afternoon trying out Flickr Mass for the first time. To use the tool you must already have an account on Wikimedia Commons (or Wikipedia) and then register on the Toolserver. After a bit of testing of the features, I went on to upload around 250 of the photos that the London School of Economics Library has on its Flickr photo stream. These were perfect for uploading automatically, as they:
    • are released under a free license suitable for Wikimedia Commons (i.e. with no non-commercial or no-derivatives restrictions)
    • have descriptive titles and full extended descriptions in the Flickr description box, and include links back to the library's original digital catalogue
    Set of photos (mostly of William Beveridge) uploaded to the new LSE Commons category.
    The process was intuitive: I could filter by words in the title, by any Flickr tag or by words in the description. A handy feature was the option to run the tool as a simulation, which shows you exactly what is going to happen before you do it for real and release images to Wikimedia Commons.

    The images uploaded extremely well; I did not have to retype any of the titles or descriptions, as these were so nicely done on Flickr. I filtered the upload on certain tag names (including "poster" and "portrait").
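The filter-then-simulate workflow described above might look like this in outline. This is an illustrative sketch, not how Flickr Mass is actually implemented: the FlickrPhoto record, the function names and the sample photos are made up, and the real upload step is left to a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FlickrPhoto:
    title: str
    description: str = ""
    tags: set[str] = field(default_factory=set)


def select_photos(photos, title_words=(), tags=(), description_words=()):
    """Keep photos matching any given title word, tag or description word (case-insensitive)."""
    wanted_tags = {t.lower() for t in tags}

    def matches(photo: FlickrPhoto) -> bool:
        title = photo.title.lower()
        description = photo.description.lower()
        return (
            any(w.lower() in title for w in title_words)
            or bool(wanted_tags & {t.lower() for t in photo.tags})
            or any(w.lower() in description for w in description_words)
        )

    return [p for p in photos if matches(p)]


def run(photos, uploader: Optional[Callable[[FlickrPhoto], None]] = None) -> list[str]:
    """With no uploader this is a simulation: it only reports what would happen."""
    report = []
    for photo in photos:
        if uploader is None:
            report.append(f"Would upload: {photo.title}")
        else:
            uploader(photo)
            report.append(f"Uploaded: {photo.title}")
    return report


stream = [
    FlickrPhoto("Portrait of a lecturer", tags={"portrait", "LSE"}),
    FlickrPhoto("Election poster", tags={"poster"}),
    FlickrPhoto("Library reading room", tags={"building"}),
]
chosen = select_photos(stream, tags=("poster", "portrait"))
print(run(chosen))  # simulation first, before passing a real uploader
```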

    A key tip is that Flickr Mass will assume that image tags on Flickr make suitable categories and so will add them as categories if the words match an existing Wikimedia Commons category. I had to do a bit of tidying up on the images to remove surplus categories of "LSE" and in many cases "glasses" (where someone had tagged every portrait of people with glasses). Obviously if the Flickr stream is yours, it would be easy to tidy up the tags and add tags that have the same name as Wikimedia Commons categories that you would like to see added.
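The tag-to-category behaviour, and the tidying it made necessary, can be sketched as follows. This assumes a tag becomes a category when it matches an existing category name case-insensitively; the real tool's matching rules may differ, and Commons category names are case-sensitive after the first letter. The function name, skip list and category set are hypothetical.

```python
def tags_to_categories(tags, existing_categories, skip=()):
    """Turn Flickr tags into Commons categories, keeping only tags that match an existing category."""
    by_lower = {c.lower(): c for c in existing_categories}
    skip_lower = {s.lower() for s in skip}
    categories = []
    for tag in tags:
        key = tag.lower()
        if key in skip_lower:
            continue  # surplus tags you do not want turned into categories
        category = by_lower.get(key)
        if category is not None and category not in categories:
            categories.append(category)
    return categories


existing = {"Portraits", "Posters", "Glasses", "LSE"}  # hypothetical subset of Commons categories
photo_tags = ["portraits", "glasses", "LSE", "beveridge"]
print(tags_to_categories(photo_tags, existing))                           # ['Portraits', 'Glasses', 'LSE']
print(tags_to_categories(photo_tags, existing, skip=("LSE", "glasses")))  # ['Portraits']
```

A skip list like this does the same job as tidying the Flickr tags before upload, which is easier when the stream is your own.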

    After uploading you may well find the gadgets HotCat and Cat-a-lot (you can find these under your Wikimedia Commons Preferences) quite useful as a time saver for re-categorizing images and changing where groups of images are located.

    You can browse the final result of my upload at http://commons.wikimedia.org/wiki/Category:LSE_Commons. In the end I spent most of a day on it, due to a steep learning curve and the time taken afterwards fixing categories. If I were to do this again with a few hundred images, I would guess it would realistically take about an hour if the Flickr stream were already well organized, and it would be much easier if I owned the Flickr stream and could reorganize the tags.

    Thursday, 26 May 2011

    Mobile Wikipedia editing is science fiction of the near future

    The following science fiction short (or User Story) is based on a conversation at JFK airport over a coffee with Maarten B:

    I am sitting in the airport and have nearly an hour to kill until my flight is due to be called, but I feel too tense to start reading my book. I have spare data allowance on my smartphone, so I use the free Meow-mobile app from the Android marketplace to log into my Wikipedia account automatically and securely. The application is pleasingly easy to use on the mobile screen and, with a tap and the odd screen flick, I check recent entries on my watchlist and revert an obvious vandal edit on one of my favourite biographical articles. Someone has raised a question on my talk page and, using my standard virtual keyboard, I make a brief note promising to look in more detail tomorrow. The edit form is a cut-down version to be mobile friendly, and signing my reply is a simple button press.

    Meow has logged me into all the sister projects for which Single User Login recognizes me, and I seamlessly flick over to Wikimedia Commons, where I note that there are no decent photos of automated airport check-in; using the mobile-friendly upload form, I add a photo taken on my phone's camera of a nearby check-in booth. The upload form remembers my preferences, so there are minimal tweaks to be made. Using the mobile version of HotCat, I add three more categories to the image. I check the shared WikiProject Architecture Geo-wishlist and note there are several interesting outstanding photo requests for my travel destination, at least one of which I think I can easily snap on the way to my meeting.

    Flicking back to Wikipedia, I launch mobile-IGLOO (which has optionally suppressed cases with large diffs as these are hard to check on a mobile screen) and spend a happy 15 minutes checking some recent suspected vandalism and revert 7 obvious cases, three of which were top of the list due to my filter preferences.

    My flight has been called for boarding, and I switch Meow to offline mode, which keeps a local cache of my watchlist articles and will intelligently try to re-sync any changes when I am back online. During the flight I plan to fiddle with a draft article in my sandbox, which I'm planning to nominate for Did You Know when I get home; its lead text needs some thoughtful rewording. As I close Meow down it displays my data usage, and I note I have used less than 500 KB of data, probably thanks to the smart way the application handles images.

    I have a glow of satisfaction at having used my spare waiting time to bust some vandalism and nudge up the quality of content on Wikipedia and Commons. Oops, I was so engrossed that I forgot to drink most of my coffee. Luckily I'm about to get a couple of free drinks on the flight.

    The current state of the art is largely limited to passively reading Wikipedia on mobile devices. If you think the fantasy of a user-friendly, Meow-mobile-style editor and vandal-buster for Wikimedia projects can become a reality in the near future, drop me a note.