Research Data Service highlights to report: August to December 2016

Research Data Service

New Research Data Service Website

The Research Data Service’s redesigned website was released in December. It is more accessible and includes new and updated content in support of RDM. The new website can be visited at http://www.ed.ac.uk/information-services/research-support/research-data-service

RDM Forum Meetings

Two RDM forum meetings were held during the autumn term (7 September and 23 November). These are part of a collaborative effort organised by Çuna Ekmekcioglu (L&UC) and Jacqueline McMahon and Ewa Lipinska (College of Arts, Humanities and Social Sciences) to invite staff from CAHSS and other Colleges and Schools to meet and discuss RDM activities and how these can be supported. Almost 25 people attended each meeting, and another meeting is scheduled for 28 March 2017.

An RDM forum SharePoint site has also been created to hold RDM resources including papers, presentation slides, workflow diagrams, guides and a collection of sample data management plans.

Visits

The Research Data Service welcomed visitors from seven universities during the autumn term, including two visits from Kyoto University.

The purpose of their visits was to learn more about the services and resources we provide in support of research data management at the University of Edinburgh. The Digital Curation Centre (DCC) and senior IS staff also participated during some of the visits, which included meetings, presentations and tours.

  • Nanyang Technological University: 3 – 4 August
  • University of Auckland: 14 September
  • Kyoto University (National Institute of Informatics): 26 September
  • Malmö University: 10 – 11 October
  • University College Cork: 21 September
  • Johns Hopkins University: 21 September
  • Kyoto University (Kyoto University Library): 26 October
  • University of Malaya: 22 November

Data Management Planning

DMPonline had 57 new registered users and was used to create 115 data management plans (DMPs); in total, 256 DMPs were created in 2016.

There were 25 data management plan consultations from August to December.

Data Management Support

MOOC and MANTRA

A total of 1,817 learners enrolled for the 5-week RDMS MOOC rolling course from August to December, with a total of 5,466 learners enrolled for the year (2016); the MOOC started in March 2016.

2016 concluded with 22,544 MANTRA sessions recorded for the year, slightly lower than in 2015, when MANTRA had 22,950 sessions.

Active Data Infrastructure

DataStore

The number of active users remained consistent throughout 2016, while the amount of data stored rose steadily. There was a natural decline over the summer break, as observed in previous years.

In 2016, activity in the College of Arts, Humanities and Social Sciences (CAHSS) stood out from the other Colleges, with a spike in usage.

DataSync

The following DataSync usage statistics were reported at the end of 2016:

  • Number of active users: 1,740
  • Number of distinct clients (IPs 2017): 5,423
  • Total DataSync storage: 3 TB
  • Number of mappings to DataStore areas: 294

Data Stewardship

Pure

In 2016, 326 Pure records for datasets were created, which surpasses the number of records created in 2014 (31) and 2015 (32).

DataShare

202 datasets were deposited into DataShare.

DataVault

DataVault closed the year with 21 deposits for 2016. There was a soft release of DataVault in February 2016, and the plan is to commit resources to DataVault so that there can be a release in mid-2017.


Introducing new support team members

Since mid-January, two new Research Data Service Assistants have joined the busy ‘virtual team’ working across divisions of Information Services to provide user support for RDM and Data Library enquiries and to quality assure DataShare submissions. You may have already come in contact with them, but a brief welcome is in order nonetheless.

Both new team members have a research background and, surprisingly, come from the same field and institution! Nevertheless, they had not met until they arrived at our offices in Argyle House for their first day of work. Diarmuid joins us full-time, commuting daily from Glasgow, and Bob works half-time, taking advantage of a short walk from home.

Diarmuid McDonnell has taught a variety of research design, data management and analysis courses across a number of Scottish universities and levels. He is proficient in the use of Stata, SPSS and SAS for research and teaching purposes and is particularly experienced in the use of administrative data for social science research, which he used for his recently completed PhD thesis at Stirling University.

Bob Sanders recently completed his PhD at Stirling University looking at the relationship between dependency and care receipt in later life. He has extensive experience undertaking quantitative research, including the routine and advanced management and statistical analysis of large-scale longitudinal data. He is capable of conducting end-to-end data preparation, management and analysis using syntax-driven commands in Stata, with experience using other statistical software packages such as SPSS and Excel.

In addition to their repository and user support work for EDINA and Data Library, they have already made unique contributions to the service. Diarmuid has revised and taught our Data Handling in SPSS half-day workshop, as well as piloted an Introduction to Statistical Literacy workshop for Humanists. Bob has joined the Data Safe Haven development project, helping to work out operational processes and user documentation, as well as giving the online MANTRA course a thorough editing job.

Robin Rice
Data Librarian and Head, Research Data Support
EDINA and Data Library


Research Data Management Forum: Third meeting – 28/03/2017

Harkening back to a bygone era of libraries, when books were printed on paper and research data management meant not accidentally burning your notes with your candle, the third meeting of the university RDM forum was held in the impressively aged Old Library in Geography’s Old Infirmary building at the end of March.

As a regular participant, I find the RDM forum a very useful platform for everyone who has an interest in supporting research data management. It is an opportunity for me to update myself on the support and services that the university has in place in this area, to ask the daft questions but get a sensible answer and, more generally, to meet others in the university who are working in the same area as me and face the same issues and challenges.

This edition of the RDM forum was no different. After a quick introduction of the participants, Cuna, leading the forum, took us through the following agenda:

  • Cuna Ekmekcioglu – RDM update
  • Dominic Tate – DataVault update
  • Pauline Ward – DataShare new features
  • Cuna Ekmekcioglu – development of Data Safe Haven

The session began with the RDM update, which went into detail about the RDM SharePoint site and some of the tools and documents that have been uploaded to it. There are some useful threads looking to collect information about the different types of data that we have, as well as guidance on recording datasets in Pure, an RDM journey flowchart and sample Data Management Plans, amongst other things. The SharePoint site can be accessed by request, and can be found here: https://uoe.sharepoint.com/sites/rdmforum (access is only for UoE staff and students).

We had updates on the existing services such as DataShare and details about the development of both DataVault and the future Data Safe Haven, a system which will allow the storage and analysis of very sensitive data. There were some discussions around the new systems and practical issues such as cost and training/guidance for the new services.

It was a very worthwhile event and I shall be looking forward to the next forum.

Michelle O’Hara
Research Data & Information Officer
School of Social and Political Science

 


EDINA’s ShareGeo Open content into DataShare

Many fascinating datasets can be found in our new ShareGeo Open Collection: http://datashare.is.ed.ac.uk/handle/10283/2345

This data represents the entire contents of EDINA’s geospatial repository, ShareGeo Open, successfully imported into DataShare. We took this step to preserve the ShareGeo Open data, after the decision was taken to end the service. Not only have we maintained the accessibility of the data but we also successfully redirected all the handle persistent identifiers so that any existing links to the data, including those included in academic journal articles, have been preserved, such as the one in this paper: http://dx.doi.org/10.1007/s10393-016-1131-y .

Similarly, should the day ever arrive when DataShare is to be closed, we would endeavour to find a suitable repository to which we could migrate our data to ensure its preservation, as per item 13 of our Preservation policy.

We were able to copy the content of almost all metadata fields from ShareGeo to DataShare. The fact that both repositories use the Dublin Core metadata standard, and that both were running on DSpace, made the task a little easier. The University of Edinburgh supports the Dublin Core Metadata Initiative. DataShare’s metadata schema can be found at https://www.wiki.ed.ac.uk/display/datashare/Current+metadata+schema setting out what our metadata fields are and which values are permitted in them.

Our EDINA sysadmin (and developer) George was very helpful in answering our questions during the discussions in which the team settled on the most appropriate correspondence between the two schemas. The existing documentation was a great help too. George then produced a Python script to harvest the data, using OAI-PMH to get a list of ShareGeo items, then METS for the metadata and bitstreams. He then used SWORD to deposit them all in DataShare.
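To give a flavour of the first harvesting step, here is a minimal sketch of listing items over OAI-PMH. It is not George's script: the function names, the base URL and the sample identifiers are made up for illustration, and the METS retrieval and SWORD deposit steps are left out. It only builds a ListIdentifiers request URL and reads the identifiers (and any resumption token) out of a response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListIdentifiers request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # A resumptionToken must be sent without metadataPrefix.
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_identifiers(xml_text):
    """Return (identifiers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.findall(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # skip records the repository has flagged as deleted
        ident = header.find("oai:identifier", OAI_NS)
        if ident is not None and ident.text:
            identifiers.append(ident.text.strip())
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:10672/51</identifier></header>
    <header status="deleted"><identifier>oai:example.org:10672/52</identifier></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

ids, token = parse_list_identifiers(SAMPLE)
next_url = list_identifiers_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL in turn, following resumption tokens until none is returned, before requesting the metadata and bitstreams for each item.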

The team took the opportunity to use DSpace’s batch metadata editing utility and web interface to clean up some of the metadata: adding dates to the temporal coverage field and adding placenames and country abbreviations to the spatial coverage field, to enhance the discoverability of the data.

For example “GB Postcode Areas” can be found using the original handle persistent identifier: http://hdl.handle.net/10672/51 as well as the new DOI which DataShare has given it – DOI: 10.7488/ds/1755. Each of the 255 items migrated to our ShareGeo Open Collection contains a file called metadata.xml which contains all the metadata exactly as it was when exported from ShareGeo itself. I have manually added placenames in the spatial coverage field (which was used differently in ShareGeo, with a bounding box i.e. “northlimit=60.7837;eastlimit=2.7043;southlimit=49.8176;westlimit=-7.4856;”). Many of these datasets cover Great Britain, so they don’t include Northern Ireland but do include Scotland, England and Wales. In this case I’ve added the words “Scotland”, “England” and “Wales” in Spatial Coverage (‘dc.coverage.spatial’), even though these are already implicit in the “Great Britain” value in the same field, because I believe doing so:

  • enhanced the accessibility of the data (by making the geographical extent clearer for users unfamiliar with Great Britain) and…
  • enhanced the discoverability of the data (users searching Google for “Wales” now have a chance of seeing this dataset among the hits).
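The placenames were added by hand, but the two forms of spatial coverage described above can be illustrated with a short sketch. The helper names here are hypothetical and this is not part of DSpace or of our migration script; it simply parses a ShareGeo-style bounding box string and appends placenames to a list of coverage values without duplicating any.

```python
def parse_bounding_box(coverage):
    """Parse 'northlimit=..;eastlimit=..;...' into a dict of floats."""
    box = {}
    for part in coverage.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore the empty piece after the trailing semicolon
        key, _, value = part.partition("=")
        box[key.strip()] = float(value)
    return box


def add_placenames(existing, extra):
    """Return the existing coverage values plus any new placenames, in order."""
    result = list(existing)
    for name in extra:
        if name not in result:
            result.append(name)
    return result


box = parse_bounding_box(
    "northlimit=60.7837;eastlimit=2.7043;southlimit=49.8176;westlimit=-7.4856;"
)
coverage = add_placenames(["Great Britain"], ["Scotland", "England", "Wales"])
```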

James Crone, who compiled this “GB Postcode Areas” data, is part of EDINA’s highly renowned geospatial services team.

Part of James’ work for EDINA involves producing census geography data for the UK Data Service. He has recently added updated boundary data for use with the latest anonymised census microdata (that’s from the 2011 census): see the Boundary Data Selector at https://census.ukdataservice.ac.uk/get-data/boundary-data .

Pauline Ward is a Research Data Service Assistant for the University of Edinburgh, based at EDINA.

Detail from GB Postcode Areas data, viewed using QGIS.

