New data analysis and visualisation service

Statistical Analysis without Statistical Software

The Data Library now has an SDA (Survey Documentation and Analysis) server and is ready to load numeric data files for access either by University of Edinburgh users only or by ‘the world’. The University of Edinburgh SDA server is available at: http://stats.datalib.edina.ac.uk/sda/

SDA provides an interactive interface, allowing extensive data analysis with significance tests. It also offers the ability to download user-defined subsets with syntax files for further analysis on your platform of choice.

SDA can be used to teach statistics, in the classroom or via distance learning, without having to teach syntax. It will support most statistical techniques taught in the first year or two of applied statistics. There is no need for expensive statistical packages or long learning curves. SDA has received the American Political Science Association’s Best Instructional Software award.

For data producers concerned about disclosure control, SDA provides the capability of defining usage restrictions on a variable-by-variable basis. Examples include restrictions on minimum cell sizes (weighted or unweighted), on the use of particular variables without being collapsed (recoded), and on particular bi- or multivariate combinations.
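To make the idea of a minimum cell size restriction concrete, here is a minimal Python sketch. It is not SDA's implementation or configuration syntax; the function names, the sample records and the threshold of 3 are all invented for illustration. It builds a two-way table of counts and blanks out any cell whose unweighted count falls below the threshold.

```python
from collections import Counter


def crosstab(rows, row_var, col_var):
    """Count records for each (row value, column value) pair."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def suppress_small_cells(table, min_cell_size):
    """Return a copy of the table with counts below min_cell_size set to None."""
    return {
        cell: (count if count >= min_cell_size else None)
        for cell, count in table.items()
    }


# Hypothetical survey records, invented purely for this example.
records = (
    [{"region": "North", "employed": "yes"}] * 3
    + [{"region": "North", "employed": "no"}] * 1
    + [{"region": "South", "employed": "yes"}] * 2
    + [{"region": "South", "employed": "no"}] * 3
)

table = crosstab(records, "region", "employed")
safe_table = suppress_small_cells(table, min_cell_size=3)

for cell, count in sorted(safe_table.items()):
    print(cell, "suppressed" if count is None else count)
```

A real disclosure control policy would usually also need to handle weighted counts and marginal totals, which this sketch leaves out.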

For data managers and those concerned about data preservation, SDA can be used to store data files in a generic, software-independent format (fixed-field format ASCII), and includes the capability of producing the accompanying metadata in the emerging DDI-standard XML format.

Data Library staff can mount data files very quickly if they are well documented with appropriate metadata formats (e.g. SAS or SPSS), depending on the access restrictions that apply to the data file. To request that a data file be made available in SDA, contact datalib@ed.ac.uk.

Laine Ruus
EDINA and Data Library


New faces at the Data Library

We are pleased to introduce two new staff members who have joined the Data Library team.

Laine Ruus has taken up a six-month post as Assistant Data Librarian, helping out during Stuart Macdonald’s productive secondment at CISER, Cornell University. Laine has worked in data management and services since 1974, at the University of British Columbia, Svensk Nationell Datatjänst, and the University of Toronto. Laine was Secretary of IASSIST for eighteen years. She received the IASSIST Achievement award upon her retirement from the University of Toronto in 2010 and the ICPSR Flanigan Award in 2011.

She is perhaps best known for “ABSM: a selected bibliography concerning the ‘Abominable Snowman’, the Yeti, the Sasquatch, and related hominidae”, pp. 316-334 in Manlike monsters on trial: early records and modern evidence, edited by Marjorie M. Halpin and Michael M. Ames. Vancouver: University of British Columbia Press, 1980.

Pauline Ward, Data Library Assistant, will be contributing to the Data Library and Edinburgh DataShare services for University of Edinburgh students and staff, and helping to deliver new research data management services and training as part of the wider RDM programme. Pauline has a bioinformatics background, and has worked in a variety of roles from curation of the EMBL database at the European Bioinformatics Institute in Hinxton to database development (with Oracle, MySQL, Perl and Java) and sequence analysis at the Wellcome Trust Centre for Molecular Parasitology in Glasgow. She also worked more recently as a Policy Assistant at Universities Scotland.

Pauline said: “It’s great to be back in academia. I am really chuffed to be working to help researchers share their data and make the best use of others’ data. I’m really enjoying it.”

You can follow Pauline on Twitter at @PaulineDataWard or check out her previous publications.

Pauline at her desk in the EDINA offices, Edinburgh

by Robin Rice and Pauline Ward
Data Library


New research data storage

The latest BITS magazine for University of Edinburgh staff (Issue 8, Autumn/Winter 2013) contains a lead article on new data storage facilities that Information Services has recently procured and will be making available to researchers for their research data management.

“The arrival of the RDM storage and its imminent roll out is an exciting step in the development of our new set of services under the Research Data Management banner. Ensuring that the service we deploy is fair, useful and transparent are key principles for the IS team.” John Scally

 

Information Services is very pleased to announce that our new Research Data Storage hardware has been safely delivered.

Following a competitive procurement process, a range of suppliers were selected to provide the various parts of the infrastructure, including Dell, NetApp, Brocade and Cisco. The bulk of the order was assembled over the summer in China and shipped to the King’s Buildings campus at the end of August. Since then IT Infrastructure staff have been installing, testing and preparing the storage for roll-out.

How good is the storage?
Information Services recognises the importance of the University’s research data and has procured enterprise-class storage infrastructure to underpin the programme of Research Data services. The infrastructure ranges from the highest class of flash-storage (delivering 375,000 IO operations per second) to 1.6PB (1 Petabyte = 1,024 Terabytes) of bulk storage arrays. The data in the Research Data Management (RDM) file-store is automatically replicated to an off-site disaster facility and also backed up with a 60-day retention period, with 10 days of file history visible online.

Who qualifies for an allocation?
Every active researcher in the University! This is an agreement between the University and the researcher to provide quality active data storage, service support and long term curation for researchers. This is for all researchers, not just Principal Investigators or those in receipt of external grants to fund research.

When do I get my allocation?
We are planning to roll out to early adopter Schools and institutes in late November this year. This is dependent on all of the quality checks and performance testing on the system being completed successfully; however, confidence is high that the deadline will be met.
The early adopters for the initial service roll-out are: School of GeoSciences, School of Philosophy, Psychology and Language Sciences, and the Centre for Population Health Sciences. Phased roll-out to all areas of the University will follow.

How much free allocation will I receive?
The University has committed 0.5TB (500GB) of high quality storage with guaranteed backup and resilience to every active researcher. The important principle at work is that the 0.5TB is for the individual researcher to use primarily to store their active research data. This ensures that they can work in a high quality and resilient environment and, hopefully, move valuable data from potentially unstable local drives. Research groups and Schools will be encouraged to pool their allocations in order to facilitate shared data management and collaboration.

This formula was developed in close consultation with College and School representatives; however, there will be discipline differences in how much storage is required and individual need will not be uniform. A degree of flexibility will be built into the allocation model and roll-out, though if researchers go over their 0.5TB free allocation they will have to pay.
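As a rough illustration of the pooling arithmetic described above, the Python sketch below works out a research group's pooled free allocation and how far a given usage exceeds it. Only the 0.5TB per-researcher figure comes from the article; the function names, the group size and the usage figures are hypothetical, and no charge is calculated because the article gives no price.

```python
FREE_ALLOCATION_TB = 0.5  # free allocation per active researcher, as stated in the article


def pooled_allocation_tb(researchers):
    """Total free allocation when a group pools its individual allocations."""
    return researchers * FREE_ALLOCATION_TB


def overage_tb(used_tb, researchers):
    """Storage used beyond the pooled free allocation (0.0 if within it)."""
    return max(0.0, used_tb - pooled_allocation_tb(researchers))


# Hypothetical group of six researchers using 3.5TB between them.
group_size = 6
used = 3.5

print("Pooled free allocation (TB):", pooled_allocation_tb(group_size))
print("Chargeable overage (TB):", overage_tb(used, group_size))
```

The same group using 2TB would sit entirely within its pooled allocation, so the overage would be zero.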

Why is the University doing this?
The storage roll-out is one component of a suite of existing and planned services known as our Research Data Management Initiative. An awareness-raising campaign accompanies the storage allocation to Schools, units and individuals to encourage best practice in research data management planning and sharing.

Research Data Management support services:
www.ed.ac.uk/is/data-management

University’s Research Data Management Policy:
www.ed.ac.uk/is/research-data-policy

BITS magazine (Issue 8, Autumn/Winter 2013):
http://www.ed.ac.uk/schools-departments/information-services/about/news/edinburgh-bits


RDM & Cornell University

I’ve been fortunate to have been given the opportunity to take up a secondment at the Cornell Institute for Social and Economic Research (CISER) as Data Services Librarian, the primary tasks of which are to:

  • Modernise the CISER data archive and, if possible, begin the implementation. Tasks include: introducing persistent identifiers (DOIs) for all archival datasets (via EZID); investigating metadata mapping of archival datasets (DDI, DC, MARCXML); streamlining data catalogue functionality (by introducing result sorting, relevance searches and subject classification); and assisting in scoping a data repository solution for social science data assets generated by Cornell researchers.
  • Actively participate in the Research Data Management Services Group at Cornell, assisting researchers with their RDM plans and contributing to the advancement of the work of the group.
  • Actively consult with researchers about social science datasets and other data outreach activities.
  • Co-ordinate and collate assessment statements in order to gain the Data Seal of Approval for the CISER data archive.

Last Friday I gave my first presentation on the CISER data archive at a Policy Analysis and Management (PAM) workshop for graduate students, alongside other CISER colleagues (who talked about datasets used under restriction at the Cornell Restricted Access Data Centre, and about the CISER Statistical Consultancy Service & ICPSR). The workshop was held at the Survey Research Institute (https://www.sri.cornell.edu/sri/), where much discussion centred on survey non-response and mechanisms to counter this increasingly common phenomenon.

On Tuesday of this week I presented on the University of Edinburgh RDM Roadmap at the monthly meeting of the Research Data Management Service Group (RDMSG – http://data.research.cornell.edu). This was followed by two presentations yesterday: one at a Demography Pro-seminar (for graduate students) on campus and, later, one at a Cornell University Library Data Discussion Group meeting in the Mann Library, set up to introduce the CISER Data Services Librarian to a range of subject librarians, principally in the social sciences. In each case the Edinburgh RDM Roadmap was received with great enthusiasm and engendered much discussion, in particular about the centralised and inclusive approach adopted by Edinburgh. Follow-up discussions and meetings are being planned, including on the potential use of MANTRA and the RDM Toolkit for Librarians as materials to raise the profile of RDM at Cornell.

As an aside, at a CISER team meeting the subject of password protection was raised (in some instances passwords to CISER resources are changed on a very regular basis for security purposes), along with issues surrounding inappropriate recording of passwords. A site licence for a password protection software package was seen as a possible solution to both user disgruntlement and possible security breaches. As a thought, this might be worth considering as part of the Active Data Infrastructure tool suite.

Stuart Macdonald
Associate Data Librarian, UoE / Visiting CISER Data Services Librarian
