Data Carpentry & Software Carpentry workshops

The Research Data Service hosted back to back 2-day workshops in the Main Library this week, run by the Software Sustainability Institute (SSI) to train University of Edinburgh researchers in basic data science and research computing skills.

Learners at Data Carpentry workshop

Learners at Data Carpentry workshop

Software Carpentry (SC) is a popular global initiative originating in the US, aimed at training researchers in good practice in writing, storing and sharing code. Both SC and its newer offshoot, Data Carpentry, teaches methods and tools that helps researchers makes their science reproducible. The SSI, based at Edinburgh Parallel Computing Centre (EPCC), organises workshops for both throughout the UK.

Martin Callaghan, University of Leeds

Martin Callaghan, University of Leeds, introduces goals of Data Carpentry workshop.

Each workshop is taught by trainers trained by the SC organisation, using proven methods of delivery, to learners using their own laptops, and with plenty of support by knowledgeable helpers. Instructors at our workshops were from Leeds and EPCC. Comments from the learners – staff and postgraduate students from a range of schools, included, ‘Variety of needs and academic activities/disciplines catered for. Useful exercies and explanations,’ and ‘Very powerful tools.’

Lessons can vary between different workshops, depending on the level of the learners and their requirements, as determined by a pre-workshop survey. The Data Carpentry workshop on Monday and Tuesday included:

  • Using spreadsheets effectively
  • OpenRefine
  • Introduction to R
  • R and visualisation
  • Databases and SQL
  • Using R with SQLite
  • Managing Research & Data Management Plans

The Software Carpentry workshop was aimed at researchers who write their own code, and covered the following topics:

  • Introduction to the Shell
  • Version Control
  • Introduction to Python
  • Using the Shell (scripts)
  • Version Control (with Github)
  • Open Science and Open Research
Software Carpentry learners

Software Carpentry learners

Clearly the workshops were valued by learners and very worthwhile. The team will consider how it can offer similar workshops in the future at a similarly low cost; your ideas welcome!

Robin Rice
EDINA and Data Library

Share

Highlights from the RDM Programme Progress Report: February to April 2016

The membership of the Research Data Service Virtual Team across four divisions of IS was confirmed and met for the first time (to replace the former action group meetings) on 11 February where it was agreed meetings would be held approximately every six weeks for information and decision-making.

In February, the DataShare metadata was mapped to the PURE metadata and staff in L&UC and Data Library trained each other for creating dataset records in Pure and reviewing submissions in DataShare. It was agreed that staff would create records in Pure for items deposited in DataShare until the company (Elsevier) provides a mechanism for automatically inputting records into Pure.

In March, Jisc announced that the University of Edinburgh was selected as a framework supplier for their new Research Data Management Shared Service.

A review of the existing ethics processes in each college is in progress with Jacqueline McMahon at the College of Arts, Humanities and Social Sciences (CAHSS) to create a University-wide ethics template. There is also engagement with the School ethics committees at the School of Health in Social Sciences (HiSS), Moray House School of Education (MHSE), Law and School of Social and Political Science (SPS) in CAHSS.

The Research Data Management and Sharing (RDMS) Coursera MOOC opened for enrolment on 1 March 2016. This was completed in partnership with the University of North Carolina-Chapel Hill CRADLE project. Research Data Management and Sharing (RDMS) MOOC stats from the Coursera Dashboard reveal that as of 23 May 2016, there have been 5,429 visitors and 1,526 active learners; 335 visitors have completed the course.

The large data sharing investigation was completed for DataShare and reported previously. (Two new releases in DataShare defined: upload and download). Upload release (2.1) to go live 23 May 2016.

PURE dataset functionality is now included in standard PURE and Research Data Management (RDM) training. There are now 210 dataset records in PURE.

Four PhD interns were hired in mid-March to act as College representatives for the IS Innovation Fund Pioneering Research Data Exhibition. They will be employed until mid-December 2016.

A total of 363 staff and postgraduates attended RDM courses and workshops during this quarter.

There were 30 new DMPonline users and 55 new plans created during this quarter.

There are now 210 dataset metadata records in PURE.

A total of 56 datasets were deposited in DataShare during this quarter.

The total number of DataStore users rose from 12,948 in the previous quarter to 13,239 in this quarter, an increase of 291 new users.

National and International Engagement Activities

In February

  • Stuart Lewis gave a DataVault presentation at the International Digital Curation Conference (IDCC) in Amsterdam.

In March

  • A University news item was released to mark the launch of the Research Data Management and Sharing (RDMS) MOOC on Coursera. http://www.ed.ac.uk/news/2016/dataskills-010316
  • Stuart MacDonald gave an RDM presentation to trainee physicians at the Royal College of Physicians Edinburgh Course: Critical appraisal and research for trainees, Edinburgh. http://www.slideshare.net/smacdon2/rdm-for-trainee-physicians
  • Three delegates from Göttingen University were hosted here. The delegates have shared interests in RDM and visited to gain more insight into RDM support and experiences here.
  • Robin Rice gave an invited talk about the RDMS MOOC and web-based Survey Documentation and Analysis (SDA) tool to Learning, Teaching and Web and elearning@Ed Showcase and Network monthly gathering.

In April

As part of my responsibilities to cover the one year interim of Kerry Miller’s maternity leave, I will be writing blogs for this page until Kerry returns next summer.

Prior to this post, I worked the past 12 years as the geospatial metadata co-ordinator at EDINA. My primary role was to promote and support research data management and sharing amongst UK researchers and students using spatial data and geographical information.

Tony Mathys
Research Data Management Service Co-ordinator


Share

Twenty’s Plenty: DataShare v2.1 Upload Upgrade

We have upgraded DataShare (to v2.1) to enable HTML5 resumable upload. This means depositors can now use the user-friendly web deposit interface to upload numerous files at once via drag’n’drop. And to upload files up to 15 GB in size, regardless of network ‘blips’.

In fact we have reason to believe it may be possible to upload a 20 GB file this way: in testing, I gave it 2 hours till the progress bar said 100%, and even though the browser then produced an error message instead of the green tick I was hoping for, I found when I retrieved the submission from the Submissions page that I was able to resume, and the file had been added.

*** So our new advice to depositors is: our current Item size limit and file size limit is 20 GB. Files larger than 15 GB may not upload through your browser. If you have files over 15 GB or data totalling over 20 GB which you’d like to share online, please contact the Data Library team to discuss your options. ***

See screenshots below. Once the files have been selected and the upload commenced, the ‘Status’ column shows the percentage uploaded. A 10 GB file may take in the region of 1 hour to upload in this way. 15 GB files have been uploaded with Chrome, Firefox and Internet Explorer using this interface.

Until now, any file over 1 GB had caused browsers difficulties, meaning many prospective depositors were not able to use the web deposit interface, and instead had to email the curation team, arrange to transfer us their files via DropBox, USB or through the Windows network, and then the curator had to transfer these same files to our server, collate the metadata into an XML file, log into the Linux system and run a batch import script. Often with many hiccups concerning permissions, virus checkers and memory along the way. All very time-consuming.

Soon we will begin working on a download upgrade, to integrate a means for users to download much bigger files from DataShare outside of the limitations of HTTP (perhaps using FTP). The aim is to allow some of the datasets we have in the university which are in the region of 100 GB to be shared online in a way that makes it reasonably quick and easy for users to download them. We have depositors queueing up to use this feature. Watch this space.

Further technical detail about both the HTML5 upload feature and plans for an optimised large download release are available on the slides for the presentation I made at Open Repositories 2016 in Dublin this week: http://www.slideshare.net/paulineward/growing-open-data-making-the-sharing-of-xxlsized-research-data-files-online-a-reality-using-edinburgh-datashare .

NewUploadPage

A simple interface invites the depositor to select files to upload.

 

 

Upload15GB

A 15 GB file uploaded via Firefox on Windows and included in a submitted Item.

 

 

A 20 GB file uploaded and included in an incomplete submission.

A 20 GB file uploaded and included in an incomplete submission.

Pauline Ward, Data Library Assistant, University of Edinburgh

Share

DATA-X Workshop 2

We are holding our second DATA-X workshop on Wednesday 15 June at the
James Clerk Maxwell Building, Room 3217 and are inviting PhD students
and technologists to come along and participate in what we hope will
be lively discussion and activities.

We aim to engender Art and Science collaborations by offering
micro-funds towards each ‘installation’ as well as the opportunity to
publish in the exhibition catalogue and present at the Pioneering
Research Data Symposium later in the year.

We have over 20 registered participants with a range of research
interests including:
Crystal structure, Raman spectroscopy, Structural biology,
Measurement-Based Power Systems Control, Astrophysics, Polymer-sensors,
Biological data mining, Computational Mechanics,
Internationalization of Higher Education, Bioinformatics, Evolution,
Genomics, Visual sociology, Advertising, National identity, Environment,
Agriculture, Nutrient Management, Soil, Pollen, Farmers, Social Network
Analysis, Food security, Systems biology, Cell level
modelling, Cell physiology, Mobile User Experience, Enviromental
Sustainability, Political science, Human rights, Data materialisation,
Digital fabrication, Practice based research, Synthetic
biology!

To register for the workshop (and get a free lunch) see:
http://data-x.blogs.edina.ac.uk/

To find out more about DATA-X see:
http://data-x.blogs.edina.ac.uk/about/

or watch the short You Tube
trailer: https://youtu.be/NMPPZZc-sZ4

Please get in contact should you require further information.

All best,
Stuart Macdonald
DATA-X Project Manager / Associate Data Librarian
EDINA

Share