The Research Data Alliance (RDA) is growing about as fast as the data all around us. It got off the ground in 2012 with the support of major research funders in Europe, the US and Australia, and has since grown to over 3,000 members. The latest plenary, in Paris, set a new registration record of ~700 ‘data folk’, including data scientists, data managers, librarians and policy-makers. The theme was Enterprise Engagement, with a focus on Research Data for Climate Change.
Not an ordinary conference
What sets RDA apart from other data-related organisations is not just the size of its gatherings, but its emphasis on making change. Parallel sessions are not filled with individual presentations of research papers, but with collaborative activities that lead to outputs that can be used in the real world. Working groups, which must be approved by RDA's governance structures, coalesce around actual problems that cannot be solved by individual organisations but require new top-level approaches. They are required to produce their deliverables and close shop after an 18-month period. Interest groups are allowed to exist longer, but are encouraged to spin off working groups to address changes as they are identified through group discussion.
Since 2012, these working groups have produced some impressive deliverables and pilots that, if implemented across the Web and across organisations and countries, could speed up research and improve reproducibility. The groups are governed by an elected group of experts from around the world. Some currently active groups are:
- Data Foundation and Terminology WG: defining harmonised terminology for diverse communities used to their own data ‘language’
- Data Type Registries WG: building software to implement a DTR that can automatically match up unknown dataset ‘types’ with relevant services or applications (such as a viewer)
- PID Information Types WG: creating a single common API for delivering checksums from multiple persistent identifier service providers (DataCite and others)
- Practical Policy WG: building on a previous WG that collected various machine-actionable policies practised by different data centres and repositories, this group will register the policies to encourage repository managers to move towards a harmonised set.
- Scalable Dynamic Data Citation WG: to solve the difficulty of properly citing dynamic data sources, the recommended solution allows users to re-execute a query with the original time stamp and retrieve the original data or to obtain the current version of the data.
- Data Description Registry Interoperability WG: to solve the problem of datasets scattered across repositories and data registries, the group built the Research Data Switchboard, which links datasets across platforms.
- Metadata Standards Directory WG: by guiding researchers towards the metadata standards and tools relevant to their discipline, the directory drives up adoption of those standards, improving the chances of future researchers finding and using the data.
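To make the dynamic data citation idea above more concrete, here is a minimal sketch of how a time-stamped query might be re-executed against versioned data. It is an illustration only, not the working group's actual implementation: the `readings` table, the integer timestamps and the helper names (`make_store`, `upsert`, `cite`, `resolve`) are all hypothetical and chosen for this example.

```python
import sqlite3


def make_store():
    """Create an in-memory table in which old versions are kept, not overwritten."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " station TEXT, value REAL,"
        " valid_from INTEGER, valid_to INTEGER)"  # valid_to NULL = current version
    )
    return conn


def upsert(conn, station, value, ts):
    """Record a new value at time ts, closing the previous version if there is one."""
    conn.execute(
        "UPDATE readings SET valid_to = ? WHERE station = ? AND valid_to IS NULL",
        (ts, station),
    )
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, NULL)", (station, value, ts)
    )


def cite(min_value, ts):
    """A 'citation' here is simply the query parameter plus the time it was run."""
    return {"min_value": min_value, "timestamp": ts}


def resolve(conn, citation, current_ts=None):
    """Re-run the cited query as of its original timestamp, or as of current_ts."""
    as_of = citation["timestamp"] if current_ts is None else current_ts
    rows = conn.execute(
        "SELECT station, value FROM readings"
        " WHERE value >= ? AND valid_from <= ?"
        " AND (valid_to IS NULL OR valid_to > ?)"
        " ORDER BY station",
        (citation["min_value"], as_of, as_of),
    )
    return rows.fetchall()


conn = make_store()
upsert(conn, "A", 10.0, 1)
upsert(conn, "B", 20.0, 1)
citation = cite(min_value=5.0, ts=2)   # the query is cited at time 2
upsert(conn, "A", 3.0, 3)              # later correction to station A
upsert(conn, "C", 30.0, 4)             # later addition

original = resolve(conn, citation)               # data as it was when cited
latest = resolve(conn, citation, current_ts=5)   # same query, current data
```

In this sketch `original` still contains stations A and B with their values at citation time, while `latest` reflects the later correction and addition. The key design choice is that updates close an old version rather than overwrite it, so any past state can be recovered from the stored timestamp.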
Members of the RDM team have been involved in library and repository-related interest groups and Birds of a Feather groups, where surveys of current practice have circulated.
RDA and climate change
Climate science was prominent in the 6th RDA plenary. This was due not only to the imminent United Nations COP talks in Paris, but also to the critical importance of climate issues for the world today. For some years, driven by the climate model inter-comparison work underpinning Intergovernmental Panel on Climate Change (IPCC) reports and by the massive datasets from Earth observation, climate science has been located at an intersection of high performance computing, big data management, and services to support and stimulate research, commerce, and governmental initiatives.
Assessing the risks posed by climate change, and developing strategies for adaptation and mitigation, sharpens the need not only to solve the technical problems of bringing together diverse data (social, soil, climate, land-use, commercial, …) but also to address the policy challenges, given the diverse organisations that need to cooperate. This is a domain that builds on services giving access to data and on computation close to the data, enabled by e-infrastructure (such as EGI). It also requires ever stronger approaches to brokering these resources and services, to permit their orchestration and integration.
Among initiatives presented in the climate-related sessions were:
- GEOSS – The GEOSS Common Infrastructure allows the user of Earth observations to access, search and use the data, information, tools and services available through the Global Earth Observation System of Systems
- GEOGLAM – the Global Agricultural Monitoring initiative, a response to the growing calls for improved agricultural information.
- An RDA group focused on wheat – the volatility in prices, in part driven by climate unpredictability, has become a major concern.
- The IPSL Mesocentre
- IS-ENES, which is developing services especially for climate modelling
- Copernicus, seeking to “support policymakers, business, and citizens with improved environmental information. Copernicus integrates satellite and in-situ data with modeling to provide user-focused information services”
- CLIPC will provide access to climate datasets, and software and information to assess indicators for climate impact.
Dr. Mike Mineter, School of GeoSciences and Robin Rice, EDINA and Data Library