Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers

Authors: Veerle Van den Eynden, Louise Corti

Abstract: Sharing and publishing social science research data have a long history in the UK, through long-standing agreements with government agencies for sharing survey data and the data policy, infrastructure, and data services supported by the Economic and Social Research Council. The UK Data Service and its predecessors developed data management, documentation, and publishing procedures and protocols that stand today as robust templates for data publishing. As the ESRC research data policy requires grant holders to submit their research data to the UK Data Service after a grant ends, setting standards and promoting them has been essential in raising the quality of the resulting research data being published. In the past, received data were all processed, documented, and published for reuse in-house. Recent investments have focused on guiding and training researchers in good data management practices and skills for creating shareable data, as well as a self-publishing repository system, ReShare. ReShare also receives data sets described in published data papers and achieves scientific quality assurance through peer review of submitted data sets before publication. Social science data are reused for research, to inform policy, in teaching and for methods learning. Over a 10-year period, responsive developments in system workflows, access control options, persistent identifiers, templates, and checks, together with targeted guidance for researchers, have helped raise the standard of self-publishing social science data. Lessons learned and developments in shifting publishing social science data from an archivist responsibility to a researcher process are showcased, as inspiration for institutions setting up a data repository.

Citation: Van den Eynden, V. & Corti, L. (2017). Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers. Int J Digit Libr 18: 113. doi:10.1007/s00799-016-0177-3


Open Science: One Term, Five Schools of Thought

Authors: Benedikt Fecher & Sascha Friesike

Abstract: Open Science is an umbrella term encompassing a multitude of assumptions about the future of knowledge creation and dissemination. Based on a literature review, this chapter aims at structuring the overall discourse by proposing five Open Science schools of thought: the “infrastructure school” (which is concerned with the technological architecture), the “public school” (which is concerned with the accessibility of knowledge creation), the “measurement school” (which is concerned with alternative impact measurement), the “democratic school” (which is concerned with access to knowledge) and the “pragmatic school” (which is concerned with collaborative research).

It must be noted that our review is not solely built upon traditional scholarly publications but, due to the nature of the topic, also includes scientific blogs and newspaper articles. It is our aim in this chapter to present a concise picture of the ongoing discussion rather than a complete list of peer-reviewed articles on the topic. In the following, we will describe the five schools in more detail and provide references to relevant literature for each.

Citation: Fecher B & Friesike S. (2014). “Open Science: One Term, Five Schools of Thought”. In Opening Science. Amsterdam: Springer.



Peer Review and Replication Data: Best Practice from Journal of Peace Research

Authors: Nils Petter Gleditsch, Ragnhild Nordås, & Henrik Urdal

Abstract: Journal of Peace Research is an independent, interdisciplinary, and international journal devoted to the study of war and peace. It is owned by the Peace Research Institute Oslo (PRIO) and published on contract with Sage. Its articles range across all the social sciences, although a large majority of its authors now have their main training in political science. The international character of the journal is visible in the composition of the editorial committee and the authorship. The journal has long been a leader among the journals in political science and international relations in making research data publicly available, and is a pioneer in publishing datasets in the form of “special data features.”

Citation: Gleditsch, N. P., Nordås, R., & Urdal, H. (2017). Peer Review and Replication Data: Best Practice from Journal of Peace Research. College & Research Libraries, 78(3), 267–271.


Research data explored: an extended analysis of citations and altmetrics

Authors: Isabella Peters, Peter Kraker, Elisabeth Lex, Christian Gumpenberger, Juan Gorraiz

Abstract: In this study, we explore the citedness of research data, its distribution over time and its relation to the availability of a digital object identifier (DOI) in the Thomson Reuters database Data Citation Index (DCI). We investigate if cited research data “impacts” the (social) web, reflected by altmetrics scores, and if there is any relationship between the number of citations and the sum of altmetrics scores from various social media platforms. Three tools are used to collect altmetrics scores, including PlumX and ImpactStory, and the corresponding results are compared. We found that out of the three altmetrics tools, PlumX has the best coverage. Our experiments revealed that research data remain mostly uncited (about 85 %), although there has been an increase in citing data sets published since 2008. The percentage of the number of cited research data with a DOI in DCI has decreased in the last years. Only nine repositories are responsible for research data with DOIs and two or more citations. The number of cited research data with altmetrics “foot-prints” is even lower (4–9 %) but shows a higher coverage of research data from the last decade. In our study, we also found no correlation between the number of citations and the total number of altmetrics scores. Yet, certain data types (i.e. survey, aggregate data, and sequence data) are more often cited and also receive higher altmetrics scores. Additionally, we performed citation and altmetric analyses of all research data published between 2011 and 2013 in four different disciplines covered by the DCI. In general, these results correspond very well with the ones obtained for research data cited at least twice and also show low numbers in citations and in altmetrics. Finally, we observed that there are disciplinary differences in the availability and extent of altmetrics scores.

Citation: Peters, I., Kraker, P., Lex, E. et al. (2016). Research data explored: an extended analysis of citations and altmetrics. Scientometrics 107: 723. doi:10.1007/s11192-016-1887-4


Opening the Publication Process with Executable Research Compendia

Authors: Daniel Nüst, Markus Konkol, Edzer Pebesma, Christian Kray, Marc Schutzeichel, Holger Przibytzin, Jörg Lorenz


Abstract: A strong movement towards openness has seized science. Open data and methods, open source software, Open Access, open reviews, and open research platforms provide the legal and technical solutions to new forms of research and publishing. However, publishing reproducible research is still not common practice. Reasons include a lack of incentives and a missing standardized infrastructure for providing research material such as data sets and source code together with a scientific paper. Therefore we first study fundamentals and existing approaches. On that basis, our key contributions are the identification of core requirements of authors, readers, publishers, curators, as well as preservationists and the subsequent description of an executable research compendium (ERC). It is the main component of a publication process providing a new way to publish and access computational research. ERCs provide a new standardisable packaging mechanism which combines data, software, text, and a user interface description. We discuss the potential of ERCs and their challenges in the context of user requirements and the established publication processes. We conclude that ERCs provide a novel potential to find, explore, reuse, and archive computer-based research.


Citation: Nüst, D, Konkol, M, Pebesma, E, Kray, C, Schutzeichel, M, Przibytzin, H, Lorenz, J. (2017) Opening the Publication Process with Executable Research Compendia. D-Lib Magazine 23(1-2).




The Scholix Framework for Interoperability in Data-Literature Information Exchange

Authors: Adrian Burton, Amir Aryani, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Markus Stocker, Michael Diepenbroek, Uwe Schindler, Martin Fenner


Abstract: The Scholix Framework (SCHOlarly LInk eXchange) is a high level interoperability framework for exchanging information about the links between scholarly literature and data, as well as between datasets. Over the past decade, publishers, data centers, and indexing services have agreed on and implemented numerous bilateral agreements to establish bidirectional links between research data and the scholarly literature. However, because of the considerable differences inherent to these many agreements, there is very limited interoperability between the various solutions. This situation is fueling systemic inefficiencies and limiting the value of these, separated, sets of links. Scholix, a framework proposed by the RDA/WDS Publishing Data Services working group, envisions a universal interlinking service and proposes the technical guidelines of a multi-hub interoperability framework. Hubs are natural collection and aggregation points for data-literature information from their respective communities. Relevant hubs for the communities of data centers, repositories, and journals include DataCite, OpenAIRE, and Crossref, respectively. The framework respects existing community-specific practices while enabling interoperability among the hubs through a common conceptual model, an information model and open exchange protocols. The proposed framework will make research data, and the related literature, easier to find and easier to interpret and reuse, and will provide additional incentives for researchers to share their data.


Citation: Burton, A, Aryani, A, Koers, H, Manghi, P, La Bruzzo, S, Stocker, M, Diepenbroek, M, Schindler, U, Fenner, M. (2017) The Scholix Framework for Interoperability in Data-Literature Information Exchange. D-Lib Magazine 23(1-2).




Assessing Stewardship Maturity of the Global Historical Climatology Network-Monthly (GHCN-M) Dataset: Use Case Study and Lessons Learned

Authors: Ge Peng, Jay Lawrimore, Valerie Toner, Christina Lief, Richard Baldwin, Nancy Ritchey, Danny Brinegar, Stephen A. Del Greco


Abstract: Assessing stewardship maturity — the current state of how datasets are documented, preserved, stewarded, and made accessible publicly — is a critical step towards meeting U.S. federal regulations, organizational requirements, and user needs. The scientific data stewardship maturity matrix (DSMM), developed in partnership with NOAA’s National Centers of Environmental Information (NCEI) and the Cooperative Institute for Climate and Satellites-North Carolina (CICS-NC), provides a consistent framework for assessing stewardship maturity of individual Earth Science datasets and capturing justifications for transparency. The consolidated stewardship maturity information will allow users and decision-makers to make informed use decisions based on their unique data needs. This DSMM was applied to a widely utilized monthly-land-surface-temperature dataset derived from the Global Historical Climatology Network (GHCN-M). This paper describes the stewardship maturity ratings of GHCN-M version 3 and provides actionable recommendations for improving the maturity of the dataset. The results from the use case study show that an application of DSMM like this one is useful to people who produce or care for digital environmental datasets. Assessments can identify the strengths and weaknesses of an individual dataset or organization’s preservation and stewardship practices, including how information about the dataset is integrated into different systems.


Citation: Peng, G., Lawrimore, J., Toner, V., Lief, C., Baldwin, R., Ritchey, N., . . . Greco, S. A. (2016). Assessing Stewardship Maturity of the Global Historical Climatology Network-Monthly (GHCN-M) Dataset: Use Case Study and Lessons Learned. D-Lib Magazine, 22(11/12).




A Data Citation Roadmap for Scholarly Data Repositories

Authors: Martin Fenner, Merce Crosas, Jeffrey S. Grethe, David Kennedy, Henning Hermjakob, Philippe Rocca-Serra, Robin Berjon, Sebastian Karcher, Maryann Martone, Tim Clark


Abstract: This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles (Data Citation Synthesis Group, 2014), a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Early Adopters Expert Group, part of the Data Citation Implementation Pilot (DCIP) project (FORCE11, 2015), an initiative of FORCE11 and the NIH BioCADDIE (2016) program. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories.


Citation: Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., … Clark, T. (2016). A Data Citation Roadmap for Scholarly Data Repositories. bioRxiv.




Research Data Services in Academic Libraries: Data Intensive Roles for the Future?

Authors: Carol Tenopir, Dane Hughes, Suzie Allard, Mike Frame, Ben Birch, Lynn Baird, Robert Sandusky, Madison Langseth, and Andrew Lundeen


Abstract: Objectives: The primary objectives of this study are to gauge the various levels of research data services (RDS) that academic libraries provide based on demographic factors, to gauge RDS growth since 2011, and to identify what obstacles may prevent expansion or growth of services.

Methods: Survey of academic institutions through a stratified random sample of ACRL library directors across the U.S. and Canada. Frequencies and chi-square analysis were applied, with some responses grouped into broader categories for analysis.

Results: Minimal to no change for what services were offered between survey years, and interviews with library directors were conducted to help explain this lack of change.

Conclusion: Further analysis is forthcoming for a librarians study to help explain possible discrepancies in organizational objectives and librarian sentiments of RDS.


Citation: Tenopir, C, Hughes, D, Allard, S, Frame, M, Birch, B, Baird, L, Sandusky, R, Langseth, M, & Lundeen, A (2015) Research Data Services in Academic Libraries: Data Intensive Roles for the Future? Journal of eScience Librarianship 4(2): e1085.




Analyzing data citation practices using the Data Citation Index

Authors: Nicolas Robinson-Garcia, Evaristo Jiménez-Contreras, Daniel Torres-Salinas

Abstract: We present an analysis of data citation practices based on the Data Citation Index from Thomson Reuters. This database, launched in 2012, aims to link data sets and data studies with citations received from the other citation indexes. The DCI harvests citations to research data from papers indexed in the Web of Science. It relies on the information provided by the data repository, as data citation practices are inconsistent or nonexistent in many cases. The findings of this study show that data citation practices are far from common in most research fields. Some differences have been reported on the way researchers cite data: while in the areas of Science and Engineering and Technology data sets were the most cited, in Social Sciences and Arts and Humanities data studies play a greater role. A total of 88.1 percent of the records have received no citation, but some repositories show very low uncitedness rates. Although data citation practices are rare in most fields, they have expanded in disciplines such as crystallography and genomics. We conclude by emphasizing the role that the DCI could play in encouraging the consistent, standardized citation of research data; a role that would enhance their value as a means of following the research process from data collection to publication.

Citation: Robinson-Garcia, N., Jiménez-Contreras, E., & Torres-Salinas, D. (2015). Analyzing data citation practices using the Data Citation Index. JASIST. doi: