JISC Digifest Keynote

Filed under News, Presentations.

Carole Goble gave a thought-provoking keynote presentation at the JISC Digital Festival, where she talked about www.researchobject.org. It was great to see the TARDIS (Time and Relative Dimensions in Scholarship) analogy picked up by Chris Parr of Times Higher Education. The talk is available on the JISC website.

Example of Encoding an RO using RDF-a

Filed under News.

rdf-a example screenshot

An example of encoding a Research Object using RDF-a is now available to the community: see “Parameter Optimization of an Ecological Niche Modeling Workflow”. The page presents a Research Object that captures the optimizations made on the AUC output parameter of the ENM workflow using Support Vector Machines (SVM). The optimizations are performed using genetic algorithms and are represented with the Research Object Optimization Ontology.

Our Workshop on “What Bioinformaticians need to know about digital publishing beyond the PDF2” has been accepted for ISMB 2014

Filed under News.

Following our very successful workshop on “What Bioinformaticians need to know about digital publishing beyond the PDF” at ISMB 2013 last year (see our blog post), we are thrilled to announce that the workshop series will continue at ISMB 2014 in Boston, USA (tentative dates: July 13, 14 or 15). This year we will join forces with BioMed Central to extend our discussion of new ways of digital publishing with a new topic: new ways of peer review and their impact on bioinformatics. The workshop is organized by the managing board of Research Object and representatives of the ISA community, with the participation of BioMed Central and their journals Biology Direct and GigaScience. The workshop web site will be announced shortly. Save the date, and join us in Boston!

The Launch of Research Object Creator Tool (Give it a try!)

Filed under News.

ro creator example screenshot

The Research Object Creator Tool is a very lightweight RO creation tool built by Daniel Garijo of UPM. It takes a LaTeX file as input and extracts its title and abstract to create an annotated page in RDF-a. It also produces a structure for the contents to reference, so users only have to fill in (and, if they wish, annotate) the resources to point to. A sample can be seen in the screenshot, and more details can be found on Daniel’s blog page.
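
To give a rough idea of the kind of transformation described above, here is a minimal sketch in Python. It is not the tool's actual code: the function names are hypothetical, the LaTeX parsing is a simple regular-expression match that assumes a plain \title{...} command and an abstract environment, and the choice of the Dublin Core terms dcterms:title and dcterms:abstract for the annotations is an assumption made for illustration.

```python
import html
import re


def extract_title_and_abstract(latex_source):
    """Return (title, abstract) from a LaTeX string; each is None if not found.

    Assumes a simple \\title{...} without nested braces and a standard
    abstract environment. Real LaTeX files may need a proper parser.
    """
    title_match = re.search(r"\\title\{([^{}]*)\}", latex_source)
    abstract_match = re.search(
        r"\\begin\{abstract\}(.*?)\\end\{abstract\}", latex_source, re.DOTALL
    )
    title = title_match.group(1).strip() if title_match else None
    # Collapse line breaks and repeated spaces inside the abstract.
    abstract = " ".join(abstract_match.group(1).split()) if abstract_match else None
    return title, abstract


def to_rdfa_page(title, abstract):
    """Wrap title and abstract in an HTML fragment with RDFa attributes.

    The dcterms vocabulary is an illustrative choice, not necessarily
    the one used by the Research Object Creator Tool.
    """
    return (
        '<div prefix="dcterms: http://purl.org/dc/terms/" about="">\n'
        f'  <h1 property="dcterms:title">{html.escape(title)}</h1>\n'
        f'  <p property="dcterms:abstract">{html.escape(abstract)}</p>\n'
        "</div>"
    )
```

Calling extract_title_and_abstract on the contents of a .tex file and passing the two results to to_rdfa_page yields a fragment that can be pasted into a web page; the list of referenced resources would still have to be filled in by hand.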

Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome

Filed under News.

Wf4Ever member Daniel Garijo of UPM has published an article titled “Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome” in the journal PLOS ONE, together with researchers from five other institutions. The article quantifies the cost of reproducing research in computational biology, using an exemplar experiment from the research group of one of the co-authors. It provides useful insights and guidelines for improving reproducibility.

10th International Conference on Preservation of Digital Objects

Filed under News.

Screen Shot Research Object Digital Library Demo

Half-day tutorial at iPres 2013: “From Preserving Data to Preserving Research: Curation of Process and Context”. The video presented at the tutorial shows the process of building a research object from a music research experiment, with the aim of sharing and preserving the experiment and its context in order to facilitate its reuse, reproducibility and better understanding.

From Preserving Data to Preserving Research: Curation of Process and Context

Filed under Event.

The TIMBUS and Wf4Ever projects are offering a half-day tutorial at the International Conference on Theory and Practice of Digital Libraries (TPDL) 2013, in Valletta, Malta on September 22, 2013. http://tpdl2013.upatras.gr/tut-pdpr.php

ABSTRACT

In the domain of eScience, investigations are increasingly collaborative. Most scientific and engineering domains benefit from building on the outputs of other research: by sharing information to reason over and data to incorporate in the modeling task at hand. This raises the need for preserving and sharing entire eScience workflows and processes for later reuse. We need to define which information is to be collected, create means to preserve it and approaches to enable and validate the re-execution of a preserved process. This includes and goes beyond preserving the data used in the experiments, as the process underlying its creation and use is essential.

The TIMBUS project and Wf4Ever project team up for this half-day tutorial to provide an introduction to the problem domain and discuss solutions for the curation of eScience processes.

TUTORIAL LEVEL

Introductory level

DURATION

Half-day

OUTLINE OF THE CONTENT

The tutorial will cover the following topics:

Introduction to Process and Context Preservation: The introduction will motivate the need for process and context preservation, illustrate how this task is difficult in an evolving domain, and introduce a use case for the rest of the tutorial to illustrate approaches and tools.

Data Citation: Data forms the basis of the results of many research publications, and thus needs to be referenced with the same accuracy as bibliographic data. Only if data can be identified with high precision can it be reused, validated, verified and reproduced. Citing a specific data set is, however, not trivial: it exists in a vast plurality of specifications and instances, can potentially be huge in size, and its location might change. We will provide an overview of existing approaches to overcoming these challenges. Further, we will present the issue of creating citations for data held in databases, especially dynamic data sets where data is added or updated on a regular basis.
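
To make the difficulty with dynamic data sets more concrete, the following Python sketch shows one possible way to record a citation so that a later reader can check whether the data still matches what was cited. It is an illustration only and not a method taken from the tutorial: the DataCitation record, its fields and the helper functions are hypothetical, and the idea of storing the selecting query, a retrieval timestamp and a checksum of the result is an assumption chosen for the example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record for a subset of a changing data set."""
    identifier: str    # persistent identifier of the data set (made up in examples)
    query: str         # how the cited subset was selected
    retrieved_at: str  # ISO 8601 timestamp of retrieval
    checksum: str      # SHA-256 over the canonicalised rows


def fingerprint(rows):
    """Return a SHA-256 hex digest that does not depend on row order."""
    canonical = "\n".join(sorted(json.dumps(row) for row in rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cite(identifier, query, retrieved_at, rows):
    """Build a citation record for the given rows."""
    return DataCitation(identifier, query, retrieved_at, fingerprint(rows))


def still_matches(citation, rows):
    """Check whether rows retrieved later are the same data that was cited."""
    return citation.checksum == fingerprint(rows)
```

If rows are added to or changed in the database after the citation was created, still_matches returns False for the new result, which signals that the cited version would have to be reconstructed some other way, for example from versioned or timestamped records.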

Re-usability and traceability of workflows and processes: The processes creating and interpreting data are complex objects. Curating and preserving them requires special effort, as they are dynamic, and highly dependent on software, configuration, hardware, and other aspects. We will discuss these issues in detail, and provide an introduction to two complementary approaches.

The first approach is based on the concept of Research Objects, which takes a workflow-centric view and aims to facilitate reuse and reproducibility. It allows the data and the methods to be packaged as one Research Object that can be shared and cited, enabling publishers to grant access to the actual data and methods that contribute to the findings reported in scholarly articles.
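
The packaging idea can be pictured with a small Python sketch. It assumes, purely for illustration, that a Research Object can be modelled as a named aggregation of resources; the class names, fields and "kind" labels below are hypothetical and do not reproduce the Research Object ontologies or the Wf4Ever APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One aggregated item, identified by a URI."""
    uri: str
    kind: str  # illustrative labels such as "data", "workflow", "document"


@dataclass
class ResearchObjectSketch:
    """Hypothetical container bundling data and methods under one identifier."""
    uri: str
    title: str
    resources: list = field(default_factory=list)

    def aggregate(self, resource):
        """Add a resource to the bundle."""
        self.resources.append(resource)

    def of_kind(self, kind):
        """Return all aggregated resources with the given kind label."""
        return [r for r in self.resources if r.kind == kind]
```

Because the bundle itself has a single URI, an article could cite that one identifier and a reader could then reach both the workflow and the data it aggregates.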

A second approach focuses on describing and preserving a process and the context it is embedded in. The artifacts that may need to be captured range from data, software and accompanying documentation, to legal and human resource aspects. Some of this information can be automatically extracted from an existing process, and tools for this will be presented. Ways to archive the process and to perform preservation actions on the process environment, such as recreating a controlled execution environment or migration of software components, are presented. Finally, the challenge of evaluating the re-execution of a preserved process is discussed, addressing means of establishing its authenticity.

INTENDED AUDIENCE

The tutorial is targeted at researchers, publishers and curators in eScience disciplines who want to learn about methods of ensuring the long-term availability of experiments forming the basis of scientific research.

EXPECTED LEARNING OUTCOMES

The tutorial participants will understand

  • Motivations and challenges of process preservation
  • Motivations, stakeholders and challenges of making data citable
  • How Data is Cited Today: OECD report on data citability, Google search of data sets, requirements, guidelines, metadata, locators and identifiers, approaches to naming schemes and properties.
  • Available technologies for identifiers: Archival Resource Key (ARK), Digital Object Identifiers (DOI), Extensible Resource Identifier (XRI), HANDLE, Life Science ID (LSID), Object Identifiers (OID), Persistent Uniform Resource Locators (PURL), URI/URN/URL, Universally Unique Identifier (UUID)
  • Approaches and initiatives for citing data: CODATA, DataCite, OpenAIRE; challenges and opportunities: granularity, scalability, complexity and evolving data sets; current research questions
  • Ontologies needed to capture research objects: Core Ontology of the RO family of vocabularies, workflow centric ROs, provenance traces, life cycle of research objects.
  • Wf4Ever Toolkit / technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows: software architecture, functionalities, software interfaces to functionalities, reference implementation as services and clients:

- Collect, manage and preserve aggregations of scientific workflows and related objects and annotations
- Workflow sharing through a social website
- Execution of workflows
- Testing completeness, execution, repeatability and other desired quality features
- Testing the ability of a Research Object to achieve its original purpose after changes to its resources.
- Recommendations of relevant users, Research Objects and their aggregated resources
- Converting workflows into Research Objects
- Search for workflows by input parameters or frequency of use
- Collaborative environment
- Access and use of research objects and aggregated resources.
- Synchronization with remote repositories
- Visualization of correlation between similar objects

  • TIMBUS context model and tools to semi-automatically capture the relevant context of a business process for preservation

- The scope of context regarding business process preservation – technology, application and business context, aligned with enterprise architecture

- The context meta-model, with domain independent and domain specific aspects

- Demonstration of a context model instance of example processes (in the eScience domain)

- Tools to automatically capture some parts of the context (software dependencies, data formats, licenses, …)

- Outlook on reasoning and preservation planning, based on the context model

BIOGRAPHY OF THE PRESENTER(S)

Daniel Garijo is a PhD student in the Ontology Engineering Group at the Universidad Politécnica de Madrid. His research activities focus on e-Science and the Semantic Web, specifically on how to increase the understandability of scientific workflows using provenance and metadata. He is a member of the W3C Provenance Working Group, and he is currently part of the Wf4Ever project.

Rudolf Mayer is a researcher at Secure Business Austria, as well as the Department of Software Technology and Interactive Systems at the Vienna University of Technology. His research interests cover digital preservation, specifically the preservation of processes, information retrieval (specifically on text documents and music), data analysis and machine learning. He has many years of lecturing experience in these subjects. He has been involved in the DELOS and PLANETS projects, and currently works on digital preservation aspects in the FP7 projects APARSEN and TIMBUS.

Raul Palma is a researcher at Poznan Supercomputing and Networking Center (PSNC). His research interests cover digital preservation, particularly of scientific methods, provenance and evolution of digital artifacts, ontology engineering and distributed technologies. He has participated in several EU projects, including the Network of Excellence Knowledge Web, NeOn, e-Lico and Wf4Ever. He has many years of lecturing experience in related topics, both at university and at private institutions. He has authored or co-authored several vocabularies and ontologies, such as the Research Object Evolution Ontology, the Ontology Metadata Vocabulary (OMV) and different extensions for describing ontologies and related resources, as well as models for collaborative ontology construction and digital multimedia repositories.

Stefan Pröll is a researcher at SBA Research. His primary research focus lies on digital preservation, especially on security aspects of digital archives, including authenticity and provenance of digital objects. Further areas of interest are databases and data citation. He is currently working on the FP7 projects APARSEN and TIMBUS, focusing on security and provenance related topics. Before he joined SBA in April 2011, he worked for international organizations in the areas of Web development and Linux server and database administration.

Andreas Rauber is Associate Professor at the Department of Software Technology and Interactive Systems at the Vienna University of Technology. He is involved in several research projects in the field of Digital Libraries, focusing on the organization and exploration of large information spaces, as well as Web archiving and digital preservation. His research interests cover the broad scope of digital libraries, including specifically text and music information retrieval and organization, information visualization, as well as data analysis and neural computation. He has been involved in numerous initiatives in the area of digital preservation (DELOS, DPE, Planets, SCAPE, TIMBUS, APARSEN). He has been lecturing extensively on this subject at different universities, as part of the DELOS and nestor summer schools on digital preservation, as well as during a range of training events on digital preservation.