Data

Always Findable: Persistent Identifiers for Digital Assets in the University of British Columbia

Eugene Barsky, University of British Columbia
Poster Session

Abstract

Persistent identifiers (PIs) are easily maintainable digital identifiers that allow digital assets – a file or set of files, such as an article, dataset, paper, video, image, or a piece of software – to be referenced and accessed in a variety of contexts. Some persistent identifiers, such as Digital Object Identifiers (DOIs), specifically contribute to the discovery and access of institutional digital assets, such as dissertations, publications, videos and research datasets.

In a large-scale digital initiatives project, the University of British Columbia (UBC) Library developed Open Collections (https://open.library.ubc.ca), a platform featuring more than 217,000 items to enhance discovery, citation and reproducibility of library-managed digital assets.

An innovative core component of this project is a comprehensive and flexible solution for creating DOIs, utilizing a custom interface for both automated and manual DOI assignment for UBC digital assets.

In this presentation, we will provide background and history of various persistent identifiers, then focus on DOIs, and explain the process behind UBC’s custom solution for minting DOIs with DataCite Canada while illustrating how these persistent identifiers contribute to the delivery, discovery and access of institutional digital assets.
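
As a rough illustration of why a DOI stays findable, the sketch below splits a DOI into its registrant prefix and item suffix and builds the matching https://doi.org/ resolver link. The example DOI, the simplified pattern, and the function names are hypothetical and are not taken from UBC's system.

```python
import re

# A DOI has the form "10.<registrant>/<suffix>"; this pattern is a simplified check.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def split_doi(doi):
    """Return (prefix, suffix) for a DOI string, or raise ValueError."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi):
    """Build the public resolver link, which stays the same even if the item moves."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


# Hypothetical DOI used only for illustration.
example = "10.1234/example.0001"
print(resolver_url(example))
```

Because citations point at the resolver link rather than at a hosting URL, the repository can reorganize its storage without breaking existing references.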

CanDIG: Distributed Genomic Analysis for a Federal Canada

Jonathan Dursi, CanDIG
Presentation 

Abstract

Tackling the “wicked problems” of cancer and rare diseases against the already complex landscape of human biology requires health researchers to have access to as much health and genomic data as possible in order to see connections and test hypotheses. While a torrent of genomic data is now being produced at sites across Canada, accessibility to researchers means more than having it sit on a disk somewhere with the right permission bits set. It has to be discoverable, analyzable, available, and linked to vital metadata for it to be useful in improving human health.

In this talk we present the Canadian Distributed Infrastructure for Genomics (CanDIG, http://distributedgenomics.ca), a fully distributed platform that allows national-scale, privacy-maintaining analyses of locally-controlled data sets. CanDIG is a CFI Cyberinfrastructure-funded project with initial sites in Toronto, Montréal, and Vancouver. With participating members including the top generators of genomic data for Canadian patients, CanDIG will serve as a foundational platform for all of these institutions' collaborative ventures, with a long-term goal of extending data sharing to a wider base of researchers across Canada.

The CanDIG platform represents a new approach to making sensitive genomic data available for analysis across multiple providers. By moving computation to the data, we enable truly national-scale analysis of private health data in Canada, where provincial health data privacy protections can make it difficult for clinical data to leave the province it was collected in. With privacy protections built in from the very beginning, we make it easier for health data stewards to justify allowing their data to be part of some remote analyses. Granular control of the amount of data and information being released, and to whom, is a fundamental part of the overall design. Our efforts build on and contribute back to the efforts of the international Global Alliance for Genomics and Health (GA4GH, http://genomicsandhealth.org), using standardized RESTful APIs for data access to provide interoperability with a wide range of tools, and web-era authentication and authorization standards (OpenID Connect and UMA) to ensure privacy and security of all data.
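
The idea of moving computation to the data can be sketched as follows. Each site runs the query locally and returns only an aggregate count, and small counts are withheld before release. The site names, record fields, and the release threshold are all assumptions made for illustration, not details of the CanDIG implementation.

```python
# Hypothetical local data sets; in practice each would stay at its own site.
SITES = {
    "site_a": [{"variant": "V1"}, {"variant": "V2"}, {"variant": "V1"}],
    "site_b": [{"variant": "V1"}],
    "site_c": [{"variant": "V2"}, {"variant": "V2"}],
}

MIN_RELEASE_COUNT = 2  # assumed threshold below which a site withholds its count


def local_count(records, variant):
    """Runs at the data's home site; only the count leaves, never the records."""
    return sum(1 for r in records if r["variant"] == variant)


def federated_count(sites, variant, min_count=MIN_RELEASE_COUNT):
    """Combine per-site aggregates, skipping sites whose count is too small to release."""
    total = 0
    contributing = []
    for name, records in sites.items():
        count = local_count(records, variant)
        if count >= min_count:
            total += count
            contributing.append(name)
    return total, contributing


total, contributing = federated_count(SITES, "V1")
print(total, contributing)
```

The coordinating service only ever sees per-site totals, which is one simple way a data steward could keep control over how much information is released.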

CARL's Portage Network Initiatives

Jeff Moon, Queen's University
Presentation 

Abstract

This presentation will describe the Portage Network, an initiative of the Canadian Association of Research Libraries (CARL) with the goal of building research data management (RDM) capacity in Canada through a network of experts in a growing community of practice.  The session will focus on progress made by the Portage Network in finding solutions to practical challenges researchers and institutions face in meeting journal and funding-agency requirements regarding data management.  Specifically, we will describe the strides Portage has made in responding to the three pillars of the emerging Tri-Agency policy regarding data management, and how Portage continues to innovate and contribute to the RDM ecosystem on both practical and policy fronts.

Coordinating a National RDM Strategy

Lori MacMullen, CUCCIO
Jeff Moon, Queen's University
Lee Wilson, ACENET
Panel Discussion

Abstract

This panel revisits one held in 2016, on a topic that remains equally important given the upcoming RDM requirements with which all research institutions will need to comply. The panel will look at what is currently being done by Research Data Canada, CARL/Portage, and Compute Canada, and how these organizations are working together to provide the support and services needed. Collaborative solutions, both planned and currently underway, will be discussed, including the scope, unique needs and challenges faced by each group as well as the strengths each brings to the project. By working together on a national strategy, we are able to develop a multifaceted solution to a problem that we all have a role in solving.

Efficient Research Data Management Using Globus SaaS

Vas Vasiliadis, University of Chicago, U.S.
Presentation 

Abstract

Globus is software-as-a-service that is rapidly becoming the de facto standard for managing research data on a wide variety of HPC and campus computing resources. While usage among service providers affiliated with Compute Canada continues to grow, there are still many Canadian institutions and investigators who are either not aware of the capabilities and benefits Globus can provide, or have limited-scope deployments and need assistance to expand usage beyond simple file transfer tasks. 

We will present an overview of the key features of the Globus service and how it may be used to deliver robust research data management services that span campus systems, national cyberinfrastructure, and public cloud resources. Globus is installed in research computing centers at hundreds of universities worldwide, as well as many US national facilities and federal agencies. We will draw on experiences from this broad user base to highlight the challenges in delivering scalable research data management services, as well as key use cases and implementation takeaways. This presentation will include a live demonstration of file transfer, sharing, and data publication, as well as an overview of the requirements for making a system accessible via Globus. In particular, we will focus on ease of use for non-technical end-users, and minimizing overhead for system administrators tasked with deploying Globus endpoints on their storage systems.

Federated Research Data Repository (FRDR) Update

Lee Wilson, ACENET
Jason Hlady, University of Saskatchewan
Alex Garnett, Simon Fraser University
Presentation 

Abstract

This presentation will provide an overview of the Federated Research Data Repository (FRDR, https://frdr.ca/), a scalable, federated platform for digital research data management and discovery of Canadian research data, developed through a partnership between Compute Canada and the Canadian Association of Research Libraries’ Portage Network. We will discuss major features such as FRDR’s discovery layer, which harvests and aggregates metadata records from dozens of repositories across Canada to make them discoverable from a single search interface, its large and geographically distributed primary data storage, and its automated preservation processing through integration with Archivematica. We will also review FRDR within the broader context of the developing Canadian research data management ecosystem and offer insights into when to use FRDR over other repository solutions. Finally, we will highlight several of the research projects we are working with during our limited production phase and discuss how researchers at your institutions can start using FRDR to ensure that their research data is discoverable and accessible in compliance with forthcoming Tri-Agency policies for research data management.
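
A discovery layer of this kind can be pictured as one index built from many sources. The minimal sketch below merges records from several repositories and searches them from a single entry point. The repository names, record fields, and titles are invented for illustration and do not reflect FRDR's actual harvesting code or metadata schema.

```python
# Invented metadata records, standing in for what each repository would expose.
REPOSITORIES = {
    "repo_one": [{"id": "r1-1", "title": "Arctic sea ice thickness survey"}],
    "repo_two": [
        {"id": "r2-1", "title": "Urban air quality sensors"},
        {"id": "r2-2", "title": "Sea bird nesting counts"},
    ],
}


def harvest(repositories):
    """Collect every record into one list, tagging each with its source repository."""
    index = []
    for source, records in repositories.items():
        for record in records:
            index.append({**record, "source": source})
    return index


def search(index, keyword):
    """Case-insensitive keyword match against titles in the aggregated index."""
    needle = keyword.lower()
    return [r for r in index if needle in r["title"].lower()]


index = harvest(REPOSITORIES)
hits = search(index, "sea")
print([(r["source"], r["id"]) for r in hits])
```

Keeping the source name on every harvested record lets a search result point the reader back to the repository that actually holds the data.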

From Project Implementation to Sustainment - Keeping Research on TRAQ

Tom Herra, Queen's University
Birds of a Feather

Abstract

The Queen's University Research Services office implemented its electronic research system, TRAQ (Tools for Research at Queen's), in stages, one module at a time: Human Ethics Certifications in 2010, Researcher Portal in 2011, Biohazards in 2013, and Awards in 2014. TRAQ, in connection with the Researcher Portal, allows researchers to apply for internal approvals of grants, contracts, and associated human ethics certifications and biohazard permits to support compliance. This presentation will discuss the transition from implementation to sustainment. Despite a steady increase in the number of applications and post-approval event forms, the sustainment team has managed to support the system. The ticketing system implemented in 2014 shows no major changes in the number of tickets generated per year. A university environment with changing user groups (new faculty, staff, and the annual addition of a new student cohort) creates the need for continuous education, training, and support. Quite often, a major electronic system implementation involves substantial budgetary support along with the creation of a project implementation team. Major objectives of the project included:

  • increasing operational efficiencies and enhancing workflow management,
  • switching from paper-based applications to electronic submissions,
  • replacing the old in-house-built electronic awards system with a new electronic system that covers pre- and post-award processes,
  • enabling flexible research project management reporting,
  • enhancing strategic and integrated planning capabilities,
  • supporting regulatory compliance and accountability,
  • enhancing institutional reputation with external stakeholders. 

The tasks, duties, and major post-implementation challenges of the electronic system sustainment team will be reviewed and discussed. John Kotter's 8-step change management model will be reviewed in connection with TRAQ implementation and sustainment.

It's 10 o'clock… do you know where your data are?

Scott Baker, University of British Columbia
Presentation 

Abstract

With untold dangers lurking at every turn, research projects with sensitive data must navigate the shoals of security, evade the privateers of privacy, climb the cliffs of compliance, and achieve audit approval!  Explore how effective research data management including a fusion of information security, privacy, data management, procedures, technology, compliance, and even audit can be a far less treacherous voyage than legends might imply!

Policy by Data Science

Byron Chu, Cybera
30 minute mini-presentation

Abstract

In 2017, Cybera’s data scientists undertook a monumental task: to create a machine learning tool that could make it easier for members of the public to examine the 65,000 pages of documents and testimonials provided to the CRTC for its 2015 “review of basic telecommunications service” consultation. Our goal was to use data science principles (such as natural language processing, machine learning, and fuzzy searches) to make it easier for Canadians to better understand the information presented at public proceedings and their connection to policy decisions designed to safeguard the internet.

To explore the data, we wrote a browser-based application in Ruby, using the lightweight Sinatra framework. Next, we used a Java-based “graph database” called Neo4j, which models relationships between data as connected nodes on a graph and has been used in the investigative journalism work on the Paradise and Panama Papers. Finally, we used the Apache Solr project to find relevant sections of text in larger documents that we wanted to pull out for further analysis.
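
Two of the ideas mentioned above, fuzzy matching and relationships stored as a graph, can be sketched with the Python standard library alone. This is a stand-in for illustration, not the Ruby, Neo4j, and Solr stack the project used, and the organization names and document identifiers are invented.

```python
import difflib
from collections import defaultdict

# Graph stored as adjacency sets: submitter -> documents they filed (invented data).
graph = defaultdict(set)
edges = [
    ("Example Telecom Inc.", "doc-001"),
    ("Example Telecom Inc.", "doc-002"),
    ("Rural Access Coalition", "doc-003"),
]
for submitter, doc in edges:
    graph[submitter].add(doc)


def fuzzy_lookup(query, cutoff=0.6):
    """Return the best-matching submitter name and the documents linked to it."""
    names = list(graph)
    matches = difflib.get_close_matches(query, names, n=1, cutoff=cutoff)
    if not matches:
        return None, set()
    return matches[0], graph[matches[0]]


# A misspelled query still finds the intended submitter.
name, docs = fuzzy_lookup("Exmaple Telecom Inc")
print(name, sorted(docs))
```

Tolerating misspellings matters for public submissions, where the same organization is often named in slightly different ways across documents.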

The above tools have broad applications in data processing and analysis, beyond our specific use case. We hope to demonstrate their potential to the CANHEIT-TECC audience. This presentation will feature demonstrations of the analysis tools and relevant data visualizations. This project was funded by the Canadian Internet Registration Authority (CIRA) Community Investment Program.

The long tail of data science? Supporting research with coding and analytics

David Chan, Cybera
Panel Discussion

Abstract

Code-based analyses of data are becoming increasingly common in every research sector. However, coding and programming-based analyses, using tools such as R and Python, are traditionally not in the repertoire of most research groups. In the humanities, researchers often rely on prefabricated computational tools, which may or may not be suitable for their research problem. In the sciences, programming and coded analyses can be a challenge for experimental groups, where coding is not part of their traditional training. Advanced techniques, such as machine learning, may present a particularly daunting black box for any research group.

To overcome these challenges, there is growing support for non-programming based research groups, and various approaches are being explored. For example: 

  • Software Carpentry has expanded to provide Data Carpentry training, with the goal of introducing data analysis concepts to all researchers
  • The University of California, Berkeley is teaching Data Science 101 to any interested first-year student, and it has seen this class become its fastest growing undergraduate course ever
  • Dedicated data science support teams are appearing at universities to aid research groups lacking those skills

In this session, we will convene a panel of experts that represent the different approaches to providing data science support to research groups. They will discuss the challenges, successes, and paths forward to leveraging data science in all branches of research. 

We are all big data: A vision for engaging researchers to unlock knowledge

Trevor Roald, Simon Fraser University
Seychelle Cushing, Simon Fraser University
Presentation 

Abstract

There are few guideposts on a data journey.  

As a university, SFU has the opportunity to engage in data-intensive research like never before. But how can we harness the potential of data, leverage Supercomputer Cedar, and empower researchers from across the disciplines to help deliver breakthroughs faster?

A data journey is not just for the few: it can be for all. From genomics to criminology, physics to healthcare and more, KEY, SFU’s Big Data Initiative, is supporting the data journey for researchers across all disciplines. We are lowering the barriers for people to engage in advanced research computing, accelerating data-intensive research and innovation, and sparking new collaborations across diverse sectors in industry and government.

During this presentation, Seychelle Cushing and Trevor Roald share the stories, the lessons, and the struggles of how SFU is empowering people to engage with data and setting guideposts to ease the data journey for those who will follow.

Your Data or Your Dog: Enterprise Backups for Macs

Geoff Brown, Simon Fraser University
30 minute mini-presentation

Abstract

Every IT administrator wants to safeguard their users’ data. But how do you ensure that the tools are being leveraged to their potential? In a university environment where research, academic and personal data are at risk, it is paramount that this data be protected.

The difficulties in ensuring data reliability for endpoints are discussed and solutions are outlined. The goal of this case study is to demonstrate that, as professionals, IT departments are the ultimate agent in maintaining clients’ data. With a cradle-to-grave approach to systems management, it is an IT administrator’s responsibility to take the correct measures to ensure this.

Relative costs and risks are compared side by side in hopes of illuminating the alternatives. Geoff uses real-world, personal examples to give insight into why end-users with the best of intentions still fail to safeguard their data. The technical and business aspects of one university’s decision-making process are evaluated and the chosen solution is discussed.

In this session, IT administrators and decision-makers will gain insight on Simon Fraser University's journey from using a sync service, to no backup, to local backup disks, to an enterprise solution while maintaining our commitment to the reliability and security of our clientele's data.

CANHEIT-TECC 2018 : June 18-21