Presenter: Amit Chourasia, Sr. Scientist, San Diego Supercomputer Center, UC San Diego

Abstract: An award-winning data management system for teams struggling with intractable data organization and data access.

Problems: data access, storage, and dissemination; missing context (emails, notes, protocols); data scattered across systems

It's not enough to just store data - you need context to give it depth and meaning; otherwise it is harder to find and use, becoming "dark data"

Data creation and use/reuse are increasing; research teams are larger and more heterogeneous; data volume is doubling every year; data governance issues are sprawling; the value of data is poorly realized. Few rich data management solutions exist.

SeedMeLab seeks to make data

  • accessible on web via browser, API and command line
  • sharable with access control
  • annotatable with context and metadata
  • presentable in rich format

The framework seeks to be

  • easily usable, customizable, extensible
  • turn-key deployable
  • mature and sustainable

Uses specialized Drupal modules for data management, REST service/client, visualization plugins, SSO

Add files via drag and drop; organize in folders; add rich description including formatted text, links, lists, tables, images, videos, equations; authors and users can comment, starting scholarly discussion and giving evidence of communication

Admins can add custom metadata fields across the file system

Auto-generate and present visualizations for any file type via plugins
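
One way to picture plugin-driven visualization is a registry that maps file extensions to renderer functions. The sketch below is illustrative only: SeedMeLab's plugins are Drupal modules, and every name here (`register_formatter`, `render`, the sample renderers) is hypothetical rather than taken from the product.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: file extension -> function that renders file text.
# This is NOT SeedMeLab's plugin API; it only illustrates the general idea.
_FORMATTERS: Dict[str, Callable[[str], str]] = {}


def register_formatter(*extensions: str):
    """Decorator that registers a renderer for one or more extensions."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        for ext in extensions:
            _FORMATTERS[ext.lower()] = func
        return func
    return decorator


@register_formatter(".csv")
def render_csv(text: str) -> str:
    """Render CSV text as a pipe-separated table, one row per line."""
    rows = [line.split(",") for line in text.strip().splitlines()]
    return "\n".join(" | ".join(cell.strip() for cell in row) for row in rows)


@register_formatter(".txt", ".log")
def render_plain(text: str) -> str:
    """Return plain text unchanged."""
    return text


def render(filename: str, text: str) -> str:
    """Pick a renderer by extension; fall back to a short notice."""
    ext = Path(filename).suffix.lower()
    formatter = _FORMATTERS.get(ext)
    if formatter is None:
        return f"(no preview available for {ext or 'this file'})"
    return formatter(text)
```

Supporting a new file type then means registering one more function, without touching the dispatch logic.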

Full indexing - can search filenames, descriptions, even comments
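
To make "full indexing" concrete, here is a minimal inverted-index sketch that searches filenames, descriptions, and comments together. It is an assumption-laden toy: the record layout, the sample data, and the functions `build_index` and `search` are all hypothetical, and SeedMeLab's actual search is presumably handled by its Drupal modules.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical sample records; field names are assumptions for illustration.
ITEMS: List[dict] = [
    {"id": 1, "filename": "run_042.h5",
     "description": "Turbulence simulation output",
     "comments": ["Mesh looks too coarse near the wall"]},
    {"id": 2, "filename": "protocol.pdf",
     "description": "Staining protocol for flow cytometry",
     "comments": []},
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(items: List[dict]) -> Dict[str, Set[int]]:
    """Map each token to the ids of items whose name, description,
    or comments contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for item in items:
        fields = [item["filename"], item["description"], *item["comments"]]
        for text in fields:
            for token in tokenize(text):
                index[token].add(item["id"])
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of items that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Because comments feed the same index as filenames, a query such as "coarse mesh" finds an item even when those words appear only in the discussion thread.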

Built-in user management/roles; SSO through OAuth2 and LDAP add-ons (wonder if we could get CAS?)

Users can grant specific access on files/folders (the sharing model is deliberately simple, because implementing sharing is hard, but sharing at the top level is supported); global role-based permissions cover viewing, sharing, inviting, etc.
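
The combination of global roles and per-folder grants can be sketched as a two-step check: the role must allow the action at all, and the user must own the folder or hold an explicit grant on it. This is a simplified model under stated assumptions; the role names, the `Folder` and `User` classes, and the `can` and `share` functions are hypothetical and do not describe SeedMeLab's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical global roles and the actions each one permits.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"view"},
    "author": {"view", "share", "invite"},
}


@dataclass
class User:
    name: str
    role: str


@dataclass
class Folder:
    name: str
    owner: str
    # Per-folder grants: username -> actions granted on this top-level folder.
    grants: Dict[str, Set[str]] = field(default_factory=dict)


def can(user: User, action: str, folder: Folder) -> bool:
    """Allow an action only if the role permits it AND the user
    owns the folder or holds an explicit grant for that action."""
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    if user.name == folder.owner:
        return True
    return action in folder.grants.get(user.name, set())


def share(sharer: User, folder: Folder, recipient: str, actions: Set[str]) -> None:
    """Grant actions on a folder to another user, if the sharer may share it."""
    if not can(sharer, "share", folder):
        raise PermissionError(f"{sharer.name} may not share {folder.name}")
    folder.grants.setdefault(recipient, set()).update(actions)
```

A short usage example: an author shares a folder with a viewer, who can then view it but cannot re-share it.

```python
alice = User("alice", "author")
bob = User("bob", "viewer")
results = Folder("results", owner="alice")

share(alice, results, "bob", {"view"})
print(can(bob, "view", results))   # bob now holds a view grant
print(can(bob, "share", results))  # the viewer role never permits sharing
```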

Lots of customization - filesystem fields, data list views, visualization plugins, processing plugins, layout, theme, branding

REST client
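
The notes do not record the REST API itself, so the following is only a sketch of what a client call might look like. The base URL, the `/items/{id}` path, the JSON payload shape, and bearer-token authentication are all assumptions made for illustration, not SeedMeLab's documented interface. The function builds the request without sending it, so it can be inspected safely.

```python
import json
from urllib.request import Request


def build_describe_request(base_url: str, item_id: int,
                           description: str, token: str) -> Request:
    """Build (but do not send) a request that updates an item's description.

    Hypothetical endpoint and payload: nothing here is taken from the
    real SeedMeLab API.
    """
    url = f"{base_url.rstrip('/')}/items/{item_id}"
    body = json.dumps({"description": description}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Example with a placeholder host; pass the result to urllib.request.urlopen
# only once the real endpoint and credentials are known.
request = build_describe_request(
    "https://example.org/api", 7, "Run 42 output", "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
```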

FolderShare module under the hood - no dependencies besides Drupal core - implements a virtual file system for Drupal; includes search indexing of pages/files, a task scheduler, and pluggable file formatters


Use cases:

  • provide a branded DMS for research groups
  • integrate with existing systems for scientific apps
  • become a service provider

Specific apps:

  • FlowGate - science portal for diagnosing cancer

Users say it allows for swift feedback, makes it easier to write papers, and fulfills data-sharing requirements

Full demo available on the website

Currently no integrations with IR systems - this would take a huge effort. They are looking at existing filesystems that might facilitate something like it, but have no plans to create any integrations at this time. (We could always just encourage faculty to submit "finished" data to SJ.)