Presenter: Amit Chourasia, Sr. Scientist, San Diego Supercomputer Center, UC San Diego
Abstract: An award-winning data management system for teams struggling with intractable data organization and data access.
Problems: access, storage, dissemination; missing context (emails, missing notes, protocols); scattered across systems
It's not enough to just store data - you need context to give it depth and meaning; otherwise it is harder to find and use, becoming "dark data"
Data creation, use/reuse increasing; research teams are larger and more heterogeneous; data doubling every year; sprawling data governance issues; subpar realization of data value. Not a lot of rich data management solutions out there.
SeedMeLab seeks to make data
- accessible on web via browser, API and command line
- sharable with access control
- annotatable with context and metadata
- presentable in rich format
The framework seeks to be
- easily usable, customizable, extensible
- turn-key deployable
- mature and sustainable
Uses specialized Drupal modules for data management, REST service/client, visualization plugins, SSO
Add files via drag and drop; organize in folders; add rich description including formatted text, links, lists, tables, images, videos, equations; authors and users can comment, starting scholarly discussion and giving evidence of communication
Admins can add custom metadata fields across the file system
Auto-generates and presents visualizations for any file type via plugins
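A minimal sketch of how plugin-based previews keyed on file type can work in general. This is not SeedMeLab's actual plugin API (which is built on Drupal); the registry, the decorator and every function name below are hypothetical and only illustrate the pattern.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: file extension -> function that produces a preview.
# None of these names come from SeedMeLab; they only illustrate the pattern.
FORMATTERS: Dict[str, Callable[[str], str]] = {}


def register(*extensions: str):
    """Decorator that registers one formatter for one or more extensions."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        for ext in extensions:
            FORMATTERS[ext.lower()] = func
        return func
    return wrap


@register(".csv")
def preview_csv(name: str) -> str:
    # Stand-in for a real table renderer.
    return f"table view of {name}"


@register(".png", ".jpg")
def preview_image(name: str) -> str:
    # Stand-in for a real image viewer.
    return f"image view of {name}"


def render(name: str) -> str:
    """Pick a formatter by extension; fall back to a plain download link."""
    ext = Path(name).suffix.lower()
    formatter = FORMATTERS.get(ext)
    if formatter is None:
        return f"download link for {name}"
    return formatter(name)
```

Supporting a new file type then means registering one more function, without touching the code that lists or serves files.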
Full indexing - can search filenames, descriptions, even comments
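To make "full indexing" concrete, here is a small sketch of an inverted index that covers filenames, descriptions and comments together. It is an assumption-laden illustration, not how SeedMeLab or Drupal search is implemented; the class and method names are made up for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Hypothetical inverted index: token -> ids of items containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, item_id: str, filename: str, description: str = "",
            comments: Sequence[str] = ()) -> None:
        # Filename, description and every comment feed the same index,
        # so a query can match text from any of them.
        for text in (filename, description, *comments):
            for token in tokenize(text):
                self._postings[token].add(item_id)

    def search(self, query: str) -> List[str]:
        """Return sorted ids of items that contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            hits &= self._postings.get(token, set())
        return sorted(hits)
```

With an index like this, a query such as "finer mesh" finds a file even if those words appear only in a reviewer's comment on it.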
Built-in user management and roles; SSO through OAuth2 and LDAP add-ons (wonder if we could get CAS?)
Users can grant specific access on files/folders. The sharing model is kept simple because implementing sharing is hard, but items can be shared at the top level. Global role-based permissions cover viewing, sharing, inviting, etc.
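One way to read the access model in the note above is global roles plus per-user grants on top-level folders that everything underneath inherits. The sketch below assumes that reading. The role names, the permission sets and the AccessControl class are all hypothetical, not SeedMeLab's actual configuration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Set

# Hypothetical global roles and the actions each one allows.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"view"},
    "member": {"view", "share"},
    "admin": {"view", "share", "invite"},
}


def top_level(path: str) -> str:
    """Return the first folder of a path, e.g. '/projects/a.csv' -> 'projects'."""
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    return parts[0] if parts else ""


@dataclass
class AccessControl:
    roles: Dict[str, str] = field(default_factory=dict)        # user -> role
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # top folder -> users

    def _allowed(self, user: str, action: str) -> bool:
        return action in ROLE_PERMISSIONS.get(self.roles.get(user, ""), set())

    def share(self, sharer: str, folder: str, user: str) -> None:
        """Grant access on the top-level folder that contains `folder`."""
        # Simplification: any user whose role allows sharing may share any folder.
        if not self._allowed(sharer, "share"):
            raise PermissionError(f"{sharer} may not share")
        self.grants.setdefault(top_level(folder), set()).add(user)

    def can_view(self, user: str, path: str) -> bool:
        """A user needs the global 'view' permission and a grant on the top folder."""
        if not self._allowed(user, "view"):
            return False
        return user in self.grants.get(top_level(path), set())
```

Checking only the top-level folder keeps the permission lookup to a single dictionary access, which is the kind of simplicity the note alludes to.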
Lots of customization - filesystem fields, data list views, visualization plugins, processing plugins, layout, theme, branding
FolderShare module under the hood - no dependencies besides Drupal core - implements a virtual file system for Drupal; includes search indexing of pages/files, task scheduler, pluggable file formatters
Use cases:
- provide a branded DMS for research groups
- integrate with existing systems for scientific apps
- become a service provider
- FlowGate - science portal for diagnosing cancer
Users say it allows for swift feedback, makes it easier to write papers, fulfills data share requirements
Full demo available on the website
Currently no integrations with IR systems - this takes a huge effort. Looking at existing filesystems that might facilitate something like it but no plans to create any at this time. (We could always just encourage faculty to submit "finished" data to SJ.)