Data Across Boundaries Workshop | October 14 @ CROSS Symposium 2021


What can interdisciplinary data science and open-source software development learn from each other?


Data is the lifeblood of science, no matter which discipline we look at. However, one can rarely understand domain-specific data readily, as it depends on the context in which it was acquired, as well as the acquisition methodology. Often the full meaning of a specific dataset exists only in one scientist's mind.

Gaining insight into a particular dataset is therefore important. This happens through the process of abstraction, be it visual, textual, auditory or a mix of those. The abstraction relies on a shared understanding between those who acquire the data and those who consume it. Transparency, shared terminology, and documentation are some of the key ingredients here.

Unsurprisingly, software developers face problems similar to the above: hundreds of engineers can contribute to a project where each has only a fractional knowledge of the whole system. On top of that, the open source model has no central entity that would manage the whole process. Much like science, this effort relies on self-organization.

The aim of this workshop is to draw parallels between these two communities — scientists and open-source developers — and learn from each other's experiences in fostering communication and shared understanding across the inevitable boundaries.

Update: the recording is now available!

 Details

Data Across Boundaries will take place online within the CROSS Research Symposium on October 14 (9:45-11:15 am PT / 12:45-2:15 pm ET / 6:45-8:15 pm CEST). Registration for the symposium is free; please register here if you'd like to take part in the workshop!

The Zoom link to the session is available in the Agenda document (accessible only to registered participants, see above).

The workshop will have two parts:
    1) Individual presentations from the contributors, each up to 15 minutes long, including a brief Q&A
    2) A roundtable discussion among the participants on questions related to the workshop’s theme

 Speakers

Data Across Boundaries will feature four speakers with expertise in open data visualization: Dr. Jing Zhu (computational genomics), Dr. Matthew J. Turk (astrophysics), Dr. Alexander Bock (astronomy, astrographics), and Jessica Kendall-Bar (marine biology).



Dr. Jing Zhu leads the UCSC Xena project, a web-browser based visualization and analysis tool to explore cancer genomics data. Jing is a research scientist at the UC Santa Cruz Genomics Institute. She earned her bachelor of science degree in Biochemistry from Fudan University, Shanghai, China, and her Ph.D. in Biological and Medical Informatics from the University of California, San Francisco. Her research interests include creating genomics data visualizations, building web-based data visualization tools, and making these tools available to every biologist. She is part of The Cancer Genome Atlas (TCGA) research network, NCI’s Informatics Technology for Cancer Research (ITCR) program, the International Cancer Genome Consortium (ICGC), and the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium.

Talk: UCSC Xena is a web-browser based visual exploration resource for both public and private cancer genomics data, supported through the web-based Xena Browser and multiple turn-key Xena Hubs. This unique architecture allows researchers to view their own data securely, using private Xena Hubs, while simultaneously visualizing large public cancer genomics datasets, including TCGA and the GDC. Data integration occurs only within the Xena Browser, keeping private data private. Xena supports visualization of virtually any type of functional genomics data, including SNVs, INDELs, large structural variants, CNV, expression, DNA methylation, ATAC-seq signals, and phenotypic annotations. Browser features include the Visual Spreadsheet, survival analyses, powerful filtering and subgrouping, statistical analyses, genomic signatures, and bookmarks.



Dr. Matthew J. Turk is an Assistant Professor at the School of Information Sciences at UIUC, with an appointment in the Department of Astronomy. He has worked on the development of tools and the attendant open source communities within domains such as astronomy, crop sciences and materials science. He works on tools and techniques, such as the yt project, for the analysis and visualization of large-scale volumetric datasets.

Talk: Analyzing complex, multi-source, multi-format and multi-modal data from astrophysical simulations, observations and theory requires methods for transforming raw numbers into manipulable quantities, and the application of high-level semantic models on top of those quantities. In this talk I will present methods for defining and applying a grammar of analysis to volumetric astrophysical data, and describe the implications this has for visualization, analysis and inference in astrophysics.



Dr. Alexander Bock is an Assistant Professor at Linköping University, Sweden. Prior to this, he was a Moore-Sloan Data Science Fellow with the Center for Data Science at New York University and a Research Fellow with the Scientific Computing and Imaging Institute at the University of Utah. He received his PhD in Visualization and Interaction from Linköping University, Sweden. In 2015, he was a visiting Research Scholar with the Community Coordinated Modeling Center at NASA’s Goddard Space Flight Center, USA. He is also the Development Lead on the open-source Astrovisualization software OpenSpace. For his work in the field of Astrovisualization, Bock received the Best Scientific Visualization poster award in 2014 and 2015 and the Best Scientific Visualization paper award in 2017 at the IEEE Visualization conference.

Talk: I will introduce and demonstrate OpenSpace, an application funded by NASA and Sweden. It is an open-source tool for space and astronomy research and communication, as well as a platform for technical visualization research, developed in collaboration between Linköping University, the American Museum of Natural History, NASA’s Community Coordinated Modeling Center, New York University, and the University of Utah. The software is a scalable platform that paves the way for the next generation of public outreach by enabling the same visualization both in immersive environments, such as dome theaters and planetariums, and on off-the-shelf computer hardware, allowing the general public to explore our known universe. The talk will first introduce the software and then conclude with a live demonstration.



Jessica Kendall-Bar is a PhD candidate studying Marine Mammal Physiology and Neuroscience in the Costa and Williams labs at UC Santa Cruz. After studying Marine Science and Integrative Biology at UC Berkeley, she began her dissertation research, which explores new techniques for monitoring sleep in wild marine mammals. A scientist by training, Jessica has pursued research spanning a wide range of topics including oceanic geochemistry, octopus behavior, marine arthropod mating behavior, moray eel behavior, human sleep deprivation, and marine mammal neuroscience. However, Jessica believes that scientific progress is futile unless communicated successfully. Jessica’s illustrations, animations, photography, and cinematography aim to accurately portray science and its role in preserving the underwater ecosystem. Her work as a science communication strategist for the Coastal Resilience Lab at UC Santa Cruz on the flood protection benefits of coral reefs and mangroves has been presented to top-level decision makers at forums with leaders across the US, Pacific and Indian Ocean nations. She has illustrated two children’s books, The Castor Oil Rig Tales, published by AzBukiVeri Press, and Looking for Marla. One of her animations illustrates her scientific research on marine mammal sleep and was displayed at Burning Man, a festival that reaches over 50,000 people each year. At the interface of science and art, Jessica endeavors not only to make meaningful discoveries, but also to convey those results broadly and creatively to impact diverse populations within and outside of academia.

Talk: We introduce a creative pipeline to incorporate physiological and behavioral data from contemporary marine mammal research into data-driven animations, leveraging functionality from industry tools and custom scripts to promote scientific insights, public awareness, and conservation outcomes. Our framework can flexibly transform data describing animals’ orientation, position, heart rate, and swimming stroke rate to control the position, rotation, and behavior of 3D models, to render animations, and to drive data sonification. Additionally, we explore the challenges of unifying disparate datasets gathered by an interdisciplinary team of researchers, and outline our design process for creating meaningful data visualization tools and animations. As part of our pipeline, we clean and process raw acceleration and electrophysiological signals to expedite complex multi-stream data analysis and the identification of critical foraging and escape behaviors. We provide details about four animation projects illustrating marine mammal datasets. These animations, commissioned by scientists to achieve outreach and conservation outcomes, have successfully increased the reach and engagement of the scientific projects they describe. These impactful visualizations help scientists identify behavioral responses to disturbance, increase public awareness of human-caused disturbance, and help build momentum for targeted conservation efforts backed by scientific evidence.

 Audience

Faculty and students (primarily from UC Santa Cruz), academic researchers, industry representatives, and members of the broader open-source community. We encourage participants to advertise the event in their circles.

 Organizers

Chair

Oskar Elek organizes and chairs the workshop. He is a postdoctoral researcher in the Creative Coding Lab at the University of California, Santa Cruz, and an adjunct lecturer there. He is also a CROSS Incubator Fellow. Oskar holds a PhD from the Max Planck Institute for Informatics in Saarbrücken, Germany. His past work involves physically based rendering, volumetric optics, and computational fabrication. Currently, he focuses on developing nature-inspired models and their application to visualizing and interpreting data from astronomy as well as other domains.


Co-Organizer

Angus G. Forbes co-organizes the workshop. Angus is an Associate Professor of Computational Media and directs the Creative Coding Lab at the University of California, Santa Cruz. His research investigates novel techniques for visualizing and interacting with complex scientific information; his interactive artwork has been featured at museums, galleries, and festivals throughout the world. He chaired the IEEE VIS Arts Program (VISAP) from 2013 to 2017, was the Arts Papers chair for ACM SIGGRAPH in 2018, and chaired the ACM SIGGRAPH Art Gallery in 2021.



CROSS stands for the Center for Research in Open Source Software. Founded in 2015 by Carlos Maltzahn and Sage Weil, CROSS supports young researchers in sharing their work in the form of open-source projects with the broader scientific and industrial communities. The annual CROSS Research Symposium brings together faculty, students, domain experts and software developers to share their experience with collaborative research and open software development practices.

© Oskar Elek. All rights reserved.