All research is becoming data-intensive research
Including neuroimaging...
Van Horn and Toga (2014)
The fourth paradigm of science
1. Empirical (experimental)
2. Theoretical (mathematical)
3. Simulation (computational)
4. Data-intensive (eScience)
Jim Gray
Our mission: "All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data... In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields"
Data Science?
Programming and software engineering
Data management
Statistics and machine learning
Data visualization and communication
A focus on reproducibility and openness
New role for data scientists
Facilitate data-intensive research in different fields (inter- and cross-disciplinary)
Focus on methodology
Focus on reproducibility
Contribute to openly available tools, rather than/in addition to peer-reviewed publications
"Career paths for data scientists that recognize and reward contributions in methodology, computation, or development of tools are important."
(From a recent NIH BD2K RFA)
Incubator projects
Focused, intensive, collaborative projects
Data scientists + domain scientists
Results that wouldn't be possible otherwise
Data Science for Social Good
Inspired by the DSSG programs at U Chicago and Georgia Tech
10-week internship program
16 DSSG fellows/students
6 high-school students from ALVA program
4 projects (+project leads!)
+ Data scientist mentors
Project Leads: Anjana Sundaram, Neil Roche, Bill & Melinda Gates Foundation
DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak
ALVA Students: Cameron Holt, Xilalit Sanchez
eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton
Family Trajectories through Programs
Neuroimaging and Data Science
Normal behavior is supported by brain connectivity
Image from Catani and ffytche (2015)
Not just passive cables
Brain connections change with development
Individual differences account for differences in behavior
Adapt with learning
This has clinical significance
Diffusion MRI
Isotropic diffusion
Diffusion MRI
Anisotropic diffusion
Modeling diffusion
Basser, Mattiello and Le Bihan (1994)
Diffusion statistics
Principal diffusion direction
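As a minimal sketch of the statistics named above, assuming a single voxel whose diffusion is summarized by a symmetric 3x3 tensor: the mean diffusivity, fractional anisotropy and principal diffusion direction can all be read off the tensor's eigen-decomposition. The function name `tensor_stats` and the example tensor values are illustrative assumptions, not taken from the talk or from DIPY.

```python
import numpy as np


def tensor_stats(D):
    """Return (mean diffusivity, fractional anisotropy, principal direction)
    for a symmetric 3x3 diffusion tensor D."""
    # eigh returns eigenvalues in ascending order for symmetric matrices
    evals, evecs = np.linalg.eigh(D)
    md = evals.mean()
    # FA: normalized spread of the eigenvalues around their mean
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    # The eigenvector paired with the largest eigenvalue (sign is arbitrary)
    pdd = evecs[:, -1]
    return md, fa, pdd


# Isotropic diffusion: equal diffusivity in every direction (assumed values)
iso = np.eye(3) * 1.0e-3
# Anisotropic diffusion: fastest along the x axis (assumed values)
aniso = np.diag([1.7e-3, 0.3e-3, 0.3e-3])

for name, tensor in [("isotropic", iso), ("anisotropic", aniso)]:
    md, fa, pdd = tensor_stats(tensor)
    print(name, "MD:", md, "FA:", fa, "direction:", pdd)
```

For the isotropic tensor the principal direction is not meaningful, since all three eigenvalues are equal.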
DIPY: Diffusion MRI in Python
Part of the NIPY community
Started in 2009 by Eleftherios Garyfallidis
Contributors from at least six different countries and many different labs
Why Python?
The lingua franca of reproducible computational science
Open source
Easy to learn
Come learn Python!
January 7th-8th, the WRF Data Science Studio (Physics/Astronomy building)
Why Python?
The lingua franca of reproducible computational science
Open source
Easy to learn
Phenomenal ecosystem of open-source tools
The scipy & nipy ecosystem
Diffusion MRI: the challenge of validation
A statistical learning approach
In-vivo validation
Measurement #1
Test-retest reliability
Model
Cross-validation
For example
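import numpy as np
import dipy.reconst.dti as dti
# Fit to the first measurement, predict, and compare with the second
model = dti.TensorModel(gtab)
fit = model.fit(data1)
prediction = fit.predict(gtab)
RMSE = np.sqrt(np.mean((prediction - data2) ** 2, -1))
# Normalize by the test-retest error between the two measurements
rRMSE = RMSE / np.sqrt(np.mean((data1 - data2) ** 2, -1))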
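To make the relative RMSE easier to read, here is a sketch on synthetic data, assuming two repeated measurements that share the same underlying signal and have independent Gaussian noise. The helper `relative_rmse`, the array sizes and the noise level are assumptions chosen for illustration; no diffusion model is fit here. A prediction equal to the noise-free signal should land near 1/sqrt(2), because the test-retest difference contains the noise of both measurements, while simply reusing the first measurement as the prediction gives exactly 1.

```python
import numpy as np


def relative_rmse(prediction, data1, data2):
    """Prediction error on data2, relative to the test-retest error."""
    rmse = np.sqrt(np.mean((prediction - data2) ** 2, -1))
    retest = np.sqrt(np.mean((data1 - data2) ** 2, -1))
    return rmse / retest


rng = np.random.default_rng(0)
n_voxels, n_directions, sigma = 200, 150, 0.05  # assumed sizes and noise

# Two noisy measurements of the same (synthetic) underlying signal
truth = rng.uniform(0.2, 1.0, size=(n_voxels, n_directions))
data1 = truth + sigma * rng.normal(size=truth.shape)
data2 = truth + sigma * rng.normal(size=truth.shape)

# An idealized model that recovers the noise-free signal
rrmse_ideal = relative_rmse(truth, data1, data2)
# A "model" that just repeats the first measurement
rrmse_naive = relative_rmse(data1, data1, data2)

print("median rRMSE, ideal model:", np.median(rrmse_ideal))
print("median rRMSE, repeat measurement:", np.median(rrmse_naive))
```

Values below 1 therefore indicate a model that predicts the second measurement better than the first measurement does.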
Corpus callosum
Corticospinal tract
Superior longitudinal fasciculus
When you've only measured once
k-fold cross-validation
from dipy.reconst.cross_validation import kfold_xval
# Use a k of 2
dti_pred = kfold_xval(dti_model, data, 2)
csd_pred = kfold_xval(csd_model, data, 2)
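The idea behind k-fold cross-validation can be sketched without any diffusion-specific code: split the measurements into k folds, fit on all folds but one, and predict the held-out fold, so that every measurement is predicted by a model that never saw it. The function `kfold_predict`, the linear least-squares model and the synthetic design matrix below are stand-ins assumed for illustration; this is not DIPY's implementation.

```python
import numpy as np


def kfold_predict(design, signal, k, seed=0):
    """Predict each measurement from a model fit to the other folds.

    design: (n_measurements, n_params) regressors (illustrative stand-in)
    signal: (n_measurements,) measured values
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(signal))
    folds = np.array_split(order, k)
    prediction = np.empty(len(signal), dtype=float)
    for held_out in folds:
        # Train on every measurement that is not in the held-out fold
        train = np.setdiff1d(order, held_out)
        coef, *_ = np.linalg.lstsq(design[train], signal[train], rcond=None)
        prediction[held_out] = design[held_out] @ coef
    return prediction


# Synthetic single-voxel data: a linear signal plus noise (assumed values)
rng = np.random.default_rng(1)
design = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
signal = design @ np.array([1.0, 0.5, -0.3]) + 0.05 * rng.normal(size=60)

prediction = kfold_predict(design, signal, k=2)
rmse = np.sqrt(np.mean((prediction - signal) ** 2))
print("cross-validated RMSE:", rmse)
```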
LiFE: Linear Fascicle Evaluation
Forward model from the tracks to the measured signal
Pestilli et al. (2014)
From tracks to diffusion
...
=
Pestilli et al. (2014)
Solve for
>>> X.shape
(10e8, 10e6)
Pestilli et al. (2014)
import dipy.tracking.life as life
fiber_model = life.FiberModel(gtab)
fit = fiber_model.fit(data, tracks)
prediction = fit.predict(gtab)
# Keep only the tracks with a positive weight
optimized_tracks = tracks[fit.beta > 0]
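The "solve for" step can be illustrated at toy scale as a non-negative least-squares problem: each column of X holds one track's predicted contribution to the signal, and a track is kept only if its weight comes out positive. The sketch below uses `scipy.optimize.nnls` on a small random matrix; the sizes, the random X and the integer stand-ins for tracks are assumptions for illustration, and the real problem is far too large for a dense solver like this one.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_measurements, n_tracks = 200, 20  # toy sizes (assumed)

# Each column: one candidate track's predicted contribution to the signal
X = rng.uniform(size=(n_measurements, n_tracks))

# Only the first 10 tracks truly contribute in this synthetic example
beta_true = np.zeros(n_tracks)
beta_true[:10] = rng.uniform(0.5, 1.5, size=10)
y = X @ beta_true + 0.01 * rng.normal(size=n_measurements)

# Solve min ||y - X beta|| subject to beta >= 0
beta, residual_norm = nnls(X, y)

# Keep the tracks whose weight is positive
tracks = np.arange(n_tracks)  # integer stand-ins for streamlines
optimized_tracks = tracks[beta > 0]
print("kept tracks:", optimized_tracks)
print("residual norm:", residual_norm)
```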
The vertical occipital fasciculus - a century-old controversy
Resolved through computational neuroanatomy!
The VOF is strategically located
To transmit information between dorsal and ventral visual areas
Summary
The eScience Institute
The Dipy project
In vivo validation through statistical learning
Come visit the Data Science Studio!
Vision and Cognition Lab UW Psychology November 16, 2015
Data Science meets Neuroscience at the University
Ariel Rokem, University of Washington eScience Institute
Follow along at http://arokem.github.io/2015-11-16-viscog