Conversation

rciric

Summary

Continuing #2859

As a step toward implementing nipreps/fmriprep#1458, this PR updates the CompCor algorithm to allow greater flexibility.

List of changes proposed in this PR

  • As before, num_components encodes the number of components to return if it takes an integer value greater than 1.
  • If num_components is all, then CompCor returns all component time series.
  • Alternatively, a new interface option, variance_threshold, allows automatic selection of components on the basis of their capacity to explain variance in the data. variance_threshold is mutually exclusive with num_components and takes values between 0 and 1.
  • If variance_threshold is set, CompCor returns the minimum number of components necessary to explain the indicated variance in the data, computed as the cumulative sum of (singular_value^2)/sum(singular_values^2).
  • For backward compatibility, if neither num_components nor variance_threshold is defined, then CompCor executes as though num_components were set to 6 (the previous default behaviour).
  • CompCor optionally writes out a metadata table, which includes for each component the source mask, the singular value, the explained variance, and the cumulative explained variance.
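The variance-threshold selection described above can be sketched as follows. This is a minimal NumPy illustration, not the nipype implementation; the function name and signature are hypothetical:

```python
import numpy as np

def select_components(M, variance_threshold):
    """Select the minimum number of SVD components whose cumulative
    explained variance meets or exceeds ``variance_threshold``.

    M is a timepoints x voxels matrix, assumed already mean-centered.
    """
    u, s, _ = np.linalg.svd(M, full_matrices=False)
    # Explained variance per component: s^2 / sum(s^2)
    variance_explained = s ** 2 / np.sum(s ** 2)
    cumulative_variance_explained = np.cumsum(variance_explained)
    # Smallest number of components whose cumulative share reaches the threshold
    num_components = int(
        np.searchsorted(cumulative_variance_explained, variance_threshold) + 1)
    return u[:, :num_components], cumulative_variance_explained
```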

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

@codecov-io

Codecov Report

Merging #2878 into master will increase coverage by 0.22%.
The diff coverage is 70.78%.


@@            Coverage Diff            @@
##           master   #2878      +/-   ##
=========================================
+ Coverage   67.57%   67.8%   +0.22%     
=========================================
  Files         343     344       +1     
  Lines       43645   44494     +849     
  Branches     5428    5557     +129     
=========================================
+ Hits        29494   30168     +674     
- Misses      13447   13606     +159     
- Partials      704     720      +16
Flag         Coverage Δ
#smoketests  50.38% <12.35%> (-0.1%) ⬇️
#unittests   65.3% <70.78%> (+0.29%) ⬆️

Impacted Files                             Coverage Δ
nipype/workflows/rsfmri/fsl/resting.py     85.24% <ø> (-0.24%) ⬇️
nipype/info.py                             93.93% <ø> (ø) ⬆️
nipype/algorithms/confounds.py             67.52% <70.78%> (+1.17%) ⬆️
nipype/interfaces/dipy/registration.py     93.33% <0%> (-6.67%) ⬇️
nipype/interfaces/dipy/base.py             76.28% <0%> (-0.3%) ⬇️
nipype/interfaces/dipy/stats.py            100% <0%> (ø)
nipype/interfaces/dipy/reconstruction.py   33.49% <0%> (+0.48%) ⬆️
nipype/interfaces/afni/preprocess.py       85.42% <0%> (+3.02%) ⬆️
nipype/interfaces/dipy/tracks.py           37.05% <0%> (+4.66%) ⬆️
nipype/interfaces/dipy/preprocess.py       29.11% <0%> (+5.58%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44ed184...b80a3d7. Read the comment docs.


Worried about that i variable containing names. Other than that this is shaping up!

@rciric

Interestingly, I'm now getting

ValueError: Inputs for CompCor, timeseries and mask, do not have matching spatial dimensions ((64, 64, 46) and (64, 64, 46, 1), respectively)

for one of my test masks. Probably an oddity on my end -- I didn't add the squeeze back in, since it seemed to be causing trouble earlier.

@rciric

Looks like the empty mask is now breaking cosine_filter -- I'll try and fix that next.

@effigies

@rciric Check out nibabel.squeeze_image.
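For readers unfamiliar with the suggestion: nibabel.squeeze_image drops trailing length-1 dimensions beyond the third, which would reconcile a (64, 64, 46, 1) mask with a (64, 64, 46) timeseries. The same logic can be sketched on a bare array; the helper below is a hypothetical NumPy stand-in for illustration, not nibabel's implementation:

```python
import numpy as np

def squeeze_trailing(data):
    # Drop trailing singleton axes beyond the third dimension,
    # mirroring what nibabel.squeeze_image does for NIfTI images
    shape = data.shape
    while len(shape) > 3 and shape[-1] == 1:
        shape = shape[:-1]
    return data.reshape(shape)
```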

@mgxd

hi @rciric - do you have a moment to finish this up sometime this week? We hope to release next week and would love to get this in.

@rciric

Happy to do what it takes to get this over the finish line. It looks like I'm getting errors related to diffusion processing now -- specifically an import failure from dipy.workflows.tracking for DetTrackPAMFlow. The only workflows in dipy.workflows.tracking are PFTrackingPAMFlow and LocalFiberTrackingPAMFlow. I guess it's now LocalFiberTrackingPAMFlow with sh_strategy="deterministic"?

@mgxd

@rciric yeah that is unrelated to this PR - I just merged #2903 which should fix it so if you pull the latest master it should be good.

@oesteban

@rciric what is the status of this PR? /cc @mgxd @effigies

@rciric

I tested it with my fmriprep install before the latest push, so I think we just need verification as to whether it works from reviewers. Additional feedback is also always welcome.

@oesteban

@rciric are there any public tests (i.e. CircleCI) where we can check how this is playing along with fMRIPrep?

@oesteban

Okay, so it seems this PR is working out as expected (https://circleci.com/workflow-run/3c1ae1ca-27db-43e6-897e-54cc5d323dcb, the red light on ds005 is unrelated to this PR).

If @rciric promises that the correlation between a_comp_cor_xx regressors does not pertain to this PR and should be addressed in niworkflows/fmriprep context, then we can go ahead with this. Could you double check whether we get the same outputs if you force the CompCor interfaces in fmriprep to behave like the former implementation (i.e., leave both num_components and variance_threshold undefined)?

Do you want to have a final review @effigies, @mgxd ?

@satra, it would be great if you could take a quick look, we don't want to regret having this PR merged in the future.

@satra

@rciric and @oesteban - this looks good to me!


No serious complaints, but a few minor suggestions. Feel free to ignore if you just want to get this in.

@mgxd @satra Just a note on the #2913 situation here. Do you agree with my proposed traits constraint, or did you have something else in mind?


components_data = [line.split('\t') for line in components_file]
components_data = [re.sub('\n', '', line).split('\t')

This is just stripping the newline? If so, re is overly heavy machinery.

Suggested change
components_data = [re.sub('\n', '', line).split('\t')
components_data = [line.rstrip().split('\t')
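A quick check of the suggested equivalence, with one caveat: bare rstrip() also strips trailing tabs, which would matter if the final TSV field can be empty (rstrip('\n') would be the strictly equivalent form):

```python
import re

line = 'comp_00\t0.12\t0.34\n'
# Both forms strip the trailing newline before splitting on tabs,
# but rstrip avoids regex machinery for a fixed transformation
assert re.sub('\n', '', line).split('\t') == line.rstrip().split('\t')

# Caveat: with an empty final field, bare rstrip also eats the trailing tab
assert 'a\tb\t\n'.rstrip().split('\t') == ['a', 'b']
assert re.sub('\n', '', 'a\tb\t\n').split('\t') == ['a', 'b', '']
```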

assert os.path.getsize(expected_metadata_file) > 0

with open(ccresult.outputs.metadata_file, 'r') as metadata_file:
components_metadata = [re.sub('\n', '', line).split('\t')

Again, if I understand the purpose:

Suggested change
components_metadata = [re.sub('\n', '', line).split('\t')
components_metadata = [line.rstrip().split('\t')

@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),

This is #2913. The fix should be just to skip 5.0, and we can do that for all versions.

Suggested change
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s,!=5.0' % TRAITS_MIN_VERSION,


I'm fine with this, though we should probably only restrict 5.0 if the user is running py27 - to my knowledge everything was working fine with py3.


I don't have a strong opinion here (this seems simpler than splitting out by Python version, but yes there may be someone who really needs Py3 + traits 5.0), but I think maybe let's make that change after this is merged?

@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION,

Suggested change
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION,

@@ -1,6 +1,7 @@
# emacs: -*- mode: python; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
import os
import re

The suggestions below were prompted by a suspicion this might be heavier than needed. If you accept those changes, also revert:

Suggested change
import re

raise ValueError('No components found')
components = np.ones((M.shape[0], num_components), dtype=np.float32) * np.nan
return components, basis
components = np.full((M.shape[0], num_components), np.nan)

Are you intentionally promoting this to a np.float64?
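For context on the dtype question: np.full infers float64 from a float fill value unless a dtype is given, whereas the original np.ones(...) * np.nan expression kept float32 because the array's dtype wins over a Python float scalar. A minimal illustration:

```python
import numpy as np

shape = (4, 3)
# The original expression stays float32: the array dtype dominates
# multiplication by a Python float scalar
original = np.ones(shape, dtype=np.float32) * np.nan
# np.full infers float64 from the float fill value...
promoted = np.full(shape, np.nan)
# ...unless the dtype is passed explicitly
preserved = np.full(shape, np.nan, dtype=np.float32)
```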

np.hstack((metadata['cumulative_variance_explained'],
cumulative_variance_explained)))
metadata['retained'] = (
metadata['retained'] + [i < num_components for i in range(len(s))])

If these can get large and you do more than 2 or 3 loops, this will be significantly less efficient than aggregating these things into lists and concatenating/stacking them at the end. e.g.,

# Outside loop
md_mask = []
md_sv = []
md_var = []
md_cumvar = []
md_retained = []
for ...
    # here
    md_mask.append([name] * len(s))
    md_sv.append(s)
    md_var.append(variance_explained)
    md_cumvar.append(cumulative_variance_explained)
    # The lack of square brackets is intentional
    md_retained.append(i < num_components for i in range(len(s)))
# below
metadata = {
    'mask': list(itertools.chain(*md_mask)),
    'singular_value': np.hstack(md_sv),
    'variance_explained': np.hstack(md_var),
    'cumulative_variance_explained': np.hstack(md_cumvar),
    'retained': list(itertools.chain(*md_retained))}

Numpy array containing the requested set of noise components
basis: numpy array
Numpy array containing the (non-constant) filter regressors
metadata: OrderedDict{str: numpy array}

Why is this an OrderedDict as opposed to a dict?

@rciric

Hello, and thanks for the comments! I think they've cleaned up the code considerably -- I've updated the code to reflect those comments with the following exceptions:

  • I wasn't able to get md_retained working as a generator on my local test machine, so I ended up using a list.
  • I am using an OrderedDict because the call that prints the metadata to TSV currently requires its entries to be in a particular order.
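To illustrate the ordering concern: the TSV column order follows the dict's insertion order. The column names below follow the metadata fields described earlier in the thread, but the writer shown here is a simplified stand-in, not the actual interface code:

```python
import csv
import io
from collections import OrderedDict

# Insertion order determines the TSV column order
metadata = OrderedDict([
    ('mask', ['CSF', 'CSF']),
    ('singular_value', [3.2, 1.1]),
    ('variance_explained', [0.6, 0.2]),
    ('cumulative_variance_explained', [0.6, 0.8]),
    ('retained', [True, True]),
])

buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t')
writer.writerow(metadata.keys())           # header row in insertion order
writer.writerows(zip(*metadata.values()))  # one row per component
```

On Python 3.7+ a plain dict would behave identically, since insertion order is guaranteed there, but nipype still supported Python 2.7 at the time, where OrderedDict is required.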

@effigies

Oh, looks like tests are failing. Sorry if I've led you astray.

@rciric

No, this is totally me not updating my own unit tests.

@effigies merged commit d353f0d into nipy:master Apr 29, 2019
@oesteban

yay! @rciric

@effigies mentioned this pull request May 8, 2019