Preprocessing Sessions#
Note
This is a long-form tutorial on session preprocessing. See here for a quick how-to.
In this tutorial we will use the spikewrap.Session interface to manage the preprocessing and visualisation of a dataset.
We will cover:
- Loading raw data for a session.
- Prototyping / visualising preprocessing steps.
- Saving preprocessed data.
Attention
spikewrap's features are currently limited. See the Roadmap for planned features.
Under the hood, spikewrap uses SpikeInterface to perform all preprocessing steps. See the Supported Preprocessing Steps for details on supported functionality.
Loading Data#
To load session data, we must instantiate a spikewrap.Session object with the location of the data.
The examples in this tutorial use a project in NeuroBlueprint format (however, some custom formats are supported; see Supported Formats for details).
Let’s say we have a dataset (available at spikewrap.get_example_data_path()) like:
SpikeGLX:

└── rawdata/
    └── sub-001/
        └── ses-001/
            └── ephys/
                ├── run-001_g0_imec0/
                │   ├── run-001_g0_t0.imec0.ap.bin
                │   └── run-001_g0_t0.imec0.ap.meta
                └── run-002_g0_imec0/
                    ├── run-002_g0_t0.imec0.ap.bin
                    └── run-002_g0_t0.imec0.ap.meta

OpenEphys:

└── rawdata/
    └── sub-001/
        └── ses-001/
            └── ephys/
                └── Recording Node 304/
                    └── experiment1/
                        ├── recording1/
                        │   └── ...
                        └── recording2/
                            └── ...
This dataset is installed with spikewrap so you can run this tutorial locally.
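Since the dataset ships with the package, you can print its location on your machine; a quick sketch using only the sw.get_example_data_path() function shown above:

import spikewrap as sw

# Root of the bundled example dataset (path will differ per machine)
print(sw.get_example_data_path())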
First, we import and instantiate the spikewrap.Session object with the:

subject_path : Full filepath to the subject folder in which the session is located.
session_name : Name of the session folder.
file_format : "spikeglx" or "openephys", the acquisition software used.
run_names : (optional) Default "all", or a list of run folder names to process.
probe : (optional) Default None. Neuropixels probes are auto-detected from the recording output; otherwise, pass a ProbeInterface probe object. This probe will be set on all runs.
import spikewrap as sw
session = sw.Session(
    subject_path=sw.get_example_data_path() / "rawdata" / "sub-001",
    session_name="ses-001",
    file_format="spikeglx",  # or "openephys"
    run_names="all"
)
session.preprocess(configs="neuropixels+kilosort2_5", concat_runs=True)
The preprocessing options are: {
"1": [
"phase_shift",
{}
],
"2": [
"bandpass_filter",
{
"freq_max": 6000,
"freq_min": 300
}
],
"3": [
"common_reference",
{
"operator": "median",
"reference": "global"
}
]
}
/home/runner/work/spikewrap/spikewrap/spikewrap/process/_loading.py:147: UserWarning: The sessions or runs provided for are not in creation datetime order.
They will be concatenated in the order provided, as:
['run-002_g0_imec0', 'run-001_g0_imec0'].
warnings.warn(
Loading data from path: /home/runner/work/spikewrap/spikewrap/spikewrap/examples/example_tiny_data/spikeglx/rawdata/sub-001/ses-001/ephys/run-002_g0_imec0
Loading data from path: /home/runner/work/spikewrap/spikewrap/spikewrap/examples/example_tiny_data/spikeglx/rawdata/sub-001/ses-001/ephys/run-001_g0_imec0
Concatenating raw recordings in the following order:['run-002_g0_imec0', 'run-001_g0_imec0']
Due to the magic of SpikeInterface, data loading and most preprocessing functions are ‘lazy’ and will be very fast. Note that nothing is written to disk at this stage.
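To see this for yourself, you can time the call; a minimal sketch re-running the same preprocess call as above (exact timings will of course vary):

import time

t0 = time.perf_counter()
session.preprocess(configs="neuropixels+kilosort2_5", concat_runs=True)
print(f"Lazy preprocessing set up in {time.perf_counter() - t0:.3f} s")
# No samples have been filtered yet; the work runs when plotting
# or when save_preprocessed() writes to disk.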
We can inspect the detected run names with:
print(session.get_raw_run_names())
# and the names of the preprocessed runs (which may change to "concat_run"
# if the runs are concatenated prior to preprocessing):
print(session.get_preprocessed_run_names())
['run-002_g0_imec0', 'run-001_g0_imec0']
['concat_run']
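Note the earlier warning about creation datetime order: run_names also accepts an explicit list of run folder names, which pins both the subset and the order. A sketch reusing the run folders of this dataset:

session = sw.Session(
    subject_path=sw.get_example_data_path() / "rawdata" / "sub-001",
    session_name="ses-001",
    file_format="spikeglx",
    run_names=["run-001_g0_imec0", "run-002_g0_imec0"],  # explicit order
)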
Preprocessing Options#
Defining preprocessing steps can be done in two ways: with a pre-set configuration file or a Python dictionary (see Managing Configs for more details).
Briefly, a stored configuration can be used to preprocess the data by passing its name, as we did above.
We can print the configs used with:
sw.show_configs("neuropixels+kilosort2_5")
The preprocessing options are: {
"1": [
"phase_shift",
{}
],
"2": [
"bandpass_filter",
{
"freq_max": 6000,
"freq_min": 300
}
],
"3": [
"common_reference",
{
"operator": "median",
"reference": "global"
}
]
}
The sorting options are: {
"kilosort2_5": {
"car": false,
"freq_min": 150
}
}
Otherwise, we can define a dictionary with the steps to pass to spikewrap.Session.preprocess(). Preprocessing steps generally take the underlying SpikeInterface function name and parameters; see Supported Preprocessing Steps for details.
configs = {
    "preprocessing": {
        "1": ["phase_shift", {}],
        "2": ["bandpass_filter", {"freq_min": 300, "freq_max": 6000}],
        "3": ["common_reference", {"operator": "median"}],
    }
}
spikewrap.Session.preprocess() will also accept a dictionary with the top-level "preprocessing" key omitted:
pp_steps = {
    "1": ["phase_shift", {}],
    "2": ["bandpass_filter", {"freq_min": 300, "freq_max": 6000}],
    "3": ["common_reference", {"operator": "median"}],
}
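For example, this flat dictionary can be passed directly, equivalent to the nested configs dictionary above:

# Same steps as before, just without the "preprocessing" wrapper key
session.preprocess(configs=pp_steps, concat_runs=True)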
Concatenation Arguments#
For multi-run sessions or multi-shank probes, we can set:

per_shank : If True, split the recordings into separate shanks before preprocessing.
concat_runs : If True, concatenate all runs together before preprocessing.
session.preprocess(
    configs=configs,
    per_shank=True,
    concat_runs=True,
)
The preprocessing options are: {
"1": [
"phase_shift",
{}
],
"2": [
"bandpass_filter",
{
"freq_max": 6000,
"freq_min": 300
}
],
"3": [
"common_reference",
{
"operator": "median"
}
]
}
Concatenating raw recordings in the following order:['run-002_g0_imec0', 'run-001_g0_imec0']
Split run: concat_run by shank. There are 2 shanks.
Visualising Preprocessing#
spikewrap can be used to iteratively prototype preprocessing steps by adjusting configurations and arguments, then re-plotting. This can be performed in a Jupyter notebook if desired.
plots = session.plot_preprocessed(
    show=True,
    time_range=(0, 0.5),
    show_channel_ids=False,  # also accepts mode="map" or mode="line"
)

plots (a dict of matplotlib figures) contains the figures for (optional) further editing.
print(plots)
{'concat_run': <Figure size 2000x600 with 4 Axes>}
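Since these are ordinary matplotlib figures, they can be adjusted and saved as usual. A minimal sketch; the key "concat_run" matches the dictionary printed above, while the title and filename are arbitrary examples:

fig = plots["concat_run"]
fig.suptitle("Preprocessed traces, first 0.5 s")  # arbitrary example title
fig.savefig("concat_run_preprocessed.png", dpi=150)  # hypothetical output path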
Now, let’s update a preprocessing step and plot again:
import copy
pp_attempt_2 = copy.deepcopy(configs)
# This is currently quite verbose. We index the third preprocessing
# step ("3"), then the second element of its ["function_name", {function_kwargs...}]
# list to reach the kwargs (see the preprocessing dictionary defined above)
pp_attempt_2["preprocessing"]["3"][1]["operator"] = "average"
session.preprocess(
    configs=pp_attempt_2,
    per_shank=False,
    concat_runs=False,
)

plots = session.plot_preprocessed(
    time_range=(0, 0.5), show_channel_ids=False, show=True
)
The preprocessing options are: {
"1": [
"phase_shift",
{}
],
"2": [
"bandpass_filter",
{
"freq_max": 6000,
"freq_min": 300
}
],
"3": [
"common_reference",
{
"operator": "average"
}
]
}
/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/spikeinterface/widgets/traces.py:150: UserWarning: You have selected a time after the end of the segment. The range will be clipped to 0.4999999999999989
warnings.warn(
/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/spikeinterface/widgets/traces.py:150: UserWarning: You have selected a time after the end of the segment. The range will be clipped to 0.4999999999999989
warnings.warn(
Saving Preprocessed Data#
When you are ready to save the preprocessed recording with your chosen settings, you can run:
session.save_preprocessed(overwrite=True, n_jobs=6)
Saving data for: run-002_g0_imec0...
write_binary_recording
engine=process - n_jobs=4 - samples_per_chunk=60,000 - chunk_memory=43.95 MiB - total_memory=175.78 MiB - chunk_duration=2.00s
write_binary_recording (workers: 1 processes): 0%| | 0/1 [00:00<?, ?it/s]
write_binary_recording (workers: 1 processes): 100%|██████████| 1/1 [00:00<00:00, 2.01it/s]
write_binary_recording (workers: 1 processes): 100%|██████████| 1/1 [00:00<00:00, 2.01it/s]
Saving data for: run-001_g0_imec0...
write_binary_recording
engine=process - n_jobs=4 - samples_per_chunk=60,000 - chunk_memory=43.95 MiB - total_memory=175.78 MiB - chunk_duration=2.00s
write_binary_recording (workers: 1 processes): 0%| | 0/1 [00:00<?, ?it/s]
write_binary_recording (workers: 1 processes): 100%|██████████| 1/1 [00:00<00:00, 2.14it/s]
write_binary_recording (workers: 1 processes): 100%|██████████| 1/1 [00:00<00:00, 2.13it/s]
Attention
On some systems, you may encounter strange behaviour when running multiple jobs (n_jobs > 1), such as non-parallelised steps running more than once. You may need to wrap your script in an if __name__ == "__main__" block (if you encounter this problem, you will see an error to this effect).
if __name__ == "__main__":
    import spikewrap as sw
    ...
Using SLURM#
The spikewrap.Session.save_preprocessed step is where all data is preprocessed and written to disk. Therefore, if using an HPC (high-performance computing) system, it may be convenient to run it through the job scheduler SLURM. The function takes an argument slurm=True, which can be used to run the preprocessing and saving as a SLURM sbatch job.
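For example, reusing the arguments from the earlier call with the SLURM flag added:

# Submit the save step as a SLURM sbatch job rather than running locally
session.save_preprocessed(overwrite=True, n_jobs=6, slurm=True)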
See the SLURM tutorial for more information.
Attention
SLURM jobs are requested at the run level. For example, if a session has 2 runs (which are not concatenated), spikewrap.Session.save_preprocessed() will request two nodes.
Output data organisation#
See the