DataJoint pipeline: Data ingestion and processing

Important

This guide assumes you have installed and configured a DataJoint pipeline.

This guide demonstrates the process of ingesting data from the source and preparing it for querying and further analysis in the Aeon DataJoint pipeline. The three main steps are:

  1. Create a new experiment: Set up a new experiment in the pipeline.

  2. Insert subjects and blocks: Manually input details about the subjects involved in the experiment and specify the blocks of interest.

  3. Run automated ingestion and processing: Run routines to ingest data and process it for querying and analysis.

Note

This guide uses the "Single mouse in a foraging assay" sample dataset for the experiment named social0.2-aeon3.

If you are using a different dataset, make sure the DataJoint pipeline is correctly configured, e.g. that the data directory is correctly specified in the DataJoint configuration file (dj_local_conf.json). You should also replace the experiment name and other parameters in the code below accordingly.

from aeon.dj_pipeline import acquisition, subject
from aeon.dj_pipeline.analysis import block_analysis
from aeon.dj_pipeline.create_experiments.create_socialexperiment import (
    create_new_social_experiment,
)
from aeon.dj_pipeline.populate.worker import (
    AutomatedExperimentIngestion,
    acquisition_worker,
    analysis_worker,
    streams_worker,
)

Step 1 - Create a new experiment

Insert a new entry for the social0.2-aeon3 experiment into the acquisition.Experiment table, along with its associated metadata:

experiment_name = "social0.2-aeon3"
create_new_social_experiment(experiment_name)

We can now check that the experiment has been successfully inserted into the acquisition.Experiment table:

acquisition.Experiment()

Column notes: experiment_name (e.g. exp0-aeon3); experiment_start_time (datetime of the start of this experiment); arena_name (unique name of the arena, e.g. circular_2m); lab (abbreviated lab name).

| experiment_name | experiment_start_time | experiment_description | arena_name | lab | location | experiment_type |
|---|---|---|---|---|---|---|
| social0.2-aeon3 | 2024-03-01 16:46:12 | Social0.2 experiment on AEON3 machine | circle-2m | SWC | AEON3 | social |

Total: 1

We can also check the acquisition.Experiment.Directory table to see the raw and processed directories associated with the experiment:

acquisition.Experiment.Directory()

Column notes: experiment_name (e.g. exp0-aeon3); load_order (order of priority to load the directory).

| experiment_name | directory_type | repository_name | directory_path | load_order |
|---|---|---|---|---|
| social0.2-aeon3 | processed | ceph_aeon | aeon/data/processed/AEON3/social0.2 | 0 |
| social0.2-aeon3 | raw | ceph_aeon | aeon/data/raw/AEON3/social0.2 | 1 |

Total: 2
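Each directory_path in this table is relative to the root of its repository. As a minimal sketch, assuming the repository name ceph_aeon maps to a root such as /ceph/aeon (a hypothetical value here; the real root depends on your DataJoint configuration), the full directories could be resolved as follows. The helper resolve_directories is illustrative and not part of the pipeline, and sorting by ascending load_order is likewise an assumption about how the priority is interpreted:

```python
from pathlib import PurePosixPath

# Hypothetical mapping from repository name to its root directory.
# The actual root depends on your DataJoint configuration.
repository_roots = {"ceph_aeon": "/ceph/aeon"}

# Rows copied from the acquisition.Experiment.Directory output.
directory_rows = [
    {
        "directory_type": "processed",
        "repository_name": "ceph_aeon",
        "directory_path": "aeon/data/processed/AEON3/social0.2",
        "load_order": 0,
    },
    {
        "directory_type": "raw",
        "repository_name": "ceph_aeon",
        "directory_path": "aeon/data/raw/AEON3/social0.2",
        "load_order": 1,
    },
]


def resolve_directories(rows, roots):
    """Return (directory_type, full_path) pairs sorted by ascending load_order."""
    ordered = sorted(rows, key=lambda row: row["load_order"])
    return [
        (
            row["directory_type"],
            str(PurePosixPath(roots[row["repository_name"]]) / row["directory_path"]),
        )
        for row in ordered
    ]


resolved = resolve_directories(directory_rows, repository_roots)
for directory_type, full_path in resolved:
    print(f"{directory_type}: {full_path}")
```

If a resolved path does not exist on your machine, the repository root in the configuration is the first thing to check.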

Step 2 - Insert subjects and blocks

The social0.2-aeon3 experiment involves two subjects:

  • BAA-1104045

  • BAA-1104047

Let’s create entries for these subjects and insert them into the subject.Subject table:

subject_list = [
    {
        "subject": "BAA-1104045",
        "sex": "U",
        "subject_birth_date": "2024-01-01",
        "subject_description": "Subject for Social 0.2 experiment",
    },
    {
        "subject": "BAA-1104047",
        "sex": "U",
        "subject_birth_date": "2024-01-01",
        "subject_description": "Subject for Social 0.2 experiment",
    },
]
subject.Subject.insert(subject_list, skip_duplicates=True)

Next, associate these subjects with the social0.2-aeon3 experiment:

subject_experiment_list = [
    {"experiment_name": "social0.2-aeon3", "subject": "BAA-1104045"},
    {"experiment_name": "social0.2-aeon3", "subject": "BAA-1104047"},
]
acquisition.Experiment.Subject.insert(subject_experiment_list, skip_duplicates=True)
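The two lists above repeat the same subject IDs and metadata by hand. As an optional sketch, both lists could be generated from a single list of IDs, which keeps them in sync when the set of subjects changes. The variable subject_ids is introduced here for illustration only; the field values are the ones used in this guide:

```python
# Subject IDs and experiment name as used in this guide.
subject_ids = ["BAA-1104045", "BAA-1104047"]
experiment_name = "social0.2-aeon3"

# One entry per subject for the subject.Subject table.
subject_list = [
    {
        "subject": subject_id,
        "sex": "U",
        "subject_birth_date": "2024-01-01",
        "subject_description": "Subject for Social 0.2 experiment",
    }
    for subject_id in subject_ids
]

# One entry per subject for the acquisition.Experiment.Subject table.
subject_experiment_list = [
    {"experiment_name": experiment_name, "subject": subject_id}
    for subject_id in subject_ids
]
```

The resulting lists have the same content as the hand-written ones and can be passed to the same insert calls.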

We can now check that the subjects have been successfully associated with the experiment social0.2-aeon3 by querying the acquisition.Experiment.Subject table:

acquisition.Experiment.Subject()
Table description: the subjects participating in this experiment.

Column notes: experiment_name (e.g. exp0-aeon3).

| experiment_name | subject |
|---|---|
| social0.2-aeon3 | BAA-1104045 |
| social0.2-aeon3 | BAA-1104047 |

Total: 2

Next, we need to create an entry for a block of interest and insert it into the block_analysis.Block table:

block_data = {
    "experiment_name": "social0.2-aeon3",
    "block_start": "2024-03-02 12:00:00",
    "block_end": "2024-03-02 14:00:00",
    "block_duration_hr": 2,
}
block_analysis.Block.insert1(block_data)
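In the entry above, block_duration_hr is typed in by hand and must agree with block_start and block_end. As a minimal sketch, the duration could instead be derived from the two timestamps. The helper make_block_entry is hypothetical and not part of the pipeline; it only builds the dictionary and does not touch the database:

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_block_entry(experiment_name, block_start, block_end):
    """Build a block entry whose duration is computed from its start and end."""
    start = datetime.strptime(block_start, TIME_FORMAT)
    end = datetime.strptime(block_end, TIME_FORMAT)
    if end <= start:
        raise ValueError("block_end must be later than block_start")
    duration_hr = (end - start).total_seconds() / 3600
    return {
        "experiment_name": experiment_name,
        "block_start": block_start,
        "block_end": block_end,
        "block_duration_hr": duration_hr,
    }


block_data = make_block_entry(
    "social0.2-aeon3", "2024-03-02 12:00:00", "2024-03-02 14:00:00"
)
print(block_data["block_duration_hr"])  # 2.0
```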

Likewise, we can query the block_analysis.Block table to check that the block has been successfully inserted:

block_analysis.Block()

Column notes: experiment_name (e.g. exp0-aeon3); block_duration_hr (hour).

| experiment_name | block_start | block_end | block_duration_hr |
|---|---|---|---|
| social0.2-aeon3 | 2024-03-02 12:00:00 | 2024-03-02 14:00:00 | 2.000 |

Total: 1

Step 3 - Data ingestion and processing

Data ingestion and processing are fully automated through the prepared routines provided below. As DataJoint pipelines are idempotent, these routines can be safely run multiple times without the risk of duplicating or altering existing data.
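To make the idea of idempotency concrete, here is a toy model, not DataJoint's actual implementation: a plain dictionary stands in for a table, and a hypothetical populate function computes results only for keys that are not yet stored. All names in this sketch are illustrative.

```python
def populate(key_source, target, make):
    """Compute and store results only for keys missing from target.

    Returns the list of keys that were processed in this call.
    """
    processed = []
    for key in key_source:
        if key in target:
            # Already stored: skip, so existing data is never recomputed.
            continue
        target[key] = make(key)
        processed.append(key)
    return processed


target_table = {}  # stands in for a database table
keys = ["chunk-1", "chunk-2"]  # stands in for the pending work items

first_run = populate(keys, target_table, make=str.upper)
second_run = populate(keys, target_table, make=str.upper)

print(first_run)   # ['chunk-1', 'chunk-2']
print(second_run)  # []
```

The second call finds nothing left to do, which is the behaviour that makes repeated runs safe.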

To initiate the automated data ingestion process for the experiment social0.2-aeon3, we first need to insert an entry for the experiment into the AutomatedExperimentIngestion table:

AutomatedExperimentIngestion.insert1(
    {"experiment_name": "social0.2-aeon3"}, skip_duplicates=True
)

Ingestion and processing of acquisition-related data for the experiment social0.2-aeon3 can now be initiated by running:

acquisition_worker.run()

Likewise, ingestion and processing of all data streams for the experiment social0.2-aeon3 can be initiated by running:

streams_worker.run()

Finally, for data analysis, run:

analysis_worker.run()

Once the data ingestion and processing routines are complete, we can begin querying the data from the pipeline.