# DataJoint pipeline: Data ingestion and processing
> **Important:** This guide assumes you have installed and configured a DataJoint pipeline.
This guide demonstrates the process of ingesting data from the source and preparing it for querying and further analysis in the Aeon DataJoint pipeline. The three main steps are:

1. **Create a new experiment**: Set up a new experiment in the pipeline.
2. **Insert subjects and blocks**: Manually input details about the subjects involved in the experiment and specify the blocks of interest.
3. **Run automated ingestion and processing**: Run routines to ingest data and process it for querying and analysis.
> **Note:** This guide uses the *Single mouse in a foraging assay* sample dataset for the experiment named `social0.2-aeon3`.
> If you are using a different dataset, please make sure the DataJoint pipeline is correctly configured (e.g. the data directory is correctly specified in the DataJoint configuration file, `dj_local_conf.json`), and replace the experiment name and other parameters in the code below accordingly.
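If you prefer to set or check the configuration from Python rather than editing `dj_local_conf.json` by hand, a minimal sketch is shown below. The credential values and the keys under `"custom"` are placeholders and assumptions for illustration; match them to the entries expected by your own deployment.

```python
import datajoint as dj

# Database credentials (placeholder values -- replace with your own)
dj.config["database.host"] = "your-database-host"
dj.config["database.user"] = "your-username"
dj.config["database.password"] = "your-password"

# Pipeline-specific settings typically live under the "custom" key.
# The key names and paths below are assumptions for illustration; check the
# dj_local_conf.json shipped with your deployment for the exact entries.
dj.config["custom"] = {
    "database.prefix": "aeon_",
    "repository_config": {"ceph_aeon": "/path/to/aeon/data"},
}

# Persist the settings to dj_local_conf.json in the current working directory
dj.config.save_local()
```

With the configuration in place, import the pipeline modules used throughout this guide: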
```python
from aeon.dj_pipeline import acquisition, subject
from aeon.dj_pipeline.analysis import block_analysis
from aeon.dj_pipeline.create_experiments.create_socialexperiment import (
    create_new_social_experiment,
)
from aeon.dj_pipeline.populate.worker import (
    AutomatedExperimentIngestion,
    acquisition_worker,
    analysis_worker,
    streams_worker,
)
```
## Step 1 - Create a new experiment
Insert a new entry for the `social0.2-aeon3` experiment into the `acquisition.Experiment` table, along with its associated metadata:
```python
experiment_name = "social0.2-aeon3"
create_new_social_experiment(experiment_name)
```
We can now check that the experiment has been successfully inserted into the `acquisition.Experiment` table:
```python
acquisition.Experiment()
```
| experiment_name | experiment_start_time | experiment_description | arena_name | lab | location | experiment_type |
|---|---|---|---|---|---|---|
| social0.2-aeon3 | 2024-03-01 16:46:12 | Social0.2 experiment on AEON3 machine | circle-2m | SWC | AEON3 | social |

Total: 1
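To check programmatically rather than by inspecting the table output, you can restrict the query to the new experiment and fetch its entry using standard DataJoint query syntax (an optional sanity check):

```python
# Restrict to the new experiment and fetch its metadata as a dictionary
experiment_entry = (
    acquisition.Experiment & {"experiment_name": experiment_name}
).fetch1()
print(experiment_entry["experiment_start_time"])
```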
We can also check the `acquisition.Experiment.Directory` table to see the `raw` and `processed` directories associated with the experiment:
```python
acquisition.Experiment.Directory()
```
| experiment_name | directory_type | repository_name | directory_path | load_order |
|---|---|---|---|---|
| social0.2-aeon3 | processed | ceph_aeon | aeon/data/processed/AEON3/social0.2 | 0 |
| social0.2-aeon3 | raw | ceph_aeon | aeon/data/raw/AEON3/social0.2 | 1 |

Total: 2
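If downstream code needs one of these paths, it can be fetched with a plain DataJoint restriction. This is an optional sketch; the pipeline may also provide helper methods for resolving directories.

```python
# Fetch the path of the raw data directory registered for this experiment
raw_dir = (
    acquisition.Experiment.Directory
    & {"experiment_name": experiment_name, "directory_type": "raw"}
).fetch1("directory_path")
```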
## Step 2 - Insert subjects and blocks
The `social0.2-aeon3` experiment involves two subjects:

- `BAA-1104045`
- `BAA-1104047`

Let's create entries for these subjects and insert them into the `subject.Subject` table:
```python
subject_list = [
    {
        "subject": "BAA-1104045",
        "sex": "U",
        "subject_birth_date": "2024-01-01",
        "subject_description": "Subject for Social 0.2 experiment",
    },
    {
        "subject": "BAA-1104047",
        "sex": "U",
        "subject_birth_date": "2024-01-01",
        "subject_description": "Subject for Social 0.2 experiment",
    },
]
subject.Subject.insert(subject_list, skip_duplicates=True)
```
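As an optional sanity check, you can restrict `subject.Subject` by the entries just inserted (restricting by a list of dictionaries acts as a logical OR in DataJoint):

```python
# Should return exactly the two subjects defined in subject_list
subject.Subject & [{"subject": s["subject"]} for s in subject_list]
```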
To associate these subjects with the experiment `social0.2-aeon3`:
```python
subject_experiment_list = [
    {"experiment_name": "social0.2-aeon3", "subject": "BAA-1104045"},
    {"experiment_name": "social0.2-aeon3", "subject": "BAA-1104047"},
]
acquisition.Experiment.Subject.insert(subject_experiment_list, skip_duplicates=True)
```
We can now check that the subjects have been successfully associated with the experiment `social0.2-aeon3` by querying the `acquisition.Experiment.Subject` table:
```python
acquisition.Experiment.Subject()
```
| experiment_name | subject |
|---|---|
| social0.2-aeon3 | BAA-1104045 |
| social0.2-aeon3 | BAA-1104047 |

Total: 2
Next, we need to create and insert an entry for a block of interest into the `block_analysis.Block` table.
```python
block_data = {
    "experiment_name": "social0.2-aeon3",
    "block_start": "2024-03-02 12:00:00",
    "block_end": "2024-03-02 14:00:00",
    "block_duration_hr": 2,
}
block_analysis.Block.insert1(block_data)
```
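If you are defining several blocks, it can help to derive `block_end` and `block_duration_hr` from a single start time and duration so that the three fields stay consistent. A small, optional sketch:

```python
from datetime import datetime, timedelta

# Illustrative helper: build a Block entry from a start time and a duration
block_start = datetime(2024, 3, 2, 12, 0, 0)
block_duration_hr = 2
block_data = {
    "experiment_name": "social0.2-aeon3",
    "block_start": block_start,
    "block_end": block_start + timedelta(hours=block_duration_hr),
    "block_duration_hr": block_duration_hr,
}
```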
Likewise, we can query the `block_analysis.Block` table to check that the block has been successfully inserted:
```python
block_analysis.Block()
```
| experiment_name | block_start | block_end | block_duration_hr (hours) |
|---|---|---|---|
| social0.2-aeon3 | 2024-03-02 12:00:00 | 2024-03-02 14:00:00 | 2.000 |

Total: 1
## Step 3 - Data ingestion and processing
Data ingestion and processing are fully automated through the prepared routines provided below. As DataJoint pipelines are idempotent, these routines can be safely run multiple times without the risk of duplicating or altering existing data.
To initiate the automated data ingestion process for the experiment `social0.2-aeon3`, we first need to insert an entry for the experiment into the `AutomatedExperimentIngestion` table:
```python
AutomatedExperimentIngestion.insert1(
    {"experiment_name": "social0.2-aeon3"}, skip_duplicates=True
)
```
Ingestion and processing of acquisition-related data for the experiment `social0.2-aeon3` can now be initiated by running:
```python
acquisition_worker.run()
```
Likewise, ingestion and processing of all data streams for the experiment `social0.2-aeon3` can be initiated by running:
```python
streams_worker.run()
```
Finally, for data analysis, run:
```python
analysis_worker.run()
```
Once the data ingestion and processing routines are complete, we can begin querying the data from the pipeline.
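For example, a minimal follow-up query restricting `block_analysis.Block` to this experiment might look like the sketch below; downstream analysis tables in `block_analysis` can be queried in the same way.

```python
# Fetch the block entries for this experiment as a list of dictionaries
blocks = (block_analysis.Block & {"experiment_name": "social0.2-aeon3"}).fetch(
    as_dict=True
)
print(blocks)
```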