# DataJoint pipeline for Aeon
DataJoint is a framework for developing and executing structured data pipelines that organise data into a relational database. The Aeon DataJoint pipeline consists of a set of tables designed to manage, contain, and process data generated by the Aeon acquisition system.
## Pipeline architecture
The following diagrams provide a high-level overview of the pipeline’s components and processes.
*Figure: Data acquisition-related tables.*

*Figure: Data flow for various data streams.*

*Figure: Pyrat synchronisation process.*

*Figure: Analysis tables.*
As seen above, the pipeline is structured into hierarchical layers of tables, each classified into one of four tiers based on the origin of the data they contain:
- `lookup`-tier (grey): Contains reference information defined a priori.
- `manual`-tier (green): Contains data manually entered by the user.
- `imported`-tier (purple): Contains data ingested from external sources such as raw data files.
- `computed`-tier (red): Contains results from automated pipeline computations.
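These four tiers correspond directly to DataJoint's table base classes (`dj.Lookup`, `dj.Manual`, `dj.Imported`, `dj.Computed`). The sketch below declares one table of each tier; the schema and table names are hypothetical examples, not actual Aeon pipeline tables.

```python
import datajoint as dj

schema = dj.Schema("tiers_demo")  # hypothetical schema name

@schema
class TrackingMethod(dj.Lookup):
    """lookup-tier: reference information defined a priori."""
    definition = """
    method : varchar(16)
    """
    contents = [("sleap",), ("centroid",)]

@schema
class Subject(dj.Manual):
    """manual-tier: data entered by the user."""
    definition = """
    subject : varchar(32)
    """

@schema
class RawFile(dj.Imported):
    """imported-tier: data ingested from external sources (e.g. raw files)."""
    definition = """
    -> Subject
    ---
    file_path : varchar(255)
    """

@schema
class SubjectAnalysis(dj.Computed):
    """computed-tier: results of automated pipeline computations."""
    definition = """
    -> RawFile
    ---
    result : float
    """
```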
Data flows through the pipeline in a top-down manner, driven by a combination of ingestion and computation routines. This layered organisation facilitates efficient data processing and modular analysis.
## Core tables
This section provides an overview of the core tables in the Aeon DataJoint pipeline, categorised by their primary function and the type of data they manage.
### Experiment and data acquisition
- `acquisition.Experiment` - This table stores meta-information about Aeon experiments, including details such as the lab/room where the experiment is conducted, the participating subjects, and the directory storing the raw data.
- `acquisition.Epoch` - This table records all acquisition epochs, which are periods reflecting the on/off states of the hardware within the acquisition system, along with their associated configurations, for any particular experiment listed in the `acquisition.Experiment` table.
- `acquisition.Chunk` - The raw data acquired through the acquisition system is stored as a collection of files at hourly intervals, referred to as a chunk. This table records all time chunks and their associated raw data files for any particular experiment in the `acquisition.Experiment` table. Each chunk belongs to exactly one epoch.
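As a usage sketch, these tables can be queried with DataJoint's standard restriction-and-fetch pattern. The experiment name and attribute names below are assumptions for illustration, not values guaranteed to exist in a deployment:

```python
from aeon.dj_pipeline import acquisition

# Assumed experiment name; replace with one present in your database.
key = {"experiment_name": "exp0.2-r0"}

# Restrict Chunk to this experiment and fetch chunk boundaries
# (attribute names assumed to be chunk_start / chunk_end).
chunk_starts, chunk_ends = (acquisition.Chunk & key).fetch("chunk_start", "chunk_end")
```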
### Video data and position tracking
- `qc.CameraQC` - This table records the quality control procedures applied to each `streams.SpinnakerVideoSource` (camera) in the experiment, such as identifying dropped frames in the video data.
- `tracking.SLEAPTracking` - This table records the SLEAP position tracking of object(s) for each chunk of video recorded from a particular `streams.SpinnakerVideoSource` (camera device). Key part tables include:
  - `PoseIdentity` - This table records the procedure that identifies the object being tracked (i.e. assigns an identity) and stores the name of the body part (`anchor_part`) used as the anchor point.
  - `Part` - This table records the x, y positions of all body parts tracked using SLEAP.
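The tracked positions can likewise be pulled into tabular form by fetching the part table as a pandas DataFrame; the key below is again a hypothetical example:

```python
from aeon.dj_pipeline import tracking

key = {"experiment_name": "exp0.2-r0"}  # assumed key values

# fetch(format="frame") returns the restricted rows as a pandas DataFrame,
# indexed by the table's primary key.
pose = (tracking.SLEAPTracking.Part & key).fetch(format="frame")
```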
### Standard analyses
- `analysis.visit.Visit` and `analysis.visit.VisitEnd` - These tables record the start and end times of each visit, along with the associated place and `Subject`.
- `analysis.block_analysis.Block` - This table records the start and end times of each block, along with the associated `acquisition.Experiment`.
- `analysis.block_analysis.BlockAnalysis` - This table contains a higher-level aggregation of events and metrics occurring within a block. Here, patch-related and subject-related metrics are computed separately, providing a detailed view of how different `streams.UndergroundFeeder`(s) and individual `Subject`(s) contribute to the overall experimental outcomes.
- `analysis.block_analysis.BlockSubjectAnalysis` - This table focuses on the detailed analysis of individual `Subject`(s) within a block, and considers each `Subject`'s interactions with each `streams.UndergroundFeeder` (food patch). Metrics such as total interaction time and overall time spent at the patch are computed. Key part tables include:
  - `Patch` - This table records the subject's interactions with a particular food patch, detailing the time spent at the patch, the distance spun on the patch wheel, the number of pellets received, and the timing of pellet deliveries.
  - `Preference` - This table records the subject's preference for each food patch by computing cumulative preference metrics based on time spent at the patch and wheel distance spun.
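Blocks and their per-subject patch metrics can be combined with DataJoint's join operator. This is a sketch assuming the part-table and key names match those described above:

```python
from aeon.dj_pipeline.analysis import block_analysis

key = {"experiment_name": "exp0.2-r0"}  # assumed experiment name

# Join each block with its per-subject, per-patch metrics (the * operator
# joins tables on their shared primary-key attributes).
block_patch = block_analysis.Block * block_analysis.BlockSubjectAnalysis.Patch
df = (block_patch & key).fetch(format="frame")
```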
### Data streams
- `streams.SpinnakerVideoSource` - This table records the placement and operation of a Spinnaker video source in an `acquisition.Experiment`, as well as metadata such as the installation time of the device. It facilitates the collection of video data, including frame counts and timestamps.
- `streams.RfidReader` - This table records the placement and operation of an RFID reader in an `acquisition.Experiment`, as well as the installation time of the device. It facilitates the collection of RFID data, such as RFID tag detection counts, timestamps, and tag IDs.
- `streams.WeightScale` - This table records the placement and operation of a weighing scale in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of raw and filtered weight measurements.
- `streams.UndergroundFeeder` - This table records the operation of an underground food patch in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of various patch events and states, including beam breaks, pellet deliveries, and depletion states.
## Pipeline operation: Auto ingestion and processing
The process begins by entering the meta-information about the experiment, such as the experiment name, participating subjects, cameras, and food patch configurations.
This information can either be input manually or automatically parsed from configuration YAML files. To facilitate this, the following scripts handle both manual input and automated parsing for different types of experiments:
- `aeon/dj_pipeline/create_experiments/create_experiment_02.py`
- `aeon/dj_pipeline/create_experiments/create_socialexperiment.py`
This is a one-time operation per experiment.
In DataJoint, the logic for generating a table's records is written in its `make()` method, which computes and inserts new records based on data from upstream tables. The `populate()` method automates the process of calling `make()` for every entry that has not yet been computed.
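A minimal sketch of this `make()`/`populate()` pattern, using hypothetical tables rather than the Aeon schema:

```python
import datajoint as dj

schema = dj.Schema("populate_demo")  # hypothetical schema name

@schema
class Recording(dj.Manual):
    definition = """
    recording_id : int
    ---
    n_samples   : int
    sample_rate : float
    """

@schema
class RecordingDuration(dj.Computed):
    definition = """
    -> Recording
    ---
    duration_s : float
    """

    def make(self, key):
        # Fetch upstream attributes for this key, compute, and insert.
        n, rate = (Recording & key).fetch1("n_samples", "sample_rate")
        self.insert1({**key, "duration_s": n / rate})

# Calls make() once for each Recording entry not yet in RecordingDuration.
RecordingDuration.populate()
```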
The auto-ingestion and processing/computation routines are defined in `aeon/dj_pipeline/populate/worker.py`. Running the auto-processing routine is equivalent to executing the following four commands, which can be run sequentially or in parallel using different processing threads:
```bash
aeon_ingest pyrat_worker
aeon_ingest acquisition_worker
aeon_ingest streams_worker
aeon_ingest analysis_worker
```
Since DataJoint operations are idempotent, the aforementioned routine or commands can safely be rerun if needed.
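For example, re-running `populate()` on the hypothetical `RecordingDuration` table from the sketch above recomputes nothing that already exists:

```python
# Safe to re-run: populate() computes only the keys missing from the table.
RecordingDuration.populate(display_progress=True)

# When several workers run in parallel, job reservation prevents duplicate work.
RecordingDuration.populate(reserve_jobs=True)
```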
**See also:** The guides on deploying the Aeon DataJoint pipeline on-premises, data ingestion and processing, and data querying.