# DataJoint pipeline for Aeon
DataJoint is a framework for developing and executing structured data pipelines that organise data into a relational database. The Aeon DataJoint pipeline consists of a set of tables designed to manage, contain, and process data generated by the Aeon acquisition system.
## Pipeline architecture
The following diagrams provide a high-level overview of the pipeline’s components and processes.
*Figure: Data acquisition-related tables.*

*Figure: Data flow for various data streams.*

*Figure: Pyrat synchronisation process.*

*Figure: Analysis tables.*
As seen above, the pipeline is structured into hierarchical layers of tables, each classified into one of four tiers based on the origin of the data they contain:
- `lookup`-tier (grey): Contains reference information defined a priori.
- `manual`-tier (green): Contains data manually entered by the user.
- `imported`-tier (purple): Contains data ingested from external sources such as raw data files.
- `computed`-tier (red): Contains results from automated pipeline computations.
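These four tiers correspond directly to DataJoint's table base classes (`dj.Lookup`, `dj.Manual`, `dj.Imported`, `dj.Computed`). The sketch below declares one table of each tier; the schema and table names are hypothetical examples, not actual Aeon pipeline tables.

```python
import datajoint as dj

schema = dj.Schema("tiers_demo")  # hypothetical schema name

@schema
class TrackingMethod(dj.Lookup):
    """lookup-tier: reference information defined a priori."""
    definition = """
    method : varchar(16)
    """
    contents = [("sleap",), ("centroid",)]

@schema
class Subject(dj.Manual):
    """manual-tier: data entered by the user."""
    definition = """
    subject : varchar(32)
    """

@schema
class RawFile(dj.Imported):
    """imported-tier: data ingested from external sources (e.g. raw files)."""
    definition = """
    -> Subject
    ---
    file_path : varchar(255)
    """

@schema
class SubjectAnalysis(dj.Computed):
    """computed-tier: results of automated pipeline computations."""
    definition = """
    -> RawFile
    ---
    result : float
    """
```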
Data flows through the pipeline in a top-down manner, driven by a combination of ingestion and computation routines. This layered organisation facilitates efficient data processing and modular analysis.
## Core tables
This section provides an overview of the core tables in the Aeon DataJoint pipeline, categorised by their primary function and the type of data they manage.
### Experiment and data acquisition
- `acquisition.Experiment` - This table stores meta-information about Aeon experiments, including details such as the lab/room where the experiment is conducted, the participating subjects, and the directory storing the raw data.
- `acquisition.Epoch` - This table records all acquisition epochs, which are periods reflecting the on/off states of the hardware within the acquisition system, along with their associated configurations, for any particular experiment listed in the `acquisition.Experiment` table.
- `acquisition.Chunk` - The raw data acquired through the acquisition system is stored as a collection of files at hourly intervals, referred to as a chunk. This table records all time chunks and their associated raw data files for any particular experiment in the `acquisition.Experiment` table. Each chunk belongs to exactly one epoch.
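As a usage sketch, these tables can be queried with DataJoint's standard restriction-and-fetch pattern. The experiment name and attribute names below are assumptions for illustration, not values guaranteed to exist in a deployment:

```python
from aeon.dj_pipeline import acquisition

# Assumed experiment name; replace with one present in your database.
key = {"experiment_name": "exp0.2-r0"}

# Restrict Chunk to this experiment and fetch chunk boundaries
# (attribute names assumed to be chunk_start / chunk_end).
chunk_starts, chunk_ends = (acquisition.Chunk & key).fetch("chunk_start", "chunk_end")
```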
### Video data and position tracking
- `qc.CameraQC` - This table records the quality control procedures applied to each `streams.SpinnakerVideoSource` (camera) in the experiment, such as identifying dropped frames in the video data.
- `tracking.SLEAPTracking` - This table records the SLEAP position tracking of object(s) for each chunk of video recorded from a particular `streams.SpinnakerVideoSource` (camera device). Key part tables include:
  - `PoseIdentity` - This table records the procedure that identifies the object being tracked (i.e. assigns an identity) and stores the name of the body part (`anchor_part`) used as the anchor point.
  - `Part` - This table records the x, y positions of all body parts tracked using SLEAP.
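The tracked positions can likewise be pulled into tabular form by fetching the part table as a pandas DataFrame; the key below is again a hypothetical example:

```python
from aeon.dj_pipeline import tracking

key = {"experiment_name": "exp0.2-r0"}  # assumed key values

# fetch(format="frame") returns the restricted rows as a pandas DataFrame,
# indexed by the table's primary key.
pose = (tracking.SLEAPTracking.Part & key).fetch(format="frame")
```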
### Standard analyses
- `analysis.visit.Visit` and `analysis.visit.VisitEnd` - These tables record the start and end times of each visit, along with the associated place and `Subject`.
- `analysis.block_analysis.Block` - This table records the start and end times of each block, along with the associated `acquisition.Experiment`.
- `analysis.block_analysis.BlockAnalysis` - This table contains a higher-level aggregation of events and metrics occurring within a block. Here, patch-related and subject-related metrics are computed separately, providing a detailed view of how different `streams.UndergroundFeeder`(s) and individual `Subject`(s) contribute to the overall experimental outcomes.
- `analysis.block_analysis.BlockSubjectAnalysis` - This table focuses on the detailed analysis of individual `Subject`(s) within a block, and considers each `Subject`'s interactions with each `streams.UndergroundFeeder` (food patch). Metrics such as total interaction time and overall time spent at the patch are computed. Key part tables include:
  - `Patch` - This table records the subject's interactions with a particular food patch, detailing the time spent at the patch, the distance spun on the patch wheel, the number of pellets received, and the timing of pellet deliveries.
  - `Preference` - This table records the subject's preference for each food patch by computing cumulative preference metrics based on time spent at the patch and wheel distance spun.
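Blocks and their per-subject patch metrics can be combined with DataJoint's join operator. This is a sketch assuming the part-table and key names match those described above:

```python
from aeon.dj_pipeline.analysis import block_analysis

key = {"experiment_name": "exp0.2-r0"}  # assumed experiment name

# Join each block with its per-subject, per-patch metrics (the * operator
# joins tables on their shared primary-key attributes).
block_patch = block_analysis.Block * block_analysis.BlockSubjectAnalysis.Patch
df = (block_patch & key).fetch(format="frame")
```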
### Data streams
- `streams.SpinnakerVideoSource` - This table records the placement and operation of a Spinnaker video source in an `acquisition.Experiment`, as well as metadata such as the installation time of the device. It facilitates the collection of video data, including frame counts and timestamps.
- `streams.RfidReader` - This table records the placement and operation of an RFID reader in an `acquisition.Experiment`, as well as the installation time of the device. It facilitates the collection of RFID data, such as RFID tag detection counts, timestamps, and tag IDs.
- `streams.WeightScale` - This table records the placement and operation of a weighing scale in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of raw and filtered weight measurements.
- `streams.UndergroundFeeder` - This table records the operation of an underground food patch in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of various patch events and states, including beam breaks, pellet deliveries, and depletion states.
## Pipeline operation: Auto ingestion and processing
The process begins by entering the meta-information about the experiment, such as the experiment name, participating subjects, cameras, and food patch configurations.
This information can either be input manually or automatically parsed from configuration YAML files. To facilitate this, the following scripts handle both manual input and automated parsing for different types of experiments:
- `aeon/dj_pipeline/create_experiments/create_experiment_02.py`
- `aeon/dj_pipeline/create_experiments/create_socialexperiment.py`
This is a one-time operation per experiment.
In DataJoint, the logic for generating a table's records is written in its `make()` method, which computes and inserts new records based on data from upstream tables. The `populate()` method automates the process of calling `make()` for every entry that has not yet been computed.
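A minimal sketch of this `make()`/`populate()` pattern, using hypothetical tables rather than the Aeon schema:

```python
import datajoint as dj

schema = dj.Schema("populate_demo")  # hypothetical schema name

@schema
class Recording(dj.Manual):
    definition = """
    recording_id : int
    ---
    n_samples   : int
    sample_rate : float
    """

@schema
class RecordingDuration(dj.Computed):
    definition = """
    -> Recording
    ---
    duration_s : float
    """

    def make(self, key):
        # Fetch upstream attributes for this key, compute, and insert.
        n, rate = (Recording & key).fetch1("n_samples", "sample_rate")
        self.insert1({**key, "duration_s": n / rate})

# Calls make() once for each Recording entry not yet in RecordingDuration.
RecordingDuration.populate()
```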
The auto-ingestion and processing/computation routines are defined in `aeon/dj_pipeline/populate/worker.py`. Running the auto-processing routine is equivalent to executing the following four commands, which can be run sequentially or in parallel using different processing threads:
```bash
aeon_ingest pyrat_worker
aeon_ingest acquisition_worker
aeon_ingest streams_worker
aeon_ingest analysis_worker
```
Since DataJoint operations are idempotent, the aforementioned routine or commands can safely be rerun if needed.
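For example, re-running `populate()` on the hypothetical `RecordingDuration` table from the sketch above recomputes nothing that already exists:

```python
# Safe to re-run: populate() computes only the keys missing from the table.
RecordingDuration.populate(display_progress=True)

# When several workers run in parallel, job reservation prevents duplicate work.
RecordingDuration.populate(reserve_jobs=True)
```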
**See also:** The guides on deploying the Aeon DataJoint pipeline on-premises, data ingestion and processing, and data querying.