(target-aeon-dj-pipeline)=
# DataJoint pipeline for Aeon

[DataJoint](datajoint:) is a framework for developing and executing structured data pipelines that organise data into a relational database. The Aeon DataJoint pipeline consists of a set of [tables](datajoint:docs/core/datajoint-python/0.14/concepts/principles/) designed to manage, contain, and process data generated by the [Aeon acquisition system](target-aeon-acquisition-reference).

## Pipeline architecture

The following diagrams provide a high-level overview of the pipeline's components and processes.

(target-aeon-dj-pipeline-acquisition-fig)=
:::{figure} ../../images/datajoint_overview_acquisition_related_diagram.svg
:alt: dj-overview-acquisition
:target: ../../images/datajoint_overview_acquisition_related_diagram.svg

Data acquisition-related tables.
:::

(target-aeon-dj-pipeline-streams-fig)=
:::{figure} ../../images/datajoint_overview_data_stream_diagram.svg
:alt: dj-overview-streams
:target: ../../images/datajoint_overview_data_stream_diagram.svg

Data flow for various data streams.
:::

:::{figure} ../../images/datajoint_overview_pyrat_related_diagram.svg
:alt: dj-overview-pyrat
:target: ../../images/datajoint_overview_pyrat_related_diagram.svg

Pyrat synchronisation process.
:::

(target-aeon-dj-pipeline-analysis-fig)=
:::{figure} ../../images/datajoint_analysis_diagram.svg
:alt: dj-analysis
:target: ../../images/datajoint_analysis_diagram.svg

Analysis tables.
:::

As seen above, the pipeline is structured into hierarchical layers of tables, each classified into one of four tiers based on the origin of the data they contain:

+ `lookup`-tier tables (grey): contain reference information defined a priori.
+ `manual`-tier tables (green): contain data manually entered by the user.
+ `imported`-tier tables (purple): contain data ingested from external sources such as raw data files.
+ `computed`-tier tables (red): contain results from automated pipeline computations.

Data flows through the pipeline in a top-down manner, driven by a combination of ingestion and computation routines. This layered organisation facilitates efficient data processing and modular analysis.

## Core tables

This section provides an overview of the core tables in the Aeon DataJoint pipeline, categorised by their primary function and the type of data they manage.

(target-aeon-dj-pipeline-acquisition-tables)=
### Experiment and data acquisition

+ `acquisition.Experiment` - This table stores meta-information about Aeon experiments, including the lab/room where the experiment is conducted, the participating subjects, and the directory storing the raw data.
+ `acquisition.Epoch` - This table records all {term}`acquisition epochs <epoch>`, i.e. periods reflecting the on/off states of the hardware within the acquisition system, along with their associated configurations, for any particular experiment listed in the `acquisition.Experiment` table.
+ `acquisition.Chunk` - The raw data acquired through the [acquisition system](target-aeon-acquisition-reference) is stored as a collection of files at hourly intervals, each referred to as a {term}`chunk`. This table records all time chunks and their associated raw data files for any particular experiment in the `acquisition.Experiment` table. Each chunk belongs to exactly one {term}`epoch`.
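To give a feel for how these tables are used, here is a minimal query sketch. The experiment name is a hypothetical example, and attribute names (e.g. `chunk_start`) should be checked against the deployed schema:

```python
from aeon.dj_pipeline import acquisition

# Hypothetical experiment name for illustration; use a name that exists
# in your acquisition.Experiment table.
experiment_key = {"experiment_name": "exp0.2-r0"}

# Primary keys of all epochs recorded for this experiment.
epoch_keys = (acquisition.Epoch & experiment_key).fetch("KEY")

# Start times of all hourly chunks for this experiment
# (assumes the attribute is named chunk_start).
chunk_starts = (acquisition.Chunk & experiment_key).fetch("chunk_start")
```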
(target-aeon-dj-pipeline-tracking-tables)=
### Video data and position tracking

+ `qc.CameraQC` - This table records the quality control procedures applied to each `streams.SpinnakerVideoSource` (camera) in the experiment, such as identifying dropped frames in the video data.
+ `tracking.SLEAPTracking` - This table records the [SLEAP](sleap:) position tracking of object(s) for each chunk of video recorded from a particular `streams.SpinnakerVideoSource` ([camera device](target-module-camera)). Key [part tables](datajoint:docs/core/datajoint-python/0.14/design/tables/master-part/) include:
  - `PoseIdentity` - This table records the procedure that identifies the object being tracked (i.e. assigns an identity) and stores the name of the body part (`anchor_part`) used as the anchor point.
  - `Part` - This table records the x, y positions of all body parts tracked using SLEAP.

(target-aeon-dj-pipeline-analysis-tables)=
### Standard analyses

+ `analysis.visit.Visit` and `analysis.visit.VisitEnd` - These tables record the start and end times of each {term}`visit`, along with the associated {term}`place` and `Subject`.
+ `analysis.block_analysis.Block` - This table records the start and end times of each {term}`block`, along with the associated `acquisition.Experiment`.
+ `analysis.block_analysis.BlockAnalysis` - This table contains a higher-level aggregation of events and metrics occurring within a {term}`block`. Patch-related and subject-related metrics are computed separately, providing a detailed view of how different `streams.UndergroundFeeder`(s) and individual `Subject`(s) contribute to the overall experimental outcomes.
+ `analysis.block_analysis.BlockSubjectAnalysis` - This table focuses on the detailed analysis of individual `Subject`(s) within a {term}`block`, considering each `Subject`'s interactions with each `streams.UndergroundFeeder` (food patch). Metrics such as total interaction time and overall time spent at the patch are computed. Key part tables include:
  - `Patch` - This table records the subject's interactions with a particular food patch, detailing the time spent at the patch, the distance spun on the patch wheel, the number of pellets received, and the timing of pellet deliveries.
  - `Preference` - This table records the subject's preference for each food patch, computing cumulative preference metrics based on time spent at the patch and wheel distance spun.

(target-aeon-dj-pipeline-streams-tables)=
### Data streams

+ `streams.SpinnakerVideoSource` - This table records the placement and operation of a Spinnaker video source in an `acquisition.Experiment`, as well as metadata such as the installation time of the device. It facilitates the collection of video data, including frame counts and timestamps.
+ `streams.RfidReader` - This table records the placement and operation of an RFID reader in an `acquisition.Experiment`, as well as the installation time of the device. It facilitates the collection of RFID data, such as RFID tag detection counts, timestamps, and tag IDs.
+ `streams.WeightScale` - This table records the placement and operation of a weighing scale in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of raw and filtered weight measurements.
+ `streams.UndergroundFeeder` - This table records the operation of an underground food patch in an `acquisition.Experiment`, including its installation time and any relevant metadata. It facilitates the collection of various patch events and states, including beam breaks, pellet deliveries, and depletion states.
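As an illustration of working with the tracking and streams tables, the sketch below fetches tracked positions from the `tracking.SLEAPTracking.Part` part table. The experiment name and the `chunk_start` attribute used in the time restriction are assumptions to be checked against the deployed schema:

```python
from aeon.dj_pipeline import tracking

# Hypothetical restriction for illustration; swap in a real experiment name.
key = {"experiment_name": "exp0.2-r0"}

# x, y positions of every SLEAP-tracked body part, fetched as a DataFrame.
positions = (tracking.SLEAPTracking.Part & key).fetch(format="frame")

# Restrictions can also be plain SQL conditions, e.g. limiting to one day
# of chunks (assumes the attribute is named chunk_start).
day = "chunk_start BETWEEN '2024-01-01' AND '2024-01-02'"
day_positions = (tracking.SLEAPTracking.Part & key & day).fetch(format="frame")
```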
## Pipeline operation: Auto ingestion and processing

The process begins by entering meta-information about the experiment, such as the experiment name, participating subjects, cameras, and food patch configurations. This information can either be input manually or automatically parsed from configuration YAML files. To facilitate this, the following scripts handle both manual input and automated parsing for different types of experiments:

+ `aeon.dj_pipeline.create_experiments.create_experiment_02`
+ `aeon.dj_pipeline.create_experiments.create_socialexperiment`

This is a one-time operation per experiment.

In DataJoint, the logic for filling a table is written in its [`make()`](datajoint:docs/core/datajoint-python/0.13/reproduce/make-method/) method, which generates and inserts new records based on data from upstream tables. The [`populate()`](datajoint:docs/core/datajoint-python/0.14/compute/populate/) method automates the process of calling `make()` for all relevant tables.

The auto-ingestion and processing/computation routines are defined in `aeon.dj_pipeline.populate.worker`. Running the auto-processing routine is equivalent to executing the following four commands, which can be run sequentially or in parallel in separate processing threads:

+ `aeon_ingest pyrat_worker`
+ `aeon_ingest acquisition_worker`
+ `aeon_ingest streams_worker`
+ `aeon_ingest analysis_worker`

Because DataJoint operations are idempotent, this routine and these commands can safely be rerun if needed.

:::{seealso}
The guides on [deploying the Aeon DataJoint pipeline on-premises](target-dj-pipeline-deployment), [data ingestion and processing](target-dj-data-ingestion-processing), and [data querying](target-dj-querying-data).
:::
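To make the worker mechanism described above concrete, the sketch below shows roughly what an ingestion worker amounts to: calling `populate()` on a set of auto-computed tables in dependency order. The table selection here is an illustrative assumption; the authoritative worker definitions live in `aeon.dj_pipeline.populate.worker`:

```python
from aeon.dj_pipeline import acquisition, qc, tracking

# Illustrative subset of auto-computed tables, listed in dependency order;
# the real worker configurations enumerate the full set.
auto_tables = [
    acquisition.Chunk,
    qc.CameraQC,
    tracking.SLEAPTracking,
]

for table in auto_tables:
    table.populate(
        display_progress=True,
        reserve_jobs=True,     # job reservation lets parallel workers share the load
        suppress_errors=True,  # log errors and continue instead of halting the run
    )
```

Because `populate()` only computes entries that are missing, rerunning this loop (or the `aeon_ingest` commands) is safe, which is what makes the routine idempotent.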