{ "cells": [ { "cell_type": "markdown", "id": "d4216edb-2492-4c63-b5ea-b7071b7b2b78", "metadata": {}, "source": [ "(target-dj-data-ingestion-processing)=\n", "# DataJoint pipeline: Data ingestion and processing\n", "\n", ":::{important}\n", "This guide assumes you have [installed and configured a DataJoint pipeline](target-dj-pipeline-deployment).\n", ":::\n", "\n", "This guide shows how to ingest data from the source and prepare it for querying and further analysis in the [Aeon DataJoint pipeline](target-aeon-dj-pipeline). The three main steps are:\n", "1. [Create a new experiment](#target-dj-data-ingestion-processing-step-1): Set up a new experiment in the pipeline.\n", "2. [Insert subjects and blocks](#target-dj-data-ingestion-processing-step-2): Manually input details about the subjects involved in the experiment and specify the {term}`blocks <block>` of interest.\n", "3. [Run automated ingestion and processing](#target-dj-data-ingestion-processing-step-3): Run routines to ingest data and process it for querying and analysis.\n", "\n", ":::{note}\n", "This guide uses the [Single mouse in a foraging assay](sample-data-single-mouse-foraging:) sample dataset for the experiment named `social0.2-aeon3`.\n", " \n", "If you are using a different dataset, please make sure the DataJoint pipeline is correctly configured (e.g. 
the data directory is correctly specified in the DataJoint configuration file (`dj_local_conf.json`)).\n", "You should also replace the experiment name and other parameters in the code below accordingly.\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "f84f3d3b-a8a3-49a5-aed2-917be953ca7b", "metadata": {}, "outputs": [], "source": [ "from aeon.dj_pipeline import acquisition, subject\n", "from aeon.dj_pipeline.analysis import block_analysis\n", "from aeon.dj_pipeline.create_experiments.create_socialexperiment import (\n", " create_new_social_experiment,\n", ")\n", "from aeon.dj_pipeline.populate.worker import (\n", " AutomatedExperimentIngestion,\n", " acquisition_worker,\n", " analysis_worker,\n", " streams_worker,\n", ")\n" ] }, { "cell_type": "markdown", "id": "9dab1734-83ea-4306-83db-8f6f18d214be", "metadata": {}, "source": [ "(target-dj-data-ingestion-processing-step-1)=\n", "## Step 1 - Create a new experiment\n", "Insert a new entry for the `social0.2-aeon3` experiment into the `acquisition.Experiment` table, along with its associated metadata:" ] }, { "cell_type": "code", "execution_count": null, "id": "9eb99df8-4058-4014-8885-d9a393019e2d", "metadata": {}, "outputs": [], "source": [ "experiment_name = \"social0.2-aeon3\"\n", "create_new_social_experiment(experiment_name)" ] }, { "cell_type": "markdown", "id": "38cdccbe", "metadata": {}, "source": [ "We can now check that the experiment has been successfully inserted into the `acquisition.Experiment` table:" ] }, { "cell_type": "code", "execution_count": null, "id": "819b0d10-1a0b-480f-9762-5a2ccdfa3152", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n",
"<thead><tr><th title=\"e.g exp0-aeon3\">experiment_name</th><th title=\"datetime of the start of this experiment\">experiment_start_time</th><th>experiment_description</th><th title=\"unique name of the arena (e.g. circular_2m)\">arena_name</th><th title=\"Abbreviated lab name\">lab</th><th>location</th><th>experiment_type</th></tr></thead>\n",
"<tbody><tr><td>social0.2-aeon3</td><td>2024-03-01 16:46:12</td><td>Social0.2 experiment on AEON3 machine</td><td>circle-2m</td><td>SWC</td><td>AEON3</td><td>social</td></tr></tbody>\n",
"</table>\n",
"<p>Total: 1</p>
\n", " " ], "text/plain": [ "*experiment_name experiment_start_time experiment_description arena_name lab location experiment_type \n", "+-----------------+ +-----------------------+ +--------------------------------------+ +------------+ +-----+ +----------+ +-----------------+\n", "social0.2-aeon3 2024-03-01 16:46:12 Social0.2 experiment on AEON3 machine circle-2m SWC AEON3 social \n", " (Total: 1)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acquisition.Experiment()" ] }, { "cell_type": "markdown", "id": "5747a655", "metadata": {}, "source": [ "We can also check the `acquisition.Experiment.Directory` table to see the `raw` and `processed` directories associated with the experiment:" ] }, { "cell_type": "code", "execution_count": null, "id": "61787328-d3bd-4427-8be5-0f387d98f50d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n",
"<thead><tr><th title=\"e.g exp0-aeon3\">experiment_name</th><th>directory_type</th><th>repository_name</th><th>directory_path</th><th title=\"order of priority to load the directory\">load_order</th></tr></thead>\n",
"<tbody><tr><td>social0.2-aeon3</td><td>processed</td><td>ceph_aeon</td><td>aeon/data/processed/AEON3/social0.2</td><td>0</td></tr>\n",
"<tr><td>social0.2-aeon3</td><td>raw</td><td>ceph_aeon</td><td>aeon/data/raw/AEON3/social0.2</td><td>1</td></tr></tbody>\n",
"</table>\n",
"<p>Total: 2</p>
\n", " " ], "text/plain": [ "*experiment_name *directory_type repository_name directory_path load_order \n", "+-----------------+ +----------------+ +-----------------+ +-------------------------------------+ +------------+\n", "social0.2-aeon3 processed ceph_aeon aeon/data/processed/AEON3/social0.2 0 \n", "social0.2-aeon3 raw ceph_aeon aeon/data/raw/AEON3/social0.2 1 \n", " (Total: 2)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acquisition.Experiment.Directory()" ] }, { "cell_type": "markdown", "id": "61a29dfc-5f25-49d0-a376-ab70d4853d01", "metadata": {}, "source": [ "(target-dj-data-ingestion-processing-step-2)=\n", "## Step 2 - Insert subjects and blocks\n", "\n", "The `social0.2-aeon3` experiment involves two subjects: \n", "- `BAA-1104045`\n", "- `BAA-1104047`\n", "\n", "Let's create entries for these subjects and insert them into the `subject.Subject` table:" ] }, { "cell_type": "code", "execution_count": null, "id": "fbac420f-d0b8-4226-a47c-8e409f9e361b", "metadata": {}, "outputs": [], "source": [ "subject_list = [\n", " {\n", " \"subject\": \"BAA-1104045\",\n", " \"sex\": \"U\",\n", " \"subject_birth_date\": \"2024-01-01\",\n", " \"subject_description\": \"Subject for Social 0.2 experiment\",\n", " },\n", " {\n", " \"subject\": \"BAA-1104047\",\n", " \"sex\": \"U\",\n", " \"subject_birth_date\": \"2024-01-01\",\n", " \"subject_description\": \"Subject for Social 0.2 experiment\",\n", " },\n", "]\n", "subject.Subject.insert(subject_list, skip_duplicates=True)" ] }, { "cell_type": "markdown", "id": "7e488055", "metadata": {}, "source": [ "To associate these subjects with the experiment `social0.2-aeon3`:" ] }, { "cell_type": "code", "execution_count": null, "id": "92589a7f-cf36-4de9-b36c-02dceafbf341", "metadata": {}, "outputs": [], "source": [ "subject_experiment_list = [\n", " {\"experiment_name\": \"social0.2-aeon3\", \"subject\": \"BAA-1104045\"},\n", " {\"experiment_name\": \"social0.2-aeon3\", 
\"subject\": \"BAA-1104047\"},\n", "]\n", "acquisition.Experiment.Subject.insert(subject_experiment_list, skip_duplicates=True)" ] }, { "cell_type": "markdown", "id": "3fbc1eac", "metadata": {}, "source": [ "We can now check that the subjects have been successfully associated with the experiment `social0.2-aeon3` by querying the `acquisition.Experiment.Subject` table:" ] }, { "cell_type": "code", "execution_count": null, "id": "b9d37c3c-8f52-4f66-9253-c33fe1437d69", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " the subjects participating in this experiment\n", "
<table>\n",
"<thead><tr><th title=\"e.g exp0-aeon3\">experiment_name</th><th>subject</th></tr></thead>\n",
"<tbody><tr><td>social0.2-aeon3</td><td>BAA-1104045</td></tr>\n",
"<tr><td>social0.2-aeon3</td><td>BAA-1104047</td></tr></tbody>\n",
"</table>\n",
"<p>Total: 2</p>
\n", " " ], "text/plain": [ "*experiment_na *subject \n", "+------------+ +------------+\n", "social0.2-aeon BAA-1104045 \n", "social0.2-aeon BAA-1104047 \n", " (Total: 2)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acquisition.Experiment.Subject()" ] }, { "cell_type": "markdown", "id": "4b5c39b4", "metadata": {}, "source": [ "Next, we need to create and insert an entry for a {term}`block` of interest into the `block_analysis.Block` table." ] }, { "cell_type": "code", "execution_count": null, "id": "73a16820", "metadata": {}, "outputs": [], "source": [ "block_data = {\n", " \"experiment_name\": \"social0.2-aeon3\",\n", " \"block_start\": \"2024-03-02 12:00:00\",\n", " \"block_end\": \"2024-03-02 14:00:00\",\n", " \"block_duration_hr\": 2,\n", "}\n", "block_analysis.Block.insert1(block_data)" ] }, { "cell_type": "markdown", "id": "51346bee", "metadata": {}, "source": [ "Likewise, we can query the `block_analysis.Block` table to check that the block has been successfully inserted:" ] }, { "cell_type": "code", "execution_count": null, "id": "caf42af6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", "
<table>\n",
"<thead><tr><th title=\"e.g exp0-aeon3\">experiment_name</th><th>block_start</th><th>block_end</th><th title=\"(hour)\">block_duration_hr</th></tr></thead>\n",
"<tbody><tr><td>social0.2-aeon3</td><td>2024-03-02 12:00:00</td><td>2024-03-02 14:00:00</td><td>2.000</td></tr></tbody>\n",
"</table>\n",
"<p>Total: 1</p>
\n", " " ], "text/plain": [ "*experiment_name *block_start block_end block_duration_hr \n", "+-----------------+ +---------------------+ +---------------------+ +-------------------+\n", "social0.2-aeon3 2024-03-02 12:00:00 2024-03-02 14:00:00 2.000 \n", " (Total: 1)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "block_analysis.Block()" ] }, { "cell_type": "markdown", "id": "9af39820-95b4-42b7-b648-32ae5f47d9a6", "metadata": {}, "source": [ "(target-dj-data-ingestion-processing-step-3)=\n", "## Step 3 - Data ingestion and processing\n", "\n", "Data ingestion and processing are fully automated through the prepared routines provided below. \n", "As DataJoint pipelines are idempotent, these routines can be safely run multiple times without the risk of duplicating or altering existing data." ] }, { "cell_type": "markdown", "id": "d588dcd2", "metadata": {}, "source": [ "To initiate the automated data ingestion process for the experiment `social0.2-aeon3`, we need to first insert an entry for the experiment into the `AutomatedExperimentIngestion` table:" ] }, { "cell_type": "code", "execution_count": 4, "id": "39309e8a-634f-4ed0-9bc2-b3535df1a44d", "metadata": {}, "outputs": [], "source": [ "AutomatedExperimentIngestion.insert1(\n", " {\"experiment_name\": \"social0.2-aeon3\"}, skip_duplicates=True\n", ")" ] }, { "cell_type": "markdown", "id": "a6f92484", "metadata": {}, "source": [ "Ingestion and processing of [acquisition-related data](target-aeon-dj-pipeline-acquisition-fig) for the experiment `social0.2-aeon3` can now be initiated by running:" ] }, { "cell_type": "code", "execution_count": null, "id": "ee8f48df-c355-4900-b99e-7a25c34cc651", "metadata": {}, "outputs": [], "source": [ "acquisition_worker.run()" ] }, { "cell_type": "markdown", "id": "493b9e53", "metadata": {}, "source": [ "Likewise, ingestion and processing of all [data streams](target-aeon-dj-pipeline-streams-fig) for the experiment `social0.2-aeon3` can 
be initiated by running:" ] }, { "cell_type": "code", "execution_count": null, "id": "ff41d578-6a3c-45f5-89e7-11ec95c084fd", "metadata": {}, "outputs": [], "source": [ "streams_worker.run()" ] }, { "cell_type": "markdown", "id": "a1aee89c", "metadata": {}, "source": [ "Finally, for [data analysis](target-aeon-dj-pipeline-analysis-fig), run:" ] }, { "cell_type": "code", "execution_count": null, "id": "3aaed1a0-f98d-453a-a26a-131834a7d154", "metadata": {}, "outputs": [], "source": [ "analysis_worker.run()" ] }, { "cell_type": "markdown", "id": "754d72c3", "metadata": {}, "source": [ "Once the data ingestion and processing routines are complete, we can begin [querying the data](target-dj-querying-data) from the pipeline." ] } ], "metadata": { "kernelspec": { "display_name": "aeon", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }