
Insert

The insert operation adds new entities to Manual tables. In the context of the Relational Workflow Model, inserting data is how information enters the pipeline from external sources.

Insert in the Workflow

The insert operation applies to Manual tables—tables that receive data from outside the pipeline:

Table Tier           How Data Enters
Lookup               contents property (part of schema definition)
Manual               insert from external sources
Imported/Computed    populate() mechanism

For Manual tables, each insert represents new information entering the workflow from an external source. The term “manual” refers to the data’s origin—outside the pipeline—not to how it arrives. Inserts into Manual tables can come from human data entry, automated scripts parsing instrument files, or integrations with external systems. What matters is that the pipeline’s populate mechanism does not create this data—it comes from outside.

Each insert into a Manual table potentially triggers downstream computations: when you insert a new session, all Imported and Computed tables that depend on it become candidates for population.
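A minimal sketch of this hand-off, assuming a Manual Session table and a downstream Computed table called SessionAnalysis (the analysis table name is an illustrative assumption):

# Insert a new session into the Manual table (data from outside the pipeline)
Session.insert1({'subject_id': 'M001', 'session_date': '2024-02-01'})

# The new session is now a candidate for downstream computation;
# populate() fills in the dependent auto-populated table for it
SessionAnalysis.populate()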

The insert1 Method

Use insert1 to add a single row:

<Table>.insert1(row, ignore_extra_fields=False)

Parameters:

  • row: A dictionary with keys matching table attributes

  • ignore_extra_fields: If True, extra dictionary keys are silently ignored; if False (default), extra keys raise an error

Example:

# Insert a single subject into a Manual table
Subject.insert1({
    'subject_id': 'M001',
    'species': 'mouse',
    'sex': 'M',
    'date_of_birth': '2023-06-15'
})
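If the source dictionary carries keys that are not attributes of the table, the default ignore_extra_fields=False raises an error. A minimal sketch of relaxing this, reusing the Subject table above (the extra cage_id key is an illustrative assumption):

# 'cage_id' is not an attribute of Subject; with the default setting
# this insert would fail, but here the extra key is silently dropped
Subject.insert1({
    'subject_id': 'M002',
    'species': 'mouse',
    'sex': 'F',
    'date_of_birth': '2023-07-01',
    'cage_id': 'C12'
}, ignore_extra_fields=True)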

Use insert1 when:

  • Adding individual records interactively

  • Processing items one at a time in a loop where you need per-item error handling (see the sketch after this list)

  • Debugging, where single-row operations provide clearer error messages
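A minimal sketch of the per-item error-handling pattern, assuming duplicate primary keys are the error you want to tolerate (the rows list is illustrative):

import datajoint as dj

rows = [
    {'subject_id': 'M001', 'species': 'mouse', 'sex': 'M', 'date_of_birth': '2023-06-15'},
    {'subject_id': 'M002', 'species': 'mouse', 'sex': 'F', 'date_of_birth': '2023-07-01'},
]

for row in rows:
    try:
        Subject.insert1(row)
    except dj.errors.DuplicateError:
        # Row already present; note it and continue with the remaining rows
        print(f"skipping existing subject {row['subject_id']}")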

The insert Method

Use insert for batch insertion of multiple rows:

<Table>.insert(rows, ignore_extra_fields=False, skip_duplicates=False)

Parameters:

  • rows: A list of dictionaries (or any iterable of dict-like objects)

  • ignore_extra_fields: If True, extra keys are ignored

  • skip_duplicates: If True, rows with existing primary keys are silently skipped; if False (default), duplicates raise an error

Example:

# Batch insert multiple sessions (could be from a script parsing log files)
Session.insert([
    {'subject_id': 'M001', 'session_date': '2024-01-15', 'session_notes': 'baseline'},
    {'subject_id': 'M001', 'session_date': '2024-01-16', 'session_notes': 'treatment'},
    {'subject_id': 'M001', 'session_date': '2024-01-17', 'session_notes': 'follow-up'},
])
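Re-running the same batch would normally fail on the rows that already exist. With skip_duplicates=True those rows are skipped, which makes the script safe to re-run; a sketch reusing the sessions above:

# Safe to re-run: sessions already in the table are skipped rather than
# raising a duplicate-key error
Session.insert([
    {'subject_id': 'M001', 'session_date': '2024-01-15', 'session_notes': 'baseline'},
    {'subject_id': 'M001', 'session_date': '2024-01-18', 'session_notes': 'extra session'},
], skip_duplicates=True)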

Use insert when:

  • Loading data from files or external sources (see the sketch after this list)

  • Importing from external databases or APIs

  • Migrating or synchronizing data between systems
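A sketch of the file-loading case, assuming a CSV file named sessions.csv whose column names already match the Session attributes (the file name and columns are illustrative assumptions):

import csv

# Read rows from an external file and insert them as one batch
with open('sessions.csv', newline='') as f:
    rows = list(csv.DictReader(f))

Session.insert(rows, skip_duplicates=True)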

Server-Side Insert

The insert method can also accept a query as its argument instead of explicit data:

<Table>.insert(query, ignore_extra_fields=False, skip_duplicates=False)

When a query is passed to insert, the operation executes entirely on the database server. The query results are inserted directly into the target table without transferring data to the client. This is significantly more efficient for large datasets.

Requirements:

  • The query must produce attributes that match the target table’s attribute names exactly

  • All required attributes (primary key and non-nullable attributes) must be present in the query

  • Use proj() to rename attributes if the source table has different column names

Example:

# Suppose we have a staging table with subject data to import
# Server-side insert directly from query results
Subject.insert(StagingSubjects.proj())

# With attribute renaming to match target table
Subject.insert(
    RawData.proj(
        subject_id='raw_id',
        species='animal_type',
        sex='gender'
    )
)

# Inserting a filtered subset
ActiveSubjects.insert(
    Subject & 'status = "active"'
)

Benefits:

  • Efficiency: Data never leaves the database server—no network transfer overhead

  • Atomicity: The entire insert executes as a single database operation

  • Scalability: Handles large datasets that might be impractical to fetch client-side

Use cases:

  • Populating summary or denormalized tables from existing data

  • Copying subsets of data between tables

  • Data migrations within the database

  • Creating materialized views of query results

This pattern is particularly powerful when combined with DataJoint’s query operations like join, proj, aggr, and restrictions to transform and filter data before insertion.
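For example, a sketch of filling a hypothetical summary table SubjectSessionCount from an aggregation, entirely server-side (the summary table and its n_sessions attribute are assumptions; it would need subject_id as its primary key):

# Count sessions per subject on the server and insert the result
# directly into the summary table, without fetching to the client
SubjectSessionCount.insert(
    Subject.aggr(Session, n_sessions='count(*)')
)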

Referential Integrity

DataJoint enforces referential integrity on insert. If a table has foreign key dependencies, the referenced entities must already exist:

# This will fail if subject 'M001' doesn't exist in Subject table
Session.insert1({
    'subject_id': 'M001',  # Must exist in Subject
    'session_date': '2024-01-15'
})

This constraint ensures the dependency graph remains valid—you cannot create downstream entities without their upstream dependencies. Note that Lookup table data (defined via contents) is automatically available when the schema is activated, so foreign key references to Lookup tables are always satisfied.

Best Practices

  1. Match insert method to use case: Use insert1 for single records, insert for batches

  2. Keep ignore_extra_fields=False (default): Helps catch data mapping errors early

  3. Insert upstream before downstream: Respect the dependency order defined by foreign keys

  4. Use skip_duplicates=True for idempotent scripts: When re-running import scripts, this avoids errors on existing data

  5. Let populate() handle auto-populated tables: Never insert directly into Imported or Computed tables
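Putting practices 3–5 together, a minimal re-runnable ingest sketch (table names reuse the examples above; SessionAnalysis is an assumed Computed table):

# Upstream first: the subject must exist before its sessions
Subject.insert([
    {'subject_id': 'M003', 'species': 'mouse', 'sex': 'F', 'date_of_birth': '2023-08-20'},
], skip_duplicates=True)

# Then the downstream Manual table that references it
Session.insert([
    {'subject_id': 'M003', 'session_date': '2024-02-10'},
], skip_duplicates=True)

# Imported/Computed tables are filled by populate(), never by direct insert
SessionAnalysis.populate()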