The insert operation adds new entities to Manual tables.
In the context of the Relational Workflow Model, inserting data is how information enters the pipeline from external sources.
## Insert in the Workflow
The insert operation applies to Manual tables—tables that receive data from outside the pipeline:
| Table Tier | How Data Enters |
|---|---|
| Lookup | contents property (part of schema definition) |
| Manual | insert from external sources |
| Imported/Computed | populate() mechanism |
For Manual tables, each insert represents new information entering the workflow from an external source.
The term “manual” refers to the data’s origin—outside the pipeline—not to how it arrives.
Inserts into Manual tables can come from human data entry, automated scripts parsing instrument files, or integrations with external systems.
What matters is that the pipeline’s populate mechanism does not create this data—it comes from outside.
Each insert into a Manual table potentially triggers downstream computations: when you insert a new session, all Imported and Computed tables that depend on it become candidates for population.
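For instance, a minimal sketch of this chain, assuming `SessionAnalysis` is a hypothetical Computed table that depends on Session:

```python
# A new session enters the workflow through the Manual tier...
Session.insert1({'subject_id': 'M001', 'session_date': '2024-02-01'})

# ...and becomes a new candidate key for dependent auto-populated tables
SessionAnalysis.populate()
```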
## The insert1 Method
Use `insert1` to add a single row:

```python
<Table>.insert1(row, ignore_extra_fields=False)
```

Parameters:

- `row`: A dictionary with keys matching table attributes
- `ignore_extra_fields`: If `True`, extra dictionary keys are silently ignored; if `False` (default), extra keys raise an error
Example:
```python
# Insert a single subject into a Manual table
Subject.insert1({
    'subject_id': 'M001',
    'species': 'mouse',
    'sex': 'M',
    'date_of_birth': '2023-06-15'
})
```

Use `insert1` when:
- Adding individual records interactively
- Processing items one at a time in a loop where you need error handling per item (see the sketch after this list)
- Debugging, where single-row operations provide clearer error messages
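A minimal sketch of that per-item pattern; the row contents below are illustrative:

```python
rows = [
    {'subject_id': 'M002', 'species': 'mouse', 'sex': 'F', 'date_of_birth': '2023-07-01'},
    {'subject_id': 'M003', 'species': 'mouse', 'sex': 'M', 'date_of_birth': '2023-07-08'},
]

for row in rows:
    try:
        Subject.insert1(row)  # one row at a time
    except Exception as err:
        # Handle (or log) each failure individually instead of aborting the batch
        print(f"Could not insert {row['subject_id']}: {err}")
```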
## The insert Method
Use `insert` for batch insertion of multiple rows:

```python
<Table>.insert(rows, ignore_extra_fields=False, skip_duplicates=False)
```

Parameters:

- `rows`: A list of dictionaries (or any iterable of dict-like objects)
- `ignore_extra_fields`: If `True`, extra keys are ignored
- `skip_duplicates`: If `True`, rows with existing primary keys are silently skipped; if `False` (default), duplicates raise an error
Example:
```python
# Batch insert multiple sessions (could be from a script parsing log files)
Session.insert([
    {'subject_id': 'M001', 'session_date': '2024-01-15', 'session_notes': 'baseline'},
    {'subject_id': 'M001', 'session_date': '2024-01-16', 'session_notes': 'treatment'},
    {'subject_id': 'M001', 'session_date': '2024-01-17', 'session_notes': 'follow-up'},
])
```

Use `insert` when:
- Loading data from files or external sources (see the sketch after this list)
- Importing from external databases or APIs
- Migrating or synchronizing data between systems
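For example, a sketch of loading rows from a CSV file and inserting them in one batch; the file name and its column layout are hypothetical, and `skip_duplicates=True` makes the script safe to re-run:

```python
import csv

# Assume sessions.csv has columns: subject_id, session_date, session_notes
with open('sessions.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# One batch insert; rows already present from a previous run are silently skipped
Session.insert(rows, skip_duplicates=True)
```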
## Server-Side Insert
The `insert` method can also accept a query as its argument instead of explicit data:

```python
<Table>.insert(query, ignore_extra_fields=False, skip_duplicates=False)
```

When a query is passed to `insert`, the operation executes entirely on the database server.
The query results are inserted directly into the target table without transferring data to the client.
This is significantly more efficient for large datasets.
Requirements:

- The query must produce attributes that match the target table’s attribute names exactly
- All required attributes (primary key and non-nullable attributes) must be present in the query
- Use `proj()` to rename attributes if the source table has different column names
Example:
```python
# Suppose we have a staging table with subject data to import
# Server-side insert directly from query results
Subject.insert(StagingSubjects.proj())

# With attribute renaming to match target table
Subject.insert(
    RawData.proj(
        subject_id='raw_id',
        species='animal_type',
        sex='gender'
    )
)

# Inserting a filtered subset
ActiveSubjects.insert(
    Subject & 'status = "active"'
)
```

Benefits:
- **Efficiency**: Data never leaves the database server—no network transfer overhead
- **Atomicity**: The entire insert executes as a single database operation
- **Scalability**: Handles large datasets that might be impractical to fetch client-side
Use cases:
- Populating summary or denormalized tables from existing data
- Copying subsets of data between tables
- Data migrations within the database
- Creating materialized views of query results
This pattern is particularly powerful when combined with DataJoint’s query operations like `join`, `proj`, `aggr`, and restrictions to transform and filter data before insertion.
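As an illustration, a sketch of populating a hypothetical summary table `SubjectStats` (with attributes `subject_id` and `n_sessions`) entirely on the server:

```python
# Count each subject's sessions on the server and insert the result
# directly into the summary table; no rows are fetched to the client
SubjectStats.insert(
    Subject.aggr(Session, n_sessions='count(session_date)')
)
```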
## Referential Integrity
DataJoint enforces referential integrity on insert. If a table has foreign key dependencies, the referenced entities must already exist:
```python
# This will fail if subject 'M001' doesn't exist in Subject table
Session.insert1({
    'subject_id': 'M001',  # Must exist in Subject
    'session_date': '2024-01-15'
})
```

This constraint ensures the dependency graph remains valid—you cannot create downstream entities without their upstream dependencies.
Note that Lookup table data (defined via contents) is automatically available when the schema is activated, so foreign key references to Lookup tables are always satisfied.
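For contrast, a minimal sketch of a Lookup table whose rows come from `contents` rather than from inserts (the class and schema names are illustrative):

```python
import datajoint as dj

schema = dj.Schema('my_pipeline')  # hypothetical schema name

@schema
class Species(dj.Lookup):
    definition = """
    species : varchar(32)   # species name
    """
    # Rows provided as part of the table definition; loaded on schema activation
    contents = [('mouse',), ('rat',)]
```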
## Best Practices
- **Match insert method to use case**: Use `insert1` for single records, `insert` for batches
- **Keep `ignore_extra_fields=False` (default)**: Helps catch data mapping errors early
- **Insert upstream before downstream**: Respect the dependency order defined by foreign keys
- **Use `skip_duplicates=True` for idempotent scripts**: When re-running import scripts, this avoids errors on existing data (see the sketch after this list)
- **Let `populate()` handle auto-populated tables**: Never insert directly into Imported or Computed tables
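A brief sketch of an import script that follows the last three practices; `subject_rows` and `session_rows` are assumed lists of dictionaries built elsewhere, and `SessionAnalysis` is a hypothetical Computed table:

```python
# Upstream first, downstream second, auto-populated tiers last.
# skip_duplicates=True keeps the script safe to re-run.
Subject.insert(subject_rows, skip_duplicates=True)
Session.insert(session_rows, skip_duplicates=True)
SessionAnalysis.populate()
```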