The make() method defines the computational logic for auto-populated tables (dj.Imported and dj.Computed).
This chapter describes its anatomy, constraints, and the three-part pattern that enables long-running computations while preserving transactional integrity.
## Input: The Key
The make() method receives a single argument: the key dictionary.
This key identifies which entity to compute—it contains the primary key attributes from the table’s key source.
The key source is automatically determined by DataJoint as the join of all parent tables referenced by foreign keys in the auto-populated table's primary key, minus entries already present in the table itself:
```python
# For a table with dependencies -> Image and -> BlobParamSet,
# the key source is effectively:
Image.proj() * BlobParamSet.proj() - Detection
```

Each call to make() processes exactly one key from this source.
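You can inspect the pending work directly: key_source minus the table itself gives the keys still awaiting computation, and populate() iterates over them, calling make() once per key. A brief sketch, using the Detection table from the complete example later in this chapter:

```python
# Keys still awaiting computation (key source minus existing rows)
remaining = Detection.key_source - Detection
print(len(remaining), "keys left to compute")

# populate() iterates over those keys, calling make() once per key
Detection.populate(display_progress=True)
```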
## The Three Parts
A well-structured make() method has three distinct parts:
### 1. Fetch
Retrieve the necessary data from upstream tables using the provided key:
```python
def make(self, key):
    # 1. FETCH: Get data from upstream tables
    image = (Image & key).fetch1("image_data")
    params = (BlobParamSet & key).fetch1()
```

The key restricts each upstream table to exactly the relevant row(s).
Use fetch1() when you expect exactly one matching row and fetch() when there may be several.
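The difference matters when a restriction can match more than one row. A short illustration (the Image.Frame part table here is hypothetical, not part of this chapter's pipeline):

```python
# fetch1() requires exactly one matching row and returns its values directly
image = (Image & key).fetch1("image_data")

# fetch() accepts any number of rows; it returns arrays per attribute,
# or a list of dicts with as_dict=True
frames = (Image.Frame & key).fetch("frame_data")  # hypothetical part table
rows = (Image.Frame & key).fetch(as_dict=True)
```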
Upstream tables are those reachable from the current table by following foreign key references upward through the dependency graph. The fetch step should only access:

- Tables that are upstream dependencies (directly or transitively via foreign keys)
- Part tables of those upstream tables
This constraint ensures computational reproducibility—the computation depends only on data that logically precedes it in the pipeline.
### 2. Compute
Perform the actual computation or data transformation:
```python
# 2. COMPUTE: Perform the transformation
blobs = detect_blobs(
    image,
    min_sigma=params["min_sigma"],
    max_sigma=params["max_sigma"],
    threshold=params["threshold"],
)
```

This is the scientific or business logic: image processing, statistical analysis, simulation, or any transformation that produces derived data. The compute step should be a pure function of the fetched data.
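The chapter does not define detect_blobs; one plausible implementation, sketched as a pure function on top of scikit-image's blob_log (an assumption, not part of the original example), would return dicts matching the Blob part table's attributes:

```python
from skimage.feature import blob_log  # assumed dependency


def detect_blobs(image, min_sigma, max_sigma, threshold):
    """Pure function: identical inputs always produce identical output."""
    # blob_log returns one (y, x, sigma) row per detected blob
    detections = blob_log(
        image, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold
    )
    # Shape each detection to match the Blob part table's attributes
    return [
        {"x": float(x), "y": float(y), "sigma": float(sigma)}
        for y, x, sigma in detections
    ]
```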
### 3. Insert
Store the results in the table (and any part tables):
```python
# 3. INSERT: Store results
self.insert1({**key, "blob_count": len(blobs)})
self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])
```

The key must be included in the inserted row to maintain referential integrity.
For master-part structures, insert both the master row and all part rows within the same make() call.
## Restrictions on Auto-Populated Tables
Auto-populated tables (dj.Imported and dj.Computed) enforce important constraints:
- No manual insertion: Users cannot insert data into auto-populated tables outside of the make() method. All data must come through the populate() mechanism (see the sketch after this list).
- Upstream-only fetching: The fetch step should only access tables that are upstream in the pipeline, reachable by following foreign key references from the current table toward its dependencies.
- Complete key inclusion: Inserted rows must include the full primary key (the input key plus any additional primary key attributes defined in the table).
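To make the first restriction concrete: a direct insert into a Computed table raises dj.DataJointError unless it happens inside make(). A minimal sketch (the attribute names in the row are illustrative):

```python
import datajoint as dj

try:
    # Rejected: direct inserts into auto-populated tables are blocked
    Detection.insert1({"image_id": 1, "paramset_id": 1, "blob_count": 0})
except dj.DataJointError as err:
    print("Rejected:", err)

# The supported path is always the populate() mechanism
Detection.populate()
```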
These constraints ensure:

- Reproducibility: Results can be regenerated by re-running populate()
- Provenance: Every row traces back to specific upstream data
- Consistency: The dependency graph accurately reflects data flow
## Complete Example
```python
@schema
class Detection(dj.Computed):
    definition = """
    -> Image
    -> BlobParamSet
    ---
    blob_count : int
    """

    class Blob(dj.Part):
        definition = """
        -> master
        blob_id : int
        ---
        x : float
        y : float
        sigma : float
        """

    def make(self, key):
        # 1. FETCH
        image = (Image & key).fetch1("image_data")
        params = (BlobParamSet & key).fetch1()

        # 2. COMPUTE
        blobs = detect_blobs(
            image,
            min_sigma=params["min_sigma"],
            max_sigma=params["max_sigma"],
            threshold=params["threshold"],
        )

        # 3. INSERT
        self.insert1({**key, "blob_count": len(blobs)})
        self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])
```

## Transactional Integrity
By default, each make() call executes inside an ACID transaction:

- Atomicity: The entire computation either commits or rolls back as a unit
- Isolation: Partial results are never visible to other processes
- Consistency: The database moves from one valid state to another

The transaction wraps the entire make() execution, including all fetches and inserts. This guarantees that computed results are correctly associated with their specific inputs.
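A practical consequence: if make() raises partway through, say after the master insert1() but before the part-table insert, the whole transaction rolls back and no partial rows remain. A sketch (the failing key is hypothetical):

```python
# Suppose make() raised after self.insert1(...) but before self.Blob.insert(...)
Detection.populate(suppress_errors=True)  # failed keys are logged and skipped

# Hypothetical key whose computation failed mid-insert
bad_key = {"image_id": 7, "paramset_id": 2}

# The rollback guarantees its half-finished master row was never committed
assert len(Detection & bad_key) == 0
```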
## The Three-Part Pattern for Long Computations
For long-running computations (hours or days), holding a database transaction open for the entire duration causes problems:
- Database locks block other operations
- Transaction timeouts may occur
- Resources are held unnecessarily
The three-part make pattern solves this by separating the computation from the transaction:
```python
@schema
class SignalAverage(dj.Computed):
    definition = """
    -> RawSignal
    ---
    avg_signal : float
    """

    def make_fetch(self, key):
        """Step 1: Fetch input data (outside transaction)."""
        raw_signal = (RawSignal & key).fetch1("signal")
        return (raw_signal,)

    def make_compute(self, key, fetched):
        """Step 2: Perform computation (outside transaction)."""
        (raw_signal,) = fetched
        avg = raw_signal.mean()
        return (avg,)

    def make_insert(self, key, fetched, computed):
        """Step 3: Insert results (inside a brief transaction)."""
        (avg,) = computed
        self.insert1({**key, "avg_signal": avg})
```

### How It Works
DataJoint executes the three parts with verification:
```
fetched = make_fetch(key)               # Outside transaction
computed = make_compute(key, fetched)   # Outside transaction

<begin transaction>
fetched_again = make_fetch(key)         # Re-fetch to verify
if fetched != fetched_again:
    <rollback>                          # Inputs changed: abort
else:
    make_insert(key, fetched, computed)
    <commit>
```

The key insight: the computation runs outside any transaction, but referential integrity is preserved by re-fetching and verifying inputs before insertion. If upstream data changed during computation, the job is cancelled rather than inserting inconsistent results.
### Benefits
| Aspect | Standard make() | Three-Part Pattern |
|---|---|---|
| Transaction duration | Entire computation | Only final insert |
| Database locks | Held throughout | Minimal |
| Suitable for | Short computations | Hours/days |
| Integrity guarantee | Transaction | Re-fetch verification |
## Generator Syntax Alternative
The three-part pattern can also be expressed as a generator, which is more concise:
```python
def make(self, key):
    # 1. FETCH
    raw_signal = (RawSignal & key).fetch1("signal")

    computed = yield (raw_signal,)  # Yield fetched data
    if computed is None:
        # 2. COMPUTE
        avg = raw_signal.mean()
        computed = (avg,)
        yield computed  # Yield computed results

    # 3. INSERT
    (avg,) = computed
    self.insert1({**key, "avg_signal": avg})
    yield  # Signal completion
```

DataJoint automatically detects the generator pattern and handles the three-part execution.
## When to Use Each Pattern
| Computation Time | Pattern | Rationale |
|---|---|---|
| Seconds to minutes | Standard make() | Simple, transaction overhead acceptable |
| Minutes to hours | Three-part | Avoid long transactions |
| Hours to days | Three-part | Essential for stability |
The three-part pattern trades a second fetch for a dramatically shorter transaction. Use it when computation time significantly exceeds fetch time.
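Whichever pattern a table uses, the caller drives it the same way. For hours-long jobs spread across several workers, job reservation prevents two workers from computing the same key; a sketch with illustrative flag choices:

```python
# Run this on each worker; reserve_jobs coordinates workers through the
# schema's jobs table so every key is claimed by exactly one worker
SignalAverage.populate(
    reserve_jobs=True,      # claim keys before computing them
    suppress_errors=True,   # record failures and move on to other keys
    display_progress=True,
)
```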