The make() method defines the computational logic for auto-populated tables (dj.Imported and dj.Computed).
This chapter describes its anatomy, constraints, and the three-part pattern that enables long-running computations while preserving transactional integrity.
## Input: The Key
The make() method receives a single argument: the key dictionary.
This key identifies which entity to compute—it contains the primary key attributes from the table’s key source.
The key source is automatically determined by DataJoint as the join of all parent tables referenced by foreign keys in the auto-populated table's primary key, minus entries already present in the table itself:
```python
# For a table with dependencies -> Image and -> BlobParamSet,
# the key source is effectively:
Image.proj() * BlobParamSet.proj() - Detection
```

Each call to make() processes exactly one key from this source.
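You can inspect the pending work directly: key_source minus the table itself gives the keys still awaiting computation, and populate() iterates over them, calling make() once per key. A brief sketch, using the Detection table from the complete example later in this chapter:

```python
# Keys still awaiting computation (key source minus existing rows)
remaining = Detection.key_source - Detection
print(len(remaining), "keys left to compute")

# populate() iterates over those keys, calling make() once per key
Detection.populate(display_progress=True)
```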
## The Three Parts
A well-structured make() method has three distinct parts:
### 1. Fetch
Retrieve the necessary data from upstream tables using the provided key:
```python
def make(self, key):
    # 1. FETCH: Get data from upstream tables
    image = (Image & key).fetch1("image_data")
    params = (BlobParamSet & key).fetch1()
```

The key restricts each upstream table to exactly the relevant row(s).
Use fetch1() when you expect exactly one matching row and fetch() when there may be several.
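The difference matters when a restriction can match more than one row. A short illustration (the Image.Frame part table here is hypothetical, not part of this chapter's pipeline):

```python
# fetch1() requires exactly one matching row and returns its values directly
image = (Image & key).fetch1("image_data")

# fetch() accepts any number of rows; it returns arrays per attribute,
# or a list of dicts with as_dict=True
frames = (Image.Frame & key).fetch("frame_data")  # hypothetical part table
rows = (Image.Frame & key).fetch(as_dict=True)
```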
Upstream tables are those reachable from the current table by following foreign key references upward through the dependency graph. The fetch step should only access:

- Tables that are upstream dependencies (directly or transitively via foreign keys)
- Part tables of those upstream tables
This constraint ensures computational reproducibility—the computation depends only on data that logically precedes it in the pipeline.
### 2. Compute
Perform the actual computation or data transformation:
```python
# 2. COMPUTE: Perform the transformation
blobs = detect_blobs(
    image,
    min_sigma=params["min_sigma"],
    max_sigma=params["max_sigma"],
    threshold=params["threshold"],
)
```

This is the scientific or business logic: image processing, statistical analysis, simulation, or any transformation that produces derived data. The compute step should be a pure function of the fetched data.
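The chapter does not define detect_blobs; one plausible implementation, sketched as a pure function on top of scikit-image's blob_log (an assumption, not part of the original example), would return dicts matching the Blob part table's attributes:

```python
from skimage.feature import blob_log  # assumed dependency


def detect_blobs(image, min_sigma, max_sigma, threshold):
    """Pure function: identical inputs always produce identical output."""
    # blob_log returns one (y, x, sigma) row per detected blob
    detections = blob_log(
        image, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold
    )
    # Shape each detection to match the Blob part table's attributes
    return [
        {"x": float(x), "y": float(y), "sigma": float(sigma)}
        for y, x, sigma in detections
    ]
```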
### 3. Insert
Store the results in the table (and any part tables):
```python
# 3. INSERT: Store results
self.insert1({**key, "blob_count": len(blobs)})
self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])
```

The key must be included in the inserted row to maintain referential integrity.
For master-part structures, insert both the master row and all part rows within the same make() call.
## Restrictions on Auto-Populated Tables
Auto-populated tables (dj.Imported and dj.Computed) enforce important constraints:
- No manual insertion: Users cannot insert data into auto-populated tables outside of the make() method. All data must come through the populate() mechanism (see the sketch after this list).
- Upstream-only fetching: The fetch step should only access tables that are upstream in the pipeline, reachable by following foreign key references from the current table toward its dependencies.
- Complete key inclusion: Inserted rows must include the full primary key (the input key plus any additional primary key attributes defined in the table).
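To make the first restriction concrete: a direct insert into a Computed table raises dj.DataJointError unless it happens inside make(). A minimal sketch (the attribute names in the row are illustrative):

```python
import datajoint as dj

try:
    # Rejected: direct inserts into auto-populated tables are blocked
    Detection.insert1({"image_id": 1, "paramset_id": 1, "blob_count": 0})
except dj.DataJointError as err:
    print("Rejected:", err)

# The supported path is always the populate() mechanism
Detection.populate()
```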
These constraints ensure:

- Reproducibility: Results can be regenerated by re-running populate()
- Provenance: Every row traces back to specific upstream data
- Consistency: The dependency graph accurately reflects data flow
## Complete Example
```python
@schema
class Detection(dj.Computed):
    definition = """
    -> Image
    -> BlobParamSet
    ---
    blob_count : int
    """

    class Blob(dj.Part):
        definition = """
        -> master
        blob_id : int
        ---
        x : float
        y : float
        sigma : float
        """

    def make(self, key):
        # 1. FETCH
        image = (Image & key).fetch1("image_data")
        params = (BlobParamSet & key).fetch1()

        # 2. COMPUTE
        blobs = detect_blobs(
            image,
            min_sigma=params["min_sigma"],
            max_sigma=params["max_sigma"],
            threshold=params["threshold"],
        )

        # 3. INSERT
        self.insert1({**key, "blob_count": len(blobs)})
        self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])
```

## Transactional Integrity
By default, each make() call executes inside an ACID transaction:

- Atomicity: The entire computation either commits or rolls back as a unit
- Isolation: Partial results are never visible to other processes
- Consistency: The database moves from one valid state to another

The transaction wraps the entire make() execution, including all fetches and inserts. This guarantees that computed results are correctly associated with their specific inputs.
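A practical consequence: if make() raises partway through, say after the master insert1() but before the part-table insert, the whole transaction rolls back and no partial rows remain. A sketch (the failing key is hypothetical):

```python
# Suppose make() raised after self.insert1(...) but before self.Blob.insert(...)
Detection.populate(suppress_errors=True)  # failed keys are logged and skipped

# Hypothetical key whose computation failed mid-insert
bad_key = {"image_id": 7, "paramset_id": 2}

# The rollback guarantees its half-finished master row was never committed
assert len(Detection & bad_key) == 0
```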
## The Three-Part Pattern for Long Computations
For long-running computations (hours or days), holding a database transaction open for the entire duration causes problems:
- Database locks block other operations
- Transaction timeouts may occur
- Resources are held unnecessarily
The three-part make pattern solves this by separating the computation from the transaction:
```python
@schema
class SignalAverage(dj.Computed):
    definition = """
    -> RawSignal
    ---
    avg_signal : float
    """

    def make_fetch(self, key):
        """Step 1: Fetch input data (outside transaction)."""
        raw_signal = (RawSignal & key).fetch1("signal")
        return (raw_signal,)

    def make_compute(self, key, fetched):
        """Step 2: Perform computation (outside transaction)."""
        (raw_signal,) = fetched
        avg = raw_signal.mean()
        return (avg,)

    def make_insert(self, key, fetched, computed):
        """Step 3: Insert results (inside a brief transaction)."""
        (avg,) = computed
        self.insert1({**key, "avg_signal": avg})
```

### How It Works
DataJoint executes the three parts with verification:
```
fetched = make_fetch(key)               # Outside transaction
computed = make_compute(key, fetched)   # Outside transaction

<begin transaction>
fetched_again = make_fetch(key)         # Re-fetch to verify
if fetched != fetched_again:
    <rollback>                          # Inputs changed: abort
else:
    make_insert(key, fetched, computed)
    <commit>
```

The key insight: the computation runs outside any transaction, but referential integrity is preserved by re-fetching and verifying inputs before insertion. If upstream data changed during computation, the job is cancelled rather than inserting inconsistent results.
### Benefits
| Aspect | Standard make() | Three-Part Pattern |
|---|---|---|
| Transaction duration | Entire computation | Only final insert |
| Database locks | Held throughout | Minimal |
| Suitable for | Short computations | Hours/days |
| Integrity guarantee | Transaction | Re-fetch verification |
## Generator Syntax Alternative
The three-part pattern can also be expressed as a generator, which is more concise:
```python
def make(self, key):
    # 1. FETCH
    raw_signal = (RawSignal & key).fetch1("signal")

    computed = yield (raw_signal,)  # Yield fetched data
    if computed is None:
        # 2. COMPUTE
        avg = raw_signal.mean()
        computed = (avg,)
        yield computed  # Yield computed results

    # 3. INSERT
    (avg,) = computed
    self.insert1({**key, "avg_signal": avg})
    yield  # Signal completion
```

DataJoint automatically detects the generator pattern and handles the three-part execution.
## When to Use Each Pattern
| Computation Time | Pattern | Rationale |
|---|---|---|
| Seconds to minutes | Standard make() | Simple, transaction overhead acceptable |
| Minutes to hours | Three-part | Avoid long transactions |
| Hours to days | Three-part | Essential for stability |
The three-part pattern trades a second fetch for a dramatically shorter transaction. Use it when computation time significantly exceeds fetch time.
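Whichever pattern a table uses, the caller drives it the same way. For hours-long jobs spread across several workers, job reservation prevents two workers from computing the same key; a sketch with illustrative flag choices:

```python
# Run this on each worker; reserve_jobs coordinates workers through the
# schema's jobs table so every key is claimed by exactly one worker
SignalAverage.populate(
    reserve_jobs=True,      # claim keys before computing them
    suppress_errors=True,   # record failures and move on to other keys
    display_progress=True,
)
```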