The make Method

The make() method defines the computational logic for auto-populated tables (dj.Imported and dj.Computed). This chapter describes its anatomy, constraints, and the three-part pattern that enables long-running computations while preserving transactional integrity.

Input: The Key

The make() method receives a single argument: the key dictionary. This key identifies which entity to compute—it contains the primary key attributes from the table’s key source.

The key source is determined automatically by DataJoint as the join of all parent tables referenced by foreign keys in the auto-populated table’s primary key, minus the entries already present in the table:

# For a table with dependencies -> Image and -> BlobParamSet,
# the key source is effectively:
Image.proj() * BlobParamSet.proj() - Detection

Each call to make() processes exactly one key from this source.
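
In practice, make() is never called directly. The populate() method iterates over the key source and invokes make() once per missing key. A brief sketch of typical calls (Detection comes from the example later in this chapter; the image_id restriction is illustrative):

Detection.populate()                       # process all pending keys
Detection.populate(display_progress=True)  # same, with a progress bar
Detection.populate({"image_id": 1})        # restrict to a subset of the key source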

The Three Parts

A well-structured make() method has three distinct parts:

1. Fetch

Retrieve the necessary data from upstream tables using the provided key:

def make(self, key):
    # 1. FETCH: Get data from upstream tables
    image = (Image & key).fetch1("image_data")
    params = (BlobParamSet & key).fetch1()

The key restricts each upstream table to exactly the relevant row(s). Use fetch1() when expecting exactly one row and fetch() when expecting multiple rows.

Upstream tables are those reachable from the current table by following foreign key references upward through the dependency graph. The fetch step should only access:

  • Tables that are upstream dependencies (directly or transitively via foreign keys)

  • Part tables of those upstream tables

This constraint ensures computational reproducibility—the computation depends only on data that logically precedes it in the pipeline.
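
As an example of these rules, a make() for the Detection table defined later in this chapter may read its parents and their part tables. A short sketch (Image.Channel is a hypothetical part table, used only for illustration):

# Allowed: parents of this table, restricted by the key
image = (Image & key).fetch1("image_data")
params = (BlobParamSet & key).fetch1()

# Also allowed: part tables of upstream tables
channels = (Image.Channel & key).fetch(as_dict=True)  # hypothetical part table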

2. Compute

Perform the actual computation or data transformation:

    # 2. COMPUTE: Perform the transformation
    blobs = detect_blobs(
        image,
        min_sigma=params["min_sigma"],
        max_sigma=params["max_sigma"],
        threshold=params["threshold"],
    )

This is the scientific or business logic—image processing, statistical analysis, simulation, or any transformation that produces derived data. The compute step should be a pure function of the fetched data.
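
Note that detect_blobs is not defined in this chapter. A minimal sketch of such a helper, assuming scikit-image’s Laplacian-of-Gaussian detector, could look like:

from skimage.feature import blob_log

def detect_blobs(image, min_sigma, max_sigma, threshold):
    """Detect bright blobs in a 2D image; return one dict per blob."""
    # blob_log returns one (y, x, sigma) row per detected blob
    return [
        {"x": float(x), "y": float(y), "sigma": float(s)}
        for y, x, s in blob_log(
            image, min_sigma=min_sigma, max_sigma=max_sigma, threshold=threshold
        )
    ]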

3. Insert

Store the results in the table (and any part tables):

    # 3. INSERT: Store results
    self.insert1({**key, "blob_count": len(blobs)})
    self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])

The key must be included in the inserted row to maintain referential integrity. For master-part structures, insert both the master row and all part rows within the same make() call.
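
To make this concrete, suppose the key is {"image_id": 1, "paramset_id": 3} (attribute names are illustrative) and two blobs were detected. The inserts above then produce:

# Master row in Detection:
#   {"image_id": 1, "paramset_id": 3, "blob_count": 2}
# Part rows in Detection.Blob:
#   {"image_id": 1, "paramset_id": 3, "blob_id": 0, "x": 10.5, "y": 4.2, "sigma": 1.8}
#   {"image_id": 1, "paramset_id": 3, "blob_id": 1, "x": 33.0, "y": 17.9, "sigma": 2.4}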

Restrictions on Auto-Populated Tables

Auto-populated tables (dj.Imported and dj.Computed) enforce important constraints:

  1. No manual insertion: Users cannot insert data into auto-populated tables outside of the make() method. All data must come through the populate() mechanism. (A rejected direct insert is shown at the end of this section.)

  2. Upstream-only fetching: The fetch step should only access tables that are upstream in the pipeline—reachable by following foreign key references from the current table toward its dependencies.

  3. Complete key inclusion: Inserted rows must include the full primary key (the input key plus any additional primary key attributes defined in the table).

These constraints ensure:

  • Reproducibility: Results can be regenerated by re-running populate()

  • Provenance: Every row traces back to specific upstream data

  • Consistency: The dependency graph accurately reflects data flow
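
For example, a direct insert into the Detection table defined below is rejected (attribute names are illustrative):

# Raises DataJointError: rows may only be inserted from within make()
Detection.insert1({"image_id": 1, "paramset_id": 3, "blob_count": 0})

# The Python client offers an explicit escape hatch for special cases
# such as data migration:
Detection.insert1(
    {"image_id": 1, "paramset_id": 3, "blob_count": 0},
    allow_direct_insert=True,
)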

Complete Example

@schema
class Detection(dj.Computed):
    definition = """
    -> Image
    -> BlobParamSet
    ---
    blob_count : int
    """

    class Blob(dj.Part):
        definition = """
        -> master
        blob_id : int
        ---
        x : float
        y : float
        sigma : float
        """

    def make(self, key):
        # 1. FETCH
        image = (Image & key).fetch1("image_data")
        params = (BlobParamSet & key).fetch1()

        # 2. COMPUTE
        blobs = detect_blobs(
            image,
            min_sigma=params["min_sigma"],
            max_sigma=params["max_sigma"],
            threshold=params["threshold"],
        )

        # 3. INSERT
        self.insert1({**key, "blob_count": len(blobs)})
        self.Blob.insert([{**key, "blob_id": i, **b} for i, b in enumerate(blobs)])

Transactional Integrity

By default, each make() call executes inside an ACID transaction:

  • Atomicity — The entire computation either commits or rolls back as a unit

  • Isolation — Partial results are never visible to other processes

  • Consistency — The database moves from one valid state to another

The transaction wraps the entire make() execution, including all fetches and inserts. This guarantees that computed results are correctly associated with their specific inputs.
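
Conceptually, populate() does something like the following for each pending key (a simplified sketch; the real implementation adds job reservation, error handling, and progress reporting):

for key in (self.key_source - self).fetch("KEY"):
    with self.connection.transaction:  # begin transaction
        self.make(key)                 # fetch + compute + insert
    # committed on success; any exception rolls the transaction back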

The Three-Part Pattern for Long Computations

For long-running computations (hours or days), holding a database transaction open for the entire duration causes problems:

  • Database locks block other operations

  • Transaction timeouts may occur

  • Resources are held unnecessarily

The three-part make pattern solves this by separating the computation from the transaction:

@schema
class SignalAverage(dj.Computed):
    definition = """
    -> RawSignal
    ---
    avg_signal : float
    """

    def make_fetch(self, key):
        """Step 1: Fetch input data (outside transaction)"""
        raw_signal = (RawSignal & key).fetch1("signal")
        return (raw_signal,)

    def make_compute(self, key, fetched):
        """Step 2: Perform computation (outside transaction)"""
        (raw_signal,) = fetched
        avg = raw_signal.mean()
        return (avg,)

    def make_insert(self, key, fetched, computed):
        """Step 3: Insert results (inside brief transaction)"""
        (avg,) = computed
        self.insert1({**key, "avg_signal": avg})
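
From the caller’s perspective nothing changes: populate() detects the three-part API and orchestrates the steps itself. For long jobs, job reservation lets multiple workers share the load:

SignalAverage.populate(reserve_jobs=True, display_progress=True)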

How It Works

DataJoint executes the three parts with verification:

fetched = make_fetch(key)           # Outside transaction
computed = make_compute(key, fetched)  # Outside transaction

<begin transaction>
fetched_again = make_fetch(key)     # Re-fetch to verify
if fetched != fetched_again:
    <rollback>                       # Inputs changed—abort
else:
    make_insert(key, fetched, computed)
    <commit>

The key insight: the computation runs outside any transaction, but referential integrity is preserved by re-fetching and verifying inputs before insertion. If upstream data changed during computation, the job is cancelled rather than inserting inconsistent results.

Benefits

Aspect                 Standard make()       Three-Part Pattern
Transaction duration   Entire computation    Only final insert
Database locks         Held throughout       Minimal
Suitable for           Short computations    Hours/days
Integrity guarantee    Transaction           Re-fetch verification

Generator Syntax Alternative

The three-part pattern can also be expressed as a generator, which is more concise:

def make(self, key):
    # 1. FETCH
    raw_signal = (RawSignal & key).fetch1("signal")
    computed = yield (raw_signal,)  # Yield fetched data

    if computed is None:
        # 2. COMPUTE
        avg = raw_signal.mean()
        computed = (avg,)
        yield computed  # Yield computed results

    # 3. INSERT
    (avg,) = computed
    self.insert1({**key, "avg_signal": avg})
    yield  # Signal completion

DataJoint automatically detects the generator pattern and handles the three-part execution.
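
Conceptually, DataJoint drives such a generator like this (a simplified sketch of the protocol implied by the yields above):

gen = self.make(key)
fetched = next(gen)        # run FETCH, pause at the first yield
computed = gen.send(None)  # computed is None, so COMPUTE runs and yields

# ... begin transaction ...
gen = self.make(key)       # fresh generator for verification
fetched_again = next(gen)  # re-run FETCH
if fetched == fetched_again:
    gen.send(computed)     # computed is not None: COMPUTE skipped, INSERT runs
# ... commit or roll back ...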

When to Use Each Pattern

Computation Time     Pattern            Rationale
Seconds to minutes   Standard make()    Simple; transaction overhead acceptable
Minutes to hours     Three-part         Avoids long transactions
Hours to days        Three-part         Essential for stability

The three-part pattern trades a second fetch of the input data for a dramatically shorter transaction. Use it when computation time significantly exceeds fetch time.