Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Foreign Keys

Foreign Keys: Ensuring Referential Integrity

While entity integrity ensures that each record uniquely represents a real-world entity, referential integrity ensures that relationships between entities are valid and consistent. A foreign key guarantees that you won’t have an employee assigned to a non-existent department or a task associated with a deleted project.

Referential integrity is impossible without entity integrity. You must first have a reliable way to identify unique entities before you can define relationships between them.

What is a Foreign Key?

A foreign key is a column (or set of columns) in a child table that references the primary key of a parent table. This link establishes a relationship between entities and enforces referential integrity by ensuring that references point to valid records.

Foreign Key Characteristics

In DataJoint, foreign keys:

  • Always reference the primary key of the parent table

  • Automatically inherit the parent’s primary key attributes (name and datatype)

  • Create both a referential constraint and a workflow dependency

  • Can be placed in the primary key (above ---) or as secondary attributes (below ---)

DataJoint
SQL
@schema
class Title(dj.Lookup):
    definition = """
    title_code : char(8)       # job title code
    ---
    full_title : varchar(120)  # full title description
    """

@schema
class Employee(dj.Manual):
    definition = """
    person_id : int            # employee identifier
    ---
    first_name : varchar(30)
    last_name : varchar(30)
    -> Title                   # foreign key to Title
    """

The arrow -> Title in DataJoint creates a foreign key from Employee (child) to Title (parent). Notice how the SQL version requires explicit repetition of the column name and datatype—DataJoint handles this automatically.

Referential Integrity + Workflow Dependencies

In DataJoint, foreign keys serve a dual role that extends beyond traditional relational databases:

  1. Referential integrity (like traditional databases): Ensures that child records reference existing parent records

  2. Workflow dependencies (DataJoint’s addition): Prescribes the order of operations—the parent must exist before the child can reference it

This transforms the schema into a directed acyclic graph (DAG) representing valid workflow execution sequences. The foreign key -> Title in Employee not only ensures that each employee has a valid title, but also establishes that titles must be created before employees can be assigned to them.

The Five Effects of a Foreign Key

Foreign keys impose important constraints on data operations. Understanding these effects is essential for designing schemas that maintain integrity.

Effect 1: Schema Embedding

When a foreign key is declared, the primary key columns from the parent become embedded in the child table with matching names and datatypes.

DataJoint
SQL
@schema
class Employee(dj.Manual):
    definition = """
    person_id : int
    ---
    first_name : varchar(30)
    last_name : varchar(30)
    -> Title                   # embeds title_code with type char(8)
    """

# The Employee table now contains:
# person_id (int) - primary key
# first_name (varchar(30))
# last_name (varchar(30))
# title_code (char(8)) - inherited from Title

Effect 2: Insert Restriction on Child

A foreign key ensures that no “orphaned” records are created. An insert into the child table succeeds only if the foreign key value corresponds to an existing primary key in the parent.

The rule: Inserts are restricted in the child, not the parent. You can always add new job titles, but you cannot add an employee with a title_code that doesn’t exist in Title.

DataJoint
SQL
# This works - 'Web-Dev1' exists in Title
Employee.insert1((1, 'Mark', 'Sommers', 'Web-Dev1'))

# This fails - 'BizDev' does not exist in Title
Employee.insert1((2, 'Brenda', 'Means', 'BizDev'))
# IntegrityError: Cannot add or update a child row:
#   a foreign key constraint fails

In DataJoint, this enforces workflow order: The parent entity must be created before the child entity can reference it.

Effect 3: Delete Restriction on Parent

To prevent broken relationships, a parent record cannot be deleted if any child records still reference it.

The rule: Deletes are restricted in the parent, not the child. You can always delete an employee, but you cannot delete a title if employees still have that title.

In standard SQL, this would fail with a constraint error. DataJoint implements cascading delete—it warns you that deleting the parent will also delete all dependent child records, which can cascade through many levels of a deep hierarchy.

In DataJoint, this maintains workflow consistency: When you delete a parent entity, all downstream workflow artifacts that depend on it are also deleted. This ensures computational validity—if the inputs are gone, any results based on those inputs must be removed.

Effect 4: Update Restriction on Keys

Updates to a parent’s primary key or a child’s foreign key are restricted to maintain referential integrity.

DataJoint does not support updating primary key values, as this risks breaking referential integrity in complex scientific workflows. The preferred pattern is to delete the old record and insert a new one with the updated information.

In DataJoint, this preserves workflow immutability: Workflow artifacts are treated as immutable once created. If upstream data changes, the workflow must be re-executed from that point forward.

Effect 5: Automatic Index Creation

A secondary index is automatically created on the foreign key columns in the child table. This index accelerates:

  1. Delete operations: Fast lookup of child records when checking if parent can be deleted

  2. Join operations: Efficient matching of foreign keys to primary keys

  3. Constraint validation: Quick verification during inserts

You don’t need to create this index manually—the database system handles it automatically when the foreign key is declared.

Foreign Key Modifiers

DataJoint provides two modifiers that alter foreign key behavior: nullable and unique. These modifiers control whether the relationship is optional and whether it enforces uniqueness.

The nullable Modifier

By default, foreign key attributes are required (NOT NULL)—every child record must reference a valid parent. The nullable modifier makes the relationship optional, allowing child records to exist without a parent reference.

Nullable Foreign Key Syntax

-> [nullable] ParentTable

This creates foreign key attributes that accept NULL values, indicating “no associated parent.”

DataJoint
SQL
@schema
class Account(dj.Manual):
    definition = """
    account_id : int unsigned     # account identifier
    ---
    -> [nullable] Customer        # optional owner - can be NULL
    open_date : date
    balance : decimal(10,2)
    """

# Accounts can exist without an owner
Account.insert1({
    'account_id': 1001,
    'customer_id': None,          # NULL - no owner assigned
    'open_date': '2024-01-15',
    'balance': 0.00
})

Use cases for nullable foreign keys:

The unique Modifier

By default, a secondary foreign key allows many-to-one relationships—multiple child records can reference the same parent. The unique modifier restricts this to one-to-one—at most one child can reference each parent.

Unique Foreign Key Syntax

-> [unique] ParentTable

This adds a unique constraint on the foreign key attributes, ensuring each parent is referenced by at most one child.

DataJoint
SQL
@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int unsigned
    ---
    full_name : varchar(60)
    """

@schema
class ParkingSpot(dj.Manual):
    definition = """
    spot_number : int unsigned    # parking spot identifier
    ---
    -> [unique] Employee          # at most one spot per employee
    location : varchar(30)
    """

# Each employee can have at most one parking spot
ParkingSpot.insert1({
    'spot_number': 101,
    'employee_id': 1,
    'location': 'Garage A'
})

# This would fail - employee 1 already has a spot
ParkingSpot.insert1({
    'spot_number': 102,
    'employee_id': 1,            # ERROR: duplicate entry
    'location': 'Garage B'
})

Use cases for unique foreign keys:

Combining Modifiers

The nullable and unique modifiers can be combined to create an optional one-to-one relationship:

Combined Modifier Syntax

-> [nullable, unique] ParentTable or -> [unique, nullable] ParentTable

This creates an optional relationship where each parent can be referenced by at most one child (or none).

DataJoint
SQL
@schema
class Account(dj.Manual):
    definition = """
    account_id : int unsigned
    ---
    -> [nullable, unique] Customer   # optional, one account per customer max
    open_date : date
    """

# Account without owner
Account.insert1({'account_id': 1, 'customer_id': None, 'open_date': '2024-01-01'})

# Account with owner - customer 100
Account.insert1({'account_id': 2, 'customer_id': 100, 'open_date': '2024-01-02'})

# This fails - customer 100 already has an account
Account.insert1({'account_id': 3, 'customer_id': 100, 'open_date': '2024-01-03'})
# IntegrityError: Duplicate entry '100' for key 'customer_id'

Modifier Summary

ModifierPlacementEffectUse Case
(none)SecondaryRequired many-to-oneDefault: every child references exactly one parent
nullableSecondary onlyOptional many-to-oneChild may exist without parent reference
uniqueSecondaryRequired one-to-oneEach parent referenced by at most one child
nullable, uniqueSecondary onlyOptional one-to-oneOptional relationship, but exclusive if present

Foreign Key Placement

Where you place a foreign key—above or below the --- line—fundamentally changes its meaning:

PlacementPrimary Key?RelationshipLine Style in Diagram
Above --- (only FK)Yes, entire PKOne-to-one (extension)Thick solid
Above --- (with other attrs)Yes, part of PKOne-to-many (containment)Thin solid
Below ---NoOne-to-many (reference)Dashed
Below --- + uniqueNoOne-to-one (reference)Dashed
DataJoint
SQL
# Foreign key IS the entire primary key (thick solid line)
@schema
class CustomerPreferences(dj.Manual):
    definition = """
    -> Customer              # customer_id IS the primary key
    ---
    theme : varchar(20)
    """

# Foreign key is PART OF primary key (thin solid line)
@schema
class CustomerAccount(dj.Manual):
    definition = """
    -> Customer              # customer_id is part of primary key
    account_num : int        # together they form the primary key
    ---
    balance : decimal(10,2)
    """

# Foreign key is a secondary attribute (dashed line)
@schema
class Order(dj.Manual):
    definition = """
    order_id : int           # order_id is the primary key
    ---
    -> Customer              # customer_id is a secondary attribute
    order_date : date
    """

Association Tables: Many-to-Many Relationships

A single foreign key creates a one-to-many (or one-to-one) relationship. To model many-to-many relationships, use an association table with foreign keys to both entities:

DataJoint
SQL
@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    """

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language_name : varchar(30)
    """

@schema
class Fluency(dj.Manual):
    definition = """
    -> Person                # part of primary key
    -> Language              # part of primary key
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """

The Fluency table has a composite primary key combining both foreign keys. This allows:

Summary

Foreign keys ensure referential integrity by linking child tables to parent tables. In DataJoint, they also establish workflow dependencies that prescribe the order of operations.

EffectDescription
Schema EmbeddingParent’s primary key attributes are added to child table
Insert RestrictionChild inserts require valid parent reference
Delete RestrictionParent deletes cascade to remove dependent children
Update RestrictionPrimary/foreign key values cannot be updated in place
Index CreationAutomatic index on foreign key for performance
ModifierSyntaxEffectRestriction
nullable-> [nullable] ParentAllows NULL (no parent)Secondary attributes only
unique-> [unique] ParentOne-to-one relationshipSecondary attributes only
Both-> [nullable, unique] ParentOptional one-to-oneSecondary attributes only