Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Operator: Restriction

The restriction operator is one of the fundamental operations in relational algebra. It selects rows from a table that satisfy specific conditions, allowing you to filter data based on criteria you define.

Understanding Restriction

Restriction selects rows (not columns) from a table based on conditions. It’s the “WHERE clause” equivalent in SQL, but in DataJoint it’s represented by the & operator for inclusion and - operator for exclusion.

Key Concepts

  • Restriction never changes the primary key - the result still has the same entity type as the input

  • Algebraic closure - the result of restriction is still a valid relation that can be used in further operations

  • Entity integrity - restriction preserves the one-to-one correspondence between records and real-world entities

Basic Syntax

# Include rows matching condition
result = Table & condition

# Exclude rows matching condition  
result = Table - condition

Types of Restriction Conditions

1. Dictionary Conditions (Equality)

Use dictionaries for exact equality matches:

# Create example database from the lecture
import datajoint as dj
schema = dj.Schema('languages_demo')

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    date_of_birth : date
    """

# Restrict by primary key (returns 0 or 1 record)
person_1 = Person & {'person_id': 1}

# Restrict by secondary attributes (may return multiple records)
millennials = Person & {'name': 'John Doe', date_of_birth': '1990-01-01'}

Key principle: Restricting by primary key always returns at most one record because primary keys are unique.

2. String Conditions (Inequalities and Ranges)

Use strings for more complex conditions:

# Range conditions
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'

# Inequality conditions  
adults = Person & 'date_of_birth < "2005-01-01"'

# Pattern matching
j_names = Person & 'name LIKE "J%"'

3. Subquery Conditions (Semijoins and Antijoins)

The most powerful form of restriction uses one query expression to restrict another. This creates subqueries—queries nested inside other queries. In DataJoint, subqueries emerge naturally when you use query expressions as restriction conditions, effectively creating a semijoin (with &) or antijoin (with -).

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language : varchar(30)
    """
    contents = [
        ('en', 'English'),
        ('es', 'Spanish'), 
        ('ja', 'Japanese')
    ]

@schema  
class Fluency(dj.Manual):
    definition = """
    -> Person
    -> Language  
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """

# Find people who speak English (semijoin)
english_speakers = Person & (Fluency & {'lang_code': 'en'})

# Find people who do NOT speak English (antijoin)
non_english_speakers = Person - (Fluency & {'lang_code': 'en'})

The inner query (Fluency & {'lang_code': 'en'}) acts as a subquery—its primary key values determine which rows from Person are included in or excluded from the result.

Systematic Query Patterns

Following the lecture approach, let’s examine systematic patterns for building complex restrictions.

Pattern 1: Existence Check (IN)

Find entities that have related records in another table.

# Find A where matching B exists
result = A & B

Example: Students with Enrollments

# Find all students who are enrolled in at least one course
enrolled_students = Student & Enroll

SQL Equivalent:

SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM enroll);

Example: Students with Math Majors

# Find students majoring in math
math_students = Student & (StudentMajor & {'dept': 'MATH'})

Pattern 2: Non-Existence Check (NOT IN)

Find entities that do NOT have related records in another table.

# Find A where no matching B exists
result = A - B

Example: Students Without Enrollments

# Find students who are not enrolled in any course
unenrolled_students = Student - Enroll

SQL Equivalent:

SELECT * FROM student
WHERE student_id NOT IN (SELECT student_id FROM enroll);

Pattern 3: Multiple Conditions (AND)

Find entities that satisfy multiple conditions simultaneously.

# Find A where both B1 and B2 conditions are met
result = A & B1 & B2

Example: Students Speaking Both Languages

# Find people who speak BOTH English AND Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual = english_speakers & spanish_speakers

SQL Equivalent:

SELECT * FROM person
WHERE person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'en'
)
AND person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'es'
);

Pattern 4: Either/Or Conditions (OR)

Find entities that satisfy at least one of multiple conditions.

Using List Restriction (for simple OR on the same attribute):

# Find A where condition1 OR condition2
result = A & [condition1, condition2]

Using Union (for OR across different relationships):

# Find A where B1 OR B2 condition is met
result = (A & B1) + (A & B2)

Example: Students in Multiple States

# Find students from California OR New York (simple OR)
coastal_students = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]

# Or using SQL syntax
coastal_students = Student & 'home_state IN ("CA", "NY")'

Pattern 5: Exclusion with Condition

Find entities that have some relationship but NOT a specific variant of it.

# Find A where B exists but B with specific condition does not
result = (A & B) - (B & specific_condition)

Example: Non-Fluent Speakers

# Find people who speak Japanese but are NOT fluent
japanese_speakers = Person & (Fluency & {'lang_code': 'ja'})
fluent_japanese = Person & (Fluency & {'lang_code': 'ja', 'fluency_level': 'fluent'})
non_fluent_japanese = japanese_speakers - fluent_japanese

SQL Equivalent:

SELECT * FROM person
WHERE person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'ja'
)
AND person_id NOT IN (
    SELECT person_id FROM fluency 
    WHERE lang_code = 'ja' AND fluency_level = 'fluent'
);

Pattern 6: All-or-Nothing (Universal Quantification)

Find entities where ALL related records meet a condition, or where NO related records fail a condition.

# Find A where ALL related B satisfy condition
# Equivalent to: A with B, minus A with B that doesn't satisfy condition
result = (A & B) - (B - condition)

Example: All-A Students

# Find students who have received ONLY 'A' grades (no non-A grades)
students_with_grades = Student & Grade
students_with_non_a = Student & (Grade - {'grade': 'A'})
all_a_students = students_with_grades - students_with_non_a

SQL Equivalent:

SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM grade)
AND student_id NOT IN (
    SELECT student_id FROM grade WHERE grade <> 'A'
);

Pattern 7: Reverse Perspective

Sometimes you need to flip the perspective—instead of asking about entities, ask about their related entities.

Example: Languages Without Speakers

# Find languages that no one speaks
languages_spoken = Language & Fluency
unspoken_languages = Language - languages_spoken

Example: Courses Without Enrollments

# Find courses with no students enrolled this term
courses_with_enrollment = Course & (Enroll & CurrentTerm)
empty_courses = Course - courses_with_enrollment

Self-Referencing Patterns

Some tables reference themselves through foreign keys, creating hierarchies like management structures or prerequisite chains.

Management Hierarchy Example

Consider a schema where employees can report to other employees:

@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int
    ---
    name : varchar(60)
    """

@schema
class ReportsTo(dj.Manual):
    definition = """
    -> Employee
    ---
    -> Employee.proj(manager_id='employee_id')
    """

Finding Managers

# Employees who have direct reports (are managers)
managers = Employee & ReportsTo.proj(employee_id='manager_id')

Finding Top-Level Managers

# Employees who don't report to anyone
top_managers = Employee - ReportsTo

Finding Non-Managers

# Employees with no direct reports
non_managers = Employee - ReportsTo.proj(employee_id='manager_id')

Building Queries Systematically

Complex queries are best built incrementally. Follow this approach:

Step 1: Identify the Target Entity

What type of entity do you want in your result?

Step 2: List the Conditions

What criteria must the entities satisfy?

Step 3: Build Each Condition as a Query

Create separate query expressions for each condition.

Step 4: Combine with Appropriate Operators

  • Use & for AND conditions

  • Use - for NOT conditions

  • Use + for OR conditions across different paths

Step 5: Test Incrementally

Verify each intermediate result.

Example: Building a Complex Query

Goal: Find CS majors who are enrolled this term but haven’t received any grades yet.

# Step 1: Target entity is Student
# Step 2: Conditions:
#   - Has CS major
#   - Enrolled in current term
#   - No grades in current term

# Step 3: Build each condition
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
enrolled_current = Student & (Enroll & CurrentTerm)
graded_current = Student & (Grade & CurrentTerm)

# Step 4: Combine
result = cs_majors & enrolled_current - graded_current

# Step 5: Verify counts
print(f"CS majors: {len(cs_majors)}")
print(f"Enrolled current term: {len(enrolled_current)}")
print(f"CS majors enrolled, no grades: {len(result)}")

Summary of Patterns

PatternDataJointSQL Equivalent
Existence (IN)A & BWHERE id IN (SELECT ...)
Non-existence (NOT IN)A - BWHERE id NOT IN (SELECT ...)
AND (both conditions)A & B1 & B2WHERE ... AND ...
OR (either condition)(A & B1) + (A & B2)WHERE ... OR ...
Exclusion(A & B) - B_conditionWHERE IN (...) AND NOT IN (...)
Universal (all match)(A & B) - (B - condition)WHERE IN (...) AND NOT IN (NOT condition)

Key principles:

  1. Build incrementally — construct complex queries from simpler parts

  2. Test intermediate results — verify each step before combining

  3. Think in sets — restriction filters sets, not individual records

  4. Primary key is preserved — restrictions never change the entity type

Understanding Query Execution

Order of Operations

Unlike SQL where SELECT and WHERE are in one statement, DataJoint separates concerns:

  1. DataJoint approach:

    result = Person & condition  # Restriction first
    result = result.proj(...)    # Projection second
  2. SQL approach:

    SELECT columns    -- Projection  
    FROM table       
    WHERE condition  -- Restriction (executed first internally)

Primary Key Preservation

Critical concept: Restriction never changes the primary key or entity type.

# All of these have the same primary key: person_id
people = Person                    # Primary key: person_id
english_speakers = Person & (...)  # Primary key: person_id  
gen_z = Person & (...)            # Primary key: person_id

This enables algebraic closure - you can chain restrictions infinitely:

result = Person & condition1 & condition2 & condition3  # Still a Person table

Best Practices

1. Think in Sets and Logic

When designing restrictions, think about:

  • What set am I starting with?

  • What subset do I want?

  • How do I express that mathematically?

2. Build Complex Queries Incrementally

# Start simple
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(english_speakers)}")

# Add complexity  
fluent_english = english_speakers & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(fluent_english)}")

# Add more conditions
gen_z_fluent_english = fluent_english & 'date_of_birth > "2000-01-01"'

3. Understand Foreign Key Relationships

Subqueries work because of foreign key relationships:

# This works because Fluency.person_id references Person.person_id
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})

The restriction automatically matches on the shared attributes (foreign key relationships).

4. Test Your Logic

For complex queries, verify your logic:

# Test: People who speak both English and Spanish
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish

# Verify: Should be subset of both individual sets
assert len(both) <= len(english)
assert len(both) <= len(spanish)

Summary

The restriction operator is fundamental to database querying. Key takeaways:

  1. Restriction selects rows based on conditions

  2. Primary key is preserved - algebraic closure is maintained

  3. Three condition types: dictionaries (equality), strings (inequalities), subqueries (relationships)

  4. Build systematically: Start simple, add complexity incrementally

  5. Think in sets: Use mathematical logic to design queries

  6. Foreign keys enable subqueries: Relationships between tables drive complex restrictions

Master these patterns and you can answer any query that asks “find records where...”

Practice Exercises

Setup: Languages Database

import datajoint as dj
schema = dj.Schema('languages_practice')

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language : varchar(30)
    """
    contents = [
        ('ENG', 'English'),
        ('SPA', 'Spanish'),
        ('JPN', 'Japanese'),
        ('TAG', 'Tagalog'),
        ('MAN', 'Mandarin'),
        ('POR', 'Portuguese')
    ]

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    date_of_birth : date
    """

@schema
class Fluency(dj.Manual):
    definition = """
    -> Person
    -> Language
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """

# Populate with sample data...

Exercise 1: Basic Restrictions

Question: How would you find person with ID 5?

Solution:

person_5 = Person & {'person_id': 5}

Question: How would you find all people born after 2000?

Solution:

gen_z = Person & 'date_of_birth > "2000-01-01"'

Exercise 2: Simple Subqueries

Question: Find all people who speak English.

Solution:

english_speakers = Person & (Fluency & {'lang_code': 'ENG'})

Exercise 3: Negation

Question: Find people who do NOT speak English.

Solution:

non_english_speakers = Person - (Fluency & {'lang_code': 'ENG'})

Exercise 4: Multiple Conditions (AND)

Question: Find people who speak BOTH English AND Spanish.

Solution:

english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
bilingual = english_speakers & spanish_speakers

Exercise 5: Multiple Conditions (OR)

Question: Find people who speak English OR Spanish.

Solution:

english_or_spanish = Person & (Fluency & 'lang_code IN ("ENG", "SPA")')

Exercise 6: Complex Negation

Question: Find people who speak Japanese but NOT fluently.

Solution:

japanese_speakers = Person & (Fluency & {'lang_code': 'JPN'})
fluent_japanese = Person & (Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japanese

Exercise 7: Reverse Perspective

Question: Find languages that are NOT spoken by anyone fluently.

Solution:

fluent_records = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_records
languages_without_fluent_speakers = Language - languages_with_fluent_speakers

Exercise 8: All-or-Nothing

Question: Find students who have received only ‘A’ grades.

Solution:

has_grades = Student & Grade
has_non_a = Student & (Grade - {'grade': 'A'})
all_a = has_grades - has_non_a

Exercise 9: Chaining Restrictions

Question: Find Gen Z people who speak English fluently.

Solution:

result = Person & 'date_of_birth > "2000-01-01"' & \
         (Fluency & {'lang_code': 'ENG', 'fluency_level': 'fluent'})

Debugging and Verification Techniques

Test Your Logic

# Always verify your logic makes sense
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish

# Sanity checks:
print(f"English speakers: {len(english)}")
print(f"Spanish speakers: {len(spanish)}")
print(f"Bilingual: {len(both)}")

# Both should be <= each individual set
assert len(both) <= len(english)
assert len(both) <= len(spanish)

Build Incrementally

# Start simple and add complexity
step1 = Person
print(f"All people: {len(step1)}")

step2 = step1 & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(step2)}")

step3 = step2 & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(step3)}")

Common Patterns Summary

  1. Basic inclusion: Table & condition

  2. Basic exclusion: Table - condition

  3. Logical Conjunction (AND-list): Table & cond1 & cond2 & cond3

  4. Logical Disjunction (OR-list): Table & [cond1, cond2, cond3]

These patterns form the building blocks for any restriction query you’ll encounter.