The restriction operator is one of the fundamental operations in relational algebra. It selects rows from a table that satisfy specific conditions, allowing you to filter data based on criteria you define.
Understanding Restriction¶
Restriction selects rows (not columns) from a table based on conditions. It’s the “WHERE clause” equivalent in SQL, but in DataJoint it’s represented by the & operator for inclusion and - operator for exclusion.
Key Concepts¶
Restriction never changes the primary key - the result still has the same entity type as the input
Algebraic closure - the result of restriction is still a valid relation that can be used in further operations
Entity integrity - restriction preserves the one-to-one correspondence between records and real-world entities
Basic Syntax¶
# Include rows matching condition
result = Table & condition
# Exclude rows matching condition
result = Table - conditionTypes of Restriction Conditions¶
1. Dictionary Conditions (Equality)¶
Use dictionaries for exact equality matches:
# Create example database from the lecture
import datajoint as dj
schema = dj.Schema('languages_demo')
@schema
class Person(dj.Manual):
definition = """
person_id : int
---
name : varchar(60)
date_of_birth : date
"""
# Restrict by primary key (returns 0 or 1 record)
person_1 = Person & {'person_id': 1}
# Restrict by secondary attributes (may return multiple records)
millennials = Person & {'name': 'John Doe', date_of_birth': '1990-01-01'}Key principle: Restricting by primary key always returns at most one record because primary keys are unique.
2. String Conditions (Inequalities and Ranges)¶
Use strings for more complex conditions:
# Range conditions
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'
# Inequality conditions
adults = Person & 'date_of_birth < "2005-01-01"'
# Pattern matching
j_names = Person & 'name LIKE "J%"'3. Subquery Conditions (Semijoins and Antijoins)¶
The most powerful form of restriction uses one query expression to restrict another. This creates subqueries—queries nested inside other queries. In DataJoint, subqueries emerge naturally when you use query expressions as restriction conditions, effectively creating a semijoin (with &) or antijoin (with -).
@schema
class Language(dj.Lookup):
definition = """
lang_code : char(4)
---
language : varchar(30)
"""
contents = [
('en', 'English'),
('es', 'Spanish'),
('ja', 'Japanese')
]
@schema
class Fluency(dj.Manual):
definition = """
-> Person
-> Language
---
fluency_level : enum('beginner', 'intermediate', 'fluent')
"""
# Find people who speak English (semijoin)
english_speakers = Person & (Fluency & {'lang_code': 'en'})
# Find people who do NOT speak English (antijoin)
non_english_speakers = Person - (Fluency & {'lang_code': 'en'})The inner query (Fluency & {'lang_code': 'en'}) acts as a subquery—its primary key values determine which rows from Person are included in or excluded from the result.
Systematic Query Patterns¶
Following the lecture approach, let’s examine systematic patterns for building complex restrictions.
Pattern 1: Existence Check (IN)¶
Find entities that have related records in another table.
# Find A where matching B exists
result = A & BExample: Students with Enrollments
# Find all students who are enrolled in at least one course
enrolled_students = Student & EnrollSQL Equivalent:
SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM enroll);Example: Students with Math Majors
# Find students majoring in math
math_students = Student & (StudentMajor & {'dept': 'MATH'})Pattern 2: Non-Existence Check (NOT IN)¶
Find entities that do NOT have related records in another table.
# Find A where no matching B exists
result = A - BExample: Students Without Enrollments
# Find students who are not enrolled in any course
unenrolled_students = Student - EnrollSQL Equivalent:
SELECT * FROM student
WHERE student_id NOT IN (SELECT student_id FROM enroll);Pattern 3: Multiple Conditions (AND)¶
Find entities that satisfy multiple conditions simultaneously.
# Find A where both B1 and B2 conditions are met
result = A & B1 & B2Example: Students Speaking Both Languages
# Find people who speak BOTH English AND Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual = english_speakers & spanish_speakersSQL Equivalent:
SELECT * FROM person
WHERE person_id IN (
SELECT person_id FROM fluency WHERE lang_code = 'en'
)
AND person_id IN (
SELECT person_id FROM fluency WHERE lang_code = 'es'
);Pattern 4: Either/Or Conditions (OR)¶
Find entities that satisfy at least one of multiple conditions.
Using List Restriction (for simple OR on the same attribute):
# Find A where condition1 OR condition2
result = A & [condition1, condition2]Using Union (for OR across different relationships):
# Find A where B1 OR B2 condition is met
result = (A & B1) + (A & B2)Example: Students in Multiple States
# Find students from California OR New York (simple OR)
coastal_students = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]
# Or using SQL syntax
coastal_students = Student & 'home_state IN ("CA", "NY")'Pattern 5: Exclusion with Condition¶
Find entities that have some relationship but NOT a specific variant of it.
# Find A where B exists but B with specific condition does not
result = (A & B) - (B & specific_condition)Example: Non-Fluent Speakers
# Find people who speak Japanese but are NOT fluent
japanese_speakers = Person & (Fluency & {'lang_code': 'ja'})
fluent_japanese = Person & (Fluency & {'lang_code': 'ja', 'fluency_level': 'fluent'})
non_fluent_japanese = japanese_speakers - fluent_japaneseSQL Equivalent:
SELECT * FROM person
WHERE person_id IN (
SELECT person_id FROM fluency WHERE lang_code = 'ja'
)
AND person_id NOT IN (
SELECT person_id FROM fluency
WHERE lang_code = 'ja' AND fluency_level = 'fluent'
);Pattern 6: All-or-Nothing (Universal Quantification)¶
Find entities where ALL related records meet a condition, or where NO related records fail a condition.
# Find A where ALL related B satisfy condition
# Equivalent to: A with B, minus A with B that doesn't satisfy condition
result = (A & B) - (B - condition)Example: All-A Students
# Find students who have received ONLY 'A' grades (no non-A grades)
students_with_grades = Student & Grade
students_with_non_a = Student & (Grade - {'grade': 'A'})
all_a_students = students_with_grades - students_with_non_aSQL Equivalent:
SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM grade)
AND student_id NOT IN (
SELECT student_id FROM grade WHERE grade <> 'A'
);Pattern 7: Reverse Perspective¶
Sometimes you need to flip the perspective—instead of asking about entities, ask about their related entities.
Example: Languages Without Speakers
# Find languages that no one speaks
languages_spoken = Language & Fluency
unspoken_languages = Language - languages_spokenExample: Courses Without Enrollments
# Find courses with no students enrolled this term
courses_with_enrollment = Course & (Enroll & CurrentTerm)
empty_courses = Course - courses_with_enrollmentSelf-Referencing Patterns¶
Some tables reference themselves through foreign keys, creating hierarchies like management structures or prerequisite chains.
Management Hierarchy Example¶
Consider a schema where employees can report to other employees:
@schema
class Employee(dj.Manual):
definition = """
employee_id : int
---
name : varchar(60)
"""
@schema
class ReportsTo(dj.Manual):
definition = """
-> Employee
---
-> Employee.proj(manager_id='employee_id')
"""Finding Managers¶
# Employees who have direct reports (are managers)
managers = Employee & ReportsTo.proj(employee_id='manager_id')Finding Top-Level Managers¶
# Employees who don't report to anyone
top_managers = Employee - ReportsToFinding Non-Managers¶
# Employees with no direct reports
non_managers = Employee - ReportsTo.proj(employee_id='manager_id')Building Queries Systematically¶
Complex queries are best built incrementally. Follow this approach:
Step 1: Identify the Target Entity¶
What type of entity do you want in your result?
Step 2: List the Conditions¶
What criteria must the entities satisfy?
Step 3: Build Each Condition as a Query¶
Create separate query expressions for each condition.
Step 4: Combine with Appropriate Operators¶
Use
&for AND conditionsUse
-for NOT conditionsUse
+for OR conditions across different paths
Step 5: Test Incrementally¶
Verify each intermediate result.
Example: Building a Complex Query¶
Goal: Find CS majors who are enrolled this term but haven’t received any grades yet.
# Step 1: Target entity is Student
# Step 2: Conditions:
# - Has CS major
# - Enrolled in current term
# - No grades in current term
# Step 3: Build each condition
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
enrolled_current = Student & (Enroll & CurrentTerm)
graded_current = Student & (Grade & CurrentTerm)
# Step 4: Combine
result = cs_majors & enrolled_current - graded_current
# Step 5: Verify counts
print(f"CS majors: {len(cs_majors)}")
print(f"Enrolled current term: {len(enrolled_current)}")
print(f"CS majors enrolled, no grades: {len(result)}")Summary of Patterns¶
| Pattern | DataJoint | SQL Equivalent |
|---|---|---|
| Existence (IN) | A & B | WHERE id IN (SELECT ...) |
| Non-existence (NOT IN) | A - B | WHERE id NOT IN (SELECT ...) |
| AND (both conditions) | A & B1 & B2 | WHERE ... AND ... |
| OR (either condition) | (A & B1) + (A & B2) | WHERE ... OR ... |
| Exclusion | (A & B) - B_condition | WHERE IN (...) AND NOT IN (...) |
| Universal (all match) | (A & B) - (B - condition) | WHERE IN (...) AND NOT IN (NOT condition) |
Key principles:
Build incrementally — construct complex queries from simpler parts
Test intermediate results — verify each step before combining
Think in sets — restriction filters sets, not individual records
Primary key is preserved — restrictions never change the entity type
Understanding Query Execution¶
Order of Operations¶
Unlike SQL where SELECT and WHERE are in one statement, DataJoint separates concerns:
DataJoint approach:
result = Person & condition # Restriction first result = result.proj(...) # Projection secondSQL approach:
SELECT columns -- Projection FROM table WHERE condition -- Restriction (executed first internally)
Primary Key Preservation¶
Critical concept: Restriction never changes the primary key or entity type.
# All of these have the same primary key: person_id
people = Person # Primary key: person_id
english_speakers = Person & (...) # Primary key: person_id
gen_z = Person & (...) # Primary key: person_idThis enables algebraic closure - you can chain restrictions infinitely:
result = Person & condition1 & condition2 & condition3 # Still a Person tableBest Practices¶
1. Think in Sets and Logic¶
When designing restrictions, think about:
What set am I starting with?
What subset do I want?
How do I express that mathematically?
2. Build Complex Queries Incrementally¶
# Start simple
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(english_speakers)}")
# Add complexity
fluent_english = english_speakers & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(fluent_english)}")
# Add more conditions
gen_z_fluent_english = fluent_english & 'date_of_birth > "2000-01-01"'3. Understand Foreign Key Relationships¶
Subqueries work because of foreign key relationships:
# This works because Fluency.person_id references Person.person_id
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})The restriction automatically matches on the shared attributes (foreign key relationships).
4. Test Your Logic¶
For complex queries, verify your logic:
# Test: People who speak both English and Spanish
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish
# Verify: Should be subset of both individual sets
assert len(both) <= len(english)
assert len(both) <= len(spanish)Summary¶
The restriction operator is fundamental to database querying. Key takeaways:
Restriction selects rows based on conditions
Primary key is preserved - algebraic closure is maintained
Three condition types: dictionaries (equality), strings (inequalities), subqueries (relationships)
Build systematically: Start simple, add complexity incrementally
Think in sets: Use mathematical logic to design queries
Foreign keys enable subqueries: Relationships between tables drive complex restrictions
Master these patterns and you can answer any query that asks “find records where...”
Practice Exercises¶
Setup: Languages Database¶
import datajoint as dj
schema = dj.Schema('languages_practice')
@schema
class Language(dj.Lookup):
definition = """
lang_code : char(4)
---
language : varchar(30)
"""
contents = [
('ENG', 'English'),
('SPA', 'Spanish'),
('JPN', 'Japanese'),
('TAG', 'Tagalog'),
('MAN', 'Mandarin'),
('POR', 'Portuguese')
]
@schema
class Person(dj.Manual):
definition = """
person_id : int
---
name : varchar(60)
date_of_birth : date
"""
@schema
class Fluency(dj.Manual):
definition = """
-> Person
-> Language
---
fluency_level : enum('beginner', 'intermediate', 'fluent')
"""
# Populate with sample data...Exercise 1: Basic Restrictions¶
Question: How would you find person with ID 5?
Solution:
person_5 = Person & {'person_id': 5}Question: How would you find all people born after 2000?
Solution:
gen_z = Person & 'date_of_birth > "2000-01-01"'Exercise 2: Simple Subqueries¶
Question: Find all people who speak English.
Solution:
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})Exercise 3: Negation¶
Question: Find people who do NOT speak English.
Solution:
non_english_speakers = Person - (Fluency & {'lang_code': 'ENG'})Exercise 4: Multiple Conditions (AND)¶
Question: Find people who speak BOTH English AND Spanish.
Solution:
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
bilingual = english_speakers & spanish_speakersExercise 5: Multiple Conditions (OR)¶
Question: Find people who speak English OR Spanish.
Solution:
english_or_spanish = Person & (Fluency & 'lang_code IN ("ENG", "SPA")')Exercise 6: Complex Negation¶
Question: Find people who speak Japanese but NOT fluently.
Solution:
japanese_speakers = Person & (Fluency & {'lang_code': 'JPN'})
fluent_japanese = Person & (Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japaneseExercise 7: Reverse Perspective¶
Question: Find languages that are NOT spoken by anyone fluently.
Solution:
fluent_records = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_records
languages_without_fluent_speakers = Language - languages_with_fluent_speakersExercise 8: All-or-Nothing¶
Question: Find students who have received only ‘A’ grades.
Solution:
has_grades = Student & Grade
has_non_a = Student & (Grade - {'grade': 'A'})
all_a = has_grades - has_non_aExercise 9: Chaining Restrictions¶
Question: Find Gen Z people who speak English fluently.
Solution:
result = Person & 'date_of_birth > "2000-01-01"' & \
(Fluency & {'lang_code': 'ENG', 'fluency_level': 'fluent'})Debugging and Verification Techniques¶
Test Your Logic¶
# Always verify your logic makes sense
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish
# Sanity checks:
print(f"English speakers: {len(english)}")
print(f"Spanish speakers: {len(spanish)}")
print(f"Bilingual: {len(both)}")
# Both should be <= each individual set
assert len(both) <= len(english)
assert len(both) <= len(spanish)Build Incrementally¶
# Start simple and add complexity
step1 = Person
print(f"All people: {len(step1)}")
step2 = step1 & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(step2)}")
step3 = step2 & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(step3)}")Common Patterns Summary¶
Basic inclusion:
Table & conditionBasic exclusion:
Table - conditionLogical Conjunction (AND-list):
Table & cond1 & cond2 & cond3Logical Disjunction (OR-list):
Table & [cond1, cond2, cond3]
These patterns form the building blocks for any restriction query you’ll encounter.