The restriction operator is one of the fundamental operations in relational algebra. It selects rows from a table that satisfy specific conditions, allowing you to filter data based on criteria you define.
Understanding Restriction¶
Restriction selects rows (not columns) from a table based on conditions. It’s the “WHERE clause” equivalent in SQL, but in DataJoint it’s represented by the & operator for inclusion and - operator for exclusion.
Key Concepts¶
Restriction never changes the primary key - the result still has the same entity type as the input
Algebraic closure - the result of restriction is still a valid relation that can be used in further operations
Entity integrity - restriction preserves the one-to-one correspondence between records and real-world entities
Basic Syntax¶
# Include rows matching condition
result = Table & condition
# Exclude rows matching condition
result = Table - conditionTypes of Restriction Conditions¶
1. Dictionary Conditions (Equality)¶
Use dictionaries for exact equality matches:
# Create example database from the lecture
import datajoint as dj
schema = dj.Schema('languages_demo')
@schema
class Person(dj.Manual):
definition = """
person_id : int
---
name : varchar(60)
date_of_birth : date
"""
# Restrict by primary key (returns 0 or 1 record)
person_1 = Person & {'person_id': 1}
# Restrict by secondary attributes (may return multiple records)
millennials = Person & {'name': 'John Doe', date_of_birth': '1990-01-01'}Key principle: Restricting by primary key always returns at most one record because primary keys are unique.
2. String Conditions (Inequalities and Ranges)¶
Use strings for more complex conditions:
# Range conditions
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'
# Inequality conditions
adults = Person & 'date_of_birth < "2005-01-01"'
# Pattern matching
j_names = Person & 'name LIKE "J%"'3. Subquery Conditions¶
The most powerful form - restrict one table based on another:
@schema
class Language(dj.Lookup):
definition = """
lang_code : char(4)
---
language : varchar(30)
"""
contents = [
('en', 'English'),
('es', 'Spanish'),
('ja', 'Japanese')
]
@schema
class Fluency(dj.Manual):
definition = """
-> Person
-> Language
---
fluency_level : enum('beginner', 'intermediate', 'fluent')
"""
# Find people who speak English
english_speakers = Person & (Fluency & {'lang_code': 'en'})Systematic Query Patterns¶
Following the lecture approach, let’s examine systematic patterns for building complex restrictions.
Pattern 1: Basic Subquery (IN)¶
Goal: Find all people who speak English
english_speakers = Person & (Fluency & {'lang_code': 'en'})SQL Equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code = 'en'
);Analysis: This selects people whose person_id appears in the fluency table with English.
Pattern 2: Negated Subquery (NOT IN)¶
Goal: Find people who do NOT speak English
non_english_speakers = Person - (Fluency & {'lang_code': 'en'})SQL Equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id NOT IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code = 'en'
);Pattern 3: Multiple Conditions (AND)¶
Goal: Find people who speak BOTH English AND Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual = english_speakers & spanish_speakersSQL Equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id FROM fluency f WHERE f.lang_code = 'en'
)
AND p.person_id IN (
SELECT f.person_id FROM fluency f WHERE f.lang_code = 'es'
);Key insight: When you need “both conditions”, use separate subqueries connected with AND.
Pattern 4: Multiple Conditions (OR)¶
Goal: Find people who speak English OR Spanish
# Method 1: Using DataJoint
english_or_spanish = Person & (Fluency & 'lang_code IN ("en", "es")')
# Method 2: More explicit
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
either_language = english_speakers.proj() + spanish_speakers.proj()SQL Equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code IN ('en', 'es')
);Pattern 5: Complex Negation¶
Goal: Find people who speak Japanese but NOT fluently
japanese_speakers = Person & (Fluency & {'lang_code': 'ja'})
fluent_japanese = Person & (Fluency & {'lang_code': 'ja', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japaneseSQL Equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ja'
)
AND p.person_id NOT IN (
SELECT f.person_id FROM fluency f
WHERE f.lang_code = 'ja' AND f.fluency_level = 'fluent'
);Advanced Examples from the Lecture¶
Example 1: Languages Without Fluent Speakers¶
Goal: Find languages that no one speaks fluently
fluent_speakers = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_speakers
languages_without_fluent_speakers = Language - languages_with_fluent_speakersSQL Equivalent:
SELECT l.*
FROM language l
WHERE l.lang_code NOT IN (
SELECT f.lang_code
FROM fluency f
WHERE f.fluency_level = 'fluent'
);Example 2: Generational Filtering¶
# Define generations by birth year ranges
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'
millennials = Person & 'date_of_birth BETWEEN "1981-01-01" AND "1999-12-31"'
# Find Gen Z English speakers
gen_z_english = gen_z & (Fluency & {'lang_code': 'ENG'})Understanding Query Execution¶
Order of Operations¶
Unlike SQL where SELECT and WHERE are in one statement, DataJoint separates concerns:
DataJoint approach:
result = Person & condition # Restriction first result = result.proj(...) # Projection secondSQL approach:
SELECT columns -- Projection FROM table WHERE condition -- Restriction (executed first internally)
Primary Key Preservation¶
Critical concept: Restriction never changes the primary key or entity type.
# All of these have the same primary key: person_id
people = Person # Primary key: person_id
english_speakers = Person & (...) # Primary key: person_id
gen_z = Person & (...) # Primary key: person_idThis enables algebraic closure - you can chain restrictions infinitely:
result = Person & condition1 & condition2 & condition3 # Still a Person tableBest Practices from the Lecture¶
1. Think in Sets and Logic¶
When designing restrictions, think about:
What set am I starting with?
What subset do I want?
How do I express that mathematically?
2. Build Complex Queries Incrementally¶
# Start simple
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(english_speakers)}")
# Add complexity
fluent_english = english_speakers & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(fluent_english)}")
# Add more conditions
gen_z_fluent_english = fluent_english & 'date_of_birth > "2000-01-01"'3. Understand Foreign Key Relationships¶
Subqueries work because of foreign key relationships:
# This works because Fluency.person_id references Person.person_id
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})The restriction automatically matches on the shared attributes (foreign key relationships).
4. Test Your Logic¶
For complex queries, verify your logic:
# Test: People who speak both English and Spanish
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish
# Verify: Should be subset of both individual sets
assert len(both) <= len(english)
assert len(both) <= len(spanish)SQL Translation Patterns¶
Every DataJoint restriction follows predictable SQL patterns:
Dictionary Restriction¶
# DataJoint
Person & {'person_id': 1}
# SQL
SELECT * FROM person WHERE person_id = 1;String Restriction¶
# DataJoint
Person & 'age > 25'
# SQL
SELECT * FROM person WHERE age > 25;Subquery Restriction¶
# DataJoint
Person & (Fluency & {'lang_code': 'ENG'})
# SQL
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code = 'ENG'
);Summary¶
The restriction operator is fundamental to database querying. Key takeaways:
Restriction selects rows based on conditions
Primary key is preserved - algebraic closure is maintained
Three condition types: dictionaries (equality), strings (inequalities), subqueries (relationships)
Build systematically: Start simple, add complexity incrementally
Think in sets: Use mathematical logic to design queries
Foreign keys enable subqueries: Relationships between tables drive complex restrictions
Master these patterns and you can answer any query that asks “find records where...”
Practice Exercises: Systematic Query Building¶
Let’s work through practical examples using the languages database from the lecture. These exercises will help you develop systematic thinking about restriction queries.
Setup: Languages Database¶
import datajoint as dj
schema = dj.Schema('languages_practice')
@schema
class Language(dj.Lookup):
definition = """
lang_code : char(4)
---
language : varchar(30)
"""
contents = [
('ENG', 'English'),
('SPA', 'Spanish'),
('JPN', 'Japanese'),
('TAG', 'Tagalog'),
('MAN', 'Mandarin'),
('POR', 'Portuguese')
]
@schema
class Person(dj.Manual):
definition = """
person_id : int
---
name : varchar(60)
date_of_birth : date
"""
@schema
class Fluency(dj.Manual):
definition = """
-> Person
-> Language
---
fluency_level : enum('beginner', 'intermediate', 'fluent')
"""
# Populate with sample data...Exercise 1: Basic Restrictions¶
Question: How would you find person with ID 5?
Solution:
person_5 = Person & {'person_id': 5}Key insight: Primary key restrictions return 0 or 1 record.
Question: How would you find all people born after 2000?
Solution:
gen_z = Person & 'date_of_birth > "2000-01-01"'Exercise 2: Simple Subqueries¶
Question: Find all people who speak English.
Step-by-step thinking:
I want people (start with
Persontable)Who speak English (condition in
Fluencytable)English speakers are those whose
person_idappears inFluencywithlang_code = 'ENG'
Solution:
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})SQL equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code = 'ENG'
);Exercise 3: Negation¶
Question: Find people who do NOT speak English.
Step-by-step thinking:
I want people (start with
Persontable)Who do NOT speak English (exclude those in the English speakers set)
Use subtraction operator
-
Solution:
non_english_speakers = Person - (Fluency & {'lang_code': 'ENG'})SQL equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id NOT IN (
SELECT f.person_id
FROM fluency f
WHERE f.lang_code = 'ENG'
);Exercise 4: Multiple Conditions (AND)¶
Question: Find people who speak BOTH English AND Spanish.
Step-by-step thinking:
I want people who speak English AND Spanish
This means they must be in BOTH sets
Create each set separately, then intersect with
&
Solution:
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
bilingual = english_speakers & spanish_speakersSQL equivalent:
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ENG'
)
AND p.person_id IN (
SELECT f.person_id FROM fluency f WHERE f.lang_code = 'SPA'
);Exercise 5: Multiple Conditions (OR)¶
Question: Find people who speak English OR Spanish.
Solution Method 1 (using IN):
english_or_spanish = Person & (Fluency & 'lang_code IN ("ENG", "SPA")')Solution Method 2 (explicit union):
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
# Note: Union removes duplicates automaticallyExercise 6: Complex Negation¶
Question: Find people who speak Japanese but NOT fluently.
Step-by-step thinking:
I want Japanese speakers (those in Fluency with Japanese)
But NOT fluent ones (exclude those with fluency_level = ‘fluent’)
Japanese speakers MINUS fluent Japanese speakers
Solution:
japanese_speakers = Person & (Fluency & {'lang_code': 'JPN'})
fluent_japanese = Person & (Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japaneseAlternative solution (direct):
japanese_non_fluent = Person & (Fluency & {'lang_code': 'JPN'}) - \
(Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})Exercise 7: Reverse Perspective¶
Question: Find languages that are NOT spoken by anyone fluently.
Step-by-step thinking:
I want languages (start with
Languagetable)That are NOT spoken fluently (exclude those that appear in fluent records)
Languages MINUS languages with fluent speakers
Solution:
fluent_records = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_records
languages_without_fluent_speakers = Language - languages_with_fluent_speakersExercise 8: Chaining Restrictions¶
Question: Find Gen Z people who speak English fluently.
Step-by-step thinking:
Start with all people
Restrict to Gen Z (born after 2000)
Further restrict to English speakers
Further restrict to fluent level
Solution:
gen_z = Person & 'date_of_birth > "2000-01-01"'
gen_z_english = gen_z & (Fluency & {'lang_code': 'ENG'})
gen_z_english_fluent = gen_z_english & (Fluency & {'fluency_level': 'fluent'})
# Or in one line:
result = Person & 'date_of_birth > "2000-01-01"' & \
(Fluency & {'lang_code': 'ENG', 'fluency_level': 'fluent'})Debugging and Verification Techniques¶
Test Your Logic¶
# Always verify your logic makes sense
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish
# Sanity checks:
print(f"English speakers: {len(english)}")
print(f"Spanish speakers: {len(spanish)}")
print(f"Bilingual: {len(both)}")
# Both should be <= each individual set
assert len(both) <= len(english)
assert len(both) <= len(spanish)Build Incrementally¶
# Start simple and add complexity
step1 = Person
print(f"All people: {len(step1)}")
step2 = step1 & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(step2)}")
step3 = step2 & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(step3)}")Common Patterns Summary¶
Basic inclusion:
Table & conditionBasic exclusion:
Table - conditionLogicial Donjunction (AND-list):
Table & cond1 & cond2 & cond3Logical Disjunction (OR-list):
Table & [cond1, cond2, cond3]
These patterns form the building blocks for any restriction query you’ll encounter.