Skip to article frontmatterSkip to article content

Operator: Restriction

The restriction operator is one of the fundamental operations in relational algebra. It selects rows from a table that satisfy specific conditions, allowing you to filter data based on criteria you define.

Understanding Restriction

Restriction selects rows (not columns) from a table based on conditions. It’s the “WHERE clause” equivalent in SQL, but in DataJoint it’s represented by the & operator for inclusion and - operator for exclusion.

Key Concepts

  • Restriction never changes the primary key - the result still has the same entity type as the input

  • Algebraic closure - the result of restriction is still a valid relation that can be used in further operations

  • Entity integrity - restriction preserves the one-to-one correspondence between records and real-world entities

Basic Syntax

# Include rows matching condition
result = Table & condition

# Exclude rows matching condition  
result = Table - condition

Types of Restriction Conditions

1. Dictionary Conditions (Equality)

Use dictionaries for exact equality matches:

# Create example database from the lecture
import datajoint as dj
schema = dj.Schema('languages_demo')

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    date_of_birth : date
    """

# Restrict by primary key (returns 0 or 1 record)
person_1 = Person & {'person_id': 1}

# Restrict by secondary attributes (may return multiple records)
millennials = Person & {'name': 'John Doe', date_of_birth': '1990-01-01'}

Key principle: Restricting by primary key always returns at most one record because primary keys are unique.

2. String Conditions (Inequalities and Ranges)

Use strings for more complex conditions:

# Range conditions
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'

# Inequality conditions  
adults = Person & 'date_of_birth < "2005-01-01"'

# Pattern matching
j_names = Person & 'name LIKE "J%"'

3. Subquery Conditions

The most powerful form - restrict one table based on another:

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language : varchar(30)
    """
    contents = [
        ('en', 'English'),
        ('es', 'Spanish'), 
        ('ja', 'Japanese')
    ]

@schema  
class Fluency(dj.Manual):
    definition = """
    -> Person
    -> Language  
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """

# Find people who speak English
english_speakers = Person & (Fluency & {'lang_code': 'en'})

Systematic Query Patterns

Following the lecture approach, let’s examine systematic patterns for building complex restrictions.

Pattern 1: Basic Subquery (IN)

Goal: Find all people who speak English

english_speakers = Person & (Fluency & {'lang_code': 'en'})

SQL Equivalent:

SELECT DISTINCT p.*
FROM person p  
WHERE p.person_id IN (
    SELECT f.person_id
    FROM fluency f
    WHERE f.lang_code = 'en'
);

Analysis: This selects people whose person_id appears in the fluency table with English.

Pattern 2: Negated Subquery (NOT IN)

Goal: Find people who do NOT speak English

non_english_speakers = Person - (Fluency & {'lang_code': 'en'})

SQL Equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id NOT IN (
    SELECT f.person_id  
    FROM fluency f
    WHERE f.lang_code = 'en'
);

Pattern 3: Multiple Conditions (AND)

Goal: Find people who speak BOTH English AND Spanish

english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual = english_speakers & spanish_speakers

SQL Equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'en'
)
AND p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'es'  
);

Key insight: When you need “both conditions”, use separate subqueries connected with AND.

Pattern 4: Multiple Conditions (OR)

Goal: Find people who speak English OR Spanish

# Method 1: Using DataJoint
english_or_spanish = Person & (Fluency & 'lang_code IN ("en", "es")')

# Method 2: More explicit
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'}) 
either_language = english_speakers.proj() + spanish_speakers.proj()

SQL Equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id
    FROM fluency f  
    WHERE f.lang_code IN ('en', 'es')
);

Pattern 5: Complex Negation

Goal: Find people who speak Japanese but NOT fluently

japanese_speakers = Person & (Fluency & {'lang_code': 'ja'})
fluent_japanese = Person & (Fluency & {'lang_code': 'ja', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japanese

SQL Equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ja'
)
AND p.person_id NOT IN (
    SELECT f.person_id FROM fluency f 
    WHERE f.lang_code = 'ja' AND f.fluency_level = 'fluent'
);

Advanced Examples from the Lecture

Example 1: Languages Without Fluent Speakers

Goal: Find languages that no one speaks fluently

fluent_speakers = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_speakers
languages_without_fluent_speakers = Language - languages_with_fluent_speakers

SQL Equivalent:

SELECT l.*
FROM language l
WHERE l.lang_code NOT IN (
    SELECT f.lang_code
    FROM fluency f
    WHERE f.fluency_level = 'fluent'
);

Example 2: Generational Filtering

# Define generations by birth year ranges
gen_z = Person & 'date_of_birth BETWEEN "2000-01-01" AND "2013-12-31"'
millennials = Person & 'date_of_birth BETWEEN "1981-01-01" AND "1999-12-31"'

# Find Gen Z English speakers
gen_z_english = gen_z & (Fluency & {'lang_code': 'ENG'})

Understanding Query Execution

Order of Operations

Unlike SQL where SELECT and WHERE are in one statement, DataJoint separates concerns:

  1. DataJoint approach:

    result = Person & condition  # Restriction first
    result = result.proj(...)    # Projection second
  2. SQL approach:

    SELECT columns    -- Projection  
    FROM table       
    WHERE condition  -- Restriction (executed first internally)

Primary Key Preservation

Critical concept: Restriction never changes the primary key or entity type.

# All of these have the same primary key: person_id
people = Person                    # Primary key: person_id
english_speakers = Person & (...)  # Primary key: person_id  
gen_z = Person & (...)            # Primary key: person_id

This enables algebraic closure - you can chain restrictions infinitely:

result = Person & condition1 & condition2 & condition3  # Still a Person table

Best Practices from the Lecture

1. Think in Sets and Logic

When designing restrictions, think about:

  • What set am I starting with?

  • What subset do I want?

  • How do I express that mathematically?

2. Build Complex Queries Incrementally

# Start simple
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(english_speakers)}")

# Add complexity  
fluent_english = english_speakers & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(fluent_english)}")

# Add more conditions
gen_z_fluent_english = fluent_english & 'date_of_birth > "2000-01-01"'

3. Understand Foreign Key Relationships

Subqueries work because of foreign key relationships:

# This works because Fluency.person_id references Person.person_id
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})

The restriction automatically matches on the shared attributes (foreign key relationships).

4. Test Your Logic

For complex queries, verify your logic:

# Test: People who speak both English and Spanish
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish

# Verify: Should be subset of both individual sets
assert len(both) <= len(english)
assert len(both) <= len(spanish)

SQL Translation Patterns

Every DataJoint restriction follows predictable SQL patterns:

Dictionary Restriction

# DataJoint
Person & {'person_id': 1}

# SQL  
SELECT * FROM person WHERE person_id = 1;

String Restriction

# DataJoint
Person & 'age > 25'

# SQL
SELECT * FROM person WHERE age > 25;

Subquery Restriction

# DataJoint
Person & (Fluency & {'lang_code': 'ENG'})

# SQL
SELECT DISTINCT p.* 
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id 
    FROM fluency f 
    WHERE f.lang_code = 'ENG'
);

Summary

The restriction operator is fundamental to database querying. Key takeaways:

  1. Restriction selects rows based on conditions

  2. Primary key is preserved - algebraic closure is maintained

  3. Three condition types: dictionaries (equality), strings (inequalities), subqueries (relationships)

  4. Build systematically: Start simple, add complexity incrementally

  5. Think in sets: Use mathematical logic to design queries

  6. Foreign keys enable subqueries: Relationships between tables drive complex restrictions

Master these patterns and you can answer any query that asks “find records where...”

Practice Exercises: Systematic Query Building

Let’s work through practical examples using the languages database from the lecture. These exercises will help you develop systematic thinking about restriction queries.

Setup: Languages Database

import datajoint as dj
schema = dj.Schema('languages_practice')

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language : varchar(30)
    """
    contents = [
        ('ENG', 'English'),
        ('SPA', 'Spanish'),
        ('JPN', 'Japanese'),
        ('TAG', 'Tagalog'),
        ('MAN', 'Mandarin'),
        ('POR', 'Portuguese')
    ]

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    date_of_birth : date
    """

@schema
class Fluency(dj.Manual):
    definition = """
    -> Person
    -> Language
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """

# Populate with sample data...

Exercise 1: Basic Restrictions

Question: How would you find person with ID 5?

Solution:

person_5 = Person & {'person_id': 5}

Key insight: Primary key restrictions return 0 or 1 record.

Question: How would you find all people born after 2000?

Solution:

gen_z = Person & 'date_of_birth > "2000-01-01"'

Exercise 2: Simple Subqueries

Question: Find all people who speak English.

Step-by-step thinking:

  1. I want people (start with Person table)

  2. Who speak English (condition in Fluency table)

  3. English speakers are those whose person_id appears in Fluency with lang_code = 'ENG'

Solution:

english_speakers = Person & (Fluency & {'lang_code': 'ENG'})

SQL equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id
    FROM fluency f
    WHERE f.lang_code = 'ENG'
);

Exercise 3: Negation

Question: Find people who do NOT speak English.

Step-by-step thinking:

  1. I want people (start with Person table)

  2. Who do NOT speak English (exclude those in the English speakers set)

  3. Use subtraction operator -

Solution:

non_english_speakers = Person - (Fluency & {'lang_code': 'ENG'})

SQL equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id NOT IN (
    SELECT f.person_id
    FROM fluency f
    WHERE f.lang_code = 'ENG'
);

Exercise 4: Multiple Conditions (AND)

Question: Find people who speak BOTH English AND Spanish.

Step-by-step thinking:

  1. I want people who speak English AND Spanish

  2. This means they must be in BOTH sets

  3. Create each set separately, then intersect with &

Solution:

english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
bilingual = english_speakers & spanish_speakers

SQL equivalent:

SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ENG'
)
AND p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'SPA'
);

Exercise 5: Multiple Conditions (OR)

Question: Find people who speak English OR Spanish.

Solution Method 1 (using IN):

english_or_spanish = Person & (Fluency & 'lang_code IN ("ENG", "SPA")')

Solution Method 2 (explicit union):

english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
spanish_speakers = Person & (Fluency & {'lang_code': 'SPA'})
# Note: Union removes duplicates automatically

Exercise 6: Complex Negation

Question: Find people who speak Japanese but NOT fluently.

Step-by-step thinking:

  1. I want Japanese speakers (those in Fluency with Japanese)

  2. But NOT fluent ones (exclude those with fluency_level = ‘fluent’)

  3. Japanese speakers MINUS fluent Japanese speakers

Solution:

japanese_speakers = Person & (Fluency & {'lang_code': 'JPN'})
fluent_japanese = Person & (Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})
japanese_non_fluent = japanese_speakers - fluent_japanese

Alternative solution (direct):

japanese_non_fluent = Person & (Fluency & {'lang_code': 'JPN'}) - \
                     (Fluency & {'lang_code': 'JPN', 'fluency_level': 'fluent'})

Exercise 7: Reverse Perspective

Question: Find languages that are NOT spoken by anyone fluently.

Step-by-step thinking:

  1. I want languages (start with Language table)

  2. That are NOT spoken fluently (exclude those that appear in fluent records)

  3. Languages MINUS languages with fluent speakers

Solution:

fluent_records = Fluency & {'fluency_level': 'fluent'}
languages_with_fluent_speakers = Language & fluent_records
languages_without_fluent_speakers = Language - languages_with_fluent_speakers

Exercise 8: Chaining Restrictions

Question: Find Gen Z people who speak English fluently.

Step-by-step thinking:

  1. Start with all people

  2. Restrict to Gen Z (born after 2000)

  3. Further restrict to English speakers

  4. Further restrict to fluent level

Solution:

gen_z = Person & 'date_of_birth > "2000-01-01"'
gen_z_english = gen_z & (Fluency & {'lang_code': 'ENG'})
gen_z_english_fluent = gen_z_english & (Fluency & {'fluency_level': 'fluent'})

# Or in one line:
result = Person & 'date_of_birth > "2000-01-01"' & \
         (Fluency & {'lang_code': 'ENG', 'fluency_level': 'fluent'})

Debugging and Verification Techniques

Test Your Logic

# Always verify your logic makes sense
english = Person & (Fluency & {'lang_code': 'ENG'})
spanish = Person & (Fluency & {'lang_code': 'SPA'})
both = english & spanish

# Sanity checks:
print(f"English speakers: {len(english)}")
print(f"Spanish speakers: {len(spanish)}")
print(f"Bilingual: {len(both)}")

# Both should be <= each individual set
assert len(both) <= len(english)
assert len(both) <= len(spanish)

Build Incrementally

# Start simple and add complexity
step1 = Person
print(f"All people: {len(step1)}")

step2 = step1 & (Fluency & {'lang_code': 'ENG'})
print(f"English speakers: {len(step2)}")

step3 = step2 & (Fluency & {'fluency_level': 'fluent'})
print(f"Fluent English speakers: {len(step3)}")

Common Patterns Summary

  1. Basic inclusion: Table & condition

  2. Basic exclusion: Table - condition

  3. Logicial Donjunction (AND-list): Table & cond1 & cond2 & cond3

  4. Logical Disjunction (OR-list): Table & [cond1, cond2, cond3]

These patterns form the building blocks for any restriction query you’ll encounter.