DataJoint provides five operators for building queries. These operators form a complete query language—any question you can ask of your data can be expressed using these five tools.
| Operator | Symbol | Purpose |
|---|---|---|
| Restriction | &, - | Filter entities by conditions |
| Projection | .proj() | Select and compute attributes |
| Join | * | Combine related entities |
| Aggregation | .aggr() | Summarize related data |
| Union | + | Combine entity sets of the same type |
Core Principles¶
Before diving into the operators, two principles guide how DataJoint queries work:
Entity Integrity¶
Every query result is a well-formed entity set with:
A clear entity type (what kind of things are in the result)
A defined primary key (how entities are uniquely identified)
Attributes that directly describe each entity
This means you always know what you’re working with. A query on Student entities returns Student entities—not some ambiguous collection of data.
Algebraic Closure¶
The output of any operator is itself a valid entity set that can be used as input to another operator. This enables unlimited composition:
# Chain operators freely
result = ((Student & 'gpa > 3.5') * Enrollment).proj('course_name')Each intermediate step produces a valid result you can inspect, debug, or use further.
Quick Reference¶
Here’s a summary of what each operator does and when to use it:
Restriction (&, -)¶
Use when: You want to filter entities based on conditions.
# Keep entities matching a condition
young_mice = Mouse & 'age < 30'
# Keep entities matching values from another table (semijoin)
mice_with_sessions = Mouse & Session
# Exclude entities (antijoin)
mice_without_sessions = Mouse - SessionResult: Same entity type, same primary key, fewer entities.
Projection (.proj())¶
Use when: You want to select, rename, or compute attributes.
# Select specific attributes
Mouse.proj('sex', 'date_of_birth')
# Rename an attribute
Mouse.proj(dob='date_of_birth')
# Compute a new attribute
Mouse.proj(age_days='DATEDIFF(NOW(), date_of_birth)')Result: Same entity type, same primary key, same number of entities, different attributes.
Join (*)¶
Use when: You want to combine information from related tables.
# Combine mouse info with their sessions
Mouse * Session
# Chain multiple joins
Mouse * Session * ScanResult: Combined entity type, combined primary key, only matching combinations.
Aggregation (.aggr())¶
Use when: You want to summarize related data for each entity.
# Count sessions per mouse
Mouse.aggr(Session, n_sessions='COUNT(*)')
# Average score per student
Student.aggr(Grade, avg_grade='AVG(score)')Result: Same entity type as the first operand, enriched with summary attributes.
Union (+)¶
Use when: You want to combine entities from compatible tables.
# Combine two groups of mice
experimental_mice = Mouse & 'group="experimental"'
control_mice = Mouse & 'group="control"'
all_selected = experimental_mice + control_miceResult: Same entity type, combined entities (duplicates removed).
How DataJoint Differs from SQL¶
If you’re familiar with SQL, here are the key differences:
| Aspect | SQL | DataJoint |
|---|---|---|
| Results | Can produce arbitrary column sets | Always produces well-formed entity sets |
| Joins | Any columns can be joined | Only semantically related attributes |
| Aggregation | Transforms entity type | Enriches existing entities |
| Order of operations | Hidden execution order | Explicit, left-to-right |
| Primary key | Can be lost in queries | Always preserved or well-defined |
Semantic Joins¶
DataJoint’s join operator (*) only matches attributes that share a semantic relationship—they must trace back to the same source through the schema’s dependencies. This prevents accidental joins on coincidentally named columns.
# DataJoint: Only joins on schema-defined relationships
Session * Scan # Works: scan depends on session
# SQL NATURAL JOIN: Joins on ANY matching column names
# Can produce wrong results if tables share unrelated column namesAggregation Preserves Identity¶
SQL’s GROUP BY creates a new entity type. DataJoint’s .aggr() enriches existing entities:
# DataJoint: Still Mouse entities, now with session counts
Mouse.aggr(Session, n='COUNT(*)')
# SQL: Creates "mouse summary" entities, losing direct connection to Mouse
# SELECT mouse_id, COUNT(*) FROM session GROUP BY mouse_idBuilding Complex Queries¶
The power of these operators comes from composition. Build complex queries step by step:
# Goal: Find adult mice with more than 5 sessions, showing their average session duration
# Step 1: Filter to adult mice
adults = Mouse & 'age > 90'
# Step 2: Add session statistics
with_stats = adults.aggr(
Session,
n_sessions='COUNT(*)',
avg_duration='AVG(duration)'
)
# Step 3: Filter to mice with many sessions
result = with_stats & 'n_sessions > 5'Each step produces a valid entity set you can examine:
adults # Check: which mice are adults?
with_stats # Check: what are the session counts?
result # Final answerNext Steps¶
The following chapters explore each operator in detail:
Restriction — Filtering with conditions, semijoins, and antijoins
Projection — Selecting, renaming, and computing attributes
Join — Combining related tables
Union — Merging compatible entity sets
Aggregation — Computing summaries across related data
Summary¶
DataJoint’s five query operators provide a complete, composable query language:
Restriction (
&,-) — Filter entitiesProjection (
.proj()) — Shape attributesJoin (
*) — Combine related dataAggregation (
.aggr()) — Summarize relationshipsUnion (
+) — Merge entity sets
Every operation preserves entity integrity, ensuring results are always meaningful and can be used in further operations. This makes queries predictable, debuggable, and composable.