Skip to article frontmatterSkip to article content

Create a Table

In DataJoint, declaring individual tables is the foundational step in building your data pipeline. Each table corresponds to a specific entity or data structure that you want to model within your database. This tutorial will guide you through the basics of declaring individual tables, covering essential components like primary keys, attributes, and basic definitions.

Schema Declaration

Before declaring tables, you need to declare a schema which is a namespace for your tables, giving it a unique name.

The schema groups related tables together and avoids naming conflicts.

import datajoint as dj

# Define the schema
schema = dj.Schema('my_schema')
[2024-08-27 04:10:41,167][INFO]: Connecting root@localhost:3306
[2024-08-27 04:10:41,184][INFO]: Connected root@localhost:3306

The resulting schema object refers to the schema and is then used as a decorator on classes to associate tables with your schema.

Table Declaration

To declare a table in DataJoint, you need to define a class that inherits from one of the base classes provided by DataJoint, such as dj.Manual, dj.Lookup, dj.Imported, or dj.Computed. Each class type serves a different purpose, but for now, let’s start with the dj.Manual class, which is used for tables where data is manually inserted.

Here’s a basic example of a table declaration:

@schema
class Experiment(dj.Manual):
    definition = """
    experiment_id: int      # Unique identifier for each experiment
    ---
    experiment_name: varchar(64)   # Name of the experiment
    experiment_date: date          # Date when the experiment was conducted
    """

You have just declared a table in your database schema.

Explanation:

  • Table Name: The class name Experiment becomes the table name in the database. By convention, table names should be singular and capitalized. For compound names, use the CamelCase notation, do not use underscores.
  • Table definition: the definition member of the class provide a multiline string that defines the table.
  • Attributes: Each line in the defintion is a named attribute (table column), in the format <name> : <datatype> [# comment].
  • Primary Key: The attributes above the separator ---, together, constitute the primary key. These attributes jointly uniquely identify and discriminate entities in the table. In this example, the primary key comprises a single attribute, experiment_id of type int.

Attribute Data Types

DataJoint uses data types provided by its underlying database management system, MySQL. DataJoint also provides its own data types, which we will introduce later.

MySQL, one of the most popular relational database management systems, supports a variety of data types that are used to define the kind of data that can be stored in a table’s columns. Choosing the correct data type is crucial for optimizing performance, storage, and ensuring the accuracy of the data. This section provides an overview of the main MySQL data types, categorized into numeric types, date and time types, string (or character) types, and other types.

You can learn about the full set of MySQL datatypes in the MySQL Documentation.

Here we only list the datatypes commonly used in DataJoint. We will only use these datatypes in this book.

Numeric Data Types

Numeric data types are used to store numbers. They are divided into integer types, floating-point types, and fixed-point types.

  • Integer Types:

    • TINYINT: A very small integer. Range: -128 to 127 or 0 to 255 (if UNSIGNED).
    • SMALLINT: A small integer. Range: -32,768 to 32,767 or 0 to 65,535 (if UNSIGNED).
    • INT: A standard integer. Range: -2,147,483,648 to 2,147,483,647 or 0 to 4,294,967,295 (if UNSIGNED).
    • BIGINT: A large integer. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 or 0 to 18,446,744,073,709,551,615 (if UNSIGNED).

    Each of these types can be specified as UNSIGNED to allow only non-negative values, effectively doubling the upper limit of the range.

  • Floating-Point Types:

    • FLOAT: A floating-point number with M digits in total and D digits after the decimal point. Approximate range is -3.402823466E+38 to 3.402823466E+38.
    • DOUBLE: A double-precision floating-point number. Approximate range is -1.7976931348623157E+308 to 1.7976931348623157E+308.
  • Fixed-Point Types:

    • DECIMAL(M,D): A fixed-point number where M is the total number of digits and D is the number of digits after the decimal point. This type is used for exact numeric data (e.g., for monetary values). It can be marked UNSIGNED as well.

Date and Time Data Types

Date and time data types are used to store temporal data. These types are essential for tracking dates, times, and intervals.

  • DATE: A date in the format YYYY-MM-DD. The range is from ‘1000-01-01’ to ‘9999-12-31’.
  • DATETIME: A combination of date and time in the format YYYY-MM-DD HH:MM:SS. The range is from ‘1000-01-01 00:00:00’ to ‘9999-12-31 23:59:59’.
  • TIMESTAMP: A timestamp in the format YYYY-MM-DD HH:MM:SS, representing the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC). The range is from ‘1970-01-01 00:00:01’ UTC to ‘2038-01-19 03:14:07’ UTC.
  • TIME: A time value in the format HH:MM:SS. The range is from ‘-838:59:59’ to ‘838:59:59’.
  • YEAR: A year in 2-digit or 4-digit format. The range is from ‘1901’ to ‘2155’ for 4-digit format.

String (Character) Data Types

String data types are used to store text and binary data. These types are flexible and can store anything from a single character to large text blocks.

  • CHAR(M): A fixed-length string that is always padded with spaces to the specified length M. The maximum length is 255 characters.
  • VARCHAR(M): A variable-length string where M specifies the maximum length. The maximum length is 65,535 characters.
  • ENUM: An enumeration. You can specify a list of permitted values, and the column can store only one of these values.

Choosing the Right Data Type

Selecting the correct data type for each column in your table is crucial:

  • Efficiency: Choose data types that use the least amount of storage while accommodating all possible values. For instance, if you only need to store small numbers, use TINYINT instead of INT.
  • Precision: Use DECIMAL for precise calculations, especially for financial data, instead of FLOAT or DOUBLE, which are subject to rounding errors.
  • Appropriateness: Match the data type to the nature of the data. For example, use DATE for date values, CHAR for fixed-length fields like postal codes, and TEXT for large text blocks.

Understanding MySQL data types is essential for designing efficient, reliable, and maintainable databases. By choosing the appropriate data types for your columns, you ensure that your database performs well and stores data accurately. This overview provides a solid foundation for selecting the right data types when defining your tables in MySQL.

Inserting Data

Once a table is defined, you can start inserting data into it manually.

You can do this using one of two methods:

  • insert1: Used to insert a single record into the table.
  • insert: Used to insert multiple records at once. The data is provided as a list of dictionaries, where each dictionary represents one record.
# Insert a new record into the Experiment table
Experiment.insert1({
    'experiment_id': 1,
    'experiment_name': 'Experiment 1',
    'experiment_date': '2024-08-26'
})

# Insert multiple records at once
Experiment.insert([
    {'experiment_id': 2, 'experiment_name': 'Experiment 2', 'experiment_date': '2024-08-27'},
    {'experiment_id': 3, 'experiment_name': 'Experiment 3', 'experiment_date': '2024-08-28'}
])

Previewing Table Contents

To view the structure of your table (its attributes and types), you can simply breference the table object:

Experiment()
Loading...

Deleting records

If you want to delete a specific record, use the table.delete() method after restricting the table to a specific row by the primary key value using the & operator. You will be given a chance to confirm before the delete is commited.

(Experiment & {'experiment_id': 1}).delete()
[2024-08-27 04:10:41,375][INFO]: Deleting 1 rows from `my_schema`.`experiment`
[2024-08-27 04:10:42,835][INFO]: Deletes committed.
1

To delete all records from the table, use the .delete() method. You will have a chance to confirm before finalizing the delete.

Experiment.delete()
[2024-08-27 04:10:42,866][INFO]: Deleting 2 rows from `my_schema`.`experiment`
[2024-08-27 04:10:43,873][INFO]: Deletes committed.
2

Dropping a table

Dropping tables in DataJoint is a process that involves permanently removing a table and all of its associated data from the database. This action is irreversible, so it should be performed with caution.

To drop a table in DataJoint, you use the drop() method on the table class. When you drop a table, DataJoint not only deletes the table itself but also all entries and relationships tied to it. If the table is part of a hierarchy with dependent tables (such as tables that reference it via foreign keys), DataJoint will prompt you to confirm the action, ensuring you are aware of the cascading effect this drop will have on related tables. This mechanism helps maintain the integrity of the database by preventing accidental data loss that could disrupt the data pipeline. It’s essential to ensure that any critical data has been backed up or is no longer needed before proceeding with a table drop in DataJoint.

Experiment.drop()
[2024-08-27 04:10:43,899][INFO]: `my_schema`.`experiment` (0 tuples)
[2024-08-27 04:10:44,722][INFO]: Dropped table `my_schema`.`experiment`
[2024-08-27 04:10:44,735][INFO]: Tables dropped. Restart kernel.

Dropping a schema

Dropping a schema in DataJoint involves permanently deleting all the tables within that schema and the schema itself from the database. This is a significant action because it removes not only the tables but also all the data stored within those tables. To drop a schema, you typically use the schema.drop() method, where schema is the schema object you defined earlier in your code.

When you execute this command, DataJoint will prompt you to confirm the action, as it is irreversible and will result in the loss of all data associated with the schema. This operation is especially powerful because it cascades through all tables within the schema, removing each one.

It’s crucial to ensure that any data within the schema is either no longer needed or has been adequately backed up before dropping the schema, as this action will permanently remove all traces of the data and the schema’s structure from the database.

schema.drop()