This and other notebooks for this video series are found at
https://
This notebook describes how to work with database schemas without access to the original code that generated the schema. These situations often arise when the database is created by another user who has not shared the generating code yet or when the database schema is created from a programming language other than Python.
import datajoint as dj
Working with schemas and their modules¶
Typically a DataJoint schema is created as a dedicated Python module. This module defines a schema
object that is used to link classes declared in the module to tables in the database schema. As an example, examine the university
module in this folder (./university.py
).
You may then import the module to interact with its tables:
import university as uni
Connecting dimitri@localhost:3306
dj.Diagram(uni)
Note that dj.Diagram
can extract the diagram from a schema object or from a python module containing its schema
object, lending further support to the convention of one-to-one correspondence between database schemas and Python modules in a datajoint project:
dj.Diagram(uni)
is equvalent to
dj.Diagram(uni.schema)
# students without majors
uni.Student - uni.StudentMajor
Spawning missing classes¶
Now imagine that you do not have access to university.py
or you do not have its latest version. You can still connect to the database schema but you will not have classes declared to interact with it.
So let’s start over in this scenario.
!!! Restart the kernel here to remove the previous class definitions !!!¶
You can may use the dj.list_schemas
function (new in datajoint 0.12.0
) to list the names of database schemas available to you.
import datajoint as dj
dj.list_schemas()
Connecting dimitri@localhost:3306
['dimitri_alter',
'dimitri_attach',
'dimitri_blob',
'dimitri_blobs',
'dimitri_nphoton',
'dimitri_schema',
'dimitri_university',
'dimitri_uuid',
'university']
Just as with a new schema, we start by creating a schema
object to connect to the chosen database schema:
schema = dj.schema('dimitri_university')
If the schema already exists, dj.schema
is initialized as usual and you may plot the schema diagram. But instead of seeing class names, you will see the raw table names as they appear in the database.
# let's plot its diagram
dj.Diagram(schema)
You may view the diagram but, at this point, there is now way to interact with these tables. A similar situation arises when another developer has added new tables to the schema but has not yet shared the updated module code with you. Then the diagram will show a mixture of class names and database table names.
Now you may use the schema.spawn_missing_classes
method to spawn classes into the local namespace for any tables missing their classes:
schema.spawn_missing_classes()
dj.Di(schema)
Now you may interact with these tables as if they were declared right here in this namespace:
# students without majors
Student - StudentMajor
!!! Restart the kernel here to remove the previous class definitions !!!¶
Creating a virtual module¶
Now spawn_missing_classes
creates the new classes in the local namespace. However, it is often more convenient to import a schema with its python module, equivalent to the python the python command
import university as uni
We can mimick this import without having access to university.py
using the create_virtual_module
function:
import datajoint as dj
uni = dj.create_virtual_module('university', 'dimitri_university')
Connecting dimitri@localhost:3306
Now uni
behaves as an imported module complete with the schema
object and all the table classes.
dj.Di(uni)
uni.Student - uni.StudentMajor
dj.create_virtual_module
takes optional arguments.
First, create_schema=False
assures that an error is raised when the schema does not already exist. Set it to True
if you want to create an empty schema.
dj.create_virtual_module('what', 'nonexistent')
---------------------------------------------------------------------------
DataJointError Traceback (most recent call last)
<ipython-input-4-b316110c06f3> in <module>
----> 1 dj.create_virtual_module('what', 'nonexistent')
~/dev/datajoint-python/datajoint/schema.py in create_virtual_module(module_name, schema_name, create_schema, create_tables, connection)
242 """
243 module = types.ModuleType(module_name)
--> 244 _schema = Schema(schema_name, create_schema=create_schema, create_tables=create_tables, connection=connection)
245 _schema.spawn_missing_classes(context=module.__dict__)
246 module.__dict__['schema'] = _schema
~/dev/datajoint-python/datajoint/schema.py in __init__(self, schema_name, context, connection, create_schema, create_tables)
65 raise DataJointError(
66 "Database named `{name}` was not defined. "
---> 67 "Set argument create_schema=True to create it.".format(name=schema_name))
68 else:
69 # create database
DataJointError: Database named `nonexistent` was not defined. Set argument create_schema=True to create it.
The other optional argument, create_tables=False
is passed to the schema
object. It prevents the use of the schema
obect of the virtual module for creating new tables in the existing schema. This is a precautionary measure since virtual modules are often used for completed schemas. You may set this argument to True
if you wish to add new tables to the existing schema. A more common approach in this scenario would be to create a new schema
object and to use the spawn_missing_classes
function to make the classes available.
However, you if do decide to create new tables in an existing tables using the virtual module, you may do so by using the schema object from the module as the decorator for declaring new tables:
uni = dj.create_virtual_module('university.py', 'dimitri_university', create_tables=True)
@uni.schema
class Example(dj.Manual):
definition = """
-> uni.Student
---
example : varchar(255)
"""
dj.Di(uni)
uni.Enroll.describe();
-> uni.Student
-> uni.Section
uni.save()