Due Sep 5, 2024
Modify the program below to create a table to contain 100+ persons, including their
full names
gender
date of birth
address
phone
Save the populated notebook.
Git add, commit, and push to your fork. I will see your submission but your fellow students won’t.
Connect with DataJoint¶
The Devcontainer that comes with this textbook contains a running MySQL server.
The root credentials are set in environment variables DJ_HOST
, DJ_USER
, and DJ_PASS
.
These credentials are not secret since this database is not exposed to the external world.
The DataJoint client library uses these environment variables to connect to the database.
It sufficient to simply import the datajoint client library.
It will connect to the database automatically as soon as necessary.
However, you can explicity verify the connection by calling dj.conn()
:
import datajoint as dj
dj.conn() # test the connection (optionally)
[2024-08-21 04:11:09,541][INFO]: Connecting root@localhost:3306
[2024-08-21 04:11:09,560][INFO]: Connected root@localhost:3306
DataJoint connection (connected) root@localhost:3306
If you are only learning DataJoint, you are done. If you are reading this text to also learn SQL, you can use two ways to issue queries: with IPython magic commands or a client library.
Connect with IPython “Magic”¶
You can execute SQL statements directly from Jupyter with the help of “magic commdands”.
The following cell sets up the connection to the database for the Jupyter SQL Magic.
import pymysql
import os
pymysql.install_as_MySQLdb()
connection_string = "mysql://{user}:{password}@{host}".format(
user=os.environ['DJ_USER'],
host=os.environ['DJ_HOST'],
password=os.environ['DJ_PASS']
)
%load_ext sql
%sql $connection_string
Then you can issue SQL commands from a Jupyter cell by starting it with %%sql
.
Change the cell type to SQL
for appropriate syntax highlighting.
%%sql
-- show all users
SELECT User FROM mysql.user
We will use SQL magic only for fast interactive SQL queries. We will not use SQL magic as part of Python code.
Connect with Python client¶
To issue SQL queries from Python code, we will use a conventional SQL client, in this case pymysql
.
# create a database connection
conn = pymysql.connect(
host=os.environ['DJ_HOST'],
user=os.environ['DJ_USER'],
password=os.environ['DJ_PASS']
)
# crewate a query cursor and issue an SQL query
cur = conn.cursor()
cur.execute('SELECT User FROM mysql.user')
cur.fetchall()
(('debian-sys-maint',),
('mysql.infoschema',),
('mysql.session',),
('mysql.sys',),
('root',))
We are all set for executing all the database queries in this book!
Generate fake data¶
You can use the faker
module to generate fake data for testing.
import faker
fake = faker.Faker()
# Explore creating fake data
print(f"Fake name: {fake.name()}\n")
print(f"Fake address:\n{fake.address()}\n")
print(f"Fake email: {fake.email()}\n")
print(f"Fake VIN: {fake.vin()}\n")
print(f"Fake phone: {fake.phone_number()}\n")
print(f"Fake credit card:\n{fake.credit_card_full()}\n")
Fake name: Steven Serrano
Fake address:
10858 Fox Skyway
Cobbmouth, VA 41073
Fake email: aherring@example.net
Fake VIN: T9AD53LP59D6C3XTA
Fake phone: +1-252-201-5245
Fake credit card:
American Express
James Wiley
375346481270480 10/29
CID: 8680
Create a Person Table (DataJoint)¶
import datajoint as dj
schema = dj.Schema('assignment1')
@schema
class Person(dj.Manual):
definition = """
person_id : int unsigned
---
full_name : varchar(60)
date_of_birth : date
"""
# insert as a tuple
Person.insert1((1, "John Doe", "1991-10-01"))
# insert as a dict
Person.insert1(
dict(
person_id=2,
full_name="Jane Doe",
date_of_birth="1990-01-10"
)
)
# insert as a random person
Person.insert1(
dict(
person_id=fake.random_int(1, 100_000),
full_name=fake.name(),
date_of_birth=fake.date()
)
)
# insert many random people
for _ in range(100):
Person.insert1(
dict(
person_id=fake.random_int(1, 100_000),
full_name=fake.name(),
date_of_birth=fake.date()
)
)
# insert many random people at once
Person.insert(
dict(
person_id=fake.random_int(1, 100_000),
full_name=fake.name(),
date_of_birth=fake.date()
)
for _ in range(100))
Person()
# drop the entire schema with all contents. Danger!
schema.drop()