This book is designed to teach database programming for scientific operations, using the most popular scientific programming language today—Python.
Executable Book Format¶
Following the standards of the Executable Books Project, this book is fully executable. The entire development environment, including all dependencies, is containerized using Docker and DevContainer, ensuring that all examples and exercises can be run seamlessly.
Using DevContainer¶
The main GitHub repository for this book includes a DevContainer setup that provides a complete development environment. This environment is pre-configured with:
A MySQL database server
Minio server for storing large objects
MyST for generating the book
Python and Jupyter for programming examples
MySQL client and IPython SQL Magic for executing SQL commands
The DataJoint client library
Essential scientific programming libraries
If you use DevContainer for this book, everything you need is already set up. This configuration is sufficient for the tutorial exercises but may not be robust enough for large-scale, real-world collaborative projects.
Setting Up DevContainer¶
There are two main methods to build and activate the DevContainer:
Using Visual Studio Code
Using GitHub CodeSpaces
Detailed instructions for working with DevContainers can be found in the documentation for MyST, Visual Studio Code, and GitHub CodeSpaces. Readers are encouraged to explore these resources for guidance on setting up and operating the environment.
DataJoint Works¶
For real-life projects, consider setting up an account on DataJoint Works. This platform provides the computational infrastructure and programming environment needed to implement collaborative data pipelines, either in the cloud or on local infrastructure. DataJoint Works integrates:
Code management
User access management
Database operations
Automated computations
Electronic lab notebooks and visualization dashboards
Data navigation and exploration
Performance monitoring and optimization
Backups and recovery
Data export and publishing
The platform also offers technical support for setup and configuration issues.
Do-It-Yourself Setup¶
If you prefer to set up a DataJoint pipeline using your own resources, the tools are open source and available for you to use. However, this requires substantial skills in systems administration and database management. You can start by reverse-engineering the DevContainer provided with this book and referring to the tutorials and documentation for each software component.
To run the coding examples in this book on your own setup, you’ll need:
A MySQL server with user accounts and appropriate privileges
Object storage for large data objects (e.g., Minio or Ceph)
Python programming language
An Integrated Development Environment (IDE) like VSCode or PyCharm, or Jupyter
Python libraries for scientific computations and the DataJoint Python client
IPython SQL magic
While the setup provided in this book is streamlined for educational purposes, real-world database operations can be complex and may require significant infrastructure and support.