To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science. While we will focus on database principles, we assume a working knowledge of the following. If you’re new to these, we highly recommend exploring MIT’s “The Missing Semester of Your CS Education” to get up to speed.
Command-Line Proficiency
You’ll frequently interact with systems through a terminal or shell. You don’t need to be a guru, but you should know how to navigate directories (cd), list files (ls), and run basic commands. The shell is also the common interface for automating tasks and working on remote machines.
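Because this course works mostly in Python, it may help to see how the two shell commands named above map onto Python's standard library. This is a rough analogue only, not a substitute for using the shell: it assumes `os.chdir` stands in for cd and a small hypothetical helper, `list_files`, stands in for ls. It runs inside a temporary directory so it does not touch your own files.

```python
import os
import tempfile
from pathlib import Path


def list_files(directory="."):
    """Hypothetical helper: return sorted entry names in `directory`, roughly like `ls`."""
    return sorted(entry.name for entry in Path(directory).iterdir())


start = os.getcwd()  # remember where we began
with tempfile.TemporaryDirectory() as tmp:
    try:
        os.chdir(tmp)                            # roughly `cd <tmp>`
        Path("data").mkdir()                     # roughly `mkdir data`
        Path("notes.txt").write_text("hello\n")  # create a file so there is something to list
        listing = list_files()                   # roughly `ls`
        print(listing)                           # ['data', 'notes.txt']
    finally:
        os.chdir(start)                          # return before the temporary directory is deleted
```

Unlike ls, this helper also shows hidden (dot-prefixed) entries; the point is only that "where am I" and "what is here" are the same questions in both tools.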
Python Fundamentals
Python is our language for interacting with databases. You should understand variables, data types (strings, integers, lists, dictionaries), loops, and functions. We will touch on more advanced concepts like decorators, but a solid foundation is key.
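The short sketch below touches each item on that list (variables, strings, integers, lists, dictionaries, a loop, and a function), plus one simple decorator, using table-like data. The names `log_calls`, `average_age`, `people`, and `calls` are hypothetical and chosen only for illustration. A decorator here is simply a function that wraps another function to add behavior; in this case it records each call.

```python
import functools

calls = []  # hypothetical log: names of decorated functions, in call order


def log_calls(func):
    """A simple decorator: wrap `func` and record its name each time it is called."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@log_calls
def average_age(rows):
    """Return the mean of the 'age' field across a list of dictionaries."""
    total = 0                  # integer accumulator
    for row in rows:           # loop over a list
        total += row["age"]    # dictionary lookup by key
    return total / len(rows)   # true division gives a float


# A list of dictionaries, much like rows in a database table.
people = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": 41},
]

result = average_age(people)
print(result)  # 38.5
print(calls)   # ['average_age']
```

If any line here is unfamiliar, that is a good signal to review the basics before the database material begins.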
Git and GitHub
In collaborative science and software, version control is non-negotiable. We expect you to have a GitHub account and be familiar with the basic workflow: clone, add, commit, and push. This is how you’ll manage your code and assignments.
Jupyter Notebooks
This textbook itself is built using Jupyter. You should know how to launch, navigate, and run code within Jupyter Notebooks or JupyterLab. The concept of “literate programming”—mixing executable code, text, and results—is central to reproducible science.
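To make the literate-programming idea concrete: a notebook file (.ipynb) is stored as JSON, holding a list of cells in which Markdown text sits next to code and that code's recorded output. The sketch below builds a minimal notebook by hand with only the standard library. It assumes the nbformat version 4 layout (minor version 4), and the cell contents are invented for illustration; in practice Jupyter writes these files for you.

```python
import json

# A minimal notebook in the nbformat v4 JSON layout (assumed minor version 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {   # narrative text, rendered as Markdown
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Average age\nWe compute the mean age of two people.",
        },
        {   # executable code stored together with the output it produced
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print((36 + 41) / 2)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "38.5\n"}
            ],
        },
    ],
}

# Serialize to the text that would live inside an .ipynb file.
text = json.dumps(notebook, indent=1)
print(text[:40])
```

Saving `text` to a file such as example.ipynb should give a notebook that Jupyter can open, although real notebooks usually carry extra metadata, such as the kernel specification.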