- Log into http://kgcoe-git.rit.edu using your RIT username and password. You should find yourself already assigned to a group. If you are not in a group, get in touch with your instructor. We are not using gitlab.com.
- In your group, we have created a project for you called `db-abc123`, where `abc123` is your username. Note that you have access to other students’ repositories. You are going to be pushing code to your own repository and making comments on other students’ repositories. (See our FAQ on working with other students’ code.)
- Clone your project repository locally using your favorite Git client. (See our Git page for some helpful resources.)
- You will need Python 3.7 installed on your system. Be sure to add `python` to your PATH during installation. SWEN lab machines: if you are in the lab, this is pre-installed and available on the command line.
- Install PostgreSQL 11. Be sure to also install pgAdmin. SWEN lab machines: PostgreSQL is already installed on the SE lab machines.
- Using the PostgreSQL admin console (pgAdmin), create a user called `swen344` with a password of your choosing. This password will be stored in a file, so you don’t need to memorize it - you can mash the keyboard. Just keep that string handy, because we’re about to put it in a file in a moment. Note: be sure to check the box for “User Can Login” on the Privileges tab. SWEN lab machines: this has been done for you. The password is `salutecaptionearthyfight`.
- Still in pgAdmin, create a database also called `swen344` and make the user `swen344` its owner.
- Now let’s create our project structure in our local repository. We’re going to explain every file, but let’s start by turning some of our directories into Python packages by creating a bunch of directories and empty `__init__.py` files. Please use these exact folder names. Your file structure should be this:
db-abc123/          // your username instead of abc123
    __init__.py
    config/         // don't need an __init__.py here
    src/
        __init__.py
    tests/
        __init__.py
- We need to tell Git not to version control a bunch of files. Create a `.gitignore` file (note the dot at the beginning) at the root of the repository with the following contents:
*.pyc
**/db.yml
- Create a file `config/db.yml` with the following content. This is the configuration file we’ll use to connect to our PostgreSQL database. Notice how our `.gitignore` file tells Git not to commit this to the repository - which is good, because passwords in repositories are a bad, bad thing.
Notes:
- You must change the password (replaceme) to whatever you set in step 6.
- Depending on your PC, you may also need to change the port.
host: localhost
database: swen344
user: swen344
password: replaceme
port: 5432
- Create a file `config/gitlab-credentials.yml` with the following content. This is one of our configuration files for our CI server on GitLab. When we run our tests on the server, we’ll copy this over to replace `db.yml`.
# This file doesn't need to be changed. It's the same for everyone
host: postgres
database: swen344DB
user: swen344CIUser
password: bubbles9874
port: 5432
- Next, we need to tell GitLab how to run our build on a server. For this, we’re using a technology called Docker. We’ll discuss this more later in class, but for now just copy and paste this config. Create a file called `.gitlab-ci.yml` (note the dot at the beginning) at the root of your repository:
image:
  name: ritse/swen344db:latest
services:
  - postgres:11
variables:
  POSTGRES_DB: swen344DB
  POSTGRES_USER: swen344CIUser
  POSTGRES_PASSWORD: bubbles9874
before_script:
  - pip install -r requirements.txt
  - cp config/gitlab-credentials.yml config/db.yml
testrunner:
  script:
    - echo "hello class"
    - python --version
    - python -m unittest
  stage: test
- Now we need to tell Python what packages we need. At the root of the repository, make a file called `requirements.txt` with this content. Note: if you ever need to add new Python packages for your project, feel free to add them here and they will get installed on every run of the CI.
psycopg2==2.8.4
pyyaml==5.1
- Let’s tell Python to install those two packages. Run `pip install -r requirements.txt`. The `psycopg2` package is what we use to connect to PostgreSQL, and `pyyaml` is the package we use to parse our YML config files. SWEN lab machines: you’ll need to add `--user` to that command since you don’t have root access to those machines. Mac users: you may need to install `psycopg2-binary` instead, depending on how you installed PostgreSQL.
- The power of unit tests is that they can tell us what’s wrong when something breaks. One thing that can change is your PostgreSQL setup, so let’s make a test for that. If this test ever fails, we know we have a problem with our environment (as opposed to some new code we wrote). This test will connect to PostgreSQL locally and just check the version string. Make a file called `tests/test_postgresql.py` and put this in it:
import unittest
from src.swen344_db_utils import connect

class TestPostgreSQL(unittest.TestCase):
    def test_can_connect(self):
        conn = connect()
        cur = conn.cursor()
        cur.execute('SELECT VERSION()')
        self.assertTrue(cur.fetchone()[0].startswith('PostgreSQL'))
        conn.close()

if __name__ == '__main__':
    unittest.main()
- Now let’s run this test. From the root of the repository, run `python -m unittest`. Everything should pass, right? Nope. My output looked like this:
>python -m unittest
E
======================================================================
ERROR: tests.test_postgresql (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_postgresql
Traceback (most recent call last):
File "C:\Users\andy\AppData\Local\Programs\Python\Python37-32\lib\unittest\loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "C:\Users\andy\AppData\Local\Programs\Python\Python37-32\lib\unittest\loader.py", line 377, in _get_module_from_name
__import__(name)
File "C:\code\meneely-db-survey\tests\test_postgresql.py", line 2, in <module>
from src.swen344_db_utils import connect
ModuleNotFoundError: No module named 'src.swen344_db_utils'
----------------------------------------------------------------------
Ran 1 test in 0.000s
- The key part here is `No module named 'src.swen344_db_utils'` - the test is referring to code that doesn’t exist (yet). Here it is. We’ve put together a small utility file that easily gets you connected to PostgreSQL from your config file. Create a file `src/swen344_db_utils.py` with the following content:
import psycopg2
import yaml
import os

def connect():
    config = {}
    yml_path = os.path.join(os.path.dirname(__file__), '../config/db.yml')
    with open(yml_path, 'r') as file:
        config = yaml.load(file, Loader=yaml.FullLoader)
    return psycopg2.connect(dbname=config['database'],
                            user=config['user'],
                            password=config['password'],
                            host=config['host'],
                            port=config['port'])
def exec_sql_file(path):
    # path is relative to the repo root, one level above src/
    full_path = os.path.join(os.path.dirname(__file__), f'../{path}')
    conn = connect()
    cur = conn.cursor()
    with open(full_path, 'r') as file:
        cur.execute(file.read())
    conn.commit()
    conn.close()

def exec_get_one(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    cur.execute(sql, args)
    one = cur.fetchone()
    conn.close()
    return one

def exec_get_all(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    cur.execute(sql, args)
    # https://www.psycopg.org/docs/cursor.html#cursor.fetchall
    list_of_tuples = cur.fetchall()
    conn.close()
    return list_of_tuples

def exec_commit(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    result = cur.execute(sql, args)
    conn.commit()
    conn.close()
    return result
Let’s do a quick review of what this code above does:
- `connect` will connect you to Postgres via our config. Closing this connection is up to you.
- `exec_sql_file` will open an SQL file and blindly execute everything in it. Useful for test data and your schema. Having your SQL in its own file also gives you syntax highlighting!
- `exec_get_one` will run a query, assume you only want the top result, and return that. It does not commit any changes, so don’t use it for updates.
- `exec_get_all` will run a query and return all results as a list of tuples. It does not commit any changes, so don’t use it for updates.
- `exec_commit` will run SQL and then commit, so use this for operations that update the DB.
- Now run your tests again. Your output will (hopefully) look something like this:
> python -m unittest
.
----------------------------------------------------------------------
Ran 1 test in 0.037s
OK
- To clarify, your file structure should now be this (your username instead of abc123):
db-abc123/
    __init__.py
    config/
        gitlab-credentials.yml
        db.yml          // git should NOT be committing this file
    src/
        __init__.py
        swen344_db_utils.py
    tests/
        __init__.py
        test_postgresql.py
    requirements.txt
    .gitlab-ci.yml
    .gitignore
- At this stage, now is a great time to `git add`, `git commit`, and `git push` our work to GitLab. Take a few moments to explore GitLab’s functionality and find the console output for your build. Push to your master branch and check that it builds on your CI. Note: just because your build is green doesn’t mean you’re done. Always check your console output to make sure the run went the way you expected! The beginning of the CI console output will look something like this:
Running with gitlab-runner 12.6.0 (ac8e767a)
on gitlab-ci01 3KSy9uxJ
Using Docker executor with image ritse/swen344db:latest ...
Starting service postgres:11 ...
Pulling docker image postgres:11 ...
Using docker image sha256:d3d96b1e5d48f46ed9efac74ae5e25d440dab44aeef2a1489f7bab1366627d2b for postgres:11 ...
And then at the end it will look like this:
pip install -r requirements.txt
Requirement already satisfied: psycopg2==2.8.4 in /usr/local/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (2.8.4)
Requirement already satisfied: pyyaml==5.1 in /usr/local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (5.1)
$ cp config/gitlab-credentials.yml config/db.yml
$ echo "hello class"
hello class
$ python --version
Python 3.7.5
$ python -m unittest
.
----------------------------------------------------------------------
Ran 1 test in 0.015s
OK
Job succeeded
- Recognize some of those commands? This build essentially automated your development cycle, but in an isolated, minimal environment. Also, does this feel like it’s taking too long? Often, various technologies will have generators or project templates to start from. We’re taking a slow stroll through this structure to explain everything.
- Now that our skeleton is set up, we need to start building our actual project. Let’s start with how we will build our database schema. We need methods in our main code to do that. Create a file called `src/chat.py` - or whatever name makes sense to you. Next, create a unit test for that file in `tests` called `test_chat.py` (or whatever word you used).
- Add this unit test to your new test file - adapting it to your project, of course:
import unittest
from src.chat import *
from src.swen344_db_utils import connect

class TestChat(unittest.TestCase):
    def test_build_tables(self):
        """Build the tables"""
        conn = connect()
        cur = conn.cursor()
        buildTables()
        cur.execute('SELECT * FROM example_table')
        self.assertEqual([], cur.fetchall(), "no rows in example_table")
        conn.close()
- Run your tests; you’ll get an error about not knowing what `buildTables` is. So let’s add that to our main code (in this case `chat.py`):

from src.swen344_db_utils import connect

def buildTables():
    conn = connect()
    conn.close()
- Why not add buildTables to the utils? Utility classes can often become the junk drawer of software designs. Don’t succumb to the temptation to skip organizing things and put everything into a `util`. Resist that laziness. These utils are general-purpose for SWEN 344, not for your project, so this code belongs in your main code.
- Run your tests now, and you’ll get an error: `psycopg2.errors.UndefinedTable: relation "example_table" does not exist`. Because we haven’t made it yet!
- Now let’s get this to pass. Add this code between the `connect()` line and the `close()` line:
cur = conn.cursor()
sql = """
    CREATE TABLE example_table(
        example_col VARCHAR(40)
    )
"""
cur.execute(sql)
conn.commit()  # don't forget this!
- If you run your tests again, we now pass. We successfully created a table! BUT! Run your tests again. You’ll get a `DuplicateTable: relation "example_table" already exists` error. Oops. We just violated a fundamental rule of unit tests: idempotency. If something is idempotent, then we can run it multiple times in a row and get the same effect. Unit tests should always be idempotent and not be impacted by side effects. So let’s fix this. Rename everything to `rebuildTables` instead of `buildTables`. Rename the test case, fix the comments, change the method name, etc.
- Next, let’s have our code drop the table first. But that assumes the table is already there! That would break the first run of the unit tests on a fresh machine - it works on your machine and not on a new colleague’s machine: the ol’ works-for-me-but-not-for-you problem. To fix that, use `DROP TABLE IF EXISTS`. Something like this:
def rebuildTables():
    conn = connect()
    cur = conn.cursor()
    drop_sql = """
        DROP TABLE IF EXISTS example_table
    """
    create_sql = """
        CREATE TABLE example_table(
            example_col VARCHAR(40)
        )
    """
    cur.execute(drop_sql)
    cur.execute(create_sql)
    conn.commit()
    conn.close()
- Tests pass now! Note that there are many ways to refactor the code above. We could use psycopg2’s `executemany` method, or add everything to a loop and execute them. We could make a utility method for executing SQL and returning the cursor. We could add a utility method for checking whether a table exists. There are many ways we can improve upon this. In this class, we’ll constantly be talking about how to improve our code for readability, maintainability, extensibility, error-proneness, performance, security, and many other concerns.
- Remember our idempotency problem, though? What if we make that mistake later? Surely we’ll add new tables to this project, and we’ll probably also forget to add the DROP or something at some point. We’re human (probably). Let’s make sure that mistake never happens again. Now that we’ve learned about it, let’s immortalize our fear of making it again with another unit test. This is the true power of unit testing: whenever you realize that something could go wrong, you can usually automate that check in an isolated way and know that the system lacks those problems. Here’s my idempotency test:
def test_rebuild_tables_is_idempotent(self):
    """Drop and rebuild the tables twice"""
    rebuildTables()
    rebuildTables()
    conn = connect()
    cur = conn.cursor()
    cur.execute('SELECT * FROM example_table')
    self.assertEqual([], cur.fetchall(), "no rows in example_table")
    conn.close()
- To reiterate, you are welcome to improve this code. Just because the instructor gave it to you doesn’t mean it’s perfect. Mess with the config files and see what happens. Add your own utilities or improve ours. This is all here just to get you started. In this setup, we worked to follow modern engineering conventions as best we could. The only two things we ask that you keep are Python and PostgreSQL. For reference, your file structure now looks like this:
db-abc123/          // your username instead of abc123
    __init__.py
    config/
        gitlab-credentials.yml
        db.yml
    src/
        __init__.py
        chat.py             // use "chat.py" or other fitting name
        swen344_db_utils.py
    tests/
        __init__.py
        test_postgresql.py
        test_chat.py        // use "test_chat.py" or other fitting name
    requirements.txt
    .gitlab-ci.yml
    .gitignore
- Ok, this gives you plenty to start with. Push your code to GitLab and make sure the build still runs (assuming your tests pass locally).
- You’ll need to research how to use psycopg2 (we won’t be using most of its fancy features). Most of the hard work will be in writing SQL; the Python is there to make things realistic. Take a few moments to read the psycopg2 documentation, starting with the `connection` class and the `cursor` class.
- Finally, we need to tell the instructors what to grade. Create a tag on the commit that marks your completion of this setup. You can do this on the command line, but it’s probably easiest to do it in GitLab itself by going to Repository > Tags. Please call this tag `db0`.