- Log into http://kgcoe-git.rit.edu using your RIT username and password. You should find yourself already assigned to a group. If you are not in a group, get in touch with your instructor. We are not using gitlab.com.
- In your group, we have created a project for you called `db-abc123`, where `abc123` is your username. Note that you have access to other students’ repositories. You are going to be pushing code to your own repository and making comments on other students’ repositories. (See our FAQ on working with other students’ code.)
- Clone your project repository locally using your favorite Git client. (See our Git page for some helpful resources.)
- You will need Python 3.7 installed on your system. Be sure to add `python` to your PATH during installation. SWEN lab machines: if you are in the lab, this is pre-installed and available on the command line.
- Install PostgreSQL 11. Be sure to also install pgAdmin. SWEN lab machines: PostgreSQL is already installed on the SE lab machines.
- Using the PostgreSQL admin console (pgAdmin), create a user called `swen344` with a password of your choosing. This password will be stored in a file, so you don’t need to memorize it - you can mash the keyboard. Just keep that string handy, because we’re about to put it in a file in a moment. Note: be sure to check the box for “User Can Login” on the Privileges tab. SWEN lab machines: this has been done for you. The password is `salutecaptionearthyfight`.
- Still in pgAdmin, create a database also called `swen344` and make the user `swen344` its owner.
- Now let’s create our project structure in our local repository. We’re going to explain every file, but let’s start by turning some of our directories into Python packages by creating a bunch of directories and empty `__init__.py` files. Please use these exact folder names. Your file structure should be this:
db-abc123/          // your username instead of abc123
    __init__.py
    config/         // don't need an __init__.py here
    src/
        __init__.py
    tests/
        __init__.py
- We need to tell Git not to version control a bunch of files. Create a `.gitignore` file (note the dot at the beginning) at the root of the repository with the following contents:
*.pyc
**/db.yml
- Create a file `config/db.yml` with the following content. This is the configuration file we’ll use to connect to our PostgreSQL database. Notice how our `.gitignore` file tells Git not to commit this to the repository - which is good, because passwords in repositories are a bad, bad thing.
Notes:
- You must change the password (replaceme) to whatever you set in step 6.
- Depending on your PC, you may also need to change the port.
host: localhost
database: swen344
user: swen344
password: replaceme
port: 5432
- Create a file `config/gitlab-credentials.yml` with the following content. This is one of our configuration files for our CI server on GitLab. When we run our tests on the server, we’ll copy this over to replace `db.yml`.
# This file doesn't need to be changed. It's the same for everyone
host: postgres
database: swen344DB
user: swen344CIUser
password: bubbles9874
port: 5432
- Next, we need to tell GitLab how to run our build on a server. For this, we’re using a technology called Docker. We’ll discuss this more later in class, but for now just copy and paste this config. Create a file called `.gitlab-ci.yml` (note the dot at the beginning) at the root of your repository:
image:
  name: ritse/swen344db:latest
services:
  - postgres:11
variables:
  POSTGRES_DB: swen344DB
  POSTGRES_USER: swen344CIUser
  POSTGRES_PASSWORD: bubbles9874
before_script:
  - pip install -r requirements.txt
  - cp config/gitlab-credentials.yml config/db.yml
testrunner:
  script:
    - echo "hello class"
    - python --version
    - python -m unittest
  stage: test
- Now we need to tell Python what packages we need. At the root of the repository, make a file called `requirements.txt` with this content. Note: if you ever need to add new Python packages for your project, feel free to add them here and they will get installed on every run of the CI.
psycopg2==2.8.4
pyyaml==5.1
- Let’s tell Python to install those two packages. Run `pip install -r requirements.txt`. The `psycopg2` package is what we use to connect to PostgreSQL, and `pyyaml` is the package we use to parse our YML config files. SWEN lab machines: you’ll need to add `--user` to that command since you don’t have root access to those machines. Mac users: you may need to install `psycopg2-binary` instead, depending on how you installed PostgreSQL.
- The power of unit tests is that they can tell us what’s wrong when something breaks. One thing that can change is your PostgreSQL setup, so let’s make a test for that. If this test ever fails, we know we have a problem with our environment (as opposed to some new code we wrote). This test will connect to PostgreSQL locally and just check the version string. Make a file called `tests/test_postgresql.py` and put this in it:
import unittest
from src.swen344_db_utils import connect

class TestPostgreSQL(unittest.TestCase):
    def test_can_connect(self):
        conn = connect()
        cur = conn.cursor()
        cur.execute('SELECT VERSION()')
        self.assertTrue(cur.fetchone()[0].startswith('PostgreSQL'))
        conn.close()

if __name__ == '__main__':
    unittest.main()
- Now let’s run this test. From the root of the repository, run `python -m unittest`. Everything should pass, right? Nope. My output looked like this:
>python -m unittest
E
======================================================================
ERROR: tests.test_postgresql (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_postgresql
Traceback (most recent call last):
File "C:\Users\andy\AppData\Local\Programs\Python\Python37-32\lib\unittest\loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "C:\Users\andy\AppData\Local\Programs\Python\Python37-32\lib\unittest\loader.py", line 377, in _get_module_from_name
__import__(name)
File "C:\code\meneely-db-survey\tests\test_postgresql.py", line 2, in <module>
from src.swen344_db_utils import connect
ModuleNotFoundError: No module named 'src.swen344_db_utils'
----------------------------------------------------------------------
Ran 1 test in 0.000s
- The key part here is `No module named 'src.swen344_db_utils'` - the test is referring to code that doesn’t exist (yet). Here it is. We’ve put together a small utility file that easily gets you connected to PostgreSQL from your config file. Create a file `src/swen344_db_utils.py` with the following content:
import psycopg2
import yaml
import os

def connect():
    config = {}
    yml_path = os.path.join(os.path.dirname(__file__), '../config/db.yml')
    with open(yml_path, 'r') as file:
        config = yaml.load(file, Loader=yaml.FullLoader)
    return psycopg2.connect(dbname=config['database'],
                            user=config['user'],
                            password=config['password'],
                            host=config['host'],
                            port=config['port'])
def exec_sql_file(path):
    # path is relative to the repo root, one level above src/
    full_path = os.path.join(os.path.dirname(__file__), f'../{path}')
    conn = connect()
    cur = conn.cursor()
    with open(full_path, 'r') as file:
        cur.execute(file.read())
    conn.commit()
    conn.close()

def exec_get_one(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    cur.execute(sql, args)
    one = cur.fetchone()
    conn.close()
    return one

def exec_get_all(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    cur.execute(sql, args)
    # https://www.psycopg.org/docs/cursor.html#cursor.fetchall
    list_of_tuples = cur.fetchall()
    conn.close()
    return list_of_tuples

def exec_commit(sql, args={}):
    conn = connect()
    cur = conn.cursor()
    result = cur.execute(sql, args)
    conn.commit()
    conn.close()
    return result
Let’s do a quick review of what this code above does:
- `connect` will connect you to Postgres via our config. Closing this connection is up to you.
- `exec_sql_file` will open an SQL file and blindly execute everything in it. Useful for test data and your schema. Having your SQL in its own file also gives you syntax highlighting!
- `exec_get_one` will run a query, assume you only want the top result, and return that. It does not commit any changes, so don’t use it for updates.
- `exec_get_all` will run a query and return all results as a list of tuples. It does not commit any changes, so don’t use it for updates.
- `exec_commit` will run SQL and then commit, so use this for operations that update the DB.
- Now run your tests again. Your output will (hopefully) look something like this:
> python -m unittest
.
----------------------------------------------------------------------
Ran 1 test in 0.037s
OK
- To clarify, your file structure should now be this (your username instead of abc123):
db-abc123/
    __init__.py
    config/
        gitlab-credentials.yml
        db.yml          // git should NOT be committing this file
    src/
        __init__.py
        swen344_db_utils.py
    tests/
        __init__.py
        test_postgresql.py
    requirements.txt
    .gitlab-ci.yml
    .gitignore
- At this stage, now is a great time to `git add`, `git commit`, and `git push` our work to GitLab. Take a few moments to explore GitLab’s functionality and find the console output for your build. Push to your master branch and check that it builds on your CI. Note: just because your build is green doesn’t mean you’re done. Always check your console output to make sure the run went the way you expected! The beginning of the CI console output will look something like this:
Running with gitlab-runner 12.6.0 (ac8e767a)
on gitlab-ci01 3KSy9uxJ
Using Docker executor with image ritse/swen344db:latest ...
Starting service postgres:11 ...
Pulling docker image postgres:11 ...
Using docker image sha256:d3d96b1e5d48f46ed9efac74ae5e25d440dab44aeef2a1489f7bab1366627d2b for postgres:11 ...
And then at the end it will look like this:
pip install -r requirements.txt
Requirement already satisfied: psycopg2==2.8.4 in /usr/local/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (2.8.4)
Requirement already satisfied: pyyaml==5.1 in /usr/local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (5.1)
$ cp config/gitlab-credentials.yml config/db.yml
$ echo "hello class"
hello class
$ python --version
Python 3.7.5
$ python -m unittest
.
----------------------------------------------------------------------
Ran 1 test in 0.015s
OK
Job succeeded
- Recognize some of those commands? This build essentially automated your development cycle, but in an isolated, minimal environment. Also, does this feel like it’s taking too long? Often, various technologies will have generators or project templates to start from. We’re taking a slow stroll through this structure to explain everything.
- Now that our skeleton is set up, we need to start building our actual project. Let’s start with how we will build our database schema. We need methods in our main code to do that. Create a file called `src/chat.py` - or whatever name makes sense to you. Next, create a unit test for that file in `tests` called `test_chat.py` (or whatever word you used).
- Add this unit test to your new test file - adapting it to your project, of course:
import unittest
from src.chat import *
from src.swen344_db_utils import connect

class TestChat(unittest.TestCase):
    def test_build_tables(self):
        """Build the tables"""
        conn = connect()
        cur = conn.cursor()
        buildTables()
        cur.execute('SELECT * FROM example_table')
        self.assertEqual([], cur.fetchall(), "no rows in example_table")
        conn.close()
- Run your tests; you’ll get an error about not knowing what `buildTables` is. So let’s add that to our main code (in this case `chat.py`):

from src.swen344_db_utils import connect

def buildTables():
    conn = connect()
    conn.close()
- Why not add buildTables to the utils? Utility classes can often become the junk drawer of software designs. Don’t succumb to the temptation to skip organizing things and put everything into a `util`. Resist that laziness. These utils are general-purpose for SWEN 344, not for your project, so this code belongs in your main code.
- Run your tests now, and you’ll get an error: `psycopg2.errors.UndefinedTable: relation "example_table" does not exist`. Because we haven’t made it yet!
- Now let’s get this to pass. Add this code between the `connect()` line and the `close()` line:
cur = conn.cursor()
sql = """
    CREATE TABLE example_table(
        example_col VARCHAR(40)
    )
"""
cur.execute(sql)
conn.commit()  # don't forget this!
- If you run your tests again, we now pass. We successfully created a table! BUT! Run your tests again. You’ll get a `DuplicateTable: relation "example_table" already exists` error. Oops. We just violated a fundamental rule of unit tests: idempotency. If something is idempotent, then we can run it multiple times in a row and get the same effect. Unit tests should always be idempotent and not be impacted by side effects. So let’s fix this. Rename everything to `rebuildTables` instead of `buildTables`. Rename the test case, fix the comments, change the method name, etc.
- Next, let’s have our code drop the table first. But that assumes the table is already there! That would break the first run of the unit tests on a fresh machine - it works on your machine and not on a new colleague’s machine: the ol’ works-for-me-but-not-for-you problem. To fix that, use `DROP TABLE IF EXISTS`. Something like this:
def rebuildTables():
    conn = connect()
    cur = conn.cursor()
    drop_sql = """
        DROP TABLE IF EXISTS example_table
    """
    create_sql = """
        CREATE TABLE example_table(
            example_col VARCHAR(40)
        )
    """
    cur.execute(drop_sql)
    cur.execute(create_sql)
    conn.commit()
    conn.close()
- Tests pass now! Note that there are many ways to refactor the code above. We could use psycopg2’s `executemany` method, or add everything to a loop and execute them. We could make a utility method for executing SQL and returning the cursor. We could add a utility method for checking whether a table exists. There are many ways we can improve upon this. In this class, we’ll constantly be talking about how to improve our code for readability, maintainability, extensibility, error-proneness, performance, security, and many other concerns.
- Remember our idempotency problem, though? What if we make that mistake later? Surely we’ll add new tables to this project, and we’ll probably also forget to add the DROP or something at some point. We’re human (probably). Let’s make sure that mistake never happens again. Now that we’ve learned about it, let’s immortalize our fear of making it again with another unit test. This is the true power of unit testing: whenever you realize that something could go wrong, you can usually automate that check in an isolated way and know that the system lacks those problems. Here’s my idempotency test:
def test_rebuild_tables_is_idempotent(self):
    """Drop and rebuild the tables twice"""
    rebuildTables()
    rebuildTables()
    conn = connect()
    cur = conn.cursor()
    cur.execute('SELECT * FROM example_table')
    self.assertEqual([], cur.fetchall(), "no rows in example_table")
    conn.close()
- To reiterate, you are welcome to improve this code. Just because the instructor gave it to you doesn’t mean it’s perfect. Mess with the config files and see what happens. Add your own utilities or improve ours. This is all here just to get you started. In this setup, we worked to follow modern engineering conventions as best we could. The only two things we ask that you keep are Python and PostgreSQL. For reference, your file structure now looks like this:
db-abc123/          // your username instead of abc123
    __init__.py
    config/
        gitlab-credentials.yml
        db.yml
    src/
        __init__.py
        chat.py             // use "chat.py" or other fitting name
        swen344_db_utils.py
    tests/
        __init__.py
        test_postgresql.py
        test_chat.py        // use "test_chat.py" or other fitting name
    requirements.txt
    .gitlab-ci.yml
    .gitignore
- Ok, this gives you plenty to start with. Push your code to GitLab and make sure the build still runs (assuming your tests pass locally).
- You’ll need to research how to use psycopg2 (we won’t be using most of its fancy features). Most of the hard work will be in writing SQL; the Python is there to make things realistic. Take a few moments to read the psycopg2 documentation, starting with the `connection` class and the `cursor` class.
- Finally, we need to tell the instructors what to grade. Create a tag on the commit that marks your completion of this setup. You can do this on the command line, but it’s probably easiest to do it in GitLab itself by going to Repository > Tags. Please call this tag `db0`.