RESTful API Project - Bechdel


The Project

The overview section for REST can be found here. The setup instructions for REST are here. Read on for details on the REST Bechdel project.

Bechdel Test

The Bechdel Test is a popular way to quantify the representation of women in movies. In short, a movie passes the Bechdel test if (1) it has two named women in it, (2) who talk to each other (3) about something besides a man. Which movies pass and fail the Bechdel test may be surprising. Let’s make a simple RESTful API for this data. We recommend the FiveThirtyEight article on this topic.

We downloaded the data from the website on Feb 26, 2020 in JSON format. Here is a starter schema:

DROP TABLE IF EXISTS movies;

CREATE TABLE movies(
  id      INTEGER PRIMARY KEY,
  imdbid  INTEGER NOT NULL,
  rating  INTEGER NOT NULL,
  title   TEXT NOT NULL,
  year    INTEGER NOT NULL
);

Additionally, if you want some useful Python code to import the JSON into the Postgres table, this might be helpful. Note how we are using execute_values for a massive batch insert instead of individual insertions:

import json
from psycopg2.extras import execute_values
from src.db.swen344_db_utils import *

exec_sql_file('data/schema.sql')

with open("data/bechdel_test_movies.json", "r") as json_file:
    data = json.load(json_file)
list_of_tuples = []
for movie in data:
    list_of_tuples.append((
        movie['id'],
        movie['imdbid'],
        movie['rating'],
        movie['title'],
        movie['year']))
conn = connect()
cur = conn.cursor()
sql = "INSERT INTO movies(id, imdbid, rating, title, year) VALUES %s"
execute_values(cur, sql, list_of_tuples)
conn.commit()
conn.close()

REST1: Basic Resources and Resource Methods

We will still be following the merge request and tagging pattern. Please tag your code as rest1 on the master branch when it is merged.

In this initial iteration, we’re looking to get the basics working. In particular, we’ll be looking at getting a few types of resource methods working, to perform the following functions

  • List All Keys,
  • List All Movies, and
  • Show Movie Details.
    These are not the actual APIs, they are the descriptive names of the functionality provided by the APIs.
    Remember that RESTful APIs operate on resources. In the case of Bechdel data, the resource is the movie data.
    The APIs, will be various operations on the movie data.
    For example, List All Keys could look like /movies/keys and would return all the the primary keys and imdb ids from the movies DB.
    List All Movies could look like /movies and would return all data rows from the movies DB.
    Finally, Show Movie Details would return details on one specific row. The API could either be /movies/keys/{the_key}, where the_key would be the primary key of the desired row, or could be /movies/id/{imdb_id}, where imdb_id is the imdb_id for the row you want.
  • Note that the id could be provided in the URL or as a query string

To implement the functionality, use the above information to name your resources, and then create endpoints and resource methods for those resources. All responses must be in JSON format and follow the RESTful guidelines.

Grading REST1 (40 points)

  • (15pts) APIs correctly implemented
  • (15pts) Test cases implemented & pass
  • (05pts) RESTful API standards followed
  • (05pts) Good code maintainability and quality

REST1 Bechdel

  • For the List All Keys API, return an array of IMDB keys and our own primary keys
  • For List All Movies API, show everything that was in table
  • For Show Movie Details API, given a primary key (or imdb_id), show all the details from that row.
    • The parameter can be part of the URL or in a query string

Note that the IMDB database often prepends their keys with some letters. But, the data that we downloaded seems to NOT have any letter prefix. You will often find data to be inconsisten! Your RESTful API should find a way to handle this ambiguity, expecting that sometimes an IMDB ID will come in as just an integer, and sometimes with a letter prefix. This also gets tricky when you consider the ambiguity between your primary key IDs and your primary key integer ID. You are allowed to modify the schema to your needs, if you see fit.

Test Cases

  • Implement tests that cover at least the following scenarios and match the stated results:
  • When listing all keys, you should get an array with 8363 different entries in it and it should have the IMDB id tt4648786 or 4648786 in it. There should be no other data in this query.
  • If you run List All Keys and List All Movies, I should get the same count
  • If I run the List All Movies, the data set should include tt4648786 (or 4648786) with the movie Harriet in the year 2019 and a rating of 3.
  • I can retrieve Harriet using it’s primary key or imdb_id with Show Movie Details

REST2: CRUD and Authentication

Create a branch rest2-dev. Please tag your code as rest2 on the master branch when it is merged.

This iteration, you will need to provide only two basic CRUD operations to your data: create and delete. Add those actions to your RESTful API using those conventions.

However, we can’t let just anyone come into the system and make changes, so we need to also provide authentication. Let’s assume that there is a separate way to set and reset accounts. You’ll implement just the core functionality: logging in and logging out - but will thorough testing of ‘happy path’ and ‘problem’ scenarios. From the client side, this will look like this:

  1. Send a POST request to your login endpoint with a username and password.
  • The username and password should be parameters in the BODY of the POST request
  1. System returns a message that your login was successful and gives you a session key, which is a large random number that nobody should be able to guess.
  2. When attempting to do a CRUD operation, the operation will only work if your request also has the session key. The session key should be sent in the HEADER of each subsequent API call
  3. If you send a request to the logout endpoint, then subsequent CRUD operations won’t work. Note that you must be logged in to trigger a logout, otherwise anyone can log you out!

You will create a users table with a username, password, and session_key field in your database. Your test data should have a user already registered. However, we must store our passwords securely. This means using a one-way encryption function called a hash digest . You can read more about hash digests and salts over on the Common Weakness Enumeration CWE-759. In pseudocode, the way you would compute the hash digest of a password would be: hashed_password = hash(password)

For this project, we’d like you to use the SHA-512 hash digest algorithm. Find a secure way to generate a session key within your technology choice as well, not just the default random number generator for your language.

From the server side, authentication would look like this:

  1. Client sends a request with their username and password to the login endpoint
  2. Server computes the proper hash digest of the password
  3. Server checks if that digest exists in the database table with that username
  4. If so, generate a secure session key. (Otherwise, return a message that login was not successful)
  5. Save the session key to the user’s record in the database
  6. Return a success message with the session key

For Python, these resources will be helpful:

Grading REST2 (70 points)

  • (20pts) CRUD operations work
  • (10pts) Authentication feature works
  • (30pts) Test cases implemented and pass
  • (05pts) RESTful API standards followed
  • (05pts) Good code maintainability and quality

Key Decisions

  • What are the conventions for CRUD operations based on your existing resource?
  • Does it make sense to refactor the meaning of your resource to better fit RESTful conventions?
  • What will be the structure of the APIs? How will you pass the arguments?

Test Cases

In each test case, print out (clearly), the test being run and the result of the test. If there are multiple steps in a test case, list the steps/ results as they occur. Make sure you handle the results without crashing!

  • Login test cases.
    • Create a new user accout. Validate it is created
    • Try to login with a non-existent account
    • Try to login with a valid user but invalid password
    • User blorg already has an account with the password saltfatacidheat. The users table initially has a row with username blorg, and the hashed (sha512) result of the password! An HTTP request is sent in to the login endpoint with blorg and saltfatacidheat as the credentials. A large session key is sent back.
    • Nobody is logged into the system. An HTTP request makes an attempt to delete Resident Evil: Extinction with IMDB key 432021, and the system does not do the action because nobody is logged in.
      • Note: To safely delete a record, you should get the record for Resident Evil: Extinction, then use the primary from that query to try and delete the record.
    • The above test case happens, but log in first and then the delete occurs correctly when the given session key is provided.
  • Logout test case. The login test case is first run, then a request to the logout endpoint occurs, providing the given session key. An HTTP delete is attempted and fails.
  • Unauthenticated logout. Log in with the test case above. Attempt to log out without providing a session key. All users remain logged in.
  • Try to logout a user who is not logged in
  • A new movie is added, 2 Fast 2 Covid that fully passes the Bechdel test.
  • An update to a rating is needed when the movid ‘Speedy’ is found to have more dialog than assumed, because the reviewer had fast-forwarded while watching (the new rating is 1).