Rideshare System


Clarification (Meneely sections only)
This is for Meneely's sections only.

Project: Rideshare System

You are a tech company that provides a mobile app for booking taxi rides (e.g. Uber, Lyft). Both drivers and riders use this app and this is the database that stores the records of those trips. The app also has a reviewing system where both drivers and riders review each other.

DB Design & Style Expectations

Be sure to check out our expectations page, and consult it when you are trying to think of helpful feedback.

DB0: Setup

DB0 is the same for everyone, regardless of project.

For setting up in the lab or on your own machine, follow our DB Project Setup Instructions.

Grading DB0

  • 10pts. All steps implemented
  • 10pts. CI fully passes

Grading notes:

  • Make sure you tagged the commit as db0 so we can find it easily.
  • We grade GitLab only, unless an instructor or TA has given explicit written instructions that you can bypass the CI (which is rare)

DB1: Initial Schema, Test Data Set

In this iteration, please name your topic branch db1-dev. Once you have merged this into the master branch, tag the version that you consider to be your submission with db1.

The purpose of this iteration is twofold:

  • Get your initial schema going
  • Set up your test data for unit testing

Build a table in your database and populate it with some test data. We will be building this database schema incrementally, so let’s just start with two or three tables.

This iteration you will build some initial tables that relate to each other. Don’t worry about storing all the columns you can think of - just get the general concept of the table and its foreign/primary keys.

Generally speaking, you will not call SQL directly from your unittest code. Instead, you will have APIs that will do the heavy lifting for SQL. Your APIs will be methods in src/ that will be called by your unittests. The APIs will make the calls to SQL (e.g. using CREATE and SELECT etc.). Keep your test data loading separate from the APIs in src/ to keep test code and production code distinct. See your project domain for specifics.

We will give you test cases and you will need to adapt them into Python unit tests. You also must add 3 additional tests (i.e. test_* methods). You are welcome (encouraged!) to add to your test dataset too.

Lesson: Learn the value of deleting out of date code. Delete any trace of example_table from your code. We’re done with the example setup, so adapt your code accordingly. That code will live on in your repository history anyway. Don’t comment it out like a packrat. Delete. The sooner you get used to the idea of revising code instead of continually adding to it, the better your software will be on so many levels.

Grading DB1

By lab day:

  • 5 pts. Set up merge request by lab day
  • 10pts. Enough functionality finished such that it can be thoroughly reviewed

By submission day:

  • 10pts. Directions followed. e.g. Example removed, git branch, merge, tag, etc.
  • 10pts. Provided useful feedback to others (merge request feedback)
  • 5pts. Responded to feedback on your own project
  • 15pts. Test cases implemented
  • 10pts. CI successfully runs
  • 15pts. Overall spirit of the feature implemented

Feature Overview

The main stakeholders in this system are riders and drivers. Riders have a name, any special instructions, and a current average rating. Drivers also have those things, as well as a car make and model and the system knows how long they have been a driver. Also, drivers will likely need to provide their license number, but riders will not. Riders will need to provide their credit card number, but drivers will not! Use this thought process to design your tables and the relationships.

The core of your system is the concept of a rider and driver taking a ride. Make these tables with test data. Don’t worry about too many other columns than what you need for these tests. At this point, you can load your test data with any mechanism you choose. We’ll get to loading from files later (although you can do that here as well, if you choose)

Key decisions

  • How many API methods do you need? Flexibility and simplicity always trade off!
  • One table or multiple tables for drivers and riders? What if a driver also wants to take a ride? Should they have separate accounts?
  • How should we handle return trips? Trips with multiple stops?
  • How will we handle dates and times? Your API should NOT assume you have a batch process periodically updating records.
  • How will you do repeatable tests when dealing with dates?

Test Data Seeds

You will need to maintain a set of test data to seed your database before every test. This can take multiple forms, such as:

  • An SQL file with many INSERT statements
  • A hardcoded Python file calling SQL commands

How you do this is up to you. You are welcome to add more data than we ask for to support any tests you write.

For DB1, you need the following data:

  • Drivers “Tom Magliozzi” and “Ray Magliozzi”. Their average ratings are 3.2 and 3.4 respectively. Both of their special instructions are “Don’t drive like my brother.”
  • A rider named “Mike Easter” whose average rating is 4.3. No special instructions.
  • Tom gave a ride to Mike
  • Ray gave a ride to Mike
  • Tom gave a ride to Ray

Test Case Sketches

  • The database is seeded with a test data set without crashing
  • When we list the rides that Tom gave, they include the rides to Mike and Ray
  • When we list the rides that Mike took, they include the rides from Tom and Ray
  • When we list the rides that Mike gave, we get no results
  • When Tom checks his rating, it returns the correct value

Note: make sure your tests are actually making calls to your API, not just directly to the database. Data will be seeded directly, but your tests will be testing the API. For example, there should be an API method for “When we list the rides…” and we call that.
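
A minimal sketch of that separation, under stated assumptions. The table and column names (users, rides, driver_id, rider_id), the function names seed_test_data and list_rides_given, and the choice of a single users table are all hypothetical. The standard-library sqlite3 module stands in for PostgreSQL only so the example runs on its own; with psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3
import unittest


def seed_test_data(conn):
    """Test-only helper: load the DB1 seed rows directly with SQL."""
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, avg_rating REAL);
        CREATE TABLE rides (
            id INTEGER PRIMARY KEY,
            driver_id INTEGER REFERENCES users(id),
            rider_id INTEGER REFERENCES users(id)
        );
        INSERT INTO users (id, name, avg_rating) VALUES
            (1, 'Tom Magliozzi', 3.2),
            (2, 'Ray Magliozzi', 3.4),
            (3, 'Mike Easter', 4.3);
        INSERT INTO rides (driver_id, rider_id) VALUES (1, 3), (2, 3), (1, 2);
    """)


def list_rides_given(conn, driver_name):
    """API method (would live in src/): names of the riders this driver drove."""
    rows = conn.execute(
        """
        SELECT rider.name
        FROM rides
        JOIN users AS driver ON driver.id = rides.driver_id
        JOIN users AS rider ON rider.id = rides.rider_id
        WHERE driver.name = ?
        ORDER BY rider.name
        """,
        (driver_name,),
    ).fetchall()
    return [name for (name,) in rows]


class TestRides(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database, seeded before every test
        self.conn = sqlite3.connect(":memory:")
        seed_test_data(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_rides_tom_gave(self):
        # The test calls the API method, not SQL
        self.assertEqual(
            list_rides_given(self.conn, "Tom Magliozzi"),
            ["Mike Easter", "Ray Magliozzi"],
        )

    def test_rides_mike_gave(self):
        self.assertEqual(list_rides_given(self.conn, "Mike Easter"), [])
```

The seeding helper issues SQL directly, while each test_* method only goes through the API method. Whether drivers and riders share one table remains your design decision; the single table here just keeps the sketch short.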

Not Necessary (yet)

For this iteration, you do NOT need to implement:

  • An API call for creating new accounts
  • Handling data related to location (e.g. GPS, region)
  • An API call for creating a new ride

You may need to do these in the future, but for now just test against your pre-seeded test data.

DB2: CRUD & Design iteration

Please call your topic branch db2-dev and your final tag db2.

Our goal here is to add basic CRUD operations to our persistence API. In this iteration, you will be:

  • Adding some more tables (if needed)
  • Adding or changing columns in existing tables (if needed)
  • Implementing some baseline methods for doing CRUD operations on your data
  • Adding some useful search/query methods

Remember, CRUD operations usually refer to a specific row, so you need to ensure you identify that row uniquely.

Keep your old unit tests and update them accordingly.

We do not need CRUD for everything. In “real life”, you can usually come up with a reason to set up CRUD for every entity in your schema. That would be too repetitive for us. Focus on what we are asking for.

We also need to begin providing reference documentation for how to use these DB methods. We would like you to use the Python Docstring methodology for documenting your API methods. For each method in your API, you must include:

  • A useful, single sentence about what the method does.
  • Name of each argument, what it means, and any default value
  • Access control assumptions you are making
  • What is returned? (e.g. python dict? psycopg2 result set?)
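
As an illustration of those four items, here is one possible docstring layout. The method name update_special_instructions, its arguments, and the in-memory dict standing in for a table are all hypothetical; the section headings are one common convention, not a required format.

```python
def update_special_instructions(accounts, account_id, instructions=""):
    """Replace the special instructions stored for one account.

    Args:
        accounts: Hypothetical in-memory stand-in for the accounts table,
            a dict mapping account id to a dict of column values.
        account_id: Primary key of the account to change.
        instructions: New instructions text. Defaults to "" (none).

    Access control:
        Assumes the caller has already verified that the logged-in user
        owns account_id; this method performs no permission check.

    Returns:
        A python dict holding the updated row, or None if no account
        has that id.
    """
    row = accounts.get(account_id)
    if row is None:
        return None
    row["special_instructions"] = instructions
    return dict(row)
```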

Grading DB2

By lab day:

  • Set up merge request by lab day (5 pts)
  • Enough functionality finished such that it can be thoroughly reviewed (5 pts)

By submission day:

  • Directions followed. e.g. Example removed, git branch/tag, etc. (5 pts)
  • Provided useful feedback to others (merge request feedback) (5 pts)
  • Responded to feedback on your own project (merge request) (5 pts)
  • Requirements implemented (20 pts)
  • CI successfully runs (5 pts)
  • Documentation complete (10 pts)
  • Test cases implemented and pass (20 pts)

DB2 Ride Sharing requirements

In this iteration, you will add:

  • Create new accounts for riders and drivers

  • Riders and drivers should be able to modify their account information

  • A rider and driver should be able to “remove” their account

    • Does this mean disabling or deleting? Consider this choice and implement accordingly.
  • Record a new ride.

    • A ride will always have a driver and rider, a starting point, and a destination
    • Add information to rides: destination (in GPS coordinates, i.e. two numbers, latitude and longitude), special ride instructions, time information
    • Add the concept of reviews for a ride. Driver gets to review a rider on a particular ride, and vice versa.
  • Availability.

    • A rider should be able to mark themselves as “available”. When this happens, assume that the client (e.g. phone) will automatically supply a zip code based on the rider’s location
    • A driver should be able to see all available riders in a given zip code, along with their GPS coordinates and average rating. (Presumably, a separate mapping service would translate those GPS coordinates into distances - we won’t do this)
  • When a ride is arranged, but before pickup, both the rider and driver should be able to update their own location and see the location of the other.

  • Load test data from the provided csv file and add to the database.

    • This contains a set of information on riders and drivers
    • NOTE: You will need to read the data and figure out a way to make it fit according to your DB schema (or do you need to modify your schema?). There can be inconsistencies in data format, some stray characters etc. Think through how to import this data using code, in a consistent, logical way.
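
One way to approach the import is to normalize each row in code before it reaches the database. The sketch below is only an assumption about what the cleanup might involve: the column names name, role and rating are hypothetical, and the sample rows are invented for illustration rather than taken from the provided file.

```python
import csv
import io


def clean_rows(csv_file):
    """Yield one normalized dict per usable CSV row.

    The column names 'name', 'role' and 'rating' are assumptions;
    the provided file may use different headers.
    """
    for raw in csv.DictReader(csv_file):
        # Strip stray whitespace from every header and value
        row = {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None  # ignore extra cells beyond the header
        }
        if not row.get("name"):
            continue  # skip rows with no name
        try:
            rating = float(row.get("rating", ""))
        except ValueError:
            rating = None  # unparseable rating: store NULL rather than guess
        yield {
            "name": row["name"],
            "role": row.get("role", "").lower(),
            "rating": rating,
        }


# Invented sample rows for illustration; not the provided file
sample = io.StringIO(
    "Name, Role ,Rating\n"
    "  Hoke Colburn ,Driver,4.5\n"
    "Ms. Daisy,rider,n/a\n"
    ",rider,3.0\n"
)
cleaned = list(clean_rows(sample))
```

Here cleaned holds two rows: the third sample row is dropped because it has no name, and the "n/a" rating becomes None. The cleaned dicts can then be passed to whatever insert method your API provides.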

Key Questions

  • Should average rating be computed every time we make an API call? Or store it somewhere?
  • Deactivate or delete accounts?
  • What information is necessary for representing a “ride”?
  • What is the flow of states for a ride? For example, what if a ride was booked but never happened?
  • The availability feature will get a lot more traffic than profile editing - do we need to account for this?
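
For the question about the flow of states, it can help to write the allowed transitions down explicitly. This is a sketch only: the state names and the rule that a ride can be cancelled at any point before it finishes are assumptions, not requirements of the project.

```python
from enum import Enum


class RideState(Enum):
    BOOKED = "booked"
    IN_PROGRESS = "in_progress"
    FINISHED = "finished"
    CANCELLED = "cancelled"  # e.g. booked, but the ride never happened


# Allowed moves; FINISHED and CANCELLED are terminal
TRANSITIONS = {
    RideState.BOOKED: {RideState.IN_PROGRESS, RideState.CANCELLED},
    RideState.IN_PROGRESS: {RideState.FINISHED, RideState.CANCELLED},
    RideState.FINISHED: set(),
    RideState.CANCELLED: set(),
}


def advance(current, new):
    """Return the new state, or raise ValueError on an illegal move."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new
```

Storing the state as a column and checking moves in one place like this makes "booked but never happened" a normal, queryable outcome instead of a missing row.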

A note about timestamps and APIs. In this project, we ask that your APIs allow for setting timestamps, which makes our testing setup easier. In practice, a more common approach is to have the database be the source of timestamps. For example, when a new ride is recorded, the database would set the current timestamp using CURRENT_TIMESTAMP (see the PostgreSQL docs) or something similar. To keep tests repeatable under that approach, web engineers use library calls that make the system “pretend” the current time is what the test data expects. An example of this is the travel_to method in Ruby on Rails, with a good explanation here. For us, rather than bringing in more libraries, you can just provide the timestamp in your API and assume that the front-end developer will supply the proper times.
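
A small sketch of what "provide the timestamp in your API" could look like. The function name mark_available, the at parameter, and the dict standing in for a table are hypothetical.

```python
from datetime import datetime


def mark_available(availability, rider_name, zip_code, at=None):
    """Record that a rider became available.

    `availability` is a hypothetical in-memory stand-in for a table.
    `at` lets a test supply the timestamp; when omitted, the current
    time is used.
    """
    if at is None:
        at = datetime.now()
    availability[rider_name] = {"zip_code": zip_code, "available_at": at}
    return availability[rider_name]
```

A test can then pass at=datetime(1989, 12, 13, 11, 55) and assert on exact values, with no dependence on when the test happens to run.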

Test Case Sketches

  • Tom and Ray Magliozzi updated their profiles (perform an update, and then verify the data is updated)
  • Hoke Colburn and Ms. Daisy signed up for accounts
  • Ms. Daisy marks herself as available in zip code 30301 on December 13, 1989 at 11:55am
  • Hoke Colburn gets a listing of available fares in zip code 30301 and can see Ms. Daisy in it
  • Ms. Daisy is able to see all drivers in zip code 30301.
  • Hoke Colburn drove Ms. Daisy to a location on December 13, 1989 at 12:00pm.
  • Both Ms. Daisy and Hoke rated each other for the ride
  • Tom Magliozzi drove Hoke Colburn to a location on December 14, 1989 at 4:00pm
  • Ms. Daisy, after all of this, is able to remove her account.

Not Necessary

PostgreSQL has some functionality for geographic information systems (GIS), called PostGIS. You don’t need to use this extension, but it is worth your time to look into the features it offers.

DB3: Expanding Your Schema

Please call your topic branch db3-dev and your final tag db3. All lowercase, hyphenated.

We have the core of our system down and our development infrastructure set up. Now it’s time to accelerate on features.

First, it’s time for a DTR: Define The Relationships. After reviewing these features, we recommend you sit down and determine your relationships on a piece of paper. Some people like to use the notation from Entity-Relationship Diagrams - we will not require this of you. Boxes for tables and lines with arrows for foreign keys would suffice. The minimum information you must show is: Each table; the fields in the table(s); the PK <-> FK relationships. Your “DTR” can be a picture of your hand-drawn diagram (but make sure it’s legible), or use any drawing tool to create a file. Make sure the name starts with DTR. If it’s a file, convert it to PDF, so we can read it. Add your document to your repo, and make sure it’s pushed to the repo, in the root of the directory.

Key decision for everyone. Continually ask yourself: “Should I do this in SQL or in Python?” Most things can be done in either. (Heck you can do inner joins in Python by doing nested loops, but… blech… don’t do that.) If your answer is “I know how to do it in Python but not SQL” - that’s a bad answer. If your answer is about what is simpler, more readable, maintainable, and performant - that’s a good answer. This project is a database API project, not just an SQL project.
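
To make the SQL-or-Python trade-off concrete, here is the same per-driver fare total computed both ways. The tables, names, and fares are invented for illustration, and the standard-library sqlite3 module stands in for PostgreSQL so the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rides (id INTEGER PRIMARY KEY, driver_id INTEGER, fare REAL);
    INSERT INTO drivers VALUES (1, 'Tom'), (2, 'Ray');
    INSERT INTO rides VALUES (1, 1, 10.0), (2, 1, 14.0), (3, 2, 8.0);
""")

# In Python: fetch both tables, then join them with nested loops (blech)
drivers = conn.execute("SELECT id, name FROM drivers").fetchall()
rides = conn.execute("SELECT driver_id, fare FROM rides").fetchall()
totals_py = {}
for driver_id, name in drivers:
    for ride_driver_id, fare in rides:
        if ride_driver_id == driver_id:
            totals_py[name] = totals_py.get(name, 0) + fare

# In SQL: one query; the database does the join and the sum
totals_sql = dict(conn.execute("""
    SELECT drivers.name, SUM(rides.fare)
    FROM drivers
    JOIN rides ON rides.driver_id = drivers.id
    GROUP BY drivers.name
"""))
```

Both approaches produce the same totals, but the SQL version transfers only the summary rows and leaves the join to the database.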

Keep Your Old Tests Passing! These new features might involve revising past features. You must keep your old test cases running, but you may need to adapt them in spirit to these new features. It’s okay if you need to change your old test cases, as long as the spirit of the test remains.

Good luck! This might be a tough one. Don’t be afraid to scrap your schema design ideas when they don’t work. That’s why we do branches, unit tests, and merge requests.

Grading DB3

By lab day:

  • Set up merge request by lab day (5 pts)
  • Enough functionality finished such that it can be thoroughly reviewed (5 pts)

By submission day:

  • Directions followed. e.g. git branch/tag, etc. (5 pts)
  • Provided useful feedback to others (merge request feedback) (5 pts)
  • Responded to feedback on your own project (merge request) (5 pts)
  • Requirements implemented (20 pts)
  • CI successfully runs (5 pts)
  • Documentation/DTR complete (10 pts)
  • Test cases implemented and pass (20 pts)

DB3 Rideshare: New Features

These features likely involve revising your existing implementation.

  • Carpooling to a single destination. A ride can have one driver, but multiple passengers now. A driver can keep themselves marked as “available” to pick up other riders. A ride will always have a single destination. Billing is split evenly across all riders.
  • Responding to Reviews. A driver and a rider can write a single written review for a given ride (as before in DB2), and now can also write one response to each other’s reviews. That is, a rider can respond to a driver’s review, and a driver can respond to a rider’s review. When reviews are obtained from the API, all relevant responses are returned too.
  • Receipts. A rider should be able to get a listing of their finished rides for a given date range, along with an accurate total of the money they have spent.
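
The billing and receipt rules above can be sketched in plain Python before deciding how much of the work belongs in SQL. Everything here is an assumption for illustration: the function names, the dict keys (riders, fare, finished, date), and the choice to round each share to whole cents.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def split_fare(total, rider_count):
    """Each rider's even share of a fare, rounded to whole cents."""
    share = Decimal(total) / rider_count
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def receipt(rides, rider, start, end):
    """Finished rides for `rider` between start and end (inclusive), plus the total."""
    mine = [
        r for r in rides
        if rider in r["riders"]
        and r["finished"]
        and start <= r["date"] <= end
    ]
    total = sum(
        (split_fare(r["fare"], len(r["riders"])) for r in mine),
        Decimal("0"),
    )
    return mine, total
```

Passing fares as strings or Decimal values avoids floating-point surprises. Note that rounded shares do not always add back up to the fare (three riders splitting $10, for example), so deciding who absorbs the leftover cent is a design choice to document.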

Test Data

  • Add users Alex, Bobby, Louie, Elaine, and Tony. They are all drivers.

Test Case Sketches

In each test scenario, perform the action(s), then validate that the DB is correctly updated after the action(s).

  • Godot (driver) marks himself available. Vladimir (rider) accepts the ride. Godot never shows up. No receipt is generated for either user.
  • Alex picks up Bobby to go to the airport for $12.
    • On the way, he picks up Louie, then Elaine, then Tony.
    • The receipt for each of the four riders is now $3.
    • Louie gives the ride a 2 and writes a bad review. Bobby reviews the ride and gives it a 5 with a good review. Alex responds to Louie’s review, but not Bobby’s.
  • Tony gives a ride to Alex. Elaine joins the ride. Tony then marks himself as unavailable. A search for available drivers does not include Tony.

Not Necessary

  • In carpooling, a rider does not need to be able to respond to the review of another rider, just to the driver’s review.

DB4: Analytics

Please call your topic branch db4-dev and your final tag db4. All lowercase, hyphenated.

Grading DB4

By lab day:

  • Set up merge request by lab day (5 pts)
  • Enough functionality finished such that it can be thoroughly reviewed (5 pts)

By submission day:

  • Directions followed. e.g. Example removed, git branch/tag, etc. (5 pts)
  • Provided useful feedback to others (merge request feedback) (5 pts)
  • Responded to feedback on your own project (merge request) (5 pts)
  • Requirements implemented (20 pts)
  • CI successfully runs (5 pts)
  • Documentation complete (10 pts)
  • Test cases implemented and pass (20 pts)

DB4 Rideshare: New Features

  • Full Ride Info. Aggregate rider information for all rides within 1 day of a given date. This includes the ride, and then a list of riders. (Tip: functions like array_agg can help create an array from a group.) Provide an average rating of the ride across all riders.
  • Fare Times. We want to be able to study the fares charged at different times of day. Provide a summary of hours and average fares. Hint: To get the hour part from a timestamp called metime in PostgreSQL, the syntax looks like EXTRACT(HOUR FROM metime). If an hour had no rides, then it does not need a row.
  • Future Plan. Create a file called FUTURE.md in the root of your repository. Using Markdown syntax, provide written answers to the following:
    • In the future, if we were to add the ability to have surge pricing to your system, what would need to change? For example, maybe we want to surge the prices when there’s a lot of demand in one area at one time
      • What tables need changing and/or adding?
      • What API methods would you provide?
      • How might existing API methods change?
    • In the future, if we were to add the ability to have future scheduling to your system, what would need to change? For example, a user would want to book a fare several days in advance
      • What tables need changing and/or adding?
      • What API methods would you provide?
      • How might existing API methods change?

Test Case Sketches

  1. Return and print the results of the Full Ride Info API. A Full Ride Info call might look something like this. Your test data may vary.

driver  dest_lat  dest_long  riders                                avg_rating
Alex    123.0     456.0      ['Bobby', 'Louie', 'Elaine', 'Tony']  3.5
Godot   1234.0    3456.0     ['Vladimir']                          0

In Python, that would look like:

[
  { 'driver': 'Alex', 'dest_lat': 123.0, 'dest_long': 456.0, 'riders': ['Bobby', 'Louie', 'Elaine', 'Tony'], 'avg_rating': 3.5 },
  { 'driver': 'Godot', 'dest_lat': 1234.0, 'dest_long': 3456.0, 'riders': ['Vladimir'], 'avg_rating': 0 },
]
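
If your query returns plain row tuples, a list of dicts like the one above can be built by pairing each row with its column names. This is a sketch under assumptions: the helper name rows_to_dicts is hypothetical, the rows are typed in by hand to mirror the example, and in a real call the column names would more likely come from your cursor than from a hardcoded list.

```python
def rows_to_dicts(column_names, rows):
    """Pair each value in a row tuple with its column name."""
    return [dict(zip(column_names, row)) for row in rows]


# Rows shaped like the example table above, as a DB driver might return them
columns = ["driver", "dest_lat", "dest_long", "riders", "avg_rating"]
rows = [
    ("Alex", 123.0, 456.0, ["Bobby", "Louie", "Elaine", "Tony"], 3.5),
    ("Godot", 1234.0, 3456.0, ["Vladimir"], 0),
]
full_ride_info = rows_to_dicts(columns, rows)
```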
  2. Return and print the results of the Fare Times API. For Fare Times, say you only had rides at 3:30pm for $5, 4:55pm for $10, and 4:59pm for $20, then your output would look like this:

Hour  Avg. Fare
3     5.0
4     15.0

In Python, that would look like:

[[3, 5.0],
 [4, 15.0]]
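
To work out the expected values for a Fare Times test, the same grouping can be done by hand in Python. This is only a cross-check for test data, not a replacement for the SQL query; the function name fare_times and the ride dates are invented. Note that datetime.hour, like EXTRACT(HOUR FROM ...), counts hours from 0 to 23, so 3:30pm lands in hour 15; the table above appears to label the same two groups with 12-hour values.

```python
from collections import defaultdict
from datetime import datetime


def fare_times(rides):
    """Average fare per hour of day, as [hour, average] pairs sorted by hour.

    `rides` is a list of (timestamp, fare) pairs. Hours with no rides
    produce no row.
    """
    by_hour = defaultdict(list)
    for when, fare in rides:
        by_hour[when.hour].append(fare)
    return [
        [hour, sum(fares) / len(fares)]
        for hour, fares in sorted(by_hour.items())
    ]


# The three rides from the example; the date itself is made up
rides = [
    (datetime(1989, 12, 14, 15, 30), 5.0),   # 3:30pm
    (datetime(1989, 12, 14, 16, 55), 10.0),  # 4:55pm
    (datetime(1989, 12, 14, 16, 59), 20.0),  # 4:59pm
]
summary = fare_times(rides)  # [[15, 5.0], [16, 15.0]]
```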