C Activity
CSV Parser With Tests

A common text format for structured tabular data is comma-separated values (CSV). Most database systems and spreadsheets provide an option to save their data in this format. Simply put, a CSV file consists of a sequence of lines, where each line contains one or more fields separated by commas. Note that a line with N fields has (N-1) commas.

Example #1

Last,First,Email,NumGrade,Letter
Awesome,Abby,axa@foo.edu,95.6,A
Better,Bobby,bnb@foo.edu,82.3,B
Doofus,Donald,ddd@foo.edu,64.4,D

Example #2

Food,Amount,Calories
Peanut Butter,tbsp,95
Whole Milk,cup,146
Ho Hos,serving,370

As these examples show, a CSV file often has a header line that labels the fields in the following lines. For this activity we will only consider CSV files with header lines.
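The relationship between fields and commas noted earlier can be checked with a few lines of C. The sketch below is only an illustration: count_fields() is a hypothetical helper that is not part of csv.zip, and it works on a line already held in a string rather than on standard input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of csv.zip): counts the fields in one
 * CSV line held in a NUL-terminated string, using the rule that a line
 * with N fields contains N-1 commas. Scanning stops at a newline. */
int count_fields(const char *line)
{
    assert(line != NULL);       /* caller must pass a valid string */

    int commas = 0;
    for (const char *p = line; *p != '\0' && *p != '\n'; p++) {
        if (*p == ',') {
            commas++;
        }
    }
    return commas + 1;          /* even an empty line has one (empty) field */
}
```

For the header line of Example #2, which has two commas, this helper reports three fields.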

Setup

Download csv.zip, which contains the code you will complete to parse and print a CSV file read from standard input.

The following constants and structure will be used to represent each parsed CSV line:

#define MAX_FIELDS (15)
#define MAX_CHARS (20)

typedef char f_string[MAX_CHARS + 1] ;

typedef struct {
    int nfields ;
    f_string field[MAX_FIELDS] ;
} csv_line ;


A csv_line is a struct holding a field count, nfields, and up to MAX_FIELDS fields. Each field is of type f_string, an array of chars that can store at most MAX_CHARS characters plus a terminating NUL ('\0'). You must assume that a field may be empty, that is, contain only a NUL. This is the case, for instance, when a line contains two consecutive commas or consists of only a newline character.
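To make the layout concrete, the sketch below builds by hand the csv_line that an input line such as "Ho Hos,,370" should parse to. The constants and types are copied from above. The functions field_is_empty() and example_line() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS (15)
#define MAX_CHARS (20)

typedef char f_string[MAX_CHARS + 1];

typedef struct {
    int nfields;
    f_string field[MAX_FIELDS];
} csv_line;

/* Hypothetical helper: returns 1 if field i of the line holds only a NUL. */
int field_is_empty(const csv_line *line, int i)
{
    assert(line != NULL && i >= 0 && i < line->nfields);
    return line->field[i][0] == '\0';
}

/* Builds the structure expected for the input "Ho Hos,,370":
 * three fields, with the middle one empty. */
csv_line example_line(void)
{
    csv_line line;
    memset(&line, 0, sizeof line);      /* every field starts out as "" */
    line.nfields = 3;
    strcpy(line.field[0], "Ho Hos");
    strcpy(line.field[2], "370");
    return line;
}
```

Note that the two consecutive commas still count as a field boundary, so nfields is 3 even though only two fields contain text.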

You must complete the bodies of the following three functions in csv.c:

int get_field(f_string field) ;

Fills in the field array with the next field from standard input, ensuring the field is properly NUL terminated. A field ends when getchar() returns (1) a comma (','), (2) a newline ('\n'), or (3) EOF.
The (provided) helper function is_end_of_field(int ch) returns true if and only if the character ch is one of these terminators.
get_field() returns the character that ended the field it read.
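One possible shape for this function is sketched below; treat it as an illustration under stated assumptions, not as the reference solution. To keep the sketch easy to exercise, the work is done by get_field_from(), a hypothetical variant that reads from any stream, and get_field() simply calls it with stdin. The body of is_end_of_field() is written from the description above and may differ from the provided one. The handout does not say what to do with a field longer than MAX_CHARS; this sketch assumes the extra characters are read and discarded.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHARS (20)

typedef char f_string[MAX_CHARS + 1];

/* Stand-in for the provided helper, written from its description. */
int is_end_of_field(int ch)
{
    return ch == ',' || ch == '\n' || ch == EOF;
}

/* Hypothetical variant of get_field() that reads from any stream.
 * Returns the character (or EOF) that ended the field. */
int get_field_from(FILE *in, f_string field)
{
    assert(in != NULL);

    int len = 0;
    int ch = getc(in);
    while (!is_end_of_field(ch)) {
        if (len < MAX_CHARS) {      /* assumption: drop excess characters */
            field[len++] = (char)ch;
        }
        ch = getc(in);
    }
    field[len] = '\0';              /* always NUL terminate, even if empty */
    return ch;
}

/* Same signature as the assignment's function; reads standard input. */
int get_field(f_string field)
{
    return get_field_from(stdin, field);
}
```

The character is kept in an int rather than a char so that EOF can be told apart from every valid character value.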


csv_line get_line() ;

Reads and splits the next CSV input line into its constituent fields, returning the resulting csv_line structure. The function works by repeatedly calling get_field() with the successive field arrays to be filled in, stopping when get_field() returns a newline or EOF.
Note that any legal line, even an empty one, has at least one field; thus, end of file is indicated by setting nfields to 0 in the structure get_line() returns.
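The loop described above might look like the following sketch. It is self-contained for illustration, so it uses read_field(), a compact stream-based stand-in for get_field(), and get_line_from(), a hypothetical stream-based variant of get_line(). Two points are assumptions rather than requirements from the handout: end of file is recognized as an empty first field terminated by EOF, and fields beyond MAX_FIELDS are read into a scratch buffer and dropped.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_FIELDS (15)
#define MAX_CHARS (20)

typedef char f_string[MAX_CHARS + 1];

typedef struct {
    int nfields;
    f_string field[MAX_FIELDS];
} csv_line;

/* Compact stand-in for get_field(), reading from a stream. */
static int read_field(FILE *in, f_string field)
{
    int len = 0;
    int ch;
    while ((ch = getc(in)) != ',' && ch != '\n' && ch != EOF) {
        if (len < MAX_CHARS) {
            field[len++] = (char)ch;
        }
    }
    field[len] = '\0';
    return ch;
}

/* Hypothetical stream-based variant of get_line(). */
csv_line get_line_from(FILE *in)
{
    assert(in != NULL);

    csv_line line = { 0 };      /* nfields is 0, every field is "" */
    f_string scratch;           /* receives fields beyond MAX_FIELDS */
    int stored = 0;
    int ch;

    do {
        char *dest = (stored < MAX_FIELDS) ? line.field[stored] : scratch;
        ch = read_field(in, dest);
        if (stored == 0 && ch == EOF && dest[0] == '\0') {
            return line;        /* nothing left to read: nfields stays 0 */
        }
        if (stored < MAX_FIELDS) {
            stored++;
        }
    } while (ch == ',');        /* a newline or EOF ends the line */

    line.nfields = stored;
    return line;
}
```

With this approach, a final line that lacks a trailing newline is still returned with its fields, and only the following call reports end of file.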


void print_csv(csv_line header, csv_line data) ;

Prints label / value pairs, where header is the parsed version of the first input line and data is the parsed version of one of the following lines. For instance, the first data line from the CSV file in Example #2 above would be printed as:

Food = Peanut Butter
Amount = tbsp
Calories = 95

If the header and data lines differ in the number of fields, then the number of pairs printed is the minimum of these two field counts. See the provided helper function min(int x, int y).
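A minimal sketch of the printing logic follows. The helper print_csv_to() is a hypothetical variant that writes to any stream and returns the number of pairs printed, which makes it easy to check; print_csv() keeps the assignment's signature and forwards to it. The body of min() is a stand-in for the provided helper. The output format is inferred from the example above, so compare against the expected output file before relying on it.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_FIELDS (15)
#define MAX_CHARS (20)

typedef char f_string[MAX_CHARS + 1];

typedef struct {
    int nfields;
    f_string field[MAX_FIELDS];
} csv_line;

/* Stand-in for the provided helper. */
int min(int x, int y)
{
    return (x < y) ? x : y;
}

/* Hypothetical variant of print_csv() that writes to any stream and
 * returns the number of label / value pairs printed. */
int print_csv_to(FILE *out, csv_line header, csv_line data)
{
    assert(out != NULL);

    int pairs = min(header.nfields, data.nfields);
    for (int i = 0; i < pairs; i++) {
        fprintf(out, "%s = %s\n", header.field[i], data.field[i]);
    }
    return pairs;
}

/* Same signature as the assignment's function; writes to standard output. */
void print_csv(csv_line header, csv_line data)
{
    print_csv_to(stdout, header, data);
}
```

If the header has three fields and a data line has only two, this sketch prints two pairs and ignores the unmatched header label.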

Processing

The (provided) main function reads the first header line, then reads and prints the successive data lines using print_csv().

Compiling and Testing

To compile for unit tests:
gcc -o test -g -Wall csv.c unit_tests.c

To compile for normal execution:
gcc -o csv -g -Wall csv.c unit_tests.c

To run unit tests (all unit tests MUST pass for full credit):
./test

To run normally:
./csv <food.csv >actual.out

To compare output:
diff actual.out expected_food_csv.out

You can download the expected_food_csv.out file here

Activity Journal

Fill out an ActivityJournal.txt.

Continuous Integration with GitLab

You will also learn about continuous integration, where tests run automatically each time you update and push your code. This is an industry best practice, and learning the basics is an important part of the SE journey. Follow the instructions in the CI Intro page.

Note that these instructions are specific to the csv assignment. As you change assignments, you will need to modify the appropriate variables in the .gitlab-ci.yml file.

Submission

Place your completed files, along with your activity journal, in a directory named csv at the top level of your git repo.

Bonus Opportunity

Create a Makefile that builds both the test version and the normal (csv) version for an additional 5%. If your Makefile also executes the unit tests automatically, you earn 5% more, for a total possible bonus of 10%. Submit your Makefile in the same csv directory.

Your Makefile must create both the test and csv executables, compile and link with the -g switch, and must compile with the -Wall switch.

NOTE: To get full bonus credit, all of this must happen when you type make with no parameters.