SWEN-250

C Activity

Word Count

Overview

One of the simplest yet most useful utilities on Unix/Linux is the word count (wc) program. This program reads one or more files and prints the number of lines, words, and characters found in each file. Without any command line arguments, wc reads its text from standard input. For example, here's a run of wc on the DarkAndStormyNight.txt file (*) we'll be using in this activity:

bash-3.2$ wc < DarkAndStormyNight.txt
 11  64 375


This shows that the file has 11 lines, 64 words, and 375 characters.

Your task for this activity is to exactly duplicate the behavior of wc shown above.

Setup

  1. Create a new directory in your repository called CWordCount.
  2. Use wget to download the ritwc.zip file to this directory and unpack it. In the browser, right-click the zip link to copy its URL. Then run wget in the hamilton window to fetch the zip file (right-click in the hamilton window to paste the URL). After unpacking, you should see two files, DarkAndStormyNight.txt and ritwc.c.

The Activity

  1. Edit the ritwc.c file and fill in the body of the main program with code to duplicate the functionality of wc. See the Hints & Suggestions section for some hints.
  2. On hamilton, compile your program using the GNU C Compiler (gcc) as follows:
        gcc -o ritwc ritwc.c
  3. When you get a clean compile, the executable program will be named ritwc (that's what the -o, or output, option is for). To test your program, execute the following commands:
        wc < DarkAndStormyNight.txt
        ./ritwc < DarkAndStormyNight.txt
  4. The first command runs the standard wc command so you can see what's expected. The second runs your program; the ./ prefix tells the command language interpreter (bash, the Bourne-Again Shell) to look in the current directory rather than in the directories for standard system commands.
  5. In step 3 above, your output must exactly match the output of the Linux wc utility.

Submission

Submit your source file ritwc.c and updated ActivityJournal.txt in a directory named CWordCount to your Git repo.

Grading Criteria

To receive full credit for this activity, you must do the following:

  1. Submit your work in a correctly named directory. This must be one of the top level directories in your repository.
  2. Use the exact filenames for both files: ritwc.c and ActivityJournal.txt.
  3. The program must compile without any warnings.
  4. The program must compile exactly as shown in Step 2 of the activity.
    • You cannot use any other options.
    • Do not use the -std=c99 option.
  5. The output must match the expected output including the exact formatting.
  6. Use good software style including consistent indentation and appropriate variable names.
  7. Use a reasonable amount of comments describing the purpose of each section of code.
  8. Complete the Activity Journal including the time estimate, plan, actual time, and observations.

Hints & Suggestions

  1. What's in a word? More specifically, what is a word to wc? A word is a sequence of non-whitespace characters terminated by a whitespace character or end-of-file. Whitespace characters include the obvious "space" itself, along with tabs, carriage returns, line feeds, and some other control characters. The following, from the sample file, has 2 lines, 14 words, and 67 characters:

    It was a dark and stormy night;
    the rain fell in torrents - except


    The words are (quoted): 'It', 'was', 'a', 'dark', 'and', 'stormy', 'night;', 'the', 'rain', 'fell', 'in', 'torrents', '-', and 'except'.
  2. Note that (a) an empty line, or one with only spaces, has no words on it, and (b) a line containing words may begin with whitespace characters which are simply ignored.
  3. Peruse the online documentation for the three libraries referenced via the #include directives: stdlib.h, stdio.h, and ctype.h. Some of these library functions will make your life easier. In particular, there is one that makes it trivial to tell whether or not a character is whitespace.
  4. Note that getchar() returns an int, not a char; this is important because EOF is actually a negative number (which cannot be the code for a legal ASCII character).

(*) This file has the first sentence from Edward Bulwer-Lytton's excruciatingly bad 1830 novel Paul Clifford. Its fame stems from Charles Schulz's comic Peanuts, and the many strips where Snoopy is writing a novel that starts "It was a dark and stormy night." It is also the inspiration for the annual Bulwer-Lytton Fiction Contest, sponsored by the English Department at San Jose State, a competition to create the worst opening sentence for a novel.
