C Activity
Word Count

Overview

One of the simplest yet most useful utilities on Unix / Linux is the word count (wc) program. It reads one or more files and prints the number of lines, words, and characters found in each file. Without any command line arguments, wc reads its text from standard input. For example, here's a run of wc on the DarkAndStormyNight.txt file we'll be using in this activity:

bash-3.2$ wc < DarkAndStormyNight.txt
 11  64 375


This shows that the file has 11 lines, 64 words, and 375 characters.

Your task for this activity is to duplicate the behavior of wc shown above (the number of spaces preceding each number need not match).

Setup

  1. Create a new directory in your repository called WordCount.
  2. Download the ritwc.zip file to this directory and unpack it. You should see two files, DarkAndStormyNight.txt (§) and ritwc.c.

A few tips and reminders:

The Activity

  1. Using the text editor of your choosing, edit the ritwc.c file and fill in the body of the main program with code to duplicate the functionality of wc. See the Hints & Suggestions section for some hints.
  2. On nitron, compile your program using the GNU C Compiler (gcc) as follows:
        gcc -o ritwc ritwc.c
  3. When you get a clean compile, the executable program will be named ritwc; that is what the -o (output) option is for. To test your program, execute the following commands:
        wc < DarkAndStormyNight.txt
        ./ritwc < DarkAndStormyNight.txt
  4. The first line runs the standard wc command so you can see what's expected. The second runs your program; the ./ forces the command language interpreter (bash, or the Bourne-Again Shell) to look in the current directory rather than the directories for standard system commands.
  5. When you have a working program (or even when you have just a skeleton that compiles), submit it via your Git repository as specified in the Submission section. Note that submitting anything will earn you some credit.

Hints & Suggestions

  1. What's in a word? More specifically, what is a word to wc? A word is a sequence of non-whitespace characters terminated by a whitespace character or end-of-file. Whitespace characters include the obvious "space" itself, along with tabs, carriage-return, line-feed and some other control characters. The following, from the sample file, has 2 lines, 14 words, and 67 characters:

    It was a dark and stormy night;
    the rain fell in torrents - except


    The words are (quoted): 'It', 'was', 'a', 'dark', 'and', 'stormy', 'night;', 'the', 'rain', 'fell', 'in', 'torrents', '-', and 'except'.
  2. Note that (a) an empty line, or one with only spaces, has no words on it, and (b) a line containing words may begin with whitespace characters which are simply ignored.
  3. Peruse the online documentation for the three library headers referenced via the #include directives: stdlib.h, stdio.h and ctype.h. Some of the functions they declare will make your life easier. In particular, there is one that makes it trivial to tell whether or not a character is whitespace.
  4. Note that getchar() returns an int, not a char. This matters because EOF is a negative number, which cannot be the code for a legal ASCII character. If you store the result in a char, EOF may become indistinguishable from a real character, or may never compare equal to EOF at all.
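The hints above can be combined into a short sketch. This is one possible arrangement, not the required design: the names struct counts, count_char and count_stdin are made up for illustration and are not part of ritwc.c. It assumes a word is counted when its first non-whitespace character is seen, which gives the same totals as counting it when the terminating whitespace or end-of-file arrives.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Running totals plus the one bit of state the word rule needs.
 * (Hypothetical type; not provided by the activity's starter file.) */
struct counts {
    long lines;
    long words;
    long chars;
    bool in_word;   /* true while the previous character was non-whitespace */
};

/* Update the totals for one character taken from the input.
 * ch must be a value returned by getchar() other than EOF. */
void count_char(struct counts *c, int ch)
{
    assert(c != NULL && ch != EOF);

    c->chars++;
    if (ch == '\n') {
        c->lines++;                  /* a line ends at each line-feed */
    }
    if (isspace(ch)) {
        c->in_word = false;          /* whitespace ends any current word */
    } else if (!c->in_word) {
        c->in_word = true;           /* first character of a new word */
        c->words++;
    }
}

/* Count everything on standard input until end-of-file. */
struct counts count_stdin(void)
{
    struct counts c = {0, 0, 0, false};
    int ch;                          /* int, not char, so EOF stays distinct */

    while ((ch = getchar()) != EOF) {
        count_char(&c, ch);
    }
    return c;
}
```

A main function would only need to call count_stdin() and print the three totals with printf. Keeping the per-character rule in its own function makes it easy to check against the two-line sample above (2 lines, 14 words, 67 characters) without typing input by hand.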

Submission

  1. We will grade what is in your WordCount directory in myCourses (under Class Activities).
  2. To check that you have submitted, run these two commands:
    • git status - to make sure everything has been committed (should say "# On branch master nothing to commit (working directory clean)")
    • git push pushbox master - if it says "Everything up-to-date", you have everything pushed and submitted
  3. You may commit and push as many times as you like. We will grade the latest version pushed before the submission deadline. If you want to see the time on the server, the command is date.
  4. Assessment (10 points)
    1. (3) Submission exactly as specified.
    2. (1) Submitted file compiles.
    3. (1) Program produces correct character count.
    4. (1) Program produces correct line count.
    5. (2) Program produces correct word count.
    6. (2) Program exhibits good craft practice:
      1. Consistent indentation.
      2. Clear, concise, correct and informative comments, neither too many nor too few.
      3. Thoughtful, meaningful naming of variables, constants, etc.

(§) This file has the first sentence from Edward Bulwer-Lytton's excruciatingly bad 1830 novel Paul Clifford. Its fame stems from Charles Schulz's comic Peanuts, and the many strips where Snoopy is writing a novel that starts "It was a dark and stormy night." It is also the inspiration for the annual Bulwer-Lytton Fiction Contest, sponsored by the English Department at San Jose State University, a competition to create the worst opening sentence for a novel.