C Activity
Word Count
Overview
One of the simplest but most useful utilities on Unix/Linux is the word count (wc) program. This program reads one or more files and prints the number of lines, words, and characters found in each file. Without any command line arguments, wc reads its text from standard input. For example, here's a run of wc on the DarkAndStormyNight.txt file we'll be using in this activity:
    bash-3.2$ wc < DarkAndStormyNight.txt
    11 64 375
This shows that the file has 11 lines, 64 words, and 375 characters.
Your task for this activity is to duplicate the behavior of wc shown above (the number of spaces preceding each number need not match).
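The line and character counts are the most direct part of the task. Below is a minimal sketch of them, assuming the counting is pulled into a helper function; the names `line_char_counts` and `count_lines_and_chars` are hypothetical and are not part of the provided ritwc.c. It relies on two facts: wc counts one line per newline character, and every character read (newlines included) adds to the character total. The helper takes a `FILE *` so it can be tested on any stream; `getc(in)` behaves like `getchar()` when `in` is `stdin`, which is where the `<` redirection delivers the file's contents.

```c
#include <stdio.h>

/* Hypothetical helper type holding the two "easy" counts reported by wc. */
struct line_char_counts {
    long lines;  /* number of '\n' characters seen */
    long chars;  /* every character read, newlines included */
};

/* Read the stream until EOF, counting characters and newlines.
 * A main() would typically call this with stdin. */
struct line_char_counts count_lines_and_chars(FILE *in)
{
    struct line_char_counts counts = {0, 0};
    int c;  /* int, not char, so EOF can be told apart from real data */

    while ((c = getc(in)) != EOF) {
        counts.chars++;
        if (c == '\n') {
            counts.lines++;  /* a line is counted when its newline is read */
        }
    }
    return counts;
}
```

One consequence of counting newlines: text whose last line lacks a trailing newline reports one fewer line than you might expect, which matches how wc counts lines.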
Setup
- Create a new directory in your repository called WordCount.
- Download the ritwc.zip file to this directory and unpack it. You should see two files: DarkAndStormyNight.txt (§) and ritwc.c.
A few tips and reminders:
- Find PuTTY in the programs menu and use it to log on to nitron.se.rit.edu using your SE account and password.
- Nitron uses Ubuntu Linux; if you don't know much about Linux/Unix, now would be a good time to peruse the material in the course resources related to Linux commands.
- Your home directory on nitron is the same as your Z: drive on the Windows workstations. Remember that the directory pathname separator is a forward slash (/) on Linux rather than the backslash (\) used on Windows. In fact, Windows also recognizes the forward slash as a pathname separator.
- Remember to start the bash shell by typing:
    $ bash
- Tip: if you copy the link to the zip file from this browser page, you can download it directly to nitron with wget and unzip it with unzip.
- Tip: once the files are unzipped, commit them to Git so you have a record of how you started.
The Activity
- Using the text editor of your choice, edit the ritwc.c file and fill in the body of the main program with code that duplicates the functionality of wc. See the next section for some hints.
- On nitron, compile your program using the GNU C Compiler (gcc) as follows:
    gcc -o ritwc ritwc.c
- When you get a clean compile, the executable program will be named ritwc (that's what the -o (output) option is for). To test your program, execute the following commands:
    wc < DarkAndStormyNight.txt
    ./ritwc < DarkAndStormyNight.txt
- The first line runs the standard wc command so you can see what's expected. The second runs your program; the ./ prefix forces the command language interpreter (bash, the Bourne-Again Shell) to look in the current directory rather than in the directories that hold the standard system commands. In both lines, the < redirects the contents of DarkAndStormyNight.txt to the program's standard input.
- When you have a working program (or even just a skeleton that compiles), submit it via your Git repository as specified in the Submission section. Note that submitting anything will earn you some credit.
Hints & Suggestions
- What's in a word? More specifically, what is a word to wc? A word is a sequence of non-whitespace characters terminated by a whitespace character or end-of-file. Whitespace characters include the obvious space itself, along with tab, carriage return, line feed, and a few other control characters. The following excerpt from the sample file has 2 lines, 14 words, and 67 characters (each line ends with a newline, which counts as a character):
    It was a dark and stormy night;
    the rain fell in torrents - except
  The words are (quoted): 'It', 'was', 'a', 'dark', 'and', 'stormy', 'night;', 'the', 'rain', 'fell', 'in', 'torrents', '-', and 'except'.
- Note that (a) an empty line, or one containing only spaces, has no words on it, and (b) a line containing words may begin with whitespace characters, which are simply ignored.
- Peruse the online documentation for the three standard library headers referenced by the #include directives: stdlib.h, stdio.h, and ctype.h. Some of their functions will make your life easier; in particular, one of them makes it trivial to tell whether or not a character is whitespace.
- Note that getchar() returns an int, not a char. This matters because EOF is a negative int value, so it can never be mistaken for a character that getchar() actually reads (getchar() returns each character as an unsigned char value converted to int).
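The word rule from the hints above can be expressed as a small two-state loop: you are either inside a word or between words, and a word is counted at the moment you move from whitespace onto a non-whitespace character. This handles empty lines, lines of only spaces, and leading whitespace without any special cases. The sketch below is one possible approach, assuming `isspace()` from ctype.h as the whitespace test (in the default C locale it treats space, tab, newline, vertical tab, form feed, and carriage return as whitespace); the function name `count_words` is hypothetical. Real wc implementations may treat multibyte or non-printing characters differently, which this sketch ignores.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Count words as defined in the hints: maximal runs of non-whitespace
 * characters. Leading whitespace and blank lines add nothing. */
long count_words(FILE *in)
{
    long words = 0;
    bool in_word = false;  /* true while inside a run of non-whitespace */
    int c;                 /* int, because EOF is a negative int */

    while ((c = getc(in)) != EOF) {
        if (isspace(c)) {
            in_word = false;  /* whitespace ends any current word */
        } else if (!in_word) {
            in_word = true;   /* first character of a new word */
            words++;
        }
    }
    /* A word cut off by EOF was already counted when it began. */
    return words;
}
```

Because standard input cannot, in general, be read a second time (it may come from a pipe rather than a file), a finished ritwc would most likely update the line and character tallies inside this same loop rather than making a second pass over the input.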
Submission
- We will grade what is in your WordCount directory in myCourses (under Class Activities).
- To check that you have submitted, run these two commands:
    git status
  This makes sure everything has been committed. It should say "# On branch master" followed by "nothing to commit (working directory clean)".
    git push pushbox master
  If it says "Everything up-to-date", you have pushed everything and your work is submitted.
- You may commit and push as many times as you like. We will grade the latest version pushed by the submission deadline. To see the current time on the server, run:
    date
- Assessment (10 points)
  - (3) Submission exactly as specified.
  - (1) Submitted file compiles.
  - (1) Program produces correct character count.
  - (1) Program produces correct line count.
  - (2) Program produces correct word count.
  - (2) Program exhibits good craft practice:
    - Consistent indentation.
    - Clear, concise, correct, and informative comments (neither too much nor too little).
    - Thoughtful, meaningful naming of variables, constants, etc.
(§) This file contains the first sentence of Edward Bulwer-Lytton's excruciatingly bad 1830 novel Paul Clifford. The sentence's fame stems from Charles Schulz's comic strip Peanuts and the many strips in which Snoopy is writing a novel that starts "It was a dark and stormy night." It is also the inspiration for the annual Bulwer-Lytton Fiction Contest, sponsored by the English Department at San Jose State, a competition to create the worst opening sentence for a novel.