C Activity: Histogram
Overview
A histogram is a bar chart showing the relative frequency of occurrence of some data. For this activity you'll write a program to display a histogram showing the relative frequency of letters in a document (specifically, Tom Sawyer).
Setup
- Create a new directory in your repository called Histogram for this activity.
- Download the histogram.zip file to this directory and unpack it. You should see three files: TomSawyer.txt, histo_output.txt, and histogram.c.
A few tips and reminders:
- Find PuTTY in the programs menu and use it to log on to nitron.se.rit.edu using your SE account and password.
- Linus uses Ubuntu Linux; if you don't know much about Linux/Unix, now would be a good time to peruse the material in the course resources related to Linux commands.
- Your home directory on linus is the same as your Z: drive on the Windows workstations. Remember that the directory pathname separator is the forward slash (/) on Linux, rather than the backslash (\) used on Windows. In fact, Windows will also recognize the forward slash as a pathname separator.
- Remember to set the bash shell: $ bash
- Tip: if you copy the link to the zip from this browser page, you can download the zip file directly to nitron with wget and unzip it with unzip.
- Tip: once unzipped, commit the files to Git so you have a baseline of where you started.
The Activity
- Using the text editor of your choice, edit the histogram.c file and fill in the bodies of the main and print_stars functions with code to produce a histogram identical to that given in histo_output.txt.
- Your program must read from standard input and write to standard output.
- You must count both lower- and upper-case letters, ignoring any other characters.
- You must "fold" lower-case letters onto their upper-case equivalents. (Character counts are not case sensitive: ‘A’ = ‘a’.)
- You must accumulate the per-letter counts in the count array, where count[0] is for A, count[1] for B, up to count[25] for Z.
- You must scale the final counts so that the largest count prints MAXSTARS (70) asterisks, and all the other counts print proportionally fewer asterisks.
  Thus, if the letter with the highest count is Q at 210, while X has a count of 30, then Q would print
  (MAXSTARS * 210) / 210 = (70 * 210) / 210 = 70 asterisks,
  while X would print
  (MAXSTARS * 30) / 210 = (70 * 30) / 210 = 10 asterisks.
- You must complete the print_stars function and call it from the main function to print out the appropriate number of asterisks for each letter.
- On linus, compile your program using the GNU C Compiler (gcc) as follows:
gcc -o histogram histogram.c
- When you get a clean compile, the executable program will be named histogram (that's what the -o option is for). To test your program, execute the following command:
./histogram < TomSawyer.txt
- Compare your program's output with the output provided in histo_output.txt. Your output may differ by an asterisk or two on some of the lines because of differences in the algorithms computing the number of asterisks to print. The upper-case letters labeling each line and the general "shape" of the histogram chart should be compatible with histo_output.txt.
- When you have a working program (or even when you have just a skeleton that compiles), submit it to the activity drop box as specified in the Submission section. Note that submitting anything will earn you some credit.
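The scaling rule in the list above can be sketched as follows. This is only an illustration, not the required solution: the helper name scaled_stars and the use of long for the counts are assumptions, and MAXSTARS is redefined here only so the fragment compiles on its own (the provided histogram.c may already define it).

```c
#include <assert.h>

#define MAXSTARS 70   /* width of the longest bar, as stated in the activity */

/*
 * Hypothetical helper (not part of the provided histogram.c):
 * returns how many asterisks a letter with `count` occurrences gets
 * when the most frequent letter occurred `max_count` times.
 */
int scaled_stars(long count, long max_count)
{
    if (max_count <= 0) {
        return 0;   /* no letters were read: avoid dividing by zero */
    }
    assert(count >= 0 && count <= max_count);

    /* Multiply before dividing; (count / max_count) * MAXSTARS would
     * truncate to 0 for every letter except the most frequent one. */
    return (int)((MAXSTARS * count) / max_count);
}
```

With the numbers from the example, scaled_stars(210, 210) gives 70 and scaled_stars(30, 210) gives 10. Because integer division truncates, a different rounding choice can shift some lines by an asterisk, which is the kind of difference the comparison step allows for.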
Hints & Suggestions
- The only input function you need is getchar(), and the only output function is printf(), both of which are included via stdio.h.
- Note that getchar() returns an int, not a char; this is important as EOF is actually a negative number (which cannot be the code for a legal ASCII character).
- The ctype.h interface includes several functions to classify the input characters and possibly transform them. Pay particular attention to isalpha, islower, and toupper.
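One possible way to combine these hints is sketched below. The function names tally_letter and count_stdin, the NLETTERS constant, and the long element type are assumptions made for illustration; the count array in the provided histogram.c may be declared differently. The sketch also assumes the letters A to Z have consecutive character codes, as they do in ASCII.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define NLETTERS 26   /* hypothetical name for the size of the count array */

/* Hypothetical helper: record one input character in the count array.
 * Non-letters are ignored; lower-case letters are folded to upper case. */
void tally_letter(int c, long count[NLETTERS])
{
    assert(count != NULL);

    if (isalpha(c)) {
        int upper = toupper(c);
        /* Guard against letters outside A-Z in non-default locales. */
        if (upper >= 'A' && upper <= 'Z') {
            count[upper - 'A']++;   /* count[0] is A ... count[25] is Z */
        }
    }
}

/* Hypothetical helper: read standard input until EOF, tallying letters. */
void count_stdin(long count[NLETTERS])
{
    int c;   /* int, not char, so that EOF can be told apart from data */

    while ((c = getchar()) != EOF) {
        tally_letter(c, count);
    }
}
```

Keeping the per-character logic in its own function is just one design choice; the same test can be written inline in the main loop.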
Submission
- The directory in your repository is Histogram.
- Your source file must be named precisely histogram.c.
- It's always a good idea to check the pushbox after depositing anything to make sure your file(s) made it safe and sound.
- Run git status to make sure everything has been committed (it should say "# On branch master" followed by "nothing to commit (working directory clean)").
- Run git push pushbox master; if it says "Everything up-to-date", everything has been pushed and submitted.
Assessment (10 points)
- (2) Submission exactly as specified.
- (1) Good source control practices.
- (1) Submitted file compiles.
- (1) Program produces 26 lines, labeled with upper-case letters.
- (2) Each of the 26 lines has a row of asterisks showing the letter's relative frequency.
- (1) Program utilizes simple, straightforward algorithms for all its computations.
- (2) Program exhibits good craft practice:
  - Consistent indentation.
  - Clear, concise, correct and informative comments - neither too much nor too little.
  - Thoughtful, meaningful naming of variables, constants, etc.