C Activity
Histogram

Overview

A histogram is a bar chart showing the relative frequency of values in a data set. For this activity, you'll write a program that displays a histogram of the relative frequency of letters in a document (specifically, Tom Sawyer).

Setup

  1. Create a new directory in your repository called Histogram for this activity.
  2. Download the histogram.zip file to this directory and unpack it. You should see three files: TomSawyer.txt, histo_output.txt, and histogram.c.

A few tips and reminders:

The Activity

  1. Using the text editor of your choice, edit histogram.c and fill in the bodies of the main and print_stars functions with code that produces a histogram identical to the one given in histo_output.txt.
    1. Your program must read from standard input and write to standard output.
    2. You must count both lower- and upper-case letters, ignoring any other characters.
    3. You must "fold" lower-case letters onto their upper-case equivalents. (Character counts are not case sensitive, ‘A’ = ‘a’)
    4. You must accumulate the per-letter counts in the count array, where count[0] is for A, count[1] for B, up to count[25] for Z.
    5. You must scale the final counts so that the largest count prints MAXSTARS (70) asterisks, and all the other counts print proportionally fewer asterisks.
      Thus, if the letter with the highest count is Q at 210, while X has a count of 30, then Q would print
           (MAXSTARS * 210) / 210 = (70 * 210) / 210 = 70 asterisks,
      while X would print
           (MAXSTARS * 30) / 210 = (70 * 30) / 210 = 10 asterisks.
    6. You must complete the print_stars function and call it from the main function to print out the appropriate number of asterisks for each letter.
  2. On linus, compile your program with the GNU C Compiler (gcc) as follows:
        gcc -o histogram histogram.c
  3. When you get a clean compile, the executable program will be named histogram (that's what the -o option is for). To test your program, execute the following command:
        ./histogram < TomSawyer.txt
  4. Compare your program's output with the output provided in histo_output.txt. Your output may differ by an asterisk or two on some lines because of differences in how the number of asterisks is computed. The upper-case letters labeling each line and the general "shape" of the histogram should match histo_output.txt.
  5. When you have a working program (or even when you have just a skeleton that compiles), submit it to the activity drop box as specified in the Submission section. Note that submitting anything will earn you some credit.
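
The scaling rule in step 1.5 can be sketched with integer arithmetic. This is an illustration only: the helper name scale_count and the use of long are assumptions and are not taken from the provided histogram.c; only MAXSTARS (70) comes from the activity. The point it shows is that the multiplication has to happen before the division, because integer division discards the fractional part.

```c
#include <assert.h>

#define MAXSTARS 70   /* value given in the activity */

/* Hypothetical helper (not from histogram.c): returns how many asterisks
 * a letter with `count` occurrences should get when the most frequent
 * letter has `max_count` occurrences. */
long scale_count(long count, long max_count)
{
    /* The caller must handle input with no letters at all, where
     * max_count would be 0 and the division below would be undefined. */
    assert(max_count > 0);
    assert(count >= 0 && count <= max_count);

    /* Multiply first: (70 * 30) / 210 = 10, whereas 70 * (30 / 210) = 0
     * because 30 / 210 truncates to 0 in integer arithmetic. */
    return (MAXSTARS * count) / max_count;
}
```

With the numbers from the example above, scale_count(210, 210) gives 70 and scale_count(30, 210) gives 10.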

Hints & Suggestions

  1. The only input function you need is getchar(), and the only output function is printf(), both of which are included via stdio.h.
  2. Note that getchar() returns an int, not a char; this is important as EOF is actually a negative number (which cannot be the code for a legal ASCII character).
  3. The ctype.h interface includes several functions to classify the input characters and possibly transform them. Pay particular attention to isalpha, islower, and toupper.
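
The hints above can be combined into a short sketch of reading and classifying characters. The function names letter_index and tally_stdin, and the choice of long for the counters, are hypothetical and not part of the provided histogram.c; the sketch also assumes the default "C" locale and a character set in which 'A' through 'Z' are contiguous, as in ASCII.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from histogram.c): map a value returned by
 * getchar() to an index 0..25, or -1 if it is not a letter. */
int letter_index(int c)
{
    if (c == EOF || !isalpha(c))
        return -1;
    /* Fold lower case onto upper case, then offset from 'A'.
     * Assumes 'A'..'Z' are contiguous, as in ASCII. */
    return toupper(c) - 'A';
}

/* Hypothetical helper: tally letters from standard input into
 * count[0..25], where count[0] is for A and count[25] is for Z. */
void tally_stdin(long count[26])
{
    int c;  /* int, not char, so EOF stays distinct from every character */

    assert(count != NULL);
    while ((c = getchar()) != EOF) {
        int i = letter_index(c);
        if (i >= 0)
            count[i]++;
    }
}
```

Storing the result of getchar() in a char instead of an int could make the loop end early or never end, depending on whether char is signed on the platform.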

Submission

  1. Your work must be in the Histogram directory of your repository.
  2. Your source file must be named precisely histogram.c.
  3. It's always a good idea to check the pushbox after depositing anything to make sure your file(s) made it safe and sound.
    • git status - to make sure everything has been committed (should say "# On branch master nothing to commit (working directory clean)")
    • git push pushbox master - if it says "Everything up-to-date", you have everything pushed and submitted

Assessment (10 points)

    1. (2) Submission exactly as specified.
    2. (1) Good source control practices.
    3. (1) Submitted file compiles.
    4. (1) Program produces 26 lines, labeled with upper case letters.
    5. (2) Each line in item 4 has a row of asterisks showing the letter's relative frequency.
    6. (1) Program utilizes simple, straightforward algorithms for all its computations.
    7. (2) Program exhibits good craft practice:
      1. Consistent indentation.
      2. Clear, concise, correct and informative comments - neither too much nor too little.
      3. Thoughtful, meaningful naming of variables, constants, etc.
