Introduction to Ruby – histogram.rb
Setup
For this activity you will write a Ruby script, histogram.rb, that (eventually) produces a histogram
showing the frequency of occurrence of words in a text file. Your script will read text from
standard input (using a while loop with gets) and print its results to standard output.
Thus it will act as a filter: it doesn't care what the source of the words is, nor does it
care what happens with the results it produces.
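The filter idea can be sketched as follows. This is only an illustration, not the assignment solution: the method name echo_lines and its input/output parameters are assumptions introduced here so the loop can be tried on any IO object. In histogram.rb itself the loop would call gets directly and print to standard output.

```ruby
require 'stringio'

# Hypothetical helper: copies each line from input to output unchanged.
# In histogram.rb the same loop would read with a plain gets (standard input).
def echo_lines(input = $stdin, output = $stdout)
  while line = input.gets   # gets returns nil at end of input, which ends the loop
    output.puts line        # puts adds no extra newline when the line already ends in one
  end
end

# Try the loop on an in-memory string instead of a real file.
out = StringIO.new
echo_lines(StringIO.new("It was the best of times,\nit was the worst of times,\n"), out)
print out.string
```

From a shell, a script built around such a loop would typically be run as ruby histogram.rb < totc.txt, so the same code works for any source of text.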
Two text files are provided for your use: totc.txt (the first paragraph from Charles
Dickens's A Tale of Two Cities) and jabberwocky.txt (from Lewis Carroll's Through
the Looking-Glass and What Alice Found There).
Place this in a directory called RubyHistogram
and submit via Git.
Activity Steps
- Read the text file, line by
line, from the standard input with gets. Apply the chomp!
method to each line to remove the end-of-line characters, and print out
each line with upper case letters converted to lower case. See the downcase
(and downcase!)
method of the String class.
- Enhance step #1 by removing
any characters other than letters and spaces. See the gsub
(and gsub!)
methods of the String class, along with the format of regular
expressions in the Regexp class.
- Enhance step #2 by using sub
(or sub!) from String to strip any leading spaces.
- Now change from processing
lines to processing the words in each line:
- Split each line into
an array of words, treating any run of one or more spaces as a single boundary. See the split method in String,
and use split(/ +/). Note that there is a space
character before the '+'.
- Using the each
method from Array and an appropriate block, print each word
on a separate line.
- Use a Hash
named bag to
simulate a bag of strings by mapping each unique string to the count of
its occurrences.
- Create the bag
with Hash.new(0) so that the default value for a
string is 0.
- Change the body of the
each
block from 4(b) to simply increment the count in bag
for each word, using the word as the hash key.
- After all words on all
lines are accounted for, use the each method from Hash
to print a list of words and their counts, one word & count per line.
Note: the each
method of Hash provides two arguments to the associated
block: a key and its value. The keys, of course, are words, and the
values are the counts. The order in which the key/value pairs are generated is
not sorted: in Ruby 1.9 it follows insertion order, and in Ruby 1.8 it is essentially random.
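- Use select
from Hash
to get the key/value pairs, but only for words
having at least two occurrences. In Ruby 1.9 select returns a Hash rather than
an Array, so call to_a on the result to get an Array of pairs.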
Print the resulting words and counts using each
on the array. Note that the values passed to each are themselves arrays:
two-element arrays where the element at index 0 is the key and the
element at index 1 is the value.
- Sort the array of pairs using
the sort
method and a block to do the comparison.
- Using the counts (the
second element in each pair), arrange things so that the words &
values are printed from highest to lowest number of occurrences.
- Within a given number
of occurrences, sort on the words themselves in alphabetic order.
- You'll have to learn
about Ruby's <=> operator to do the sort
comparisons.
- Instead of printing the
count, generate a histogram. Assume each word fits in 15 characters, so
use printf
"%-15.15s " to print the word, followed by N asterisks
(where N
is the number of occurrences).
Refer to the *
operator on Strings to see what happens when a string is
"multiplied" by an integer.
- Change the printf to printf
"%-*.*s ", where the longest word determines the
width. To determine the longest word, use the inject(0) method on
the array - the block returns the larger of the current maximum and the
length of the current word.
- Change the while gets
loop that reads each line to use $stdin.each with a block.
- Make the cutoff for the
minimum count an optional command line argument.
- If ARGV[0]
exists (is not nil), convert it to an integer using the to_i
method and use this as the minimum count for the words printed in the
histogram.
- Otherwise, use a
minimum of 2, as we've done so far.
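The line clean-up and word splitting described in the first four steps might look like the sketch below. The helper name words_in is an assumption made for illustration, and the non-bang methods (chomp, downcase, gsub, sub) are used deliberately: the bang versions return nil when they change nothing, which makes them awkward to chain.

```ruby
# Hypothetical helper covering the clean-up steps: returns the words of one line.
def words_in(line)
  cleaned = line.chomp.downcase          # drop the line ending, fold to lower case
  cleaned = cleaned.gsub(/[^a-z ]/, '')  # remove everything except letters and spaces
  cleaned = cleaned.sub(/^ +/, '')       # strip any leading spaces
  cleaned.split(/ +/)                    # split on runs of one or more spaces
end

# Print each word of a sample line on its own line.
words_in("  It was the best of times,\n").each { |word| puts word }
```

Because the character class is applied after downcase, it only needs to list a-z. An empty or punctuation-only line yields an empty array, so the each block simply does nothing for it.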
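The counting, selection, and sorting steps above can be sketched as follows. The helper names count_words and frequent_pairs are hypothetical, and this is one possible arrangement of the comparison block rather than the required one. The to_a call keeps the result an array of pairs whether select returns a Hash or an Array.

```ruby
# Hypothetical helper: counts words in a bag, a Hash whose default value is 0.
def count_words(words)
  bag = Hash.new(0)
  words.each { |word| bag[word] += 1 }   # a missing key starts at 0, so += 1 just works
  bag
end

# Hypothetical helper: [word, count] pairs with count >= minimum,
# highest count first, ties broken alphabetically by word.
def frequent_pairs(bag, minimum = 2)
  pairs = bag.select { |_word, count| count >= minimum }.to_a
  pairs.sort do |a, b|
    by_count = b[1] <=> a[1]             # operands reversed: descending counts
    by_count.zero? ? a[0] <=> b[0] : by_count
  end
end

bag = count_words(%w[b a b c c c a d])
frequent_pairs(bag).each { |word, count| puts "#{word} #{count}" }
```

The <=> operator returns -1, 0, or 1, so a result of zero on the counts signals a tie and hands the decision to the comparison of the words.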
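The histogram formatting and the optional cutoff from the last steps could be sketched like this. The helper names histogram_lines and minimum_count are assumptions for illustration, and the sketch builds each line with format (which takes the same format string as printf) so that the result is easy to inspect.

```ruby
# Hypothetical helper: turns [word, count] pairs into histogram lines.
def histogram_lines(pairs)
  # inject(0) carries the running maximum word length through the array
  width = pairs.inject(0) { |max, pair| [max, pair[0].length].max }
  pairs.map do |word, count|
    # each * in the format consumes one integer argument: field width, then precision
    format("%-*.*s %s", width, width, word, '*' * count)
  end
end

# Hypothetical helper: the cutoff as ARGV[0] would supply it; nil means no argument.
def minimum_count(arg)
  arg.nil? ? 2 : arg.to_i
end

puts histogram_lines([["times", 2], ["it", 1]])
puts minimum_count(nil)
```

In the script the cutoff would come from minimum_count(ARGV[0]), and the words would be collected inside a $stdin.each block instead of a while gets loop.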
Submission
Place this in a directory called RubyHistogram
and submit via Git.
Resources
A short but useful introduction to Ruby is in The
Little Book of Ruby.
The www.ruby-doc.org
site is a treasure trove of material on Ruby (version 1.9.2 for us).
Check Books
24x7 - students in the past have told me that some of the Ruby books there
have been helpful.