to Ruby – histogram.rb
For this activity you will write two Ruby scripts, with an optional third, each named histogramN.rb
(where N is 1, 2, or 3), that (eventually) produce a histogram
showing the frequency of occurrence of words
in a text file. Your script will read text from standard
input and print its results to standard output.
Two text files are provided for your use: totc.txt
(the first paragraph from Charles
Dickens's A Tale of Two Cities) and jabberwocky.txt
(from Lewis Carroll's Through
the Looking-Glass and What Alice Found There).
Place this in a directory called
and submit via Git as directed by your instructor.
Part 1 (histogram1.rb)
Read the text file, line by line, from the standard input:
while line = gets
  # process this line
end
Apply the chomp
method to each line to remove the end-of-line characters, and print out
each line with upper case letters converted to lower case using puts.
See the downcase
method of the String class.
Extend step #2 by removing any characters other than letters and spaces. See
methods of the String
class, along with the format of regular expressions in the Regexp class.
regular expression for "any characters other than lower case letters,
upper case letters, and spaces"
Extend step #3 by using sub
from String to strip any leading spaces.
regular expression for "one or more spaces at the beginning of the line"
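The Part 1 steps above can be sketched as a single cleaning method. This is only a sketch under stated assumptions: the method name clean_line and both regular expressions are illustrative choices, not necessarily the ones the assignment intends, and a StringIO stands in for standard input so the example runs on its own.

```ruby
require 'stringio'

# Hypothetical helper (the name is an assumption): normalizes one line of text.
def clean_line(line)
  text = line.chomp                 # drop the end-of-line characters
  text = text.downcase              # upper case letters become lower case
  text = text.gsub(/[^a-z ]/, '')   # keep only letters and spaces
  text.sub(/^ +/, '')               # strip one or more leading spaces
end

# StringIO stands in for $stdin here; a real script would read $stdin instead.
input = StringIO.new("  It was the BEST of times,\nit was the worst of times!\n")
cleaned = input.each_line.map { |line| clean_line(line) }
cleaned.each { |line| puts line }
```

Because the line is downcased before the substitution, the character class only needs to list lower case letters and the space.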
Part 2 (histogram2.rb)
change from processing lines to processing the words in each line:
Split each line into an array of words on an arbitrarily long
whitespace boundary. See the split method in String, and use split(/ +/),
where / +/ is the regular expression for "words are delimited by one or
more spaces". Note that there is a space
character before the '+'.
Using the each method from Array
and an appropriate block, print each word on a separate line.
Create a Hash named bag to
simulate a bag of strings by mapping each unique string to the count of
its occurrences. Construct the hash so that the default value for a string is 0.
Change the body of the each
block from 5(b) to simply increment the count in bag
for each word, using the word as the hash key.
After all words on all lines are accounted for, use the each
method from Hash
to print a list of words and their counts, one word & count per line.
Note: the each
method of Hash
provides two arguments to the associated block: a key and its value.
The keys, of course, are words, and the values are the counts. The
order in which the key/value pairs are generated is the order in which the keys were first inserted (as of Ruby 1.9), not sorted order.
to get an Array
of key/value pairs, but only for words having at least two occurrences.
Print the resulting words and counts using each
on the array. Note that the values passed to each are themselves arrays
- two-element arrays where the element at index 0 is the key
and the element at index 1 is the value.
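The Part 2 steps above can be sketched as follows. The sample lines are made up for illustration and assumed to be already cleaned as in Part 1. The filtering method is not named in the text above, so find_all is used here as one possible choice; it returns an Array of two-element arrays, whereas Hash#select returns a Hash in Ruby 1.9 and later.

```ruby
# Hypothetical sample lines, assumed to be already cleaned as in Part 1.
lines = ["it was the best of times", "it was the worst   of times"]

bag = Hash.new(0)   # a missing key reads as a count of 0

lines.each do |line|
  # / +/ splits on one or more spaces, so runs of spaces yield no empty words
  line.split(/ +/).each do |word|
    bag[word] += 1
  end
end

# each on a Hash passes the key and the value to the block
bag.each { |word, count| puts "#{word} #{count}" }

# find_all (an assumed choice) keeps the words that occur at least twice
pairs = bag.find_all { |word, count| count >= 2 }
pairs.each { |pair| puts "#{pair[0]} #{pair[1]}" }
```

Using Hash.new(0) means the increment needs no special case for a word seen for the first time.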
Part 3 (histogram3.rb) - Optional
Sort the array of pairs using the sort
method and a block to do the comparison.
Using the counts (the second element in each pair), arrange things so that
the words & values are printed from highest to lowest number of occurrences.
For a given number of occurrences, sort on the words themselves in
alphabetical order. You will have to learn about Ruby's <=>
operator to do the sort comparisons.
To determine the longest word, use the inject(0)
method on the pairs array - the block returns the larger of the current
maximum and the length of the current word.
- Output the
histogram in the following format.
The formatting is tricky, so use this snippet using the pretty print
(pp) library. You will need
to add require 'pp' at
the top of your source.
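# pairs is the sorted array of [word, count] pairs; apair[0] is the word, apair[1] the count
pairs.each do |apair|
  printf "%-*.*s ", longest, longest, apair[0]
  puts "*" * apair[1]
end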
Here longest is the length of the longest word, as determined in Step #8;
it sets the width of the word column.
- Make the cutoff
for the minimum count an optional command line argument.
If the argument exists (is not nil),
convert it to an integer using the to_i
method and use this as the minimum count for the words printed in the
histogram. Otherwise, use the minimum of 2 as we've done so far.
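The Part 3 steps above can be sketched together as follows. All method names and the sample pairs are hypothetical, reading the cutoff from ARGV[0] is an assumption about where the argument comes from, and ljust is swapped in for the printf format purely to keep the sketch short; it pads each word to the same width.

```ruby
# Minimum count: first command-line argument if present, otherwise 2.
def minimum_count(args)
  args[0].nil? ? 2 : args[0].to_i
end

# Highest count first; ties broken alphabetically by word, using <=>.
def sort_pairs(pairs)
  pairs.sort do |a, b|
    by_count = b[1] <=> a[1]
    by_count.zero? ? a[0] <=> b[0] : by_count
  end
end

# inject(0) carries the running maximum word length through the array.
def longest_word(pairs)
  pairs.inject(0) { |max, pair| [max, pair[0].length].max }
end

# Builds one histogram line per word that meets the minimum count.
def histogram_lines(pairs, min_count)
  kept = pairs.find_all { |pair| pair[1] >= min_count }
  longest = longest_word(kept)
  sort_pairs(kept).map do |pair|
    pair[0].ljust(longest) + " " + "*" * pair[1]
  end
end

# Hypothetical [word, count] pairs, as produced at the end of Part 2.
pairs = [["was", 2], ["it", 2], ["times", 3], ["of", 2], ["best", 1]]
puts histogram_lines(pairs, minimum_count(ARGV))
```

Comparing b against a on the counts reverses the order for counts only, while the words still compare a against b and so stay in ascending order.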
Create a directory called
and submit histogram1.rb, histogram2.rb and histogram3.rb via
A short but useful introduction to Ruby is in The
Little Book of Ruby.
site is a treasure trove of material on Ruby (version
1.9.3 for us),
24x7 - students in the past have told me that some of the
Ruby books there
have been helpful.