SWEN-342 Concurrent & Distributed Software Systems

File Processing

The Problem

For this activity you will create a list of every unique word in a file. The words in the list should be punctuation-free and all in lower case. Additionally, your program will display a couple of metrics about the file.
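As a rough illustration of what "punctuation-free and lower case" can mean, here is a minimal sketch that normalizes the words of a single line. The class name WordNormalizeSketch and the helper uniqueWords are hypothetical names used only for this illustration, and the logic sits in a helper method purely so it is easy to try out. The rule used here (drop every character that is not an ASCII letter) is an assumption; it joins hyphenated words and removes apostrophes, and it may not be the rule that produces the word count shown in the example output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WordNormalizeSketch {

    // Hypothetical helper: returns the unique, lower-case, punctuation-free
    // words of one line, in order of first appearance.
    static List<String> uniqueWords(String line) {
        return Arrays.stream(line.split("\\s+"))          // split on runs of whitespace
                .map(w -> w.replaceAll("[^A-Za-z]", ""))  // assumption: keep ASCII letters only
                .map(String::toLowerCase)
                .filter(w -> !w.isEmpty())                // drop tokens that were all punctuation
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [the, rabbit, rabbithole, and, bit]
        System.out.println(uniqueWords("The Rabbit, the rabbit-hole; and THE bit!"));
    }
}
```

Note that each lambda is a single expression; the work is spread across several small stream operations rather than packed into one large lambda.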

Requirements

  1. All of your code must be in the main method of a file named FileProcessor.java.
  2. You must use Java Streams and lambdas to implement the parsing.
  3. Limit any lambdas to 5 lines or fewer. The goal is brevity. In general, more stream operations are preferred over larger lambdas.
  4. Each unique word in the file must be stored in a List. Words should have no punctuation attached to them and be lower case.
  5. Print the number of words in the list. You must use streams to determine the word count (don't simply print the list's size).
  6. Print every word in the list (one per line) that matches the regular expression ".*bit.*". Once again, the work should be accomplished by using streams and lambdas.
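
Requirements 5 and 6 can be sketched on a small in-memory list as follows. This is only an illustration under stated assumptions: the class name WordReportSketch, the helpers countWords and matching, and the four sample words are all hypothetical, and the helpers exist only to keep the sketch easy to try. In your submission the equivalent logic belongs in the main method of FileProcessor.java and operates on the list built from the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WordReportSketch {

    // Counts the words with a stream operation instead of List.size().
    static long countWords(List<String> words) {
        return words.stream().count();
    }

    // Keeps only the words that match the whole regular expression.
    static List<String> matching(List<String> words, String regex) {
        return words.stream()
                .filter(w -> w.matches(regex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the list built from the file.
        List<String> words = Arrays.asList("rabbit", "alice", "bite", "hatter");

        System.out.println("There are " + countWords(words) + " words in the file.");
        System.out.println("Words that contain 'bit' in them:");
        matching(words, ".*bit.*").forEach(w -> System.out.println("    " + w));
    }
}
```

String.matches tests the entire string against the pattern, which is why the expression needs the leading and trailing ".*" to find "bit" anywhere inside a word.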

Example Output

Using the alice.txt file, your output should look similar to:

        There are 3044 words in the file.
        Words that contain 'bit' in them:
            ambition
            bit
            bite
            bitter
            prohibition
            rabbit
            rabbits
    

Hints

  1. The instructor solution used 5 stream operations to create the list. The longest lambda was 3 lines. Yours does not have to match; this is simply an example to give you some idea of scale.
  2. You can use Files.lines(new File("alice.txt").toPath()) to access the file as a stream, one line at a time.
  3. You are free to hard-code the file name and regular expression to the ones mentioned in this document.
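
The Files.lines call from hint 2 yields one stream element per line, so something has to turn lines into individual tokens. One possible approach, shown here as a sketch rather than the instructor's method, is flatMap. The class name FileLinesSketch and the helper tokens are hypothetical; the helper takes a Stream of lines only so the flattening step can be tried without a file. Wrapping Files.lines in try-with-resources is a suggestion, since the returned stream holds the file open until it is closed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileLinesSketch {

    // Hypothetical helper: flattens a stream of lines into whitespace-separated tokens.
    static List<String> tokens(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\s+")))  // one line -> many tokens
                .filter(t -> !t.isEmpty())                            // leading whitespace yields ""
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumes alice.txt is in the working directory, as in the hint above.
        try (Stream<String> lines = Files.lines(new File("alice.txt").toPath())) {
            tokens(lines).stream()
                    .limit(10)                                        // show only the first few tokens
                    .forEach(System.out::println);
        }
    }
}
```

The tokens produced here still carry punctuation and mixed case; cleaning them up and removing duplicates would be further stream operations in the same pipeline.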

Deliverables

Commit and push your solution to the GitHub repository by the due date.