SWEN-250

Regular Expressions - Convert Markdown format

Overview

This exercise requires you to use regular expressions (regex) through the regex classes in the standard C++ library.

In this exercise you will read a file that is written in Markdown format. Markdown is a text-based formatting language that uses a simple set of format markers for things like headings, bold text, indented text, and italics. You will write code to read in the file, find these format markers using regex, and convert the file to HTML as a web page.

  • You will use a C++ makefile (starter file provided) to build the executable
  • The executable takes two parameters. The syntax of the command is:
      md_convert -i markdown_sample.md -o index.html
    • -i: The input file name
    • -o: The output file name
    • We will grade against the provided markdown_sample.md file, but you should experiment with other markdown files as well!
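One possible way to handle the two parameters is sketched below. The Options struct and the ParseArgs function are hypothetical names chosen for illustration, not part of the starter files. The sketch assumes each flag is followed by exactly one file name and that the flags may appear in either order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the two file names (not part of the starter files).
struct Options {
    std::string input_path;   // value given after -i
    std::string output_path;  // value given after -o
};

// Parses "-i <file> -o <file>" in either order.
// Returns false if a flag is unknown, a flag has no value, or either
// file name is missing. On failure, opts may be partially filled in.
bool ParseArgs(const std::vector<std::string>& args, Options& opts) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i + 1 >= args.size()) {
            return false;  // last argument is a flag with no value
        }
        if (args[i] == "-i") {
            opts.input_path = args[++i];
        } else if (args[i] == "-o") {
            opts.output_path = args[++i];
        } else {
            return false;  // unrecognised argument
        }
    }
    return !opts.input_path.empty() && !opts.output_path.empty();
}
```

In main, the arguments can be collected with std::vector<std::string> args(argv + 1, argv + argc); and passed to ParseArgs. Printing a usage message and returning a non-zero exit code when it returns false is a reasonable choice.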
Here are some common markdown format commands and the html tag equivalents. Review this reference with more information on Markdown Syntax. You may also find this regex tester useful, so you can try out your regular expressions quickly and easily. Also note that some markdown commands can be combined. Make sure your code handles the necessary combinations.
Markdown | Markdown 'rule' | HTML
# Heading level 1 | Line starts with # | <h1>Heading level 1</h1>
## Heading level 2 | Line starts with ## | <h2>Heading level 2</h2>
### Heading level 3 | Line starts with ### | <h3>Heading level 3</h3>
**Bold** | There are two asterisks followed by alphanumerics, followed by two asterisks | <strong>Bold text</strong> OR <b>Bold text</b>
_Italics_ | There is an underscore followed by alphanumerics, followed by an underscore | <i>Italicized text</i>
>Block quote (aligned/block of text) | Line starts with > (the text after the '>' will be aligned in a block until an empty line is found) | <blockquote>Block quote (aligned/block of text)</blockquote>

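The single-line rules in the table above can be sketched with std::regex_replace and std::regex_match. This is a minimal illustration, not the required design. ConvertBold follows the naming suggested in this handout, while ConvertItalics, ConvertHeading, and ConvertLine are hypothetical names. The sketch assumes a heading marker is followed by at least one space, and it uses ".+?" rather than strictly alphanumerics so that bold or italic text may contain spaces.

```cpp
#include <regex>
#include <string>

// Replaces **text** with <strong>text</strong>. The lazy ".+?" keeps two
// bold spans on the same line from being merged into one match.
std::string ConvertBold(const std::string& line) {
    static const std::regex bold(R"(\*\*(.+?)\*\*)");
    return std::regex_replace(line, bold, "<strong>$1</strong>");
}

// Replaces _text_ with <i>text</i>.
// Caveat: underscores inside identifiers such as my_file_name would also match.
std::string ConvertItalics(const std::string& line) {
    static const std::regex italics(R"(_(.+?)_)");
    return std::regex_replace(line, italics, "<i>$1</i>");
}

// Converts a line starting with #, ## or ### (followed by whitespace)
// into <h1>, <h2> or <h3>. Other lines are returned unchanged.
std::string ConvertHeading(const std::string& line) {
    static const std::regex heading(R"((#{1,3})\s+(.*))");
    std::smatch m;
    if (!std::regex_match(line, m, heading)) {
        return line;
    }
    // The number of '#' characters gives the heading level.
    const std::string level = std::to_string(m[1].length());
    return "<h" + level + ">" + m[2].str() + "</h" + level + ">";
}

// Applies the inline rules first, then the heading rule, so that
// combinations such as a heading containing bold text are handled.
std::string ConvertLine(const std::string& line) {
    return ConvertHeading(ConvertItalics(ConvertBold(line)));
}
```

Running the inline conversions before the heading conversion means the heading tag wraps content that has already been converted, which is one simple way to support combined markers.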
Basic Requirements

  • Read in the provided file 'markdown_sample.md' from the command line parameter
  • Find all the markdown tags (look at the markdown command references we provide), and replace them with HTML tags. Do this using regex.
  • Print the final conversion of all the markdown content to html to the console.
  • Write out the resulting content as an HTML file, named according to the '-o' parameter (e.g. index.html).
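Most markers can be replaced one line at a time, but a block quote runs from a line starting with '>' until an empty line, so it needs a little state across lines. The sketch below shows one way to do that. ConvertBlockQuotes is a hypothetical name, and joining the quoted lines with a single space and dropping the closing empty line are assumptions, not requirements of the exercise.

```cpp
#include <regex>
#include <string>
#include <vector>

// Groups quoted lines into <blockquote> elements. A quote opens on a line
// that starts with '>' and closes at the next empty line or at the end of
// the input. Lines outside a quote are passed through unchanged.
std::vector<std::string> ConvertBlockQuotes(const std::vector<std::string>& lines) {
    static const std::regex quote_start(R"(>\s*(.*))");   // first line of a quote
    static const std::regex quote_line(R"(>?\s*(.*))");   // later lines, '>' optional
    static const std::regex blank(R"(\s*)");              // empty or whitespace only

    std::vector<std::string> out;
    std::string quote;      // text collected for the quote currently open
    bool in_quote = false;

    for (const std::string& line : lines) {
        std::smatch m;
        if (in_quote) {
            if (std::regex_match(line, blank)) {
                // The empty line ends the quote and is not copied to the output.
                out.push_back("<blockquote>" + quote + "</blockquote>");
                in_quote = false;
            } else {
                std::regex_match(line, m, quote_line);
                quote += " " + m[1].str();
            }
        } else if (std::regex_match(line, m, quote_start)) {
            in_quote = true;
            quote = m[1].str();
        } else {
            out.push_back(line);
        }
    }
    if (in_quote) {
        // Input ended while a quote was still open.
        out.push_back("<blockquote>" + quote + "</blockquote>");
    }
    return out;
}
```

Because this pass works on a whole list of lines, it can run before or after the per-line conversions, depending on whether markers inside a quote should also be converted.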

The CI will publish your generated webpage (index.html). See the CI output in GitLab for the URL to that page. Use the .gitlab-ci.yml file (make sure you place it in your top level repo folder) to configure the CI correctly. Modify any folder paths if necessary for your setup.
You can also view the results of your markdown-to-HTML conversion by copying your .html output file to your local PC and opening it in any browser, e.g. in Windows Explorer, right-click 'index.html' and select 'Open with Firefox'.

Basic Steps
    (NOTE: Make sure you get the filenames right, since the Makefile references them!)
  • Create a new folder in the root of your repository named regex_md. You will place all your code in this folder
  • Download the starter files and extract the files into your directory
    • Explanation of files:
    • Makefile: This is a starter Makefile. It assumes you have named your files as listed below. Feel free to make any changes based on your code.
    • .gitlab-ci.yml: This is the CI file for GitLab that will publish a webpage (based on your work). This file goes in the root of your repository (not in the regex_md folder)
    • markdown_sample.md: This is the input markdown file you should use for testing your code
    • page_template.html: This is the skeleton .html file you will read to help generate your final output.
  • Create a file main.cpp that contains your main function and parses the command line arguments
  • Create a parser class in the files 'parser.cpp' and 'parser.h' to assist with reading the file and finding the markdown format tags. Add methods to check for each type of format tag required.
  • Create a class 'MarkdownConvert' with a pair of files markdownconvert.h and markdownconvert.cpp
  • Add a method for each type of tag you want to be converted
    e.g. string ConvertH1(string) or string ConvertBold(string)
    NOTE: You want to convert the format tags, but preserve the content
  • If a particular format tag is found while reading lines with the parser, call the appropriate conversion method
  • As each conversion is done, you can store the converted lines in a list of strings, using the C++ Standard library (e.g. std::list<std::string> or std::vector<std::string>)
  • Use the provided 'page_template.html' to help you create the HTML formatted output. HINT: You will want to take your list of converted strings and insert them between the <body></body> tags. Use ifstream and ofstream to help read and write the files.
Submission
  • Submit your code to a directory named regex_md in your git repository.
  • Grading scheme: 100 points

    • Clean build (10)
    • Proper use of regex (20)
    • Full/correct conversion of Markdown file to HTML (70)