[![Build status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?label=linux%20build)](https://travis-ci.org/mity/md4c)
[![Build status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
[![Coverity Scan Build Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
[![Codecov](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?label=code%20coverage)](https://codecov.io/github/mity/md4c)

# MD4C Readme

* Home: http://github.com/mity/md4c
* Wiki: http://github.com/mity/md4c/wiki

MD4C stands for "Markdown for C" and, unsurprisingly, it is a C Markdown parser
implementation.


## What is Markdown

In short, Markdown is the markup language this `README.md` file is written in.

The following resources can explain more if you are unfamiliar with it:
* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
* [CommonMark site](http://commonmark.org)


## What is MD4C

MD4C is a C Markdown parser with the following features:

* **Compliance:** Generally, MD4C aims to be compliant with the latest version
  of the [CommonMark specification](http://spec.commonmark.org/). Right now we
  are fully compliant with CommonMark 0.28.

* **Extensions:** MD4C supports some commonly requested and accepted extensions.
  See below.

* **Compactness:** MD4C is implemented in one source file and one header file.

* **Embedding:** MD4C is easy to reuse in other projects. Its API is very
  straightforward: there is actually just one function, `md_parse()`.

* **Push model:** MD4C parses the complete document and calls callback
  functions provided by the application for each start/end of a block, each
  start/end of a span, and each piece of textual content.

* **Portability:** MD4C builds and works on Windows and Linux, and it should
  be fairly simple to make it run on most other systems as well.

* **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
  UTF-8 and, on Windows, also UTF-16, i.e. what is commonly called just
  "Unicode" on Windows. See more details below.

* **Permissive license:** MD4C is available under the MIT license.

* **Performance:** MD4C is very fast. Preliminary tests show it is quite a bit
  faster than [Hoedown](https://github.com/hoedown/hoedown) or
  [Cmark](https://github.com/jgm/cmark).


## Using MD4C

The parser is implemented in a single C source file `md4c.c` and its
accompanying header `md4c.h`.

The main provided function is `md_parse()`. It takes text in Markdown syntax
as input, along with a pointer to a renderer structure which holds pointers to
a few callback functions.

As `md_parse()` processes the input, it calls the appropriate callbacks,
allowing the application to convert it into another format or render it onto
the screen.

A more comprehensive guide can be found in the header `md4c.h` and also
on the [MD4C wiki](http://github.com/mity/md4c/wiki).

An example implementation of a simple renderer is available in the `md2html`
directory, which implements a conversion utility from Markdown to HTML.


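To illustrate the push model described above, here is a minimal, self-contained
sketch in C. Everything in it (`demo_renderer`, `demo_parse()`,
`demo_to_html()` and the callback names) is hypothetical and made up for
illustration. It is not MD4C's API and it does not parse Markdown; it merely
treats every non-empty line as one paragraph. The real structure and callback
signatures are declared in `md4c.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical renderer structure for illustration only; the real
 * structure and callback signatures are declared in md4c.h. */
typedef struct {
    void (*enter_block)(void *userdata);
    void (*text)(const char *s, size_t len, void *userdata);
    void (*leave_block)(void *userdata);
} demo_renderer;

/* Toy push-model "parser": treats every non-empty line as one block and
 * reports it through the callbacks. It does not understand Markdown. */
static int demo_parse(const char *input, size_t size,
                      const demo_renderer *r, void *userdata)
{
    size_t pos = 0;

    assert(input != NULL && r != NULL);
    while (pos < size) {
        size_t end = pos;
        while (end < size && input[end] != '\n')
            end++;
        if (end > pos) {                /* skip empty lines */
            r->enter_block(userdata);
            r->text(input + pos, end - pos, userdata);
            r->leave_block(userdata);
        }
        pos = end + 1;                  /* step over the newline */
    }
    return 0;
}

/* Example application state: a small buffer that collects HTML output. */
typedef struct {
    char buf[256];
    size_t len;
} demo_output;

static void demo_append(demo_output *o, const char *s, size_t n)
{
    if (o->len + n < sizeof o->buf) {   /* silently drop on overflow */
        memcpy(o->buf + o->len, s, n);
        o->len += n;
        o->buf[o->len] = '\0';
    }
}

/* The application's callbacks decide what each event turns into. */
static void on_enter(void *ud) { demo_append(ud, "<p>", 3); }
static void on_leave(void *ud) { demo_append(ud, "</p>\n", 5); }
static void on_text(const char *s, size_t n, void *ud) { demo_append(ud, s, n); }

/* Convenience wrapper: render `input` into `out` as a list of paragraphs. */
static void demo_to_html(const char *input, demo_output *out)
{
    demo_renderer r = { on_enter, on_text, on_leave };

    out->len = 0;
    out->buf[0] = '\0';
    demo_parse(input, strlen(input), &r, out);
}
```

The point of the sketch is the division of labor: the parser decides where
blocks begin and end, while the application-provided callbacks decide what to
do with them.
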
## Markdown Extensions

The default behavior is to recognize only elements defined by the [CommonMark
specification](http://spec.commonmark.org/).

However, with appropriate renderer flags, the behavior can be tuned to enable
some extensions or to allow some deviations from the specification.

 * With the flag `MD_FLAG_COLLAPSEWHITESPACE`, non-trivial whitespace is
   collapsed into a single space.

 * With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.

 * With the flag `MD_FLAG_STRIKETHROUGH`, strikethrough spans are enabled
   (text enclosed in tilde marks, e.g. `~foo bar~`).

 * With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, permissive URL autolinks
   (not enclosed in `<` and `>`) are supported.

 * With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, ditto for e-mail autolinks.

 * With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, permissive WWW autolinks
   (without any scheme specified; `http:` is assumed) are supported.

 * With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTML`, raw inline HTML
   or raw HTML blocks respectively are disabled.

 * With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
   disabled.


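Flags like the ones listed above are presumably single-bit values that can be
combined with bitwise OR, which is the usual C convention. The following
sketch only shows that pattern. The `DEMO_FLAG_*` names and their numeric
values are placeholders invented for this example; real code should include
`md4c.h` and use the `MD_FLAG_*` constants it defines.

```c
#include <assert.h>

/* Placeholder bit values for illustration; real code should include
 * md4c.h and use the MD_FLAG_* constants defined there. */
#define DEMO_FLAG_COLLAPSEWHITESPACE  0x0001u
#define DEMO_FLAG_TABLES              0x0002u
#define DEMO_FLAG_STRIKETHROUGH       0x0004u

/* Combine independent options into one bitmask with bitwise OR. */
static unsigned demo_github_like_flags(void)
{
    return DEMO_FLAG_TABLES | DEMO_FLAG_STRIKETHROUGH;
}

/* Test whether a given option is enabled in a mask. */
static int demo_flag_enabled(unsigned flags, unsigned flag)
{
    assert(flag != 0);                  /* an empty flag makes no sense */
    return (flags & flag) != 0;
}
```

With this pattern, each extension can be switched on independently, and a
value of zero corresponds to the default behavior with no extensions enabled.
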
## Input/Output Encoding

The CommonMark specification generally assumes UTF-8 input, but on closer
inspection, Unicode plays a role only in a few very specific situations when
parsing Markdown documents:

  * For detection of word boundaries when processing emphasis and strong
    emphasis, some classification of Unicode characters (whitespace,
    punctuation) is used.

  * For (case-insensitive) matching of a link reference with the corresponding
    link reference definition, Unicode case folding is used.

  * For translating HTML entities (e.g. `&amp;`) and numeric character
    references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
    However, MD4C leaves this translation to the renderer/application, as the
    renderer is the one that knows the output encoding and whether it really
    needs to perform this kind of translation. (Consider that a renderer
    converting Markdown to HTML may leave the entities untranslated and defer
    the work to a web browser.)

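As an illustration of the translation that is left to the renderer, the
following sketch converts a numeric character reference such as `&#35;` or
`&#xcab;` into UTF-8. The function names are hypothetical and not part of
MD4C. It assumes the renderer wants UTF-8 output and, for simplicity, it
rejects U+0000, surrogates and values above U+10FFFF instead of substituting
a replacement character.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode codepoint as UTF-8. Returns the number of bytes
 * written to out (1 to 4), or 0 if cp is not an encodable value. */
static size_t demo_utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Translate a NUL-terminated numeric character reference ("&#35;" or
 * "&#xcab;") into UTF-8. Returns the number of bytes written to out,
 * or 0 if the input is malformed or the codepoint is rejected. */
static size_t demo_numeric_ref_to_utf8(const char *ref, unsigned char out[4])
{
    unsigned long cp = 0;
    unsigned base = 10;
    int digits = 0;
    const char *p;

    assert(ref != NULL && out != NULL);
    if (ref[0] != '&' || ref[1] != '#')
        return 0;
    p = ref + 2;
    if (*p == 'x' || *p == 'X') {
        base = 16;
        p++;
    }
    for (; *p != ';'; p++) {
        unsigned d;
        if (*p >= '0' && *p <= '9')
            d = (unsigned)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned)(*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned)(*p - 'A') + 10;
        else
            return 0;                   /* also catches a missing ';' */
        cp = cp * base + d;
        if (cp > 0x10FFFF)
            return 0;                   /* out of Unicode range */
        digits++;
    }
    if (digits == 0 || p[1] != '\0' || cp == 0)
        return 0;
    return demo_utf8_encode(cp, out);
}
```
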
MD4C relies on this property of CommonMark and the implementation is, to
a large degree, encoding-agnostic. Most of the MD4C code only assumes that the
encoding of your choice is compatible with ASCII, i.e. that the codepoints
below 128 have the same numeric values as ASCII.

Any input MD4C does not understand is simply seen as part of the document text
and sent to the renderer's callback functions unchanged.

The two situations where MD4C has to understand Unicode are handled according
to the following preprocessor macros:

 * If the preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8
   for word boundary detection and case-folding.

 * On Windows, if the preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
   `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
   (UTF-16 is what Windows developers usually call just "Unicode" and what
   the Win32 API works with.)

 * By default (when neither of the macros is defined), ASCII-only mode is used
   even in the specific situations. That effectively means that non-ASCII
   whitespace or punctuation characters won't be recognized as such and that
   case-folding is performed only on ASCII letters (i.e. `[a-zA-Z]`).

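To make the consequence of the default ASCII-only mode concrete, here is a
small sketch of a label comparison that folds only ASCII letters. The function
names are hypothetical and this is not MD4C's code; it also skips other parts
of label matching, such as whitespace handling. Labels that differ only in
ASCII case compare equal, while `É` and `é` (two different UTF-8 byte
sequences) do not.

```c
#include <assert.h>
#include <stddef.h>

/* Fold one byte: only ASCII 'A'..'Z' are mapped to lowercase. Every other
 * byte, including the bytes of multi-byte UTF-8 sequences, is unchanged. */
static unsigned char demo_ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Compare two labels of the given byte lengths, ignoring ASCII case only.
 * Returns 1 if they match, 0 otherwise. */
static int demo_label_equal_ascii(const char *a, size_t a_len,
                                  const char *b, size_t b_len)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (a_len != b_len)
        return 0;
    for (i = 0; i < a_len; i++) {
        if (demo_ascii_fold((unsigned char)a[i]) !=
            demo_ascii_fold((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```
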
(Adding support for other encodings should be relatively simple due to
the isolation of the respective code.)


## License

MD4C is covered by the MIT license, see the file `LICENSE.md`.


## Reporting Bugs

If you encounter any bug, please be so kind as to report it. Unheard bugs
cannot get fixed. You can submit bug reports here:

* http://github.com/mity/md4c/issues