[![Build status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?label=linux%20build)](https://travis-ci.org/mity/md4c)
[![Build status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
[![Coverity Scan Build Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
[![Codecov](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?label=code%20coverage)](https://codecov.io/github/mity/md4c)

# MD4C Readme

* Home: http://github.com/mity/md4c
* Wiki: http://github.com/mity/md4c/wiki

MD4C stands for "Markdown for C" and, unsurprisingly, it is a C Markdown parser
implementation.


## What is Markdown

In short, Markdown is the markup language this `README.md` file is written in.

The following resources can explain more if you are unfamiliar with it:
* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
* [CommonMark site](http://commonmark.org)


## What is MD4C

MD4C is a C Markdown parser with the following features:

* **Compliance:** Generally, MD4C aims to be compliant with the latest version
  of the [CommonMark specification](http://spec.commonmark.org/). Right now we
  are fully compliant with CommonMark 0.28.

* **Extensions:** MD4C supports some commonly requested and accepted extensions.
  See below.

* **Compactness:** MD4C is implemented in one source file and one header file.

* **Embedding:** MD4C is easy to reuse in other projects; its API is very
  straightforward: there is just one function, `md_parse()`.

* **Push model:** MD4C parses the complete document and calls callback
  functions provided by the application for each start/end of a block,
  start/end of a span, and for any textual contents.
* **Portability:** MD4C builds and works on Windows and Linux, and it should
  be fairly simple to make it run on most other systems as well.

* **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
  UTF-8 and, on Windows, also UTF-16, i.e. what is commonly called just
  "Unicode" on Windows. See more details below.

* **Permissive license:** MD4C is available under the MIT license.

* **Performance:** MD4C is very fast. Preliminary tests show it is considerably
  faster than [Hoedown](https://github.com/hoedown/hoedown) or
  [Cmark](https://github.com/jgm/cmark).


## Using MD4C

The parser is implemented in a single C source file `md4c.c` and its
accompanying header `md4c.h`.

The main provided function is `md_parse()`. It takes a text in Markdown syntax
as input, together with a pointer to a renderer structure which holds pointers
to a few callback functions.

As `md_parse()` processes the input, it calls the appropriate callbacks,
allowing the application to convert it into another format or render it onto
the screen.

A more comprehensive guide can be found in the header `md4c.h` and also
on the [MD4C wiki](http://github.com/mity/md4c/wiki).

An example implementation of a simple renderer is available in the `md2html`
directory, which implements a conversion utility from Markdown to HTML.


## Markdown Extensions

The default behavior is to recognize only elements defined by the [CommonMark
specification](http://spec.commonmark.org/).

However, with appropriate renderer flags, the behavior can be tuned to enable
some extensions or to allow some deviations from the specification.

* With the flag `MD_FLAG_COLLAPSEWHITESPACE`, non-trivial whitespace is
  collapsed into a single space.

* With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.
* With the flag `MD_FLAG_STRIKETHROUGH`, strikethrough spans are enabled
  (text enclosed in tilde marks, e.g. `~foo bar~`).

* With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, permissive URL autolinks
  (not enclosed in `<` and `>`) are supported.

* With the flag `MD_FLAG_PERMISSIVEAUTOLINKS`, permissive e-mail autolinks
  (likewise not enclosed in `<` and `>`) are supported.

* With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, permissive WWW autolinks
  (without any scheme specified; `http:` is assumed) are supported.

* With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTML`, raw inline HTML
  or raw HTML blocks respectively are disabled.

* With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
  disabled.


## Input/Output Encoding

The CommonMark specification generally assumes UTF-8 input, but on closer
inspection, Unicode plays a role in only a few very specific situations when
parsing Markdown documents:

* For detection of word boundaries when processing emphasis and strong
  emphasis, some classification of Unicode characters (whitespace,
  punctuation) is used.

* For (case-insensitive) matching of a link reference with the corresponding
  link reference definition, Unicode case folding is used.

* For translating HTML entities (e.g. `&amp;`) and numeric character
  references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
  However, MD4C leaves this translation to the renderer/application, since the
  renderer is the one that knows the output encoding and whether it needs to
  perform this kind of translation at all. (Consider that a renderer
  converting Markdown to HTML may leave the entities untranslated and defer
  the work to a web browser.)

MD4C relies on this property of CommonMark and the implementation is, to
a large degree, encoding-agnostic. Most of the MD4C code only assumes that the
encoding of your choice is compatible with ASCII, i.e.
that the codepoints below 128 have the same numeric values as ASCII.

Any input MD4C does not understand is simply seen as part of the document text
and sent to the renderer's callback functions unchanged.

The two situations where MD4C has to understand Unicode are handled according
to the following preprocessor macros:

* If the preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8
  for word boundary detection and case folding.

* On Windows, if the preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
  `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
  (UTF-16 is what Windows developers usually call just "Unicode" and what
  the Win32 API works with.)

* By default (when neither of the macros is defined), ASCII-only mode is used
  even in those specific situations. That effectively means that non-ASCII
  whitespace or punctuation characters won't be recognized as such and that
  case folding is performed only on ASCII letters (i.e. `[a-zA-Z]`).

(Adding support for further encodings should be relatively simple due to
the isolation of the respective code.)


## License

MD4C is covered by the MIT license, see the file `LICENSE.md`.


## Reporting Bugs

If you encounter any bug, please be so kind as to report it. Unheard bugs
cannot get fixed. You can submit bug reports here:

* http://github.com/mity/md4c/issues
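

## Example: ASCII-only Case Folding

To make the default (ASCII-only) mode described under "Input/Output Encoding"
more concrete, here is a minimal sketch of what case folding restricted to
`[a-zA-Z]` means for matching link reference labels. It is an illustration
only, not MD4C's actual code: the helper names `ascii_fold()` and
`ascii_label_equal()` are hypothetical and do not exist in `md4c.h`.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MD4C): fold a single byte.
 * Only the ASCII letters 'A'..'Z' are lowered. Every other byte, including
 * the bytes of multi-byte UTF-8 sequences, passes through unchanged. */
static unsigned char ascii_fold(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char)(ch - 'A' + 'a');
    return ch;
}

/* Hypothetical helper (not part of MD4C): compare two labels under
 * ASCII-only case folding. Returns 1 if they match, 0 otherwise. */
static int ascii_label_equal(const char* a, size_t a_size,
                             const char* b, size_t b_size)
{
    size_t i;

    if (a_size != b_size)
        return 0;

    for (i = 0; i < a_size; i++) {
        if (ascii_fold((unsigned char)a[i]) != ascii_fold((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```

Under this scheme the labels `Foo` and `fOO` match, while the UTF-8 encodings
of `É` and `é` do not, because their bytes are outside the ASCII range and are
compared verbatim. Full Unicode case folding, as assumed by the CommonMark
specification, would treat the latter pair as equal as well.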