# Turndown

[![Build Status](https://travis-ci.org/domchristie/turndown.svg?branch=master)](https://travis-ci.org/domchristie/turndown)

Convert HTML into Markdown with JavaScript.

## Project Updates

* `to-markdown` has been renamed to Turndown. See the [migration guide](https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown) for details.
* The Turndown repository has moved to https://github.com/mixmark-io/turndown.

## Installation

npm:

```
npm install turndown
```

Browser:

```html
```

For usage with RequireJS, UMD versions are located in `lib/turndown.umd.js` (for Node.js) and `lib/turndown.browser.umd.js` (for browsers). These files are generated when the npm package is published. To generate them manually, clone this repo and run `npm run build`.

## Usage

```js
// For Node.js
var TurndownService = require('turndown')

var turndownService = new TurndownService()
var markdown = turndownService.turndown('<h1>Hello world!</h1>')
```
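## Extending with Rules

Turndown can be extended by adding **rules**. A rule is a plain JavaScript object with `filter` and `replacement` properties. For example, the rule for converting `<p>` elements is as follows:

```js
{
  filter: 'p',
  replacement: function (content) {
    return '\n\n' + content + '\n\n'
  }
}
```

The filter selects `<p>` elements, and the replacement function returns the `<p>` contents with two newlines on either side, so the paragraph is separated from surrounding content by blank lines.

### `filter` String|Array|Function

The filter property determines whether or not an element should be replaced with the rule's `replacement`. DOM nodes can be selected simply using a tag name or an array of tag names:

* `filter: 'p'` will select `<p>` elements
* `filter: ['em', 'i']` will select `<em>` or `<i>` elements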
The tag names in the `filter` property are expected in lowercase, regardless of their form in the document.
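A string or array filter amounts to a lowercase tag-name comparison. The sketch below models that matching as a standalone function; `matchesFilter` and the plain `{ nodeName }` objects are hypothetical stand-ins for illustration, not Turndown's internals. HTML elements report `nodeName` in uppercase (e.g. `'EM'`), which is why the sketch lowercases it before comparing.

```js
// Hypothetical helper: does a string or array filter select this node?
// Assumes `node` only needs a DOM-style `nodeName` property.
function matchesFilter (filter, node) {
  // HTML elements report nodeName in uppercase (e.g. 'EM'),
  // so compare against the lowercase form that filters use
  var tagName = node.nodeName.toLowerCase()

  if (typeof filter === 'string') {
    return filter === tagName
  }
  if (Array.isArray(filter)) {
    return filter.indexOf(tagName) > -1
  }
  throw new TypeError('this sketch only handles string and array filters')
}

console.log(matchesFilter('p', { nodeName: 'P' })) // true
console.log(matchesFilter(['em', 'i'], { nodeName: 'I' })) // true
console.log(matchesFilter(['em', 'i'], { nodeName: 'STRONG' })) // false
```

In this model, `matchesFilter('P', { nodeName: 'P' })` returns `false`, which illustrates why tag names in a filter should be written in lowercase.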
Alternatively, the filter can be a function that returns a boolean depending on whether a given node should be replaced. The function is passed a DOM node as well as the `TurndownService` options. For example, the following filter selects `<a>` elements (with an `href`) when the `linkStyle` option is `inlined`:
```js
filter: function (node, options) {
  return (
    options.linkStyle === 'inlined' &&
    node.nodeName === 'A' &&
    node.getAttribute('href')
  )
}
```
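To see how a function filter behaves, it can be called directly with a stand-in node. `fakeElement` below is a hypothetical helper that implements only the two DOM members this filter touches (`nodeName` and `getAttribute`); in real use the filter receives actual DOM nodes. Note that the `&&` chain evaluates to the `href` string (or a falsy value) rather than a strict `true`/`false`, so the sketch wraps calls in `Boolean()` to make the truthiness explicit. `'referenced'` stands in for any `linkStyle` other than `inlined`.

```js
// Hypothetical stand-in for a DOM element: only nodeName and
// getAttribute are implemented, since that is all the filter uses
function fakeElement (nodeName, attributes) {
  return {
    nodeName: nodeName,
    getAttribute: function (name) {
      // Like the DOM, return null for a missing attribute
      return Object.prototype.hasOwnProperty.call(attributes, name)
        ? attributes[name]
        : null
    }
  }
}

// The filter from above, assigned to a variable so it can be called directly
var inlinedLinkFilter = function (node, options) {
  return (
    options.linkStyle === 'inlined' &&
    node.nodeName === 'A' &&
    node.getAttribute('href')
  )
}

var inlined = { linkStyle: 'inlined' }
var link = fakeElement('A', { href: '/about' })

console.log(Boolean(inlinedLinkFilter(link, inlined))) // true
console.log(Boolean(inlinedLinkFilter(fakeElement('A', {}), inlined))) // false: no href
console.log(Boolean(inlinedLinkFilter(link, { linkStyle: 'referenced' }))) // false
```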
### `replacement` Function
The replacement function determines how an element should be converted. It should return the Markdown string for a given node. The function is passed the node's content, the node itself, and the `TurndownService` options.
The following rule shows how `<em>` elements are converted:
```js
rules.emphasis = {
  filter: ['em', 'i'],

  replacement: function (content, node, options) {
    return options.emDelimiter + content + options.emDelimiter
  }
}
```
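Because a replacement is an ordinary function, it can be exercised in isolation. The sketch below rebuilds the emphasis rule in a local `rules` object (standing in for the one in the snippet) alongside the `<p>` rule shown earlier, stored under a hypothetical `paragraph` key, and calls each `replacement` with hand-made arguments. Neither rule reads its `node` argument, so `null` is passed in its place, and the options object supplies only `emDelimiter`.

```js
// Local stand-in for the `rules` object used in the snippet above
var rules = {}

rules.emphasis = {
  filter: ['em', 'i'],

  replacement: function (content, node, options) {
    // Wrap the already-converted inner content in the configured delimiter
    return options.emDelimiter + content + options.emDelimiter
  }
}

// The <p> rule from earlier, under a hypothetical key
rules.paragraph = {
  filter: 'p',
  replacement: function (content) {
    return '\n\n' + content + '\n\n'
  }
}

// The node is not used by these rules, so null is passed for illustration
console.log(rules.emphasis.replacement('Hello', null, { emDelimiter: '_' })) // _Hello_
console.log(rules.emphasis.replacement('Hello', null, { emDelimiter: '*' })) // *Hello*
console.log(JSON.stringify(rules.paragraph.replacement('Hello'))) // "\n\nHello\n\n"
```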
### Special Rules
**Blank rule** determines how to handle blank elements. It overrides every rule (even those added via `addRule`). A node is blank if it contains only whitespace and it is neither an `<a>`, `<td>` or `<th>` element nor a void element. Its behaviour can be customised using the `blankReplacement` option.
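The blank test can be sketched as a small predicate. This is a simplified, hypothetical `isBlank` working on plain `{ nodeName, textContent }` objects; it assumes the exceptions are `<a>`, `<td>` and `<th>` plus the void elements of the current HTML standard. It does not inspect descendants, so in this sketch a `<p>` wrapping only an `<img>` would count as blank.

```js
// Elements treated as meaningful even when empty (assumed list: <a>, <td>, <th>)
var MEANINGFUL_WHEN_BLANK = ['A', 'TD', 'TH']

// HTML void elements: they cannot have content, so they are never "blank"
var VOID_ELEMENTS = [
  'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG',
  'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR'
]

// Hypothetical, simplified blank check on a plain { nodeName, textContent } object
function isBlank (node) {
  return (
    MEANINGFUL_WHEN_BLANK.indexOf(node.nodeName) === -1 &&
    VOID_ELEMENTS.indexOf(node.nodeName) === -1 &&
    /^\s*$/.test(node.textContent)
  )
}

console.log(isBlank({ nodeName: 'P', textContent: '  \n ' })) // true
console.log(isBlank({ nodeName: 'P', textContent: 'Hi' })) // false
console.log(isBlank({ nodeName: 'TD', textContent: '' })) // false
console.log(isBlank({ nodeName: 'BR', textContent: '' })) // false
```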
**Keep rules** determine how to handle elements that should not be converted, i.e. elements rendered as HTML in the Markdown output. By default, no elements are kept. Block-level elements will be separated from surrounding content by blank lines. Their behaviour can be customised using the `keepReplacement` option.
**Remove rules** determine which elements to remove altogether. By default, no elements are removed.
**Default rule** handles nodes which are not recognised by any other rule. By default, it outputs the node's text content (separated by blank lines if it is a block-level element). Its behaviour can be customised with the `defaultReplacement` option.
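The keep and default behaviours described above can be sketched as replacement functions with the `(content, node)` shape documented for `replacement`. Functions of this shape could be passed as the `keepReplacement` or `defaultReplacement` option, but the code is only an illustration: `isBlockElement`, its short tag list, `keepAsHtml`, `passThroughContent` and the plain-object nodes (`nodeName`, `outerHTML`) are all hypothetical.

```js
// Assumed, deliberately short list of block-level tag names
var BLOCK_ELEMENTS = ['DIV', 'P', 'SECTION', 'TABLE', 'UL', 'OL', 'PRE', 'BLOCKQUOTE']

// Hypothetical helper: is this node block-level?
function isBlockElement (node) {
  return BLOCK_ELEMENTS.indexOf(node.nodeName) > -1
}

// Keep-style replacement: output the element's HTML as-is,
// surrounded by blank lines when the element is block-level
function keepAsHtml (content, node) {
  return isBlockElement(node)
    ? '\n\n' + node.outerHTML + '\n\n'
    : node.outerHTML
}

// Default-style replacement: output the content it is given,
// surrounded by blank lines when the unrecognised node is block-level
function passThroughContent (content, node) {
  return isBlockElement(node) ? '\n\n' + content + '\n\n' : content
}

var del = { nodeName: 'DEL', outerHTML: '<del>old</del>' }
var table = { nodeName: 'TABLE', outerHTML: '<table></table>' }
var section = { nodeName: 'SECTION', outerHTML: '<section>text</section>' }

console.log(JSON.stringify(keepAsHtml('old', del))) // "<del>old</del>"
console.log(JSON.stringify(keepAsHtml('', table))) // "\n\n<table></table>\n\n"
console.log(JSON.stringify(passThroughContent('text', section))) // "\n\ntext\n\n"
```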
### Rule Precedence
Turndown iterates over the set of rules and picks the first one whose `filter` matches the node. The following list describes the order of precedence:
1. Blank rule
2. Added rules (optional)
3. CommonMark rules
4. Keep rules
5. Remove rules
6. Default rule
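First-match selection in this order can be modelled as a loop over rule groups. Everything below is a hypothetical, simplified model rather than Turndown's code: rules carry only a `name` and a `filter` (a tag-name string or a predicate), the blank check is reduced to a whitespace test, and the fallback accepts every node. It shows, for instance, why a keep filter for `em` never wins over a CommonMark rule for the same element.

```js
// Does a rule's filter match the node? Only string and function filters
// are modelled here (array filters are omitted for brevity)
function matches (rule, node, options) {
  if (typeof rule.filter === 'function') {
    return Boolean(rule.filter(node, options))
  }
  return rule.filter === node.nodeName.toLowerCase()
}

// Hypothetical model of first-match selection in the documented order
function pickRule (node, options, groups) {
  var ordered = [groups.blank].concat(
    groups.added,
    groups.commonmark,
    groups.keep,
    groups.remove,
    [groups.fallback]
  )

  for (var i = 0; i < ordered.length; i++) {
    if (matches(ordered[i], node, options)) return ordered[i]
  }
  return null // unreachable while the fallback matches every node
}

// Illustrative rule set; only `name` and `filter` matter for selection
var groups = {
  blank: { name: 'blank', filter: function (node) { return /^\s*$/.test(node.textContent) } },
  added: [{ name: 'strikethrough', filter: 'del' }],
  commonmark: [{ name: 'paragraph', filter: 'p' }, { name: 'emphasis', filter: 'em' }],
  keep: [{ name: 'keep', filter: 'em' }, { name: 'keep', filter: 'ins' }],
  remove: [{ name: 'remove', filter: 'script' }],
  fallback: { name: 'default', filter: function () { return true } }
}

// Hypothetical convenience wrapper for the examples below
function nameFor (nodeName, text) {
  return pickRule({ nodeName: nodeName, textContent: text }, {}, groups).name
}

console.log(nameFor('P', '   ')) // blank
console.log(nameFor('DEL', 'old')) // strikethrough
console.log(nameFor('EM', 'hi')) // emphasis (the keep filter for 'em' never wins)
console.log(nameFor('INS', 'new')) // keep
console.log(nameFor('SCRIPT', 'x()')) // remove
console.log(nameFor('SPAN', 'hi')) // default
```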
## Plugins
The plugin API provides a convenient way for developers to apply multiple extensions. A plugin is just a function that is called with the `TurndownService` instance.
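Since a plugin is just a function that receives the `TurndownService` instance, writing one amounts to calling methods on that instance. To stay self-contained, the sketch uses a hypothetical `StubService` instead of a real `TurndownService`. It implements only an `addRule(key, rule)` method, modelled on the `addRule` mentioned under Special Rules, and a `use(plugin)` method that simply calls the plugin with the instance (an assumption about how plugins are applied). The `strikethrough` plugin and its `~` output are illustrative.

```js
// Hypothetical stand-in for TurndownService, just enough to apply a plugin
function StubService () {
  this.rules = {}
}

// Register a rule under a unique key
StubService.prototype.addRule = function (key, rule) {
  this.rules[key] = rule
  return this // allow chaining
}

// Apply a plugin by calling it with the instance
StubService.prototype.use = function (plugin) {
  plugin(this)
  return this
}

// An illustrative plugin: a function that receives the service and extends it
function strikethrough (service) {
  service.addRule('strikethrough', {
    filter: ['del', 's'],
    replacement: function (content) {
      return '~' + content + '~'
    }
  })
}

var service = new StubService().use(strikethrough)

console.log(Object.keys(service.rules).join(', ')) // strikethrough
console.log(service.rules.strikethrough.replacement('gone')) // ~gone~
```

A single plugin can call `addRule` several times, which is what makes plugins a convenient way to bundle multiple extensions.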
## Escaping Markdown Characters
Turndown uses backslashes (`\`) to escape Markdown characters in the HTML input. This ensures that these characters are not interpreted as Markdown when the output is compiled back to HTML. For example, the contents of `<h1>1. Hello world</h1>` need to be escaped to `1\. Hello world`; otherwise they will be interpreted as a list item rather than a heading.
To avoid the complexity and the performance implications of parsing the content of every HTML element as Markdown, Turndown uses a group of regular expressions to escape potential Markdown syntax. As a result, the escaping rules can be quite aggressive.
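A regex-table escaper of this kind can be sketched as follows. The table is a small, hypothetical subset chosen for illustration and is not Turndown's actual list. It shows both the intended effect on `1. Hello world` and the aggressiveness trade-off: a harmless `*` in running text is escaped too.

```js
// Hypothetical subset of escape rules: [pattern, replacement]
var ESCAPES = [
  [/\\/g, '\\\\'], // backslashes first, so backslashes added below are not doubled
  [/\*/g, '\\*'], // emphasis and bullet markers, anywhere in the text
  [/_/g, '\\_'], // emphasis delimiters
  [/`/g, '\\`'], // inline code
  [/^(#{1,6}) /g, '\\$1 '], // ATX heading marker at the start
  [/^(\d+)\. /g, '$1\\. '] // ordered-list marker at the start
]

// Apply every rule in order to the text of one element
function escapeMarkdown (text) {
  return ESCAPES.reduce(function (result, rule) {
    return result.replace(rule[0], rule[1])
  }, text)
}

console.log(escapeMarkdown('1. Hello world')) // 1\. Hello world
console.log(escapeMarkdown('5 * 3 = 15')) // 5 \* 3 = 15
```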
### Overriding `TurndownService.prototype.escape`
If you are confident in doing so, you may want to customise the escaping behaviour to suit your needs. This can be done by overriding `TurndownService.prototype.escape`. `escape` takes the text of each HTML element and should return a version with the Markdown characters escaped.
Note: text in code elements is never passed to `escape`.
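As an illustration of a custom `escape`, the hypothetical `conservativeEscape` below escapes only a line-leading ordered-list marker or ATX heading marker and leaves every other character alone. It is a sketch of one possible trade-off, not a recommended replacement: output produced with it can be misread as Markdown in cases that more aggressive escaping would catch.

```js
// Hypothetical, deliberately minimal escape function
function conservativeEscape (string) {
  return string
    .replace(/^(\d+)\. /gm, '$1\\. ') // "1. " at a line start would begin a list
    .replace(/^(#{1,6}) /gm, '\\$1 ') // "# " at a line start would begin a heading
}

console.log(conservativeEscape('1. Hello world')) // 1\. Hello world
console.log(conservativeEscape('5 * 3 and _x_')) // 5 * 3 and _x_
```

To try it, assign it before converting, e.g. `TurndownService.prototype.escape = conservativeEscape`.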
## License
turndown is copyright © 2017+ Dom Christie and released under the MIT license.