Issue
Within a Java application, I need to convert Markdown text into plain text instead of HTML (for example, dropping all link addresses and the bold and italic markers).
Which is the best way to do this? I was thinking of using a Markdown library like flexmark (https://github.com/vsch/flexmark-java), but I can't find this feature at first sight. Is it there? Are there better alternatives?
Solution
Edit
Commonmark supports rendering to text by using org.commonmark.renderer.text.TextContentRenderer instead of the default HTML renderer. Not sure what it does with newlines, but worth a try.
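A minimal sketch of the Commonmark route, assuming the commonmark-java library is on the classpath. The class name CommonmarkPlainText and the sample input are made up for illustration; only Parser, Node and TextContentRenderer come from the library.

```java
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.text.TextContentRenderer;

// Hypothetical helper class, not part of commonmark-java.
public class CommonmarkPlainText {

    // Parses Markdown and renders it with the text renderer
    // instead of the default HtmlRenderer.
    static String toPlainText(String markdown) {
        Parser parser = Parser.builder().build();
        Node document = parser.parse(markdown);
        TextContentRenderer renderer = TextContentRenderer.builder().build();
        return renderer.render(document);
    }

    public static void main(String[] args) {
        // Bold and italic markers are not part of the rendered text.
        System.out.println(toPlainText("Hello **there** *now*!"));
    }
}
```

As far as I remember, this renderer writes a link's destination next to the link text rather than dropping it, so check its output against your own input if the addresses have to disappear.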
Original answer, using flexmark HTML + JSoup
The ideal solution would be to implement a custom Renderer for flexmark, but that would force you to write a model-to-string conversion for every language feature in Markdown, unless flexmark supports this out of the box, which I'm not aware of.
A simpler solution may be to use flexmark (or any other lightweight markdown renderer) and let it create the HTML. After that, just run the generated HTML through https://jsoup.org/ and let it extract the text:
String text = Jsoup.parse(html).text(); // html is the String produced by the Markdown renderer
From the Javadoc of String org.jsoup.nodes.Element.text(): Gets the combined text of this element and all its children. Whitespace is normalized and trimmed.
For example, given HTML <p>Hello <b>there</b> now! </p>, p.text() returns "Hello there now!"
We use this approach to get a "preview" of the text entered in a rich content editor (summernote), after it has been sanitized with org.owasp.html.HtmlSanitizer.
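The flexmark + jsoup approach described above could look roughly like this. It is a sketch that assumes both flexmark-java and jsoup are on the classpath and uses flexmark's default options; the class name MarkdownToText and the sample input are made up for illustration.

```java
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import org.jsoup.Jsoup;

// Hypothetical helper class, not part of flexmark or jsoup.
public class MarkdownToText {

    static String toPlainText(String markdown) {
        Parser parser = Parser.builder().build();
        HtmlRenderer renderer = HtmlRenderer.builder().build();

        // Step 1: let flexmark turn the Markdown into HTML.
        String html = renderer.render(parser.parse(markdown));

        // Step 2: let jsoup extract the text. Tags and attributes
        // (including link href values) are not part of the result.
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("Hello **there** [now](https://example.com)!"));
    }
}
```

Because text() normalizes whitespace, several paragraphs end up on a single line separated by spaces. That is fine for a preview, but it may not be what you want if line breaks matter.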
Answered By - Frederik Heremans