org.opencms.util
Class CmsHtml2TextConverter

java.lang.Object
  extended by org.htmlparser.visitors.NodeVisitor
      extended by org.opencms.util.CmsHtmlParser
          extended by org.opencms.util.CmsHtml2TextConverter
All Implemented Interfaces:
I_CmsHtmlNodeVisitor

public class CmsHtml2TextConverter
extends CmsHtmlParser

Extracts the plain text content from an HTML page.


Field Summary
 
Fields inherited from class org.opencms.util.CmsHtmlParser
m_echo, m_noAutoCloseTags, m_result, TAG_ARRAY, TAG_LIST
 
Constructor Summary
CmsHtml2TextConverter()
          Creates a new instance of the HTML-to-text converter.
 
Method Summary
static java.lang.String html2text(java.lang.String html, java.lang.String encoding)
          Extracts the text from the given HTML content, assuming the given encoding.
 void visitEndTag(org.htmlparser.Tag tag)
          Visitor method (callback) invoked when a closing Tag is encountered.
 void visitStringNode(org.htmlparser.Text text)
          Visitor method (callback) invoked when a text node is encountered.
 void visitTag(org.htmlparser.Tag tag)
          Visitor method (callback) invoked when a starting Tag is encountered.
 
Methods inherited from class org.opencms.util.CmsHtmlParser
collapse, configureNoAutoCorrectionTags, getConfiguration, getNoAutoCloseTags, getResult, getTagHtml, process, setConfiguration, setNoAutoCloseTags, visitRemarkNode
 
Methods inherited from class org.htmlparser.visitors.NodeVisitor
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CmsHtml2TextConverter

public CmsHtml2TextConverter()
Creates a new instance of the HTML-to-text converter.

Method Detail

html2text

public static java.lang.String html2text(java.lang.String html,
                                         java.lang.String encoding)
                                  throws java.lang.Exception
Extracts the text from the given HTML content, assuming the given encoding.

Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
Returns:
the text extracted from the given HTML content
Throws:
java.lang.Exception - if something goes wrong

visitEndTag

public void visitEndTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.

Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitEndTag in class CmsHtmlParser
Parameters:
tag - the tag that is ended.
See Also:
NodeVisitor.visitEndTag(org.htmlparser.Tag)

visitStringNode

public void visitStringNode(org.htmlparser.Text text)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a text node is encountered.

Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
Overrides:
visitStringNode in class CmsHtmlParser
Parameters:
text - the text that is visited.
See Also:
NodeVisitor.visitStringNode(org.htmlparser.Text)

visitTag

public void visitTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.

Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitTag in class CmsHtmlParser
Parameters:
tag - the tag that is visited.
See Also:
NodeVisitor.visitTag(org.htmlparser.Tag)
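
Example (illustrative sketch only):

The three callbacks documented above (visitTag, visitStringNode, visitEndTag) typically cooperate to build up a plain text result. The following is a minimal, self-contained sketch of that idea and is not the OpenCms implementation. Every type in it (SketchVisitor, TextCollectingVisitor, VisitorSketch) is hypothetical, tag names are passed as plain strings instead of org.htmlparser.Tag objects, and the tokenizer is deliberately naive: it ignores comments, scripts, entities and attribute values containing '>'. Which tags produce a line break is also an assumption made for the example.

```java
import java.util.Locale;

/** Hypothetical callback interface, loosely modeled on the three visitor methods above. */
interface SketchVisitor {
    void visitTag(String tagName);      // called for an opening tag
    void visitStringNode(String text);  // called for text between tags
    void visitEndTag(String tagName);   // called for a closing tag
}

/** Collects text and adds line breaks for a few block-level tags (assumed set). */
class TextCollectingVisitor implements SketchVisitor {
    private final StringBuilder result = new StringBuilder();

    @Override
    public void visitTag(String tagName) {
        // An explicit line break tag produces a newline in the text output.
        if (tagName.equals("br")) {
            result.append('\n');
        }
    }

    @Override
    public void visitStringNode(String text) {
        // Text nodes are the only source of visible characters.
        result.append(text);
    }

    @Override
    public void visitEndTag(String tagName) {
        // Closing a paragraph, heading or div ends the current line.
        if (tagName.equals("p") || tagName.equals("div") || tagName.matches("h[1-6]")) {
            result.append('\n');
        }
    }

    String getResult() {
        return result.toString().trim();
    }
}

class VisitorSketch {

    /** Walks the markup with a naive scanner and fires the visitor callbacks. */
    static String extract(String html) {
        TextCollectingVisitor visitor = new TextCollectingVisitor();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                // No more tags: the remainder is plain text.
                visitor.visitStringNode(html.substring(pos));
                break;
            }
            if (open > pos) {
                visitor.visitStringNode(html.substring(pos, open));
            }
            int close = html.indexOf('>', open);
            if (close < 0) {
                break; // unterminated tag: stop scanning
            }
            String inner = html.substring(open + 1, close).trim();
            if (inner.startsWith("/")) {
                visitor.visitEndTag(tagName(inner.substring(1)));
            } else {
                visitor.visitTag(tagName(inner));
            }
            pos = close + 1;
        }
        return visitor.getResult();
    }

    /** Returns the lower-cased tag name, cut at the first whitespace or '/'. */
    private static String tagName(String inner) {
        String s = inner.trim();
        int end = 0;
        while (end < s.length()
                && !Character.isWhitespace(s.charAt(end))
                && s.charAt(end) != '/') {
            end++;
        }
        return s.substring(0, end).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extract("<h1>Title</h1><p>Hello <b>world</b></p>"));
    }
}
```

In this sketch the inline tag b contributes nothing itself, while the closing h1 and p tags each end a line, so only visitStringNode adds visible characters to the result. With the real class, the static html2text(html, encoding) method documented above is the intended entry point; the sketch only illustrates how such callbacks can divide the work.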