HTML ArticleExtractor
HTML ArticleExtractor collects articles from web pages.
Data collected
- Article title
- HTML string of processed article content
- Text content of the article (all HTML removed)
- Article length in characters
- Article description or short excerpt from the content
- Author metadata
- Website name
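
To make the shape of these fields concrete, here is a minimal sketch in Python that builds a similar record from a raw HTML string using only the standard library. It is not the tool's actual implementation. The `Article` class, the `extract_article` function, the field names, and the choice of `<meta>` tags (`description`, `author`, `og:site_name`) are assumptions made for illustration. The sketch also skips real article detection and simply passes the input HTML through as the content.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Article:
    """Hypothetical record mirroring the fields listed above."""
    title: Optional[str]
    content: str              # HTML string (passed through unprocessed here)
    text_content: str         # content with all HTML removed
    length: int               # article length in characters
    excerpt: Optional[str]    # description or short excerpt
    byline: Optional[str]     # author metadata
    site_name: Optional[str]  # website name


class _MetaAndTextParser(HTMLParser):
    """Collects <title>, <meta> values, and visible text."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.meta = {}
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            # Accept both name="..." and property="..." (Open Graph style).
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content"):
                self.meta[key.lower()] = attr["content"]
        elif tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def extract_article(html: str) -> Article:
    parser = _MetaAndTextParser()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    text = " ".join(" ".join(parser.text_parts).split())
    title = " ".join("".join(parser.title_parts).split()) or None
    # Assumption: fall back to the first 200 characters when no description exists.
    excerpt = parser.meta.get("description") or (text[:200] or None)
    return Article(
        title=title,
        content=html,
        text_content=text,
        length=len(text),
        excerpt=excerpt,
        byline=parser.meta.get("author"),
        site_name=parser.meta.get("og:site_name"),
    )
```

Calling `extract_article(page_html)` on a downloaded page returns one `Article` whose attributes line up with the list above. Any field whose source tag is missing comes back as `None`.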
Use Cases
- Collecting ready-made articles from any website