kalimeromk/rssfeed

Full-Text RSS extraction package for Laravel - converts partial RSS feeds to full content

v3.8 2026-02-05 22:51 UTC

This package is auto-updated.

Last update: 2026-02-16 12:08:30 UTC

A Laravel package that processes RSS and Atom feeds and extracts the full text of each linked article. It ports the Full-Text RSS functionality of the original FiveFilters project to Laravel.

Features

  • Full-Text Extraction - Converts partial RSS feeds to complete articles
  • Readability Algorithm - Automatically detects main content using the Arc90 Readability algorithm
  • Site Configs - 1679+ site-specific configurations for better extraction
  • Image Processing - Extracts and saves images with Spatie Media Library support
  • Language Detection - Automatically detects article language
  • HTML Sanitization - XSS filtering and inline style removal
  • Multi-Page Support - Handles articles split across multiple pages
  • Multiple Output Formats - RSS 2.0, Atom, and JSON Feed formats
  • Security - API key validation, domain whitelist/blacklist
  • Caching - Built-in cache support via Laravel Cache
  • Modern PHP - Type-safe with PHP 8.0+ features

Installation

composer require kalimeromk/rssfeed

Publish Configuration

php artisan vendor:publish --tag=config

Publish Site Configs (Optional)

php artisan vendor:publish --tag=site-configs

โš™๏ธ Configuration

The configuration file is located at config/rssfeed.php:

return [
    // Enable/disable the service
    'enabled' => true,

    // Security settings
    'key_required' => false,
    'api_keys' => [],
    'allowed_hosts' => [],
    'blocked_hosts' => [],

    // Feature toggles
    'singlepage_enabled' => true,
    'multipage_enabled' => true,
    'caching_enabled' => false,
    'xss_filter_enabled' => false,
    'detect_language' => true,

    // Cache settings
    'cache_time' => 10, // minutes

    // HTML parser settings
    'html_parser' => 'html5php', // or 'libxml'
];
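
The `allowed_hosts` and `blocked_hosts` options imply a host filter that runs before a URL is fetched. The README does not describe how the package evaluates them, so the following is only a minimal sketch under stated assumptions: the function name `isHostPermitted` is hypothetical, an empty allow list is taken to mean "allow everything", the block list is taken to win over the allow list, and an entry is taken to cover its subdomains.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the package: decide whether a URL's host
 * passes an allow list and a block list shaped like the config arrays above.
 */
function isHostPermitted(string $url, array $allowedHosts, array $blockedHosts): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // URLs without a parseable host are rejected outright.
    }
    $host = strtolower($host);

    // Assumption: an entry matches the host itself and any of its subdomains.
    $matches = static function (string $entry) use ($host): bool {
        $entry = strtolower($entry);
        return $host === $entry || str_ends_with($host, '.' . $entry);
    };

    foreach ($blockedHosts as $entry) {
        if ($matches($entry)) {
            return false; // Assumption: the block list takes precedence.
        }
    }

    if ($allowedHosts === []) {
        return true; // Assumption: an empty allow list means "allow all".
    }

    foreach ($allowedHosts as $entry) {
        if ($matches($entry)) {
            return true;
        }
    }

    return false;
}
```

With both lists empty, as in the default configuration shown above, every URL with a valid host would pass this check.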

Usage

Basic RSS Feed Parsing

use RssFeed;

// Parse RSS feed
$feed = RssFeed::RssFeeds('https://example.com/feed.xml');

// Get feed items
foreach ($feed->get_items() as $item) {
    echo $item->get_title();
    echo $item->get_description();
}

Full-Text Content Extraction

use Kalimeromk\Rssfeed\FullTextExtractor;

$extractor = app(FullTextExtractor::class);

// Extract from URL
$result = $extractor->extract('https://example.com/article');

if ($result['success']) {
    echo $result['title'];
    echo $result['content'];
    echo $result['author'];
    echo $result['language'];
}

// Extract from HTML string
$result = $extractor->extractFromHtml($html, 'https://example.com/article');

Process Feed with Full Content

use RssFeed;

$items = RssFeed::parseRssFeeds('https://example.com/feed.xml');

foreach ($items as $item) {
    echo $item['title'];
    echo $item['content']; // Full article content
    echo $item['author'];
    echo $item['language'];
}

Clean Text Extraction (No HTML)

$items = RssFeed::parseRssFeedsClean('https://example.com/feed.xml');

foreach ($items as $item) {
    echo $item['content']; // Plain text, no HTML
}
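
To illustrate what "plain text, no HTML" can involve, here is a minimal sketch in plain PHP. It is not the package's implementation: `htmlToPlainText` is a hypothetical name, and the list of block-level tags that receive a separating space is an arbitrary choice made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reduce an HTML fragment to plain text.
 * A sketch only; the package may clean content differently.
 */
function htmlToPlainText(string $html): string
{
    // Drop script and style elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;

    // Put a space in front of block-level tags so adjacent paragraphs
    // do not run together once the tags are stripped.
    $blockTags = 'p|div|br|li|h[1-6]|tr|td|th|blockquote|section|article';
    $html = preg_replace('#</?(?:' . $blockTags . ')\b[^>]*>#i', ' $0', $html) ?? $html;

    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($text);
}
```

Inline tags such as `<b>` are removed without adding a space, so a word that is only partly bold stays in one piece.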

Generate Feed Output

use Kalimeromk\Rssfeed\Services\FeedOutputService;

$outputService = app(FeedOutputService::class);

// RSS 2.0
$rss = $outputService->toRss($feedData);

// Atom
$atom = $outputService->toAtom($feedData);

// JSON Feed
$json = $outputService->toJson($feedData);
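
The README does not document the shape of `$feedData`. As a rough illustration of the JSON Feed target format, the sketch below builds a JSON Feed 1.1 document from a hypothetical input array. The function name `buildJsonFeed` and the input keys (`title`, `link`, `items`, and per-item `url`, `title`, `content`) are assumptions made for this example, not the package's API.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: serialize feed data as JSON Feed 1.1.
 * The input array shape is an assumption, not the package's documented format.
 */
function buildJsonFeed(array $feedData): string
{
    $items = [];
    foreach ($feedData['items'] ?? [] as $item) {
        $items[] = [
            // JSON Feed requires a unique id per item; the URL is used here.
            // Assumption: every item carries a 'url'.
            'id'           => $item['url'],
            'url'          => $item['url'],
            'title'        => $item['title'] ?? '',
            'content_html' => $item['content'] ?? '',
        ];
    }

    $document = [
        'version'       => 'https://jsonfeed.org/version/1.1',
        'title'         => $feedData['title'] ?? 'Untitled feed',
        'home_page_url' => $feedData['link'] ?? null,
        'items'         => $items,
    ];

    // Drop unset optional members rather than emitting nulls.
    $document = array_filter($document, static fn ($value) => $value !== null);

    return json_encode(
        $document,
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT
    );
}
```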

Image Handling

// Extract images from feed item
$images = RssFeed::extractImagesFromItem($item);

// Save images to storage
$savedImages = RssFeed::saveImagesToStorage($images, $model);

HTML Sanitization

use Kalimeromk\Rssfeed\Services\HtmlSanitizerService;

$sanitizer = app(HtmlSanitizerService::class);

// Basic sanitization
$clean = $sanitizer->sanitize($html);

// Remove inline styles
$noStyles = $sanitizer->sanitizeWithoutStyles($html);

// Strip all HTML
$text = $sanitizer->stripAllTags($html);

Advanced Usage

Custom Site Configuration

Create custom extraction rules in site_config/custom/{hostname}.txt:

# Example: example.com.txt
body: //article[contains(@class, 'main-content')]
title: //h1
author: //span[@class='author-name']
date: //time[@pubdate]

# Remove unwanted elements
strip_id_or_class: ads,comments,sidebar
strip: //div[@class='donation-form']
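
The format above is line-oriented: `#` starts a comment, and each rule is a `key: value` pair that may repeat. To make that structure concrete, here is a minimal sketch of a reader for it. `parseSiteConfig` is a hypothetical name and this is not the parser the package ships; it assumes repeated keys should be collected into lists.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical parser for the "key: value" site config format shown above.
 * Repeated keys are collected into lists.
 *
 * @return array<string, list<string>>
 */
function parseSiteConfig(string $text): array
{
    $rules = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // Skip blank lines and comments.
        }

        // Split on the first colon only, because XPath values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // Ignore lines without a "key: value" shape.
        }

        [$key, $value] = array_map('trim', $parts);
        if ($key === '' || $value === '') {
            continue;
        }

        $rules[$key][] = $value;
    }

    return $rules;
}
```

Splitting on the first colon matters for values such as `//a[starts-with(@href, 'https://')]`, which would otherwise be cut in the middle of the expression.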

Domain-Specific Selectors

Add to config/rssfeed.php:

'content_selectors' => [
    'example.com' => '//div[@class="article-content"]',
    'news.example.com' => '//article[contains(@class, "story")]',
],
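
Each entry maps a hostname to an XPath expression. The sketch below shows how such a mapping could be applied with PHP's DOM extension. It is an illustration only: `extractBySelector` is a hypothetical name, the lookup assumes an exact hostname match, and the package's own matching rules may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: pick the XPath expression configured for the URL's
 * host and return the inner HTML of the first matching node, or null.
 */
function extractBySelector(string $html, string $url, array $contentSelectors): ?string
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    if (!isset($contentSelectors[$host])) {
        return null; // Assumption: exact host match only.
    }

    $document = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Real-world HTML is rarely valid.
    // The XML declaration prefix is a common way to make loadHTML read UTF-8.
    $document->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($document))->query($contentSelectors[$host]);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize the children of the first match to get its inner HTML.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $document->saveHTML($child);
    }

    return trim($inner);
}
```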

Content Cleanup Rules

'remove_selectors' => [
    '.donation-form',
    '.share-buttons',
    '.comments',
    '.advertisement',
],
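
The entries above are CSS class selectors. As an illustration of what removing them involves, the sketch below translates each `.class` selector into an XPath query and deletes the matching elements. `removeByClassSelectors` is a hypothetical name, only plain `.class` selectors are handled, and the package may support a wider selector syntax.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: remove every element matching one of the given
 * ".class" selectors and return the remaining HTML fragment.
 */
function removeByClassSelectors(string $html, array $selectors): string
{
    $document = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The wrapper div gives the fragment a single root to serialize later.
    $document->loadHTML('<?xml encoding="UTF-8"><div id="sketch-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    foreach ($selectors as $selector) {
        if (!preg_match('/^\.([A-Za-z0-9_-]+)$/', $selector, $m)) {
            continue; // Anything other than ".class" is skipped in this sketch.
        }

        // Common XPath idiom for "the class attribute contains this token".
        $query = sprintf(
            '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
            $m[1]
        );

        // Copy matches to an array first so removal does not disturb iteration.
        foreach (iterator_to_array($xpath->query($query)) as $node) {
            $node->parentNode?->removeChild($node);
        }
    }

    $root = $xpath->query('//div[@id="sketch-root"]')->item(0);
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $document->saveHTML($child);
    }

    return $result;
}
```

Matching on whole class tokens means `.comments` removes `class="box comments"` but leaves `class="comments-count"` in place.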

Testing

composer test

Package Structure

src/
├── Extractors/
│   ├── Readability/          # Arc90 Readability port
│   │   ├── Readability.php
│   │   └── JSLikeHTMLElement.php
│   └── ContentExtractor/     # Site config extraction
│       ├── ContentExtractor.php
│       └── SiteConfig.php
├── Handlers/
│   ├── MultiPageHandler.php  # Multi-page article handling
│   └── SinglePageHandler.php # Single-page view detection
├── Services/
│   ├── CacheService.php      # Laravel cache wrapper
│   ├── FeedOutputService.php # RSS/Atom/JSON generation
│   ├── HtmlSanitizerService.php
│   ├── LanguageDetectionService.php
│   └── SecurityValidator.php
├── FullTextExtractor.php     # Main extraction class
├── RssFeed.php               # Original RSS functionality
└── RssfeedServiceProvider.php

site_config/
└── standard/                 # 1679+ site configurations

Migration from Original Full-Text RSS

Original Feature     | Laravel Equivalent
---------------------|-------------------------------------
Readability.php      | FullTextExtractor::extract()
Site Config files    | Same format, copied to site_config/
makefulltextfeed.php | FeedOutputService
htmLawed             | HtmlSanitizerService (HTMLPurifier)
Zend_Cache           | CacheService (Laravel Cache)

๐Ÿ“ License

MIT License - see LICENSE for details.

๐Ÿ™ Credits

This package is based on the Full-Text RSS project by FiveFilters.org, ported to Laravel with modern PHP practices.

  • Original Readability by Arc90 Labs
  • Ported to PHP by Keyvan Minoukadeh
  • Laravel adaptation by Zorab Shefot Bogoevski