kalimeromk / rssfeed
Full-Text RSS extraction package for Laravel - converts partial RSS feeds to full content
Requires
- php: >=7.4
- ext-curl: *
- ext-dom: *
- ext-libxml: *
- illuminate/support: ~5.5.0|~5.6.0|~5.7.0|~5.8.0|^6.0|^7.0|^8.0|^9.0|^10.0|^11.0|^12.0
- simplepie/simplepie: ^1.8.0
- spatie/laravel-medialibrary: ^7.0|^8.0|^9.0|^10.0|^11.0
Requires (Dev)
- larastan/larastan: ^3
- laravel/pint: dev-main
- orchestra/testbench: ^9
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^11.5.6
- rector/rector: 2.0.9
README
A comprehensive RSS feed processing package for Laravel that extracts full-text content from RSS/Atom feeds. This package ports the powerful Full-Text RSS functionality from the original FiveFilters project to Laravel.
Features
- Full-Text Extraction - Converts partial RSS feeds to complete articles
- Readability Algorithm - Automatically detects main content using the Arc90 Readability algorithm
- Site Configs - 1679+ site-specific configurations for better extraction
- Image Processing - Extracts and saves images with Spatie Media Library support
- Language Detection - Automatically detects article language
- HTML Sanitization - XSS filtering and inline style removal
- Multi-Page Support - Handles articles split across multiple pages
- Multiple Output Formats - RSS 2.0, Atom, and JSON Feed formats
- Security - API key validation, domain whitelist/blacklist
- Caching - Built-in cache support via Laravel Cache
- Modern PHP - Type-safe with PHP 8.0+ features
Installation
composer require kalimeromk/rssfeed
Publish Configuration
php artisan vendor:publish --tag=config
Publish Site Configs (Optional)
php artisan vendor:publish --tag=site-configs
Configuration
The configuration file is located at config/rssfeed.php:
return [
    // Enable/disable the service
    'enabled' => true,

    // Security settings
    'key_required' => false,
    'api_keys' => [],
    'allowed_hosts' => [],
    'blocked_hosts' => [],

    // Feature toggles
    'singlepage_enabled' => true,
    'multipage_enabled' => true,
    'caching_enabled' => false,
    'xss_filter_enabled' => false,
    'detect_language' => true,

    // Cache settings
    'cache_time' => 10, // minutes

    // HTML parser settings
    'html_parser' => 'html5php', // or 'libxml'
];
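To make the allowed_hosts and blocked_hosts settings more concrete, here is a minimal sketch of how a host check over two such lists could behave. The function isHostPermitted() is hypothetical and is not the package's SecurityValidator; treating an empty allow list as "allow everything" is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): decides whether a URL's host
 * passes an allow list and a block list shaped like the config arrays above.
 */
function isHostPermitted(string $url, array $allowedHosts, array $blockedHosts): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // Malformed URL or no host component.
    }
    $host = strtolower($host);

    // A blocked host is always rejected, even if it is also on the allow list.
    if (in_array($host, array_map('strtolower', $blockedHosts), true)) {
        return false;
    }

    // Assumption: an empty allow list means every non-blocked host is allowed.
    if ($allowedHosts === []) {
        return true;
    }

    return in_array($host, array_map('strtolower', $allowedHosts), true);
}
```

With both lists empty, as in the default configuration, every well-formed URL passes this check.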
Usage
Basic RSS Feed Parsing
use RssFeed;

// Parse RSS feed
$feed = RssFeed::RssFeeds('https://example.com/feed.xml');

// Get feed items
foreach ($feed->get_items() as $item) {
    echo $item->get_title();
    echo $item->get_description();
}
Full-Text Content Extraction
use Kalimeromk\Rssfeed\FullTextExtractor;

$extractor = app(FullTextExtractor::class);

// Extract from URL
$result = $extractor->extract('https://example.com/article');

if ($result['success']) {
    echo $result['title'];
    echo $result['content'];
    echo $result['author'];
    echo $result['language'];
}

// Extract from HTML string
$result = $extractor->extractFromHtml($html, 'https://example.com/article');
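Because extraction can fail, it may help to read the result array defensively. The sketch below only relies on the keys shown above (success, title, content, author, language); describeExtraction() is a hypothetical helper, and the fallback strings are arbitrary choices for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns an extraction result array into a one-line summary.
 * Missing or empty keys fall back to placeholder text instead of raising notices.
 */
function describeExtraction(array $result): string
{
    if (empty($result['success'])) {
        return 'extraction failed';
    }

    $title    = trim((string) ($result['title'] ?? '')) ?: '(untitled)';
    $author   = trim((string) ($result['author'] ?? '')) ?: 'unknown author';
    $language = trim((string) ($result['language'] ?? '')) ?: 'unknown language';

    // Count whitespace-separated words in the content after dropping tags.
    $text  = trim(strip_tags((string) ($result['content'] ?? '')));
    $words = $text === '' ? 0 : count(preg_split('/\s+/', $text));

    return sprintf('%s by %s [%s], %d words', $title, $author, $language, $words);
}
```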
Process Feed with Full Content
use RssFeed;

$items = RssFeed::parseRssFeeds('https://example.com/feed.xml');

foreach ($items as $item) {
    echo $item['title'];
    echo $item['content']; // Full article content
    echo $item['author'];
    echo $item['language'];
}
Clean Text Extraction (No HTML)
$items = RssFeed::parseRssFeedsClean('https://example.com/feed.xml');

foreach ($items as $item) {
    echo $item['content']; // Plain text, no HTML
}
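For readers wondering what "plain text, no HTML" can involve, here is a rough sketch built only on PHP core functions. It is not the package's implementation; htmlToPlainText() is a hypothetical name, and the sketch does not drop the contents of script or style elements.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts an HTML fragment to plain text.
 * Line breaks and closing block tags become newlines so paragraphs stay apart.
 */
function htmlToPlainText(string $html): string
{
    // Replace <br> and closing p/div/h1-h6/li tags with a newline.
    $html = preg_replace('/<\s*(br|\/p|\/div|\/h[1-6]|\/li)\b[^>]*>/i', "\n", $html);

    // Drop the remaining tags and decode entities such as &amp;.
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces and tabs, then whitespace around newlines.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace('/\s*\n\s*/', "\n", $text);

    return trim($text);
}
```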
Generate Feed Output
use Kalimeromk\Rssfeed\Services\FeedOutputService;

$outputService = app(FeedOutputService::class);

// RSS 2.0
$rss = $outputService->toRss($feedData);

// Atom
$atom = $outputService->toAtom($feedData);

// JSON Feed
$json = $outputService->toJson($feedData);
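The README does not show the shape of $feedData, so the following is only a sketch of what a JSON Feed 1.1 document looks like when built by hand. buildJsonFeed() is hypothetical, and the item keys url, title, content and author are assumptions that may not match what FeedOutputService expects.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: builds a JSON Feed 1.1 document from a list of items.
 * The item keys ('url', 'title', 'content', 'author') are assumptions.
 */
function buildJsonFeed(string $title, string $feedUrl, array $items): string
{
    $feed = [
        'version'  => 'https://jsonfeed.org/version/1.1',
        'title'    => $title,
        'feed_url' => $feedUrl,
        'items'    => [],
    ];

    foreach ($items as $item) {
        $entry = [
            // The item URL doubles as its unique id in this sketch.
            'id'           => (string) ($item['url'] ?? ''),
            'url'          => (string) ($item['url'] ?? ''),
            'title'        => (string) ($item['title'] ?? ''),
            'content_html' => (string) ($item['content'] ?? ''),
        ];
        if (!empty($item['author'])) {
            $entry['authors'] = [['name' => (string) $item['author']]];
        }
        $feed['items'][] = $entry;
    }

    return json_encode(
        $feed,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}
```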
Image Handling
// Extract images from feed item
$images = RssFeed::extractImagesFromItem($item);

// Save images to storage
$savedImages = RssFeed::saveImagesToStorage($images, $model);
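As a rough picture of what image extraction involves, the sketch below collects image URLs from an HTML fragment with the DOM extension. It is not how extractImagesFromItem() is implemented; extractImageUrls() is a hypothetical helper, and skipping data: URIs is a choice made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the unique, non-empty src values of <img> tags
 * in document order, ignoring inline data: URIs.
 */
function extractImageUrls(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Silence warnings on HTML5 tags.
    // The encoding prefix asks libxml to read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '' && strpos($src, 'data:') !== 0) {
            $urls[$src] = true; // Array keys de-duplicate while keeping order.
        }
    }

    return array_keys($urls);
}
```

Relative URLs such as /b.png are returned unchanged; resolving them against the article URL is left out of this sketch.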
HTML Sanitization
use Kalimeromk\Rssfeed\Services\HtmlSanitizerService;

$sanitizer = app(HtmlSanitizerService::class);

// Basic sanitization
$clean = $sanitizer->sanitize($html);

// Remove inline styles
$noStyles = $sanitizer->sanitizeWithoutStyles($html);

// Strip all HTML
$text = $sanitizer->stripAllTags($html);
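To illustrate the idea of inline style removal, here is a small sketch that drops every style attribute using the DOM extension. It is one possible approach, not the code behind sanitizeWithoutStyles(), and it does no XSS filtering; removeInlineStyles() is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: removes all style="..." attributes from an HTML fragment.
 * It performs no XSS filtering and leaves <style> elements untouched.
 */
function removeInlineStyles(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Silence warnings on HTML5 tags.
    // The encoding prefix asks libxml to read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $output = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $output .= $doc->saveHTML($child);
        }
    }

    return $output;
}
```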
Advanced Usage
Custom Site Configuration
Create custom extraction rules in site_config/custom/{hostname}.txt:
# Example: example.com.txt
body: //article[contains(@class, 'main-content')]
title: //h1
author: //span[@class='author-name']
date: //time[@pubdate]
# Remove unwanted elements
strip_id_or_class: ads,comments,sidebar
strip: //div[@class='donation-form']
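The file format above is line based: a key, a colon, then a value, with # starting a comment. A minimal parser sketch for that shape might look like the following. parseSiteConfig() is hypothetical and is not the package's SiteConfig class; collecting repeated keys into lists is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for "key: value" site config text.
 * Returns a map of key => list of values, skipping comments and blank lines.
 */
function parseSiteConfig(string $text): array
{
    $rules = [];

    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // Blank line or comment.
        }

        // Split on the first colon only, since XPath values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }

        $key = trim($parts[0]);
        $value = trim($parts[1]);
        if ($key === '' || $value === '') {
            continue;
        }

        $rules[$key][] = $value; // Repeated keys accumulate in order.
    }

    return $rules;
}
```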
Domain-Specific Selectors
Add to config/rssfeed.php:
'content_selectors' => [
    'example.com' => '//div[@class="article-content"]',
    'news.example.com' => '//article[contains(@class, "story")]',
],
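The selectors in this array are XPath expressions keyed by host name. As a sketch of how such a map could be applied, the function below looks up the selector for a URL's host and returns the first matching node. extractBySelector() is hypothetical, and exact host matching (no subdomain fallback) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: applies the XPath selector registered for the URL's host
 * and returns the first matching element as HTML, or null when nothing matches.
 */
function extractBySelector(string $html, string $url, array $selectors): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !isset($selectors[$host])) {
        return null; // No selector configured for this host.
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Silence warnings on HTML5 tags.
    // The encoding prefix asks libxml to read the markup as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($selectors[$host]);
    if ($nodes === false || $nodes->length === 0) {
        return null; // Invalid expression or no match.
    }

    return $doc->saveHTML($nodes->item(0));
}
```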
Content Cleanup Rules
'remove_selectors' => [
    '.donation-form',
    '.share-buttons',
    '.comments',
    '.advertisement',
],
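The cleanup rules above are class selectors. The sketch below shows one way elements matching such selectors could be removed with the DOM extension. removeByClassSelectors() is hypothetical and handles only simple ".class" selectors, whereas the package may support more.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: removes every element carrying one of the given classes.
 * Only selectors of the form ".class-name" are handled; others are ignored.
 */
function removeByClassSelectors(string $html, array $selectors): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Silence warnings on HTML5 tags.
    // The encoding prefix asks libxml to read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($selectors as $selector) {
        if (!preg_match('/^\.([A-Za-z0-9_-]+)$/', (string) $selector, $match)) {
            continue; // Not a plain class selector.
        }
        // Match the class as a whole word within the class attribute.
        $expr = sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $match[1]);
        // Copy the node list first, because removing nodes while iterating is unsafe.
        foreach (iterator_to_array($xpath->query($expr)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Serialize only the children of <body>, i.e. the cleaned fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $output = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $output .= $doc->saveHTML($child);
        }
    }

    return $output;
}
```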
Testing
composer test
Package Structure
src/
├── Extractors/
│   ├── Readability/              # Arc90 Readability port
│   │   ├── Readability.php
│   │   └── JSLikeHTMLElement.php
│   └── ContentExtractor/         # Site config extraction
│       ├── ContentExtractor.php
│       └── SiteConfig.php
├── Handlers/
│   ├── MultiPageHandler.php      # Multi-page article handling
│   └── SinglePageHandler.php     # Single-page view detection
├── Services/
│   ├── CacheService.php          # Laravel cache wrapper
│   ├── FeedOutputService.php     # RSS/Atom/JSON generation
│   ├── HtmlSanitizerService.php
│   ├── LanguageDetectionService.php
│   └── SecurityValidator.php
├── FullTextExtractor.php         # Main extraction class
├── RssFeed.php                   # Original RSS functionality
└── RssfeedServiceProvider.php
site_config/
└── standard/                     # 1679+ site configurations
Migration from Original Full-Text RSS
| Original Feature | Laravel Equivalent |
|---|---|
| Readability.php | FullTextExtractor::extract() |
| Site Config files | Same format, copied to site_config/ |
| makefulltextfeed.php | FeedOutputService |
| htmLawed | HtmlSanitizerService (HTMLPurifier) |
| Zend_Cache | CacheService (Laravel Cache) |
License
MIT License - see LICENSE for details.
Credits
This package is based on the Full-Text RSS project by FiveFilters.org, ported to Laravel with modern PHP practices.
- Original Readability by Arc90 Labs
- Ported to PHP by Keyvan Minoukadeh
- Laravel adaptation by Zorab Shefot Bogoevski