mixnode / mixnode-warcreader-php
Read Web ARChive (WARC) files in PHP.
Last update: 2025-04-02 21:12:00 UTC
This library allows developers to read Web ARChive (WARC) files in PHP.
Installation Guide
We recommend installing this package with Composer. If Composer is not installed yet, download it first:
curl -sS https://getcomposer.org/installer | php
Once done, run the Composer command to install Mixnode WARC Reader for PHP:
php composer.phar require mixnode/mixnode-warcreader-php
After installing, you need to require Composer's autoloader in your code:
require 'vendor/autoload.php';
You can later update Mixnode WARC Reader using Composer:
php composer.phar update
A Simple Example
<?php
require 'vendor/autoload.php';

// Initialize a WarcReader object.
// The WarcReader constructor accepts paths to both raw WARC files and gzipped WARC files.
$warc_reader = new Mixnode\WarcReader("test.warc.gz");

// Using nextRecord, iterate through the WARC file and output each record.
while (($record = $warc_reader->nextRecord()) != FALSE) {
    // A WARC record is broken into two parts: header and content.
    // header contains metadata about content, while content is the actual resource captured.
    print_r($record['header']);
    print_r($record['content']);
    echo "------------------------------------\n";
}
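Beyond printing every record, the same nextRecord loop can be used to summarize a file. The sketch below tallies records by type. The helper name count_records_by_type is hypothetical and not part of the library, and the code assumes that $record['header'] is an associative array keyed by WARC field names such as WARC-Type. The README does not confirm that layout, so inspect the print_r output for your own files before relying on it.

```php
<?php
// Hypothetical helper (not part of Mixnode WARC Reader): counts records per WARC-Type.
// $reader is any object exposing nextRecord() as used in the README example.
function count_records_by_type($reader): array
{
    $counts = [];

    // Same loop condition as the README example: stop when nextRecord() returns FALSE.
    while (($record = $reader->nextRecord()) != false) {
        // ASSUMPTION: the header is an associative array keyed by WARC field names.
        $header = is_array($record['header'] ?? null) ? $record['header'] : [];

        // Records without a WARC-Type key are grouped under 'unknown'.
        $type = $header['WARC-Type'] ?? 'unknown';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }

    return $counts;
}

// Usage, assuming the package is installed via Composer:
// require 'vendor/autoload.php';
// print_r(count_records_by_type(new Mixnode\WarcReader("test.warc.gz")));
```

Taking the reader as a parameter keeps the helper independent of how the file is opened, so it can be tried with a stand-in object before pointing it at a real WARC file.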