webignition/html-document-link-finder

Find anchor URLs in a given HTML document

5.0 2019-04-03 10:47 UTC

Last update: 2024-10-29 04:47:18 UTC

Get a collection of absolute URLs, with their associated elements, for links in an HTML document.

Usage

Getting a LinkCollection from a WebPage

use webignition\HtmlDocumentLinkUrlFinder\HtmlDocumentLinkUrlFinder;
use webignition\WebResource\WebPage\WebPage;

$webPageUrl = 'http://www.google.co.uk/search?q=Hello+World';
$webPage = WebPage::createFromContent((string) file_get_contents($webPageUrl));

$finder = new HtmlDocumentLinkUrlFinder();
$linkCollection = $finder->getLinkCollection($webPage, $webPageUrl);

Accessing a LinkCollection

use Psr\Http\Message\UriInterface;

// Assuming $linkCollection from previous example

// Iterating
foreach ($linkCollection as $link) {
    $link->getUri();      // UriInterface instance
    $link->getElement();  // \DOMElement instance
}

// Counting
count($linkCollection);

// Get URIs only
$linkCollection->getUris(); // array of UriInterface

// Get unique URIs only
$linkCollection->getUniqueUris(); // array of UriInterface

Filtering a LinkCollection

All LinkCollection::filterBy*() methods return a new LinkCollection instance.

use webignition\Uri\ScopeComparer;

// Filtering
$anchorLinks = $linkCollection->filterByElementName('a');
$elementsWithRelStylesheetAttribute = $linkCollection->filterByAttribute('rel', 'stylesheet');
$linksWithinUrlScope = $linkCollection->filterByUrlScope(
    new ScopeComparer(),
    ['http://example.com/']
);

$linkElementsWithRelStylesheetAttribute = $linkCollection
    ->filterByElementName('link')
    ->filterByAttribute('rel', 'stylesheet');
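
The "returns a new instance" behaviour noted above means that filtering never modifies the collection it is called on, which is why filters can be chained safely. The following is a minimal, self-contained sketch of that pattern only. SketchLinkCollection and its array-based link records are hypothetical stand-ins invented for this illustration, not the library's classes or its implementation, and the sketch assumes PHP 7.4 or later.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a link collection; NOT the library's implementation.
final class SketchLinkCollection implements \Countable
{
    /** @var array<int, array{element: string, url: string}> */
    private array $links;

    public function __construct(array $links)
    {
        // Re-index so keys stay sequential after filtering.
        $this->links = array_values($links);
    }

    // Builds and returns a new collection; $this is left untouched.
    public function filterByElementName(string $name): self
    {
        return new self(array_filter(
            $this->links,
            static fn (array $link): bool => $link['element'] === $name
        ));
    }

    public function count(): int
    {
        return count($this->links);
    }
}

$all = new SketchLinkCollection([
    ['element' => 'a', 'url' => 'http://example.com/'],
    ['element' => 'link', 'url' => 'http://example.com/style.css'],
    ['element' => 'a', 'url' => 'http://example.com/about'],
]);

$anchors = $all->filterByElementName('a');

// The original collection still holds all three links.
echo count($all), ' ', count($anchors), PHP_EOL; // prints "3 2"
```

Because each filter call yields an independent object, the unfiltered $linkCollection from the examples above can be reused for several different filters without being rebuilt.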