revolution / salvager
Tiny WebCrawler for Laravel using Playwright
Installs: 7,783
Dependents: 1
Suggesters: 0
Security: 0
Stars: 7
Watchers: 1
Forks: 3
pkg:composer/revolution/salvager
Requires
- php: ^8.3
- illuminate/support: ^11.0||^12.0
- playwright-php/playwright: ^1.0
Requires (Dev)
- laravel/pint: ^1.26
- mockery/mockery: ^1.6.1
- orchestra/testbench: ^10.8
- phpunit/phpunit: ^12.4
- revolution/laravel-boost-copilot-cli: ^1.0
README
Version 2
Version 2 has been reworked as a simple package that depends on Playwright. It only implements minimal functionality, since you can use playwright-php/playwright directly.
In addition, version 2.2 now supports the Vercel agent-browser.
Requirements
- PHP >= 8.3
- Laravel >= 11.x
Installation
composer require revolution/salvager
Playwright
Install Playwright browsers:
vendor/bin/playwright-install --browsers
Or install Playwright browsers with OS dependencies:
vendor/bin/playwright-install --with-deps
Vercel agent-browser
Global installation with a Chromium binary
Install agent-browser and Chromium globally, and run agent-browser as a Laravel Process.
Warning
This setup does not work on Vercel or Laravel Cloud. See the local installation section below.
npm install -g agent-browser
agent-browser install
# Linux
agent-browser install --with-deps
# .env
SALVAGER_AGENT_BROWSER_PATH=/path/to/agent-browser
SALVAGER_AGENT_BROWSER_OPTIONS=
To use a custom Chromium binary such as @sparticuz/chromium, set its path in a shell environment variable.
AGENT_BROWSER_EXECUTABLE_PATH=/tmp/chromium
# .env
SALVAGER_INSTALL_CHROMIUM="node ./scripts/install-chromium.js"
Local installation with a cloud provider
You can also install agent-browser locally and use it with a cloud provider such as Browserbase or Browser Use.
This should work on Vercel and Laravel Cloud, where OS dependencies cannot be installed.
Install agent-browser in your Laravel project. This requires agent-browser v0.7.6 or later.
npm install agent-browser
# .env
SALVAGER_AGENT_BROWSER_PATH="npx agent-browser"
SALVAGER_AGENT_BROWSER_OPTIONS=
Set the provider settings as shell environment variables instead of in .env.
# Browserbase
AGENT_BROWSER_PROVIDER=browserbase
BROWSERBASE_PROJECT_ID="your-project-id"
BROWSERBASE_API_KEY="your-api-key"
# Browser Use
AGENT_BROWSER_PROVIDER=browseruse
BROWSER_USE_API_KEY="your-api-key"
Vercel also requires AGENT_BROWSER_SOCKET_DIR.
AGENT_BROWSER_SOCKET_DIR=/tmp/
I have confirmed that it works with Vercel and Browserbase.
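Because the provider settings live in the shell environment rather than in .env, a missing variable can be easy to overlook. The following is a minimal sketch in plain PHP of a check you could run before calling agent-browser. The helper name missingProviderVariables() and the REQUIRED_VARIABLES constant are hypothetical and not part of Salvager. The variable names are the ones listed above, and the fallback to browserbase when AGENT_BROWSER_PROVIDER is unset is an assumption made only for this example.

```php
<?php

declare(strict_types=1);

// Variables each provider needs, taken from the examples above.
const REQUIRED_VARIABLES = [
    'browserbase' => ['BROWSERBASE_PROJECT_ID', 'BROWSERBASE_API_KEY'],
    'browseruse' => ['BROWSER_USE_API_KEY'],
];

/**
 * Hypothetical helper: list the variables that are missing or empty
 * for the given provider in the supplied environment.
 *
 * @param  array<string, string>  $environment
 * @return list<string>
 */
function missingProviderVariables(string $provider, array $environment): array
{
    $required = REQUIRED_VARIABLES[$provider] ?? null;

    if ($required === null) {
        throw new InvalidArgumentException("Unknown provider: {$provider}");
    }

    return array_values(array_filter(
        $required,
        static fn (string $name): bool => ($environment[$name] ?? '') === '',
    ));
}

// Assumption for this sketch only: fall back to browserbase when unset.
$provider = getenv('AGENT_BROWSER_PROVIDER') ?: 'browserbase';

// getenv() without arguments returns the whole process environment.
$missing = missingProviderVariables($provider, getenv());

echo $missing === []
    ? "All variables for {$provider} are set.".PHP_EOL
    : 'Missing: '.implode(', ', $missing).PHP_EOL;
```

The check reads the process environment directly, so it reflects what the shell exported rather than what .env contains.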
Usage
Playwright
The browser is terminated when Salvager::browse() returns, so collect any data you need inside the closure. The Page object cannot be used outside of Salvager::browse().
use Revolution\Salvager\Facades\Salvager;
use Playwright\Page\Page;

class SalvagerController
{
    public function __invoke()
    {
        Salvager::browse(function (Page $page) use (&$url, &$text) {
            $page->goto('https://example.com/');
            $page->screenshot(config('salvager.screenshots').'example.png');

            $url = $page->url();
            $text = $page->locator('p')->first()->innerText();
        });

        dump($url);
        dump($text);
    }
}
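The rule that data must be collected inside the closure can be illustrated without a real browser. The following is a minimal sketch in plain PHP. FakePage and the global browse() function are hypothetical stand-ins that only mimic the lifecycle described here; they are not Salvager or Playwright APIs. It shows that a plain value copied out through a by-reference capture survives, while the page object itself is no longer usable after the callback returns.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a browser page; not part of Salvager or Playwright.
final class FakePage
{
    private bool $closed = false;

    public function url(): string
    {
        if ($this->closed) {
            throw new RuntimeException('Page is already closed.');
        }

        return 'https://example.com/';
    }

    public function close(): void
    {
        $this->closed = true;
    }
}

// Hypothetical helper that mimics the lifecycle:
// open a page, hand it to the callback, then always close it.
function browse(callable $callback): void
{
    $page = new FakePage();

    try {
        $callback($page);
    } finally {
        $page->close();
    }
}

$url = null;
$leaked = null;

browse(function (FakePage $page) use (&$url, &$leaked): void {
    $url = $page->url(); // Copy the plain string out while the page is alive.
    $leaked = $page;     // Keeping the object itself does not keep it usable.
});

echo $url, PHP_EOL;

try {
    $leaked->url();
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

The by-reference capture (`use (&$url)`) is what lets the closure write to a variable in the calling scope. Without the `&`, the closure would only modify its own copy.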
If you want more control, launch the browser yourself with Salvager::launch(). You are then responsible for closing it.
use Playwright\Browser\BrowserContextInterface;
use Revolution\Salvager\Facades\Salvager;

/** @var BrowserContextInterface $browser */
$browser = Salvager::launch();

$page = $browser->newPage();
$page->goto('https://example.com/');

// Do something...

// Don't forget to close the browser
$page->close();
$browser->close();
Vercel agent-browser
use Revolution\Salvager\AgentBrowser;
use Revolution\Salvager\Facades\Salvager;

Salvager::agent(function (AgentBrowser $agent) use (&$url, &$text, &$html) {
    $agent->userAgent('Chromium');
    $agent->open('https://example.com/');
    $agent->screenshot(config('salvager.screenshots').'agent-test.png');

    $url = $agent->url();
    $text = $agent->text('xpath=//p[1]', '--json');
    $html = $agent->html('css=html');

    // Run any agent-browser command
    $result = $agent->run(command: '', args: '', options: '');

    // If you are using a cloud provider, it is recommended not to close manually.
    // $agent->close();
});
Because text() and html() use Playwright's page.locator(), a CSS selector that matches multiple elements results in an error. To target one of several matching elements, use XPath.
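When choosing an XPath expression for this purpose, note that a positional predicate may still match more than one element. The sketch below uses PHP's built-in DOM extension, not agent-browser, to show standard XPath 1.0 behaviour on a hypothetical HTML fragment. It assumes that xpath= selectors are evaluated with the same semantics. In that fragment, `//p[1]` selects the first `<p>` of every parent, while `(//p)[2]` selects exactly one element in document order.

```php
<?php

declare(strict_types=1);

// Hypothetical markup with <p> elements under two different parents.
$html = <<<'HTML'
<html><body>
  <div><p>first</p><p>second</p></div>
  <div><p>third</p></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the text of every node an expression matches.
$texts = static function (string $expression) use ($xpath): array {
    $result = [];
    foreach ($xpath->query($expression) as $node) {
        $result[] = $node->textContent;
    }

    return $result;
};

$all = $texts('//p');           // every <p> in the document
$firstChild = $texts('//p[1]'); // the first <p> of each parent
$second = $texts('(//p)[2]');   // the second <p> in document order

print_r([$all, $firstChild, $second]);
```

Wrapping the path in parentheses before applying the index, as in `(//p)[2]`, is the form that is guaranteed to yield at most one element.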
LICENSE
MIT