andreinocenti/file-type-detector-php

A PHP library that detects file type (category, MIME, extension) from local paths, URLs, or data URIs using MIME, extensions, magic numbers, and HTTP HEAD.

v1.0.0 2025-10-01 19:42 UTC

Last update: 2025-10-01 19:44:06 UTC


README

A PHP package to accurately identify a file type from a local path, URL (HEAD), or Data URI, combining multiple strategies: finfo (real MIME), binary signatures (magic numbers), extension lookups, and HTTP HEAD Content-Type (with SSRF/security controls). Perfect for upload validation, download gatekeeping, conditional processing (e.g., images vs. docs), and indexing.

PHP MIME detection, detect file type by extension and magic number, validate uploads in Laravel/PHP, detect docx/xlsx/pptx, detect mp4/m4a/heic/avif, tell zip vs docx, anti-SSRF HTTP HEAD.

✨ Highlights

  • Multi-strategy: finfo → smart refinements → magic → extension (fallback).
  • Family-level refinements:
    • ZIP: distinguishes plain .zip from docx/xlsx/pptx, EPUB, ODF (odt/ods/odp), JAR, APK, 3MF, KMZ by peeking internal entries.
    • EBML: distinguishes WebM vs Matroska (MKV).
    • ISO-BMFF: identifies MP4/M4A/HEIC/AVIF based on ftyp brand.
  • Security (URLs): protections against SSRF, unsafe redirects, limited protocols, timeouts, allow-/block-lists, private network blocking (optional).
  • Configurable overrides: inject your own ext→mime and mime→category maps without touching the core.
  • Simple, typed API (PHP 8.2+), PSR-12 compliant.
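
To make the ZIP family refinement concrete, here is a minimal sketch of classifying a ZIP container by its entry names. The function name `sketchRefineZip` and the marker entries are assumptions based on common container conventions, not the package's actual rules. EPUB and ODF are left out because they are usually told apart by the contents of their `mimetype` entry rather than by entry names alone.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: guess a ZIP-based format from its entry names.
 * The marker entries are common conventions, not the package's own rules.
 */
function sketchRefineZip(array $entryNames): string
{
    $has = static fn (string $name): bool => in_array($name, $entryNames, true);

    // Order matters: an APK also carries META-INF/MANIFEST.MF, so test it before JAR.
    return match (true) {
        $has('word/document.xml')    => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        $has('xl/workbook.xml')      => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        $has('ppt/presentation.xml') => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
        $has('AndroidManifest.xml')  => 'application/vnd.android.package-archive',
        $has('META-INF/MANIFEST.MF') => 'application/java-archive',
        $has('3D/3dmodel.model')     => 'model/3mf',
        $has('doc.kml')              => 'application/vnd.google-earth.kmz',
        default                      => 'application/zip',
    };
}
```

When ext-zip is available, the entry names could be collected with `ZipArchive::getNameIndex()`; the sketch takes a plain array so it stays independent of that extension.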

🧩 Supported types & extensions

Below is a snapshot (non-exhaustive; you can expand via overrides).

Images

Extensions | Primary MIME | Category
jpg, jpeg | image/jpeg | image
png | image/png | image
gif | image/gif | image
webp | image/webp | image
avif | image/avif | image
heic, heif | image/heic, image/heif | image
tiff, tif | image/tiff | image
bmp | image/bmp | image
ico | image/vnd.microsoft.icon | image
psd | image/vnd.adobe.photoshop | image
svg | image/svg+xml | image
cr2, nef, arw, dng | image/x-* (raw) | raw-image

Video

Extensions | Primary MIME | Category
mp4, m4v | video/mp4 | video
webm | video/webm | video
mkv | video/x-matroska | video
mov | video/quicktime | video
3gp | video/3gpp | video

Audio

Extensions | Primary MIME | Category
mp3 | audio/mpeg | audio
wav | audio/wav | audio
flac | audio/flac | audio
ogg, oga | audio/ogg | audio
opus | audio/opus | audio
aac | audio/aac | audio
m4a | audio/mp4 | audio
mid, midi | audio/midi | audio
caf | audio/x-caf | audio

Documents / Text / Code

Extensions | Primary MIME | Category
pdf | application/pdf | pdf
doc | application/msword | document
docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | document
xls | application/vnd.ms-excel | spreadsheet
xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | spreadsheet
ppt | application/vnd.ms-powerpoint | presentation
pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | presentation
rtf | application/rtf | document
odt, ods, odp | application/vnd.oasis.opendocument.* | document/spreadsheet/presentation
epub | application/epub+zip | ebook
txt | text/plain | text
csv | text/csv | spreadsheet
md | text/markdown | text
html, htm | text/html | code
json | application/json | code
xml | application/xml | code
yaml, yml | text/yaml | code
php, js, css | text/x-php, application/javascript, text/css | code

Compressed / Packaged

Extensions | Primary MIME | Category
zip, cbz | application/zip | archive
7z | application/x-7z-compressed | archive
rar, cbr | application/x-rar-compressed | archive
tar | application/x-tar | archive
gz | application/x-gzip | archive
bz2 | application/x-bzip2 | archive
xz | application/x-xz | archive
zst | application/zstd | archive
jar | application/java-archive | code
apk | application/vnd.android.package-archive | code
kmz | application/vnd.google-earth.kmz | gis
3mf | model/3mf | 3d-model

Executables / Fonts / Others

Extensions | Primary MIME | Category
exe | application/x-dosexec | executable
elf | application/x-executable | executable
mach-o | application/x-mach-binary | executable
ttf, otf, woff, woff2 | font/* | font
iso | application/x-iso9660-image | disk-image
torrent | application/x-bittorrent | torrent
m3u, m3u8, pls | audio/x-mpegurl, application/vnd.apple.mpegurl, application/pls+xml | playlist
srt, vtt, ass, ssa | application/x-subrip, text/vtt, text/x-ssa | subtitle
pem, der, p7m, p7s | x509/pkcs7 | certificate
ics | text/calendar | calendar
vcf | text/vcard | contact
eml | message/rfc822 | email
kml | application/vnd.google-earth.kml+xml | gis
gpx | application/gpx+xml | gis
obj, stl, gltf | model/* | 3d-model
step, stp, iges, igs, dxf, dwg | model/* or image/vnd.* | cad

You can extend/modify via overrides; see below.

📦 Installation

composer require andreinocenti/file-type-detector-php

Requirements:

  • PHP 8.2+
  • fileinfo extension enabled (for finfo)
  • (Optional) cURL for more robust HEAD on URLs
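
As a quick sanity check before installing, the requirements above can be verified from a script. This is a minimal sketch; `sketchMissingRequirements` is a hypothetical helper and not part of the package.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: list the requirements this runtime does not meet. */
function sketchMissingRequirements(): array
{
    $missing = [];

    if (PHP_VERSION_ID < 80200) {
        $missing[] = 'PHP 8.2+';
    }
    if (!extension_loaded('fileinfo')) {
        $missing[] = 'ext-fileinfo';
    }
    // cURL is optional, so it is reported separately as a hint only.
    if (!extension_loaded('curl')) {
        $missing[] = 'ext-curl (optional)';
    }

    return $missing;
}
```

An empty array means the runtime meets both hard requirements and has cURL available.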

🚀 Basic Usage

use AndreInocenti\FileTypeDetector\FileTypeDetector;

$detector = new FileTypeDetector();

// 1) Local path
$result = $detector->detect('/path/to/file.png');
/*
$result->toArray():
[
  'category'   => 'image',
  'mime'       => 'image/png',
  'extension'  => 'png',
  'confidence' => 0.95,
  'source'     => 'mime' // mime | magic | extension | http-head | data-uri
]
*/

// 2) URL (HEAD only, safe)
$result = $detector->detect('https://cdn.example.com/file');
// Uses Content-Type and, when needed, "Content-Disposition: filename=..." and/or URL path extension.

// 3) Data URI
$result = $detector->detect('data:image/png;base64,iVBORw0KGgo=');
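
To illustrate what a data URI carries, the following sketch splits one into its declared MIME and decoded bytes, then checks the bytes against the PNG signature. Both helper names are hypothetical and the parsing is deliberately simple; it is not the package's internal implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a base64 data URI into its declared MIME and raw bytes.
 * Returns null when the input is not a valid base64 data URI.
 */
function sketchParseDataUri(string $uri): ?array
{
    if (!preg_match('#^data:([\w.+-]+/[\w.+-]+)?(?:;[\w=-]+)*;base64,(.*)$#s', $uri, $m)) {
        return null;
    }

    $bytes = base64_decode($m[2], true); // strict mode rejects invalid characters
    if ($bytes === false) {
        return null;
    }

    // RFC 2397 defaults to text/plain when no media type is given.
    return ['declaredMime' => $m[1] !== '' ? $m[1] : 'text/plain', 'bytes' => $bytes];
}

/** PNG files start with this fixed 8-byte signature. */
function sketchLooksLikePng(string $bytes): bool
{
    return str_starts_with($bytes, "\x89PNG\r\n\x1a\n");
}
```

Comparing the declared MIME with the signature of the decoded bytes is one way to catch a data URI whose label does not match its payload.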

Detection order

  1. finfo (MIME) → family refinements:
    • ZIP (docx/xlsx/pptx/epub/odf/jar/apk/3mf/kmz),
    • EBML (webm/mkv),
    • ISO-BMFF (mp4/m4a/heic/avif).
    If the MIME is weak (application/octet-stream, text/plain, inode/x-empty, etc.), the extension is preferred when it is reliable.
  2. Magic numbers → same refinements.
  3. Extension (fallback).
  4. URL: HEAD (Content-Type) + Content-Disposition filename + path extension.
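
The weak-MIME rule from step 1 can be sketched as a small decision function. This is an illustration under stated assumptions: `sketchPickMime` and the constant are hypothetical, the weak list contains only the three types named above, and the magic-number step is skipped for brevity.

```php
<?php
declare(strict_types=1);

/** MIME types named above as weak (the "etc." is left open). */
const SKETCH_WEAK_MIMES = ['application/octet-stream', 'text/plain', 'inode/x-empty'];

/**
 * Hypothetical helper: choose between the finfo result and the extension-derived MIME.
 * Returns [mime, source], where source uses the labels from the result example.
 */
function sketchPickMime(string $finfoMime, ?string $extensionMime): array
{
    $finfoIsWeak = in_array($finfoMime, SKETCH_WEAK_MIMES, true);

    // A specific finfo result always wins.
    if (!$finfoIsWeak) {
        return [$finfoMime, 'mime'];
    }
    // A weak result yields to a known extension mapping.
    if ($extensionMime !== null) {
        return [$extensionMime, 'extension'];
    }
    // Nothing better is available, so keep the weak result.
    return [$finfoMime, 'mime'];
}
```

In practice the first argument could come from `(new finfo(FILEINFO_MIME_TYPE))->file($path)` and the second from an extension lookup table.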

🧩 Enums

  • FileCategory: IMAGE, VIDEO, AUDIO, DOCUMENT, SPREADSHEET, PRESENTATION, PDF, ARCHIVE, EXECUTABLE, FONT, DISK_IMAGE, TORRENT, CODE, TEXT, EBOOK, CONTACT, CALENDAR, SUBTITLE, CERTIFICATE, GIS, 3D_MODEL, CAD, RAW_IMAGE, UNKNOWN.

Example:

use AndreInocenti\FileTypeDetector\Enums\FileCategory;
if ($result->category === FileCategory::IMAGE->value) {
    // process image
}
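
For conditional processing across several categories, a `match` expression on the category string keeps the branching in one place. The sketch below assumes the category arrives as a string such as 'image' or 'pdf' (as in the comparison above); the function name and the step names are hypothetical.

```php
<?php
declare(strict_types=1);

/** Hypothetical dispatcher: map a category string to a processing step name. */
function sketchRouteByCategory(string $category): string
{
    return match ($category) {
        'image', 'raw-image' => 'generate-thumbnail',
        'video', 'audio'     => 'transcode',
        'pdf', 'document'    => 'extract-text',
        default              => 'store-as-is', // unknown or unhandled categories
    };
}
```

It would be called with the detected category, for example `sketchRouteByCategory($result->category)`.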

🔧 Overrides (external config)

Inject your own mappings without touching the core:

use AndreInocenti\FileTypeDetector\FileTypeDetector;
use AndreInocenti\FileTypeDetector\Enums\FileCategory;

$extOverrides = [
    'abc' => 'application/x-custom', // map ".abc" to x-custom
];

$mimeCatOverrides = [
    'application/x-custom' => FileCategory::DOCUMENT, // category for your custom MIME
];

$detector = new FileTypeDetector(
    http: null,                         // default HTTP client
    extToMimeOverrides: $extOverrides,  // your mappings
    mimeToCategoryOverrides: $mimeCatOverrides
);

$res = $detector->detect('/files/report.abc');

Override tips

  • Use overrides for proprietary types, internal extensions, or to normalize categories in your domain.
  • Overrides take precedence over package defaults.
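
One simple way to picture this precedence is a map merge in which override entries shadow the defaults. The sketch below is an assumption about the general technique, not the package's code, and `sketchResolveMime` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: resolve a MIME type from a filename,
 * letting override entries shadow the default extension map.
 */
function sketchResolveMime(string $filename, array $overrides, array $defaults): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    // array_replace keeps keys intact and lets later arrays win.
    $map = array_replace($defaults, $overrides);

    return $map[$ext] ?? null;
}
```

`array_replace` is used instead of `array_merge` so that keys are never renumbered, whatever the extension looks like.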

🛡️ Security hardening (URLs)

The package ships with NativeHttpClient + SecurityOptions:

use AndreInocenti\FileTypeDetector\FileTypeDetector;
use AndreInocenti\FileTypeDetector\Config\SecurityOptions;
use AndreInocenti\FileTypeDetector\Http\NativeHttpClient;

$security = new SecurityOptions(
    allowPrivateNetworks: false,          // blocks 10.0.0.0/8, 192.168.0.0/16, 127.0.0.1, ::1, etc.
    allowedSchemes: ['https'],            // HTTPS only
    allowedHosts: ['cdn.yoursite.com'],     // optional allow-list
    blockedHosts: ['example-insecure.com'], // optional block-list
    timeout: 8,                           // seconds
    maxRedirects: 3,                      // max hops; redirects are followed manually
    userAgent: 'Yoursite-FileTypeDetector/1.0'
);

$http = new NativeHttpClient($security);
$detector = new FileTypeDetector($http, [], [], $security);

$res = $detector->detect('https://cdn.yoursite.com/file');

Protected by default:

  • SSRF: resolves DNS → validates public IPs only (when allowPrivateNetworks=false).
  • Redirects: handled manually and re-validated at every hop (scheme/host/IP).
  • Protocols: http/https only by default.
  • Timeouts and User-Agent configurable.
  • Allow-/block-lists by host (optional).

If you must reach internal hosts (intranet), set allowPrivateNetworks=true only in trusted environments.
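
The per-hop validation described above can be approximated with PHP's built-in IP filter. This is a minimal sketch, not the package's implementation: both function names are hypothetical, and the resolved IPs are passed in as an argument so the example makes no network calls.

```php
<?php
declare(strict_types=1);

/** True only for IPs outside private and reserved ranges (e.g. 10.0.0.0/8, 127.0.0.0/8, ::1). */
function sketchIsPublicIp(string $ip): bool
{
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) !== false;
}

/**
 * Hypothetical pre-flight check for one hop: scheme, host allow-list, then every resolved IP.
 * An empty $allowedHosts means "no allow-list configured".
 */
function sketchIsHopAllowed(string $url, array $allowedSchemes, array $allowedHosts, array $resolvedIps): bool
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = strtolower((string) parse_url($url, PHP_URL_HOST));

    if ($host === '' || !in_array($scheme, $allowedSchemes, true)) {
        return false;
    }
    if ($allowedHosts !== [] && !in_array($host, $allowedHosts, true)) {
        return false;
    }
    if ($resolvedIps === []) {
        return false; // unresolved hosts are rejected
    }
    foreach ($resolvedIps as $ip) {
        if (!sketchIsPublicIp($ip)) {
            return false; // a single private or reserved address fails the hop
        }
    }

    return true;
}
```

Running the same check on every redirect target, rather than only on the first URL, is what prevents a public host from bouncing the request to an internal address.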

🧪 Tests

The test suite uses Pest with PHPUnit code coverage.

composer install
vendor/bin/pest
# Coverage
XDEBUG_MODE=coverage vendor/bin/pest --coverage

📚 API (quick reference)

FileTypeDetector::__construct(
    ?HttpClientInterface $http = null,
    array $extToMimeOverrides = [],
    array $mimeToCategoryOverrides = [],
    ?SecurityOptions $security = null
)
  • $http: HTTP client for HEAD; defaults to NativeHttpClient with SecurityOptions.
  • $extToMimeOverrides: override extension→MIME.
  • $mimeToCategoryOverrides: override MIME→Category.
  • $security: HTTP security policy.

FileTypeDetector::detect(string $input): FileTypeResult

  • $input: local path, URL (http/https), or data: URI.
  • Returns: category, mime, extension, confidence, source.

Confidence Level

  • ConfidenceLevel: HIGH (0.95), MEDIUM (0.75), LOW (0.5), VERY_LOW (0.25), NONE (0.0).
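
The confidence values lend themselves to a threshold check, for example when validating uploads. The sketch below works on an array shaped like the `toArray()` output shown earlier; `sketchAcceptUpload` and its default threshold of 0.75 (the MEDIUM level) are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical gate: accept a result only if its category is allowed
 * and its confidence reaches the given minimum.
 */
function sketchAcceptUpload(array $result, array $allowedCategories, float $minConfidence = 0.75): bool
{
    return in_array($result['category'] ?? null, $allowedCategories, true)
        && ($result['confidence'] ?? 0.0) >= $minConfidence;
}
```

A stricter policy could also require a particular `source`, for instance rejecting results that rest on the extension alone.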

📄 License

MIT — free for commercial and open-source projects.

🙋 Support & Contributions

  • Issues and PRs are welcome.
  • Share real-world “tricky files” to help expand coverage and heuristics.