talan-hdf / semantic-suggestion
TYPO3 extension for suggesting semantically related pages
Installs: 1 551
Dependents: 0
Suggesters: 0
Security: 0
Stars: 10
Watchers: 4
Forks: 5
Open Issues: 1
Type:typo3-cms-extension
Requires
- cywolf/nlp-tools: ^1.0||dev-main
- typo3/cms-core: ^12.0 || ^13.0
- typo3/cms-scheduler: ^12.0 || ^13.0
- dev-master
- 3.1.0
- 2.3.0
- 2.2.0
- 2.1.1
- 2.1.0
- 2.0.0
- 1.5.2.x-dev
- 1.5.1.x-dev
- 1.5.0.x-dev
- 1.4.0
- 1.3.2.x-dev
- 1.3.2
- 1.3.1.x-dev
- 1.3.1
- 1.3.0.x-dev
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.9
- 1.0.8
- 1.0.7
- 1.0.6
- 1.0.5
- 1.0.4
- 1.0.3
- 1.0.1
- dev-v12compatibility
- dev-bugfix/backend-module
- dev-nlp_1_5_0
- dev-feature/architecture-refactor
- dev-affinageCalculs
- dev-news
- dev-branch110
- dev-cleaning
- dev-phpnlp
- dev-nlp
- dev-nlpGoogle
- dev-recencyWeight
- dev-UnitTests
- dev-module
- dev-talan
This package is auto-updated.
Last update: 2025-09-18 13:17:14 UTC
README
Elevate your TYPO3 website with intelligent, AI-powered content recommendations using advanced NLP technology.
Introduction
The Semantic Suggestion extension revolutionizes the way related content is presented on TYPO3 websites. Moving beyond traditional category-based functionalities, this extension employs advanced NLP (Natural Language Processing) and TF-IDF (Term Frequency-Inverse Document Frequency) analysis to create genuinely relevant content connections.
Since version 2.0.0, similarity scores are stored in a dedicated database table (tx_semanticsuggestion_similarities
) instead of the TYPO3 cache. A Scheduler task handles calculation and storage, ensuring persistence and performance.
๐ NEW in v3.0.0: Enhanced with nlp_tools integration for professional-grade text analysis, featuring intelligent language detection for TYPO3 12/13 multilingual sites and optimized German language support.
Key Benefits:
- ๐ฏ Highly Relevant Links: TF-IDF vectorization with advanced NLP creates genuinely relevant content connections
- ๐ Multilingual Intelligence: Smart language detection for TYPO3 12/13 sites with automatic content analysis
- ๐ฉ๐ช German Language Optimized: Professional stemming and handling of compound words (Automobilindustrie โ Automobil)
- ๐ Increased User Engagement: Keep visitors on your site longer by offering truly related content
- ๐ Semantic Cocoon: Contributes to a high-quality semantic network, enhancing SEO and navigation
- โก Intelligent Automation: Reduces manual linking work while improving internal link quality by 30-50%
Performance Considerations
- The similarity calculation process performed by the Scheduler task can take time, especially on sites with a large number of pages (>500 pages might require 30s or more depending on the server).
- Displaying suggestions and statistics (reading from the database) is optimized.
- Use the backend module to assess the performance and relevance of suggestions for your specific setup.
๐ What's New in Version 3.0.0
๐ง Advanced NLP Integration
- TF-IDF Vectorization: Professional-grade similarity calculation replacing basic cosine similarity
- Intelligent Language Detection: Hybrid approach using TYPO3 configuration + content analysis
- Advanced Text Processing: Stemming, stop word removal, and text normalization via nlp_tools
๐ Enhanced Multilingual Support
- TYPO3 12/13 Compatibility: Uses modern Context API and Site Configuration
- Smart Language Detection:
- Priority 1: Uses TYPO3 site language configuration (
de_DE.UTF-8
โde
) - Priority 2: Falls back to intelligent content analysis when TYPO3 config is uncertain
- Priority 3: Confidence scoring prevents misdetection of mixed-language content
- Priority 1: Uses TYPO3 site language configuration (
- Multi-site Ready: Supports complex multilingual TYPO3 setups
๐ฉ๐ช German Language Excellence
- Professional Stemming: Uses Wamania\Snowball for German morphology
- Compound Word Support: Recognizes relationships between "Automobil" and "Automobilindustrie"
- Umlaut Handling: Perfect support for รค, รถ, รผ, ร characters
- German Stop Words: Optimized list including der, die, das, ein, eine, mit, von, etc.
๐ Enhanced Algorithm
- TF-IDF Vectors: More accurate similarity scoring than traditional word counting
- Confidence Thresholds: Prevents false positives in language detection
- Fallback Mechanisms: Graceful degradation if NLP processing fails
- Performance Caching: TF-IDF vectors and stemming results are cached
Table of Contents
- Introduction
- Features
- Requirements
- Installation
- Language Detection System
- Configuration
- Usage (Frontend)
- Backend Module
- Scheduler Task
- Similarity Logic (TF-IDF Enhanced)
- Multilingual Sites Setup
- Display Customization
- Debugging & Performance
- Migration from v2.x
- Contributing
- License
- Support
Features
๐ง Core Functionality
- Advanced TF-IDF Analysis: Professional semantic similarity calculation using vectorization
- Database Storage: Persistent similarity scores in
tx_semanticsuggestion_similarities
table - Automated Processing: Scheduler task for background calculation and updates
- Frontend Integration: Clean API for displaying suggestions (title, media, excerpt)
- Backend Analytics: Comprehensive statistics and performance metrics
๐ Multilingual & NLP Features
- Intelligent Language Detection: Hybrid TYPO3 + content analysis approach
- Professional Text Processing: Stemming, stop word removal, text normalization
- German Language Excellence: Optimized for compound words and umlauts
- Multi-site Support: Handles complex TYPO3 multilingual configurations
- Confidence Scoring: Prevents false language detection in mixed content
โ๏ธ Configuration & Performance
- Highly Configurable: TypoScript settings for display and Scheduler for analysis scope
- Performance Optimized: TF-IDF vector caching and intelligent fallbacks
- Flexible Exclusions: Page-level exclusions for analysis and/or display
- Debug Mode: Comprehensive logging for development and troubleshooting
Requirements
- TYPO3: 12.0.0 - 13.9.99
- PHP: 8.0 or higher
- Dependencies:
cywolf/nlp-tools
(automatically installed via Composer)wamania/php-stemmer
(for advanced stemming support)
Installation
Composer Installation (recommended)
- Install the extension with dependencies:
composer require talan-hdf/semantic-suggestion composer require cywolf/nlp-tools
- Activate both extensions in the TYPO3 Extension Manager.
- Clear TYPO3 cache:
./vendor/bin/typo3 cache:flush
Manual Installation
- Download the extension from the TER or GitHub.
- Upload the archive to
typo3conf/ext/
. - Activate the extension in the Extension Manager.
Language Detection System
๐ง How Language Detection Works
The extension uses a hybrid approach that combines TYPO3's multilingual configuration with intelligent content analysis:
Detection Priority (Cascade System):
-
๐ฏ Priority 1: TYPO3 Site Configuration (for TYPO3 12/13)
// Uses TYPO3's Context API and Site Configuration $site = $request->getAttribute('site'); $siteLanguage = $site->getLanguageById($languageId); $locale = $siteLanguage->getLocale(); // e.g., "de_DE.UTF-8" return strtolower(substr($locale->getLanguageCode(), 0, 2)); // โ "de"
-
๐ Priority 2: Content Analysis (when TYPO3 config is uncertain)
- Extracts character trigrams from text content
- Compares against language profiles built from stop words
- Uses confidence scoring to prevent false positives
-
โ๏ธ Priority 3: Confidence Verification
// If confidence difference < 30%, trust TYPO3 context if (($firstScore - $secondScore) / $firstScore < 0.3) { return $this->getTypo3LanguageContext(); }
๐ Multilingual Site Examples
Example 1: Well-configured TYPO3 Site
# site/config.yaml languages: - languageId: 0 locale: 'en_US.UTF-8' # โ nlp_tools detects "en" - languageId: 1 locale: 'de_DE.UTF-8' # โ nlp_tools detects "de" - languageId: 2 locale: 'fr_FR.UTF-8' # โ nlp_tools detects "fr"
Result: nlp_tools uses TYPO3 configuration directly via Site API
Example 2: Mixed Content Detection
Page with:
- sys_language_uid = 0 (English)
- But content: "Die deutsche Automobilindustrie ist sehr wichtig"
Detection results:
- Content analysis: 85% German confidence
- TYPO3 context: English
- Final decision: German (high confidence overrides TYPO3)
Example 3: Uncertain Content
Detection scores: French: 45%, German: 42%
Confidence difference: (45-42)/45 = 6.7% < 30%
โ Falls back to TYPO3 language context
๐ nlp_tools Integration
Yes, the extension fully integrates with nlp_tools:
// In PageAnalysisService.php protected function getCurrentLanguage(): string { // Uses nlp_tools LanguageDetectionService $detectedLanguage = $this->languageDetector->detectLanguage(''); return $detectedLanguage; // Smart hybrid detection }
The language detection leverages:
- LanguageDetectionService: Trigram analysis and confidence scoring
- TextAnalysisService: Advanced text processing and stemming
- TextVectorizerService: TF-IDF vectorization for similarity calculation
๐ Language Support Matrix
All languages benefit from nlp_tools integration, but with different optimization levels:
๐ฅ EXPERT Level Support (Maximum Optimization)
Language | Code | Stemming | Stop Words | TF-IDF | Improvement |
---|---|---|---|---|---|
๐ฉ๐ช German | de |
โ Advanced (Wamania\Snowball) | โ Specialized | โ Full | +40-50% |
German Features:
- Professional compound word handling (
Automobilindustrie
โAutomobil
) - Perfect umlaut support (
รค, รถ, รผ, ร
) - Specialized stop words (
der, die, das, ein, eine, mit, von, zu...
)
๐ฅ ADVANCED Level Support (Professional Optimization)
Language | Code | Stemming | Stop Words | TF-IDF | Improvement |
---|---|---|---|---|---|
๐ซ๐ท French | fr |
โ Wamania\Snowball | โ Specialized | โ Full | +30-40% |
๐ฌ๐ง English | en |
โ Wamania\Snowball | โ Specialized | โ Full | +30-40% |
๐ช๐ธ Spanish | es |
โ Wamania\Snowball | โ Specialized | โ Full | +30-40% |
๐ฅ STANDARD Level Support (TF-IDF Optimization)
Language | Code | Stemming | Stop Words | TF-IDF | Improvement |
---|---|---|---|---|---|
๐ฎ๐น Italian | it |
โ Basic tokenization | โ ๏ธ Generic | โ Full | +20-30% |
๐ต๐น Portuguese | pt |
โ Basic tokenization | โ ๏ธ Generic | โ Full | +20-30% |
๐ณ๐ฑ Dutch | nl |
โ Basic tokenization | โ ๏ธ Generic | โ Full | +20-30% |
All Others | * |
โ Basic tokenization | โ ๏ธ Generic | โ Full | +20-25% |
๐ What Every Language Gets
โ All languages benefit from:
- TF-IDF Vectorization: Professional semantic similarity calculation
- Intelligent Language Detection: Hybrid TYPO3 + content analysis
- Advanced Text Cleaning: Unicode normalization, accent removal
- Performance Caching: TF-IDF vectors and processing results cached
// ALL languages get TF-IDF processing $tfidfResult = $textVectorizer->createTfIdfVectors([$text1, $text2], $language); $similarity = $textVectorizer->cosineSimilarity($vector1, $vector2); // Only de/fr/en/es get advanced stemming if (in_array($language, ['de', 'fr', 'en', 'es'])) { $stemmedWords = $textAnalyzer->stem($text, $language); } else { $stemmedWords = $textAnalyzer->tokenize($text); // Basic tokenization }
๐ฏ Language-Specific Configuration Examples
๐ฉ๐ช German Sites (Maximum Performance)
# Scheduler Configuration: minimumSimilarity = 0.15
plugin.tx_semanticsuggestion_suggestions.settings {
enableStemming = 1 # CRITICAL for compound words
proximityThreshold = 0.25 # Lower threshold (compound matching)
minTextLength = 50 # German compound words in short text
analyzedFields {
title = 2.0 # German titles contain key compounds
keywords = 2.5 # German keywords very specific
content = 1.0 # Standard weight
description = 1.2 # Meta descriptions useful
abstract = 1.3 # German abstracts well-structured
}
}
๐ซ๐ท๐ฌ๐ง๐ช๐ธ French/English/Spanish Sites
# Scheduler Configuration: minimumSimilarity = 0.2
plugin.tx_semanticsuggestion_suggestions.settings {
enableStemming = 1 # Advanced stemming available
proximityThreshold = 0.3 # Standard TF-IDF threshold
minTextLength = 50 # Standard minimum length
analyzedFields {
title = 1.5 # Titles important but less compound
keywords = 2.0 # Keywords still highly relevant
content = 1.0 # Main content
description = 1.0 # Meta descriptions
abstract = 1.1 # Abstracts helpful
}
}
๐ฎ๐น๐ต๐น๐ณ๐ฑ Other Languages (Italian, Portuguese, Dutch, etc.)
# Scheduler Configuration: minimumSimilarity = 0.25
plugin.tx_semanticsuggestion_suggestions.settings {
enableStemming = 0 # No advanced stemming, but TF-IDF still helps
proximityThreshold = 0.35 # Higher threshold (less precise without stemming)
minTextLength = 100 # Longer text needed for TF-IDF accuracy
analyzedFields {
title = 1.8 # Rely more on titles without stemming
keywords = 2.2 # Keywords become more important
content = 0.8 # Content less reliable without stemming
description = 1.0 # Standard weight
abstract = 1.0 # Standard weight
}
}
Note: Even without stemming, languages like Italian and Portuguese still get significant improvements from TF-IDF vectorization compared to basic word counting.
Configuration
๐ฏ NEW in v3.1: Unified Quality Configuration replaces the old complex hierarchy! Now use a single qualityLevel
parameter instead of managing separate minimumSimilarity
and proximityThreshold
. Full backward compatibility maintained.
๐ Unified Configuration (v3.1+)
The new unified configuration eliminates the confusing hierarchy and dual parameters. Now you only need ONE parameter for quality control:
๐ฏ UNIFIED CONFIGURATION FLOW:
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
โ qualityLevel โโโโโถโ Storage: -0.1 โโโโโถโ Display: same โ
โ (0.1 - 1.0) โ โ (broad range) โ โ (quality) โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
ONE PARAMETER AUTOMATIC USER SEES QUALITY
TO MAINTAIN OPTIMIZATION SUGGESTIONS
Benefits:
- โ Single parameter instead of complex hierarchy
- โ Automatic optimization (storage = qualityLevel - 0.1, display = qualityLevel)
- โ No more conflicts between Scheduler and TypoScript
- โ Backward compatibility with legacy configurations
- โ Self-explanatory values (higher = more selective)
Legacy Configuration Hierarchy (v3.0 and earlier)
โ ๏ธ DEPRECATED: This section documents the old system for reference. Use unified
qualityLevel
instead!
Click to view legacy configuration details
Understanding the configuration hierarchy is critical for proper setup. The extension uses a two-tier system where Scheduler settings always take precedence over TypoScript settings:
๐ CONFIGURATION FLOW:
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
โ Scheduler โโโโโถโ Database โโโโโถโ TypoScript โโโโโถโ Frontend โ
โ Settings โ โ Storage โ โ Settings โ โ Display โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ
LEVEL 1 LEVEL 2 LEVEL 3 LEVEL 4
(Analysis & (Persistent (Display & (User Sees
Storage) Data) Filtering) Results)
๐ฏ Level 1: Scheduler Configuration (Master Control)
- Controls: What gets analyzed and stored in database
- Authority: Absolute - cannot be overridden by TypoScript
- Key Settings:
startPageId
: Defines analysis scopeexcludePages
: Pages never analyzed (permanent exclusion)minimumSimilarity
: Minimum score to store in databaserecursiveExclusion
: How exclusions are applied
๐ Level 2: Database Storage (Persistent Data)
- Contains: Only similarities โฅ
minimumSimilarity
from Scheduler - Source: Generated by Scheduler task execution
- Limitation: TypoScript cannot access data that was never stored
๐จ Level 3: TypoScript Configuration (Display Filter)
- Controls: What gets displayed from existing database data
- Limitation: Can only filter/limit existing data, cannot create new data
- Key Settings:
proximityThreshold
: Minimum score to display (must be โฅ SchedulerminimumSimilarity
)maxSuggestions
: Limit displayed resultsexcludePages
: Pages excluded from display only (still analyzed/stored)
๐๏ธ Level 4: Frontend Display (Final Output)
- Shows: Final filtered and limited results
- Source: Database data filtered by TypoScript settings
โ๏ธ Priority Rules
-
Scheduler ALWAYS wins:
Scheduler excludePages = "42,56" TypoScript excludePages = "" (empty) โ Pages 42,56 will NEVER appear (not analyzed at all)
-
TypoScript cannot override Scheduler thresholds:
Scheduler minimumSimilarity = 0.5 TypoScript proximityThreshold = 0.3 โ Impossible! No similarity < 0.5 exists in database
-
TypoScript can only be MORE restrictive:
Scheduler minimumSimilarity = 0.3 โ TypoScript proximityThreshold = 0.5 โ โ Works: Shows only similarities โฅ 0.5 from stored data โฅ 0.3
Scheduler Task Configuration
๐ง CRITICAL: This is the primary configuration that controls what gets analyzed and stored in the database. TypoScript cannot override these settings!
Create a "Semantic Suggestion: Generate Similarities" task in the TYPO3 Scheduler module with these settings:
โ ๏ธ WARNING: Choose your
minimumSimilarity
carefully - it cannot be lowered retroactively without re-running the entire analysis!
-
startPageId
(required): The UID of the root page from which the analysis will begin. This defines the scope of the analysis for this task run. Each task execution is linked to a Start Page ID (stored asroot_page_id
in the DB).- Example:
1
(for site root page)
- Example:
-
excludePages
(optional): Comma-separated list of page UIDs that will not be analyzed, and their similarities will not be stored.- Example:
42,56,78
- Example:
-
minimumSimilarity
(required): Threshold (0.0 to 1.0) below which a pair of similar pages will not be saved to the database. This controls storage efficiency.- Example:
0.3
(saves only pairs with similarity โฅ 30%)
- Example:
Recommended scheduling:
- Frequency: Daily or weekly
- Execution time: During off-peak hours (e.g., 2:00 AM)
TypoScript Settings
๐จ DISPLAY FILTER: These settings control the frontend display and the analysis algorithm details. They can only filter/limit what was already stored by the Scheduler.
โ CONSTRAINT:
proximityThreshold
MUST be โฅ SchedulerminimumSimilarity
(otherwise no suggestions will display)
Define them in your TypoScript Setup file under plugin.tx_semanticsuggestion_suggestions.settings
.
Constants (constants.typoscript)
plugin.tx_semanticsuggestion_suggestions.settings {
# --- Frontend Display Settings ---
# โ ๏ธ IMPORTANT: Must be โฅ Scheduler minimumSimilarity
proximityThreshold = 0.25 # TF-IDF optimized threshold (was 0.5 in v2.x)
maxSuggestions = 3 # Maximum number of suggestions to display
excerptLength = 100 # Max length of the text excerpt
excludePages = # Pages to exclude from DISPLAY only (comma-separated UIDs)
# --- Analysis Algorithm Settings (Used by Scheduler task) ---
recencyWeight = 0.2 # Weight of recency in the final score (0.0 to 1.0)
# Fields analyzed and their weights (TF-IDF enhanced)
analyzedFields {
title = 1.5 # Page titles are highly relevant
description = 1.0 # Meta descriptions
keywords = 2.0 # Explicit keywords have high weight
abstract = 1.2 # Page abstracts/summaries
content = 1.0 # Main page content (can be noisy)
}
# --- NLP & Language Settings (v3.0+) ---
enableStemming = 1 # Enable advanced stemming (especially for German)
defaultLanguage = en # Fallback language
minTextLength = 50 # Minimum text length for analysis
confidenceThreshold = 0.3 # Language detection confidence threshold
# --- Debugging ---
debugMode = 0 # Enable debug logs (0 or 1)
}
Setup (setup.typoscript)
# Main plugin configuration
plugin.tx_semanticsuggestion_suggestions {
settings {
proximityThreshold = {$plugin.tx_semanticsuggestion_suggestions.settings.proximityThreshold}
maxSuggestions = {$plugin.tx_semanticsuggestion_suggestions.settings.maxSuggestions}
excludePages = {$plugin.tx_semanticsuggestion_suggestions.settings.excludePages}
excerptLength = {$plugin.tx_semanticsuggestion_suggestions.settings.excerptLength}
recencyWeight = {$plugin.tx_semanticsuggestion_suggestions.settings.recencyWeight}
debugMode = {$plugin.tx_semanticsuggestion_suggestions.settings.debugMode}
analyzedFields {
title = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.title}
description = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.description}
keywords = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.keywords}
abstract = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.abstract}
content = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.content}
}
}
view {
# Paths to your Fluid templates if you wish to customize them
templateRootPaths.10 = EXT:your_extension/Resources/Private/Templates/
partialRootPaths.10 = EXT:your_extension/Resources/Private/Partials/
layoutRootPaths.10 = EXT:your_extension/Resources/Private/Layouts/
}
}
# Reusable TypoScript object
lib.semantic_suggestion = USER
lib.semantic_suggestion {
userFunc = TYPO3\CMS\Extbase\Core\Bootstrap->run
extensionName = SemanticSuggestion
pluginName = Suggestions
vendorName = TalanHdf
view =< plugin.tx_semanticsuggestion_suggestions.view
persistence =< plugin.tx_semanticsuggestion_suggestions.persistence
settings =< plugin.tx_semanticsuggestion_suggestions.settings
}
# Content element integration
tt_content.list.20.semanticsuggestion_suggestions =< plugin.tx_semanticsuggestion_suggestions
NLP Configuration
Advanced Language & Text Processing Settings
plugin.tx_semanticsuggestion_suggestions {
settings {
# ๐ง NLP Features (NEW in v3.0)
enableStemming = 1 # Enable advanced stemming (German compound words)
defaultLanguage = en # Fallback language
minTextLength = 50 # Minimum text length for analysis
confidenceThreshold = 0.3 # Language detection confidence threshold
# ๐ Language Mapping (TYPO3 12/13)
languageMapping {
0 = en # Language UID 0 โ English
1 = fr # Language UID 1 โ French
2 = de # Language UID 2 โ German
3 = es # Language UID 3 โ Spanish
4 = it # Language UID 4 โ Italian
5 = pt # Language UID 5 โ Portuguese
}
# ๐ฉ๐ช German Language Optimization
proximityThreshold = 0.25 # Lower threshold for TF-IDF (more sensitive)
# ๐ TF-IDF Algorithm Settings
analyzedFields {
title = 1.5 # German titles often contain key compound words
keywords = 2.0 # German keywords are highly indicative
content = 1.0 # Base content weight
description = 1.0
abstract = 1.2
}
}
}
Multilingual Site Example Configuration
For a German/English bilingual site:
# constants.typoscript
plugin.tx_semanticsuggestion_suggestions.settings {
# ๐ฉ๐ช German language optimization
enableStemming = 1 # Critical for compound words
proximityThreshold = 0.25 # Lower threshold for German compound matching
confidenceThreshold = 0.3
# Language detection mapping
languageMapping {
0 = en # TYPO3 language UID 0 = English
1 = de # TYPO3 language UID 1 = German
}
# German-optimized field weights
analyzedFields {
title = 2.0 # German titles contain key compounds
keywords = 2.5 # German keywords are very specific
content = 1.0 # Standard content weight
}
}
# IMPORTANT: Create separate Scheduler tasks:
# Task 1: English content (startPageId = 1, minimumSimilarity = 0.3)
# Task 2: German content (startPageId = 10, minimumSimilarity = 0.25)
Configuration Interaction
- Analysis Scope: Defined by the Scheduler task's
startPageId
. - DB Storage: Controlled by the Scheduler task's
minimumSimilarity
andexcludePages
. - Similarity Calculation: Performed by the
PageAnalysisService
(called by the Scheduler task), which uses the TypoScript settingsanalyzedFields
andrecencyWeight
. - Frontend Display: Reads from the DB and filters/limits based on the TypoScript settings
proximityThreshold
,maxSuggestions
,excludePages
. - Backend Display: Reads from the DB (based on the selected
root_page_id
) and filters based on the TypoScriptproximityThreshold
.
Key Points:
- The
proximityThreshold
(TypoScript) cannot display suggestions with a score lower than theminimumSimilarity
(Scheduler) because they were not saved. For the TypoScript setting to be effective, it must be โฅ the Scheduler threshold. - A page excluded in the Scheduler will never be analyzed/stored. A page excluded only in TypoScript will be analyzed/stored (if not excluded in Scheduler) but not displayed. It's often simpler to keep the
excludePages
lists synchronized. - You can create multiple Scheduler tasks with different
startPageId
values to analyze different sections of the site.
Unified Configuration Setup
1. Scheduler Task Configuration (Simplified)
Create a "Semantic Suggestion: Generate Similarities" task with these settings:
startPageId
(required): Root page UID for analysis scopequalityLevel
(required): Quality threshold (0.1-1.0) for suggestions- Storage: Automatically set to
qualityLevel - 0.1
(broad data collection) - Display: Uses
qualityLevel
directly (quality suggestions to users)
- Storage: Automatically set to
excludePages
(optional): Pages to exclude from both analysis and displayrecursiveExclusion
(optional): Apply exclusions recursively
Recommended Quality Levels:
๐ฉ๐ช German sites: 0.25 (compound word optimization)
๐ Standard sites: 0.30 (balanced quality/quantity)
๐ Content sites: 0.35 (higher quality threshold)
๐ Premium sites: 0.40 (very selective suggestions)
2. TypoScript Configuration (Simplified)
plugin.tx_semanticsuggestion_suggestions {
settings {
# ๐ฏ UNIFIED QUALITY CONTROL
qualityLevel = 0.3 # Single parameter for all quality control
# Display Settings
maxSuggestions = 3 # Number of suggestions to show
excludePages = # Additional pages to exclude from display
excerptLength = 100 # Text excerpt length
# NLP Settings (v3.0+)
enableStemming = 1 # Enable advanced text processing
defaultLanguage = en # Fallback language
debugMode = 0 # Debug logging
}
}
Configuration Examples
Basic Setup (Most Common)
# Single quality level controls everything
qualityLevel = 0.3
# Result:
# - Storage threshold: 0.2 (collects broad range)
# - Display threshold: 0.3 (shows quality suggestions)
# - No configuration conflicts possible
German Site Optimization
# Lower threshold for German compound words
qualityLevel = 0.25
enableStemming = 1
# Result optimized for compound words like:
# "Automobil" โ "Automobilindustrie"
High-Quality Site
# Higher threshold for premium content
qualityLevel = 0.4
# Result:
# - Storage threshold: 0.3 (good range)
# - Display threshold: 0.4 (only excellent suggestions)
Migration from Legacy Configuration
Automatic Migration
The extension automatically migrates old configurations:
Legacy v3.0:
Scheduler: minimumSimilarity = 0.4
TypoScript: proximityThreshold = 0.5
Auto-migrated to v3.1:
qualityLevel = 0.5 (migrated from proximityThreshold)
Internal storage = 0.4 (preserved)
Manual Migration Guide
- Identify your old
proximityThreshold
from TypoScript - Set
qualityLevel
to that value - Remove old parameters:
- Delete
proximityThreshold
from TypoScript - Delete
minimumSimilarity
from Scheduler (auto-computed) - Merge duplicate
excludePages
lists
- Delete
Example Migration:
# OLD v3.0 configuration
plugin.tx_semanticsuggestion_suggestions.settings {
proximityThreshold = 0.35 # OLD: Display threshold
excludePages = 42,56,78 # OLD: Display exclusions only
}
# NEW v3.1 configuration
plugin.tx_semanticsuggestion_suggestions.settings {
qualityLevel = 0.35 # NEW: Unified quality control
excludePages = 42,56,78 # NEW: Unified exclusions (analysis + display)
}
# Scheduler task OLD: minimumSimilarity = 0.25, excludePages = ""
# Scheduler task NEW: qualityLevel = 0.35 (automatically computes storage = 0.25)
Legacy Configuration Constraints & Common Pitfalls
Understanding these technical constraints will save you hours of debugging:
๐ซ Critical Constraints
1. Threshold Hierarchy Rule
โ WRONG Configuration:
Scheduler: minimumSimilarity = 0.7
TypoScript: proximityThreshold = 0.3
โ Result: NO suggestions displayed (none stored below 0.7)
โ
CORRECT Configuration:
Scheduler: minimumSimilarity = 0.3
TypoScript: proximityThreshold = 0.7
โ Result: Shows high-quality suggestions from broader stored data
Rule: proximityThreshold
โฅ minimumSimilarity
(or equal)
2. Exclusion Scope Difference
โ PROBLEMATIC:
Scheduler: excludePages = "" (empty)
TypoScript: excludePages = "42,56,78"
โ Result: Pages 42,56,78 are analyzed/stored but never displayed (wasted processing)
โ
EFFICIENT:
Scheduler: excludePages = "42,56,78"
TypoScript: excludePages = "" (empty or same list)
โ Result: Pages 42,56,78 are never processed (faster, cleaner)
3. TF-IDF Score Ranges (NEW in v3.0)
โ ๏ธ TF-IDF produces LOWER scores than v2.x cosine similarity:
v2.x typical range: 0.3-0.9
v3.0 TF-IDF range: 0.05-0.4
โ Legacy Configuration:
minimumSimilarity = 0.8 โ NO results with TF-IDF
โ
TF-IDF Optimized:
minimumSimilarity = 0.15 โ Good range for TF-IDF
proximityThreshold = 0.25 โ Quality suggestions
๐ Language-Specific Constraints
German Language (Compound Words)
๐ฉ๐ช German sites need LOWER thresholds due to compound word stemming:
โ Standard Configuration:
proximityThreshold = 0.5 โ Misses compound relationships
โ
German Optimized:
minimumSimilarity = 0.15
proximityThreshold = 0.25
enableStemming = 1
โ Captures "Automobil" โ "Automobilindustrie" relationships
Multi-language Sites
โ ๏ธ CONSTRAINT: Each language needs separate analysis
โ Single Task Configuration:
Task 1: startPageId = 1 (analyzing both English + German pages)
โ Result: Language mixing, poor similarity quality
โ
Multi-Task Configuration:
Task 1: startPageId = 1 (English root) - minimumSimilarity = 0.3
Task 2: startPageId = 10 (German root) - minimumSimilarity = 0.25
โ Result: Optimized per-language analysis
โก Performance Constraints
Large Site Thresholds
Sites with >500 pages:
โ Permissive Configuration:
minimumSimilarity = 0.05 โ Database bloat (millions of records)
maxSuggestions = 10 โ Slow frontend queries
โ
Performance Optimized:
minimumSimilarity = 0.25 โ Quality storage
proximityThreshold = 0.4 โ Fast display
maxSuggestions = 3 โ Quick queries
๐ง Validation Rules Checklist
Before deploying, verify these constraints:
- Threshold Check:
proximityThreshold
โฅminimumSimilarity
- Exclusion Sync: Scheduler
excludePages
includes all TypoScript exclusions - TF-IDF Adjustment: Thresholds lowered from v2.x values (โค 0.4 typically)
- Language Separation: Each language has its own Scheduler task
- Performance Test: Task execution time acceptable for your server
- Storage Monitoring: Database table
tx_semanticsuggestion_similarities
size reasonable
๐จ Warning Signs of Misconfiguration
Watch for these symptoms:
Symptom | Likely Cause | Solution |
---|---|---|
Zero suggestions displayed | proximityThreshold > all stored similarities |
Lower proximityThreshold or minimumSimilarity |
Poor suggestion quality | Threshold too low | Raise proximityThreshold |
Missing expected pages | Pages excluded in Scheduler | Check excludePages settings |
Scheduler timeouts | Large site with low threshold | Raise minimumSimilarity |
Mixed language results | Single task for multilingual site | Create per-language tasks |
Usage (Frontend)
Integrate the plugin into your Fluid templates to display suggestions:
<f:cObject typoscriptObjectPath='lib.semantic_suggestion' />
Or include it directly in TypoScript:
# Include on a page
page.10 =< lib.semantic_suggestion
# Or in a content element
lib.myContent = COA
lib.myContent {
10 =< lib.semantic_suggestion
}
The plugin will read relevant suggestions for the current page from the database, applying filters defined in the TypoScript settings (proximityThreshold
, maxSuggestions
, excludePages
).
Backend Module
A backend module ("Semantic Suggestion" under "Web") allows visualizing the results of the analyses stored in the database.
Features
- Analysis Selection: Choose which analysis to view (based on the
startPageId
/root_page_id
of executed Scheduler tasks). - Detailed Statistics: Most similar pairs, score distribution, pages with the most links, language statistics.
- Configuration Overview: Reminder of the main parameters used (display threshold, etc.).
- Performance Metrics: Module load time, number of stored pairs for the selected analysis.
Scheduler Task
The "Semantic Suggestion: Generate Similarities" task is essential for the extension's operation.
- Role: Calculates similarities between pages (using
PageAnalysisService
) and saves relevant results (above theminimumSimilarity
threshold) to thetx_semanticsuggestion_similarities
table. - Configuration: Set the
startPageId
,excludePages
, andminimumSimilarity
via the Scheduler interface. - Frequency: Schedule its execution regularly (e.g., daily, weekly) during off-peak hours to keep suggestions up-to-date without impacting site performance.
Similarity Logic (TF-IDF Enhanced)
๐ Processing Pipeline
-
๐ฏ Language Detection (NEW in v3.0)
Text Input โ TYPO3 Site Context โ Content Analysis โ Language: "de"
-
๐ Advanced Text Processing (via nlp_tools)
Raw Text โ Stop Words Removal โ Stemming (German) โ Clean Tokens Example: "Die deutschen Automobilhersteller" โ "deutsch automobilherstell" (stemmed)
-
๐ข TF-IDF Vectorization (NEW in v3.0)
[Text1, Text2] โ TF-IDF Vectors โ Cosine Similarity Score Traditional: Simple word counting TF-IDF: Professional semantic analysis
-
๐พ Smart Storage & Display
Similarity Score + Recency Boost โ Database โ Frontend Filter Scheduler threshold: 0.3 โ Display threshold: 0.5
๐ Algorithm Improvements
Before (v2.x): Basic Cosine Similarity
// Simple word frequency comparison $similarity = $dotProduct / ($magnitude1 * $magnitude2);
After (v3.0): TF-IDF + NLP
// Professional semantic analysis $tfidfResult = $this->textVectorizer->createTfIdfVectors([$text1, $text2], $language); $similarity = $this->textVectorizer->cosineSimilarity($vector1, $vector2); // + German stemming, stop word removal, confidence scoring
Performance Impact: 30-50% better accuracy, especially for German compound words.
Multilingual Sites Setup
๐ TYPO3 12/13 Site Configuration
The extension automatically detects languages from your TYPO3 site configuration:
# config/sites/main/config.yaml languages: - languageId: 0 title: 'English' hreflang: 'en-US' locale: 'en_US.UTF-8' # โ Auto-detected as "en" iso-639-1: 'en' - languageId: 1 title: 'Deutsch' hreflang: 'de-DE' locale: 'de_DE.UTF-8' # โ Auto-detected as "de" iso-639-1: 'de' - languageId: 2 title: 'Franรงais' hreflang: 'fr-FR' locale: 'fr_FR.UTF-8' # โ Auto-detected as "fr" iso-639-1: 'fr'
๐ฏ Per-Language Scheduler Tasks
For optimal performance, create separate Scheduler tasks for each language:
Task 1: "Generate Similarities - English"
- startPageId: 1 (English root)
- minimumSimilarity: 0.3
Task 2: "Generate Similarities - German"
- startPageId: 2 (German root)
- minimumSimilarity: 0.25 # Lower for German (compound words)
Task 3: "Generate Similarities - French"
- startPageId: 3 (French root)
- minimumSimilarity: 0.3
๐ง Language-Specific Tuning
German Language Optimization
plugin.tx_semanticsuggestion_suggestions.settings {
# German compound words need lower thresholds
proximityThreshold = 0.25
# Enable stemming for better compound word detection
enableStemming = 1
# German titles are often highly descriptive
analyzedFields {
title = 2.0 # Higher weight for German titles
keywords = 2.5 # German keywords are very specific
}
}
๐จ Troubleshooting Multilingual Sites
Problem: Language always detected as "en"
Solution: Check your TypoScript language mapping:
plugin.tx_semanticsuggestion_suggestions.settings {
languageMapping {
0 = en
1 = de # Make sure this matches your site language UID
2 = fr
}
}
Problem: Mixed language content not detected properly
Enable debug mode to see detection process:
plugin.tx_semanticsuggestion_suggestions.settings {
debugMode = 1
}
Check logs for:
Language detected via nlp_tools: de
TF-IDF similarity calculated: language=de, vocabularySize=156
Confidence verification: firstScore=0.85, secondScore=0.45, using content analysis
Display Customization
Modify the appearance of suggestions by overriding the plugin's Fluid template (List.html
). Configure the paths to your custom templates in TypoScript (see Configuration section).
Debugging & Performance
๐ Debug Mode
Enable comprehensive logging to troubleshoot language detection and similarity calculation:
plugin.tx_semanticsuggestion_suggestions.settings {
debugMode = 1
}
Debug logs show:
[INFO] Language detected via nlp_tools: de
[DEBUG] TF-IDF similarity calculated: page1=123, page2=456, similarity=0.75, vocabularySize=245
[DEBUG] German stemming applied: "Automobilindustrie" โ "automobilindustr"
[DEBUG] Confidence verification: using TYPO3 context due to low confidence difference
[ERROR] Failed to create TF-IDF vectors: text too short, using fallback calculation
โก Performance Monitoring
Backend Module shows performance metrics:
- TF-IDF processing time per page pair
- Language detection accuracy
- Vocabulary size per language
- Cache hit rates for stemming/vectorization
Scheduler Task execution time:
- Small sites (<100 pages): ~10-30 seconds
- Medium sites (100-500 pages): ~1-5 minutes
- Large sites (>500 pages): ~5-30 minutes
๐ Performance Optimization
Recommended Settings for Large Sites (>500 pages)
# Scheduler Configuration: minimumSimilarity = 0.3 (storage optimization)
plugin.tx_semanticsuggestion_suggestions.settings {
# Performance optimizations
minTextLength = 100 # Skip short content (faster processing)
proximityThreshold = 0.4 # Higher display threshold (faster queries)
maxSuggestions = 3 # Limit results (faster rendering)
# Selective field analysis (reduce processing time)
analyzedFields {
title = 2.0 # Titles are fast to process
keywords = 2.0 # Keywords are lightweight
content = 0.5 # Reduce content weight (can be slow)
description = 1.0 # Meta descriptions are fast
abstract = 0 # Disable if not commonly used
}
# Conservative language settings
confidenceThreshold = 0.4 # Higher confidence reduces processing
enableStemming = 1 # Keep enabled (cached results)
}
# SCHEDULER OPTIMIZATION for large sites:
# Split into multiple tasks:
# Task 1: Pages 1-100 (daily)
# Task 2: Pages 101-200 (every 2 days)
# Task 3: Pages 201+ (weekly)
Cache Configuration
Ensure TYPO3 cache is properly configured for optimal performance:
# Clear cache after configuration changes ./vendor/bin/typo3 cache:flush # Monitor cache effectiveness ./vendor/bin/typo3 cache:listGroups
Configuration Troubleshooting
๐ Step-by-Step Diagnosis
When suggestions aren't working as expected, follow this systematic approach:
Step 1: Verify Scheduler Task Execution
# Check if task has run successfully ./vendor/bin/typo3 scheduler:run # Check TYPO3 logs for task errors tail -f var/log/typo3_*.log | grep -i semantic
Expected Output:
[INFO] Starting similarity generation task, startPageId: 1, minimumSimilarity: 0.3
[INFO] Similarity generation task completed successfully
Step 2: Verify Database Content
-- Check if similarities are stored SELECT COUNT(*) as total_similarities FROM tx_semanticsuggestion_similarities; -- Check score distribution SELECT ROUND(similarity_score, 1) as score_range, COUNT(*) as count, sys_language_uid FROM tx_semanticsuggestion_similarities GROUP BY ROUND(similarity_score, 1), sys_language_uid ORDER BY score_range DESC;
Healthy Output Example:
score_range | count | sys_language_uid
0.4 | 15 | 0
0.3 | 42 | 0
0.2 | 128 | 0
0.1 | 203 | 0
Step 3: Test Configuration Hierarchy
# Enable debug mode temporarily
plugin.tx_semanticsuggestion_suggestions.settings {
debugMode = 1
}
Check Debug Logs:
tail -f typo3temp/logs/semantic_suggestion.log
๐จ Common Problems & Solutions
Problem 1: "No suggestions displayed anywhere"
Symptoms:
- Frontend shows empty suggestions list
- Backend module shows 0 similar pairs
Diagnosis Checklist:
โ Scheduler task executed successfully?
โ Database contains similarities?
โ proximityThreshold โค stored similarities?
โ Current page has stored similarities?
Solutions:
-
Threshold Too High
โ Current: proximityThreshold = 0.8 โ Fix: proximityThreshold = 0.3 (or lower) # Or check what's actually in database: SELECT MAX(similarity_score) FROM tx_semanticsuggestion_similarities;
-
Scheduler Never Ran
# Manual execution ./vendor/bin/typo3 scheduler:run <task_id> # Check task configuration SELECT * FROM tx_scheduler_task WHERE classname LIKE '%Similarities%';
-
Wrong Root Page
โ Current: startPageId = 1, viewing page = 42 โ Fix: startPageId = 1, ensure page 42 is child of page 1 # Verify page tree relationship SELECT pid, title FROM pages WHERE uid = 42;
Problem 2: "Suggestions displayed but poor quality"
Symptoms:
- Suggestions shown but irrelevant
- Mixed languages in suggestions
- Very low similarity scores
Solutions:
-
TF-IDF Score Adjustment (v3.0+)
โ Legacy: proximityThreshold = 0.7 โ TF-IDF: proximityThreshold = 0.25 # TF-IDF scores are naturally lower
-
Language Separation Required
โ Single task: Pages 1-100 (mixed EN/DE content) โ Multi-task: - Task 1: English pages (1-50) - Task 2: German pages (51-100)
-
Field Weight Optimization
# Increase weight of reliable fields analyzedFields { title = 2.0 # Titles are usually accurate keywords = 2.5 # Keywords are intentional content = 0.5 # Content can be noisy }
Problem 3: "Missing expected pages in suggestions"
Symptoms:
- Page A should suggest Page B (they're clearly related)
- Page B exists in database but not suggested to Page A
Diagnosis:
-- Check if relationship exists in database SELECT similarity_score FROM tx_semanticsuggestion_similarities WHERE page_id = A AND similar_page_id = B; -- Check if excluded somewhere SELECT exclude_pages FROM tx_scheduler_task WHERE classname LIKE '%Similarities%';
Solutions:
-
Page Excluded in Scheduler
โ Scheduler excludePages = "42,56,78" (contains Page B) โ Remove Page B from exclusions, re-run Scheduler
-
Similarity Below Threshold
-- Find actual similarity score SELECT similarity_score FROM tx_semanticsuggestion_similarities WHERE page_id = A AND similar_page_id = B; -- If score = 0.22 but proximityThreshold = 0.3 -- Lower the threshold or improve content similarity
-
Text Content Insufficient
Check if Page A or B has minimal text content: - Minimum 50 characters required - Pure image pages won't generate similarities - Check 'minTextLength' setting
Problem 4: "Scheduler task timeouts or fails"
Symptoms:
- Task shows "Failed" status
- PHP timeout errors in logs
- Task takes >5 minutes
Solutions:
-
Increase Processing Limits
# In Scheduler task or php.ini ini_set('max_execution_time', 300); // 5 minutes ini_set('memory_limit', '512M');
-
Reduce Analysis Scope
โ Current: startPageId = 1 (1000+ pages) โ Split: - Task 1: startPageId = 1 (pages 1-100) - Task 2: startPageId = 101 (pages 101-200)
-
Increase Threshold
โ Current: minimumSimilarity = 0.05 (stores everything) โ Optimized: minimumSimilarity = 0.25 (quality only)
Problem 5: "Mixed language suggestions"
Symptoms:
- English page suggests German pages
- Suggestions ignore language boundaries
Solutions:
-
Enable Language Mapping
plugin.tx_semanticsuggestion_suggestions.settings { languageMapping { 0 = en 1 = de 2 = fr } }
-
Check Site Configuration
# site/config.yaml should have proper locales languages: - languageId: 0 locale: 'en_US.UTF-8' # โ Must be properly formatted - languageId: 1 locale: 'de_DE.UTF-8' # โ Must be properly formatted
-
Separate Scheduler Tasks
Instead of: One task analyzing mixed language tree Use: One task per language branch
๐ง Quick Fix Commands
# Clear all caches ./vendor/bin/typo3 cache:flush # Re-run all scheduler tasks ./vendor/bin/typo3 scheduler:run # Check database table size echo "SELECT COUNT(*) FROM tx_semanticsuggestion_similarities;" | mysql your_db # Reset extension configuration (emergency) ./vendor/bin/typo3 configuration:remove --path="EXTENSIONS/semantic_suggestion" ./vendor/bin/typo3 extension:deactivate semantic_suggestion ./vendor/bin/typo3 extension:activate semantic_suggestion
๐ Pre-Deployment Checklist
Before going live, verify:
- Scheduler Tasks: All tasks run successfully without timeouts
- Database Check:
tx_semanticsuggestion_similarities
contains expected data - Threshold Validation:
proximityThreshold
โฅminimumSimilarity
- Language Testing: Each language shows appropriate suggestions
- Performance Test: Frontend loads suggestions in <200ms
- Content Quality: Manual review of suggestion relevance
- Exclusion Review: All intentionally excluded pages work correctly
๐ก๏ธ Final Configuration Validation Checklist
Use this comprehensive checklist to ensure your configuration is optimal:
โ Scheduler Configuration Validation
- startPageId exists and is accessible:
SELECT title FROM pages WHERE uid = [startPageId];
- minimumSimilarity appropriate for TF-IDF: Between 0.1 and 0.4 (not v2.x values like 0.8)
- excludePages list verified: All UIDs exist and are intentionally excluded
- Task execution successful: Check task history and logs for errors
- Multilingual separation: Each language has its own task (recommended)
โ TypoScript Configuration Validation
- Threshold hierarchy respected:
proximityThreshold โฅ minimumSimilarity
- TF-IDF thresholds updated: Not using v2.x legacy values (>0.5)
- Language settings match site:
languageMapping
corresponds to TYPO3 language UIDs - Field weights optimized: Higher weights for title/keywords, lower for content
- Performance settings:
maxSuggestions
andminTextLength
appropriate for site size
โ Database & Storage Validation
-- Verify data exists and has reasonable distribution SELECT sys_language_uid, MIN(similarity_score) as min_score, MAX(similarity_score) as max_score, AVG(similarity_score) as avg_score, COUNT(*) as total_pairs FROM tx_semanticsuggestion_similarities GROUP BY sys_language_uid; -- Check for unexpected language mixing SELECT DISTINCT root_page_id, sys_language_uid, COUNT(*) as pairs FROM tx_semanticsuggestion_similarities GROUP BY root_page_id, sys_language_uid;
โ Frontend Integration Validation
- Plugin displays correctly:
<f:cObject typoscriptObjectPath='lib.semantic_suggestion' />
works - Suggestions appear on pages: Test on multiple pages with different content
- Language separation working: German pages don't show English suggestions
- Exclusions effective: Excluded pages don't appear in suggestions
- Performance acceptable: Page load time impact <100ms
โ NLP & Language Detection Validation
- nlp_tools dependency installed:
composer show cywolf/nlp-tools
- Stemming working (German sites): Debug logs show stemmed words
- Language detection accurate: Pages analyzed in correct language
- Site configuration proper: Locales formatted as
de_DE.UTF-8
(not justde
)
๐จ Red Flags to Watch For
Warning Sign | Quick Test | Solution |
---|---|---|
Zero suggestions anywhere | SELECT COUNT(*) FROM tx_semanticsuggestion_similarities; |
Lower thresholds or check task |
Very low similarity scores (all <0.1) | Check debug logs for TF-IDF failures | Verify text length and language detection |
Mixed language results | Test DE page shows EN suggestions | Separate scheduler tasks |
Poor suggestion quality | Manual review of top suggestions | Adjust field weights or thresholds |
Slow performance | Frontend timing >500ms | Increase thresholds or reduce maxSuggestions |
๐ฏ Configuration Quality Score
Rate your setup (aim for 80%+ before going live):
Basic Setup (50 points)
- Scheduler task runs (20 pts)
- Database contains data (15 pts)
- Frontend shows suggestions (15 pts)
Optimization (30 points)
- TF-IDF thresholds optimized (10 pts)
- Language separation implemented (10 pts)
- Performance <200ms (10 pts)
Advanced Features (20 points)
- German stemming active (5 pts)
- Debug logging configured (5 pts)
- Exclusions properly managed (5 pts)
- Field weights customized (5 pts)
Score: ___ / 100
Target: 80+ for production deployment Minimum: 60+ for staging/testing
Migration from v2.x
๐ Automatic Migration
The extension includes automatic fallback mechanisms:
- TF-IDF Processing: Falls back to old cosine similarity if nlp_tools fails
- Language Detection: Falls back to TypoScript mapping if site detection fails
- Text Processing: Falls back to basic stop word removal if stemming fails
๐ Migration Checklist
-
Install nlp_tools dependency:
composer require cywolf/nlp-tools
-
Update TypoScript configuration (add new NLP settings):
plugin.tx_semanticsuggestion_suggestions.settings { enableStemming = 1 languageMapping { 0 = en 1 = de # ... your language mapping } }
-
Clear TYPO3 cache:
./vendor/bin/typo3 cache:flush
-
Regenerate similarities (run Scheduler task):
- All existing similarities will be recalculated with TF-IDF
- German content should see immediate improvement
- Check debug logs to verify language detection
-
Monitor performance:
- Check backend module for accuracy improvements
- Compare similarity scores before/after migration
- Verify multilingual sites work correctly
๐ Rollback Plan
If you experience issues, you can disable advanced features:
plugin.tx_semanticsuggestion_suggestions.settings {
enableStemming = 0 # Disable stemming
debugMode = 1 # Enable debug logging
minTextLength = 200 # Increase minimum text length
}
The extension will automatically fall back to v2.x behavior while maintaining database compatibility.
Contributing
Contributions are welcome! Fork the repository, create a branch, make your changes, and submit a Pull Request.
License
This project is licensed under the GNU General Public License v2.0 or later. See the LICENSE file.
Support
Contact: Wolfangel Cyril (cyril.wolfangel@gmail.com) Bugs & Features: GitHub Issues Documentation & Updates: GitHub Repository