Cache Reaper Script
The cache reaper is a Python script that automatically cleans up stale custom domain mappings from Redis. It's an essential maintenance tool that should be run periodically via cron to ensure that domains pointing to deleted or reconfigured repositories are removed from the cache.
Overview
What Problems Does It Solve?
Custom domain mappings in the Pages Server use persistent storage (TTL=0) to prevent them from expiring unexpectedly. However, this means they won't be automatically cleaned up when:
- A repository is deleted
- A .pages file is removed from a repository
- A custom domain is changed in the .pages file
- A repository becomes private and inaccessible
The reaper script solves this by:
- Periodically scanning all custom domain mappings in Redis
- Verifying each repository still has an active .pages file
- Removing stale mappings that are no longer valid
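The scan, verify, and remove steps can be sketched roughly as follows. This is an illustration only, not the code in reaper.py: the function name find_stale_domains and the has_pages_file callback are hypothetical, and the mappings are passed in as a plain dict instead of being read from Redis.

```python
from typing import Callable, Dict, List


def find_stale_domains(
    mappings: Dict[str, str],
    has_pages_file: Callable[[str, str], bool],
) -> List[str]:
    """Return the domains whose repository no longer passes the .pages check.

    mappings maps a domain to a "username:repository" string, mirroring the
    forward mapping described in this page. has_pages_file is a hypothetical
    callback that would ask Forgejo whether the repository still has a
    .pages file.
    """
    stale = []
    for domain, repo_key in mappings.items():
        owner, _, repo = repo_key.partition(":")
        if not has_pages_file(owner, repo):
            stale.append(domain)
    return stale


# Stubbed check: only squarecows/sqcows-web still has a .pages file.
active = {("squarecows", "sqcows-web")}
mappings = {
    "example.com": "user1:old-repo",
    "squarecows.com": "squarecows:sqcows-web",
}
stale = find_stale_domains(mappings, lambda owner, repo: (owner, repo) in active)
print(stale)  # ['example.com']
```

Keeping the Forgejo check behind a callback makes the decision logic easy to test without a live Redis or Forgejo instance.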
What Does It Clean Up?
When a stale domain mapping is detected, the reaper removes:
- Forward mapping: custom_domain:{domain} → username:repository
- Reverse mapping: username:repository → domain
- Traefik router configs: All traefik/http/routers/custom-{domain}/* keys
This ensures complete cleanup of all related cache entries.
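As a rough sketch of which keys belong to one domain, the helper below builds the two mapping keys and the Traefik router pattern. The function name keys_for_domain is hypothetical, and replacing dots with hyphens in the router name is an assumption inferred from the sample dry-run output on this page (custom-example-com), not confirmed from reaper.py.

```python
from typing import List, Tuple


def keys_for_domain(domain: str, username: str, repository: str) -> Tuple[List[str], str]:
    """Return (exact keys, router key pattern) for one custom domain."""
    # Assumption: the router name is the domain with dots replaced by hyphens.
    router_name = "custom-" + domain.replace(".", "-")
    exact_keys = [
        f"custom_domain:{domain}",    # forward mapping
        f"{username}:{repository}",   # reverse mapping
    ]
    router_pattern = f"traefik/http/routers/{router_name}/*"
    return exact_keys, router_pattern


exact, pattern = keys_for_domain("example.com", "user1", "old-repo")
print(exact)
print(pattern)
```

The router entry is returned as a pattern because the number of keys under a router (rule, service, middlewares, and so on) can vary.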
Installation
Prerequisites
- Python 3.7 or later
- Access to Redis: the same Redis instance used by the plugin
- Network access to Forgejo: needed to verify .pages files via the API
- (Optional) Forgejo API token: required for checking private repositories
Install Dependencies
Navigate to the reaper directory and install Python dependencies:
cd reaper
pip install -r requirements.txt
Using a virtual environment (recommended):
cd reaper
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Configuration
The reaper accepts configuration via command-line arguments or environment variables.
Command-Line Arguments
| Argument | Description | Required | Default |
|---|---|---|---|
| --redis-host | Redis server hostname | No | localhost |
| --redis-port | Redis server port | No | 6379 |
| --redis-password | Redis password | No | None |
| --forgejo-host | Forgejo host URL | Yes | None |
| --forgejo-token | Forgejo API token | No* | None |
| --dry-run | Test mode: report what would be deleted without deleting anything | No | false |
*Required if you need to check private repositories
Environment Variables
You can use environment variables instead of (or in addition to) command-line arguments:
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_PASSWORD=mypassword
export FORGEJO_HOST=https://git.example.com
export FORGEJO_TOKEN=my-api-token
Environment variables are recommended for production deployments as they're more secure than command-line arguments (which may be visible in process listings).
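One common way to combine the two configuration sources is to use the environment variables as defaults for the command-line arguments. The sketch below shows that pattern with argparse. It is not taken from reaper.py: the function name parse_config is hypothetical, and the precedence (an argument overrides the matching environment variable) is an assumption.

```python
import argparse
from typing import Mapping, Sequence


def parse_config(argv: Sequence[str], env: Mapping[str, str]) -> argparse.Namespace:
    """Parse arguments, falling back to environment variables for defaults."""
    parser = argparse.ArgumentParser(description="Cache reaper (configuration sketch)")
    parser.add_argument("--redis-host", default=env.get("REDIS_HOST", "localhost"))
    parser.add_argument("--redis-port", type=int, default=int(env.get("REDIS_PORT", "6379")))
    parser.add_argument("--redis-password", default=env.get("REDIS_PASSWORD"))
    parser.add_argument("--forgejo-host", default=env.get("FORGEJO_HOST"))
    parser.add_argument("--forgejo-token", default=env.get("FORGEJO_TOKEN"))
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    # The Forgejo host is the only required setting; accept it from either source.
    if not args.forgejo_host:
        parser.error("--forgejo-host or FORGEJO_HOST is required")
    return args


# In a real script this would be parse_config(sys.argv[1:], os.environ).
args = parse_config(
    ["--dry-run"],
    {"FORGEJO_HOST": "https://git.example.com", "REDIS_PORT": "6380"},
)
print(args.redis_host, args.redis_port, args.forgejo_host, args.dry_run)
```

Passing argv and env in explicitly keeps the function testable and avoids reading secrets from the process arguments.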
Usage
Testing with Dry Run
Always test with --dry-run first to see what would be deleted without actually deleting anything:
python reaper.py --redis-host localhost \
--forgejo-host https://git.example.com \
--dry-run
Example Dry Run Output
✓ Redis connection successful
🔍 Scanning Redis at localhost:6379
🌐 Forgejo API: https://git.example.com
🔍 DRY RUN MODE - No changes will be made
📋 example.com -> user1/old-repo
❌ Repository no longer has .pages file
🔍 [DRY RUN] Would delete 7 keys:
- custom_domain:example.com
- user1:old-repo
- traefik/http/routers/custom-example-com/rule
- traefik/http/routers/custom-example-com/entrypoints/0
- traefik/http/routers/custom-example-com/service
- traefik/http/routers/custom-example-com/tls/certresolver
- traefik/http/routers/custom-example-com/middlewares/0
📋 squarecows.com -> squarecows/sqcows-web
✓ Repository still has .pages file
============================================================
📊 REAPER SUMMARY
============================================================
Total domains scanned: 2
Stale domains cleaned: 1
Errors encountered: 0
Duration: 1.23 seconds
🔍 DRY RUN - No actual changes were made
============================================================
Production Usage
Once you've verified the dry run output, remove the --dry-run flag to actually delete stale entries:
python reaper.py --redis-host localhost \
--forgejo-host https://git.example.com \
--forgejo-token your-api-token
Using the Shell Wrapper
For easier execution, especially with environment variables, use the provided shell script:
- Copy the wrapper script and make it executable:
  cp run-reaper.sh my-reaper.sh
  chmod +x my-reaper.sh
- Edit the configuration in my-reaper.sh:
  export REDIS_HOST=localhost
  export REDIS_PORT=6379
  export REDIS_PASSWORD=mypassword
  export FORGEJO_HOST=https://git.example.com
  export FORGEJO_TOKEN=my-api-token
- Run the script:
  ./my-reaper.sh
- Dry run with the wrapper:
  ./my-reaper.sh --dry-run
Scheduling with Cron
The reaper is designed to run periodically via cron. The recommended frequency depends on your usage:
- High-traffic sites: Run hourly
- Medium-traffic sites: Run every 6 hours
- Low-traffic sites: Run daily
Cron Examples
Edit your crontab:
crontab -e
Run Every Hour
0 * * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1
Run Every 6 Hours
0 */6 * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1
Run Daily at 3 AM
0 3 * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1
Using Shell Wrapper (Recommended)
0 * * * * /path/to/reaper/my-reaper.sh >> /var/log/pages-reaper.log 2>&1
Cron Setup Best Practices
- Use absolute paths for both the script and Python interpreter
- Redirect output to a log file for debugging
- Set appropriate permissions on the wrapper script: chmod 700
- Test the cron command manually before adding it to crontab
- Monitor the logs regularly to ensure it's working correctly
Exit Codes
The reaper returns different exit codes for integration with monitoring systems:
| Exit Code | Meaning | Description |
|---|---|---|
| 0 | Success | All domains processed without errors |
| 1 | Fatal error | Can't connect to Redis, unexpected exception |
| 2 | Partial success | Some domains processed with errors |
| 130 | Interrupted | User cancelled with Ctrl+C |
Monitoring with Exit Codes
Example monitoring script:
#!/bin/bash
# Invoke through the interpreter so reaper.py does not need the executable bit
/usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com
EXIT_CODE=$?
if [ "$EXIT_CODE" -eq 0 ]; then
  echo "Reaper completed successfully"
elif [ "$EXIT_CODE" -eq 2 ]; then
  echo "WARNING: Reaper completed with some errors"
  # Send alert
else
  # Covers 1 (fatal error) and 130 (interrupted)
  echo "ERROR: Reaper failed"
  # Send critical alert
fi
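If your monitoring is written in Python, the same exit-code mapping could look like the sketch below. The names classify_exit_code and run_and_classify and the severity labels are hypothetical. The example runs a stand-in command that exits with code 2 in place of the real reaper.

```python
import subprocess
import sys
from typing import List

# Exit codes documented in the table above; anything else is treated as critical.
SEVERITY = {0: "ok", 2: "warning"}


def classify_exit_code(code: int) -> str:
    """Map a reaper exit code to a monitoring severity label."""
    return SEVERITY.get(code, "critical")


def run_and_classify(command: List[str]) -> str:
    """Run a command and classify its exit code."""
    result = subprocess.run(command)
    return classify_exit_code(result.returncode)


# Stand-in for the reaper: a Python one-liner that exits with code 2.
print(run_and_classify([sys.executable, "-c", "raise SystemExit(2)"]))  # warning
```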
Security Considerations
API Token Security
- Use read-only tokens: Create a Forgejo API token with minimal permissions
- Use environment variables: Don't pass tokens via command-line arguments
- Secure the wrapper script: Set permissions to chmod 700
- Rotate tokens regularly: Follow your security policy for API token rotation
Redis Security
- Use Redis password: Configure REDIS_PASSWORD if Redis requires authentication
- Network isolation: Ensure Redis is not exposed to the internet
- Firewall rules: Restrict Redis access to authorized hosts only
File Permissions
Recommended permissions:
chmod 700 my-reaper.sh # Only owner can read/write/execute
chmod 644 reaper.py # Everyone can read, only owner can write
chmod 600 .env # Only owner can read/write (if using .env file)
Troubleshooting
Redis Connection Errors
Error:
✗ Failed to connect to Redis: Error 111 connecting to localhost:6379. Connection refused.
Solutions:
- Check Redis is running: redis-cli ping
- Verify correct host/port: redis-cli -h localhost -p 6379 ping
- Check Redis password if required
- Verify firewall rules allow connection
Forgejo API Errors
Error:
⚠️ Error checking user1/repo1: 401 Unauthorized
Solutions:
- Verify Forgejo host URL is correct (include https://)
- Check API token is valid: Test with curl -H "Authorization: token YOUR_TOKEN" https://git.example.com/api/v1/user
- Ensure token has repository read permissions
- Check if repository is accessible with the token
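The token check from the curl command above can also be done from Python using only the standard library. This is a sketch: the helper names build_user_request and token_is_valid are hypothetical, and treating 401 and 403 as "invalid token" is an assumption. Only the request is built in the example, so nothing is sent over the network until token_is_valid is called.

```python
import urllib.error
import urllib.request


def build_user_request(forgejo_host: str, token: str) -> urllib.request.Request:
    """Build the same request as the curl check: GET /api/v1/user with a token header."""
    url = forgejo_host.rstrip("/") + "/api/v1/user"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def token_is_valid(forgejo_host: str, token: str, timeout: float = 10.0) -> bool:
    """Return True if Forgejo accepts the token, False on 401/403."""
    request = build_user_request(forgejo_host, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code in (401, 403):
            return False
        raise  # other HTTP errors point at a different problem (wrong host, outage)


request = build_user_request("https://git.example.com/", "YOUR_TOKEN")
print(request.full_url)  # https://git.example.com/api/v1/user
```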
Permission Denied
Error:
bash: ./reaper.py: Permission denied
Solution:
chmod +x reaper.py
Module Not Found
Error:
ModuleNotFoundError: No module named 'redis'
Solution:
pip install -r requirements.txt
Or activate your virtual environment:
source venv/bin/activate
pip install -r requirements.txt
No Domains Found
If the reaper reports 0 domains scanned:
- Check Redis connection: Verify you're connecting to the correct Redis instance
- Check Redis database: Ensure you're using the same database number as the plugin
- Verify mappings exist: Use redis-cli KEYS "custom_domain:*" to list mappings
Monitoring and Maintenance
Log File Management
Create a log rotation configuration to prevent log files from growing too large:
Create /etc/logrotate.d/pages-reaper:
/var/log/pages-reaper.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
create 0644 nobody nobody
}
Monitoring Metrics
Track these metrics to ensure the reaper is working correctly:
- Total domains scanned - Should match the number of active custom domains
- Stale domains cleaned - Higher than expected may indicate problems
- Error count - Should be zero or very low
- Execution duration - Monitor for performance degradation
Health Checks
Set up automated health checks:
#!/bin/bash
# Check if reaper ran in the last 2 hours
LOG_FILE="/var/log/pages-reaper.log"
HOURS=2
if [ -f "$LOG_FILE" ]; then
  LAST_RUN=$(stat -c %Y "$LOG_FILE")   # Linux (GNU stat)
  # LAST_RUN=$(stat -f %m "$LOG_FILE") # macOS/BSD
  NOW=$(date +%s)
  DIFF=$((NOW - LAST_RUN))
  MAX_AGE=$((HOURS * 3600))
  if [ "$DIFF" -gt "$MAX_AGE" ]; then
    echo "WARNING: Reaper hasn't run in $HOURS hours"
    # Send alert
  fi
fi
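The stat flags differ between Linux and macOS, so a Python version of the same check may be easier to keep portable. The sketch below is one way to do it. The function name log_is_stale is hypothetical, and unlike the shell script it treats a missing log file as stale.

```python
import os
import tempfile
import time
from typing import Optional


def log_is_stale(log_file: str, max_age_hours: float = 2, now: Optional[float] = None) -> bool:
    """Return True if the log is missing or was last modified too long ago."""
    if not os.path.isfile(log_file):
        return True  # assumption: no log at all means the reaper never ran
    current = time.time() if now is None else now
    age_seconds = current - os.path.getmtime(log_file)
    return age_seconds > max_age_hours * 3600


# Demonstration with a temporary file whose modification time is set to 3 hours ago.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "pages-reaper.log")
    with open(path, "w") as handle:
        handle.write("run finished\n")
    three_hours_ago = time.time() - 3 * 3600
    os.utime(path, (three_hours_ago, three_hours_ago))
    print(log_is_stale(path))  # True
```

Note that this check, like the shell version, relies on the log file being modified on every run, which holds when output is appended to it by cron.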
Redis Key Monitoring
Monitor the number of custom domain keys in Redis:
# Count custom domain keys
redis-cli --scan --pattern "custom_domain:*" | wc -l
# List all custom domains
redis-cli --scan --pattern "custom_domain:*"
# Check a specific domain
redis-cli GET "custom_domain:example.com"
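The same listing can be done from Python. The sketch below assumes a redis-py style client created with decode_responses=True, so keys come back as str, and uses its scan_iter(match=...) method. The function name list_custom_domains is hypothetical, and FakeRedis is a small stand-in used only so the example runs without a Redis server.

```python
import fnmatch
from typing import Iterable, Iterator, List


def list_custom_domains(client) -> List[str]:
    """Return the domains that have a custom_domain:* key, sorted by name."""
    prefix = "custom_domain:"
    return sorted(key[len(prefix):] for key in client.scan_iter(match=prefix + "*"))


class FakeRedis:
    """In-memory stand-in that mimics scan_iter for this example."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = list(keys)

    def scan_iter(self, match: str = "*") -> Iterator[str]:
        return (key for key in self._keys if fnmatch.fnmatchcase(key, match))


# With a real server this would be something like:
#   client = redis.Redis(host="localhost", port=6379, decode_responses=True)
client = FakeRedis(["custom_domain:example.com", "user1:old-repo", "custom_domain:squarecows.com"])
domains = list_custom_domains(client)
print(len(domains), domains)  # 2 ['example.com', 'squarecows.com']
```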
Advanced Usage
Custom Redis Database
If your plugin uses a specific Redis database number (not 0):
# Modify the Redis connection in reaper.py:
self.redis_client = redis.Redis(
    host=redis_host,
    port=redis_port,
    password=redis_password,
    db=5,  # Add database number
    decode_responses=True,
)
Rate Limiting
If you have many domains and want to avoid overwhelming the Forgejo API:
# Add to reaper.py after checking each repository:
import time
time.sleep(0.1) # 100ms delay between API calls
Custom Patterns
To clean up additional cache patterns, modify the delete_domain_mappings method in reaper.py to include additional keys.
Best Practices
Development vs Production
- Development: Run hourly with dry-run enabled
- Staging: Run every 6 hours without dry-run
- Production: Run every 1-6 hours depending on traffic
Before Major Changes
Always run a dry-run before:
- Updating the reaper script
- Changing Forgejo host or credentials
- Modifying Redis configuration
- Resuming scheduled runs after a long period of downtime
Backup Strategy
Before running the reaper in production:
- Backup Redis: redis-cli --rdb /backup/dump.rdb
- Export mappings: redis-cli --scan --pattern "custom_domain:*" > domains-backup.txt
- Test restore: Verify you can restore from backup
Example Production Setup
Directory Structure
/opt/forgejo-pages/
├── reaper/
│ ├── reaper.py
│ ├── requirements.txt
│ ├── venv/
│ └── production-reaper.sh
└── logs/
└── reaper.log
Production Script
/opt/forgejo-pages/reaper/production-reaper.sh:
#!/bin/bash
set -euo pipefail
# Configuration
export REDIS_HOST=redis.internal
export REDIS_PORT=6379
export REDIS_PASSWORD=$(cat /secrets/redis-password)
export FORGEJO_HOST=https://git.example.com
export FORGEJO_TOKEN=$(cat /secrets/forgejo-token)
# Activate virtual environment
cd /opt/forgejo-pages/reaper
source venv/bin/activate
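# Run reaper, capturing the exit code so that "set -e" does not abort before the alert
EXIT_CODE=0
python reaper.py || EXIT_CODE=$?
# Check exit code
if [ "$EXIT_CODE" -ne 0 ]; then
  echo "Reaper failed with exit code $EXIT_CODE" | mail -s "Reaper Alert" admin@example.com
fi
exit "$EXIT_CODE"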
Systemd Timer (Alternative to Cron)
/etc/systemd/system/pages-reaper.service:
[Unit]
Description=Forgejo Pages Cache Reaper
After=network.target
[Service]
Type=oneshot
User=nobody
Group=nobody
ExecStart=/opt/forgejo-pages/reaper/production-reaper.sh
StandardOutput=append:/opt/forgejo-pages/logs/reaper.log
StandardError=append:/opt/forgejo-pages/logs/reaper.log
/etc/systemd/system/pages-reaper.timer:
[Unit]
Description=Run Forgejo Pages Reaper Hourly
[Timer]
OnCalendar=hourly
Persistent=true
[Install]
WantedBy=timers.target
Enable and start:
systemctl daemon-reload
systemctl enable pages-reaper.timer
systemctl start pages-reaper.timer
Support
- Documentation: See reaper/README.md in the main repository
- Issues: https://code.squarecows.com/SquareCows/pages-server/issues
- Main Wiki: Home