Cache Reaper
Ric Harvey edited this page 2025-12-01 21:40:43 +00:00

Cache Reaper Script

The cache reaper is a Python script that automatically cleans up stale custom domain mappings from Redis. It's an essential maintenance tool that should be run periodically via cron to ensure that domains pointing to deleted or reconfigured repositories are removed from the cache.

Overview

What Problems Does It Solve?

Custom domain mappings in the Pages Server use persistent storage (TTL=0) to prevent them from expiring unexpectedly. However, this means they won't be automatically cleaned up when:

  • A repository is deleted
  • A .pages file is removed from a repository
  • A custom domain is changed in the .pages file
  • A repository becomes private and inaccessible

The reaper script solves this by:

  1. Periodically scanning all custom domain mappings in Redis
  2. Verifying each repository still has an active .pages file
  3. Removing stale mappings that are no longer valid
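The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in reaper.py: find_stale_domains, has_pages_file and FakeRedis are hypothetical names, and the client only needs redis-py style scan_iter and get methods that return strings (as with decode_responses=True). The username:repository value format follows the forward mapping described on this page.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

PREFIX = "custom_domain:"


def find_stale_domains(
    client, has_pages_file: Callable[[str, str], bool]
) -> List[Tuple[str, str]]:
    """Return (domain, "owner/repo") pairs whose repository lacks a .pages file."""
    stale = []
    # Step 1: scan all forward mappings (custom_domain:{domain}).
    for key in client.scan_iter(match=PREFIX + "*"):
        domain = key[len(PREFIX):]
        value = client.get(key)
        if not value or ":" not in value:
            continue  # skip malformed entries rather than guessing
        owner, repo = value.split(":", 1)
        # Step 2: verify the repository (hypothetical callback, e.g. an API check).
        if not has_pages_file(owner, repo):
            # Step 3: report the mapping as stale; deletion is left to the caller.
            stale.append((domain, f"{owner}/{repo}"))
    return stale


class FakeRedis:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, data: Dict[str, str]):
        self.data = data

    def scan_iter(self, match: str) -> Iterable[str]:
        prefix = match.rstrip("*")
        return [k for k in self.data if k.startswith(prefix)]

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)
```

Separating detection from deletion in this way makes a dry-run mode straightforward: the caller can print the stale list instead of acting on it.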

What Does It Clean Up?

When a stale domain mapping is detected, the reaper removes:

  • Forward mapping: custom_domain:{domain} → username:repository
  • Reverse mapping: username:repository → domain
  • Traefik router configs: All traefik/http/routers/custom-{domain}/* keys

This ensures complete cleanup of all related cache entries.
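As a rough illustration of which keys belong to one mapping, the sketch below builds the list for a single domain. The function names are hypothetical, and the rule that dots in the domain become hyphens in the Traefik router name is an assumption inferred from the sample output on this page (example.com appears as custom-example-com), not something confirmed from reaper.py.

```python
from typing import List


def router_name(domain: str) -> str:
    """Traefik router name, ASSUMING dots become hyphens (example.com -> custom-example-com)."""
    return "custom-" + domain.replace(".", "-")


def cleanup_targets(domain: str, owner: str, repo: str) -> List[str]:
    """Keys and key patterns associated with one custom domain mapping."""
    return [
        f"custom_domain:{domain}",  # forward mapping key
        f"{owner}:{repo}",  # reverse mapping key
        f"traefik/http/routers/{router_name(domain)}/*",  # router config keys (glob pattern)
    ]
```

The last entry is a pattern rather than a single key, so a real cleanup would need to expand it (for example with a SCAN) before deleting.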

Installation

Prerequisites

  • Python 3.7 or later
  • Access to Redis - Same Redis instance used by the plugin
  • Network access to Forgejo - To verify .pages files via API
  • (Optional) Forgejo API token - Required for checking private repositories

Install Dependencies

Navigate to the reaper directory and install Python dependencies:

cd reaper
pip install -r requirements.txt

Using a virtual environment (recommended):

cd reaper
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Configuration

The reaper accepts configuration via command-line arguments or environment variables.

Command-Line Arguments

Argument            Description                          Required   Default
--redis-host        Redis server hostname                No         localhost
--redis-port        Redis server port                    No         6379
--redis-password    Redis password                       No         None
--forgejo-host      Forgejo host URL                     Yes        None
--forgejo-token     Forgejo API token                    No*        None
--dry-run           Test mode - don't actually delete    No         false

*Required if you need to check private repositories

Environment Variables

You can use environment variables instead of (or in addition to) command-line arguments:

export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_PASSWORD=mypassword
export FORGEJO_HOST=https://git.example.com
export FORGEJO_TOKEN=my-api-token

Environment variables are recommended for production deployments as they're more secure than command-line arguments (which may be visible in process listings).
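One common way to combine the two sources is to use each environment variable as the default for the matching argument. The sketch below shows that pattern with argparse; parse_config is a hypothetical name, and the precedence (command line overrides environment) is an assumption, since this page does not state how reaper.py resolves conflicts.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_config(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> argparse.Namespace:
    """Parse reaper-style options, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Cache reaper configuration sketch")
    parser.add_argument("--redis-host", default=env.get("REDIS_HOST", "localhost"))
    parser.add_argument("--redis-port", type=int, default=int(env.get("REDIS_PORT", "6379")))
    parser.add_argument("--redis-password", default=env.get("REDIS_PASSWORD"))
    parser.add_argument("--forgejo-host", default=env.get("FORGEJO_HOST"))
    parser.add_argument("--forgejo-token", default=env.get("FORGEJO_TOKEN"))
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    # The Forgejo host is the only required setting; accept it from either source.
    if not args.forgejo_host:
        parser.error("--forgejo-host or FORGEJO_HOST is required")
    return args
```

With this layout, secrets such as the API token can stay in the environment while non-sensitive settings are overridden per invocation.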

Usage

Testing with Dry Run

Always test with --dry-run first to see what would be deleted without actually deleting anything:

python reaper.py --redis-host localhost \
                 --forgejo-host https://git.example.com \
                 --dry-run

Example Dry Run Output

✓ Redis connection successful

🔍 Scanning Redis at localhost:6379
🌐 Forgejo API: https://git.example.com
🔍 DRY RUN MODE - No changes will be made

📋 example.com -> user1/old-repo
  ❌ Repository no longer has .pages file
  🔍 [DRY RUN] Would delete 7 keys:
     - custom_domain:example.com
     - user1:old-repo
     - traefik/http/routers/custom-example-com/rule
     - traefik/http/routers/custom-example-com/entrypoints/0
     - traefik/http/routers/custom-example-com/service
     - traefik/http/routers/custom-example-com/tls/certresolver
     - traefik/http/routers/custom-example-com/middlewares/0

📋 squarecows.com -> squarecows/sqcows-web
  ✓ Repository still has .pages file

============================================================
📊 REAPER SUMMARY
============================================================
Total domains scanned:  2
Stale domains cleaned:  1
Errors encountered:     0
Duration:               1.23 seconds

🔍 DRY RUN - No actual changes were made
============================================================

Production Usage

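Once you've verified the dry run output, remove the --dry-run flag to actually delete stale entries. The token is passed on the command line here for brevity; in production, prefer the FORGEJO_TOKEN environment variable so that the token does not appear in process listings: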

python reaper.py --redis-host localhost \
                 --forgejo-host https://git.example.com \
                 --forgejo-token your-api-token

Using the Shell Wrapper

For easier execution, especially with environment variables, use the provided shell script:

  1. Copy and configure the wrapper script:

    cp run-reaper.sh my-reaper.sh
    chmod +x my-reaper.sh
    
  2. Edit the configuration in my-reaper.sh:

    export REDIS_HOST=localhost
    export REDIS_PORT=6379
    export REDIS_PASSWORD=mypassword
    export FORGEJO_HOST=https://git.example.com
    export FORGEJO_TOKEN=my-api-token
    
  3. Run the script:

    ./my-reaper.sh
    
  4. Dry run with wrapper:

    ./my-reaper.sh --dry-run
    

Scheduling with Cron

The reaper is designed to run periodically via cron. The recommended frequency depends on your usage:

  • High-traffic sites: Run hourly
  • Medium-traffic sites: Run every 6 hours
  • Low-traffic sites: Run daily

Cron Examples

Edit your crontab:

crontab -e

Run Every Hour

0 * * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1

Run Every 6 Hours

0 */6 * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1

Run Daily at 3 AM

0 3 * * * /usr/bin/python3 /path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com >> /var/log/pages-reaper.log 2>&1

Run Every Hour Using the Wrapper Script

0 * * * * /path/to/reaper/my-reaper.sh >> /var/log/pages-reaper.log 2>&1

Cron Setup Best Practices

  1. Use absolute paths for both the script and Python interpreter
  2. Redirect output to a log file for debugging
  3. Set appropriate permissions on the wrapper script: chmod 700
  4. Test the cron command manually before adding to crontab
  5. Monitor the logs regularly to ensure it's working correctly

Exit Codes

The reaper returns different exit codes for integration with monitoring systems:

Exit Code   Meaning           Description
0           Success           All domains processed without errors
1           Fatal error       Can't connect to Redis, unexpected exception
2           Partial success   Some domains processed with errors
130         Interrupted       User cancelled with Ctrl+C

Monitoring with Exit Codes

Example monitoring script:

#!/bin/bash
/path/to/reaper/reaper.py --redis-host localhost --forgejo-host https://git.example.com
EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
    echo "Reaper completed successfully"
elif [ $EXIT_CODE -eq 2 ]; then
    echo "WARNING: Reaper completed with some errors"
    # Send alert
else
    echo "ERROR: Reaper failed"
    # Send critical alert
fi

Security Considerations

API Token Security

  1. Use read-only tokens: Create a Forgejo API token with minimal permissions
  2. Use environment variables: Don't pass tokens via command-line arguments
  3. Secure the wrapper script: Restrict it to its owner with chmod 700
  4. Rotate tokens regularly: Follow your security policy for API token rotation

Redis Security

  1. Use Redis password: Configure REDIS_PASSWORD if Redis requires authentication
  2. Network isolation: Ensure Redis is not exposed to the internet
  3. Firewall rules: Restrict Redis access to authorized hosts only

File Permissions

Recommended permissions:

chmod 700 my-reaper.sh      # Only owner can read/write/execute
chmod 644 reaper.py          # Everyone can read, only owner can write
chmod 600 .env               # Only owner can read/write (if using .env file)

Troubleshooting

Redis Connection Errors

Error:

✗ Failed to connect to Redis: Error 111 connecting to localhost:6379. Connection refused.

Solutions:

  • Check Redis is running: redis-cli ping
  • Verify correct host/port: redis-cli -h localhost -p 6379 ping
  • Check Redis password if required
  • Verify firewall rules allow connection

Forgejo API Errors

Error:

⚠️  Error checking user1/repo1: 401 Unauthorized

Solutions:

  • Verify Forgejo host URL is correct (include https://)
  • Check API token is valid: Test with curl -H "Authorization: token YOUR_TOKEN" https://git.example.com/api/v1/user
  • Ensure token has repository read permissions
  • Check if repository is accessible with the token
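When debugging API errors it can help to reproduce the check outside the reaper. The sketch below builds a request for a repository's .pages file and classifies the HTTP status. It assumes the check goes through Forgejo's repository contents endpoint (/api/v1/repos/{owner}/{repo}/contents/{filepath}); whether reaper.py uses that exact endpoint is not confirmed by this page, and both function names are hypothetical.

```python
import urllib.request
from typing import Optional


def pages_file_request(
    host: str, owner: str, repo: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a request for the .pages file of a repository."""
    url = f"{host.rstrip('/')}/api/v1/repos/{owner}/{repo}/contents/.pages"
    headers = {"Accept": "application/json"}
    if token:
        # Same header format as the curl test above.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


def interpret_status(status: int) -> str:
    """Classify an HTTP status: only a 404 is treated as a missing .pages file."""
    if status == 200:
        return "present"
    if status == 404:
        return "missing"
    return "error"  # e.g. 401 or 5xx: report the failure, do not treat as stale
```

Classifying 401 and 5xx responses as errors rather than as "missing" is a design choice in this sketch: it keeps an expired token or a Forgejo outage from being mistaken for a stale domain.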

Permission Denied

Error:

bash: ./reaper.py: Permission denied

Solution:

chmod +x reaper.py

Module Not Found

Error:

ModuleNotFoundError: No module named 'redis'

Solution:

pip install -r requirements.txt

Or activate your virtual environment:

source venv/bin/activate
pip install -r requirements.txt

No Domains Found

If the reaper reports 0 domains scanned:

  1. Check Redis connection: Verify you're connecting to the correct Redis instance
  2. Check Redis database: Ensure you're using the same database number as the plugin
  3. Verify mappings exist: Use redis-cli --scan --pattern "custom_domain:*" to list mappings (prefer this over KEYS, which blocks Redis while it walks the whole keyspace)

Monitoring and Maintenance

Log File Management

Create a log rotation configuration to prevent log files from growing too large:

Create /etc/logrotate.d/pages-reaper:

/var/log/pages-reaper.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 0644 nobody nobody
}

Monitoring Metrics

Track these metrics to ensure the reaper is working correctly:

  1. Total domains scanned - Should match the number of active custom domains
  2. Stale domains cleaned - An unexpectedly high count may indicate a problem
  3. Error count - Should be zero or very low
  4. Execution duration - Monitor for performance degradation
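These four metrics can be pulled from the log without changing the reaper. The sketch below parses the summary block in the format shown in the example dry run output on this page; parse_summary and the metric names are hypothetical, and the regular expressions would need adjusting if the real output differs.

```python
import re
from typing import Dict

# One pattern per summary line, matching the sample output on this page.
SUMMARY_PATTERNS = {
    "scanned": r"Total domains scanned:\s+(\d+)",
    "cleaned": r"Stale domains cleaned:\s+(\d+)",
    "errors": r"Errors encountered:\s+(\d+)",
    "duration_seconds": r"Duration:\s+([\d.]+) seconds",
}


def parse_summary(log_text: str) -> Dict[str, float]:
    """Extract the most recent summary values from reaper log text."""
    metrics = {}
    for name, pattern in SUMMARY_PATTERNS.items():
        matches = re.findall(pattern, log_text)
        if matches:
            metrics[name] = float(matches[-1])  # the last run in the log wins
    return metrics
```

The resulting dictionary can be forwarded to whatever monitoring system is in use; a missing key means the corresponding summary line was not found.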

Health Checks

Set up automated health checks:

#!/bin/bash
# Check if reaper ran in the last 2 hours
LOG_FILE="/var/log/pages-reaper.log"
HOURS=2

if [ -f "$LOG_FILE" ]; then
    # GNU stat (Linux) uses -c %Y; BSD stat (macOS) uses -f %m
    LAST_RUN=$(stat -c %Y "$LOG_FILE" 2>/dev/null || stat -f %m "$LOG_FILE")
    NOW=$(date +%s)
    DIFF=$((NOW - LAST_RUN))
    MAX_AGE=$((HOURS * 3600))

    if [ "$DIFF" -gt "$MAX_AGE" ]; then
        echo "WARNING: Reaper hasn't run in $HOURS hours"
        # Send alert
    fi
else
    echo "WARNING: $LOG_FILE not found; the reaper may never have run"
    # Send alert
fi
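The shell check above depends on platform-specific stat flags. Where Python is already available for the reaper, the same freshness test can be written portably. This is a minimal sketch; log_is_fresh is a hypothetical helper, and like the shell version it uses the log file's modification time as a proxy for the last run.

```python
import os
import time
from typing import Optional


def log_is_fresh(path: str, max_age_hours: float = 2, now: Optional[float] = None) -> bool:
    """Return True if the log file was modified within the last max_age_hours."""
    try:
        last_modified = os.path.getmtime(path)
    except OSError:
        return False  # a missing or unreadable log counts as "not fresh"
    now = time.time() if now is None else now
    return (now - last_modified) <= max_age_hours * 3600


if __name__ == "__main__":
    if not log_is_fresh("/var/log/pages-reaper.log"):
        print("WARNING: Reaper hasn't run in 2 hours")
```

The optional now parameter exists only to make the function easy to test with a fixed clock.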

Redis Key Monitoring

Monitor the number of custom domain keys in Redis:

# Count custom domain keys
redis-cli --scan --pattern "custom_domain:*" | wc -l

# List all custom domains
redis-cli --scan --pattern "custom_domain:*"

# Check a specific domain
redis-cli GET "custom_domain:example.com"

Advanced Usage

Custom Redis Database

If your plugin uses a specific Redis database number (not 0):

# Modify the Redis connection in reaper.py:
self.redis_client = redis.Redis(
    host=redis_host,
    port=redis_port,
    password=redis_password,
    db=5,  # Add database number
    decode_responses=True,
)

Rate Limiting

If you have many domains and want to avoid overwhelming the Forgejo API:

# Add to reaper.py after checking each repository:
import time
time.sleep(0.1)  # 100ms delay between API calls
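A fixed sleep adds the full delay even when the API call itself was slow. As an alternative, the sketch below enforces a minimum interval between calls and sleeps only for the remainder. Throttle is a hypothetical helper, not part of reaper.py; the injectable clock and sleep functions are there so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between calls, sleeping only when needed."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create one Throttle(0.1) and invoke wait() before each repository check.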

Custom Patterns

To clean up additional cache patterns, modify the delete_domain_mappings method in reaper.py to include additional keys.

Best Practices

Development vs Production

  • Development: Run hourly with dry-run enabled
  • Staging: Run every 6 hours without dry-run
  • Production: Run every 1-6 hours depending on traffic

Before Major Changes

Always do a dry run first when:

  • Updating the reaper script
  • Changing the Forgejo host or credentials
  • Modifying the Redis configuration
  • Resuming after a long period of downtime

Backup Strategy

Before running the reaper in production:

  1. Back up Redis: redis-cli --rdb /backup/dump.rdb
  2. Export mappings: redis-cli --scan --pattern "custom_domain:*" > domains-backup.txt
  3. Test restore: Verify you can restore from backup
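Note that the --scan export in step 2 lists key names only, not the repositories they point to. If the values are wanted as well, a small script can capture both. The sketch below is one way to do that under stated assumptions: export_domain_mappings and dump_mappings are hypothetical names, and client is any object with redis-py style scan_iter and get methods returning strings (for example redis.Redis created with decode_responses=True).

```python
import json
from typing import Dict


def export_domain_mappings(client, pattern: str = "custom_domain:*") -> Dict[str, str]:
    """Collect forward mappings (key -> value) matching the pattern."""
    mappings = {}
    for key in client.scan_iter(match=pattern):
        value = client.get(key)
        if value is not None:  # the key may have vanished between SCAN and GET
            mappings[key] = value
    return mappings


def dump_mappings(client, path: str) -> int:
    """Write the forward mappings to a JSON file and return how many were saved."""
    mappings = export_domain_mappings(client)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mappings, handle, indent=2, sort_keys=True)
    return len(mappings)
```

This covers the forward mappings only; reverse mappings and Traefik router keys are not included, so it complements the RDB backup rather than replacing it.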

Example Production Setup

Directory Structure

/opt/forgejo-pages/
├── reaper/
│   ├── reaper.py
│   ├── requirements.txt
│   ├── venv/
│   └── production-reaper.sh
└── logs/
    └── reaper.log

Production Script

/opt/forgejo-pages/reaper/production-reaper.sh:

#!/bin/bash
set -euo pipefail

# Configuration
export REDIS_HOST=redis.internal
export REDIS_PORT=6379
export REDIS_PASSWORD=$(cat /secrets/redis-password)
export FORGEJO_HOST=https://git.example.com
export FORGEJO_TOKEN=$(cat /secrets/forgejo-token)

# Activate virtual environment
cd /opt/forgejo-pages/reaper
source venv/bin/activate

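# Run reaper and capture its exit code
# (with set -e, a bare failing command would abort the script before the check)
EXIT_CODE=0
python reaper.py || EXIT_CODE=$?

# Alert on any non-zero exit code, then propagate it to the caller
if [ "$EXIT_CODE" -ne 0 ]; then
    echo "Reaper failed with exit code $EXIT_CODE" | mail -s "Reaper Alert" admin@example.com
fi
exit "$EXIT_CODE"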

Systemd Timer (Alternative to Cron)

/etc/systemd/system/pages-reaper.service:

[Unit]
Description=Forgejo Pages Cache Reaper
After=network.target

[Service]
Type=oneshot
User=nobody
Group=nobody
ExecStart=/opt/forgejo-pages/reaper/production-reaper.sh
StandardOutput=append:/opt/forgejo-pages/logs/reaper.log
StandardError=append:/opt/forgejo-pages/logs/reaper.log

/etc/systemd/system/pages-reaper.timer:

[Unit]
Description=Run Forgejo Pages Reaper Hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

systemctl daemon-reload   # pick up the newly created unit files
systemctl enable pages-reaper.timer
systemctl start pages-reaper.timer

Support