# Retention & Cleanup

Per-stream retention policy, scheduled cleanup, and the system cleanup endpoint.
Audit and activity logs are pruned on a schedule. Each stream has its own retention window; both are deleted by a single in-process scheduler that ships with the audit-service.
## Defaults

From `audit-service/config/config.yaml`:
| Setting | Default | Bound to |
|---|---|---|
| `retention.audit_retention_days` | `365` | Cutoff for `audit_logs` |
| `retention.activity_retention_days` | `90` | Cutoff for `activity_logs` |
| `retention.cleanup_interval` | `24h` | How often the scheduler runs |
| `retention.batch_size` | `5000` | Max rows deleted per round-trip |
Override these in your environment-specific config or via the standard `EDUSHADE_AUDIT_RETENTION_*` env vars.
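As an illustration only (the exact YAML layout is an assumption inferred from the `retention.*` setting names above; check `config.yaml` for the real shape), an environment-specific override might look like:

```yaml
# Hypothetical environment override; key layout assumed from the
# retention.* setting names in the table above.
retention:
  audit_retention_days: 730     # keep audit logs for two years
  activity_retention_days: 30   # prune activity logs more aggressively
  cleanup_interval: 12h
  batch_size: 2000              # smaller batches on a busy database
```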
## How Cleanup Runs

`StartRetentionScheduler` (in `audit-service/services/retention_service.go`) launches a goroutine when the audit-service boots:

- Runs `CleanupOldLogs` once on startup.
- Sets up a `time.Ticker` at `cleanup_interval`.
- On each tick, runs `CleanupOldLogs` again.
- Listens for context cancellation; on shutdown, the goroutine exits cleanly.

Errors are logged (`log.Printf("Retention cleanup error: %v", err)`) but do not stop the scheduler; the next tick retries from scratch.
## Batch Deletion

PostgreSQL doesn't support `DELETE ... LIMIT`, so the cleanup uses the standard subquery pattern:
```sql
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE created_at < <cutoff>
  LIMIT <batch_size>
);
```

The loop keeps running until a batch deletes fewer than `batch_size` rows, at which point all eligible rows are gone. This avoids holding long-running locks on the table when there is a large backlog.
The same pattern runs once for `audit_logs` and once for `activity_logs` per cleanup invocation.
## System Cleanup Endpoint

A manual trigger exists for operators:

```
POST /v1/system/audit/cleanup
```

| Detail | Value |
|---|---|
| Auth middleware | None |
| Permission check | None |
| Tenant scope | None — global cleanup |
**No auth.** This endpoint is mounted on a `system` group with no middleware at all. Anyone who can reach the audit-service can trigger a cleanup. In production, firewall the `/v1/system/*` path so it is only reachable from inside the VPC / from your operations toolchain.
The handler simply calls `retentionService.CleanupOldLogs(ctx)` and returns:

```json
{ "success": true, "message": "Retention cleanup completed", "data": null }
```

Use this when you've shortened a retention window in config and want to reclaim space immediately instead of waiting for the next tick.
## Per-Tenant Behaviour

Cleanup is global — it deletes by `created_at` only, with no tenant filter. There is no per-tenant retention override. If a single tenant requires a shorter window than the platform default, the only options today are:
- Lower the platform-wide default (affects everyone).
- Run a tenant-scoped DELETE manually against the database.
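For the manual route, a tenant-scoped statement of roughly this shape can reuse the same batching pattern the scheduler uses. This is a hypothetical example: the `tenant_id` column name and the 30-day window are assumptions — verify both against the actual schema before running anything:

```sql
-- Hypothetical one-tenant purge; tenant_id column name assumed.
-- Re-run until it deletes zero rows.
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE tenant_id = '<tenant>'
    AND created_at < NOW() - INTERVAL '30 days'
  LIMIT 5000
);
```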
## What Gets Deleted vs Preserved

Cleanup deletes rows by primary-key `id`. There are no foreign keys out of `audit_logs` / `activity_logs`, so deletion never cascades into other tables.

If a row references an actor (`actor_id`, `user_id`, `impersonated_by`) that has since been hard-deleted from the auth database, the row stays — only the enrichment block in the API response becomes empty. Cleanup does not chase orphan rows.
## Operational Tuning

| Symptom | What to change |
|---|---|
| Cleanup is hammering the DB and competing with foreground queries | Lower `batch_size`; the loop will run more iterations but each transaction is shorter |
| Cleanup never finishes a single tick | Increase `cleanup_interval` (so ticks don't pile up) and lower `batch_size` to keep individual deletes cheap |
| Disk growth is concerning | Lower the retention days for the offending stream first; the next cleanup will shrink the table |
| Need a one-shot purge after lowering retention | `POST /v1/system/audit/cleanup` from inside the VPC |
## Recovery

There is no soft-delete and no archive. Once cleanup deletes rows, they are gone. If long-term retention is a regulatory requirement:

- Set `audit_retention_days` and `activity_retention_days` to your minimum required period.
- Export rows with the Export endpoint or by querying Postgres directly before they fall outside the window.
- Ship those exports to long-term storage (S3, etc.).
Do not set retention days to `0`. The cutoff is computed as `time.Now().AddDate(0, 0, -retention_days)`, so `0` resolves to "anything older than right now" — i.e., the cleanup will delete every row in the target table on the next tick. There is no guard for this case. If you want to effectively disable cleanup on a stream, set the value to a very large number (e.g. `36500` for ~100 years); a true "disabled" mode would require a code change.
## Source

- Config: `audit-service/config/config.yaml`
- Scheduler + batch delete: `audit-service/services/retention_service.go`
- System endpoint: `audit-service/api/routes/routes.go`

