
Retention & Cleanup

Per-stream retention policy, scheduled cleanup, and the system cleanup endpoint


Audit and activity logs are pruned on a schedule. Each stream has its own retention window, and both streams are cleaned up by a single in-process scheduler that ships with the audit-service.

Defaults

From audit-service/config/config.yaml:

| Setting | Default | Bound to |
|---|---|---|
| retention.audit_retention_days | 365 | Cutoff for audit_logs |
| retention.activity_retention_days | 90 | Cutoff for activity_logs |
| retention.cleanup_interval | 24h | How often the scheduler runs |
| retention.batch_size | 5000 | Max rows deleted per round-trip |

Override these in your environment-specific config or via the standard EDUSHADE_AUDIT_RETENTION_* env vars.
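As a sketch, the retention block in config.yaml might look like this (key names and defaults taken from the table above; the exact surrounding structure of the file is not shown here):

```yaml
retention:
  audit_retention_days: 365    # cutoff for audit_logs (one year)
  activity_retention_days: 90  # cutoff for activity_logs
  cleanup_interval: 24h        # how often the scheduler runs
  batch_size: 5000             # max rows deleted per round-trip
```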

How Cleanup Runs

StartRetentionScheduler (in audit-service/services/retention_service.go) launches a goroutine when the audit-service boots:

  1. Runs CleanupOldLogs once on startup.
  2. Sets up a time.Ticker at cleanup_interval.
  3. On each tick, runs CleanupOldLogs again.
  4. Listens for context cancellation — on shutdown, the goroutine exits cleanly.

Errors are logged (log.Printf("Retention cleanup error: %v", err)) but do not stop the scheduler; the next tick will retry from scratch.

Batch Deletion

PostgreSQL doesn't support DELETE ... LIMIT, so the cleanup uses the standard subquery pattern:

```sql
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE created_at < <cutoff>
  LIMIT <batch_size>
)
```

The loop keeps running until a batch deletes fewer than batch_size rows — at which point all eligible rows are gone. This avoids long-running locks on the table when there is a large backlog.

The same pattern runs once for audit_logs and once for activity_logs per cleanup invocation.

System Cleanup Endpoint

A manual trigger exists for operators:

POST /v1/system/audit/cleanup
| Detail | Value |
|---|---|
| Auth middleware | None |
| Permission check | None |
| Tenant scope | None — global cleanup |

No auth. This endpoint is mounted on a system group with no middleware at all. Anyone who can reach the audit-service can trigger a cleanup. In production, firewall the /v1/system/* path so it is only reachable from inside the VPC / from your operations toolchain.

The handler simply calls retentionService.CleanupOldLogs(ctx) and returns:

```json
{ "success": true, "message": "Retention cleanup completed", "data": null }
```

Use this when you've shortened a retention window in config and want to reclaim space immediately instead of waiting for the next tick.

Per-Tenant Behaviour

Cleanup is global — it deletes by created_at only, with no tenant filter. There is no per-tenant retention override. If a single tenant requires a shorter window than the platform default, the only options today are:

  1. Lower the platform-wide default (affects everyone).
  2. Run a tenant-scoped DELETE manually against the database.
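For option 2, a tenant-scoped delete can reuse the same batched subquery pattern shown earlier. This assumes the tables carry a tenant_id column; verify the actual column name against the schema before running anything like it:

```sql
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE tenant_id = '<tenant>'                     -- column name assumed
    AND created_at < now() - interval '30 days'    -- tenant's shorter window
  LIMIT 5000
)
```

Run it in a loop until it reports fewer than 5000 deleted rows, for the same locking reasons the scheduler batches its deletes.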

What Gets Deleted vs Preserved

Cleanup deletes rows by primary-key id. There are no foreign keys out of audit_logs / activity_logs, so deletion never cascades into other tables.

If a row references an actor (actor_id, user_id, impersonated_by) that has since been hard-deleted from the auth database, the row stays — only the enrichment block in the API response becomes empty. Cleanup does not chase orphan rows.

Operational Tuning

| Symptom | What to change |
|---|---|
| Cleanup is hammering the DB and competing with foreground queries | Lower batch_size; the loop will run more iterations but each transaction is shorter |
| Cleanup never finishes a single tick | Increase cleanup_interval (so ticks don't pile up) and lower batch_size to keep individual deletes cheap |
| Disk growth is concerning | Lower the retention days for the offending stream first; the next cleanup will shrink the table |
| Need a one-shot purge after lowering retention | POST /v1/system/audit/cleanup from inside the VPC |

Recovery

There is no soft-delete and no archive. Once cleanup deletes rows, they are gone. If long-term retention is a regulatory requirement:

  • Set audit_retention_days and activity_retention_days to your minimum required period.
  • Export rows with the Export endpoint or by querying Postgres directly before they fall outside the window.
  • Ship those exports to long-term storage (S3, etc.).

Do not set retention days to 0. The cutoff is computed as time.Now().AddDate(0, 0, -retention_days), so 0 resolves to "anything older than right now" — i.e., the cleanup will delete every row in the target table on the next tick. There is no guard for this case. If you want to effectively disable cleanup on a stream, set the value to a very large number (e.g. 36500 for ~100 years); a true "disabled" mode would require a code change.

Source

  • Config: audit-service/config/config.yaml
  • Scheduler + batch delete: audit-service/services/retention_service.go
  • System endpoint: audit-service/api/routes/routes.go
