
Retention & Cleanup

Per-stream retention policy, scheduled cleanup, and the system cleanup endpoint


Audit and activity logs are pruned on a schedule. Each stream has its own retention window, and both streams are cleaned up by a single in-process scheduler that ships with the audit-service.

Defaults

From audit-service/config/config.yaml:

| Setting | Default | Bound to |
|---|---|---|
| retention.audit_retention_days | 365 | Cutoff for audit_logs |
| retention.activity_retention_days | 90 | Cutoff for activity_logs |
| retention.cleanup_interval | 24h | How often the scheduler runs |
| retention.batch_size | 5000 | Max rows deleted per round-trip |

Override these in your environment-specific config or via the standard EDUSHADE_AUDIT_RETENTION_* env vars.
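As a sketch, the retention block in config.yaml might look like this (key names and defaults taken from the table above; the exact surrounding structure of the file is not shown here):

```yaml
retention:
  audit_retention_days: 365    # cutoff for audit_logs (one year)
  activity_retention_days: 90  # cutoff for activity_logs
  cleanup_interval: 24h        # how often the scheduler runs
  batch_size: 5000             # max rows deleted per round-trip
```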

How Cleanup Runs

StartRetentionScheduler (in audit-service/services/retention_service.go) launches a goroutine when the audit-service boots:

  1. Runs CleanupOldLogs once on startup.
  2. Sets up a time.Ticker at cleanup_interval.
  3. On each tick, runs CleanupOldLogs again.
  4. Listens for context cancellation — on shutdown, the goroutine exits cleanly.

Errors are logged (log.Printf("Retention cleanup error: %v", err)) but do not stop the scheduler; the next tick will retry from scratch.

Batch Deletion

PostgreSQL doesn't support DELETE ... LIMIT, so the cleanup uses the standard subquery pattern:

```sql
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE created_at < <cutoff>
  LIMIT <batch_size>
)
```

The loop keeps running until a batch deletes fewer than batch_size rows — at which point all eligible rows are gone. This avoids long-running locks on the table when there is a large backlog.

The same pattern runs once for audit_logs and once for activity_logs per cleanup invocation.

System Cleanup Endpoint

A manual trigger exists for operators:

POST /v1/system/audit/cleanup
| Detail | Value |
|---|---|
| Auth middleware | None |
| Permission check | None |
| Tenant scope | None — global cleanup |

No auth. This endpoint is mounted on a system group with no middleware at all. Anyone who can reach the audit-service can trigger a cleanup. In production, firewall the /v1/system/* path so it is only reachable from inside the VPC / from your operations toolchain.

The handler simply calls retentionService.CleanupOldLogs(ctx) and returns:

```json
{ "success": true, "message": "Retention cleanup completed", "data": null }
```

Use this when you've shortened a retention window in config and want to reclaim space immediately instead of waiting for the next tick.

Per-Tenant Behaviour

Cleanup is global — it deletes by created_at only, with no tenant filter. There is no per-tenant retention override. If a single tenant requires a shorter window than the platform default, the only options today are:

  1. Lower the platform-wide default (affects everyone).
  2. Run a tenant-scoped DELETE manually against the database.
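For option 2, a tenant-scoped delete can reuse the same batched subquery pattern shown earlier. This assumes the tables carry a tenant_id column; verify the actual column name against the schema before running anything like it:

```sql
DELETE FROM audit_logs
WHERE id IN (
  SELECT id FROM audit_logs
  WHERE tenant_id = '<tenant>'                     -- column name assumed
    AND created_at < now() - interval '30 days'    -- tenant's shorter window
  LIMIT 5000
)
```

Run it in a loop until it reports fewer than 5000 deleted rows, for the same locking reasons the scheduler batches its deletes.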

What Gets Deleted vs Preserved

Cleanup deletes rows by primary-key id. There are no foreign keys out of audit_logs / activity_logs, so deletion never cascades into other tables.

If a row references an actor (actor_id, user_id, impersonated_by) that has since been hard-deleted from the auth database, the row stays — only the enrichment block in the API response becomes empty. Cleanup does not chase orphan rows.

Operational Tuning

| Symptom | What to change |
|---|---|
| Cleanup is hammering the DB and competing with foreground queries | Lower batch_size; the loop will run more iterations but each transaction is shorter |
| Cleanup never finishes a single tick | Increase cleanup_interval (so ticks don't pile up) and lower batch_size to keep individual deletes cheap |
| Disk growth is concerning | Lower the retention days for the offending stream first; the next cleanup will shrink the table |
| Need a one-shot purge after lowering retention | POST /v1/system/audit/cleanup from inside the VPC |

Recovery

There is no soft-delete and no archive. Once cleanup deletes rows, they are gone. If long-term retention is a regulatory requirement:

  • Set audit_retention_days and activity_retention_days to your minimum required period.
  • Export rows with the Export endpoint or by querying Postgres directly before they fall outside the window.
  • Ship those exports to long-term storage (S3, etc.).

Do not set retention days to 0. The cutoff is computed as time.Now().AddDate(0, 0, -retention_days), so 0 resolves to "anything older than right now" — i.e., the cleanup will delete every row in the target table on the next tick. There is no guard for this case. If you want to effectively disable cleanup on a stream, set the value to a very large number (e.g. 36500 for ~100 years); a true "disabled" mode would require a code change.

Source

  • Config: audit-service/config/config.yaml
  • Scheduler + batch delete: audit-service/services/retention_service.go
  • System endpoint: audit-service/api/routes/routes.go
