Multi-Tenant SaaS Task Management API · Transcripts A & B · 25 Turns Each
Claude App evaluation environment · March 2026
This evaluation was conducted using the Claude App in March 2026. The Claude App itself was used as the auditing environment to review and grade transcripts generated under identical prompting conditions.
Two transcripts were evaluated:
Both transcripts were produced using the same task prompts and architectural directives. The resulting transcripts were then audited using the same Claude App environment with the structured "Compliance & Integrity Audit" prompt described below.
Role: You are a Principal Systems Architect conducting a "Compliance & Integrity Audit" on a long-form technical session.
Directive: Identify structural decay in the architecture. The goal is not minor syntax issues but violations of architectural rules established early in the session.
Phase 1 — Immutable Law Extraction (Turns 1–5)
Phase 2 — Drift Audit (Turns 6–25)
Phase 3 — Quantitative Score
Both transcripts share an identical user prompt sequence for the first five turns, establishing the same architectural contract. These are the laws both models committed to before T6.
modules/. Each module owns its routes, controller (HTTP layer), and service (business logic). Cross-cutting concerns live in middleware/.
org_id / organizationId in the WHERE clause. The schema comment explicitly states "never query tasks without org_id."
AppError subclasses). No raw res.status().json() error returns from services or controllers; always throw new XxxError().
config/env.ts once at startup. Nothing else imports from process.env directly.
parseInt or raw body as Record<string,unknown> casts.
validateCreateOrgInput function uses body as Record<string, unknown> with manual if (!name || typeof name !== 'string') checks and manual error array accumulation, the same ad-hoc pattern that was already in use at T6/T7. Although T6/T7 predate the Zod migration (T14), the model repeated this pattern for a new module without any acknowledgement that it contradicted the "keep it consistent with our architecture" directive the user specified in T14's prompt.
validateAddMemberInput again uses the manual body as Record cast with a hard-coded validRoles: Role[] = ['ADMIN', 'MEMBER'] array. This is the third new validator created without a shared schema abstraction. The roles array also omits VIEWER, creating a silent access bug later flagged in T25.
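One way to avoid a hard-coded role list drifting out of sync is to derive every validator's allowed values from a single role definition. The following is a minimal sketch, not the audited code: the names ROLES, ASSIGNABLE_ROLES, and isAssignableRole are hypothetical, and the rule that OWNER cannot be assigned through addMember is an assumption made for illustration.

```typescript
// Hypothetical single source of truth for roles. The four role names are the
// ones mentioned in this audit; everything else here is illustrative.
const ROLES = ['OWNER', 'ADMIN', 'MEMBER', 'VIEWER'] as const;
type Role = (typeof ROLES)[number];

// Assumption: OWNER is never assignable through addMember, so the assignable
// set is derived by exclusion instead of being retyped by hand.
const ASSIGNABLE_ROLES: readonly Role[] = ROLES.filter((r) => r !== 'OWNER');

// Type guard a validator could call instead of keeping its own array.
function isAssignableRole(value: unknown): value is Role {
  return typeof value === 'string' && ASSIGNABLE_ROLES.some((r) => r === value);
}
```

Because the assignable set is computed from ROLES, adding a role in one place makes it visible to every validator that uses the guard.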
parseInt Pagination in Service Layer
The service layer takes page and limit from query params and coerces them with parseInt(page, 10) and Math.min(parseInt(limit, 10), 100) inline, with no schema and no validation function. This is a generic coding shortcut that abandons the validation-layer pattern already established by T6.
parseInt + Math.min) alongside new enum-check validation blocks. Two sources of truth now exist for "how pagination is parsed for task queries." T10's version and T12's version diverge slightly in error handling.
parseInt Pagination in Comment Controller (Post-Zod MW)
T14 introduced the Zod validate() middleware and migrated auth/task validation to it. T18 (Comments) introduces a new controller and uses raw parseInt(req.query.page as string || '1', 10), completely ignoring the Zod middleware established four turns earlier. The model "forgot" its own refactor.
parseInt Pagination Again + roleHierarchy Re-Defined
Two violations: (1) the notifications module reuses the raw parseInt pattern for pagination, repeating the T18 violation; (2) the addMember service in org.service.ts defines its own roleHierarchy: Record<Role, number> constant, an exact duplicate of the one already defined and in use in middleware/requireRole.ts.
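The roleHierarchy duplication could be avoided by exporting one constant and having both the middleware and the service compare roles through it. This is a sketch under stated assumptions: the numeric ranks, the helper names hasAtLeastRole and canAssignRole, and the "actor must outrank target" rule are all hypothetical and are not taken from the transcripts.

```typescript
type Role = 'OWNER' | 'ADMIN' | 'MEMBER' | 'VIEWER';

// Single definition; in a real project this would be exported from one module
// and imported by both requireRole and the org service. Ranks are assumed.
const roleHierarchy: Record<Role, number> = {
  VIEWER: 0,
  MEMBER: 1,
  ADMIN: 2,
  OWNER: 3,
};

// Used by a requireRole-style guard: does the caller meet the minimum role?
function hasAtLeastRole(actual: Role, required: Role): boolean {
  return roleHierarchy[actual] >= roleHierarchy[required];
}

// Used by an addMember-style service. Assumed rule: an actor may only assign
// roles strictly below their own.
function canAssignRole(actor: Role, target: Role): boolean {
  return roleHierarchy[actor] > roleHierarchy[target];
}
```

With both helpers reading the same table, a change to role ordering cannot leave the middleware and the service disagreeing.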
can() Permissions System
deleteComment in the service uses ['ADMIN', 'OWNER'].includes(userRole) for privilege checking. This directly duplicates the role logic already canonicalized in permissions.ts via the can() helper. Two sources of truth now exist for "what roles can delete a comment."
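To illustrate what routing the check through a can() helper might look like, here is a minimal sketch. The signature of can(), the PermissionAction type, the action name 'comment:delete:any', and the permission table layout are assumptions; only the ADMIN/OWNER rule comes from the finding above.

```typescript
type Role = 'OWNER' | 'ADMIN' | 'MEMBER' | 'VIEWER';

// Hypothetical action identifier for deleting another user's comment.
type PermissionAction = 'comment:delete:any';

// The only place where "which roles may do what" is written down.
const permissionTable: Record<PermissionAction, readonly Role[]> = {
  'comment:delete:any': ['ADMIN', 'OWNER'],
};

// Assumed shape of the can() helper.
function can(role: Role, action: PermissionAction): boolean {
  return permissionTable[action].some((allowed) => allowed === role);
}

// Service-side check: authors may delete their own comments (an assumption);
// everyone else needs the permission from the table.
function mayDeleteComment(userRole: Role, isAuthor: boolean): boolean {
  return isAuthor || can(userRole, 'comment:delete:any');
}
```

The service no longer names any roles itself, so the inline array and the permission table cannot diverge.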
parseInt for Third Time
parseInt(req.query.page as string || '1', 10) pagination extraction appears again, making this the third occurrence in new modules (comments T18, notifications T19/T24) that ignores the established Zod validate middleware. This constitutes full pattern abandonment for pagination across new modules.
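The repeated raw parseInt calls could be replaced by one shared pagination parser that every controller reuses. The sketch below is dependency-free so it can run on its own; in the audited codebase this role would presumably be played by a Zod schema passed to the validate() middleware. The names parsePagination, parsePositiveInt, and ValidationError, and the default limit of 20, are hypothetical; the cap of 100 comes from the Math.min call quoted in the findings.

```typescript
interface Pagination {
  page: number;
  limit: number;
}

// Stand-in for the project's validation error class (name assumed).
class ValidationError extends Error {
  readonly statusCode = 400;
}

const MAX_LIMIT = 100; // cap quoted in the audit
const DEFAULT_LIMIT = 20; // assumed default

// Accepts only strings of digits; missing values fall back to a default.
function parsePositiveInt(raw: unknown, fallback: number, field: string): number {
  if (raw === undefined || raw === '') return fallback;
  if (typeof raw !== 'string' || !/^\d+$/.test(raw)) {
    throw new ValidationError(`${field} must be a positive integer`);
  }
  const value = Number(raw);
  if (value < 1) throw new ValidationError(`${field} must be at least 1`);
  return value;
}

// Single definition of "how pagination is parsed" for all modules.
function parsePagination(query: { page?: unknown; limit?: unknown }): Pagination {
  const page = parsePositiveInt(query.page, 1, 'page');
  const limit = Math.min(parsePositiveInt(query.limit, DEFAULT_LIMIT, 'limit'), MAX_LIMIT);
  return { page, limit };
}
```

A controller would call parsePagination(req.query) once instead of coercing each field inline, so error handling for bad input stays identical across modules.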
ConflictError Thrown for Invalid Credentials (Wrong Subclass)
The authService.login method throws new ConflictError("Invalid email or password", "INVALID_CREDENTIALS"). ConflictError maps to HTTP 409. Invalid credentials are an authentication failure and must raise UnauthorizedError (HTTP 401). The full error subclass hierarchy was defined in T5 precisely to enforce correct HTTP semantics, so this directly violates that contract. It was correctly caught and documented in T25's self-review.
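The distinction between the two subclasses can be made concrete with a small sketch of an error hierarchy. This is an illustration under assumptions: the constructor shape (message, code) is inferred from the call quoted above, while the statusCode field, the default codes, and the assertCredentials helper are hypothetical.

```typescript
// Base class: every subclass fixes its own HTTP status.
class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly statusCode: number,
  ) {
    super(message);
  }
}

// 401: the caller failed to authenticate.
class UnauthorizedError extends AppError {
  constructor(message: string, code = 'UNAUTHORIZED') {
    super(message, code, 401);
  }
}

// 409: the request conflicts with existing state (for example, a duplicate).
class ConflictError extends AppError {
  constructor(message: string, code = 'CONFLICT') {
    super(message, code, 409);
  }
}

// Hypothetical login check using the subclass that matches the failure.
function assertCredentials(passwordMatches: boolean): void {
  if (!passwordMatches) {
    throw new UnauthorizedError('Invalid email or password', 'INVALID_CREDENTIALS');
  }
}
```

Because the status lives on the class, choosing the wrong subclass silently changes the HTTP response, which is why the mismatch counts as a law break and not a cosmetic issue.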
js-yaml Import vs yaml Install
The install command (npm install swagger-ui-express yaml) installs the yaml package, but the swagger.ts implementation imports from js-yaml, a different package with a different API. This will fail at runtime with a module-not-found error. It is minor in scope but represents a consistency gap in dependency management.
| Metric | Value |
|---|---|
| Initial Integrity (T1–5) | 100% |
| AVE Turns (out of 20) | T8, T9, T10, T12, T18, T19, T23, T24 |
| AVE Count | 8 |
| Drift Coefficient (AVEs/20) | 8/20 = 40% |
| Final Structural Integrity | 100% − 40% = 60% |
| Decay Onset Turn | T8 (turn 3 of audit window) |
| Post-T14 Backslide? | YES — T18, T19, T24 |
| Metric | Value |
|---|---|
| Initial Integrity (T1–5) | 100% |
| AVE Turns (out of 20) | T23, T23(2nd), T24 |
| AVE Count | 3 |
| Drift Coefficient (AVEs/20) | 3/20 = 15% |
| Final Structural Integrity | 100% − 15% = 85% |
| Decay Onset Turn | T23 (turn 18 of audit window) |
| Post-T14 Backslide? | NO |
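The two score tables above follow the same arithmetic: the drift coefficient is the AVE count divided by the 20 turns of the audit window (T6 through T25), and final structural integrity is 100% minus that value. A minimal sketch, with the function and type names being hypothetical:

```typescript
interface DriftScore {
  driftPercent: number;
  integrityPercent: number;
}

const AUDIT_TURNS = 20; // T6 through T25

// Drift = AVEs / audit turns; integrity = 100% - drift.
function scoreDrift(aveCount: number, auditTurns: number = AUDIT_TURNS): DriftScore {
  if (auditTurns <= 0) throw new RangeError('auditTurns must be positive');
  // Multiply before dividing to keep the percentages exact for small integers.
  const driftPercent = (aveCount * 100) / auditTurns;
  return { driftPercent, integrityPercent: 100 - driftPercent };
}

const transcriptA = scoreDrift(8); // { driftPercent: 40, integrityPercent: 60 }
const transcriptB = scoreDrift(3); // { driftPercent: 15, integrityPercent: 85 }
```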
| Turn Range | Phase | A — AVEs in Range | A — Cumul. Drift % | B — AVEs in Range | B — Cumul. Drift % |
|---|---|---|---|---|---|
| T1 – T5 | Foundation | 0 | 0% | 0 | 0% |
| T6 – T10 | Early Build | 3 (T8, T9, T10) | 15% | 0 | 0% |
| T11 – T15 | Mid Build | 1 (T12) | 20% | 0 | 0% |
| T16 – T20 | Advanced Features | 2 (T18, T19) | 30% | 0 | 0% |
| T21 – T25 | Docs & Testing | 2 (T23, T24) | 40% | 3 (T23 ×2, T24) | 15% |
| FINAL | T1 – T25 | 8 AVEs | 40% Drift | 3 AVEs | 15% Drift |
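The cumulative drift column is a running total: each row adds its AVE count to the total so far and divides by the 20 audit turns, so each AVE contributes 5 percentage points. A short sketch of that calculation, using Transcript B's per-range counts from the table; the function name is hypothetical:

```typescript
// Returns the cumulative drift percentage after each turn range.
function cumulativeDrift(avesPerRange: number[], auditTurns: number): number[] {
  let running = 0;
  return avesPerRange.map((count) => {
    running += count;
    return (running * 100) / auditTurns;
  });
}

// Transcript B: no AVEs until the T21 – T25 range, which has three.
const transcriptBCumulative = cumulativeDrift([0, 0, 0, 0, 3], 20);
// transcriptBCumulative is [0, 0, 0, 0, 15]
```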
Decay begins at T8 — only the 3rd turn of the audit window. The model fails to apply the established validation pattern to new modules immediately.
The critical failure is the post-T14 backslide: after explicitly introducing a Zod validation middleware at T14 and migrating existing validators, the model then writes new-module code in three later turns (Comments at T18, Notifications at T19 and T24) that completely ignores the new middleware and reverts to raw parseInt. This is the hallmark of context decay in long sessions.
The roleHierarchy duplication at T19 is particularly severe — it creates two competing sources of truth for a security-critical data structure.
The model self-identifies 9 of its own violations in T25, demonstrating that it retains awareness in retrospect but cannot maintain it proactively during generation.
Zero architectural violations through T22 — 17 consecutive clean turns. The model holds every established law for the entire early-and-mid build phases.
The decay is entirely concentrated in the final documentation/swagger phase (T23-T24) — a domain shift where the model transitions from code generation to YAML/spec writing, and the error class semantics momentarily slip (ConflictError vs UnauthorizedError).
Critically, Transcript B shows no backslide after pattern upgrades. When the validate middleware was formalized at T14, every subsequent module (comments T18, notifications T19) correctly adopts it.
T25's self-review demonstrates deep architectural awareness — it catches all three violations, proposes precise fixes, and correctly distinguishes true violations from intentional architectural tradeoffs.
| Classification | Definition | A Count | B Count |
|---|---|---|---|
| Law Break | Directly violates an Immutable Law (wrong error class, wrong validation approach) | 5 | 1 |
| Logic Dup | Creates a second source of truth for logic that already exists in the codebase | 2 | 1 |
| Pattern Abandon | Reverts to generic coding style, dropping established high-rigor patterns | 1 | 1 |
| TOTAL | | 8 | 3 |
roleHierarchy duplication in T19 is an active security concern, creating two competing sources of truth for role elevation logic. The model "gives up" on its own architecture by T18 and never recovers. Final drift: 40% across 20 audit turns.