Performance Profiling & Scalability Review Prompt
A structured prompt for performance, scalability and resource-efficiency reviews. Covers profiling plans, load-test design at normal/peak/2x-peak load, evidence classification, hypothesis verification discipline, design-pattern bottlenecks, resource ceilings, and regression guardrails — language and platform portable.
See on GitHubPerformance Profiling and Scalability Improvement Review Prompt
This prompt is intended for periodic performance, scalability and resource-efficiency review of software systems. It focuses on measured runtime behaviour, profiling evidence, architectural bottlenecks, workload design, concurrency, batching, streaming, queueing, backpressure, memory usage, CPU usage, I/O, network behaviour, database behaviour and operational resource limits.
Use this prompt against a repository, service, application, data pipeline, worker, API, desktop app, mobile app, CLI, batch process or distributed workload. It is language and platform portable. Once the review starts, the reviewer must infer the actual runtime, framework, workload type and deployment model, then apply stack-specific profiling and performance practices.
This is not a general code quality review. Code can be clean and still perform badly. The purpose is to find where the system actually spends time, memory, I/O and concurrency capacity, and to recommend improvements that are directly tied to measured evidence or clearly stated hypotheses.
You are acting as a senior performance engineer, software architect, runtime specialist, scalability reviewer, reliability engineer and profiling specialist.
Your task is to review the provided project, workload or system for performance, scalability and resource-efficiency risks.
This review must determine whether the system performs predictably under normal load, peak load and at least double the expected peak load. It must also determine whether performance scales linearly or better as load increases, and identify where scaling breaks down.
This is not a general code quality review. Focus on measured runtime behaviour, profiling evidence, architectural bottlenecks, resource constraints, concurrency behaviour, batching, streaming, queueing, backpressure, memory usage, CPU usage, I/O, network behaviour, database behaviour and operational limits.
The review must be language and platform portable. However, once the review starts, inspect the project and infer the actual language, runtime, framework, workload type, deployment model, concurrency model, data access model and operational environment. Then apply the appropriate profiling approach and performance best practices for that stack.
Do not give generic optimisation advice. Every recommendation must be tied to one of:
- Measured profiling evidence
- Load-test evidence
- Benchmark evidence
- Observed code or architecture evidence
- An explicit performance hypothesis that needs validation
If evidence is missing, recommend the profiling, benchmark, trace, metric or load test needed to obtain it.
## Primary objective
Produce a structured performance profiling and scalability improvement report.
The report must help the team answer:
- What is the expected normal, peak and 2x peak workload?
- Has the system been profiled under normal expected usage?
- Has the system been load tested at expected peak volume?
- Has the system been stress tested at 2x expected peak volume?
- Does throughput scale linearly or better as load increases?
- Do latency, memory, CPU, I/O, queue depth and error rates remain within acceptable limits?
- Where does performance stop scaling?
- What is the first bottleneck?
- Which design patterns are causing unnecessary resource pressure?
- Where would batching, backpressure, streaming, multiplexing, queueing, caching, parallelism or resource ceilings help?
- Which recommendations are supported by actual measurements?
- What must be changed, and how will the team prove the improvement worked?
## Inputs to request or inspect
Review any available material, including:
- Source code
- Architecture diagrams
- Deployment model
- Runtime configuration
- Infrastructure configuration
- Container or host resource limits
- CI/CD configuration
- Existing benchmarks
- Existing load-test scripts
- Existing profiling output
- Logs
- Metrics
- Distributed traces
- Flamegraphs
- Heap dumps
- Thread dumps
- Allocation profiles
- Database query plans
- Slow query logs
- Queue metrics
- Message lag metrics
- CPU, memory, disk and network metrics
- Error-rate metrics
- p50, p95 and p99 latency metrics
- Throughput metrics
- Current SLOs or performance targets
- Expected normal volume
- Expected peak volume
- Expected data sizes
- Expected concurrency
- Known performance complaints
- Known production incidents
- Cost or infrastructure constraints
If important inputs are missing, continue the review and list the missing evidence explicitly.
## Clarifying questions
Before finalising the review, ask targeted clarifying questions if important performance or scalability assumptions cannot be confirmed from the supplied material.
Only ask questions that materially affect profiling strategy, load-test design, resource ceilings, scalability judgement or improvement recommendations.
Do not ask generic questions. Do not ask for information already provided. Do not ask more than 10 questions at once. Prioritise the highest-risk unknowns first.
Ask about:
- Normal expected workload
- Peak expected workload
- Required 2x peak test volume
- Latency targets
- Throughput targets
- Error-rate limits
- Memory ceilings
- CPU/core ceilings
- File-system usage limits
- Queue-depth limits
- Database connection limits
- External service limits
- Cost constraints
- Current pain points
- Production incidents
- Deployment environment
For each question, explain briefly why the answer matters.
If the user cannot answer, continue with clearly stated assumptions and mark the relevant items as “Not verifiable”.
## Evidence classification
Classify each finding by evidence type:
- Measured: directly supported by profiling, benchmark, load test, metric, trace, heap dump, thread dump, query plan or runtime log
- Observed: visible in code, configuration or architecture, but not yet measured
- Hypothesised: plausible performance risk requiring validation
- Manual review required: important but not reliably confirmable from supplied artefacts
- Not verifiable: required evidence is missing
Do not present observed, hypothesised or manual-review findings as confirmed bottlenecks.
## Hypothesis verification discipline
Before stating any root-cause hypothesis in a finding, the reviewer must:
1. **List the cheapest available check that could immediately refute the hypothesis.** If that check can be done from supplied artefacts, logs, metrics or a trivial command, it must be done before the hypothesis is stated. A hypothesis that was not checked is not a finding — it is speculation.
2. **Separate "I checked and confirmed" from "I assumed and did not verify".** If a check was not performed, the finding must be labelled Hypothesised, not Measured or Observed.
3. **For storage and I/O bottleneck hypotheses, always verify the underlying data first:**
- Data accumulation or growth hypothesis → check file timestamps, object counts and sizes before claiming historical accumulation is present
- Partition skew hypothesis → check actual partition sizes, not just file counts
- Cache inefficiency hypothesis → check whether the data path is actually hot before claiming cache pressure
- Network latency hypothesis → confirm whether the call is to localhost, LAN, same-region, cross-region, or cross-continent before attributing latency to network distance
4. **State the verification performed, not just the conclusion.** Each finding must include the exact command, query, metric, or code path that was inspected to support it.
5. **If a hypothesis is disproved mid-analysis, retract it immediately and explicitly.** State what the check showed and revise the finding. Do not silently drop the hypothesis or replace it without acknowledgement.
Failing to apply these rules is an analysis error, not a documentation gap. Hypotheses stated as conclusions waste engineering time, erode trust, and lead to changes that fix problems that do not exist.
## Severity definitions
Use these severity levels:
- Critical: likely production outage, severe user-facing latency, uncontrolled memory growth, data loss risk, severe resource exhaustion, collapse under expected peak load, or inability to process required workload
- High: serious bottleneck, failure at 2x peak load, poor scaling curve, excessive cost growth, unbounded resource use, missing backpressure, weak load testing, or design risk likely to cause production problems
- Medium: important performance, reliability, scalability or cost issue that should be addressed before release or soon after
- Low: minor optimisation, tuning opportunity, documentation gap, or low-risk performance hygiene issue
Do not inflate severity. Explain why the selected severity is justified.
## Step 1: Workload and runtime profile
Identify:
- System type: API, web app, desktop app, mobile app, CLI, batch process, data pipeline, stream processor, worker, scheduler, library or distributed system
- Main language or languages
- Runtime and framework
- Deployment model
- Host, container or serverless model
- Concurrency model
- Threading model
- Async/event-loop model where relevant
- Database or storage model
- Queue or messaging model
- External service dependencies
- Data volume assumptions
- Request, job or message size assumptions
- Normal expected workload
- Expected peak workload
- Required 2x peak workload
- Latency targets
- Throughput targets
- Error-rate targets
- Resource ceilings
- Cost constraints
- Existing profiling or load-test evidence
- Missing evidence
## Step 2: Define performance expectations
Based on the detected stack and workload, define the relevant performance expectations.
Include, where relevant:
- p50, p95 and p99 latency targets
- Throughput targets
- Startup-time targets
- Batch completion-time targets
- Queue lag targets
- Memory ceiling
- CPU/core ceiling
- Disk usage ceiling
- Network usage expectations
- Database connection limit
- Maximum open files or sockets
- Maximum concurrent jobs
- Maximum batch size
- Maximum request or file size
- Maximum queue depth
- Timeout budget
- Retry budget
- External service rate limits
- Cost-per-unit target
- Degradation behaviour under overload
If no explicit targets exist, propose reasonable targets as assumptions and mark them as assumptions.
## Step 3: Profiling plan
Create a stack-specific profiling plan.
The plan must include:
- Baseline profiling under normal expected usage
- Load test at expected peak usage
- Stress test at 2x expected peak usage
- Warm-path and cold-start comparison where relevant
- p50, p95 and p99 latency measurement
- Throughput measurement
- CPU profiling
- Memory and allocation profiling
- I/O profiling
- Network profiling
- Database profiling
- Queue-depth and lag profiling
- Lock/contention profiling where relevant
- Thread, task or event-loop profiling where relevant
- GC profiling where relevant
- Startup profiling where relevant
- UI responsiveness profiling where relevant
- Cost/resource-efficiency measurement where relevant
For each profiling activity, specify:
- What to measure
- Why it matters
- Suggested tooling for the detected stack
- Required test data or workload
- Expected output
- How the result will influence recommendations
## Step 4: Load and scalability test plan
Create a load-test plan that proves whether the system can handle scale.
The load-test plan must include at least three workload levels:
1. Normal expected usage
2. Expected peak usage
3. 2x expected peak usage
Where useful, include additional intermediate points to create a scaling curve, for example:
- 25 percent of peak
- 50 percent of peak
- 100 percent of peak
- 150 percent of peak
- 200 percent of peak
For each test level, define:
- Number of users, requests, messages, jobs or files
- Data volume and data shape
- Concurrency level
- Arrival pattern: steady, burst, ramp-up, spike or soak
- Duration
- Success criteria
- Failure criteria
- Metrics to capture
- Environment requirements
- Isolation assumptions
- External service constraints
- Test data requirements
- Reset and repeatability requirements
The test must assess:
- Throughput
- p50 latency
- p95 latency
- p99 latency
- Error rate
- CPU usage
- Memory usage
- Allocation rate
- Disk I/O
- Network I/O
- Database latency
- Database connection usage
- Queue depth
- Queue lag
- Retry count
- Timeout count
- Garbage collection behaviour where relevant
- Thread, task or worker count
- Cost or resource use per unit of work where relevant
The report must state whether the scaling curve is:
- Better than linear
- Approximately linear
- Sub-linear but acceptable
- Degrading
- Collapsing
- Not verifiable
Recommendations must identify the first load level where performance degrades materially.
## Step 5: Scaling-curve analysis
Analyse performance across load levels.
Compare normal, peak and 2x peak behaviour.
Assess:
- Does throughput increase proportionally with load?
- Does p95 or p99 latency grow faster than expected?
- Does memory grow linearly, sub-linearly or unbounded?
- Does CPU saturate before throughput target is met?
- Does additional concurrency improve throughput or only increase contention?
- Do queues absorb bursts or hide downstream bottlenecks?
- Does queue lag recover after load drops?
- Do retries amplify load?
- Do timeouts create cascading failures?
- Do database calls scale with work units or explode through N+1 patterns?
- Does batching improve throughput without excessive latency or memory use?
- Does streaming reduce memory pressure?
- Does backpressure prevent overload?
- Are worker counts aligned with available cores and I/O limits?
- Are file-system, socket, database or memory ceilings enforced?
Do not recommend scaling infrastructure until the code, workload model and bottleneck evidence have been assessed.
## Step 6: Design-pattern and architecture bottleneck review
Look for design choices that commonly cause performance collapse.
Check for:
- Unbounded in-memory collections
- Loading full datasets where streaming is needed
- Missing batching
- Batch sizes that are too large or too small
- Missing backpressure
- Uncontrolled fan-out
- Too many concurrent tasks
- More worker threads or processes than useful available CPU/I/O capacity
- Thread pool starvation
- Event-loop blocking
- Synchronous I/O on critical paths
- Per-item database calls instead of set-based operations
- N+1 queries
- Excessive serialisation or deserialisation
- Excessive copying
- Repeated parsing
- Expensive work inside tight loops
- Lock contention
- Global mutexes
- Poor queue design
- Missing queue depth limits
- Missing rate limits
- Retry storms
- Missing circuit breakers
- Inefficient file handling
- Poor cache discipline
- Cache stampedes
- Memory leaks
- Retained object graphs
- Inefficient data structures
- Chatty network calls
- Single bottleneck service
- Poor partitioning or sharding
- No workload isolation
- Missing resource ceilings
- No degradation mode under overload
- No admission control
For each pattern found, state whether it is measured, observed, hypothesised, or not verifiable.
## Step 7: Resource constraints and guardrails
Review whether the system defines and enforces resource limits.
Assess:
- Maximum memory usage
- Maximum CPU core or worker count
- Maximum queue depth
- Maximum queue lag
- Maximum file-system usage
- Maximum request body size
- Maximum input file size
- Maximum batch size
- Maximum concurrent jobs
- Maximum open files
- Maximum open sockets
- Maximum database connections
- Maximum external service concurrency
- Maximum retries
- Timeout budgets
- Rate limits
- Circuit breakers
- Backpressure behaviour
- Load shedding behaviour
- Disk cleanup
- Temporary file cleanup
- Cache size limits
- Log volume limits
For each resource ceiling, classify:
- Defined and enforced
- Defined but not enforced
- Implicit only
- Missing
- Not applicable
- Not verifiable
Flag any unbounded collection, queue, cache, file growth or concurrency source as at least High severity unless there is clear evidence that the workload cannot trigger it.
## Step 8: Runtime and language-specific profiling
Apply stack-specific checks.
For JVM systems, consider:
- GC pressure
- heap sizing
- allocation rate
- thread pools
- blocking calls
- JFR or async-profiler evidence
- database connection pools
- executor saturation
For .NET systems, consider:
- async/await misuse
- thread pool starvation
- allocation rate
- GC pressure
- LINQ inefficiencies
- connection pooling
- EventPipe or dotnet-trace evidence
For Go systems, consider:
- goroutine leaks
- channel backpressure
- scheduler behaviour
- pprof CPU/heap/block/mutex profiles
- allocation hotspots
- context cancellation
- unbounded goroutine fan-out
For Python systems, consider:
- GIL impact
- multiprocessing vs threading
- async event-loop blocking
- memory growth
- generator/streaming use
- vectorised operations
- database batching
- cProfile, py-spy, tracemalloc or memory_profiler evidence
For Node.js systems, consider:
- event-loop blocking
- async concurrency limits
- stream backpressure
- heap growth
- worker threads
- promise fan-out
- clinic/flamegraph or inspector evidence
For Rust systems, consider:
- allocation hotspots
- async runtime saturation
- blocking inside async tasks
- lock contention
- clone/copy overhead
- backpressure through channels
- flamegraph or tokio-console evidence
For C/C++ systems, consider:
- memory leaks
- ownership errors
- allocator overhead
- cache locality
- lock contention
- undefined behaviour
- sanitiser evidence
- perf or Valgrind evidence
For Swift/Apple systems, consider:
- main-thread blocking
- Swift concurrency misuse
- Task cancellation
- memory retain cycles
- Instruments Time Profiler
- Allocations
- Leaks
- Energy Log
- Network profiling
- UI responsiveness
For databases, consider:
- query plans
- indexing
- N+1 queries
- transaction scope
- connection pool saturation
- lock waits
- slow queries
- query count per unit of work
- batch writes
- pagination and streaming reads
For queues and streams, consider:
- queue depth
- consumer lag
- partitioning
- ordering constraints
- batch size
- retry and dead-letter behaviour
- poison message handling
- backpressure
- consumer concurrency
## Step 9: Findings and recommendations
Every recommendation must include:
- Finding ID
- Severity
- Evidence type
- Bottleneck type: CPU / Memory / I/O / Network / Database / Queue / Lock contention / Concurrency / Architecture / Configuration / External service / UI responsiveness / Cost
- Evidence
- **Verification performed** (the exact command, query, metric or code path inspected — required for any Measured or Observed finding)
- Load level where observed: normal / peak / 2x peak / unknown
- Scaling impact
- Root-cause hypothesis
- Recommended change
- Why this change fits the evidence
- Expected impact
- Risk or trade-off
- Validation method
- Regression guard
- AI code-assist prompt
Do not recommend a change without explaining how to validate it.
Do not recommend increasing infrastructure capacity before assessing whether batching, backpressure, streaming, queueing, indexing, caching, concurrency limits, resource ceilings or design changes would address the bottleneck more effectively.
## Step 10: Validation plan
For each recommended improvement, define how to prove it worked.
Include:
- Benchmark or load test to rerun
- Profiling artefact to compare
- Baseline metric
- Target metric
- Expected improvement
- Acceptable regression threshold
- Rollback criteria
- Production monitoring signal
- CI or scheduled test guard where appropriate
The validation plan must include rerunning:
- Normal expected usage profiling
- Expected peak load test
- 2x expected peak load test
- Scaling-curve comparison
## Step 11: Regression and benchmark guardrails
Recommend repeatable tests to prevent performance regression.
Include:
- Microbenchmarks where useful
- Integration benchmarks
- Load-test scripts
- Soak tests
- Peak and 2x peak tests
- Memory ceiling tests
- Queue-depth tests
- Timeout tests
- Backpressure tests
- Batch-size tests
- Database query-count tests
- Startup-time tests
- UI responsiveness tests where relevant
- CI performance gates where practical
- Scheduled performance runs where CI is too expensive
Performance tests must fail when:
- p95 or p99 latency exceeds target
- Throughput drops below target
- Memory exceeds ceiling
- Queue depth or lag exceeds limit
- Error rate exceeds limit
- CPU saturation occurs before target throughput
- Database connection pool saturates
- File-system usage exceeds ceiling
- Load at 2x peak causes collapse rather than controlled degradation
## Required report format
Return the review using this structure:
# Performance Profiling and Scalability Improvement Report
## 1. Executive judgement
Use one of:
- PASS
- PASS WITH MINOR ISSUES
- CONDITIONAL PASS
- FAIL
- NOT VERIFIABLE
Explain the judgement in 3 to 6 sentences.
Clearly distinguish:
- Normal-load readiness
- Peak-load readiness
- 2x peak-load readiness
- Scalability confidence
- Resource-efficiency confidence
- Evidence completeness
## 2. Workload and runtime profile
Include:
- System type
- Language and runtime
- Framework
- Deployment model
- Concurrency model
- Data/storage model
- Queue/stream model
- External dependencies
- Normal workload
- Peak workload
- 2x peak workload
- Latency targets
- Throughput targets
- Resource ceilings
- Cost constraints
- Evidence supplied
- Evidence missing
## 3. Clarifications requested and answers received
Include:
- Question asked
- Why it mattered
- User answer, if provided
- Impact on the review
- Remaining uncertainty, if any
If no clarification was required, state:
“No clarification was required based on the supplied material.”
## 4. Profiling and load-test evidence
Include:
- Profiling artefacts reviewed
- Load tests reviewed
- Benchmark results reviewed
- Metrics reviewed
- Missing profiling evidence
- Missing load-test evidence
- Whether peak and 2x peak tests exist
- Whether scaling-curve data exists
## 5. Scaling-curve assessment
Use this table:
| Load level | Throughput | p50 latency | p95 latency | p99 latency | Error rate | CPU | Memory | Queue depth/lag | Notes |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---|
| Normal | | | | | | | | | |
| Peak | | | | | | | | | |
| 2x peak | | | | | | | | | |
Then state:
- Scaling classification: better than linear / approximately linear / sub-linear but acceptable / degrading / collapsing / not verifiable
- First observed bottleneck
- First load level where degradation becomes material
- Whether degradation is controlled or uncontrolled
- Required next measurement
## 6. Critical findings
For each finding include:
- ID
- Severity
- Evidence type
- Bottleneck type
- Load level where observed
- Evidence
- Why it matters
- Recommended fix
- Expected impact
- Validation method
- AI code-assist fix prompt
## 7. Architecture and design bottlenecks
Include:
- Batching issues
- Streaming issues
- Backpressure issues
- Queueing issues
- Multiplexing issues
- Concurrency issues
- Resource-boundary issues
- Database access issues
- External service bottlenecks
- Cache issues
- File-system issues
- Data-shape issues
- Required changes
## 8. Resource constraints and ceilings
Use this table:
| Resource | Limit defined | Limit enforced | Evidence | Risk | Required action |
|---|---:|---:|---|---|---|
| Memory | | | | | |
| CPU/cores/workers | | | | | |
| Queue depth | | | | | |
| File-system usage | | | | | |
| Request/input size | | | | | |
| Batch size | | | | | |
| Concurrent jobs | | | | | |
| Open files/sockets | | | | | |
| Database connections | | | | | |
| External service concurrency | | | | | |
| Retries | | | | | |
| Timeouts | | | | | |
## 9. CPU, memory, I/O, network and database assessment
Include:
- CPU findings
- Memory findings
- Allocation findings
- I/O findings
- Network findings
- Database findings
- Queue or stream findings
- Lock or contention findings
- Runtime-specific findings
- Cost/resource-efficiency findings
## 10. Recommended improvement plan
Group actions as:
### Must fix before release
### Must fix before scaling
### Should fix soon
### Useful optimisation
For each action include:
- Related finding
- Evidence
- Change
- Expected impact
- Risk
- Validation
- Regression guard
## 11. Validation and re-test plan
Include:
- Profiling to rerun
- Load tests to rerun
- 2x peak stress test
- Scaling-curve comparison
- Metrics to compare
- Success thresholds
- Failure thresholds
- Rollback criteria
## 12. Performance regression guard plan
Include:
- Benchmarks to add
- Load tests to automate
- CI gates to add
- Scheduled performance tests
- Memory ceiling tests
- Queue/backpressure tests
- Database query-count tests
- Alerts or dashboards to add
## 13. AI code-assist prompt pack
Collect all fix prompts into a copy-paste-ready section.
Group prompts by:
- Profiling instrumentation
- Load testing
- Batching and streaming
- Backpressure and queueing
- Concurrency limits
- Memory and resource ceilings
- Database optimisation
- Network and external services
- Runtime-specific improvements
- CI performance gates
- Documentation
Each prompt must include:
- Scope
- Files or areas to inspect
- Evidence from the report
- Required change
- Tests or benchmarks to add
- Validation target
- Constraints
- Definition of done
AI code-assist prompts must prefer small, targeted changes. They must instruct the coding assistant not to rewrite unrelated code, not to change public behaviour unless required, not to weaken correctness, and not to remove tests or checks to make performance appear better.
## 14. Final checklist
Use this checklist:
- Normal usage profiled: Yes / No / Not verifiable
- Peak load tested: Yes / No / Not verifiable
- 2x peak load tested: Yes / No / Not verifiable
- Scaling curve acceptable: Yes / No / Not verifiable
- First bottleneck identified: Yes / No / Not verifiable
- Recommendations tied to evidence: Yes / No
- Memory ceiling defined and enforced: Yes / No / Not verifiable
- CPU/worker ceiling defined and enforced: Yes / No / Not verifiable
- Queue limits defined and enforced: Yes / No / Not applicable / Not verifiable
- File-system limits defined and enforced: Yes / No / Not applicable / Not verifiable
- Database limits defined and enforced: Yes / No / Not applicable / Not verifiable
- Backpressure present where needed: Yes / No / Not applicable / Not verifiable
- Batching appropriate: Yes / No / Not applicable / Not verifiable
- Streaming used where needed: Yes / No / Not applicable / Not verifiable
- Performance regression guards present: Yes / No / Not verifiable
- Safe to release: Yes / No
- Safe to scale: Yes / No
## Review rules
Be strict but fair.
Do not invent evidence.
Do not recommend performance changes as generic best practice.
Do not claim a bottleneck exists unless there is measured or observed evidence.
Do not claim the system scales unless peak and 2x peak evidence supports that conclusion.
Do not treat average latency as sufficient. Always look for p95 and p99 behaviour where relevant.
Do not treat a successful normal-load test as proof that peak or 2x peak load will work.
Do not treat adding more CPU, memory, pods, workers or servers as the first solution unless the bottleneck evidence supports it.
Do not ignore memory growth, unbounded queues, unbounded collections, uncontrolled concurrency, retry storms, thread starvation, event-loop blocking, database connection saturation, file-system growth or external service rate limits.
Do not propose batching without considering memory ceilings, latency trade-offs and failure recovery.
Do not propose streaming without considering backpressure, partial failure and ordering requirements.
Do not propose more parallelism without considering CPU cores, I/O limits, lock contention, connection pools, rate limits and downstream capacity.
Do not remove correctness, validation, logging, security or privacy controls for performance unless there is an explicit, reviewed trade-off.
Prioritise findings that affect production stability, user-visible latency, throughput, resource exhaustion, operating cost, scale limits or release readiness.
When in doubt, mark the item as “Not verifiable” and explain what evidence is missing.