Writing Audit Reports That Are Actually Useful — Darkwave Log

Most security researchers can find bugs. Far fewer can communicate them. The gap between those two skills is the gap between an audit that changes a protocol’s security posture and one that sits in a PDF, misunderstood, unimplemented, and eventually ignored.

Presenting findings with clarity and precision matters because your audit report is your primary deliverable. Identifying bugs is not enough — you must convincingly demonstrate their existence and potential impact. This guide is about the craft of that communication: how to structure a finding, write a proof of concept that survives scrutiny, classify severity without inflating or deflating risk, and produce a report whose attestation actually means something.

The Anatomy of a Finding

Every finding in a professional audit report must carry four sections. Think of them as load-bearing walls: remove any one of them and the structure collapses.

1. Description

The description is a precise, technical statement of what is wrong. It names the affected contract, the affected function or mechanism, and the root cause — not the symptom. It does not editorialize, speculate, or pad.

The description should be a clear and concise summary of the vulnerability. One common failure mode is a description that speculates about risk instead of naming the defect. “The withdraw() function transfers ETH before updating the balance” is a description. “The contract may be exploitable” is not.

The description should also state the preconditions required for the vulnerability to be reachable. Is the attacker required to be a liquidity provider? Does the attack require a specific token configuration? These constraints belong in the description, not buried in the PoC.

2. Impact

Impact answers one question: what can an attacker achieve? It must be concrete and bounded.

The impact section is where auditors provide a detailed breakdown of possible losses or damage to the protocol. This means specifying who is harmed (users, the protocol, a specific role), what asset or invariant is damaged (funds, access control, data integrity), and to what degree (total loss, partial loss, temporary denial of service). Vague language like “could lead to issues” or “may impact users” is not impact — it is noise.

A good impact statement reads:

An attacker with a non-zero staked balance can drain the entire VaultManager treasury in a single transaction, resulting in complete loss of all deposited user funds (up to the TVL at time of exploit). No special permissions are required.

A bad impact statement reads:

This vulnerability could potentially allow unauthorized access to funds under certain conditions.

The second example is useless. The developer reading it cannot prioritize the fix, the project lead cannot communicate it to investors, and the downstream reader of the published report cannot assess whether the protocol is safe to use.

3. Proof of Concept

A proof of concept is a concrete test case or exploit scenario that demonstrates how a vulnerability can be triggered under real conditions. It ensures that severe issues are both verifiable and actionable for development teams.

A rigorous methodology requires a PoC test case for every High and Critical finding. For Medium and Low findings, a PoC is strongly recommended wherever exploitability is not immediately obvious from the description.

4. Recommendation

A recommendation is not a gesture toward a solution. It is a specific, implementable engineering change. Recommendations are covered in depth later in this guide.
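To make the four-section rule concrete, here is a minimal Python sketch of a finding record that refuses to exist without its load-bearing sections. The `Finding` class, its field names, and the tier strings are hypothetical, not part of any real audit tool. The PoC check mirrors the requirement above that High and Critical findings carry a proof of concept, and preconditions get their own field next to the description rather than being buried in the PoC.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tier names; substitute whatever your published methodology defines.
TIERS = ("Critical", "High", "Medium", "Low")
POC_REQUIRED = ("Critical", "High")


@dataclass
class Finding:
    title: str
    severity: str
    description: str           # root cause, affected contract and function
    impact: str                # who is harmed, what is damaged, to what degree
    recommendation: str        # specific, implementable change
    poc: Optional[str] = None  # test code or exploit steps
    preconditions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in TIERS:
            raise ValueError(f"unknown severity tier: {self.severity!r}")
        # None of the load-bearing text sections may be empty.
        for name in ("description", "impact", "recommendation"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} section is empty")
        # A PoC is mandatory for High and Critical findings.
        if self.severity in POC_REQUIRED and not (self.poc and self.poc.strip()):
            raise ValueError(f"{self.severity} findings require a proof of concept")


ok = Finding(
    title="Reentrancy in VaultManager.withdraw() enables full treasury drain",
    severity="Critical",
    description="withdraw() sends ETH before zeroing balances[msg.sender].",
    impact="Any depositor can drain the vault in one transaction.",
    recommendation="Zero the balance before the external call.",
    poc="forge test --match-test testReentrancyDrain -vvvv",
)

try:
    Finding(title="Reentrancy in VaultManager", severity="High",
            description="There is a reentrancy vulnerability.",
            impact="Funds could be drained.", recommendation="Use a guard.")
except ValueError as err:
    print(err)  # High findings require a proof of concept
```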

Writing a Reproducible Proof of Concept

A PoC is a demonstration of the feasibility and viability of an idea. In the context of smart contract auditing, it serves to validate that the vulnerability is real. PoCs can be written using frameworks like Foundry, Hardhat, or Brownie.

PoCs not only serve as proof for external validation but also act as a self-validation tool, ensuring that as an auditor you fully understand the impact of the finding. Writing the PoC before finalizing your description and impact statement is a discipline worth adopting: if you cannot make it run, you do not yet fully understand the bug.

A reproducible PoC must satisfy three requirements:

Self-contained. The test should include all setup required to reproduce the state: contract deployments, token minting, role assignments, price oracle configuration. A reviewer should be able to run forge test --match-test testExploit -vvvv against the exact commit hash referenced in the report and see the test pass without any undocumented preconditions.

Commented. It is critical to provide sufficient comments throughout each PoC, both for the auditor’s own understanding and that of others. Each block of the test should explain why that step is being taken, not just what it does. This is especially important for multi-step economic exploits where the sequence of calls is non-obvious.

Targeted. The PoC should demonstrate the minimum necessary exploit path to prove the claim. It should not include unrelated setup, dead code from earlier iterations, or exploratory branches. A lean PoC is a credible PoC.

PoC Template (Foundry)

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import "forge-std/Test.sol";
import "../src/VaultManager.sol"; // assumed: deposit() payable, withdraw() pays out the caller's full balance

/// @notice Minimal exploit contract: deposits a small stake, then re-enters withdraw() from receive()
contract AttackContract {
    VaultManager public immutable vault;
    uint256 public stake;

    constructor(VaultManager _vault) {
        vault = _vault;
    }

    function attack() external {
        // Deposit the contract's whole (minimal) balance so withdraw() has something to pay out
        stake = address(this).balance;
        vault.deposit{value: stake}();
        vault.withdraw();
    }

    receive() external payable {
        // Keep re-entering while the vault can still cover another payout of `stake`
        if (address(vault).balance >= stake) {
            vault.withdraw();
        }
    }
}

/// @notice Demonstrates reentrancy drain of VaultManager.withdraw()
/// @dev Run: forge test --match-test testReentrancyDrain -vvvv
contract ReentrancyDrainPoC is Test {
    VaultManager vault;
    AttackContract attacker;

    function setUp() public {
        vault = new VaultManager();

        // Simulate existing users depositing 100 ETH
        address victim = makeAddr("victim");
        vm.deal(victim, 100 ether);
        vm.prank(victim);
        vault.deposit{value: 100 ether}();
    }

    function testReentrancyDrain() public {
        // Attacker deploys exploit contract with a minimal 1 ETH stake
        attacker = new AttackContract(vault);
        vm.deal(address(attacker), 1 ether);

        uint256 vaultBefore = address(vault).balance;
        uint256 attackerBefore = address(attacker).balance;

        // Trigger the exploit: withdraw() sends ETH before updating state
        attacker.attack();

        uint256 vaultAfter = address(vault).balance;
        uint256 attackerAfter = address(attacker).balance;

        // Vault should be drained; attacker should hold the vault's original balance plus its stake
        assertEq(vaultAfter, 0, "Vault was not fully drained");
        assertGt(attackerAfter, attackerBefore, "Attacker did not profit");

        console.log("Vault drained:", vaultBefore - vaultAfter);
        console.log("Attacker profit:", attackerAfter - attackerBefore);
    }
}

Foundry’s native Solidity testing environment is particularly effective for testing complex DeFi protocols where understanding financial invariants is crucial. Always pin the test to a specific block if forking mainnet state (for example, forge test --fork-url <RPC_URL> --fork-block-number <BLOCK>), and document which fork URL and block number the test expects.

Severity Classification

Smart contract vulnerabilities are ranked by severity to measure risk and prioritize remediation. During audits, vulnerabilities are typically categorized as Critical, High, Medium, or Low, giving developers, auditors, and protocol teams a clear framework for prioritization. Each severity level reflects a balance between impact and exploitability.

A common classification system rates each issue on two axes: impact (sometimes called severity) and difficulty. Impact measures how bad the outcome is if exploited; difficulty (or, inversely, likelihood) measures how hard it is for an attacker to reach the vulnerable state. Both axes must be considered. A bug that requires no special permissions and results in total fund loss is Critical. A bug that requires an owner private key compromise and results in a temporary 1% fee miscalculation is Low, or possibly an observation.
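One way to make the two axes explicit is a lookup table from (impact, likelihood) to tier. The three-by-three grid below is a hypothetical rubric, not an industry standard; it is calibrated only so that the two examples above land where the text puts them. The enum names and the example descriptions in the comments are assumptions for illustration.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1     # e.g. temporary, minor miscalculation
    MEDIUM = 2  # e.g. bounded loss, or an invariant broken without direct loss
    HIGH = 3    # e.g. total or near-total loss of funds


class Likelihood(IntEnum):
    LOW = 1     # e.g. requires a privileged key compromise
    MEDIUM = 2  # e.g. requires a specific pool state or a constrained role
    HIGH = 3    # e.g. reachable by any external caller


# Hypothetical rubric: outer key is impact, inner key is likelihood.
RUBRIC = {
    Impact.HIGH: {Likelihood.HIGH: "Critical", Likelihood.MEDIUM: "High", Likelihood.LOW: "Medium"},
    Impact.MEDIUM: {Likelihood.HIGH: "High", Likelihood.MEDIUM: "Medium", Likelihood.LOW: "Low"},
    Impact.LOW: {Likelihood.HIGH: "Medium", Likelihood.MEDIUM: "Low", Likelihood.LOW: "Low"},
}


def classify(impact: Impact, likelihood: Likelihood) -> str:
    """Return the severity tier for a finding under the rubric above."""
    return RUBRIC[impact][likelihood]


# No special permissions, total fund loss:
print(classify(Impact.HIGH, Likelihood.HIGH))  # Critical
# Owner key compromise, temporary 1% fee miscalculation:
print(classify(Impact.LOW, Likelihood.LOW))    # Low
```

Publishing a table like this alongside the report lets the client check any severity against the rubric instead of against the auditor's intuition.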

The Four Tiers

Critical — Direct, unconstrained path to total or near-total fund loss, permanent denial of service, or complete privilege escalation. No external preconditions beyond public accessibility. Examples: unrestricted initialize() on a proxy, reentrancy in a withdrawal path, missing access control on selfdestruct.

A real-world example: a protocol lost $120 million because an initialize() function wasn’t protected and an attacker called it to make themselves the owner. This is the kind of issue the Critical label exists for, and the label must carry matching urgency.

High — Significant fund loss or protocol compromise possible, but requires some precondition: a specific pool state, a particular sequence of transactions, or a constrained attacker role. Still exploitable in practice; still fix-before-deployment.

Medium — Limited or conditional impact. Users may lose small amounts under specific circumstances, or important invariants can be broken without direct fund loss (e.g., accounting errors that accumulate over time, front-running that results in slippage beyond tolerance).

Low — Minor issues that do not directly threaten funds or correctness but represent deviations from best practice, edge case handling deficiencies, or denial-of-service vectors with high attack cost and low attacker gain.

In addition to severity ratings, audits should include informational issues: cases that will not leave the contract vulnerable to attack, but where resolving them could result in better overall performance or readability.

Calibration Pitfalls

The most common calibration failure is severity inflation — marking findings High or Critical to appear more thorough. This destroys the signal value of the severity tier. If everything is Critical, nothing is.

Because smart contract systems directly hold or manage on-chain value, even a small logic mistake can lead to catastrophic loss. Classification helps teams focus resources — identifying which issues can break the protocol, which merely reduce efficiency, and which are worth addressing for best practice.

Neither academic researchers nor the audit industry have converged on a universally acknowledged classification or assessment methodology. The practical consequence is that your firm’s methodology must be defined and published so that the client understands the rubric before reading the findings. Never assume the reader shares your severity intuition.

Writing the Executive Summary

The executive summary is the only section of your audit report that a founder, investor, legal counsel, or regulator will read before deciding whether to read further. It must be written for that audience — not for the developer who wrote the code.

The full audit report documents every aspect of the assessment: findings, their severity, exploit scenarios, and potential remediations. The executive summary distills it into an overview of the vulnerabilities found, so stakeholders can grasp the key issues without reading every finding.

A good executive summary answers five questions in plain language:

What was reviewed? Scope, commit hash, review period, and what was explicitly not in scope.
What was found? Finding counts by severity tier, and a one-sentence characterization of the overall security posture.
What is the worst case? If the Critical or High findings were exploited before remediation, what would happen?
What has been fixed? Status of findings at time of publication (Open / Acknowledged / Resolved).
What should happen next? Concrete next steps: re-audit the remediations, deploy to testnet, enable a bug bounty.
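The counts behind the “What was found?” and “What has been fixed?” answers are easy to get wrong by hand when findings change status during remediation. A minimal sketch that tallies findings by tier and status, assuming each finding is a hypothetical (id, severity, status) tuple; the finding IDs in the example are invented.

```python
from collections import Counter

TIERS = ("Critical", "High", "Medium", "Low")
STATUSES = ("Open", "Acknowledged", "Resolved")


def tally(findings):
    """Count findings per (tier, status); every combination appears, zero-filled."""
    counts = Counter((severity, status) for _id, severity, status in findings)
    unknown = set(counts) - {(t, s) for t in TIERS for s in STATUSES}
    if unknown:
        raise ValueError(f"unrecognised tier/status pairs: {sorted(unknown)}")
    return {t: {s: counts[(t, s)] for s in STATUSES} for t in TIERS}


def summary_lines(findings):
    """Render one plain-language line per tier for the executive summary."""
    table = tally(findings)
    lines = []
    for tier in TIERS:
        row = table[tier]
        total = sum(row.values())
        lines.append(
            f"{tier}: {total} found, {row['Resolved']} resolved, "
            f"{row['Acknowledged']} acknowledged, {row['Open']} open"
        )
    return lines


findings = [
    ("C-01", "Critical", "Resolved"),
    ("H-01", "High", "Resolved"),
    ("H-02", "High", "Acknowledged"),
    ("M-01", "Medium", "Open"),
]
for line in summary_lines(findings):
    print(line)
```

Zero-filling every tier matters: “0 Critical” is information a non-technical reader needs, and silently omitting the row invites the question of whether Critical findings were even assessed.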

The executive summary should never contain Solidity code, stack traces, or function signatures. It should never use the word “potentially” as a hedge against every claim — if you are not confident a finding is real, it should not be in the report.

A non-technical reader must be able to answer “is it safe to launch?” from the executive summary alone. If they cannot, rewrite it.

Findings vs. Observations

These two terms are not interchangeable, and conflating them degrades the usefulness of your report.

While findings typically indicate specific violations of criteria or requirements, observations often highlight areas for improvement or potential future concerns. These observations offer valuable insights that can help organizations enhance their operations before minor issues transform into significant problems.

In smart contract auditing, the distinction maps as follows:

Finding — A demonstrable security vulnerability or correctness bug. It has a root cause, a reachable exploit path (even a theoretical one at Low severity), a concrete impact, and a specific fix. It must be addressed before the code is considered safe for the corresponding severity tier.

Observation — A code quality concern, deviation from best practice, architectural decision with non-obvious tradeoffs, or potential future-risk area that does not constitute a current exploitable vulnerability. An observation does not carry a severity tier — it carries a recommendation. Examples: inconsistent use of SafeERC20, missing NatSpec on critical functions, reliance on a single oracle without a fallback, Solidity version below the latest stable release.

Positive observations record good practices along with their supporting evidence, while negative observations point to potential nonconformities: weaknesses likely to materialize if nothing is done to prevent them.

Collapsing observations into the findings list artificially inflates finding counts and dilutes the urgency of real bugs. Maintain them as a distinct section. Clients will thank you for the clarity.

Writing Specific and Actionable Recommendations

Recommendations must be practical, actionable, and aligned with security best practices. This is the standard. The failure mode is writing recommendations that satisfy the form of the standard without the substance.

Bad Recommendations (and Why They Fail)

Weak: “Add access control to sensitive functions.”

This tells the developer nothing they did not already know. Which functions? What access control model? Ownable? AccessControl? A timelock? A multisig? The recommendation provides no path forward.

Weak: “Consider using a reentrancy guard.”

“Consider” is the weasel word of audit recommendations. Either the reentrancy guard is required, or it isn’t. If it is required, say so. If you genuinely cannot determine whether it is required without more information, say that, and explain what information is needed.

Weak: “Follow the Checks-Effects-Interactions pattern.”

This is a textbook citation, not a recommendation. The developer needs to know where in which function the pattern is violated and exactly how to reorder the statements.
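Hedge words are easy to catch mechanically before human review. A rough Python sketch that flags “consider” and “potentially” (the two words this guide calls out) plus the vague phrases from the bad examples earlier; the word list is an assumption, so tune it to your own style guide. A hit is a prompt to rewrite, not proof that the sentence is wrong.

```python
import re

# "consider" and "potentially" come from this guide; the phrases come from its bad examples.
HEDGES = (
    "consider",
    "potentially",
    "may be exploitable",
    "could lead to issues",
    "under certain conditions",
)

_PATTERNS = [(h, re.compile(r"\b" + re.escape(h) + r"\b", re.IGNORECASE)) for h in HEDGES]


def find_hedges(text):
    """Return (hedge, sentence) pairs for every sentence containing a listed hedge."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        for hedge, pattern in _PATTERNS:
            if pattern.search(sentence):
                hits.append((hedge, sentence))
    return hits


rec = "Consider using a reentrancy guard. Apply nonReentrant to withdraw()."
for hedge, sentence in find_hedges(rec):
    print(f"{hedge!r}: {sentence}")
```

The word-boundary anchors keep “considered” or “consideration” from matching, which cuts down on noise in descriptive prose.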

Good Recommendations (and Why They Work)

Strong: “In VaultManager.withdraw() (line 142), move the balances[msg.sender] = 0 state update to before the (bool success,) = msg.sender.call{value: amount}("") external call. This eliminates the reentrancy vector by ensuring the balance is zeroed before control is passed to the recipient. Alternatively, apply OpenZeppelin’s ReentrancyGuard modifier to withdraw() as a defense-in-depth measure.”

This recommendation names the file, the function, the line, the exact change, the rationale, and an alternative. A developer can implement it without asking a single clarifying question.

Strong: “Replace the single Chainlink price feed in OracleManager.getPrice() with a multi-source price check that uses at least two independent sources (e.g., the Chainlink feed plus a Uniswap V3 TWAP). Add a circuit breaker that reverts if the two sources diverge by more than a configurable threshold (suggested: 2%). This mitigates the flash-loan oracle manipulation vector described in this finding.”

This recommendation specifies the mechanism, names the specific integration points, quantifies the threshold, and ties back to the finding it resolves.
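The circuit-breaker arithmetic in the second recommendation is worth pinning down, because “diverge by more than 2%” does not say which source is the reference. A Python sketch of the check, assuming integer prices in the same fixed-point units (as on-chain feeds report them) and measuring deviation relative to the secondary source. The function and exception names are hypothetical, and an on-chain version would revert where this raises.

```python
BPS_DENOMINATOR = 10_000  # 1 basis point = 0.01%


class PriceDivergenceError(Exception):
    """Raised where an on-chain implementation would revert."""


def checked_price(primary: int, secondary: int, max_deviation_bps: int = 200) -> int:
    """Return `primary` if it is within `max_deviation_bps` of `secondary`, else raise.

    Both prices are positive integers in the same fixed-point units. The default
    of 200 bps corresponds to the suggested 2% threshold.
    """
    if primary <= 0 or secondary <= 0:
        raise ValueError("prices must be positive")
    deviation = abs(primary - secondary)
    # deviation / secondary > max_bps / 10_000, cross-multiplied to stay in integers
    if deviation * BPS_DENOMINATOR > max_deviation_bps * secondary:
        raise PriceDivergenceError(
            f"sources diverge by more than {max_deviation_bps} bps"
        )
    return primary


# Prices with 8 decimals: 2000.00 vs 1970.00 is about a 1.5% gap, within tolerance.
print(checked_price(2000_00000000, 1970_00000000))
```

Cross-multiplying avoids the truncation that a divide-first comparison would introduce with integer arithmetic, which matters once the same logic is ported to Solidity.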

One recommendation per finding. If a finding has multiple viable fixes, present them as ranked options with tradeoffs described. Do not bundle multiple unrelated fixes into a single recommendation — if one is implemented and the other is not, the finding’s status becomes ambiguous at the re-audit stage.

Handling Disputed Findings

Disputes are a normal and healthy part of the audit process. A developer who disagrees with a finding is not an obstacle — they are a source of information about design intent that may not have been documented.

The project team will typically review the report and respond with counterpoints on findings and suggested fixes. Build time for this feedback cycle into every engagement. The initial draft is not the final report.

When a finding is disputed, the auditor has four possible resolutions:

Confirm and maintain. The developer’s counterargument either does not address the technical root cause or changes the preconditions only in a way that still leaves the vulnerability reachable. Document the counterargument and your rebuttal in the finding’s discussion section. The finding remains at its original severity.

Downgrade. The developer’s counterargument is correct and demonstrates that the preconditions are harder to satisfy than originally assessed, or that the impact is smaller than claimed. Revise the severity accordingly and document the reasoning for the change.

Reclassify as observation. The “vulnerability” is actually a design decision that the team is aware of and has accepted, and the accepted risk is documented. Move the item from the findings list to the observations section with a note that the team has acknowledged the tradeoff.

Withdraw. The finding is factually incorrect. The auditor misread the code, the PoC contains a logic error, or a dependency behavior was misunderstood. Withdraw the finding with an explanation. A withdrawn finding is not a failure — an incorrect finding that survives into the published report is.

The one thing an auditor must never do is silently downgrade or remove a finding because the client is unhappy with it. That path leads to reports that mean nothing, and to protocols that get drained.
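One way to make the “never silently downgrade” rule enforceable is to route every severity change through a record that demands a written rationale. The sketch below is hypothetical (the class, method, and field names are invented for illustration) and encodes the four resolutions above; withdrawn and reclassified items stay in the record, marked as such, rather than disappearing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Resolution(Enum):
    CONFIRM = "confirm and maintain"
    DOWNGRADE = "downgrade"
    RECLASSIFY = "reclassify as observation"
    WITHDRAW = "withdraw"


@dataclass
class DisputedFinding:
    finding_id: str
    severity: str
    # Each entry: (resolution, severity before, severity after, rationale)
    history: List[Tuple[Resolution, str, str, str]] = field(default_factory=list)

    def resolve(self, resolution: Resolution, rationale: str,
                new_severity: Optional[str] = None) -> None:
        """Apply a resolution; refuse any change that lacks a documented reason."""
        if not rationale.strip():
            raise ValueError("every resolution must document its reasoning")
        if resolution is Resolution.DOWNGRADE:
            if new_severity is None or new_severity == self.severity:
                raise ValueError("a downgrade must name a different severity")
        elif new_severity is not None:
            raise ValueError(f"{resolution.value} does not change severity")
        before = self.severity
        if resolution is Resolution.DOWNGRADE:
            self.severity = new_severity
        elif resolution is Resolution.RECLASSIFY:
            self.severity = "Observation"
        elif resolution is Resolution.WITHDRAW:
            self.severity = "Withdrawn"
        self.history.append((resolution, before, self.severity, rationale))


f = DisputedFinding("H-02", "High")
f.resolve(Resolution.DOWNGRADE,
          "Preconditions are harder to reach than first assessed.", "Medium")
print(f.severity)          # Medium
print(f.history[-1][1:3])  # ('High', 'Medium')
```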

Example Finding Write-Ups: Good vs. Bad

Example 1: Reentrancy — Bad Write-Up

[HIGH] Reentrancy in VaultManager

There is a reentrancy vulnerability in the VaultManager contract. An attacker could potentially exploit this to drain funds. The contract should use a reentrancy guard. See also: SWC-107.

What is wrong here:

No function name, no file, no line number.
“Potentially” hedges the claim without providing enough context to understand when it is or isn’t exploitable.
No proof of concept — unverified.
Recommendation cites a guard without saying where to apply it.
Severity “High” cannot be confirmed without a PoC.

Example 1: Reentrancy — Good Write-Up

[CRITICAL-01] Reentrancy in VaultManager.withdraw() enables full treasury drain

Description

VaultManager.withdraw() (src/VaultManager.sol, line 142) sends ETH to msg.sender via a low-level call before setting balances[msg.sender] to zero. This violates the Checks-Effects-Interactions pattern (CEI). A malicious contract can re-enter withdraw() in its receive() function before the balance is cleared, repeatedly withdrawing its full balance until the contract holds no ETH.

// Current (vulnerable) implementation
function withdraw() external {
    uint256 amount = balances[msg.sender];
    require(amount > 0, "Insufficient balance");
    (bool success,) = msg.sender.call{value: amount}(""); // ← external call before state change
    require(success, "Transfer failed");
    balances[msg.sender] = 0; // ← state change after external call
}

Impact

Any user with a non-zero balance can deploy an attack contract and drain the entire ETH balance of VaultManager in a single transaction. There are no admin guards, pausing mechanisms, or withdrawal limits that prevent this. All deposited user funds (up to the TVL at time of exploit) are at risk of total, unrecoverable loss.

Proof of Concept

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import "forge-std/Test.sol";
import "../src/VaultManager.sol";

// forge test --match-test testReentrancyDrain -vvvv
contract AttackPoC is Test {
    VaultManager vault;

    function setUp() public {
        vault = new VaultManager();
        // Seed vault with victim funds
        address victim = makeAddr("victim");
        vm.deal(victim, 10 ether);
        vm.prank(victim);
        vault.deposit{value: 10 ether}();
        // Give attacker (this test contract) a minimal balance so it can enter withdraw()
        vm.deal(address(this), 1 ether);
        vault.deposit{value: 1 ether}();
    }

    receive() external payable {
        // Re-enter on each payout while the vault can still cover another 1 ETH withdrawal
        if (address(vault).balance >= 1 ether) {
            vault.withdraw();
        }
    }

    function testReentrancyDrain() public {
        vault.withdraw(); // Triggers the exploit
        assertEq(address(vault).balance, 0, "Vault not drained");
        assertGt(address(this).balance, 10 ether, "No profit");
    }
}

Recommendation

Apply the fix in order of preference:

(Preferred) Reorder withdraw() to follow CEI: zero balances[msg.sender] before the external call.
(Defense in depth) Additionally apply nonReentrant from OpenZeppelin’s ReentrancyGuard to withdraw().

// Fixed implementation (nonReentrant requires VaultManager to inherit OpenZeppelin's ReentrancyGuard)
function withdraw() external nonReentrant {
    uint256 amount = balances[msg.sender];
    require(amount > 0, "Insufficient balance");
    balances[msg.sender] = 0; // ← state change before external call
    (bool success,) = msg.sender.call{value: amount}("");
    require(success, "Transfer failed");
}

Example 2: Oracle Manipulation — Bad Observation Masquerading as a Finding

[MEDIUM] Oracle could be manipulated

The protocol uses a price oracle. Price oracles can be manipulated. This is a known attack vector in DeFi. Consider using a TWAP.

This is not a finding. It is a category of risk dressed up as a specific claim. There is no evidence that the oracle integration is actually vulnerable — only that oracle manipulation exists as a concept. If the protocol uses a Chainlink price feed correctly with staleness checks, this is not a medium-severity issue. It may not be an issue at all.
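Before filing an oracle finding, check whether the integration already validates the feed. Chainlink’s AggregatorV3Interface.latestRoundData() returns (roundId, answer, startedAt, updatedAt, answeredInRound); the Python sketch below models the kind of sanity and staleness check an auditor would look for in the consuming contract. The function name and the heartbeat value are assumptions for illustration; the correct heartbeat depends on the specific feed.

```python
from typing import NamedTuple


class RoundData(NamedTuple):
    """Mirrors the fields returned by Chainlink's latestRoundData()."""
    round_id: int
    answer: int
    started_at: int
    updated_at: int
    answered_in_round: int


def validated_answer(data: RoundData, now: int, heartbeat: int) -> int:
    """Return the price only if it is positive and was updated within `heartbeat` seconds."""
    if data.answer <= 0:
        raise ValueError("non-positive price")
    if data.updated_at == 0:
        raise ValueError("round not complete")
    if now - data.updated_at > heartbeat:
        raise ValueError("stale price")
    return data.answer


fresh = RoundData(round_id=1, answer=2000_00000000, started_at=1_700_000_000,
                  updated_at=1_700_000_000, answered_in_round=1)
print(validated_answer(fresh, now=1_700_000_600, heartbeat=3600))
```

If the contract under review already performs equivalent checks, the burden is on the auditor to show a concrete path around them before the item becomes a finding.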

How Report Quality Affects the Attestation

Audits from reputable security companies are extremely beneficial in building trust in a community of users as well as reassuring investors that the project prioritizes good security. But that trust is not automatic — it is a function of the quality of the report itself.

An audit attestation is a claim that a qualified, independent party reviewed the code and reported what they found. An attestation compares data and evidence against a control or process and determines whether they are aligned, appropriate, and true. The report is the evidence. If the report is vague, its findings unverifiable, and its recommendations useless, the attestation is empty. It tells downstream readers nothing about whether the code is safe.

Projects that publish only a “certificate” without detailed findings give users no real information about the security posture of the protocol. The same is true of reports with detailed findings that are technically unfalsifiable — findings without PoCs, recommendations without specifics, severities without consistent methodology.

A well-documented portfolio, including high-quality reports, plays a crucial role when seeking clients. Clients look not just at rankings but also at past reports, which demonstrate the capability to identify and communicate vulnerabilities clearly.

There is a second-order effect worth naming: the report is a permanent artifact. Once any identified vulnerabilities are rectified, the client may choose to make the audit report public in order to demonstrate the project’s commitment to security. A public report with inflated severities, withdrawn findings that were never marked as such, or recommendations that were implemented incorrectly will be read by researchers, attackers, and journalists who understand the code. The report’s quality is the auditor’s public record. A bad report is not just a disservice to the client — it is reputational exposure for the firm.

An audit is expected to conclude with an expression of opinion, and the findings are the basis for that opinion. The quality of the audit outcome therefore depends heavily on the quality of its findings.

The bar for a useful attestation is high. The report must be specific enough that a developer can act on every finding without asking a follow-up question. It must be honest enough that a severity classification of Critical actually means the protocol should not go live. It must be transparent enough that when the client disputes a finding, the auditor’s reasoning is documented and reviewable.

Anything less is a rubber stamp. The ecosystem has enough of those.

Checklist: Before You Submit the Report

Use this checklist on every finding before the report leaves your hands.

If any box is unchecked, the report is not ready.
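Parts of the pre-submission review can be automated. A rough Python sketch, assuming each finding is a hypothetical dict with `description` and `recommendation` strings and that the report records the audited commit as a full lowercase hash; the regular expressions are heuristics drawn from the requirements in this guide (pinned commit, named file, line number, named function), not a substitute for reading the finding.

```python
import re
from typing import Dict, List

COMMIT_RE = re.compile(r"[0-9a-f]{40}")
FILE_RE = re.compile(r"\b[\w./-]+\.sol\b")
LINE_RE = re.compile(r"\bline \d+\b", re.IGNORECASE)
FUNC_RE = re.compile(r"\b\w+\(\)")


def precheck(finding: Dict[str, str], commit: str) -> List[str]:
    """Return a list of problems; an empty list means the mechanical checks pass."""
    problems = []
    if not COMMIT_RE.fullmatch(commit):
        problems.append("report does not pin a full 40-character commit hash")
    if not FILE_RE.search(finding["description"]):
        problems.append("description does not name the affected file")
    if not LINE_RE.search(finding["description"]):
        problems.append("description does not cite a line number")
    if not FUNC_RE.search(finding["recommendation"]):
        problems.append("recommendation does not name a function")
    return problems


good = {
    "description": "VaultManager.withdraw() (src/VaultManager.sol, line 142) sends ETH before zeroing the balance.",
    "recommendation": "Reorder withdraw() to follow CEI.",
}
bad = {
    "description": "There is a reentrancy vulnerability in the VaultManager contract.",
    "recommendation": "The contract should use a reentrancy guard.",
}
for name, finding in (("good", good), ("bad", bad)):
    print(name, precheck(finding, "0123456789abcdef0123456789abcdef01234567"))
```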
