Solidity Variable Memory Layout: A Practical Walkthrough

Specer

2022-06-24

ETH, blockchain, solidity

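Background

This article uses real, deployed contract source code to take a closer look at how Solidity stores variables under the hood. The example is the source of the Gnosis Safe logic contract.

Walkthrough

The sections below go through the source one contract at a time and analyze every piece of code that contains an assembly block. Reading the code at the assembly level makes the layout of Solidity variables more direct and closer to what actually happens.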

StorageAccessible

This contract has the following method:

/**
 * @dev Reads `length` bytes of storage in the current contract
 * @param offset - the offset in the current contract's storage in words to start reading from
 * @param length - the number of words (32 bytes) of data to read
 * @return the bytes that were read.
 */
function getStorageAt(uint256 offset, uint256 length) public view returns (bytes memory) {
    bytes memory result = new bytes(length * 32);
    for (uint256 index = 0; index < length; index++) {
        // solhint-disable-next-line no-inline-assembly
        assembly {
            let word := sload(add(offset, index))
            mstore(add(add(result, 0x20), mul(index, 0x20)), word)
        }
    }
    return result;
}

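getStorageAt reads `length` consecutive words starting at slot `offset` and returns their contents.

Line numbers below count from the first line of each snippet, doc comment included.
Line 8 (`bytes memory result = new bytes(length * 32)`) declares result as the return value, sized to what the caller asked for. The for loop then copies the slots one 32-byte word at a time from storage into result, which lives in memory.
Line 12 (the sload) reads the slot at offset + index and assigns it to the temporary variable word.
Line 13 (the mstore) writes word to the right position inside result:
- add(result, 0x20) skips the first 0x20 bytes of result, because those bytes hold the length of the dynamic byte array, here length * 32.
- mul(index, 0x20) is the byte offset of the current word within the data area of result.
- Adding the two gives the memory address where word has to be written.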
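The address arithmetic above can be checked with a small Python model. This is only a sketch: memory is modeled as a bytearray, storage as a dict, and the result pointer is assumed to be 0 for simplicity (in the EVM it would come from the free memory pointer). The names mstore and get_storage_at are illustrative helpers, not part of any library.

```python
WORD = 32  # size of one EVM word in bytes


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte big-endian word at `offset`, like the EVM mstore."""
    memory[offset:offset + WORD] = value.to_bytes(WORD, "big")


def get_storage_at(storage: dict, offset: int, length: int) -> bytes:
    """Model of getStorageAt: copy `length` words starting at slot `offset`."""
    memory = bytearray(WORD + length * WORD)  # length word + data area
    result = 0                                # assumed pointer to `result`
    mstore(memory, result, length * WORD)     # new bytes(length * 32) stores the length
    for index in range(length):
        word = storage.get(offset + index, 0)  # sload: unset slots read as zero
        # add(add(result, 0x20), mul(index, 0x20))
        mstore(memory, result + 0x20 + index * 0x20, word)
    return bytes(memory)


storage = {0: 0xAA, 1: 0xBB}
raw = get_storage_at(storage, 0, 2)
length_word = int.from_bytes(raw[:WORD], "big")  # the first word holds the byte length
```

The returned image is the length word followed by the two copied slots, which is the same shape that `bytes memory result` has in EVM memory.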

/**
 * @dev Performs a delegatecall on a targetContract in the context of self.
 * Internally reverts execution to avoid side effects (making it static).
 *
 * This method reverts with data equal to `abi.encode(bool(success), bytes(response))`.
 * Specifically, the `returndata` after a call to this method will be:
 * `success:bool  response.length:uint256  response:bytes`.
 *
 * @param targetContract Address of the contract containing the code to execute.
 * @param calldataPayload Calldata that should be sent to the target contract (encoded method name and arguments).
 */
function simulateAndRevert(address targetContract, bytes memory calldataPayload) external {
    // solhint-disable-next-line no-inline-assembly
    assembly {
        let success := delegatecall(gas(), targetContract, add(calldataPayload, 0x20), mload(calldataPayload), 0, 0)

        mstore(0x00, success)
        mstore(0x20, returndatasize())
        returndatacopy(0x40, 0, returndatasize())
        revert(0, add(returndatasize(), 0x40))
    }
}

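This function is mainly called by other contracts. The signature and semantics of delegatecall and the related opcodes are well documented elsewhere. A call to this function always reverts, but the revert data carries the result of the inner call. That makes it suitable for simulating what a call would do, including extreme cases, because the revert rolls all state back to what it was before the call.

Line 15 (the delegatecall) runs the code of targetContract. calldataPayload is a dynamic bytes value whose first 0x20 bytes in memory hold its own length, so the third argument has to be add(calldataPayload, 0x20) to point at the raw calldata. The fourth argument, mload(calldataPayload), reads that leading word, which is exactly the length of the raw calldata.
Line 17 stores success at memory address 0x00. Execution ends in revert, so the address can be chosen freely and it does not matter that the writes here overwrite the scratch space and the free memory pointer at 0x40. The code simply picks 0x00.
Line 18 stores the length of the returned data in the next word, at 0x20.
Line 19 copies the returned data itself right after that, starting at 0x40.
Line 20 uses revert to hand success, the length, and the data returned by targetContract back to the caller (also a contract). At first I did not see why this has to be a revert, since the call then fails on chain. The doc comment gives the reason: reverting discards every side effect of the delegatecall, which makes the simulation effectively static. The failure is intended, and the caller is expected to catch the revert and decode the data.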
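To make the layout of the revert data concrete, here is a minimal Python sketch of the three writes and of how a caller could decode them. The helper names encode_revert_data and decode_revert_data are hypothetical. Note that the layout is tightly packed: a success word, a length word, then the raw response with no padding.

```python
WORD = 32  # size of one EVM word in bytes


def encode_revert_data(success: bool, response: bytes) -> bytes:
    """Mirror mstore(0x00, success), mstore(0x20, size), returndatacopy(0x40, ...)."""
    return (
        int(success).to_bytes(WORD, "big")     # word at 0x00
        + len(response).to_bytes(WORD, "big")  # word at 0x20
        + response                             # raw bytes from 0x40, unpadded
    )


def decode_revert_data(data: bytes):
    """Split the revert data back into (success, response)."""
    success = int.from_bytes(data[0x00:0x20], "big") != 0
    size = int.from_bytes(data[0x20:0x40], "big")
    response = data[0x40:0x40 + size]
    return success, response


data = encode_revert_data(True, b"\x12\x34")
decoded = decode_revert_data(data)
```

The total length is 0x40 plus the response size, which matches the second argument of revert in the contract.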

SignatureDecoder

/// @dev divides bytes signature into `uint8 v, bytes32 r, bytes32 s`.
/// @notice Make sure to perform a bounds check for @param pos, to avoid out of bounds access on @param signatures
/// @param pos which signature to read. A prior bounds check of this parameter should be performed, to avoid out of bounds access
/// @param signatures concatenated rsv signatures
function signatureSplit(bytes memory signatures, uint256 pos) internal pure returns ( uint8 v, bytes32 r, bytes32 s ) {
    // The signature format is a compact form of:
    //   {bytes32 r}{bytes32 s}{uint8 v}
    // Compact means, uint8 is not padded to 32 bytes.
    // solhint-disable-next-line no-inline-assembly
    assembly {
        let signaturePos := mul(0x41, pos)
        r := mload(add(signatures, add(signaturePos, 0x20)))
        s := mload(add(signatures, add(signaturePos, 0x40)))
        // Here we are loading the last 32 bytes, including 31 bytes
        // of 's'. There is no 'mload8' to do this.
        //
        // 'byte' is not working due to the Solidity parser, so lets
        // use the second best option, 'and'
        v := and(mload(add(signatures, add(signaturePos, 0x41))), 0xff)
    }
}

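The function does what its name says: it extracts the r, s, and v fields from a signature.

Line 5 (the function signature) takes a pos parameter because signatures may hold several signatures concatenated together, so an index is needed to select one of them. Callers usually wrap this in a for loop. For a single signature, pass 0.
Line 11 computes the byte offset of the signature to process. Every signature has a fixed length of 0x41 (65) bytes.
Line 12: the 0x20 in add(signaturePos, 0x20) skips the first 0x20 bytes of signatures, which hold the length of the byte array, and signaturePos moves on to the selected signature. The outer add applies that offset to the signatures pointer. mload then reads the 0x20 bytes at that address from memory (signatures is a memory variable) into r.
Line 13 works like line 12 with an extra 0x20 added to skip over r.
Line 19 follows the same pattern, but the read starts only one byte after the read for s (0x41 instead of 0x40). The loaded word therefore contains the last 31 bytes of s followed by v in its lowest byte, and the and with 0xff keeps just v.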
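The offsets used on lines 11 to 19 can be replayed in Python. The sketch below assumes the signatures pointer is 0 and builds the same memory image as `bytes memory signatures` (a length word followed by the data). mload and signature_split are illustrative helpers written for this example.

```python
WORD = 32  # size of one EVM word in bytes


def mload(memory: bytes, offset: int) -> int:
    """Read a 32-byte big-endian word at `offset`, like the EVM mload."""
    return int.from_bytes(memory[offset:offset + WORD], "big")


def signature_split(signatures: bytes, pos: int):
    """Model of signatureSplit: returns (v, r, s) for the signature at index `pos`."""
    # Memory image of the bytes value: length word first, then the data.
    memory = len(signatures).to_bytes(WORD, "big") + signatures
    signature_pos = 0x41 * pos               # each signature is 65 bytes
    r = mload(memory, signature_pos + 0x20)  # skip the length word
    s = mload(memory, signature_pos + 0x40)  # skip the length word and r
    # This word is the last 31 bytes of s plus v; keep the lowest byte.
    v = mload(memory, signature_pos + 0x41) & 0xFF
    return v, r, s


def pack(r: int, s: int, v: int) -> bytes:
    """Build one compact {r}{s}{v} signature (test data only)."""
    return r.to_bytes(32, "big") + s.to_bytes(32, "big") + bytes([v])


signatures = pack(1, 2, 27) + pack(3, 4, 28)
first = signature_split(signatures, 0)
second = signature_split(signatures, 1)
```

As in the contract, nothing here checks that pos is in range, which is why the doc comment asks the caller to do the bounds check.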

Executor

function execute( address to, uint256 value, bytes memory data, Enum.Operation operation, uint256 txGas ) internal returns (bool success) {
    if (operation == Enum.Operation.DelegateCall) {
        // solhint-disable-next-line no-inline-assembly
        assembly {
            success := delegatecall(txGas, to, add(data, 0x20), mload(data), 0, 0)
        }
    } else {
        // solhint-disable-next-line no-inline-assembly
        assembly {
            success := call(txGas, to, value, add(data, 0x20), mload(data), 0, 0)
        }
    }
}

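This code is straightforward. The main thing to note is that the bytes parameter data also consists of a 0x20-byte length word followed by the actual content, which is why both calls pass add(data, 0x20) as the input pointer and mload(data) as the input length.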

ModuleManager

/// @dev Allows a Module to execute a Safe transaction without any further confirmations and return data
/// @param to Destination address of module transaction.
/// @param value Ether value of module transaction.
/// @param data Data payload of module transaction.
/// @param operation Operation type of module transaction.
function execTransactionFromModuleReturnData( address to, uint256 value, bytes memory data, Enum.Operation operation ) public returns (bool success, bytes memory returnData) {
    success = execTransactionFromModule(to, value, data, operation);
    // solhint-disable-next-line no-inline-assembly
    assembly {
        // Load free memory location
        let ptr := mload(0x40)
        // We allocate memory for the return data by setting the free memory location to
        // current free memory location + data size + 32 bytes for data size value
        mstore(0x40, add(ptr, add(returndatasize(), 0x20)))
        // Store the size
        mstore(ptr, returndatasize())
        // Store the data
        returndatacopy(add(ptr, 0x20), 0, returndatasize())
        // Point the return data to the correct memory location
        returnData := ptr
    }
}

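Some background first: execTransactionFromModule calls the execute method shown earlier, so returndatasize and returndatacopy here refer to the return data of the call or delegatecall made inside execute.

Line 11 (`let ptr := mload(0x40)`) loads the free memory pointer, that is, the address of the first unused byte of memory. By Solidity convention this pointer is stored at 0x40, so 0x40 is the location of a pointer rather than the free memory itself. The memory layout section of the official documentation explains the convention.
Line 14 writes a new start address (one that is still unused) to 0x40, because the memory in between is about to be occupied. Moving the pointer first keeps later code from allocating over that region. This line could be improved by rounding the allocation up to a multiple of 32 bytes: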
mstore(0x40, add(ptr, and(add(add(returndatasize(), 0x20), 0x1f), not(0x1f))))
Line 16 stores the length of the return data in the first 0x20 bytes.
Line 18 copies the return data itself right after it.
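The word-aligned variant suggested for line 14 relies on the expression and(add(x, 0x1f), not(0x1f)), which rounds x up to the next multiple of 32. A short Python sketch of that arithmetic on 256-bit words follows. The function names are made up for this example, and the 0x80 start address is only an assumed value for the free memory pointer.

```python
MASK256 = (1 << 256) - 1  # EVM words are 256 bits wide


def round_up_to_word(size: int) -> int:
    """and(add(size, 0x1f), not(0x1f)): round up to a multiple of 32."""
    return ((size + 0x1F) & MASK256) & (~0x1F & MASK256)


def next_free_pointer(ptr: int, returndatasize: int) -> int:
    """New free memory pointer after storing a length word plus the data."""
    return ptr + round_up_to_word(returndatasize + 0x20)


aligned = [round_up_to_word(n) for n in (0, 1, 32, 33)]
new_ptr = next_free_pointer(0x80, 5)  # 5 bytes of return data, assumed ptr 0x80
```

With 5 bytes of return data the allocation grows from 37 bytes to 64, so the next allocation starts on a word boundary.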

/// @dev Returns array of modules.
/// @param start Start of the page.
/// @param pageSize Maximum number of modules that should be returned.
/// @return array Array of modules.
/// @return next Start of the next page.
function getModulesPaginated(address start, uint256 pageSize) external view returns (address[] memory array, address next) {
    // Init array with max page size
    array = new address[](pageSize);

    // Populate return array
    uint256 moduleCount = 0;
    address currentModule = modules[start];
    while (currentModule != address(0x0) && currentModule != SENTINEL_MODULES && moduleCount < pageSize) {
        array[moduleCount] = currentModule;
        currentModule = modules[currentModule];
        moduleCount++;
    }
    next = currentModule;
    // Set correct size of returned array
    // solhint-disable-next-line no-inline-assembly
    assembly {
        mstore(array, moduleCount)
    }
}

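This example uses a single line of assembly, but it relies on a small trick. Arrays created with the new keyword are dynamically sized, and in the memory layout of a dynamic array (covered in the previous article) the first 0x20 bytes hold the array length.

The code exploits this by overwriting that first word with the final length. Without the trick, the usual approach would be to declare a second temporary array of exactly the final length, copy the entries over one by one in a for loop, and return the temporary array.

I was not sure whether assigning to the array's length property would achieve the same thing. It does not: the length member of a memory array is read-only, and memory arrays cannot be resized once created, so the length word has to be overwritten in assembly.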
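The effect of mstore(array, moduleCount) can be seen in a small Python model of the array's memory image. This is a sketch under the assumption that the array starts at offset 0. new_address_array and read_array are hypothetical helpers that mimic how Solidity lays out and reads a dynamic memory array.

```python
WORD = 32  # size of one EVM word in bytes


def new_address_array(page_size: int) -> bytearray:
    """Memory image of new address[](pageSize): length word, then zeroed elements."""
    return bytearray(page_size.to_bytes(WORD, "big") + bytes(page_size * WORD))


def read_array(memory: bytearray) -> list:
    """Read the array the way Solidity does: trust the length word."""
    length = int.from_bytes(memory[0:WORD], "big")
    return [
        int.from_bytes(memory[WORD * (i + 1):WORD * (i + 2)], "big")
        for i in range(length)
    ]


array = new_address_array(4)                 # pageSize = 4
array[32:64] = (0xA1).to_bytes(WORD, "big")  # array[0] = first module
array[64:96] = (0xB2).to_bytes(WORD, "big")  # array[1] = second module
before = read_array(array)                   # still reports 4 elements

array[0:32] = (2).to_bytes(WORD, "big")      # mstore(array, moduleCount)
after = read_array(array)                    # now reports 2 elements
```

Only the length word changes. The two unused element slots stay allocated in memory, which is harmless here because the array is returned right away.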

GnosisSafe

/// @dev Returns the chain id used by this contract.
function getChainId() public view returns (uint256) {
    uint256 id;
    // solhint-disable-next-line no-inline-assembly
    assembly {
        id := chainid()
    }
    return id;
}

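This is the older way of reading the chain ID: when the chainid opcode was first introduced, it could only be accessed from inline assembly. Since Solidity 0.8.0 it is available directly as block.chainid (see the official documentation for the other global variables).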