1. The Header
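[Translator's note: the file header is the opening part of a file. It usually records information such as the file's type, version, and encoding.]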
1.1 Signature
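[Translator's note: a file signature is a distinctive run of bytes within a file format. Each format uses a byte sequence that differs from those of other formats, and its presence generally tells us what format a file is in [3]. For example, the data of a .png image begins with the hexadecimal bytes 89 50 4E 47, that is, the byte 0x89 followed by the ASCII characters "PNG". For more information, see https://www.filesignatures.net/]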
Usually an archive starts with an identifying string that serves as a signature for the archive format and version. You can check this string to make sure your utility is being run on the right type of file.
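The check described above can be sketched as follows. This is only an illustration: the signature `DEMO`, the 16-bit little-endian version field, and the function name `read_version` are assumptions made for the example, not part of any real archive format.

```python
import io
import struct

MAGIC = b"DEMO"  # hypothetical 4-byte signature; a real format defines its own

def read_version(stream):
    """Check the signature and return the version number that follows it."""
    head = stream.read(len(MAGIC))
    if head != MAGIC:
        raise ValueError(f"not a DEMO archive: starts with {head!r}")
    # Assumed layout: one little-endian 16-bit version right after the signature.
    (version,) = struct.unpack("<H", stream.read(2))
    return version

print(read_version(io.BytesIO(b"DEMO\x02\x00")))  # 2

try:
    read_version(io.BytesIO(b"\x89PNG\r\n\x1a\n"))
except ValueError as err:
    print("rejected:", err)
```

Failing early on a wrong signature keeps the tool from misreading an unrelated file as an index.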
1.2 Index position
Most of the time, the archive index starts immediately after the signature. Sometimes, though, the index is stored at the end of the file instead, since the archive packer doesn’t know how big the index will be until afterwards (if the index is itself compressed, for instance). In that case, there will be a pointer to where the index is.
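As a rough sketch of the second case, the snippet below builds a toy archive whose index sits at the end and follows a pointer to reach it. The layout (a 4-byte signature followed by a 32-bit little-endian absolute offset) and the name `locate_index` are assumptions for illustration only.

```python
import io
import struct

def locate_index(stream, header_size=4):
    """Return the absolute offset of the index, read from a pointer in the header."""
    stream.seek(header_size)  # skip the (assumed 4-byte) signature
    (index_offset,) = struct.unpack("<I", stream.read(4))
    return index_offset

# Toy archive: signature, pointer, file data, then the index at the very end.
data = b"hello world"
index = b"INDEX"
archive = b"DEMO" + struct.pack("<I", 8 + len(data)) + data + index

stream = io.BytesIO(archive)
stream.seek(locate_index(stream))
print(stream.read())  # b'INDEX'
```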
2. The Index
The key structure you will want to understand is the index of the archive contents, since it tells you how to get at the files contained inside.
2.1 Index size
Usually the index starts with a size value, often simply the number of files it lists. This isn’t always the case, though: sometimes the index just continues until it hits a special ending entry (one with, say, a negative file size or a null filename).
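The two conventions can be contrasted in a short sketch. The entry layout here (a 32-bit offset and a signed 32-bit size) and both function names are hypothetical; a negative size is used as the ending entry.

```python
import io
import struct

ENTRY = struct.Struct("<Ii")  # assumed entry: uint32 offset, int32 size

def read_counted(stream):
    """Read an index that starts with a uint32 entry count."""
    (count,) = struct.unpack("<I", stream.read(4))
    return [ENTRY.unpack(stream.read(ENTRY.size)) for _ in range(count)]

def read_until_sentinel(stream):
    """Read an index that runs until an entry with a negative size."""
    entries = []
    while True:
        offset, size = ENTRY.unpack(stream.read(ENTRY.size))
        if size < 0:  # special ending entry, not a real file
            return entries
        entries.append((offset, size))

entries = [(0, 10), (10, 25)]
packed = b"".join(ENTRY.pack(*e) for e in entries)
counted = struct.pack("<I", len(entries)) + packed
terminated = packed + ENTRY.pack(0, -1)

print(read_counted(io.BytesIO(counted)) == entries)            # True
print(read_until_sentinel(io.BytesIO(terminated)) == entries)  # True
```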
2.2 List of file entries
Next there will be a list of the individual files contained in the archive. Its entries can be either constant-length or variable-length data structures, depending on how the filenames are handled. Sometimes there will also be a hierarchical structure of filepaths to represent a whole directory tree inside the archive. Each entry contains a number of standard bits of information:
2.2.1 Filename/filepath: can be a zero-terminated string (ending in a \0 byte), or sometimes the length will be given explicitly. Believe it or not, filenames are optional: I’ve run across at least one case of a filename hash being stored instead.
2.2.2 Position: an offset to the start of the file in the archive. Offsets can be measured from the start of the archive, the start of the index, or sometimes the start of the “file area” in the archive (i.e. from the start of the contents of the first file, or equivalently from the end of the index).
2.2.3 Size: how large the file is. This is sometimes left out, since it can often be inferred from the offset to the next file. Other times there are two different sizes: the original size of the file, and its compressed size as stored in the archive.
2.2.4 Flags: is the file compressed or not, and if so, with what algorithm? Is it encrypted, and if so, is there a key or initialization value to use?
2.2.5 Checksum: to ensure data integrity, sometimes a checksum for the file will be given. This can be highly annoying for hackers, since it means that to modify the archive we need to reverse-engineer the checksum algorithm in order to compute appropriate values for our new data (or else disable the check in the executable).
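To make the fields above concrete, here is a minimal sketch of reading one variable-length entry. Everything about the layout is an assumption for illustration: an ASCII zero-terminated name, the field order and widths in `FIXED`, the flag bit values, and CRC-32 as the checksum (real formats frequently use something custom).

```python
import io
import struct
import zlib
from dataclasses import dataclass

FLAG_COMPRESSED = 0x01  # hypothetical flag bits
FLAG_ENCRYPTED = 0x02

@dataclass
class Entry:
    name: str
    offset: int
    packed_size: int
    original_size: int
    flags: int
    checksum: int

# Assumed fixed part: offset, packed size, original size, flags, checksum.
FIXED = struct.Struct("<IIIBI")

def read_cstring(stream):
    """Read bytes up to (not including) a zero terminator; ASCII is assumed."""
    out = bytearray()
    while (b := stream.read(1)) not in (b"", b"\x00"):
        out += b
    return out.decode("ascii")

def read_entry(stream):
    """Read the name, then the fixed-size fields that follow it."""
    name = read_cstring(stream)
    return Entry(name, *FIXED.unpack(stream.read(FIXED.size)))

payload = b"some file contents"
raw = b"a.txt\x00" + FIXED.pack(0, len(payload), len(payload), 0, zlib.crc32(payload))
entry = read_entry(io.BytesIO(raw))

print(entry.name, entry.packed_size)                # a.txt 18
print(bool(entry.flags & FLAG_COMPRESSED))          # False
print(entry.checksum == zlib.crc32(payload))        # True
```

Recomputing the checksum over replacement data, as in the last line, is the step a modified archive would need once the real algorithm is known.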
Note that sometimes this information is distributed. For example, the index structure may give only the filename and the file position, while the file size and compression flags are stored at that offset, right before the file data itself.
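A distributed layout of this kind might look like the sketch below, where the index maps names to offsets and a small local header precedes each file's data. The local header layout (`LOCAL`: a 32-bit size and a one-byte flags field) and the name `read_file_at` are assumptions for the example.

```python
import io
import struct

LOCAL = struct.Struct("<IB")  # assumed local header: uint32 size, uint8 flags

def read_file_at(stream, offset):
    """Read the local header at `offset`, then the file data right behind it."""
    stream.seek(offset)
    size, flags = LOCAL.unpack(stream.read(LOCAL.size))
    return flags, stream.read(size)

# Three filler bytes, a local header, the file data, then unrelated bytes.
blob = b"\x00" * 3 + LOCAL.pack(5, 0) + b"hello" + b"tail"
index = {"greeting.txt": 3}  # name -> offset, as the index might record it

flags, data = read_file_at(io.BytesIO(blob), index["greeting.txt"])
print(flags, data)  # 0 b'hello'
```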
3. The Files
The files themselves are then just concatenated together in the archive, possibly compressed and possibly encrypted. The game engine knows where to find them from their index entries, so it can jump immediately to the ones it wants.
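Putting the pieces together, extraction can be sketched as a seek followed by a read. The example assumes offsets relative to the start of the file area, zlib as the compression scheme, and a hypothetical flag bit `FLAG_COMPRESSED`; a real archive may differ on all three points.

```python
import io
import zlib

FLAG_COMPRESSED = 0x01  # hypothetical flag bit

def extract(stream, data_start, offset, packed_size, flags):
    """Return one file's contents; `offset` is relative to `data_start`."""
    stream.seek(data_start + offset)  # jump straight to the file
    raw = stream.read(packed_size)
    if flags & FLAG_COMPRESSED:
        raw = zlib.decompress(raw)    # zlib is an assumption, not a given
    return raw

first = b"plain text"
second = zlib.compress(b"squeezed " * 10)
header_and_index = b"\x00" * 16  # stand-in for whatever precedes the file area
archive = io.BytesIO(header_and_index + first + second)

print(extract(archive, 16, 0, len(first), 0))  # b'plain text'
print(extract(archive, 16, len(first), len(second), FLAG_COMPRESSED) == b"squeezed " * 10)  # True
```

Because each entry carries its own position, the second file can be read without touching the first.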
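[Translator's note: because specialized terms are involved, the original wording is kept alongside the translation. Corrections are welcome.]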
(To be continued)