DAT file format

Introduction
This article describes the architecture of the DAT1 and DAT2 archive formats used in Fallout 1 (DAT1) and Fallout 2 (DAT2).

DAT1 vs DAT2
The Fallout games used two different DAT file formats. Fallout 1 and Fallout 2 each used their own format, but both share the same file extension: *.dat. To avoid misunderstandings, this document refers to DAT1 (the Fallout 1 DAT format) and DAT2 (the Fallout 2 version). Note that DAT2 is not an improved version of DAT1 but a complete rewrite that has little in common with it.

File architecture
uint32  4   Directory Count
uint32  4   (unknown)
uint32  4   (unknown - always 0)
uint32  4   (unknown)

// Directory name Block - for each directory
byte    1   Directory name length
char    *   Directory name

// Directory content - for each directory
uint32  4   Number of files in the directory
uint32  4   (unknown)
uint32  4   (unknown)
uint32  4   (unknown)
// File list block - for each file in directory
byte    1   Filename length
char    *   Filename
uint32  4   Attribute (0x20 / 32 == plain, 0x40 / 64 == compressed)
uint32  4   Offset from the beginning of the DAT file, indicating the start of file data.
uint32  4   Uncompressed file size
uint32  4   Compressed file size (0 if equal to uncompressed)

// Data block
byte    *   File data for all files
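The layout above can be walked with a short reader. The following Python sketch is only an illustration: the function name `read_dat1_index`, the `Dat1Entry` tuple and the sample bytes are hypothetical, and the big-endian default for `byte_order` is an assumption, since the table does not state the byte order.

```python
import io
import struct
from collections import namedtuple

# Hypothetical record type; the field names follow the table above.
Dat1Entry = namedtuple(
    "Dat1Entry", "directory filename attribute offset size packed_size"
)


def read_dat1_index(stream, byte_order=">"):
    """Read the header, directory names and file lists from a DAT1 stream.

    byte_order is a struct prefix; big-endian (">") is an assumption.
    """
    def u32():
        return struct.unpack(byte_order + "I", stream.read(4))[0]

    def name():
        length = stream.read(1)[0]          # 1-byte length prefix
        return stream.read(length).decode("ascii")

    dir_count = u32()
    for _ in range(3):                      # three unknown header fields
        u32()

    # Directory name block: one length-prefixed name per directory.
    directories = [name() for _ in range(dir_count)]

    entries = []
    for directory in directories:
        file_count = u32()
        for _ in range(3):                  # three unknown fields per directory
            u32()
        for _ in range(file_count):
            filename = name()
            attribute, offset, size, packed_size = (u32() for _ in range(4))
            entries.append(
                Dat1Entry(directory, filename, attribute, offset, size, packed_size)
            )
    return entries


# Hand-built index with one directory and one plain file, for illustration.
sample = (
    struct.pack(">4I", 1, 0, 0, 0)          # header: one directory
    + b"\x01."                              # directory name "."
    + struct.pack(">4I", 1, 0, 0, 0)        # directory content: one file
    + b"\x05A.TXT"                          # filename
    + struct.pack(">4I", 0x20, 64, 10, 0)   # plain, offset 64, 10 bytes
)
print(read_dat1_index(io.BytesIO(sample)))
```

Passing `byte_order="<"` switches the reader to little-endian if the assumption turns out to be wrong for a given file.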

Fallout 1 LZSS compressed files uncompression algorithm
Originally written by Shadowbird on NMA forum.

D = 4096; // Dictionary (a.k.a. sliding window / ring / buffer) size
 * This is a decompression algorithm for compressed files that have already been extracted from the DAT1 file, not an extraction algorithm. Since most unpacker applications already incorporate it, you would normally only need it if you are working on your own unpacker.
 * All variables (FLAGS, N, O, L) are initialized to 0.
 * When writing to output in step 2.2. and below (N > 0), also write every byte to the dictionary, advancing the dictionary offset by one. Once the offset reaches the end of the dictionary, wrap it back to the start (normally 0).
 * Clearing the dictionary means replacing its contents with spaces (character #32, 0x20), not 0 or any other value!

0. Set dictionary offset to D-18.
1. If at the end of file, exit (duh).
2. Read N (2 bytes == word) from input. The absolute value of N is how many bytes of data to read (if N = 0, exit, duh).
  2.1. If N < 0, read the absolute value of N bytes from input and write to output (do not put into dictionary). Go to 1.
  2.2. If N > 0, clear dictionary (do NOT reset dictionary offset!), repeat below until N bytes have been read from input (not N times!), and then go to 1.
    2.2.1. Read FLAGS (1 byte) from input. Repeat below 8 times (but go to 1. as soon as N bytes have been read from input).
      2.2.1.1. If FLAGS is *odd* (& 1 <> 0), read 1 byte from input, write to output, then go to 2.2.1.3.
      2.2.1.2. If FLAGS is *even* (& 1 == 0):
        2.2.1.2.1. Read O (1 byte) and L (1 byte) from input.
        2.2.1.2.2. Take away the high nibble (first 4 bits) from L and prepend it to O (e.g., L=0x12, O=0x34 becomes L=0x2, O=0x134).
        2.2.1.2.3. Read L+3 bytes from dictionary at offset O (wrap to the start of dictionary if past the end), and write them to the output.
      2.2.1.3. Divide FLAGS by 2, rounding down (>> 1).
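The steps above can be sketched in Python as follows. This is a minimal illustration, not Shadowbird's original code: the function name `lzss_decompress` is hypothetical, and reading N as a signed big-endian word (`">h"`) is an assumption, because the description does not state the byte order. Truncated input is not handled gracefully.

```python
import struct

DICT_SIZE = 4096        # D in the description
MIN_MATCH = 3           # a dictionary reference copies L + 3 bytes


def lzss_decompress(data, word_format=">h"):
    """Decompress one extracted DAT1 file following the numbered steps.

    word_format is the struct format of N; signed big-endian is an assumption.
    """
    out = bytearray()
    dictionary = bytearray(b" " * DICT_SIZE)
    dict_pos = DICT_SIZE - 18                       # step 0
    pos = 0

    def emit(byte):
        # Write to output and mirror the byte into the ring dictionary.
        nonlocal dict_pos
        out.append(byte)
        dictionary[dict_pos] = byte
        dict_pos = (dict_pos + 1) % DICT_SIZE

    while pos + 2 <= len(data):                     # step 1
        (n,) = struct.unpack_from(word_format, data, pos)   # step 2
        pos += 2
        if n == 0:
            break
        if n < 0:                                   # step 2.1: stored block
            count = -n
            out += data[pos:pos + count]            # not put into dictionary
            pos += count
            continue

        # Step 2.2: clear the dictionary but keep the current offset.
        dictionary[:] = b" " * DICT_SIZE
        block_end = pos + n
        while pos < block_end:
            flags = data[pos]                       # step 2.2.1
            pos += 1
            for _ in range(8):
                if pos >= block_end:                # N bytes consumed
                    break
                if flags & 1:                       # step 2.2.1.1: literal
                    emit(data[pos])
                    pos += 1
                else:                               # step 2.2.1.2: reference
                    offset, length = data[pos], data[pos + 1]
                    pos += 2
                    offset |= (length & 0xF0) << 4  # high nibble of L onto O
                    length = (length & 0x0F) + MIN_MATCH
                    for i in range(length):         # byte by byte: may overlap
                        emit(dictionary[(offset + i) % DICT_SIZE])
                flags >>= 1                         # step 2.2.1.3
    return bytes(out)


# Two literals "ab", then a reference to offset 0xFEE (= D-18) with L = 0.
packed = struct.pack(">h", 5) + bytes([0x03]) + b"ab" + bytes([0xEE, 0xF0])
print(lzss_decompress(packed))  # b'ababa'
```

The reference is copied one byte at a time because the source range may overlap the bytes being written, as it does in the sample, where the third copied byte was itself produced by the first.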

DAT2 specs Document
You can think of Fallout's DAT2 files as ordinary ZIP archives: they store the files that the game will use later. You can store anything you want, compressed or not, of any length, from an ordinary TXT to an immense MVE, and, if you want, even a nice but useless DLL file. The information about each of these files, along with some information about the DAT itself, is stored at the bottom of the DAT.

The DAT2 Format
DAT2 files are divided into 3 parts: the Data Block, the Directory Tree and the Fixed DAT Information block. The Data Block contains all the files stored in the DAT; some of them are GZipped, others are not. The Directory Tree contains all the information about each file stored in the Data Block: the offset where it is located, whether it is compressed, its packed and unpacked sizes, etc. Finally, the Fixed DAT Information block contains the size in bytes of both the full DAT and the Directory Tree. The parts are laid out in the file in this order: Data Block, FilesTotal, DirTree, TreeSize, DataSize, where:
 * FilesTotal + DirTree corresponds to Directory Tree block
 * TreeSize + DataSize corresponds to Fixed DAT Information block

The Data Block
The Data Block contains just the plain files; their technical information is located in the Directory Tree. The Data Block starts at the very beginning of a DAT file. Files can be compressed or not (the Fallout engine uses zlib stream data compression). A compressed file begins with the signature 0x78DA; an uncompressed file has no signature. The signature is a 2-byte value (WORD); read as 8-bit characters, 0x78DA is "xÚ" (120 for 'x' and 218 for 'Ú'). Compressed files are "zlib stream data" (see RFC-1950 (zlib format), RFC-1951 (deflate format), RFC-1952 (gzip format)). However, if you prepend the header 1F 8B 08 08 9F E8 B7 36 02 03 to such a file, it could easily be decompressed with WinZip.
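As a rough illustration of the signature check described above, the following Python sketch inflates a file's bytes only when they start with 0x78DA. The function name `unpack_dat2_file` is hypothetical. Note that testing the first two bytes is a heuristic: an uncompressed file could happen to start with the same bytes, so a real unpacker would more likely rely on the compression information kept in the Directory Tree.

```python
import zlib

ZLIB_SIGNATURE = b"\x78\xda"    # the 0x78DA signature described above


def unpack_dat2_file(raw):
    """Return the file contents, inflating them if they carry the signature."""
    if raw[:2] == ZLIB_SIGNATURE:
        # zlib.decompress expects an RFC-1950 zlib stream by default.
        return zlib.decompress(raw)
    return raw                   # stored as-is, no signature


original = b"Fallout " * 8
packed = zlib.compress(original, 9)      # level 9 writes the 78 DA header
print(packed[:2].hex())
print(unpack_dat2_file(packed) == original)
print(unpack_dat2_file(b"plain text") == b"plain text")
```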

The Directory Tree
The Directory Tree contains one entry for each file stored in the Data Block. These entries vary in length depending on the FilenameSize of the file (path + filename). As shown in the scheme at the beginning of this document, the Directory Tree is divided into 2 parts: FilesTotal and DirTree. FilesTotal holds the number of files stored in the DAT; DirTree contains all the information about these files. FilesTotal is declared as a DWORD (4 bytes/Long) and is read in Intel L-H (low byte first, i.e. little-endian) format.

Format of DirTree entries
Each DirTree entry has its own structure, whose length varies with the length of the Filename (path + filename). All fields are DWORDs unless otherwise specified. At the end of this chapter you can find a scheme of the structure and the way it is declared in the C and Visual Basic programming languages. All directories and files are stored in DOS 8.3 format, that is, 8 characters for the file name and 3 characters for the file extension. All entries are sorted alphabetically in descending order.

Structure scheme (all DWORDs are read in Intel L-H format):


 * Dword stands for 4 bytes/long integers 0xNN NN NN NN
 * Word stands for 2 bytes integers 0xNN NN
 * Byte stands for 1 byte integer 0xNN
 * String stands for common string bytes "ABCDEF123456!@#$%/][\", etc.

Declaration of a DirEntry
 * C decorated structure:

    struct DirEntry
    {
        DWORD FilenameSize;
        char  Filename[FilenameSize];
        BYTE  Type;
        DWORD RealSize;
        DWORD PackedSize;
        DWORD Offset;
    };

 * Visual Basic decorated structure:

    Type DirEntryId
        FilenameSize As Long
        Filename As String * 255
    End Type

    Type DirEntry
        Type As Byte
        RealSize As Long
        PackedSize As Long
        Offset As Long
    End Type
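Because Filename has a variable length, a DirEntry cannot be read as one fixed-size record; the name size has to be read first. The Python sketch below shows one way to do that, following the field order of the declaration and the little-endian (Intel L-H) byte order stated earlier. The names `DirEntry` (as a Python tuple) and `read_dir_entry`, and the sample bytes, are hypothetical.

```python
import struct
from collections import namedtuple

# Hypothetical Python counterpart of the DirEntry declaration.
DirEntry = namedtuple("DirEntry", "filename type real_size packed_size offset")


def read_dir_entry(buf, pos=0):
    """Parse one DirEntry from buf at pos; return (entry, next position)."""
    (name_size,) = struct.unpack_from("<I", buf, pos)    # DWORD FilenameSize
    pos += 4
    filename = buf[pos:pos + name_size].decode("ascii")  # char Filename[...]
    pos += name_size
    # BYTE Type, then DWORD RealSize, PackedSize, Offset: 13 bytes, unpadded.
    type_, real_size, packed_size, offset = struct.unpack_from("<BIII", buf, pos)
    pos += 13
    return DirEntry(filename, type_, real_size, packed_size, offset), pos


# Hand-built entry for illustration; the values are made up.
sample = (
    struct.pack("<I", 9)
    + b"ART\\A.FRM"
    + struct.pack("<BIII", 1, 100, 40, 0)
)
entry, next_pos = read_dir_entry(sample)
print(entry, next_pos)
```

Returning the next position lets a caller loop FilesTotal times over the DirTree bytes, feeding each returned position back in as `pos`.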

 Entry Example 


 * This exact example can be found on the Team X DAT specs document.

How to find a DirTreeAddr (starting location of Directory Tree)
To find the beginning of the Directory Tree you can use this calculation:
DirTreeAddr = DataSize - TreeSize - 4
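The calculation can be tried out with a few lines of Python. This sketch assumes that the Fixed DAT Information block is the last 8 bytes of the file, with TreeSize stored before DataSize and both little-endian like the other DWORDs; that ordering is inferred from the order in which the two fields are listed, not stated outright. The function names and the sample numbers are hypothetical.

```python
import struct


def read_dat2_footer(dat):
    """Return (TreeSize, DataSize) from the last 8 bytes (assumed layout)."""
    tree_size, data_size = struct.unpack("<II", dat[-8:])
    return tree_size, data_size


def dir_tree_addr(data_size, tree_size):
    """Apply the formula from the text: DataSize - TreeSize - 4."""
    return data_size - tree_size - 4


# Fake 108-byte archive whose footer claims a TreeSize of 30.
fake_dat = b"\x00" * 100 + struct.pack("<II", 30, 108)
tree_size, data_size = read_dat2_footer(fake_dat)
print(dir_tree_addr(data_size, tree_size))  # 74
```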

Credits
Original DAT1 format description by Shadowbird (gmail.com, account "shadowbird.lv").

Original DAT2 format copyright 2000 by MatuX (matip@fibertel.com.ar) unless otherwise specified.