UOP file format [Updated for reading/writing]
Posted: Mon Oct 25, 2021 2:00 am
I wanted to capture the format of the UOP files used in UO, along with a reference base class that can be subclassed to handle the different types of data. I have updated it to enable writing as well.
//Copyright © 2021 Charles Kerr. All rights reserved.
#ifndef UOPData_hpp
#define UOPData_hpp
#include <string>
#include <fstream>
#include <cstdint>
#include <vector>
#include <map>
#include <tuple>
/*******************************************************************************
Acknowledgement
This information was gleaned from Mythic LegacyMul Convertor.
Special thanks to those who deciphered that data and made that
source available for others to examine and learn from.
******************************************************************************/
/*******************************************************************************
Hashes
Hashes are used to define how the data is used (what it represents).
There are two types of hashes used (Adler32 and HashLittle2). For
more information on HashLittle2 refer to http://burtleburtle.net/bob/c/lookup3.c
In the hash strings, {#} is used as a substitution placeholder. The # represents
the number of characters the final substitution should have (padded with leading zeros).
So {2} indicates that it should be two characters: the number 1
would result in 01.
The hash strings used for each file type are as follows (case is important).
Some file types use two different hash strings. In addition, the number of keys (hashes)
to be built can vary. Other programs that process UOP files use
0x7FFFF as the number of keys.
Art
"build/artlegacymul/{8}.tga"
The number being replaced essentially corresponds to the idx
entry in artidx.mul.
The number of keys to be built is around 0x13FDC.
UOFiddler requires this exact idx length to recognize UOHS art files (it checks with == operator, not with >=)
GumpArt
"build/gumpartlegacymul/{8}.tga"
"build/gumpartlegacymul/{7}.tga"
The number being replaced essentially corresponds to the idx
entry in gumpidx.mul.
Map
"build/map{1}legacymul/{8}.dat"
The first substitution is the map number, the second one is the
index. An index corresponds to the offset index*0xC4000 in the corresponding
map mul file.
Sound
"build/soundlegacymul/{8}.dat"
Multi
"build/multicollection/{6}.bin"
Embedded with the multi data is a file, housing.bin. It
is identified by the file hash 0x126D1E99DDEDEE0A.
It is compressed, and its data should be treated not as
part of multi.mul, but as a separate file, housing.bin.
******************************************************************************/
/*******************************************************************************
Notes/Exceptions
For the most part, when one accesses the data pointed to by an
entry, it has the same format as the data in the corresponding mul file.
Exceptions:
Gumps
The first 8 bytes of the data represent the width
(bytes 0-3) and height (bytes 4-7) of the gump.
******************************************************************************/
/*******************************************************************************
UOP file format
The UOP format holds a variety of different data for Ultima Online. The
file contains one or more tables of index entries, each of which records where
the data for that entry is in the file. Each entry also records whether or not the data
is compressed (zlib compression), and a hash. This hash is based on the original
file name, and its format varies with each file type. The hash correlates
directly with the "index" in an IDX (or map block for non-idx files) that the data
belongs to.
A table entry has the following format
UOP Table entry:
std::int64_t data_offset ; // Offset to the data for this entry
std::uint32_t header_length; // Length of header
std::uint32_t compress_size; // Compressed size of data
std::uint32_t decompress_size; // Decompressed size of data
std::uint64_t identifer; // Filename(index) hash (HashLittle2)
std::uint32_t data_hash; // Data hash (Adler32)
std::int16_t compression; // 0 = none, 1 = zlib
Using the table entry, the file format is as follows
UOP File Format (the first table will be at offset 0x28 or greater):
std::int32_t signature; // This signifies a UOP file
// and has a fixed value of
// 0x50594D ('MYP')
std::int32_t version; // Version of the format/file
// This documentation is believed to be
// valid for versions up to and including 5
std::int32_t timestamp; // Unknown, believed to be a timestamp or similar
// for the file (0xFD23EC43)
std::uint64_t table_offset; // Offset to the first table
// There can be multiple tables in the file!
std::uint32_t tablesize; // Only really needed for writing (table (block) size)
// current value is 100
std::uint32_t filecount; // Each entry is considered a file
std::int32_t unknown; // Value is 1, perhaps modified count?
std::int32_t unknown; // Value is 1
std::int32_t unknown; // Value is 0
The following is repeated for each table
std::uint32_t number_entries; // how many entries are in the table
std::uint64_t next_table; // Offset to the next table
UOPTable table[number_entries];
******************************************************************************/
namespace UO {
/************************************************************************
USE:
One subclasses this class for each data type to handle. The pertinent
methods to override are:
Reading:
virtual bool specialIdentifer(std::uint64_t identifier, std::vector<unsigned char> &data)
This method allows one to handle a special hash. Multis include a housing.bin
file, and this is a way to capture it. In that case one would either ignore it (but still
return true, telling the base class not to process it), or take the data,
create a housing.bin, and return true. Regardless, this is a way to catch
special identifiers before processing happens.
virtual void processData(std::size_t index, std::vector<unsigned char> &data)
This method is the main method. It provides the subclass with the index it found
based on the identifier, and the data associated with it. The subclass can then
interpret the data accordingly. See above for notes on the data.
virtual bool identifierFailed(std::uint64_t identifier, std::vector<unsigned char> &data)
This method alerts the subclass that the identifier was not found in the hash key lookup.
This can mean: 1. specialIdentifer was not used, and this method is now being called because
no lookup key was found.
2. The maximum number of keys to create upon reading was
insufficient.
3. An improper hash string format was provided upon reading.
4. Something changed in the format?
If the subclass wants all processing of the uop file to stop, return false and the reading
will stop with a false result. If one returns true from this method, processing continues
with the next entry.
Writing:
virtual std::tuple<std::size_t,std::string,bool> retrieveInfo(std::size_t count, std::vector<unsigned char> &data)
This method is called so the subclass can provide the data for the entry, along with the hash string format,
the index/key it represents, and whether the data should be compressed.
There are two methods provided for reading/writing (not to be overridden).
bool readUOP(std::ifstream &input,std::size_t maxindex, const std::string &hash_format1, const std::string &hash_format2="" )
This method reads a UOP. The file must be open (the stream is passed via input, opened for binary reading). One provides the maximum number of keys
the method is to build (see above for information on this number). In addition, up to two hash formats may be provided
to look up keys (see above for hash formats).
bool writeUOP(std::ofstream &output, std::uint32_t total_entries)
For this method, one provides an open stream (a UOP file opened for binary writing) and the total number of
entries that will be written.
************************************************************************/
class UOPData {
private:
/******************** Table entry structure ***********************/
struct TableEntry {
std::int64_t offset ;
std::uint32_t header_length ;
std::uint32_t compressed_length ;
std::uint32_t decompressed_length ;
std::uint64_t identifer ;
std::uint32_t data_block_hash ;
std::int16_t compression ;
TableEntry();
TableEntry & load(std::istream &input) ;
TableEntry & save(std::ostream &output) ;
// 34 bytes for a table entry
/*********************** Constants used ******************/
static constexpr unsigned int _entry_size = 34 ;
};
/******************** File Header structure ***********************/
struct FileHeader {
std::int32_t signature ;
std::int32_t version ;
std::int32_t stamp ;
std::uint64_t table_offset ;
std::uint32_t tablesize ;
std::uint32_t filecount ;
std::int32_t unknown0;
std::int32_t unknown1 ;
std::int32_t unknown2 ;
FileHeader() ;
bool valid() const ;
FileHeader & load(std::ifstream &input);
FileHeader & save(std::ofstream &output);
FileHeader & pad(std::ofstream &output); //Pad with zero
// from current
// location to table_offset
/*********************** Constants used ******************/
// File signature
static constexpr unsigned int _uop_identifer = 0x50594D;
// version
static constexpr unsigned int _uop_version = 5 ;
// Number entries in a table (maximum)
static constexpr unsigned int _table_size = 100 ;
// Value of the "timestamp" field in the file header
static constexpr unsigned int _uop_stamp = 0xFD23EC43;
// Offset where we start tables
static constexpr unsigned int _table_start = 0x200;
// the file header is at least 40 bytes. So no table should
// start until after this location
static constexpr unsigned int _file_header_size = 0x28 ;
};
/****************** zlib compression wrappers *********************/
std::vector<unsigned char> compress(const std::vector<unsigned char> &data) const;
std::vector<unsigned char> decompress(const std::vector<unsigned char> &source, std::size_t decompressed_size) const;
/********************* hash routines ******************************/
std::uint64_t HashLittle2(const std::string& s) const ;
std::uint32_t HashAdler32(const char* d, std::uintmax_t length ) const ;
std::uint32_t HashAdler32(const std::vector<unsigned char> &data) const ;
// Apply the index into the format string
std::string format(const std::string& hashformat, std::size_t index);
// Build a series of identifiers for a hash string and count of entries
std::map<std::uint64_t,std::size_t> buildIdentifiers(const std::string &hashstring, std::size_t number_entries) ;
// Retrieve an index from an identifier
std::size_t retrieveIndex(std::uint64_t identifer, const std::map<std::uint64_t,std::size_t> &lookup1,const std::map<std::uint64_t,std::size_t> &lookup2 ) const ;
/*************************** Index gather/writers ******************/
std::vector<UOPData::TableEntry> gatherEntries(std::ifstream &input, std::uint64_t offset);
// Writes out table entries (and the table) and returns a vector of offsets for each entry
std::vector<std::uint64_t> buildAllTable(std::ofstream &output,std::uint32_t totalentry,std::uint32_t tablecount);
// Builds a table (with the amount of entries) in the output stream. Returns a vector of entry offsets
std::vector<std::uint64_t> buildTable(std::uint32_t entrycount,std::ofstream &output);
protected:
/********************** Override these by subclasses **************/
// Special processing for this identifier (if special, return true, else false)
virtual bool specialIdentifer(std::uint64_t identifier, std::vector<unsigned char> &data);
// Process the data however it means to the subclass
virtual void processData(std::size_t index, std::vector<unsigned char> &data);
// Identifier lookup failed for the entry. Return true if processing should continue
virtual bool identifierFailed(std::uint64_t identifier, std::vector<unsigned char> &data);
// Retrieve the key/index, hash string, compression flag, and data to be written (count is for the
// subclass to use, or not, to understand what data is being asked for. There may not be a one-to-one
// correlation of index to count if one does not want to save empty indexes (idx entries) in the uop file)
virtual std::tuple<std::size_t,std::string,bool> retrieveInfo(std::size_t count, std::vector<unsigned char> &data);
// Read a UOP
bool readUOP(std::ifstream &input,std::size_t maxindex, const std::string &hash_format1, const std::string &hash_format2="" ) ;
bool writeUOP(std::ofstream &output, std::uint32_t total_entries);
public:
virtual ~UOPData() = default ;
UOPData() = default ;
};
}
#endif /* UOPData_hpp */
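To make the {#} placeholder rule from the Hashes notes concrete, here is a minimal standalone sketch of the substitution. It is not the class's own format() method (which relies on StringUtility.hpp, not shown in this post); the name expandHashString is hypothetical, and it assumes the number is written in decimal and that the text between the braces is a valid width.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not part of UOPData): replace the first "{N}" in
// hashformat with index, written in decimal and left-padded with zeros to
// at least N characters. Strings without a complete placeholder are
// returned unchanged.
std::string expandHashString(const std::string &hashformat, std::size_t index) {
    auto open = hashformat.find('{');
    if (open == std::string::npos) {
        return hashformat;
    }
    auto close = hashformat.find('}', open + 1);
    if (close == std::string::npos) {
        return hashformat;
    }
    // Assumption: the placeholder body is a decimal width such as "8";
    // std::stoi throws if it is not a number.
    int width = std::stoi(hashformat.substr(open + 1, close - open - 1));
    std::ostringstream padded;
    padded << std::setw(width) << std::setfill('0') << index;
    std::string result = hashformat;
    result.replace(open, close - open + 1, padded.str());
    return result;
}
```

Because only the first placeholder is replaced per call, a map hash string would need two passes in this sketch: one with the map number, then one with the index.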
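For anyone who wants to check the 34-byte table entry layout without the stream-based load(), here is a small sketch that decodes one entry from a byte buffer. The names RawTableEntry, readLE and parseTableEntry are hypothetical, and it assumes the fields are stored little-endian (consistent with the signature 0x50594D appearing as 'MYP', but not stated explicitly above).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain struct mirroring the table entry fields listed in the
// header comments (8 + 4 + 4 + 4 + 8 + 4 + 2 = 34 bytes on disk).
struct RawTableEntry {
    std::int64_t offset;
    std::uint32_t header_length;
    std::uint32_t compressed_length;
    std::uint32_t decompressed_length;
    std::uint64_t identifier;
    std::uint32_t data_hash;
    std::int16_t compression;
};

// Read an unsigned little-endian integer of `count` bytes starting at `pos`,
// then advance `pos` past it. The caller guarantees the bytes exist.
std::uint64_t readLE(const std::vector<unsigned char> &buf, std::size_t &pos, std::size_t count) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
    }
    pos += count;
    return value;
}

// Decode one entry starting at `pos`. Returns false (leaving `out` untouched)
// if fewer than 34 bytes are available.
bool parseTableEntry(const std::vector<unsigned char> &buf, std::size_t pos, RawTableEntry &out) {
    constexpr std::size_t entry_size = 34;
    if (pos > buf.size() || buf.size() - pos < entry_size) {
        return false;
    }
    out.offset = static_cast<std::int64_t>(readLE(buf, pos, 8));
    out.header_length = static_cast<std::uint32_t>(readLE(buf, pos, 4));
    out.compressed_length = static_cast<std::uint32_t>(readLE(buf, pos, 4));
    out.decompressed_length = static_cast<std::uint32_t>(readLE(buf, pos, 4));
    out.identifier = readLE(buf, pos, 8);
    out.data_hash = static_cast<std::uint32_t>(readLE(buf, pos, 4));
    out.compression = static_cast<std::int16_t>(readLE(buf, pos, 2));
    return true;
}
```

Decoding byte by byte like this avoids depending on the host's endianness or struct padding, which the reinterpret_cast reads in the implementation below implicitly rely on.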
//Copyright © 2021 Charles Kerr. All rights reserved.
#include "UOPData.hpp"
#include "StringUtility.hpp"
#include <stdexcept>
#include <zlib.h>
namespace UO {
/*************************************************************************
TableEntry methods
************************************************************************/
//===============================================================
UOPData::TableEntry::TableEntry(){
offset = 0 ;
header_length = 0 ;
compressed_length = 0 ;
decompressed_length = 0 ;
identifer = 0;
data_block_hash = 0 ;
compression = 0 ;
}
//===============================================================
UOPData::TableEntry & UOPData::TableEntry::load(std::istream &input) {
input.read(reinterpret_cast<char*>(&offset),sizeof(offset));
input.read(reinterpret_cast<char*>(&header_length),sizeof(header_length));
input.read(reinterpret_cast<char*>(&compressed_length),sizeof(compressed_length));
input.read(reinterpret_cast<char*>(&decompressed_length),sizeof(decompressed_length));
input.read(reinterpret_cast<char*>(&identifer),sizeof(identifer));
input.read(reinterpret_cast<char*>(&data_block_hash),sizeof(data_block_hash));
input.read(reinterpret_cast<char*>(&compression),sizeof(compression));
return *this ;
}
//===============================================================
UOPData::TableEntry & UOPData::TableEntry::save(std::ostream &output) {
output.write(reinterpret_cast<char*>(&offset),sizeof(offset));
output.write(reinterpret_cast<char*>(&header_length),sizeof(header_length));
output.write(reinterpret_cast<char*>(&compressed_length),sizeof(compressed_length));
output.write(reinterpret_cast<char*>(&decompressed_length),sizeof(decompressed_length));
output.write(reinterpret_cast<char*>(&identifer),sizeof(identifer));
output.write(reinterpret_cast<char*>(&data_block_hash),sizeof(data_block_hash));
output.write(reinterpret_cast<char*>(&compression),sizeof(compression));
return *this ;
}
/*************************************************************************
FileHeader methods
************************************************************************/
//=============================================================================
UOPData::FileHeader::FileHeader() {
signature = _uop_identifer;
version = _uop_version ;
stamp = _uop_stamp ;
table_offset = _table_start;
tablesize = _table_size ;
filecount = 1;
unknown0 = 1;
unknown1 = 1;
unknown2 = 0 ;
}
//=============================================================================
bool UOPData::FileHeader::valid() const {
return ((signature == _uop_identifer) && (version==_uop_version));
}
//=============================================================================
UOPData::FileHeader & UOPData::FileHeader::load(std::ifstream &input){
input.read(reinterpret_cast<char*>(&signature),sizeof(signature));
input.read(reinterpret_cast<char*>(&version),sizeof(version));
input.read(reinterpret_cast<char*>(&stamp),sizeof(stamp));
input.read(reinterpret_cast<char*>(&table_offset),sizeof(table_offset));
input.read(reinterpret_cast<char*>(&tablesize),sizeof(tablesize));
input.read(reinterpret_cast<char*>(&filecount),sizeof(filecount));
input.read(reinterpret_cast<char*>(&unknown0),sizeof(unknown0));
input.read(reinterpret_cast<char*>(&unknown1),sizeof(unknown1));
input.read(reinterpret_cast<char*>(&unknown2),sizeof(unknown2));
return *this;
}
//=============================================================================
UOPData::FileHeader & UOPData::FileHeader::save(std::ofstream &output){
output.write(reinterpret_cast<char*>(&signature),sizeof(signature));
output.write(reinterpret_cast<char*>(&version),sizeof(version));
output.write(reinterpret_cast<char*>(&stamp),sizeof(stamp));
output.write(reinterpret_cast<char*>(&table_offset),sizeof(table_offset));
output.write(reinterpret_cast<char*>(&tablesize),sizeof(tablesize));
output.write(reinterpret_cast<char*>(&filecount),sizeof(filecount));
output.write(reinterpret_cast<char*>(&unknown0),sizeof(unknown0));
output.write(reinterpret_cast<char*>(&unknown1),sizeof(unknown1));
output.write(reinterpret_cast<char*>(&unknown2),sizeof(unknown2));
return *this;
}
//=============================================================================
UOPData::FileHeader & UOPData::FileHeader::pad(std::ofstream &output){
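// Use signed 64-bit arithmetic so that being at or past table_offset pads nothing
auto loc = static_cast<std::int64_t>(output.tellp()) ;
auto size = static_cast<std::int64_t>(table_offset) - loc ;
if (size>0) {
char zero = 0 ;
for (std::int64_t i=0;i<size ; i++){
output.write(&zero,1);
}
}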
return *this ;
}
/************************************************************************
zlib wrappers for compression
***********************************************************************/
//=============================================================================
std::vector<unsigned char> UOPData::decompress(const std::vector<unsigned char> &source, std::size_t decompressed_size) const{
// uLongf is from zlib.h
auto srcsize = static_cast<uLongf>(source.size()) ;
auto destsize = static_cast<uLongf>(decompressed_size);
std::vector<unsigned char> dest(decompressed_size,0);
auto status = uncompress2(dest.data(), &destsize, source.data(), &srcsize);
if (status != Z_OK){
dest.clear() ;
dest.resize(0) ;
return dest ;
}
dest.resize(destsize);
return dest ;
}
//=============================================================================
std::vector<unsigned char> UOPData::compress(const std::vector<unsigned char> &source) const {
auto size = compressBound(source.size());
std::vector<unsigned char> rdata(size,0);
auto status = compress2(reinterpret_cast<Bytef*>(rdata.data()), &size, reinterpret_cast<const Bytef*>(source.data()), static_cast<uLongf>(source.size()),Z_DEFAULT_COMPRESSION);
if (status != Z_OK){
rdata.clear();
return rdata ;
}
rdata.resize(size) ;
return rdata;
}
/************************************************************************
Hash routines
***********************************************************************/
//=============================================================================
std::uint64_t UOPData::HashLittle2(const std::string& s) const {
std::uint32_t length = static_cast<std::uint32_t>(s.size()) ;
std::uint32_t a ;
std::uint32_t b ;
std::uint32_t c ;
c = 0xDEADBEEF + static_cast<std::uint32_t>(length) ;
a = c;
b = c ;
std::uint32_t k = 0 ;
std::uint32_t l = 0 ;
while (length > 12){
a += (s[k++]);
a += (s[k++] << 8);
a += (s[k++] << 16);
a += (s[k++] << 24);
b += (s[k++]);
b += (s[k++] << 8);
b += (s[k++] << 16);
b += (s[k++] << 24);
c += (s[k++]);
c += (s[k++] << 8);
c += (s[k++] << 16);
c += (s[k++] << 24);
a -= c; a ^= c << 4 | c >> 28; c += b;
b -= a; b ^= a << 6 | a >> 26; a += c;
c -= b; c ^= b << 8 | b >> 24; b += a;
a -= c; a ^= c << 16 | c >> 16; c += b;
b -= a; b ^= a << 19 | a >> 13; a += c;
c -= b; c ^= b << 4 | b >> 28; b += a;
length -= 12 ;
}
// Notice the lack of breaks! we actually want it to fall through
switch (length) {
case 12: {
l = k + 11;
c += (s[l] << 24);
}
case 11: {
l = k + 10;
c += (s[l] << 16);
}
case 10: {
l = k + 9;
c += (s[l] << 8);
}
case 9: {
l = k + 8;
c += (s[l]);
}
case 8: {
l = k + 7;
b += (s[l] << 24);
}
case 7: {
l = k + 6;
b += (s[l] << 16);
}
case 6: {
l = k + 5;
b += (s[l] << 8);
}
case 5: {
l = k + 4;
b += (s[l]);
}
case 4: {
l = k + 3;
a += (s[l] << 24);
}
case 3: {
l = k + 2;
a += (s[l] << 16);
}
case 2: {
l = k + 1;
a += (s[l] << 8);
}
case 1: {
a += (s[k]);
c ^= b; c -= b << 14 | b >> 18;
a ^= c; a -= c << 11 | c >> 21;
b ^= a; b -= a << 25 | a >> 7;
c ^= b; c -= b << 16 | b >> 16;
a ^= c; a -= c << 4 | c >> 28;
b ^= a; b -= a << 14 | a >> 18;
c ^= b; c -= b << 24 | b >> 8;
break;
}
default:
break;
}
return (static_cast<std::uint64_t>(b) << 32) | static_cast<std::uint64_t>(c) ;
}
//=============================================================================
std::uint32_t UOPData::HashAdler32(const std::vector<unsigned char> &data) const {
auto d = reinterpret_cast<const char*>(data.data());
auto length = data.size() ;
return HashAdler32(d, length);
}
//=============================================================================
std::uint32_t UOPData::HashAdler32(const char* d, std::uintmax_t length ) const {
std::uint32_t a = 1 ;
std::uint32_t b = 0 ;
for (std::uintmax_t i = 0 ; i < length; i++){
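// Treat each byte as unsigned and reduce the running sum (not the byte) modulo 65521
a = (a + static_cast<unsigned char>(d[i])) % 65521 ;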
b = (b + a) % 65521 ;
}
return (b<<16) | a ;
}
/************************************************************************
Hash string formatting
***********************************************************************/
//=============================================================================
std::string UOPData::format(const std::string& hashformat, std::size_t index){
// How much do we pad? Find the substitution placeholder
auto pos = hashformat.find_first_of("{") ;
if (pos == std::string::npos){
// we are not substituting anything, pass on the string
return hashformat ;
}
auto loc = hashformat.find_first_of("}",pos+1) ;
if (loc == std::string::npos){
// no closing brace, so we are not substituting anything, pass on the string
return hashformat ;
}
auto sub = strutil::numtostr(index,10,false,strutil::strtoi(hashformat.substr(pos+1,loc-(pos+1))));
auto rvalue = hashformat;
return rvalue.replace(pos, (loc-pos)+1, sub);
}
//=============================================================================
std::map<std::uint64_t,std::size_t> UOPData::buildIdentifiers(const std::string &hashstring,std::size_t number_entries){
std::map<std::uint64_t,std::size_t> hashes ;
if (hashstring.empty()){
return hashes;
}
for (std::size_t i = 0 ; i < number_entries;i++){
auto formatted = format(hashstring,i);
auto hash = HashLittle2(formatted);
hashes.insert_or_assign(hash, i);
}
return hashes ;
}
//=============================================================================
std::size_t UOPData::retrieveIndex(std::uint64_t identifer, const std::map<std::uint64_t,std::size_t> &lookup1,const std::map<std::uint64_t,std::size_t> &lookup2 ) const {
auto iter = lookup1.find(identifer) ;
if (iter == lookup1.end()){
iter = lookup2.find(identifer);
// iter now belongs to lookup2, so it must be compared against lookup2.end()
if (iter == lookup2.end()){
throw std::out_of_range("Identifier "s + strutil::numtostr(identifer,16,true,8)+ " not found");
}
return iter->second;
}
return iter->second ;
}
/*************************** Index gather/writers ******************/
//=============================================================================
std::vector<UOPData::TableEntry> UOPData::gatherEntries(std::ifstream &input, std::uint64_t offset){
std::vector<TableEntry> entries ;
input.seekg(offset,std::ios::beg);
auto entry_count = static_cast<std::uint32_t>(0);
while ((offset != 0 ) && (!input.eof()) && input.good()){
// Read in the number of entries, and next table offset
input.read(reinterpret_cast<char*>(&entry_count),sizeof(entry_count));
input.read(reinterpret_cast<char*>(&offset),sizeof(offset));
if (!input.good()){
// Short read: entry_count and offset cannot be trusted, so stop here
break ;
}
for (std::uint32_t i = 0 ; i< entry_count;i++){
TableEntry entry ;
entry.load(input);
entries.push_back(entry);
}
if (offset != 0){
input.seekg(offset,std::ios::beg);
}
}
return entries ;
}
//=======================================================================
// Writes out table entries (and the table) and returns a vector of offsets for each entry
std::vector<std::uint64_t> UOPData::buildTable(std::uint32_t entrycount,std::ofstream &output){
// Placeholder value for the next table offset (the caller patches it in later)
std::uint64_t zero = 0 ;
// write out the number of entries for this table
output.write(reinterpret_cast<char*>(&entrycount),4);
// write a place holder for the next table offset
output.write(reinterpret_cast<char*>(&zero),8);
std::vector<std::uint64_t> locations ;
locations.reserve(entrycount);
TableEntry entry ;
// For each entry, save the offset it is written to
while (entrycount>0){
locations.push_back(output.tellp());
entry.save(output) ;
entrycount--;
}
return locations;
}
//=======================================================================
// Builds a table (with the amount of entries) in the output stream. Returns a vector of entry offsets
std::vector<std::uint64_t> UOPData::buildAllTable(std::ofstream &output,std::uint32_t totalentry,std::uint32_t tablecount){
std::vector<std::uint64_t> entry_locations;
entry_locations.reserve(totalentry);
auto entrycount = FileHeader::_table_size ; // Set to the max number of entries
// Only the last table may hold fewer entries; it is adjusted inside the loop.
// Loop through each table and build placeholders for its entries
for (std::uint32_t i=0;i<tablecount;i++){
// Save where the next table_offset in the table should go
// It will be 4 bytes past current (past the number of entries)
auto position = output.tellp() ;
position+=4 ;
// If this is our last table, it holds only the remaining entries
// (unless the total is an exact multiple of the table size, in which case it is full)
auto remainder = totalentry%FileHeader::_table_size ;
if ((i==(tablecount-1)) && (remainder != 0)){
entrycount = remainder;
}
// Now, build the table
auto locations = buildTable(entrycount, output);
entry_locations.insert(entry_locations.end(),locations.begin(),locations.end());
// Write the next table offset into table we just did
std::uint64_t current = output.tellp() ;
output.seekp(position,std::ios::beg);
if (i!=(tablecount-1)){
output.write(reinterpret_cast<char*>(&current),8);
}
else {
std::uint64_t zero = 0 ;
output.write(reinterpret_cast<char*>(&zero),8);
}
output.seekp(current,std::ios::beg);
}
return entry_locations ;
}
/************************* Subclass Overrides **************************/
//=============================================================================
// Special processing for this identifier (if special, return true, else false)
bool UOPData::specialIdentifer(std::uint64_t identifier, std::vector<unsigned char> &data){
return false ;
}
//=============================================================================
// Process the data in whatever way is meaningful to the subclass
void UOPData::processData(std::size_t index, std::vector<unsigned char> &data){
}
//=============================================================================
// Identifier lookup failed for the entry. Return true if processing should continue
bool UOPData::identifierFailed(std::uint64_t identifier, std::vector<unsigned char> &data){
return false ;
}
//=============================================================================
// Retrieve the key/index, hashstring, compression flag, and data to be written. The count is for the
// subclass to use (or not) to understand which data is being asked for. There may not be a one-to-one
// correlation of index to count if one does not want to save empty indexes (idx entries) in the uop file
std::tuple<std::size_t,std::string,bool> UOPData::retrieveInfo(std::size_t count, std::vector<unsigned char> &data){
data.resize(0) ;
return std::make_tuple(0,"nohash"s,false);
}
/************************* Read/Write UOP streams ***********************/
//=============================================================================
bool UOPData::readUOP(std::ifstream &input,std::size_t maxindex, const std::string &hash_format1, const std::string &hash_format2 ) {
if (!input.is_open() ) {
return false ;
}
FileHeader header;
header.load(input) ;
if (!input.good() || input.eof() || !header.valid()){
return false ;
}
auto entries = gatherEntries(input, header.table_offset) ;
if (!input.good() || input.eof() ){
return false ;
}
// Build the identifiers
auto identifier_mapping1 = buildIdentifiers(hash_format1, maxindex);
auto identifier_mapping2 = buildIdentifiers(hash_format2, maxindex);
// Process the entries!
for (auto &entry:entries){
input.seekg(entry.offset,std::ios::beg);
auto size = entry.decompressed_length;
if (entry.compression != 0){
size = entry.compressed_length ;
}
auto data = std::vector<unsigned char>(size,0) ;
if (size >0){
input.read(reinterpret_cast<char*>(data.data()),size);
if (entry.compression != 0){
// We need to decompress this
data = decompress(data, entry.decompressed_length);
}
}
// Data read and decompressed
// Time to process it
// First, see if this identifier is special in some way
if (!specialIdentifer(entry.identifer, data)){
// Wasn't, so get the index for it
try{
auto index = retrieveIndex(entry.identifer, identifier_mapping1, identifier_mapping2);
processData(static_cast<std::uint64_t>(index), data);
}
catch(...){
if (!identifierFailed(entry.identifer, data)){
return false ;
}
}
}
}
return true ;
}
//=============================================================================
bool UOPData::writeUOP(std::ofstream &output, std::uint32_t total_entries){
if (!output.is_open()){
return false ;
}
FileHeader header ;
header.filecount = total_entries ;
header.save(output);
header.pad(output);
if (!output.good()){
return false ;
}
// Now we need to build a table of entries
auto table_count = total_entries/FileHeader::_table_size + ((total_entries%FileHeader::_table_size)!=0?1:0) ;
auto entries = buildAllTable(output, total_entries, table_count);
if (!output.good()){
return false ;
}
// We now have all tables and entries done. We just need to update them
auto count = static_cast<std::size_t>(0) ;
auto data = std::vector<unsigned char>(0,0) ;
auto current = output.tellp() ;
for (auto &offset : entries){
TableEntry entry ;
entry.offset = current ;
const auto &[index,formatstring,compressdata] = retrieveInfo(count,data) ;
auto hashstring = format(formatstring, index) ;
entry.identifer = HashLittle2(hashstring);
entry.compression = (compressdata?1:0) ;
entry.decompressed_length = static_cast<std::uint32_t>(data.size()) ;
entry.compressed_length = static_cast<std::uint32_t>(data.size()) ;
if (compressdata){
data = compress(data);
entry.compressed_length = static_cast<std::uint32_t>(data.size()) ;
}
entry.data_block_hash = HashAdler32(data);
// Write out the data ;
output.write(reinterpret_cast<char*>(data.data()),data.size());
current = output.tellp() ;
output.seekp(offset,std::ios::beg);
entry.save(output);
output.seekp(current,std::ios::beg);
// Advance so the next retrieveInfo call asks for the next piece of data
count++ ;
}
return true ;
}
}
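A standalone way to sanity-check an Adler-32 routine such as HashAdler32 is to compare it against a small reference version and a well-known test vector. The sketch below is only that: adler32_reference is a hypothetical free function, not part of UOPData, and it assumes the data block hash is meant to be standard Adler-32 (bytes treated as unsigned, both sums reduced modulo 65521).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of UOPData): plain reference Adler-32.
// Assumes standard Adler-32 semantics: unsigned bytes, sums taken mod 65521.
std::uint32_t adler32_reference(const unsigned char* data, std::size_t length) {
    constexpr std::uint32_t kMod = 65521;  // largest prime below 2^16
    std::uint32_t a = 1;                   // running sum of bytes, starts at 1
    std::uint32_t b = 0;                   // running sum of the successive values of a
    for (std::size_t i = 0; i < length; ++i) {
        a = (a + data[i]) % kMod;
        b = (b + a) % kMod;
    }
    // b goes in the high 16 bits, a in the low 16 bits
    return (b << 16) | a;
}

// Convenience overload for text input.
std::uint32_t adler32_reference(const std::string& text) {
    return adler32_reference(
        reinterpret_cast<const unsigned char*>(text.data()), text.size());
}
```

For the ASCII string "Wikipedia" this yields 0x11E60398, and for empty input it yields 1, so either value can be used to check another implementation. Reducing on every byte is the simplest form; it trades a little speed for never overflowing the 32-bit sums.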