
HDF compression and chunking

The following table lists the HDF interfaces that support compression, chunking, or both:

Interface                               Compression  Chunking
SD - Multifile Scientific Data          yes          yes
GR - Multifile General Raster Image     yes          yes
DFR8 - Single-file 8-Bit Raster Image   yes          no
DF24 - Single-file 24-Bit Raster Image  yes          no

SD - Multifile Scientific Datasets

Compression

In the SD interface, compression is done with the SDsetcompress routine. The syntax of SDsetcompress is as follows:

status = SDsetcompress(sds_id, comp_type, &c_info);

The parameter comp_type specifies the compression type. Algorithm-specific compression information is passed in the parameter c_info, a union of type comp_info. The following table summarizes the available options:

comp_type          algorithm                                          c_info
COMP_CODE_RLE      Run-length encoding                                not used
COMP_CODE_SKPHUFF  Skipping Huffman (adaptive Huffman)                the structure skphuff in the union comp_info must be provided with the size, in bytes, of the data elements
COMP_CODE_DEFLATE  GZIP "deflation" (Lempel/Ziv-77 dictionary coder)  the structure deflate in the union comp_info must be provided with the compression effort

A data set compressed with SDsetcompress is written in its entirety: the data set is built in memory and then written to the file in a single write operation.

Chunking

The SDsetchunk function makes an SDS a chunked SDS and sets the chunk size and the compression method for the data set. Two restrictions apply to chunked SDSs: the maximum number of chunks in a single HDF file is 65,535, and a chunked SDS cannot contain an unlimited dimension. The syntax of SDsetchunk is as follows:

status = SDsetchunk(sds_id, c_def, c_flag);

The chunking information is provided in the parameters c_def and c_flag. The parameter c_flag specifies the type of the data set, that is, whether the data set is chunked only or chunked and compressed. The following table summarizes the available options:

c_flag                data set type                                            c_def
HDF_CHUNK             chunked                                                  set c_def.chunk_lengths[] to the chunk dimension sizes
HDF_CHUNK | HDF_COMP  chunked, compressed with RLE, Skipping Huffman, or GZIP  set c_def.comp.chunk_lengths[] to the chunk dimension sizes
HDF_CHUNK | HDF_NBIT  chunked, NBIT-compressed                                 set c_def.nbit.chunk_lengths[] to the chunk dimension sizes
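
The 65,535-chunk limit mentioned above can be checked before SDsetchunk is called. The following is a minimal sketch in plain C, not library code: the helper chunk_count_within_limit is hypothetical, int32_t stands in for the HDF int32 type, and it assumes that a partial chunk at the edge of a dimension counts as a whole chunk. Because the limit applies to the whole file, the counts of all chunked data sets in the file have to be added together.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHUNKS_PER_FILE 65535  /* limit quoted in the text above */

/*
 * Hypothetical helper (not part of the HDF library): returns the number of
 * chunks needed to tile one data set, or 0 if that number would exceed
 * MAX_CHUNKS_PER_FILE.
 */
int32_t chunk_count_within_limit(int rank, const int32_t *dim_sizes,
                                 const int32_t *chunk_lengths)
{
    int64_t total = 1;

    assert(rank > 0);
    for (int i = 0; i < rank; i++) {
        assert(dim_sizes[i] > 0 && chunk_lengths[i] > 0);
        /* Ceiling division: assumes a partial edge chunk still counts. */
        int64_t per_dim = ((int64_t)dim_sizes[i] + chunk_lengths[i] - 1)
                          / chunk_lengths[i];
        total *= per_dim;
        if (total > MAX_CHUNKS_PER_FILE)
            return 0;  /* too many chunks; stop before the product grows */
    }
    return (int32_t)total;
}
```

For a 1000 x 1000 data set, chunk_lengths of {100, 100} give 100 chunks, while {3, 3} would need 334 x 334 chunks and is rejected.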

GR - Multifile General Raster Images

Compression

GR images are compressed using the routine GRsetcompress. The syntax of GRsetcompress is as follows:

status = GRsetcompress(ri_id, comp_type, &c_info);

The compression method is specified by the parameter comp_type. The parameter c_info has type comp_info and contains algorithm-specific information for the library compression routines. The following table summarizes the available options:

comp_type          algorithm                     c_info
COMP_CODE_NONE     no compression                not used
COMP_CODE_RLE      RLE run-length encoding       not used
COMP_CODE_SKPHUFF  Skipping Huffman compression  the skipping size is specified in the field c_info.skphuff.skp_size
COMP_CODE_DEFLATE  GZIP compression              the deflate level is specified in the field c_info.deflate.level
COMP_CODE_JPEG     JPEG compression              not used

Chunking

The GR interface also supports chunking in a manner similar to that of the SD interface. There is one restriction on a chunked raster image: it must be created with MFGR_INTERLACE_PIXEL in the call to GRcreate. The function GRsetchunk makes the raster image identified by the parameter ri_id a chunked raster image according to the provided chunking and compression information. The syntax of GRsetchunk is as follows:

status = GRsetchunk(ri_id, c_def, c_flags); 

The parameters c_def and c_flags provide the chunking and compression information. The following table summarizes the available options:

c_flags               data set type                                            c_def
HDF_CHUNK             chunked, uncompressed                                    the chunk dimensions must be specified in the field c_def.chunk_lengths[]
HDF_CHUNK | HDF_COMP  chunked, compressed with RLE, Skipping Huffman, or GZIP  the chunk dimensions must be specified in the field c_def.comp.chunk_lengths[] and the compression type in the field c_def.comp.comp_type

Valid values of the compression type c_def.comp.comp_type are:

COMP_CODE_NONE     uncompressed data
COMP_CODE_RLE      data compressed using the RLE compression algorithm
COMP_CODE_SKPHUFF  data compressed using the Skipping Huffman compression algorithm; the skipping size is specified in the field c_def.comp.cinfo.skphuff.skp_size
COMP_CODE_DEFLATE  data compressed using the GZIP compression algorithm; the deflate level is specified in the field c_def.comp.cinfo.deflate.level and must be an integer from 1 to 9 inclusive

DFR8 - Single-file 8-Bit Raster Images

The compression type is determined by the tag passed as the fifth argument in calls to the DFR8putimage and DFR8addimage routines. DFR8setcompress provides a method for compressing the next raster image written. 

intn DFR8addimage(char *filename, VOIDP image, int32 width, int32 height, uint16 compress);
intn DFR8setcompress(int32 type, comp_info *cinfo);

The compress options are:

COMP_NONE    does not compress the image
COMP_JPEG    compresses the image with a JPEG algorithm, which is a lossy method
COMP_RLE     stores the image using lossless run-length encoding
COMP_IMCOMP  uses a lossy compression algorithm called IMCOMP, included for backward compatibility only. If IMCOMP compression is used, the image must include a palette.

The comp_info union contains algorithm-specific information for the library routines that perform the compression. Only the COMP_JPEG compression type uses its contents. The other compression types still require a pointer to a valid comp_info union, but the values in the union are not used.

DF24 - Single-file 24-Bit Raster Images

To store a 24-bit raster image using compression, the calling program must call the following functions, DF24setcompress first and then DF24addimage:

intn DF24setcompress(int32 type, comp_info *cinfo);
intn DF24addimage(char *filename, VOIDP image, int32 width, int32 height);

The compress options are the same as for the DFR8 case. 

Compression and chunking

Compression done with SDsetcompress (or GRsetcompress) and chunking done with SDsetchunk (or GRsetchunk) are mutually exclusive. That is, an application cannot chunk a data set and also compress it with SDsetcompress. The following table illustrates the valid and invalid options.

API call       behaviour
SDsetcompress  compresses; chunking is not possible
SDsetchunk     chunks; compression with SDsetcompress is not possible. To compress and chunk, pass the flags HDF_CHUNK | HDF_COMP to this call
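
The rule in the table above can be written down as a small decision function. This is only an illustrative sketch for the SD interface: the enum sd_layout_call and the function choose_sd_call are hypothetical names, not part of the HDF library, and the function only reports which call to make rather than making it.

```c
/* Hypothetical result type: which SD call matches the requested layout. */
typedef enum {
    SD_NO_CALL,           /* neither chunking nor compression requested */
    SD_USE_SETCOMPRESS,   /* SDsetcompress(sds_id, comp_type, &c_info) */
    SD_USE_SETCHUNK,      /* SDsetchunk(sds_id, c_def, HDF_CHUNK) */
    SD_USE_SETCHUNK_COMP  /* SDsetchunk(sds_id, c_def, HDF_CHUNK | HDF_COMP) */
} sd_layout_call;

/*
 * Hypothetical helper: chunking always goes through SDsetchunk, so a
 * chunked and compressed data set is never set up with SDsetcompress.
 */
sd_layout_call choose_sd_call(int want_chunking, int want_compression)
{
    if (want_chunking && want_compression)
        return SD_USE_SETCHUNK_COMP;
    if (want_chunking)
        return SD_USE_SETCHUNK;
    if (want_compression)
        return SD_USE_SETCOMPRESS;
    return SD_NO_CALL;
}
```

The same pattern applies to the GR interface with GRsetcompress and GRsetchunk in place of the SD routines.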

 

 



Last modified: March 19, 2007