Rechunker

The Rechunker class provides on-the-fly rechunking without modifying the stored data. Access it via variable.rechunker().

cfdb.support_classes.Rechunker

guess_chunk_shape(target_chunk_size)

Guess an appropriate chunk layout for a dataset, given its shape and the size of each element in bytes. Chunks are allocated no larger than target_chunk_size. Each chunk dimension is assigned the highest composite number that fits within target_chunk_size. Composite numbers benefit the rechunking process because the least common multiple of two composite numbers is very likely to be significantly lower than the product of those two numbers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_size | int | The maximum size per chunk in bytes. | required |

Returns:

| Type | Description |
|---|---|
| tuple of ints | shape of the chunk |
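The least-common-multiple argument above can be checked with a few lines of standard-library Python. This is only an illustration of the arithmetic, not part of cfdb; the helper name lcm_vs_product is made up for this example.

```python
from math import lcm  # math.lcm requires Python 3.9+


def lcm_vs_product(a: int, b: int) -> tuple[int, int]:
    """Return (lcm, product) for two chunk lengths along one dimension."""
    return lcm(a, b), a * b


# Two composite chunk lengths share factors, so the LCM is much smaller.
print(lcm_vs_product(12, 18))  # (36, 216)

# Two prime chunk lengths share no factors, so the LCM equals the product.
print(lcm_vs_product(11, 13))  # (143, 143)
```

A smaller LCM means a smaller block of data lines up with both the source and target chunk grids, which is why composite chunk lengths are preferred.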

rechunk(target_chunk_shape, max_mem=2 ** 27)

Takes a target chunk shape and a maximum memory size and returns a generator that yields the data in the new target chunk shape. The rechunking is optimised by using an in-memory numpy ndarray whose size is bounded by max_mem.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | | The chunk_shape of the target. | required |
| max_mem | int | The maximum memory, in bytes, allocated for the rechunking operation. Only as much as an optimum-size read chunk needs is actually used. | 2 ** 27 |

Returns:

| Type | Description |
|---|---|
| Generator | tuple of the target slices and the corresponding np.ndarray of data |
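To make the shape of the yielded values concrete, here is a minimal numpy-only sketch. It assumes, based on the return description above, that each item is a pair of target slices and the matching array. The function iter_target_chunks is a hypothetical stand-in: it keeps the whole array in memory and performs none of the read optimisation that rechunk does.

```python
import itertools

import numpy as np


def iter_target_chunks(data: np.ndarray, target_chunk_shape: tuple[int, ...]):
    """Yield (target_slices, chunk) pairs covering `data` on the target chunk grid.

    Hypothetical illustration only, not the cfdb implementation.
    """
    # Start offsets of the target chunks along each dimension.
    starts = [range(0, size, step) for size, step in zip(data.shape, target_chunk_shape)]
    for origin in itertools.product(*starts):
        # Clip edge chunks so the slices never run past the array bounds.
        slices = tuple(
            slice(start, min(start + step, size))
            for start, step, size in zip(origin, target_chunk_shape, data.shape)
        )
        yield slices, data[slices]


source = np.arange(24).reshape(4, 6)
target = np.empty_like(source)
for target_slices, chunk in iter_target_chunks(source, (3, 4)):
    target[target_slices] = chunk  # write each chunk at its target location
```

A consumer would typically loop over the generator in the same way and write each array to the destination at the given slices.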

calc_n_chunks()

Calculate the total number of chunks in the existing variable.
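As a rough sketch of the arithmetic involved, the number of chunks on a regular grid is the product of the per-dimension chunk counts, rounding up for partial edge chunks. The helper count_chunks is hypothetical, and this assumes every grid position holds a chunk; whether calc_n_chunks counts only chunks that actually exist in storage is not stated here.

```python
import math


def count_chunks(shape, chunk_shape):
    """Count the chunks needed to tile `shape`, including partial edge chunks."""
    # -(-s // c) is integer ceiling division.
    return math.prod(-(-s // c) for s, c in zip(shape, chunk_shape))


print(count_chunks((100, 200), (30, 50)))  # 16
```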

calc_n_reads_rechunker(target_chunk_shape, max_mem=2 ** 27)

Calculate the total number of reads and writes using the rechunker.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | Tuple[int, ...] | The chunk_shape of the target. | required |
| max_mem | int | The maximum memory, in bytes, allocated for the rechunking operation. Only as much as an optimum-size read chunk needs is actually used. | 2 ** 27 |

Returns:

| Type | Description |
|---|---|
| tuple | (n_reads, n_writes) |
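For comparison, the following sketch counts the source chunk reads of a naive approach in which every target chunk is assembled independently, re-reading each source chunk it overlaps. This is a baseline for intuition only and is not how calc_n_reads_rechunker computes its result; naive_read_count is a hypothetical name.

```python
def naive_read_count(shape, source_chunk_shape, target_chunk_shape):
    """Count source chunk reads when each target chunk is built independently."""
    total = 1
    for size, src, tgt in zip(shape, source_chunk_shape, target_chunk_shape):
        overlaps = 0
        for start in range(0, size, tgt):
            stop = min(start + tgt, size)
            # Number of source chunks touched by [start, stop) in this dimension.
            overlaps += (stop - 1) // src - start // src + 1
        # Overlap counts are independent per dimension, so they multiply.
        total *= overlaps
    return total


# 9 source chunks exist, but the naive approach reads 16 times.
print(naive_read_count((12, 12), (4, 4), (6, 6)))  # 16
```

The gap between the naive count and the number of source chunks is the redundancy that a memory-buffered rechunker can reduce.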

calc_ideal_read_chunk_shape(target_chunk_shape)

Calculates the minimum ideal read chunk shape between a source and target.

calc_ideal_read_chunk_mem(target_chunk_shape)

Calculates the minimum ideal read chunk memory between a source and target.
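One plausible reading of "minimum ideal read chunk", consistent with the least-common-multiple remark under guess_chunk_shape, is the smallest block that aligns with both the source and target chunk grids, i.e. the per-dimension LCM. The sketch below works under that assumption, which the text here does not confirm; the function names and the explicit source_chunk_shape and itemsize arguments are hypothetical.

```python
import math


def ideal_read_chunk_shape(source_chunk_shape, target_chunk_shape):
    """Per-dimension LCM of the source and target chunk shapes (assumed definition)."""
    return tuple(math.lcm(s, t) for s, t in zip(source_chunk_shape, target_chunk_shape))


def ideal_read_chunk_mem(source_chunk_shape, target_chunk_shape, itemsize):
    """Bytes needed to hold one ideal read chunk of elements of `itemsize` bytes."""
    shape = ideal_read_chunk_shape(source_chunk_shape, target_chunk_shape)
    return math.prod(shape) * itemsize


print(ideal_read_chunk_shape((10, 20), (4, 30)))           # (20, 60)
print(ideal_read_chunk_mem((10, 20), (4, 30), itemsize=4))  # 4800
```

If the resulting memory exceeds the available budget, a smaller read chunk has to be chosen, which is the case calc_source_read_chunk_shape addresses.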

calc_source_read_chunk_shape(target_chunk_shape, max_mem)

Calculates the optimum read chunk shape given a maximum amount of available memory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | Tuple[int, ...] | The target chunk shape. | required |
| max_mem | int | The maximum memory, in bytes, allocated for the rechunking operation. | required |

Returns:

| Type | Description |
|---|---|
| tuple of ints | optimal chunk shape |