API Reference

class infinistore.ClientConfig(*args: Any, **kwargs: Any)

ClientConfig is a configuration class for the Infinistore client.

connection_type

The type of connection to use (e.g., TYPE_LOCAL_GPU, TYPE_RDMA).

Type:

str

host_addr

The address of the host.

Type:

str

dev_name

The name of the device (default is “mlx5_1”).

Type:

str

ib_port

The port number of the InfiniBand device (default is 1).

Type:

int

link_type

The type of link (default is “IB”).

Type:

str

service_port

The port number of the service.

Type:

int

log_level

The logging level (default is “warning”).

Type:

str
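As a rough illustration, the attributes above can be gathered into keyword arguments before a config is constructed. This is a sketch under stated assumptions, not the library's own helper: `client_config_kwargs` is a hypothetical name, the host address and port in the usage comment are made up, and it is assumed (the signature only shows `*args`/`**kwargs`) that ClientConfig accepts the attribute names as keyword arguments.

```python
from typing import Any, Dict


def client_config_kwargs(connection_type: str, host_addr: str,
                         service_port: int, **overrides: Any) -> Dict[str, Any]:
    """Collect ClientConfig attributes, starting from the documented defaults."""
    kwargs: Dict[str, Any] = {
        "connection_type": connection_type,
        "host_addr": host_addr,
        "service_port": service_port,
        "dev_name": "mlx5_1",    # documented default
        "ib_port": 1,            # documented default
        "log_level": "warning",  # documented default
    }
    # Reject names that are not among the attributes listed above.
    unknown = set(overrides) - set(kwargs)
    if unknown:
        raise ValueError(f"unknown ClientConfig attributes: {sorted(unknown)}")
    kwargs.update(overrides)
    return kwargs


# Hypothetical usage (assumes keyword construction is supported and that the
# TYPE_RDMA constant is exposed by the infinistore module):
#   import infinistore
#   config = infinistore.ClientConfig(
#       **client_config_kwargs(infinistore.TYPE_RDMA, "10.0.0.2", 22345))
```

Keeping the defaults in one place makes it easier to see which values a deployment actually overrides.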

class infinistore.DisableTorchCaching

Context manager to disable PyTorch CUDA memory caching.

When this context manager is entered, it sets the environment variable “PYTORCH_NO_CUDA_MEMORY_CACHING” to “1”, which disables CUDA memory caching in PyTorch. When the context manager is exited, the environment variable is deleted, restoring the default behavior.

Usage:

    with DisableTorchCaching():
        # Your code here
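The enter/exit behaviour described above can be mimicked with the standard library alone. The following is a stand-in sketch, not the library's implementation: `disable_torch_caching_sketch` is a hypothetical function-based equivalent that sets the variable on entry and deletes it on exit.

```python
import os
from contextlib import contextmanager

ENV_VAR = "PYTORCH_NO_CUDA_MEMORY_CACHING"


@contextmanager
def disable_torch_caching_sketch():
    """Stand-in that mirrors the documented enter/exit behaviour."""
    os.environ[ENV_VAR] = "1"          # on enter: request no CUDA memory caching
    try:
        yield
    finally:
        os.environ.pop(ENV_VAR, None)  # on exit: delete the variable


with disable_torch_caching_sketch():
    inside = os.environ.get(ENV_VAR)   # value seen inside the block
after = os.environ.get(ENV_VAR)        # value seen after the block
```

Because the variable is deleted rather than restored, a value that was set before entering the block is not preserved in this sketch.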

class infinistore.InfinityConnection(config: ClientConfig)

A class to manage connections and data transfers with an Infinistore instance using either local or RDMA connections.

conn

The connection object to the Infinistore instance.

Type:

_infinistore.Connection

local_connected

Indicates if connected to a local instance.

Type:

bool

rdma_connected

Indicates if connected to a remote instance via RDMA.

Type:

bool

config

Configuration object for the connection.

Type:

ClientConfig

allocate_rdma(keys: List[str], page_size_in_bytes: int)

Allocates RDMA memory for the given keys. For RDMA writes, the user must first allocate RDMA memory and then use the allocated RDMA memory addresses to write data to remote memory.

Parameters:
  • keys (List[str]) – A list of keys for which RDMA memory is to be allocated.

  • page_size_in_bytes (int) – The size of each page in bytes.

Returns:

A list of allocated RDMA memory addresses.

Return type:

List

Raises:
  • Exception – If RDMA is not connected.

  • Exception – If memory allocation fails.

check_exist(key: str)

Check if a given key exists in the store.

Parameters:

key (str) – The key to check for existence.

Returns:

True if the key exists, False otherwise.

Return type:

bool

Raises:

Exception – If there is an error checking the key’s existence.
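One possible use of check_exist is to find which keys still need to be written. The helper below is a hypothetical sketch (`missing_keys` is not part of the library); `conn` is assumed to be a connected InfinityConnection, or any object with a compatible check_exist method.

```python
from typing import Any, List


def missing_keys(conn: Any, keys: List[str]) -> List[str]:
    """Return the keys for which conn.check_exist(key) is False."""
    # One check_exist call per key; any exception it raises propagates.
    return [key for key in keys if not conn.check_exist(key)]
```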

connect()

Establishes a connection to the Infinistore instance based on the configuration.

Raises:
  • Exception – If already connected to a local instance.

  • Exception – If already connected to a remote instance.

  • Exception – If failed to initialize remote connection.

  • Exception – If local GPU connection is not to localhost.

  • Exception – If failed to set up RDMA connection.

get_match_last_index(keys: List[str])

Retrieve the last index of a match for the given keys.

Parameters:

keys (List[str]) – A list of string keys to search for matches.

Returns:

The last index of a match.

Return type:

int

Raises:

Exception – If no match is found (i.e., if the return value is negative).
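Since a missing match is reported through an exception, callers that treat "no match" as a normal outcome may want a thin wrapper. This is a sketch with a hypothetical helper name; `conn` is assumed to be a connected InfinityConnection or a compatible object.

```python
from typing import Any, List, Optional


def last_match_or_none(conn: Any, keys: List[str]) -> Optional[int]:
    """Return the last matching index, or None when the lookup raises."""
    try:
        return conn.get_match_last_index(keys)
    except Exception:
        # The reference only documents a generic Exception, so this sketch
        # cannot tell "no match" apart from other failures.
        return None
```

Swallowing every exception is a deliberate simplification here. Code that needs to distinguish connection errors from a missing match would have to inspect the exception itself.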

local_gpu_write_cache(cache: torch.Tensor, blocks: List[Tuple[str, int]], page_size: int)

Writes a tensor to the local GPU cache.

Parameters:
  • cache (torch.Tensor) – The tensor to be written to the cache.

  • blocks (List[Tuple[str, int]]) – A list of tuples where each tuple contains a key and an offset.

  • page_size (int) – The size of each page in the cache.

Raises:

Exception – If writing to infinistore fails.

Returns:

Returns 0 on success.

Return type:

int
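To make the shape of the blocks argument concrete, the sketch below builds key/offset pairs for pages laid out back to back in a tensor. The helper name, the key strings, and the assumption that offsets use the same unit as page_size are illustrative and not taken from the library.

```python
from typing import List, Tuple


def contiguous_blocks(keys: List[str], page_size: int,
                      start: int = 0) -> List[Tuple[str, int]]:
    """Pair each key with the offset of its page, assuming contiguous pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Page i begins page_size units after page i - 1.
    return [(key, start + i * page_size) for i, key in enumerate(keys)]


blocks = contiguous_blocks(["page_a", "page_b", "page_c"], page_size=4096)
# blocks == [("page_a", 0), ("page_b", 4096), ("page_c", 8192)]
```

A list built this way could then be passed as the blocks argument together with the same page_size.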

rdma_write_cache(cache: torch.Tensor, offsets: List[int], page_size, remote_blocks: List)

Writes the given cache tensor to remote memory using RDMA (Remote Direct Memory Access).

Parameters:
  • cache (torch.Tensor) – The tensor containing the data to be written to remote memory.

  • offsets (List[int]) – A list of offsets (in elements) where the data should be written.

  • page_size (int) – The size of each page to be written, in elements.

  • remote_blocks (List) – A list of remote memory blocks where the data should be written.

Raises:
  • AssertionError – If RDMA is not connected.

  • Exception – If the RDMA write operation fails.

Returns:

Returns 0 on success.

Return type:

int
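The RDMA write path described in this reference (allocate remote memory, then write to it) might be combined as in the sketch below. Several points are assumptions rather than documented behaviour: the helper name `rdma_write_pages`, calling register_mr before the transfer, calling sync afterwards to wait for completion, and laying pages out contiguously in `cache`. The element size is passed in explicitly; for a torch.Tensor it could come from `cache.element_size()`.

```python
from typing import Any, List


def rdma_write_pages(conn: Any, cache: Any, keys: List[str],
                     page_size: int, element_size: int) -> int:
    """Write len(keys) contiguous pages of `cache` to remote memory."""
    conn.register_mr(cache)  # assumed to be required before RDMA transfers
    # allocate_rdma takes a size in bytes; rdma_write_cache takes elements.
    remote_blocks = conn.allocate_rdma(keys, page_size * element_size)
    offsets = [i * page_size for i in range(len(keys))]
    ret = conn.rdma_write_cache(cache, offsets, page_size, remote_blocks)
    conn.sync()  # assumed: waits for outstanding writes to finish
    return ret
```

Note the unit conversion: page_size is given in elements for the write but in bytes for the allocation.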

read_cache(cache: torch.Tensor, blocks: List[Tuple[str, int]], page_size: int)

Reads data from the cache using either local or RDMA connection.

Parameters:
  • cache (torch.Tensor) – The tensor containing the cache data.

  • blocks (List[Tuple[str, int]]) – A list of tuples where each tuple contains a key and an offset. Each pair represents a page to be written to. The page has a fixed size, specified by the page_size parameter.

  • page_size (int) – The size of the page to read.

Raises:

Exception – If the read operation fails or if not connected to any instance.

register_mr(cache: torch.Tensor)

Registers a memory region for RDMA (Remote Direct Memory Access) operations.

Parameters:

cache (torch.Tensor) – The tensor whose memory region is to be registered.

Returns:

A positive integer indicating the registration was successful.

Return type:

int

Raises:

Exception – If RDMA is not connected or if the memory region registration fails.

sync()

Synchronizes the current instance with the connected Infinistore instance. This method attempts to synchronize the current instance using either a local connection or an RDMA connection. If neither connection is available, it raises an exception.

Raises:
  • Exception – If not connected to any instance.

  • Exception – If synchronization fails with a negative return code.

class infinistore.ServerConfig(*args: Any, **kwargs: Any)

ServerConfig is a configuration class for the server settings.

manage_port

The port used for management. Defaults to 0.

Type:

int

service_port

The port used for service. Defaults to 0.

Type:

int

log_level

The logging level. Defaults to “warning”.

Type:

str

dev_name

The device name. Defaults to “mlx5_1”.

Type:

str

ib_port

The InfiniBand port number. Defaults to 1.

Type:

int

link_type

The type of link. Defaults to “IB”.

Type:

str

prealloc_size

The preallocation size. Defaults to 16.

Type:

int

minimal_allocate_size

The minimal allocation size. Defaults to 64.

Type:

int

num_stream

The number of streams. Defaults to 1.

Type:

int

infinistore.register_server(loop, config: ServerConfig)

Registers a server with the given event loop and configuration.

This function is intended to be used internally and should not be called by clients directly.

Parameters:
  • loop – The event loop to register the server with.

  • config (ServerConfig) – The configuration for the server.

Raises:

Exception – If the server registration fails.