API Reference¶
- class infinistore.ClientConfig(*args: Any, **kwargs: Any)¶
ClientConfig is a configuration class for the Infinistore client.
- connection_type¶
The type of connection to use (e.g., TYPE_LOCAL_GPU, TYPE_RDMA).
- Type:
str
- host_addr¶
The address of the host.
- Type:
str
- dev_name¶
The name of the device (default is “mlx5_1”).
- Type:
str
- ib_port¶
The port number of the InfiniBand device (default is 1).
- Type:
int
- link_type¶
The type of link (default is “IB”).
- Type:
str
- service_port¶
The port number of the service.
- Type:
int
- log_level¶
The logging level (default is “warning”).
- Type:
str
- class infinistore.DisableTorchCaching¶
Context manager to disable PyTorch CUDA memory caching.
When this context manager is entered, it sets the environment variable “PYTORCH_NO_CUDA_MEMORY_CACHING” to “1”, which disables CUDA memory caching in PyTorch. When the context manager is exited, the environment variable is deleted, restoring the default behavior.
- Usage:
    with DisableTorchCaching():
        ...  # Your code here
- class infinistore.InfinityConnection(config: ClientConfig)¶
A class to manage connections and data transfers with an Infinistore instance using either local or RDMA connections.
- conn¶
The connection object to the Infinistore instance.
- Type:
_infinistore.Connection
- local_connected¶
Indicates if connected to a local instance.
- Type:
bool
- rdma_connected¶
Indicates if connected to a remote instance via RDMA.
- Type:
bool
- config¶
Configuration object for the connection.
- Type:
ClientConfig
- allocate_rdma(keys: List[str], page_size_in_bytes: int)¶
Allocates RDMA memory for the given keys. For RDMA writes, the user must first allocate RDMA memory and then use the returned RDMA memory addresses to write data to the remote memory.
- Parameters:
keys (List[str]) – A list of keys for which RDMA memory is to be allocated.
page_size_in_bytes (int) – The size of each page in bytes.
- Returns:
A list of allocated RDMA memory addresses.
- Return type:
List
- Raises:
Exception – If RDMA is not connected.
Exception – If memory allocation fails.
- check_exist(key: str)¶
Check if a given key exists in the store.
- Parameters:
key (str) – The key to check for existence.
- Returns:
True if the key exists, False otherwise.
- Return type:
bool
- Raises:
Exception – If there is an error checking the key’s existence.
- connect()¶
Establishes a connection to the Infinistore instance based on the configuration.
- Raises:
Exception – If already connected to a local instance.
Exception – If already connected to a remote instance.
Exception – If failed to initialize remote connection.
Exception – If local GPU connection is not to localhost.
Exception – If failed to set up RDMA connection.
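One of the failure cases listed above, a local GPU connection that does not target localhost, can be caught before calling connect(). The following is a rough sketch under stated assumptions: `ClientConfigSketch` is a hypothetical mirror of the documented ClientConfig attributes, the two `TYPE_*` values are placeholders for the real infinistore constants, and the set of host strings treated as localhost is a guess.

```python
from dataclasses import dataclass

# Placeholder values: the real TYPE_LOCAL_GPU / TYPE_RDMA constants come from
# infinistore, and their actual values are not shown in this reference.
TYPE_LOCAL_GPU = "LOCAL_GPU"
TYPE_RDMA = "RDMA"

# Assumption: which host strings count as "localhost" is not documented.
LOCAL_HOSTS = {"127.0.0.1", "localhost"}


@dataclass
class ClientConfigSketch:
    """Hypothetical mirror of the documented ClientConfig attributes."""
    connection_type: str
    host_addr: str
    service_port: int
    dev_name: str = "mlx5_1"    # documented default
    ib_port: int = 1            # documented default
    link_type: str = "IB"       # documented default
    log_level: str = "warning"  # documented default


def precheck(config: ClientConfigSketch) -> None:
    """Raise ValueError if a local GPU config does not point at localhost."""
    if config.connection_type == TYPE_LOCAL_GPU and config.host_addr not in LOCAL_HOSTS:
        raise ValueError("local GPU connection must target localhost")


ok = ClientConfigSketch(TYPE_LOCAL_GPU, "127.0.0.1", 12345)  # port is made up
bad = ClientConfigSketch(TYPE_LOCAL_GPU, "10.0.0.5", 12345)
precheck(ok)
try:
    precheck(bad)
    rejected = False
except ValueError:
    rejected = True
```

The remaining cases (already connected, failed initialization, failed RDMA setup) depend on runtime state and would still need to be handled around the real connect() call.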
- get_match_last_index(keys: List[str])¶
Retrieve the last index of a match for the given keys.
- Parameters:
keys (List[str]) – A list of string keys to search for matches.
- Returns:
The last index of a match.
- Return type:
int
- Raises:
Exception – If no match is found (i.e., if the return value is negative).
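The description above does not spell out what counts as a match. One plausible reading, and it is only an assumption, is that the method returns the largest index i for which keys[i] is present in the store. The toy model below implements that reading against a plain Python set; `match_last_index_sketch` is a hypothetical name and none of this is taken from the infinistore source.

```python
from typing import Iterable, List


def match_last_index_sketch(stored: Iterable[str], keys: List[str]) -> int:
    """Toy model: largest i such that keys[i] is in `stored` (assumed semantics)."""
    stored_set = set(stored)
    # Scan from the end so the first hit is the last matching index.
    for i in range(len(keys) - 1, -1, -1):
        if keys[i] in stored_set:
            return i
    # Mirrors the documented behaviour of raising when no match is found.
    raise Exception("no match found")


stored_keys = {"k0", "k1", "k2"}  # stands in for keys held by the store
last = match_last_index_sketch(stored_keys, ["k0", "k1", "k2", "k3"])
```

Under this reading, a caller would wrap the real call in try/except to treat "no match" as an empty result rather than a fatal error.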
- local_gpu_write_cache(cache: torch.Tensor, blocks: List[Tuple[str, int]], page_size: int)¶
Writes a tensor to the local GPU cache.
- Parameters:
cache (torch.Tensor) – The tensor to be written to the cache.
blocks (List[Tuple[str, int]]) – A list of tuples where each tuple contains a key and an offset.
page_size (int) – The size of each page in the cache.
- Raises:
Exception – If writing to infinistore fails.
- Returns:
Returns 0 on success.
- Return type:
int
- rdma_write_cache(cache: torch.Tensor, offsets: List[int], page_size, remote_blocks: List)¶
Writes the given cache tensor to remote memory using RDMA (Remote Direct Memory Access).
- Parameters:
cache (torch.Tensor) – The tensor containing the data to be written to remote memory.
offsets (List[int]) – A list of offsets (in elements) where the data should be written.
page_size (int) – The size of each page to be written, in elements.
remote_blocks (List) – A list of remote memory blocks where the data should be written.
- Raises:
AssertionError – If RDMA is not connected.
Exception – If the RDMA write operation fails.
- Returns:
Returns 0 on success.
- Return type:
int
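Note that rdma_write_cache takes offsets and page_size in elements, while allocate_rdma takes page_size_in_bytes. The conversion can be sketched as below; `plan_pages` is a hypothetical helper, and the layout of one page per consecutive page_size elements is an assumption made for illustration.

```python
from typing import List, Tuple


def plan_pages(num_elements: int, page_size: int, element_size: int) -> Tuple[List[int], int]:
    """Return (offsets in elements, page size in bytes) for whole pages only."""
    if page_size <= 0 or element_size <= 0:
        raise ValueError("page_size and element_size must be positive")
    num_pages = num_elements // page_size  # a trailing partial page is ignored
    offsets = [i * page_size for i in range(num_pages)]
    return offsets, page_size * element_size


# 4096 elements of 2 bytes each (e.g. float16), split into pages of 1024 elements.
offsets, page_bytes = plan_pages(4096, 1024, 2)
# offsets    -> [0, 1024, 2048, 3072]
# page_bytes -> 2048
```

For a torch.Tensor, the bytes per element are available as `cache.element_size()`.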
- read_cache(cache: torch.Tensor, blocks: List[Tuple[str, int]], page_size: int)¶
Reads data from the cache using either local or RDMA connection.
- Parameters:
cache (torch.Tensor) – The tensor containing the cache data.
blocks (List[Tuple[str, int]]) – A list of tuples where each tuple contains a key and an offset. Each pair represents a page to be written to. The page has a fixed size, which is specified by the page_size parameter.
page_size (int) – The size of the page to read.
- Raises:
Exception – If the read operation fails or if not connected to any instance.
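The (key, offset) convention used by blocks can be illustrated with a toy model in which a Python list stands in for the tensor and a dict stands in for the store. This is a sketch of the addressing scheme only, assuming offsets are counted in elements; `write_pages`, `read_pages`, and the key names are hypothetical.

```python
from typing import Dict, List, Tuple


def write_pages(store: Dict[str, list], cache: list,
                blocks: List[Tuple[str, int]], page_size: int) -> None:
    """Copy one fixed-size page per (key, offset) pair from cache into store."""
    for key, offset in blocks:
        store[key] = cache[offset:offset + page_size]


def read_pages(store: Dict[str, list], cache: list,
               blocks: List[Tuple[str, int]], page_size: int) -> None:
    """Copy the page stored under each key into cache at the paired offset."""
    for key, offset in blocks:
        if key not in store:
            raise KeyError(key)
        cache[offset:offset + page_size] = store[key]


page_size = 4
src = list(range(8))                     # stands in for a source tensor
blocks = [("page-a", 0), ("page-b", 4)]  # hypothetical keys
store: Dict[str, list] = {}              # stands in for the Infinistore instance
write_pages(store, src, blocks, page_size)

dst = [0] * 8                            # stands in for a destination tensor
# Read the pages back in swapped positions: offsets are chosen per block.
read_pages(store, dst, [("page-a", 4), ("page-b", 0)], page_size)
```

The point of the swap is that a key identifies the page's content, while the offset only says where in the caller's tensor that page lives.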
- register_mr(cache: torch.Tensor)¶
Registers a memory region for RDMA (Remote Direct Memory Access) operations.
- Parameters:
cache (torch.Tensor) – The tensor whose memory region is to be registered.
- Returns:
A positive integer indicating the registration was successful.
- Return type:
int
- Raises:
Exception – If RDMA is not connected or if the memory region registration fails.
- sync()¶
Synchronizes the current instance with the connected infinistore instance. This method attempts to synchronize the current instance using either a local connection or an RDMA connection. If neither connection is available, it raises an exception.
- Raises:
Exception – If not connected to any instance.
Exception – If synchronization fails with a negative return code.
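Taken together, the RDMA-related methods documented above suggest a write sequence of register_mr, allocate_rdma, rdma_write_cache, and sync. The sketch below shows that sequence against a duck-typed connection object. The ordering of register_mr before allocate_rdma is an assumption, `rdma_write_pages` is a hypothetical helper, and `RecordingConnection` is a stub that records calls and performs no RDMA, so the example runs without infinistore or RDMA hardware.

```python
from typing import Any, List


def rdma_write_pages(conn: Any, cache: Any, keys: List[str],
                     page_size: int, element_size: int) -> None:
    """Write one page per key; `conn` needs the four methods used below."""
    offsets = [i * page_size for i in range(len(keys))]  # in elements
    conn.register_mr(cache)
    # allocate_rdma expects the page size in bytes, not elements.
    remote_blocks = conn.allocate_rdma(keys, page_size * element_size)
    conn.rdma_write_cache(cache, offsets, page_size, remote_blocks)
    conn.sync()


class RecordingConnection:
    """Hypothetical stub that records calls; it performs no RDMA at all."""

    def __init__(self):
        self.calls = []

    def register_mr(self, cache):
        self.calls.append("register_mr")
        return 1

    def allocate_rdma(self, keys, page_size_in_bytes):
        self.calls.append(("allocate_rdma", page_size_in_bytes))
        return [f"remote-{k}" for k in keys]  # fake block handles

    def rdma_write_cache(self, cache, offsets, page_size, remote_blocks):
        self.calls.append(("rdma_write_cache", offsets, remote_blocks))
        return 0

    def sync(self):
        self.calls.append("sync")


stub = RecordingConnection()
rdma_write_pages(stub, cache=[0.0] * 8, keys=["a", "b"], page_size=4, element_size=2)
```

With a real InfinityConnection in place of the stub, each of these calls can raise as documented, so production code would wrap the sequence in error handling.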
- class infinistore.ServerConfig(*args: Any, **kwargs: Any)¶
ServerConfig is a configuration class for the server settings.
- manage_port¶
The port used for management. Defaults to 0.
- Type:
int
- service_port¶
The port used for service. Defaults to 0.
- Type:
int
- log_level¶
The logging level. Defaults to “warning”.
- Type:
str
- dev_name¶
The device name. Defaults to “mlx5_1”.
- Type:
str
- ib_port¶
The InfiniBand port number. Defaults to 1.
- Type:
int
- link_type¶
The type of link. Defaults to “IB”.
- Type:
str
- prealloc_size¶
The preallocation size. Defaults to 16.
- Type:
int
- minimal_allocate_size¶
The minimal allocation size. Defaults to 64.
- Type:
int
- num_stream¶
The number of streams. Defaults to 1.
- Type:
int
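The documented defaults can be collected in one place and merged with overrides, as in the sketch below. `SERVER_DEFAULTS` and `server_settings` are hypothetical names, and the port numbers are made up. Whether the resulting mapping can be passed as `ServerConfig(**settings)` is an assumption, since the constructor is only documented as taking `*args` and `**kwargs`.

```python
# Defaults copied from the attribute list above.
SERVER_DEFAULTS = {
    "manage_port": 0,
    "service_port": 0,
    "log_level": "warning",
    "dev_name": "mlx5_1",
    "ib_port": 1,
    "link_type": "IB",
    "prealloc_size": 16,
    "minimal_allocate_size": 64,
    "num_stream": 1,
}


def server_settings(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(SERVER_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown ServerConfig attribute(s): {sorted(unknown)}")
    return {**SERVER_DEFAULTS, **overrides}


settings = server_settings(service_port=22345, manage_port=18080)  # made-up ports
```

Rejecting unknown names catches misspelled attributes early instead of silently falling back to a default.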
- infinistore.register_server(loop, config: ServerConfig)¶
Registers a server with the given event loop and configuration.
This function is intended to be used internally and should not be called by clients directly.
- Parameters:
loop – The event loop to register the server with.
config (ServerConfig) – The configuration for the server.
- Raises:
Exception – If the server registration fails.