Lin

class gobrec.mabs.lin_mabs.lin.Lin(seed: int | None = None, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)

Linear Multi-Armed Bandit (MAB) Algorithm.

This class implements a linear MAB algorithm that uses ridge regression to estimate the expected reward of each arm. It is the base class for other linear MAB algorithms such as LinUCB and LinTS. It can also be used directly, in which case it applies no exploration strategy (exploit only).
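The ridge-regression estimate can be sketched without any dependencies. The sketch below assumes one independent model per arm, with theta = (X^T X + l2_lambda * I)^-1 X^T r computed from that arm's samples only. This assumption is consistent with the output shown in the Examples section of this page, but it is not taken from the library's source. The helper names `solve`, `ridge_per_arm`, and `score` are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_per_arm(contexts, decisions, rewards, l2_lambda=1.0):
    """Return {arm: theta}, fitting one ridge model per arm (an assumption)."""
    d = len(contexts[0])
    stats = {}
    for x, arm, r in zip(contexts, decisions, rewards):
        if arm not in stats:
            # A starts as l2_lambda * I and b as the zero vector.
            a0 = [[l2_lambda if i == j else 0.0 for j in range(d)] for i in range(d)]
            stats[arm] = (a0, [0.0] * d)
        a, b = stats[arm]
        for i in range(d):
            b[i] += r * x[i]              # b += r * x
            for j in range(d):
                a[i][j] += x[i] * x[j]    # A += x x^T
    return {arm: solve(a, b) for arm, (a, b) in stats.items()}


def score(theta, x):
    """Expected reward of one arm for one context: the dot product theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))


# Same data as the example in this page.
contexts = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 3
decisions = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
rewards = [10, 0, 1, 1, 10, 0, 0, 1, 10]

thetas = ridge_per_arm(contexts, decisions, rewards)
first_row = [score(thetas[arm], [1, 0, 0]) for arm in sorted(thetas)]
print(first_row)  # [5.0, 0.5, 0.0]
```

With `l2_lambda = 1.0` and one-hot contexts, each arm's matrix is 2I, so every coefficient is half the observed reward. That is why a reward of 10 yields a score of 5.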

Examples

A simple example using Lin to generate scores for items.

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import Lin
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> lin_mab = Lin()
>>> lin_mab.fit(contexts, decisions, rewards)
>>> lin_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[5.0000, 0.5000, 0.0000],
        [0.0000, 5.0000, 0.5000],
        [0.5000, 0.0000, 5.0000]], dtype=torch.float64)

Attributes:
l2_lambda : float

Regularization parameter for ridge regression.

device : str

Device to use for computations ('cpu' or 'cuda').

items_per_batch : int

Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.
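The memory and speed trade-off of `items_per_batch` can be illustrated with a minimal sketch of index batching. How the library actually splits its work is not shown in this page, so the helper `iter_batches` is hypothetical and only demonstrates the general idea of processing items in fixed-size chunks.

```python
def iter_batches(n_items, items_per_batch=10000):
    """Yield (start, stop) index pairs that cover range(n_items) in order."""
    if items_per_batch <= 0:
        raise ValueError("items_per_batch must be positive")
    for start in range(0, n_items, items_per_batch):
        # The last batch may be shorter than items_per_batch.
        yield start, min(start + items_per_batch, n_items)


batches = list(iter_batches(25000, items_per_batch=10000))
print(batches)  # [(0, 10000), (10000, 20000), (20000, 25000)]
```

A larger batch size means fewer iterations but larger intermediate arrays per step, which matches the trade-off described for this attribute.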

Methods

fit(contexts, decisions, rewards)

Fit the Lin algorithm with contexts, decisions, and rewards.

predict(contexts)

Predict the expected rewards for each arm given the contexts.

reset()

Reset the Lin algorithm to its initial state.

fit(contexts: ndarray, decisions: ndarray, rewards: ndarray)

Fit the Lin algorithm with contexts, decisions, and rewards.

Parameters:
contexts : np.ndarray

A 2D array of shape (n_samples, n_features) in which each row is the context feature vector of one sample.

decisions : np.ndarray

A 1D array of shape (n_samples,) containing item IDs (arms, or decisions). Each element can be a string or an integer.

rewards : np.ndarray

A 1D array of shape (n_samples,) containing rewards (ratings). Values can be integers or floats.

predict(contexts: ndarray)

Predict the expected rewards for each arm given the contexts.

Parameters:
contexts : np.ndarray

A 2D array where each row represents the context features for which predictions are to be made.

Returns:
expected_rewards : torch.Tensor

A 2D tensor of shape (n_samples, n_arms) in which each element is the expected reward for the corresponding context-arm pair. Arms are identified by their encoded item IDs; to recover the original item IDs, use the label_encoder.inverse_transform method.
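Turning a score matrix into ranked item IDs can be sketched as follows. The sketch assumes that column j corresponds to the j-th item ID in sorted order, which is how scikit-learn's LabelEncoder assigns codes; whether Lin's encoder behaves the same way is not confirmed here, so real code should call label_encoder.inverse_transform as described above. The helper `top_items` is hypothetical, and the scores are copied from the example output in this page.

```python
def top_items(score_rows, item_ids, k=1):
    """Return the k best item IDs for each row of scores, best first."""
    # Assumed column order: sorted unique item IDs.
    classes = sorted(set(item_ids))
    result = []
    for row in score_rows:
        # Column indices ordered by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        result.append([classes[j] for j in ranked[:k]])
    return result


scores = [
    [5.0, 0.5, 0.0],
    [0.0, 5.0, 0.5],
    [0.5, 0.0, 5.0],
]
best = top_items(scores, ["a", "b", "c"], k=2)
print(best)  # [['a', 'b'], ['b', 'c'], ['c', 'a']]
```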

reset()

Reset the Lin algorithm to its initial state.

This method clears the label encoder and reinitializes the internal matrices used for ridge regression.

Afterwards, the algorithm can be fitted again as if it were newly created.