LinUCB

class gobrec.mabs.lin_mabs.lin_ucb.LinUCB(seed: int | None = None, alpha: float = 1.0, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)

LinUCB: Linear Upper Confidence Bound algorithm for Contextual Multi-Armed Bandits [1].

This class implements a linear MAB algorithm that uses ridge regression to estimate the expected reward of each arm. It then computes an upper confidence bound for each arm from the estimated coefficients and the design matrix.
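For reference, the description above matches the standard disjoint LinUCB formulation from [1], written here with the ridge parameter \(\lambda\) (`l2_lambda`) and the exploration weight \(\alpha\) (`alpha`). The symbols \(A_a\), \(b_a\), \(\hat{\theta}_a\) and \(D_a\) are notation introduced for this sketch, and whether the implementation follows these equations term by term is an assumption, although they are consistent with the example output in this page.

\[
A_a = \lambda I + \sum_{(x, r) \in D_a} x x^\top, \qquad
b_a = \sum_{(x, r) \in D_a} r \, x, \qquad
\hat{\theta}_a = A_a^{-1} b_a
\]

\[
\mathrm{UCB}_a(x) = x^\top \hat{\theta}_a + \alpha \sqrt{x^\top A_a^{-1} x}
\]

Here \(D_a\) is the set of (context, reward) pairs observed for arm \(a\). The first term is the ridge estimate of the reward and the second is the confidence width, which shrinks as more contexts similar to \(x\) are observed for that arm.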

Attributes:

alpha (float): Controls the width of the confidence interval. Higher values lead to more exploration.
l2_lambda (float): Regularization parameter for ridge regression.
device (str): Device to use for computations ('cpu' or 'cuda').
items_per_batch (int): Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.

Methods

`fit`(contexts, decisions, rewards)	Fit the Lin algorithm with contexts, decisions, and rewards.
`predict`(contexts)	Predict the expected rewards for each arm given the contexts.
`reset`()	Reset the Lin algorithm to its initial state.

References

[1] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 661-670, New York, NY, USA, 2010. Association for Computing Machinery. doi: 10.1145/1772690.1772758.

Examples

An example using LinUCB with \(\alpha = 0.5\).

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinUCB
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> linucb_mab = LinUCB(seed=42, alpha=0.5)
>>> linucb_mab.fit(contexts, decisions, rewards)
>>> linucb_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[5.3536, 0.8536, 0.3536],
        [0.3536, 5.3536, 0.8536],
        [0.8536, 0.3536, 5.3536]], dtype=torch.float64)
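The scores above can be checked by hand. The following is a minimal NumPy sketch of disjoint LinUCB (one ridge regression per arm plus an exploration bonus). It is not the library's implementation, and `linucb_scores` is a hypothetical helper written for this illustration. It assumes that columns follow the sorted order of the arm labels.

```python
import numpy as np

def linucb_scores(contexts, decisions, rewards, queries, alpha=1.0, l2_lambda=1.0):
    """Hypothetical helper: per-arm ridge estimate plus a UCB bonus."""
    contexts = np.asarray(contexts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    queries = np.asarray(queries, dtype=float)
    decisions = np.asarray(decisions)

    arms = np.unique(decisions)  # sorted labels; assumed column order
    n_features = contexts.shape[1]
    scores = np.empty((queries.shape[0], arms.size))

    for j, arm in enumerate(arms):
        mask = decisions == arm
        X, r = contexts[mask], rewards[mask]
        A = l2_lambda * np.eye(n_features) + X.T @ X   # regularized design matrix
        A_inv = np.linalg.inv(A)
        theta = A_inv @ (X.T @ r)                      # ridge coefficients
        mean = queries @ theta                         # estimated reward
        # Confidence width sqrt(x^T A^-1 x), computed row by row.
        width = np.sqrt(np.einsum("ij,jk,ik->i", queries, A_inv, queries))
        scores[:, j] = mean + alpha * width
    return arms, scores

contexts = np.tile(np.eye(3), (3, 1))
decisions = np.repeat(["a", "b", "c"], 3)
rewards = np.array([10, 0, 1, 1, 10, 0, 0, 1, 10])

arms, scores = linucb_scores(contexts, decisions, rewards, np.eye(3), alpha=0.5)
print(scores.round(4))
```

For arm 'a' and context [1, 0, 0], the design matrix is 2I, so the ridge estimate is 10 / 2 = 5 and the bonus is 0.5 * sqrt(1 / 2) ≈ 0.3536, which gives the 5.3536 in the first entry of the output.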

predict(contexts: ndarray)

Predict the expected rewards for each arm given the contexts.

Parameters:

contexts (np.ndarray): A 2D array where each row holds the context features for one prediction.

Returns:

expected_rewards (torch.Tensor): A 2D tensor of shape (n_samples, n_arms) where each element is the expected reward for the corresponding context-arm pair. Columns are indexed by encoded item IDs. To recover the original item IDs, use the label_encoder.inverse_transform method.
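As a rough illustration of turning the returned score matrix into recommendations, the sketch below picks the best and top-2 columns per context and maps them back to item labels. It makes several assumptions: a NumPy array stands in for the returned tensor, and a plain `classes` array stands in for the label encoder, with encoded ID j taken to be the j-th label in sorted order. With the library itself, the mapping step would go through label_encoder.inverse_transform as described above.

```python
import numpy as np

# Stand-in for the predict output from the class example (a torch.Tensor there).
scores = np.array([[5.3536, 0.8536, 0.3536],
                   [0.3536, 5.3536, 0.8536],
                   [0.8536, 0.3536, 5.3536]])

# Assumption: encoded ID j corresponds to the j-th label in sorted order.
classes = np.array(["a", "b", "c"])

# Best arm per context: index of the highest score in each row.
best_encoded = scores.argmax(axis=1)
best_items = classes[best_encoded]

# Top-2 arms per context, highest score first.
top2_encoded = np.argsort(-scores, axis=1)[:, :2]
top2_items = classes[top2_encoded]

print(best_items.tolist())  # ['a', 'b', 'c']
```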