LinTS#

class gobrec.mabs.lin_mabs.lin_ts.LinTS(seed: int | None = None, alpha: float = 1.0, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)[source]#

LinTS: Linear Thompson Sampling algorithm [1].

This class implements a linear contextual MAB algorithm that uses ridge regression to estimate the expected reward for each arm. At prediction time, it samples coefficients from a multivariate normal distribution whose mean is the ridge estimate and whose covariance is proportional to the inverse of the design matrix, then scores each arm with the sampled coefficients.
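The per-arm update and sampling step can be sketched in plain NumPy. This is a minimal illustration of the mechanism described above, not the library's internals; the variable names (`A`, `b`, `theta_hat`) are illustrative, and the data mirrors arm `'a'` from the example below.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, l2_lambda, d = 0.5, 1.0, 3

# Per-arm ridge-regression state: A = l2_lambda * I + X^T X, b = X^T r.
A = l2_lambda * np.eye(d)
b = np.zeros(d)

# Observed (context, reward) pairs for one arm.
contexts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rewards = np.array([10.0, 0.0, 1.0])
A += contexts.T @ contexts
b += contexts.T @ rewards

# Ridge estimate and Thompson sample of the coefficients.
A_inv = np.linalg.inv(A)
theta_hat = A_inv @ b  # [5.0, 0.0, 0.5] with l2_lambda = 1
theta_sample = rng.multivariate_normal(theta_hat, alpha**2 * A_inv)

# Expected reward for a new context under the sampled coefficients.
score = np.array([1.0, 0.0, 0.0]) @ theta_sample
```

A larger `alpha` inflates the sampling covariance, so `theta_sample` strays further from `theta_hat` and the arm is explored more aggressively.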

References

[1]

Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In Proceedings of the 30th International Conference on Machine Learning, ICML’13, pages 1220-1228, New York, NY, USA, 2013. JMLR.org. doi: 10.48550/arXiv.1209.3352.

Examples

An example using LinTS with \(\alpha = 0.5\).

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinTS
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> lints_mab = LinTS(seed=42, alpha=0.5)
>>> lints_mab.fit(contexts, decisions, rewards)
>>> lints_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[ 5.1077,  0.6077,  0.1077],
        [-0.6898,  4.3102, -0.1898],
        [ 0.4941, -0.0059,  4.9941]], dtype=torch.float64)

Attributes:

alpha : float

Controls the variance of the sampling distribution. Higher values lead to more exploration.

l2_lambda : float

Regularization parameter for ridge regression.

device : str

Device to use for computations ('cpu' or 'cuda').

items_per_batch : int

Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.

Methods

fit(contexts, decisions, rewards)

Fit the LinTS algorithm with contexts, decisions, and rewards.

predict(contexts)

Predict the expected rewards for each arm given the contexts.

reset()

Reset the LinTS algorithm to its initial state.

predict(contexts: ndarray)[source]#

Predict the expected rewards for each arm given the contexts.

Parameters:
contexts : np.ndarray

A 2D array where each row represents the context features for which predictions are to be made.

Returns:
expected_rewards : torch.Tensor

A 2D tensor of shape (n_samples, n_arms) where each element is the expected reward for the corresponding context-arm pair. The arm axis uses encoded item IDs; to recover the original item IDs, use the label_encoder.inverse_transform method.
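Since the columns of the returned tensor are encoded item IDs, a common follow-up is to take the argmax per row and map it back to the original labels. A minimal NumPy sketch of that mapping, assuming the encoder assigns IDs by sorted label order as sklearn's LabelEncoder does (the `expected_rewards` values here are illustrative, not real predict() output):

```python
import numpy as np

# Arms seen during fit; encoded arm ids are column indices into predict()'s output.
decisions = np.array(['a', 'b', 'c', 'a', 'b', 'c'])
classes = np.unique(decisions)  # what a LabelEncoder-style encoder stores

# Hypothetical predict() output of shape (n_samples, n_arms).
expected_rewards = np.array([[5.1, 0.6, 0.1],
                             [-0.7, 4.3, -0.2],
                             [0.5, 0.0, 5.0]])

best_encoded = expected_rewards.argmax(axis=1)  # encoded item ids per row
best_items = classes[best_encoded]              # original item ids: ['a', 'b', 'c']
```

With the library itself, the same round trip would go through `lints_mab.label_encoder.inverse_transform(best_encoded)` on the tensor's argmax.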