LinTS#

class gobrec.mabs.lin_mabs.lin_ts.LinTS(seed: int | None = None, alpha: float = 1.0, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)[source]#

LinTS: Linear Thompson Sampling algorithm [1].

This class implements a linear contextual MAB algorithm that uses ridge regression to estimate the expected reward for each arm. At prediction time, it samples coefficients from a multivariate normal distribution whose mean is the ridge estimate and whose covariance is proportional to the inverse of the design matrix, then scores each arm with the sampled coefficients.
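The per-arm update and sampling step can be sketched in plain NumPy. This is a minimal illustration of the mechanism described above, not the library's internals; the variable names (`A`, `b`, `theta_hat`) are illustrative, and the data mirrors arm `'a'` from the example below.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, l2_lambda, d = 0.5, 1.0, 3

# Per-arm ridge-regression state: A = l2_lambda * I + X^T X, b = X^T r.
A = l2_lambda * np.eye(d)
b = np.zeros(d)

# Observed (context, reward) pairs for one arm.
contexts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rewards = np.array([10.0, 0.0, 1.0])
A += contexts.T @ contexts
b += contexts.T @ rewards

# Ridge estimate and Thompson sample of the coefficients.
A_inv = np.linalg.inv(A)
theta_hat = A_inv @ b  # [5.0, 0.0, 0.5] with l2_lambda = 1
theta_sample = rng.multivariate_normal(theta_hat, alpha**2 * A_inv)

# Expected reward for a new context under the sampled coefficients.
score = np.array([1.0, 0.0, 0.0]) @ theta_sample
```

A larger `alpha` inflates the sampling covariance, so `theta_sample` strays further from `theta_hat` and the arm is explored more aggressively.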

References

[1]

Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In Proceedings of the 30th International Conference on Machine Learning, ICML’13, pages 1220-1228, New York, NY, USA, 2013. JMLR.org. doi: 10.48550/arXiv.1209.3352.

Examples

An example using LinTS with \(\alpha = 0.5\).

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinTS
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> lints_mab = LinTS(seed=42, alpha=0.5)
>>> lints_mab.fit(contexts, decisions, rewards)
>>> lints_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[ 5.1077,  0.6077,  0.1077],
        [-0.6898,  4.3102, -0.1898],
        [ 0.4941, -0.0059,  4.9941]], dtype=torch.float64)

Attributes:

alpha : float

Controls the variance of the sampling distribution. Higher values lead to more exploration.

l2_lambda : float

Regularization parameter for ridge regression.

device : str

Device to use for computations ('cpu' or 'cuda').

items_per_batch : int

Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.

Methods

fit(contexts, decisions, rewards)

Fit the LinTS algorithm with contexts, decisions, and rewards.

predict(contexts)

Predict the expected rewards for each arm given the contexts.

reset()

Reset the LinTS algorithm to its initial state.

predict(contexts: ndarray)[source]#

Predict the expected rewards for each arm given the contexts.

Parameters:
contexts : np.ndarray

A 2D array where each row represents the context features for which predictions are to be made.

Returns:
expected_rewards : torch.Tensor

A 2D tensor of shape (n_samples, n_arms) where each element is the expected reward for the corresponding context-arm pair. The arm axis uses encoded item IDs; to recover the original item IDs, use the label_encoder.inverse_transform method.
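Since the columns of the returned tensor are encoded item IDs, a common follow-up is to take the argmax per row and map it back to the original labels. A minimal NumPy sketch of that mapping, assuming the encoder assigns IDs by sorted label order as sklearn's LabelEncoder does (the `expected_rewards` values here are illustrative, not real predict() output):

```python
import numpy as np

# Arms seen during fit; encoded arm ids are column indices into predict()'s output.
decisions = np.array(['a', 'b', 'c', 'a', 'b', 'c'])
classes = np.unique(decisions)  # what a LabelEncoder-style encoder stores

# Hypothetical predict() output of shape (n_samples, n_arms).
expected_rewards = np.array([[5.1, 0.6, 0.1],
                             [-0.7, 4.3, -0.2],
                             [0.5, 0.0, 5.0]])

best_encoded = expected_rewards.argmax(axis=1)  # encoded item ids per row
best_items = classes[best_encoded]              # original item ids: ['a', 'b', 'c']
```

With the library itself, the same round trip would go through `lints_mab.label_encoder.inverse_transform(best_encoded)` on the tensor's argmax.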