LinUCB
- class gobrec.mabs.lin_mabs.lin_ucb.LinUCB(seed: int | None = None, alpha: float = 1.0, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)
LinUCB: Linear Upper Confidence Bound algorithm for Contextual Multi-Armed Bandits [1].
This class implements a linear MAB algorithm that uses ridge regression to estimate the expected rewards for each arm. Then, it calculates the upper confidence bound for each arm based on the estimated coefficients and the design matrix.
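As a sketch of the computation described above, the standard per-arm (disjoint) LinUCB formulation from [1] can be written as follows. This assumes one ridge regression per arm \(a\), with \(X_a\) the matrix of contexts observed for that arm, \(r_a\) the corresponding rewards, \(\lambda\) standing for `l2_lambda`, and \(\alpha\) for `alpha`. The symbols are illustrative and not taken from the library's source.

\[
A_a = \lambda I + X_a^\top X_a, \qquad b_a = X_a^\top r_a, \qquad \hat{\theta}_a = A_a^{-1} b_a
\]

\[
\mathrm{UCB}_a(x) = x^\top \hat{\theta}_a + \alpha \sqrt{x^\top A_a^{-1} x}
\]

The first term is the ridge estimate of the reward for context \(x\). The second term grows when \(x\) lies in a direction the arm has rarely observed, which is why larger \(\alpha\) leads to more exploration.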
References
[1] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 661-670, New York, NY, USA, 2010. Association for Computing Machinery. doi: 10.1145/1772690.1772758.
Examples
An example using LinUCB with \(\alpha = 0.5\).
>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinUCB
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a',
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 ,
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> linucb_mab = LinUCB(seed=42, alpha=0.5)
>>> linucb_mab.fit(contexts, decisions, rewards)
>>> linucb_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[5.3536, 0.8536, 0.3536],
        [0.3536, 5.3536, 0.8536],
        [0.8536, 0.3536, 5.3536]], dtype=torch.float64)
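The scores in the example above can be reproduced with a few lines of plain NumPy, which may help when checking what `alpha` and `l2_lambda` do. The sketch below assumes one independent ridge regression per arm, as in the disjoint model of [1]. The function `linucb_scores` is a hypothetical helper written for illustration; it is not part of gobrec and does not reflect the library's batched or GPU implementation.

```python
import numpy as np

def linucb_scores(contexts, decisions, rewards, queries, alpha=1.0, l2_lambda=1.0):
    """Illustrative per-arm LinUCB scoring (hypothetical helper, not gobrec code)."""
    contexts = np.asarray(contexts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    queries = np.asarray(queries, dtype=float)
    arms = np.unique(decisions)  # sorted arm labels; column j belongs to arms[j]
    d = contexts.shape[1]
    scores = np.empty((queries.shape[0], arms.size))
    for j, arm in enumerate(arms):
        mask = decisions == arm
        X, r = contexts[mask], rewards[mask]
        A = l2_lambda * np.eye(d) + X.T @ X  # regularized design matrix
        b = X.T @ r
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b  # ridge regression coefficients
        mean = queries @ theta  # estimated reward per query
        # Confidence width sqrt(x^T A^-1 x), computed for every query row.
        width = np.sqrt(np.einsum("nd,de,ne->n", queries, A_inv, queries))
        scores[:, j] = mean + alpha * width
    return arms, scores

contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 3)
decisions = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])
rewards = np.array([10, 0, 1, 1, 10, 0, 0, 1, 10])

arms, scores = linucb_scores(contexts, decisions, rewards, np.eye(3), alpha=0.5)
print(arms)
print(np.round(scores, 4))
```

For arm `'a'` and context `[1, 0, 0]`, the design matrix is `2 * I`, so the ridge estimate is `10 / 2 = 5` and the bonus is `0.5 * sqrt(0.5) ≈ 0.3536`, giving the `5.3536` shown in the doctest output.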
- Attributes:
- alpha : float
Controls the width of the confidence interval. Higher values lead to more exploration.
- l2_lambda : float
Regularization parameter for ridge regression.
- device : str
Device to use for computations ('cpu' or 'cuda').
- items_per_batch : int
Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.
Methods
- fit(contexts, decisions, rewards): Fit the Lin algorithm with contexts, decisions, and rewards.
- predict(contexts): Predict the expected rewards for each arm given the contexts.
- reset(): Reset the Lin algorithm to its initial state.
- predict(contexts: ndarray)
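Predict the expected rewards for each arm given the contexts. In the class example, the returned values include the exploration bonus controlled by alpha (for instance 5.3536, where the ridge estimate alone is 5.0), so they are upper-confidence-bound scores rather than plain reward estimates.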
- Parameters:
- contexts : np.ndarray
A 2D array where each row represents the context features for which predictions are to be made.
- Returns:
- expected_rewards : torch.Tensor
A 2D tensor of shape (n_samples, n_arms) where each element is the expected reward for the corresponding context-arm pair. Columns are indexed by encoded item IDs, not the original ones. To recover the original item IDs, use the label_encoder.inverse_transform method.
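To illustrate how the encoded column indices relate to the original item IDs, here is a minimal NumPy sketch of turning a score matrix into a top-k list of item labels. It makes two assumptions that are not confirmed by this page: the scores have already been converted to a NumPy array (for a torch tensor, `.cpu().numpy()` would do this), and the encoder assigns indices in sorted label order, as scikit-learn's `LabelEncoder` does. The `classes` array and `top_k_items` function are hypothetical stand-ins, not part of gobrec.

```python
import numpy as np

# Scores copied from the class example: rows are contexts, columns are encoded arms.
scores = np.array([[5.3536, 0.8536, 0.3536],
                   [0.3536, 5.3536, 0.8536],
                   [0.8536, 0.3536, 5.3536]])

# Hypothetical stand-in for the fitted encoder: index i maps back to classes[i].
classes = np.array(['a', 'b', 'c'])

def top_k_items(scores, classes, k=2):
    """Return the k highest-scoring item labels for every row of `scores`."""
    order = np.argsort(-scores, axis=1)[:, :k]  # encoded IDs, best first
    return classes[order]  # map encoded IDs back to original labels

top2 = top_k_items(scores, classes, k=2)
print(top2)
```

With the real estimator, the index lookup `classes[order]` would be replaced by a call to `label_encoder.inverse_transform` on the encoded IDs.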