LinGreedy

class gobrec.mabs.lin_mabs.lin_greedy.LinGreedy(seed: int | None = None, epsilon: float = 0.1, l2_lambda: float = 1.0, use_gpu: bool = False, items_per_batch: int = 10000)

LinGreedy: Linear Contextual Bandit with Epsilon-Greedy Exploration [1].

This class implements a linear MAB algorithm that uses ridge regression to estimate the expected reward of each arm. With probability \(\epsilon\), the returned scores are random (exploration); with probability \(1 - \epsilon\), they are the estimated expected rewards (exploitation).
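
The two steps described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: it assumes one independent ridge model per arm and one exploration coin flip per context row, and the helper names `fit_ridge_per_arm` and `epsilon_greedy_scores` are hypothetical.

```python
import numpy as np


def fit_ridge_per_arm(contexts, decisions, rewards, l2_lambda=1.0):
    """Fit one ridge regression per arm (assumed model layout).

    Returns the sorted arm labels and a (n_arms, n_features) weight matrix.
    """
    arms = np.unique(decisions)  # sorted, unique arm labels
    n_features = contexts.shape[1]
    theta = np.zeros((len(arms), n_features))
    for i, arm in enumerate(arms):
        mask = decisions == arm
        X = contexts[mask].astype(float)
        r = rewards[mask].astype(float)
        # Closed-form ridge solution: (X^T X + lambda * I) theta = X^T r
        A = X.T @ X + l2_lambda * np.eye(n_features)
        theta[i] = np.linalg.solve(A, X.T @ r)
    return arms, theta


def epsilon_greedy_scores(contexts, theta, epsilon, rng):
    """Return an (n_samples, n_arms) score matrix."""
    expected = contexts.astype(float) @ theta.T  # exploitation scores
    random_scores = rng.random(expected.shape)   # exploration scores in [0, 1)
    # Assumption: the explore/exploit decision is drawn once per context row.
    explore = rng.random(expected.shape[0]) < epsilon
    return np.where(explore[:, None], random_scores, expected)


# Same toy data as in the examples on this page.
contexts = np.tile(np.eye(3), (3, 1))
decisions = np.repeat(np.array(['a', 'b', 'c']), 3)
rewards = np.array([10, 0, 1, 1, 10, 0, 0, 1, 10])

arms, theta = fit_ridge_per_arm(contexts, decisions, rewards, l2_lambda=1.0)
rng = np.random.default_rng(42)
scores = epsilon_greedy_scores(np.eye(3), theta, epsilon=0.0, rng=rng)
explore_scores = epsilon_greedy_scores(np.eye(3), theta, epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the sketch returns the ridge estimates unchanged; with `epsilon=1.0` every row is replaced by random scores. The random values will not match the library's output, since the random generator and sampling scheme here are assumptions.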

References

[1]

John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, pages 817-824, Red Hook, NY, USA, 2007. Curran Associates Inc. doi: 10.5555/2981562.2981665.

Examples

An example using LinGreedy with \(\epsilon = 1\), so the model always explores: the scores are random and differ between calls.

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinGreedy
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> lin_greedy_mab = LinGreedy(seed=42, epsilon=1)
>>> lin_greedy_mab.fit(contexts, decisions, rewards)
>>> lin_greedy_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[0.6974, 0.0942, 0.9756],
        [0.7611, 0.7861, 0.1281],
        [0.4504, 0.3708, 0.9268]], dtype=torch.float64)
>>> lin_greedy_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[0.2272, 0.5546, 0.0638],
        [0.8276, 0.6317, 0.7581],
        [0.3545, 0.9707, 0.8931]], dtype=torch.float64)

An example using LinGreedy with \(\epsilon = 0\), so the model always exploits: the scores are the estimated expected rewards and are identical across calls.

>>> import numpy as np
>>> from gobrec.mabs.lin_mabs import LinGreedy
>>> contexts = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1],
...                      [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> decisions = np.array(['a', 'a', 'a', 
...                       'b', 'b', 'b',
...                       'c', 'c', 'c'])
>>> rewards = np.array([10, 0 , 1 , 
...                     1 , 10, 0 ,
...                     0 , 1 , 10])
>>> lin_greedy_mab = LinGreedy(seed=42, epsilon=0)
>>> lin_greedy_mab.fit(contexts, decisions, rewards)
>>> lin_greedy_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[5.0000, 0.5000, 0.0000],
        [0.0000, 5.0000, 0.5000],
        [0.5000, 0.0000, 5.0000]], dtype=torch.float64)
>>> lin_greedy_mab.predict(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
tensor([[5.0000, 0.5000, 0.0000],
        [0.0000, 5.0000, 0.5000],
        [0.5000, 0.0000, 5.0000]], dtype=torch.float64)
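
The exploitation scores can be checked by hand. The derivation below assumes that each arm gets its own ridge regression, which is consistent with the output shown; \(X_a\) and \(r_a\) (the contexts and rewards observed for arm a) are notation introduced here for illustration. Arm a saw each one-hot context once, so \(X_a = I\), and with the default \(\lambda = 1\) (`l2_lambda`):

\[
\hat{\theta}_a = (X_a^\top X_a + \lambda I)^{-1} X_a^\top r_a = (I + I)^{-1} (10, 0, 1)^\top = (5, 0, 0.5)^\top
\]

The score of arm a for the context \(x = (1, 0, 0)\) is then \(x^\top \hat{\theta}_a = 5\), and for \(x = (0, 0, 1)\) it is \(0.5\), matching the first column of the tensor. The regularization halves each observed reward because every context was seen only once per arm.
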
Attributes:
epsilon : float

Probability of choosing a random action (exploration). Value should be in [0, 1]. 1 means always explore, 0 means always exploit.

l2_lambda : float

Regularization parameter for ridge regression.

device : str

Device to use for computations (‘cpu’ or ‘cuda’).

items_per_batch : int

Number of items to process in each batch when updating the model. More items per batch means more memory usage but faster computation.

Methods

fit(contexts, decisions, rewards)

Fit the Lin algorithm with contexts, decisions, and rewards.

predict(contexts)

Predict the expected rewards for each arm given the contexts.

reset()

Reset the Lin algorithm to its initial state.

predict(contexts: ndarray)

Predict the expected rewards for each arm given the contexts.

Parameters:
contexts : np.ndarray

A 2D array where each row represents the context features for which predictions are to be made.

Returns:
expected_rewards : torch.Tensor

A 2D tensor of shape (n_samples, n_arms) where each element is the expected reward for the corresponding context-arm pair. Columns are indexed by encoded item IDs. To recover the original item IDs, use the label_encoder.inverse_transform method.
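
As an illustration of how the returned scores might be turned into recommendations, the sketch below picks the top-k columns per row and maps them back to original item IDs. It works on a NumPy copy of the scores from the \(\epsilon = 0\) example (a torch tensor can be converted with `.cpu().numpy()`). The `classes` array is a hypothetical stand-in for the fitted label encoder, assuming encoded ID i corresponds to the i-th sorted original label.

```python
import numpy as np

# Scores copied from the epsilon = 0 example above.
scores = np.array([[5.0, 0.5, 0.0],
                   [0.0, 5.0, 0.5],
                   [0.5, 0.0, 5.0]])

# Hypothetical stand-in for the encoder: position i holds the original ID
# of encoded item i (assumed ordering, not taken from the library).
classes = np.array(['a', 'b', 'c'])

top_k = 2
# Sort each row by descending score and keep the first top_k column indices.
top_encoded = np.argsort(-scores, axis=1)[:, :top_k]
# Map encoded column indices back to original item IDs.
top_items = classes[top_encoded]
```

With the real object, the last line would be replaced by a call to label_encoder.inverse_transform on the encoded indices.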