
A sensing policy based on confidence bounds and a restless multi-armed bandit model

Author(s): Oksanen, Jan; Koivunen, Visa; Poor, H. Vincent

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1x77k
Full metadata record
dc.contributor.author: Oksanen, Jan
dc.contributor.author: Koivunen, Visa
dc.contributor.author: Poor, H. Vincent
dc.date.accessioned: 2020-02-19T21:59:50Z
dc.date.available: 2020-02-19T21:59:50Z
dc.date.issued: 2012-11
dc.identifier.citation: Oksanen, Jan, Visa Koivunen, and H. Vincent Poor. "A sensing policy based on confidence bounds and a restless multi-armed bandit model." In 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), (2012): 318-323. doi:10.1109/ACSSC.2012.6489015
dc.identifier.issn: 1058-6393
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/pr1x77k
dc.description.abstract: A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios, in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains an asymptotically logarithmic weak regret rate when the rewards are bounded and either independent and identically distributed or finite-state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy in which the index of a frequency band comprises a sample-mean term and a confidence term. The sample-mean term promotes spectrum exploitation, whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing instances of any suboptimal band grows exponentially, and this exponential growth leads to logarithmically growing weak regret. Simulation results demonstrate that the proposed policy outperforms other similar methods in the literature.
dc.format.extent: 318 - 323
dc.language.iso: en_US
dc.relation.ispartof: 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)
dc.rights: Author's manuscript
dc.title: A sensing policy based on confidence bounds and a restless multi-armed bandit model
dc.type: Conference Article
dc.identifier.doi: 10.1109/ACSSC.2012.6489015
dc.date.eissued: 2013-03-28
dc.identifier.eissn: 1058-6393
pu.type.symplectic: http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article
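The abstract above describes an index policy in which each frequency band's index is the sum of a sample-mean (exploitation) term and a confidence (exploration) term, with the confidence term designed so that the interval between consecutive sensing instances of a suboptimal band grows exponentially. The sketch below is not the authors' algorithm; it is a minimal, generic UCB1-style index policy for band selection under assumed bounded i.i.d. rewards, included only to illustrate the sample-mean-plus-confidence structure. All names (IndexSensingPolicy, select_band, idle_prob) and the UCB1 confidence term sqrt(2 ln t / n_k) are illustrative assumptions, not taken from the paper.

```python
import math
import random

# Illustrative sketch only: a generic UCB1-style index policy for choosing
# which frequency band to sense. This is NOT the paper's policy; the paper's
# confidence term is designed so that the interval between sensing instances
# of suboptimal bands grows exponentially, whereas the standard UCB1 term is
# used here purely for illustration. All names are hypothetical.

class IndexSensingPolicy:
    def __init__(self, num_bands):
        self.num_bands = num_bands
        self.counts = [0] * num_bands          # times each band has been sensed
        self.mean_rewards = [0.0] * num_bands  # sample-mean reward per band
        self.t = 0                             # total number of sensing instances

    def select_band(self):
        """Return the band with the highest index (sample mean + confidence)."""
        self.t += 1
        # Sense every band once before relying on the indices.
        for k in range(self.num_bands):
            if self.counts[k] == 0:
                return k
        # Index = sample-mean term (exploitation) + confidence term (exploration).
        indices = [
            self.mean_rewards[k] + math.sqrt(2.0 * math.log(self.t) / self.counts[k])
            for k in range(self.num_bands)
        ]
        return max(range(self.num_bands), key=lambda k: indices[k])

    def update(self, band, reward):
        """Incrementally update the sample mean of the sensed band."""
        self.counts[band] += 1
        n = self.counts[band]
        self.mean_rewards[band] += (reward - self.mean_rewards[band]) / n


if __name__ == "__main__":
    # Toy simulation with hypothetical per-band idle probabilities; the reward
    # is 1 when the sensed band is found idle and 0 otherwise.
    idle_prob = [0.2, 0.5, 0.8]   # band 2 is the best band to sense
    policy = IndexSensingPolicy(num_bands=len(idle_prob))
    for _ in range(10000):
        band = policy.select_band()
        reward = 1.0 if random.random() < idle_prob[band] else 0.0
        policy.update(band, reward)
    print("sensing counts per band:", policy.counts)
```

In such a simulation the suboptimal bands accumulate only a logarithmically growing share of the sensing instances, which is the qualitative behavior the abstract's weak-regret result formalizes for the authors' own confidence term.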

Files in This Item:
File: OA_SensingPolicyBasedConfidenceBoundsRestlessMultiArmedBanditModel.pdf
Size: 522.48 kB
Format: Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.