QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

Kar, Soummya; Moura, José MF; Poor, H Vincent

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1wf63

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kar, Soummya	-
dc.contributor.author	Moura, José MF	-
dc.contributor.author	Poor, H Vincent	-
dc.date.accessioned	2020-02-19T21:59:40Z	-
dc.date.available	2020-02-19T21:59:40Z	-
dc.date.issued	2013-04-01	en_US
dc.identifier.citation	Kar, Soummya, Moura, José MF, Poor, H Vincent. (2013). QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations. IEEE Transactions on Signal Processing, 61 (7), 1848 - 1862. doi:10.1109/TSP.2013.2241057	en_US
dc.identifier.issn	1053-587X	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1wf63	-
dc.description.abstract	The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents’ objective consisting of minimizing a network-averaged infinite horizon discounted cost, the paper proposes a distributed version of Q-learning, QD-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield asymptotically, almost surely (a.s.), the desired value function and the optimal stationary control policy at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.	en_US
dc.format.extent	1848 - 1862	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	IEEE Transactions on Signal Processing	en_US
dc.rights	Author's manuscript	en_US
dc.title	QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations	en_US
dc.type	Journal Article	en_US
dc.identifier.doi	doi:10.1109/TSP.2013.2241057	-
dc.identifier.eissn	1941-0476	-
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article	en_US

Files in This Item:

File	Description	Size	Format
OA_QD_Learning_Collaborative_Distributed.pdf		540.7 kB	Adobe PDF