Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach

Bei Li; Siddharth Gangadhar; Pramode Verma; Samuel Cheng

doi:10.3934/energy.2015.1.162

AIMS Energy

2015, Volume 3, Issue 1: 162-172. doi: 10.3934/energy.2015.1.162

Research article

1. Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA;
2. Department of Electrical Engineering and Computer Science, University of Kansas, 1520 West 15th Street, 2001 Eaton Hall, Lawrence, KS 66045, USA;
3. Department of Telecommunication Engineering, University of Oklahoma, 4502 E 41st St #4403, Tulsa, OK 74105, USA;
4. College of Electronics and Information Engineering, Tongji University, 4800 Cao'an Road, 201804, Shanghai, China

Received: 29 September 2014 Accepted: 03 February 2015 Published: 02 March 2015

Abstract

In Smart Grid environments, homes equipped with windmills are encouraged to generate energy and sell it back to utilities. Time-of-Use (TOU) pricing and the introduction of storage devices can strongly influence a user's decisions about when to sell energy back and how much to sell. It is therefore necessary to study sequential decision-making algorithms that can optimize the user's total payoff. In this paper, reinforcement learning is used to tackle this optimization problem: the problem of determining when to sell back energy is formulated as a Markov decision process, and the model is learned adaptively using Q-learning. Experiments are conducted with varying storage capacities and under periodic energy generation rates with different levels of fluctuation. The results show a notable increase in the total discounted reward from selling back energy with the proposed approach.
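The abstract frames the sell-back decision as a Markov decision process whose action values are learned with Q-learning. The sketch below illustrates that idea under entirely hypothetical assumptions: the state is (time slot, stored energy units), the action is the number of units to sell, the reward is the TOU price times the units sold, and generation follows a fixed periodic pattern with no fluctuation. The names `PRICE`, `GEN`, `CAPACITY`, `q_learning`, and `discounted_return`, and all numeric values, are illustrative only; they are not the paper's model, parameters, or results.

```python
import random

# Hypothetical toy model, NOT the paper's: one "day" is split into 6 time slots.
PRICE = [1, 1, 1, 3, 3, 1]  # assumed TOU sell-back price per unit in each slot
GEN = [2, 2, 1, 1, 1, 2]    # assumed periodic windmill output (units per slot)
CAPACITY = 3                # assumed storage capacity (units)
N_ACTIONS = CAPACITY + max(GEN) + 1  # action a = "sell a units" (clamped to what is available)
GAMMA = 0.9                 # discount factor


def step(slot, stored, action):
    """Deterministic transition: sell up to `action` units, store the rest, spill any excess."""
    available = stored + GEN[slot]
    sold = min(action, available)
    new_stored = min(available - sold, CAPACITY)
    return (slot + 1) % len(PRICE), new_stored, PRICE[slot] * sold


def q_learning(episodes=5000, steps=24, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random start states."""
    rng = random.Random(seed)
    # q[slot][stored][action], initialised to zero
    q = [[[0.0] * N_ACTIONS for _ in range(CAPACITY + 1)] for _ in PRICE]
    for _ in range(episodes):
        # Exploring starts: each episode begins in a random (slot, stored) state
        slot, stored = rng.randrange(len(PRICE)), rng.randrange(CAPACITY + 1)
        for _ in range(steps):
            row = q[slot][stored]
            a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else row.index(max(row))
            next_slot, next_stored, reward = step(slot, stored, a)
            # Q-learning update: bootstrap from the best action value in the next state
            target = reward + GAMMA * max(q[next_slot][next_stored])
            row[a] += alpha * (target - row[a])
            slot, stored = next_slot, next_stored
    return q


def discounted_return(policy, horizon=120):
    """Discounted total reward of policy(slot, stored) -> units, from slot 0 with empty storage."""
    slot, stored, total = 0, 0, 0.0
    for t in range(horizon):
        slot, stored, reward = step(slot, stored, policy(slot, stored))
        total += GAMMA ** t * reward
    return total


if __name__ == "__main__":
    q = q_learning()
    learned = lambda s, c: q[s][c].index(max(q[s][c]))  # greedy policy from the learned table
    sell_all = lambda s, c: c + GEN[s]                    # myopic baseline: never store energy
    print(f"sell-all: {discounted_return(sell_all):.2f}  Q-learning: {discounted_return(learned):.2f}")
```

Running the script compares the greedy policy read from the learned Q-table with a myopic baseline that sells all energy as soon as it is available; in this toy setting, storing energy for the high-price slots should raise the discounted total reward. Exploring starts and a relatively large step size (`alpha=0.5`) are reasonable here only because the toy transitions are deterministic; with fluctuating generation, a smaller or decaying step size would be the more usual choice.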
Keywords: smart grid; time-of-use pricing; reinforcement learning; Q-learning; producer payoff optimization
Citation: Bei Li, Siddharth Gangadhar, Pramode Verma, Samuel Cheng. Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach[J]. AIMS Energy, 2015, 3(1): 162-172. doi: 10.3934/energy.2015.1.162

© 2015 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)