Classifier chains: Difference between revisions

en>Qwertyus
m External links: more specific category
 
en>John of Reading
m Problem transformation: Typo fixing, replaced: Furhtermore → Furthermore using AWB (8853)
{{multiple issues|orphan=December 2011|context=January 2012|
{{underlinked|date=November 2012}}
}}
'''Automatic basis function construction''' (or '''basis discovery''') is a method of finding a set of task-independent basis functions that map the state space to a lower-dimensional embedding while still representing the value function accurately. Automatic basis construction requires no prior knowledge of the domain, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.
==Motivation==
In [[reinforcement learning]] (RL), most real-world [[Markov Decision Process]] (MDP) problems have large or continuous state spaces, which typically require some sort of approximation to be represented efficiently.
Linear function approximators (LFAs)<ref name=keller06>Keller, Philipp; Mannor, Shie; Precup, Doina. (2006) Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.</ref> are widely adopted for their low theoretical complexity. Two subproblems need to be solved for a good approximation: weight optimization and basis construction. One way to solve the second problem is to design special basis functions by hand; such functions work well in specific tasks but are significantly restricted to their domains. Constructing basis functions automatically is therefore preferred for broader applications.{{Citation needed|date=December 2011}}
==Problem definition==
A Markov decision process with finite state space and fixed policy is defined by a 4-tuple <math>(S,P,\gamma,r)</math>, which includes the finite state space <math>S=\{1,2,\ldots,s\}</math>, the transition model <math>P</math>, the discount factor <math>\gamma\in [0,1)</math>, and the reward function <math>r</math>.
The Bellman equation for the fixed policy is:
: <math>v=r+\gamma Pv. \,</math>
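For a small finite MDP, this linear system can be solved directly, since <math>v=(I-\gamma P)^{-1}r</math>. A minimal NumPy sketch (the two-state chain below is an invented example):

```python
import numpy as np

def solve_bellman(P, r, gamma):
    """Solve v = r + gamma * P v exactly: v = (I - gamma P)^{-1} r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Toy two-state chain (invented for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.95

v = solve_bellman(P, r, gamma)
# v satisfies the Bellman equation up to numerical precision.
```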
When the number of elements in <math>S</math> is small, <math>v</math> is usually maintained in tabular form; when <math>S</math> grows too large for this kind of representation, <math>v</math> is commonly approximated by a linear combination of basis functions <math>\Phi=\{\phi_1,\phi_2,\ldots,\phi_n\}</math>,<ref name=sutton_barto>Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. (1998) MIT Press, Cambridge, MA, chapter 8.</ref> so that we have:
: <math>v\approx\hat{v}=\sum_{i=1}^n\theta_i\phi_{i}</math>
Here <math>\Phi</math> is an <math>|S|\times n</math> matrix in which each row is the feature vector of the corresponding state, <math>\theta</math> is a weight vector with <math>n</math> parameters, and usually <math>n\ll |S|</math>.
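Given a basis matrix <math>\Phi</math> and a target value function <math>v</math>, the weight vector <math>\theta</math> is typically fit by least squares. A sketch (the toy numbers are invented, and chosen so that <math>v</math> lies exactly in the span of the basis):

```python
import numpy as np

def fit_weights(Phi, v):
    """Least-squares weights: theta = argmin ||Phi theta - v||_2."""
    theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return theta

# Toy example: 4 states, 2 basis functions (invented).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0],
                [1.0, 3.0]])
v = np.array([0.5, 1.5, 2.5, 3.5])  # exactly linear in the basis

theta = fit_weights(Phi, v)
v_hat = Phi @ theta
```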
Basis construction looks for ways to automatically construct better basis functions <math>\Phi</math> that can represent the value function well.
A good construction method should have the following characteristics:
* Small error bounds between the estimated and real value function
* Forms an orthogonal basis of the value function space
* Converges quickly to the stationary value function
==Popular methods==
===Proto-value basis===
In this approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions.<ref name=mahadevan05/>
The normalized graph Laplacian is defined as:
: <math>L=I-D^{-\frac{1}{2}}WD^{-\frac{1}{2}}</math>
Here <math>W</math> is the adjacency matrix of the undirected graph <math>(N,E)</math> formed by the states of the fixed-policy MDP, and <math>D</math> is a diagonal matrix containing the nodes' degrees.
In a discrete state space, the adjacency matrix <math>W</math> can be constructed by simply checking whether two states are connected, and <math>D</math> can be calculated by summing up each row of <math>W</math>. In a continuous state space, the random-walk Laplacian of <math>W</math> can be used.
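The normalized Laplacian above can be computed directly from an adjacency matrix. A NumPy sketch for a toy graph (the three-state path graph is invented for illustration):

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric adjacency matrix W."""
    d = W.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt

# 3-state path graph: 0 - 1 - 2 (invented example).
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = normalized_laplacian(W)

# Proto-value functions are the low-order eigenvectors of L.
eigvals, eigvecs = np.linalg.eigh(L)
```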
This spectral framework can be used for value function approximation (VFA). Given the fixed policy, the edge weights are determined by the corresponding states' transition probabilities. To get a smooth value approximation, diffusion wavelets are used.<ref name=mahadevan05>Mahadevan, Sridhar; Maggioni, Mauro. (2005) Value function approximation with diffusion wavelets and Laplacian eigenfunctions. Proceedings of Advances in Neural Information Processing Systems.</ref>
===Krylov basis===
Krylov basis construction uses the actual transition matrix instead of the random-walk Laplacian. This method assumes that the transition model ''P'' and the reward ''r'' are available.
The vectors in the Neumann series are denoted as <math>y_i=P^ir</math> for all <math>i\in[0,\infty)</math>.
It can be shown that the Krylov space spanned by <math>y_0,y_1,\ldots,y_{m-1}</math> is enough to represent any value function,<ref name=Ipsen_Meyer>Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov methods. American Mathematical Monthly, 105(10):889–899, 1998.</ref> where <math>m</math> is the degree of the minimal polynomial of <math>(I-\gamma P)</math>.
Suppose the minimal polynomial is <math>p(A)=\frac{1}{\alpha_0}\sum_{i=0}^{m-1}\alpha_{i+1}A^i</math> with <math>A=I-\gamma P</math>; since <math>B=p(A)</math> satisfies <math>BA=I</math>, the value function can be written as:
: <math>v=Br=\frac{1}{\alpha_0}\sum_{i=0}^{m-1}\alpha_{i+1}(I-\gamma P)^ir=\sum_{i=0}^{m-1}\alpha_{i+1}\beta_i y_i.</math><ref name=krylov />
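A sketch of plain Krylov basis generation with NumPy (the two-state chain and numbers are invented; here <math>m=2</math> equals the state-space dimension, so the exact value function lies in the span of the Krylov vectors):

```python
import numpy as np

def krylov_basis(P, r, m):
    """Columns are the Neumann-series vectors y_i = P^i r for i = 0..m-1."""
    ys = [np.asarray(r, dtype=float)]
    for _ in range(m - 1):
        ys.append(P @ ys[-1])
    return np.column_stack(ys)

# Invented two-state chain.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

Y = krylov_basis(P, r, 2)
v_exact = np.linalg.solve(np.eye(2) - gamma * P, r)
coeffs, *_ = np.linalg.lstsq(Y, v_exact, rcond=None)
```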
:'''Algorithm''' Augmented Krylov Method<ref name=krylov>M. Petrik. An analysis of Laplacian methods for value function approximation in MDPs. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2574–2579, 2007</ref>
:<math>z_1,z_2,\ldots,z_k</math> are top real eigenvectors of P
:<math>z_{k+1}:=r</math>
:''for'' <math>i:=1:(l+k)</math> ''do''
::''if'' <math>i>k+1</math> ''then''
:::<math>z_i:=Pz_{i-1}</math>;
::''end if''
::''for'' <math>j:=1:(i-1)</math> ''do''
:::<math>z_i:=z_i-\langle z_j,z_i\rangle z_j;</math>
::''end for''
::''if'' <math>\parallel z_i\parallel\approx 0</math> ''then''
:::''break'';
::''end if''
:''end for''
:* k: number of eigenvectors in basis
:* l: total number of vectors
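The augmented Krylov procedure above can be sketched in NumPy as follows. Normalization of each stored vector is added so that the Gram–Schmidt projections <math>\langle z_j,z_i\rangle z_j</math> are valid; this is an implementation choice layered on top of the pseudocode, and the three-state chain is an invented example:

```python
import numpy as np

def augmented_krylov(P, r, k, l, tol=1e-10):
    """Build an orthonormal basis from the k top real eigenvectors of P
    followed by Krylov vectors of r (l + k vectors attempted in total)."""
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    seeds = [eigvecs[:, order[j]].real for j in range(k)]
    seeds.append(np.asarray(r, dtype=float))       # z_{k+1} := r
    basis = []
    for i in range(l + k):
        if i < len(seeds):
            z = seeds[i].copy()
        else:
            z = P @ basis[-1]                      # z_i := P z_{i-1}
        for q in basis:                            # Gram-Schmidt against z_1..z_{i-1}
            z = z - (q @ z) * q
        norm = np.linalg.norm(z)
        if norm < tol:                             # vector linearly dependent: stop
            break
        basis.append(z / norm)
    return np.column_stack(basis)

# Invented three-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 0.0])
Z = augmented_krylov(P, r, k=1, l=2)
```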
===Bellman error basis===
The Bellman error (used to build Bellman error basis functions, or BEBFs) is defined as: <math>\varepsilon=r+\gamma P\hat{v}-\hat{v}=r+\gamma P\Phi\theta-\Phi\theta</math>.
Loosely speaking, the Bellman error points towards the optimal value function.<ref name=parr07>R. Parr, C. Painter-Wakefield, L.-H. Li, and M. Littman. Analyzing feature generation for value-function approximation. In ICML'07, 2007.</ref> The sequence of BEBFs forms an orthogonal basis of the real value function space; thus, with a sufficient number of BEBFs, any value function can be represented exactly.
:'''Algorithm''' BEBF
:stage <math>i=1</math>: <math>\phi_{1}=r</math>;
:stage <math>i\in[2,N]</math>
::compute the weight vector <math>\theta_i</math> according to current basis function <math>\Phi_i</math>;
::compute the new Bellman error by <math>\varepsilon=r+\gamma P \Phi_{i}\theta_{i}-\Phi_{i}\theta_{i}</math>;
::add the Bellman error to form the new basis function: <math>\Phi_{i+1}=[\Phi_{i}:\varepsilon]</math>;
:* N represents the number of iterations till convergence.
:* ":" means juxtaposing matrices or vectors.
 
===Bellman average reward bases===
Bellman average reward bases (BARBs)<ref name=mahadevan10>S. Mahadevan and B. Liu. Basis construction from power series expansions of value functions. In NIPS'10, 2010.</ref> are similar to Krylov bases, but the reward function is dilated by the average-adjusted transition matrix <math>P-P^*</math>. Here <math>P^*</math> can be calculated by many methods.<ref name=willian97>William J. Stewart. Numerical methods for computing stationary distributions of finite irreducible Markov chains. In Advances in Computational Probability. Kluwer Academic Publishers, 1997.</ref>
 
BARBs converge faster than BEBFs and Krylov bases when <math>\gamma</math> is close to 1.
:'''Algorithm''' BARBs
:stage <math>i=1</math>: <math>\phi_1=P^*r</math>;
:stage <math>i\in[2,N]</math>
::compute the weight vector <math>\theta_i</math> according to current basis function <math>\Phi_i</math>;
::compute the new basis: <math>\phi_{i+1}=r-P^*r+P\Phi_{i}\theta_i-\Phi_{i}\theta_i</math>, and add it to form the new basis matrix <math>\Phi_{i+1}=[\Phi_{i}:\phi_{i+1}]</math>;
:* N represents the number of iterations till convergence.
:* ":" means juxtaposing matrices or vectors.
 
==Discussion and analysis==
There are two principal types of basis construction methods.  
 
The first type of method is reward-sensitive: Krylov bases and BEBFs dilate the reward function geometrically through the transition matrix. However, when the discount factor <math>\gamma</math> approaches 1, Krylov bases and BEBFs converge slowly, because the error of Krylov-based methods is restricted by a Chebyshev polynomial bound.<ref name=krylov /> To address this problem, methods such as BARBs have been proposed. BARBs are an incremental variant of Drazin bases and converge faster than Krylov bases and BEBFs when <math>\gamma</math> becomes large.
 
The second type is reward-insensitive: proto-value basis functions derived from the graph Laplacian. This method uses graph information, but the construction of the adjacency matrix makes it hard to analyze.<ref name=krylov />
 
==See also==
 
* [[Dynamic programming]]
* [[Bellman equation]]
* [[Optimal control]]
 
==References==
 
{{Reflist}}
 
== External links ==
* [http://www-anw.cs.umass.edu/index.shtml UMASS ALL lab]
 
[[Category:Optimal decisions]]
[[Category:Dynamic programming]]
[[Category:Stochastic control]]

Revision as of 16:34, 22 January 2013
