<0> below average first year AI student ... just writing a simple genetic algorithm for the partition problem :p
<1> can you not deduce the (now fourth) line from the first 3? the 5th is the important constraint
<0> cwenner, yeah, since B and C are disjoint and their union forms A, it is deducible that the same goes for their sum. But I like to keep it in for clarity.
<2> LinkAgent: please diagram: But I like to keep it in for clarity.
<3> [S But [NP I NP] [VP like [S [VP to [VP keep [NP it NP] [PRT in PRT] [PP for [NP clarity NP] PP] VP] VP] S] VP] . S
<0> fancy.
<0> LinkAgent: please diagram: This is fancy.
<3> [S are [NP we NP] [NP [NP [NP still talking NP] about [VP heard VP] NP] [SBAR [WHNP that WHNP] [S [NP gorillas NP] [VP find [NP repetitive humans NP] VP] S] SBAR] NP] [ADVP ***ually ADVP] attractive S
<0> trane, what does it do?
<2> eek
<2> concurrency issues...
<2> LinkAgent please diagram: this is fancy
<3> [S [NP this NP] [VP is [NP fancy NP] VP] S
<4> cwenner: It's interesting that for unbounded RL domains, such as the maze world, the timeout threshold affects the convergence rate. If it is set too high in small mazes, or too low in large spaces, it can take forever for the agent to converge.
<4> I'm now setting it as a ratio of the maze size, but this feels like cheating, since I'm taking advantage of domain knowledge.
<1> cera1: perhaps you could have that parameter change according to the timeout, i.e. a form of search for the optimal lambda
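The log never shows <0>'s actual list of constraints, so here is a minimal sketch assuming the usual formulation of the partition problem: split a multiset A into disjoint B and C with B ∪ C = A and sum(B) = sum(C). Each genome holds one bit per element (1 puts it in B, 0 in C), and fitness is |sum(B) - sum(C)|, where 0 is a perfect split. Tournament selection, one-point crossover, bit-flip mutation, elitism, and every parameter value below are illustrative choices, not <0>'s method.

```python
import random


def fitness(genome, values):
    """|sum(B) - sum(C)|; bit 1 puts an element in B, bit 0 in C. 0 is a perfect partition."""
    sum_b = sum(v for v, bit in zip(values, genome) if bit)
    # sum(C) = total - sum(B), so the difference is |2 * sum(B) - total|
    return abs(sum(values) - 2 * sum_b)


def tournament(population, values, rng):
    """Pick the fitter of two random genomes (lower fitness is better)."""
    a, b = rng.sample(population, 2)
    return a if fitness(a, values) <= fitness(b, values) else b


def crossover(a, b, rng):
    """One-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randint(0, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def solve_partition(values, pop_size=50, generations=200, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in values] for _ in range(pop_size)]
    best = min(population, key=lambda g: fitness(g, values))
    for _ in range(generations):
        if fitness(best, values) == 0:
            break  # perfect split found
        children = [best]  # elitism: the best genome so far always survives
        while len(children) < pop_size:
            parent_a = tournament(population, values, rng)
            parent_b = tournament(population, values, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), mutation_rate, rng))
        population = children
        best = min(population, key=lambda g: fitness(g, values))
    b = [v for v, bit in zip(values, best) if bit]
    c = [v for v, bit in zip(values, best) if not bit]
    return b, c, fitness(best, values)


B, C, diff = solve_partition([3, 1, 1, 2, 2, 1])
print(B, C, diff)
```

Because B and C are built from complementary bits, the disjointness and union constraints hold by construction; only the equal-sum constraint has to be searched for, which matches cwenner's point that the "sum" line is redundant.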
<5> cwenner: heh, just tested my agent against a changing maze. Look at http://i3.tinypic.com/4gnh7r4.png and try to guess when I changed the maze :)
<1> what did you make the graph in? what was the new shortest path? i think you should try to see if it goes down more on the second maze
<1> but those are the results you could see in other tests as well, so that seems consistent and right
<5> I made the graph with PyChart.
<5> What do you mean by "new shortest path"? As far as I can see, it never finds one. It's timing out for most episodes after the change.
<5> yeah, it looks like a neural net trained via TD-learning handles environmental changes very poorly.
<5> cwenner: Have you read any RL papers on the maze domain using neural networks?
<1> ceran: no, do you know any?
<5> not many. I'm currently reading http://www.itk.ilstu.edu/faculty/portegys/research/meta-maze/meta-maze.pdf
<1> but i'm not so sure it's just for ANNs, i think it's rather general for RL (i.e. all algorithms for such domains)
<5> That paper you showed me the other day presented a couple of algorithms that cope with changes very well...
<1> i think those work rather badly as well, arbitrary exploration. what if you add those things to your net?
<5> hmm, you mean increasing the amount of exploration based on the rate of error?
<1> i didn't phrase that very well
<1> there are a few different strategies to try and take care of that re-exploration for dynamic environments. the paper you're referring to is "the two facets of e-e"?
<5> yes, that's the one
<5> both DAE and RBE cope far better with changes than my system
<1> i don't doubt that, seeing what they are, but could you not try to add that to your ANN? evaluating your TD(l)-ANN against RB-DAE is of course a bit unfair
<5> try to add what?
<1> try an RBE/DAE or RB-DAE-trained ANN?
<1> what were the two algorithms you had read about that were better than TD(l)?
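ceran's question above ("increasing the amount of exploration based on the rate of error?") can be read as an epsilon-greedy schedule whose epsilon rises when recent TD errors are large, for example right after the maze changes. The sketch below is that hypothetical reading only; it is not DAE or RBE, and the class name, the exponential moving average, and all constants are assumptions for illustration.

```python
import random


class ErrorDrivenEpsilon:
    """Hypothetical schedule: explore more while recent |TD error| is large."""

    def __init__(self, base=0.05, scale=0.5, max_eps=0.5, decay=0.9):
        self.base = base          # exploration rate when predictions are accurate
        self.scale = scale        # how strongly error raises exploration
        self.max_eps = max_eps    # cap so the agent never acts fully at random
        self.decay = decay        # smoothing factor of the moving average
        self.avg_error = 0.0

    def update(self, td_error):
        """Fold one TD error into the moving average and return the new epsilon."""
        self.avg_error = self.decay * self.avg_error + (1 - self.decay) * abs(td_error)
        return self.epsilon()

    def epsilon(self):
        return min(self.max_eps, self.base + self.scale * self.avg_error)


def epsilon_greedy(action_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=action_values.__getitem__)


rng = random.Random(0)
schedule = ErrorDrivenEpsilon()
eps = schedule.update(td_error=2.0)
action = epsilon_greedy([0.1, 0.7, 0.3], eps, rng)
```

The cap keeps a sudden burst of error from turning the agent into a pure random walker; whether this helps a FANN-based agent in a changing maze is untested here.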
<1> you mentioned them sometime, not to me
<5> The problem is that those methods are difficult to use with a neural net. They're designed to be used in toy domains where the state can be represented as a discrete number.
<5> For example, RBE stands for recency-based exploration, which works by recording the last time a state was explored, and weighting future explorations towards states that haven't been explored in a while.
<5> I have no idea how I'd represent that kind of temporal data in a neural net.
<5> And if there is a way, I doubt it's feasible with FANN.
<1> i thought the output of your net was V/Q-hat, in that case, why can you not add the RBE term?
<5> The output of my net is the expected reward estimate (V) for each action A, given the input of state S. How would I translate RBE into a value estimate?
<1> Q(s_t, a_t) = E[V(s_{t+1}) | s_t, a_t] (or in the case of deterministic actions, Q(s,a) = V(s') ). Do you have A or S' as input? In the first case, does it not learn the Q function?
<1> or sorry, forgot the reward for state t+1
<1> so + r
<5> cwenner: I have no idea what that is you just wrote.
<5> ascii is not kind to algebraic notation ;)
<5> The current update function I'm using is V(s-1) = (1-a)*V(s-1) + a*[r0 + y*V(s0)]
<5> where my notation V(s-1) == V(s subscript t subscript 1)
<5> sorry, I meant V(s subscript(t-1))
<1> sorry. the Q(s, a) is the expected (possibly discounted) total reward from state s given that you take action a in s and then follow some policy pi. V(s) is the expected (possibly discounted) total reward from state s given that you follow some policy pi. If you know V and the transition function you can determine Q. I don't know the right name for the transition function, the function P(s' | s, a). in the deterministic case, the successor function s' = f(s, a).
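Two ideas from this stretch of the log can be written down for the tabular case ceran says RBE was designed for. The first function is ceran's update rule, V(s_{t-1}) ← (1 − α)·V(s_{t-1}) + α·[r_t + γ·V(s_t)], with `a` and `y` renamed `alpha` and `gamma`. The second is a minimal sketch of recency-based exploration as ceran describes it (prefer what hasn't been tried in a while); it tracks (state, action) pairs rather than states so no transition function is needed, and the linear bonus is an assumption, not the formula from the paper.

```python
def td0_update(V, prev_state, reward, state, alpha=0.1, gamma=0.9):
    """V(s_{t-1}) <- (1 - alpha) * V(s_{t-1}) + alpha * (r_t + gamma * V(s_t))."""
    old = V.get(prev_state, 0.0)
    target = reward + gamma * V.get(state, 0.0)
    V[prev_state] = (1 - alpha) * old + alpha * target
    return V[prev_state]


class RecencyExplorer:
    """Sketch of recency-based exploration over (state, action) pairs."""

    def __init__(self, bonus=0.01):
        self.bonus = bonus       # weight of "time since last tried" (assumed linear)
        self.last_visit = {}     # (state, action) -> step when it was last chosen
        self.t = 0               # global step counter

    def choose(self, state, action_values):
        """Pick argmax of value + bonus * (steps since this pair was last tried)."""
        def score(a):
            since = self.t - self.last_visit.get((state, a), 0)
            return action_values[a] + self.bonus * since

        action = max(range(len(action_values)), key=score)
        self.t += 1
        self.last_visit[(state, action)] = self.t
        return action
```

With equal value estimates the recency bonus alone makes the agent alternate between actions it has neglected, which is the behaviour ceran wants after the maze changes. The difficulty ceran raises still stands: the `last_visit` table is exactly the kind of per-state temporal data that has no obvious place in a neural net's input or output.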
<1> Q(s_t, a_t) = E[ sum_{k=1}^inf gamma^{k-1} r_{t+k} | s_t, a_t] = E[r_{t+1} + gamma * V(s_{t+1}) | s_t, a_t ] (i try to write in latex by the way so you could get a nice formula quite easily)
<6> hello all
<7> /msg NickServ IDENTIFY 75262sam
<8> oops.
<8> Might want to change _all_ your passwords now. nice to know you.
<1> your net either returns a value for each (s,a) or for each successor s' = f(s,a). in the first case, it already learns the Q function and it should be okay. in the latter case, you need to rewrite Q(s,a) = r' + gamma * V(s') to V(s') = (Q(s,a) - r') / gamma.
<1> since they use some Q' = Q + R => V' = (Q + R - r') / gamma = V + R / gamma. i think that should do it at least
<7> hello all ...again
<7> anyone know any good tutorials for prolog?
<8> Skaman_Sam: Did you get my earlier comment? You messaged the entire channel your freenode password.
<8> #prolog
<7> i got it
<5> cwenner: A policy is something that, given the state, returns the action. My neural network IS the policy. Given the state, it outputs the expected value for each possible action, the highest-valued action being the one chosen (in the greedy scenario).
<5> cwenner: What does the E[...] notation mean? Is it showing up right? http://www.mathbin.net/7855
<1> it's used by the policy, which is some function operating on the net's approximated V: \hat{V}. in either case, the above relation (Q = r' + gamma*V(s'), deterministic case) holds if you know the transition function.
<5> cwenner: I don't know the transition function. That would be cheating. If it knew the transition function, it would just parse the maze like an A* search and be done with it.
<5> btw, if you have any further latex, please feel free to use the latex pastebin
<1> you don't have to know the transition function between episodes though. are your N/W/E/S actions not deterministic?
<5> yes, the actions are deterministic
<1> and the maze does not change until after the episode?
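cwenner's deterministic-case relations can be checked numerically: Q(s, a) = r' + γ·V(s') with s' = f(s, a), its inverse V(s') = (Q(s, a) − r')/γ, and the claim that adding an exploration term R to Q shifts the recovered value by R/γ. The sketch assumes a known successor function; the 1-D corridor, its reward of −1 per step, and the numbers in the usage lines are hypothetical.

```python
def q_from_v(V, successor, reward, state, action, gamma=0.9):
    """Deterministic case: Q(s, a) = r' + gamma * V(s'), where s' = f(s, a)."""
    s_next = successor(state, action)
    return reward(state, action) + gamma * V[s_next]


def v_from_q(q, r_next, gamma=0.9):
    """Inverse relation: V(s') = (Q(s, a) - r') / gamma."""
    return (q - r_next) / gamma


# Hypothetical 1-D corridor: states 0..2, action +1/-1 moves along it, each step costs -1.
V = {0: 0.0, 1: 2.0, 2: 5.0}
successor = lambda s, a: s + a
reward = lambda s, a: -1.0

q = q_from_v(V, successor, reward, state=1, action=1, gamma=0.5)
v_back = v_from_q(q, r_next=-1.0, gamma=0.5)            # recovers V[2]
R = 0.2                                                  # e.g. an RBE-style bonus
v_shifted = v_from_q(q + R, r_next=-1.0, gamma=0.5)      # equals V[2] + R / gamma
```

This is why cwenner suggests adding the R(s, a) term, scaled by 1/γ, to a net that outputs V(s'): the bonus passes through the inversion as a constant shift.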
<1> last reply, http://www.mathbin.net/7855
<1> sorry, i left out the \ for sum and gamma
<1> it needs to be E[ \sum_{k=1}^{inf} \gamma * ...
<1> i wrote E sum_{k=1}^inf gamma *
<1> i will be more careful
<1> what is the latex pastebin? is it different from mathbin?
<5> no
<1> what is it that the agent does not know in your problem that would not allow you to use A*?
<1> would allow*
<5> nothing
<5> With the information I give the NN, I could easily solve the problem with A*.
<5> I'm just curious whether I can solve it with an NN.
<5> Unfortunately, it seems rather impractical for this particular domain.
<5> btw, I still don't know what that E[ ... ] notation means.
<1> you could also have intermediate states (which would be pretty much like learning the q-function), where the current state s has a successor s' which is given deterministically, then a second move, a so-called chance move, is done by the environment on s' to produce a new successor s''.
<1> in this case, s' could for instance be "was standing at (3,4) and is moving left"; whether you in fact managed to move left is revealed in s''
<1> in the case when s'' is given deterministically, it allows the net to have some additional information but is otherwise just like giving s and a
<5> well, I have to leave
<1> E[ x ] : expected value of x
<1> i.e. the mean of x
<5> thanks
<5> goodnight
<1> in RL, they usually write E_pi [ X ], the expected X given that you follow a policy pi
<1> you're cerin as well, are you not?
<5> yeah, that formula's basically identical to what I pasted earlier. I don't see how it helps me.
<5> current value = next reward + gamma * next value
<1> i tried to show the relation between Q and V and the V' given by RBE in the fashion of Q'
<1> sorry, i'm worthless at explaining, what i meant to say is, just add the R(s,a) term to the net output
<1> (times 1/gamma)
<1> (but since you have a constant already)
<5> but you didn't even use R(s,a) in your equation...
<5> you can't formulate R(s,a) in terms of the Q, or V. It's a separate function.
<9> ceran early bedtime
<5> It's nearly 10pm here. And now I really have to go. 'night
<9> lightweight!
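The E[ ... ] question and ceran's "current value = next reward + gamma * next value" are two views of the same quantity. For a finite episode the return is G_t = Σ_{k=1}^{T} γ^{k−1} r_{t+k}, it satisfies G_t = r_{t+1} + γ·G_{t+1}, and E_π[G_t] is the mean of G_t over episodes generated by following π. The sketch below shows all three; the reward lists are made-up sample data.

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = sum_{k=1}^{T} gamma**(k-1) * r_{t+k}; rewards = [r_{t+1}, ..., r_T]."""
    # enumerate starts at 0, which plays the role of k - 1
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def discounted_return_recursive(rewards, gamma=0.9):
    """Same quantity via G_t = r_{t+1} + gamma * G_{t+1}, computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(episodes, gamma=0.9):
    """E[G]: estimated as the mean discounted return over sampled episodes."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Made-up episodes from the same start state: reward 1 on reaching the goal, 0 otherwise.
episodes = [[0, 0, 1], [0, 1], [0, 0, 0, 1]]
estimate = monte_carlo_value(episodes, gamma=0.9)
```

The recursive form is what a TD update like ceran's bootstraps from: it replaces G_{t+1} with the current estimate V(s_{t+1}) instead of waiting for the episode to end.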
<1> R(s,a) from RBE, which you calculate in a different manner. is that the problem, how to calculate R(s,a)?
<1> okay sorry, have fun
<9> actually maybe I'll go to bed early too
<9> just for fun
<10> anyone seen dmiles?
<11> not recently
<11> (2 weeks ago or so maybe)
<11> [05:03] -NickServ- Last Seen: 1 day (4h 10m 13s) ago
<11> ah ;)
<12> How hard would it be to analyze a Go board and decide instantly which strategy is deficient out of Connection/Separation, Life/Death, High/Low, Thick/Light, etc.?
<12> I believe the core of a good Go AI is to determine the best strategic improvement that can be made on the current state; reduce the possible moves to those that affect this strategy; eliminate the moves that negatively impact the strategy; and analyze the remaining moves for impact on other strategies
<13> gonna go, cya
<14> hello people
<15> hi
<14> busy?
<15> lol, no, not at the moment. just chatting and browsing.
<14> horar...regarding the tsearch2, it seems that it's only applicable to linux
<15> been programming 16x7 for 3 months straight and a couple of days ago found i simply had to take a break.
<15> dang. and you're on windows.
<14> i'm always using windows
<15> i would have thought that all the extensions for postgresql would work under windows too. anyway there are other ways around it.
<15> it is easy to implement full text searching using triggers, it's just not quite as fast as when it's native.
<15> anyway, since i would imagine there will be a lot of batch processing involved, that kind of indexing won't be essential.
<14> i already downloaded the latest version of postgreSQL, but the problem is it doesn't install on windows ME
<14> i really hate windows ME
<14> that's the only licensed OS that I have
<15> is there any chance that you could give linux a try? you can run it off a cd without altering your hard drive if necessary.
<14> sometimes i hate linux because i find it difficult to configure and install applications, it always gives me a headache with file dependencies
<14> hehe
<15> it's a lot easier than it used to be.
<14> ok
<15> once upon a time it was very frustrating.
<15> nowadays i use debian and even complex configurations are very easy to accomplish.
<15> anyways, postgresql needs at least windows xp to run on windows.
<14> i see, i only had redhat, bayanihan, and fedora installations
<14> no debian
<15> lol, i've never liked redhat although i used it for a year or two.
<15> i think they deliberately made it hard to get people to pay them money for support.
<15> are you able to download and burn a cd?
<14> hehe
<14> yap...
<15> there is a knoppix cd for debian that you can use to boot your computer and run linux without having to install anything.