Comments:
<0> mmmmmm SOAP
<1> "dogmeat" at 69.109.250.210 pasted "how to decypher html a tag src refernces" (23 lines, 778B) at http://sial.org/pbot/15991
<2> for a web page how can i figure out if the url tag is relative or absolute? is there a module that does this? i have some perl code that looks at all the img HTML tags and tries to resolve the URL that can be used to download the data. it is used to scrape images off the web. problem is that some urls are relative to the current base url path, or relative to the url domain. web browsers work properly. here's what i use http://sial.org/pbot/15991
<3> hey lakez, I have an idea
<3> it involves doing icky things, but sometimes icky things are necessary
<4> eval: print SOCK_STREAM;
<5> sdakota: Return:
<4> hm
<4> eval: var_dump(*SOCK_STREAM);
<5> sdakota: Error: Can't call method "charset_encode" without a package or object reference at /usr/local/share/perl/5.8.7/Bot/BasicBot.pm line 1463.
<4> ugh.
<3> sdakota: it's defined in Socket.pm
<4> eval: var_dump(SOCK_STREAM);
<5> sdakota: Error: Can't use string ("SOCK_STREAM") as a HASH ref while "strict refs" in use at /usr/local/share/perl/5.8.7/Bot/BasicBot.pm line 1463.
<4> oh
<4> okay
<4> and too in IO::Socket?
<3> eval: use Socket qw( SOCK_STREAM ); +SOCK_STREAM
<5> japhy: Error: Undefined subroutine &Socket::SOCK_STREAM called at (eval 125) line 1.
<4> oh, seems to work
<3> eval: use Socket qw( SOCK_STREAM ); print SOCK_STREAM
<5> japhy: Error: Undefined subroutine &Socket::SOCK_STREAM called at (eval 125) line 1.
<4> Hey, it seems to WORK!
<4> eek
<3> erm, yeah.
<3> although the evalbot says otherwise
<6> japhy, what's the idea?
<3> lakez: $client->transport returns the SOAP::Transport object being used by the client
<3> in your case, that's a SOAP::Transport::TCP::Client object
<3> that object is simply a hashref of key-value pairs that get sent to the IO::Socket class when a socket needs to be created
<3> if you don't mind doing some heinous things, you can do:
<3> $client->transport->{LocalAddr} = '...';
<6> sweet, i'll try that right now
<6> i knew there was something like that
<6> trying
<4> I gtg :)
<4> Thanks for your help, especially you, japhy
<4> Bye :)
<7> *sigh* I wish I hadn't missed the lectures on this stuff
<2> can I use goto label to make my program flow better?
<2> or gosub would be best
<6> japhy, very slick... know something I think it's working.. I haven't got an error so i'm just checking netstat and all that
<6> active box
<8> dogmeat: you can use goto, but I suggest you reevaluate your reasons for doing so.
<7> goto is very slow.
<3> gosub is very basic
<7> dogmeat: I think you want 'sub's
<7> (Remember those basics where the entire source text is searched for the label? Perl approximates that)
<2> if i use a sub routine.. looks like i could return 0 for fail, 1 for success
<7> ...
<2> thing is the code does a bunch of stuff
<2> so its hard to summarize by true or false return value
<7> "" is also false. and undef is also false. and 'return;' is false in scalar and list context!
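The remark about false values can be illustrated with a small sketch. The sub name `fetch_image` and its argument are hypothetical and not taken from dogmeat's paste; the only point shown is that a bare `return;` gives undef in scalar context and an empty list in list context, so a caller's truth test fails either way.

```perl
use strict;
use warnings;

# Hypothetical helper: pretend the download only works for non-empty URLs.
sub fetch_image {
    my ($url) = @_;
    return unless defined $url && length $url;   # bare return: false in any context
    return "saved:$url";                         # any true value signals success
}

my $scalar = fetch_image('');        # undef in scalar context
my @list   = fetch_image('');        # empty list in list context
my $ok     = fetch_image('a.gif');   # true value on success

print defined $scalar ? "defined" : "undef", "\n";
print scalar(@list), "\n";
print "$ok\n";
```

Returning a descriptive true value (or a data structure) on success is one way around the "hard to summarize by true or false" concern, since the caller can still test it as a boolean.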
<2> integral: thanks
<3> lakez: it might be ->transport->{LocalAddr}, or it might be ->transport->{_proxy}{LocalAddr}
<6> japhy, damn it's not taking the IP
<3> I'm not sure
<3> try the {_proxy} in the middle
<6> ok i'll try the proxy
<9> dogmeat: m[^/] is absolute, m/^\w/ is relative.
<9> dogmeat: actually, ^/ is absolute, everything else is relative.
<7> Dear Linux, stop paging/swapping out sshd.
<6> using "proxy"... won't even attempt
<3> $client->transport->{_proxy}{LocalAddr} = '192.168.3.33';
<3> when I use that (on a box whose IP is 192.168.3.34) I get the IO::Socket::INET: Cannot assign requested address at cli.pl line 11
<3> and when I change that to 192.168.3.34, it works
<3> so I'm pretty sure that's the way to do it.
<9> japhy: what do you mean by "works" ?
<10> hey, how does one retrieve the arguments passed to the perl script ?
<7> StAnLeY^: perldoc perlvar # look for @ARGV
<11> The perldoc for perlvar - Perl predefined variables is at http://perldoc.perl.org/perlvar.html
<3> tag - the client connects, and the server reponds
<2> tag: this page, for example uses relative links. but when used relatively, is a dead link. however the page still renders correctly because it takes the relative and rewrites the url wrt base url. http://www.artandartifact.com/artifact/Categories_100/Decorative_1AD/Item_Tissue-And-Treasure-Tower_PM0402_ps_cti-1AD.html
<12> dogmeat's url is at http://xrl.us/j6kn
<7> or read your tutorial properly, StAnLeY^
<13> tag, z0mg!
<13> hi tag
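The `LocalAddr` behaviour japhy describes belongs to IO::Socket::INET itself and can be tried without SOAP::Lite. The following is a minimal sketch, assuming the loopback address 127.0.0.1 is usable on the machine; the throwaway listener only stands in for a real server. Whether SOAP::Lite passes the key through under `{_proxy}` is not verified here.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listener on an ephemeral loopback port, standing in for the real server.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,          # let the OS pick a free port
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen failed: $@";

# LocalAddr binds the outgoing socket to a specific local address
# before it connects to the peer.
my $client = IO::Socket::INET->new(
    PeerAddr  => '127.0.0.1',
    PeerPort  => $server->sockport,
    LocalAddr => '127.0.0.1',
    Proto     => 'tcp',
) or die "connect failed: $@";

print "client bound to ", $client->sockhost, "\n";
```

Setting `LocalAddr` to an address that is not configured on the machine makes the constructor fail, which matches the "Cannot assign requested address" error quoted in the log.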
<9> Hi.
<10> integral: it is not in the beginner tutorial :)
<9> dogmeat: you mean for the images?
<2> yes
<9> dogmeat: the links themselves are fully qualified URLs
<2> what does that mean?
<10> integral: thank you
<9> dogmeat: http://...
<9> dogmeat: The images are properly relative, there is no rewriting going on
<9> dogmeat: the path, as returned in the request, is /
<9> dogmeat: meaning the rest of the request is actually path information...Not the path itself.
<2> tag: is this true for all web sites?
<2> dont think so
<9> no
<2> what is the differnce then?
<9> only websites that use programs that rely on path information
<9> dogmeat: The webserver resolved your request as /
<9> and told your web browser about it
<9> So if you make a request, it will resolve the path as /, you need to adjust for that.
<2> then it is rewriting on the webserver... but the request is relative to the end of the path. how can i adjust for this?
<9> only on this page, some CGIs and things I've written in the past (as well as apache perl modules, etc) use this technique. It's documented in the HTTP RFCs.
<9> no
<9> it is NOT rewriting
<9> Your request is not for /artifact/Categories_100/Decorative_1AD/Item_Tissue-And-Treasure-Tower_PM0402_ps_cti-1AD.html
<9> its for /, with artifact/Categories_100/Decorative_1AD/Item_Tissue-And-Treasure-Tower_PM0402_ps_cti-1AD.html being the path information
<9> (extra crap passed the resolved path)
<2> tag: this is only true of img tags?
<9> no
<9> this is true for all things.
<2> how can i get the true url path then?
<9> what client are you using?
<2> do i need to write a bunch of perl logic?
<9> what client are you using?
<2> what do you mean client?
<9> as in, which http library
<9> such as libwww-perl, or HTTP::Lite, etc.
<9> or HTTP::GHTTP, or, whatever it might be
<2> im looking sorry
<2> LWP::UserAgent
<9> LWP (also know as libwww-perl)
<2> libwww-perl ok
<3> lakez: it works for me
<6> japhy, mine just hangs
<9> Okay, well, you should be able to extract it from the response. $response->base should provide the URL object representing the base
<3> lakez: want me to give you a simple set of test programs so you can see?
<6> my $Search = SOAP::Lite->service("file:api.wsdl");
<6> $Search->transport->{_proxy}{LocalAddr} = "$ip";
<6> sure
<2> ok, so the URL would be $response->base . $elm->attr("src"); ?
<9> dogmeat: I think you need to stuff a "/" between those, but yes
<9> perl -MLWP -le 'print LWP::UserAgent->new->get("http://www.artandartifact.com/artifact/Categories_100/Decorative_1AD/Item_Tissue-And-Treasure-Tower_PM0402_ps_cti-1AD.html")->base'
<12> tag's url is at http://xrl.us/j6kz
<9> http://www.artandartifact.com
<3> lakez: are you using TCP or not?
<3> lakez: what's your proxy?
<9> smccoy@sludge:~> perl -MLWP -le 'print LWP::UserAgent->new->get("http://www.blisted.org/dl/index.xml")->base'
<9> http://www.blisted.org/dl/index.xml
<6> using TCP
<6> can you show me your code?
<9> dogmeat: as you can see by the two examples, the base URI changed relatively to how the request was handled.
<3> yeah, hang on
<2> tag: ok trying ... thanks
<6> thx man
<14> cool, ran out of disk space due to error log filling up due to an infinite loop of uninitialized var warnings
<3> lakez: http://feather.perl6.nl/~japhy/lakez/
<14> on the one hand, I am annoyed that my co-workers seem to think ahving those warnings is ok. on the other, I wonder if they would have ever noticed the loop if it weren't for the warnings, :)
<3> check those out. see if they run on your machine, and specifically set an IP address in client (one that your machine is allowed to go outbound on)
<3> I've got a meeting to get to, so lakez, if something goes wrong, /msg me
<6> ok thanks japhy
<6> that rocks, i'll let you know what i get
<2> tag: seemed to work OK for this page
<9> dogmeat: you know, there is something else you could use that might make your task easier
<9> WWW::Mechanize
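Joining `$response->base` and the `src` attribute by string concatenation, as floated in the log, only covers some cases: it breaks when `src` is already absolute, starts with `/`, or uses `../`. A hedged alternative is to let the URI module (assumed to be installed; it normally accompanies libwww-perl) apply the standard relative-reference rules via `URI->new_abs`. The helper name `resolve_src` and the example.com URLs below are made up for illustration; with LWP the base would come from `$response->base`.

```perl
use strict;
use warnings;
use URI;

# Resolve an src/href attribute value against the page's base URL.
# Absolute URLs pass through unchanged; relative ones are merged
# with the base according to the usual URI resolution rules.
sub resolve_src {
    my ($src, $base) = @_;
    return URI->new_abs($src, $base)->as_string;
}

# Hypothetical base URL standing in for $response->base.
my $base = 'http://www.example.com/shop/items/page.html';

for my $src ('thumb.gif', '/img/thumb.gif', '../img/a.gif',
             'http://cdn.example.net/x.gif') {
    printf "%-30s => %s\n", $src, resolve_src($src, $base);
}
```

Because the resolution depends entirely on the base passed in, this also works when the page declares its own base, as long as the base handed to the helper is the one the client library reports for the response.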