Perl Slurp Regex Capture
Using Perl, I have "slurped" in a large file that contains the text below, and I am trying to capture into $1 all matches within the file for a given regex. The regex is:

    =~ /((get|put|post|connect).*?(content-type: (image\/jpeg)))/sgm

Currently the text in bold is being captured. However, the last capture treats the lines
"get /~sgtatham/putty/latest/x86/pscp.exe http/1.1" and "content-type: text/html; charset=iso-8859-1"
as part of the match, which it should not, because "text/html" does not match the (image\/jpeg) group of the regex.
I want to be able to get the last capture without the
"get /~sgtatham/putty/latest/x86/pscp.exe http/1.1" and "content-type: text/html; charset=iso-8859-1" lines being included.
I would appreciate any help, thank you.
**get /~sgtatham/putty/latest/x86/pscp.exe http/1.1
host: the.earth.li
user-agent: mozilla/5.0 (macintosh; intel mac os x 10.6; rv:13.0) gecko/20100101 firefox/13.0
accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
accept-language: en-us,en;q=0.5
accept-encoding: gzip, deflate
connection: keep-alive
content-type: text/html; charset=iso-8859-1
<!doctype html public "-//ietf//dtd html 2.0//en">
<html><head>
get /~sgtatham/putty/0.62/x86/pscp.exe http/1.1
host: the.earth.li
user-agent: mozilla/5.0 (macintosh; intel mac os x 10.6; rv:13.0) gecko/20100101 firefox/13.0
accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
accept-language: en-us,en;q=0.5
content-length: 315392
keep-alive: timeout=15, max=99
connection: keep-alive
content-type: image/jpeg**
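You can do this easily with (?!pattern), a negative look-ahead assertion. It lets the match refuse to step over the start of another request while it scans toward the content-type line. For a recap, read the article with examples of positive and negative lookahead (ourcraft.wordpress.com).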
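To see the effect of the look-ahead in isolation, here is a minimal sketch on a made-up string. The input and the markers "start" and "end:ok" are hypothetical and only stand in for the request line and the wanted content-type.

```perl
use strict;
use warnings;

# Hypothetical toy input: two "start ... end" blocks, only the second ends in "end:ok".
my $text = "start a end:no start b end:ok";

# Plain lazy match: begins at the first "start" and runs across the second one.
my ($naive) = $text =~ /(start.*?end:ok)/s;

# With (?!start) before every character, the match cannot step over another "start".
my ($tempered) = $text =~ /(start(?:(?!start).)*?end:ok)/s;

print "naive:    $naive\n";
print "tempered: $tempered\n";
```

The lazy quantifier alone only controls where the match ends. The look-ahead is what forces it to begin at the nearest "start".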
Regular expression

    $text =~ /
        (                                       # start capture
            \b(?:get|put|post|connect)\b        # start phrase as a whole word, so "putty" and "connection" do not count
            (?:
                (?!\b(?:get|put|post|connect)\b)    # make sure we have not reached one of these phrases again
                .                               # accept any character
            )*?                                 # any number of times (not greedy)
            content-type:\simage\/jpeg          # end phrase
        )                                       # end capture
    /msx;
    print $1;
All occurrences

    # "regexp" stands for the pattern shown under "Regular expression"
    while ($text =~ m/regexp/msxg) {
        print $1;
    }
Output

    get /~sgtatham/putty/0.62/x86/pscp.exe http/1.1
    host: the.earth.li
    user-agent: mozilla/5.0 (macintosh; intel mac os x 10.6; rv:13.0) gecko/20100101 firefox/13.0
    accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    accept-language: en-us,en;q=0.5
    content-length: 315392
    keep-alive: timeout=15, max=99
    connection: keep-alive
    content-type: image/jpeg
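Putting the pieces together, the following is a sketch of the whole flow: slurp the input, then collect every request that ends in the image/jpeg content type. Several things here are assumptions rather than part of the original post: the shortened sample data, the in-memory filehandle used in place of a real file, the names $method, $request and @matches, and the \b word boundaries around the method names (added because, in lowercase text, "putty" contains "put" and "connection" contains "connect").

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the slurped file (headers shortened).
my $sample = <<'END';
get /~sgtatham/putty/latest/x86/pscp.exe http/1.1
host: the.earth.li
connection: keep-alive
content-type: text/html; charset=iso-8859-1
get /~sgtatham/putty/0.62/x86/pscp.exe http/1.1
host: the.earth.li
connection: keep-alive
content-type: image/jpeg
END

# Slurp: with the input record separator undefined, one read returns everything.
# An in-memory filehandle keeps the sketch independent of any file on disk.
open my $fh, '<', \$sample or die "cannot open: $!";
my $text = do { local $/; <$fh> };
close $fh;

# Request methods as whole words only.
my $method = qr/\b(?:get|put|post|connect)\b/;

my $request = qr{
    (                               # capture the whole request
        $method                     # start phrase
        (?: (?!$method) . )*?       # any character that does not begin a new request
        content-type:\simage/jpeg   # end phrase
    )
}sx;

# Collect every match instead of only the first one.
my @matches;
while ($text =~ /$request/g) {
    push @matches, $1;
}

print "$_\n---\n" for @matches;
```

With this sample, only the second request (the 0.62 download) should be collected, since the first one reaches a new "get" before any image/jpeg content type appears.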