Confusing Behaviour of regex in Python

I'm trying to match a specific pattern using the re module in Python. I wish to match a full sentence (more correctly, alphanumeric string sequences separated by spaces and/or punctuation).

e.g.

"this is a regular sentence."
"this valid"
"so one"

I've tried various combinations of regular expressions but am unable to grasp how the patterns work; each expression gives me a different yet inexplicable result (I admit I'm still a beginner).

I tried:

"((\w+)(\s?))*"

To the best of my knowledge, this should match one or more alphanumerics greedily, followed by either one or no white-space character, and it should repeat that entire pattern greedily. That is not what it seems to do, so I'm wrong and I don't know why. (I expected it to return the entire sentence as the result.) The result for the first sample string mentioned above is [('sentence', 'sentence', ''), ('', '', ''), ('', '', ''), ('', '', '')].

"(\w+ ?)*"

I'm not sure how this one should work. The official documentation (Python's help('re')) says that *, + and ? match 0 or more, 1 or more, and 0 or 1 (greedy) repetitions of the preceding RE. In such a case, is the space the preceding RE for '?', or is '\w+ ' the preceding RE? And what is the preceding RE for the '*' operator? The output is ['sentence'].

Others such as "(\w+\s?)+)", "((\w*)(\s??))" etc. are variations of the same idea: a sentence is a set of alphanumerics followed by a single/finite number of white spaces, and this pattern is repeated over and over.

Can anyone tell me where I go wrong and why, and why the above expressions do not work the way I was expecting them to?

P.S. I got "[ \w]+" to work for me, but I cannot limit the number of white-space characters in continuation.

Your reasoning about the regex is correct; the problem comes from using capturing groups with *. Here's an alternative:

>>> s = "this is a regular sentence."
>>> import re
>>> re.findall(r'\w+\s?', s)
['this ', 'is ', 'a ', 'regular ', 'sentence']
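
To see why the capturing groups matter, here is a minimal sketch (the variable names are just for illustration). When a pattern contains groups, findall returns the groups rather than the whole match, and a group inside a repetition only remembers its last iteration. Switching to a non-capturing group (?:...) is one way around that.

```python
import re

s = "this is a regular sentence."

# With a capturing group, findall() returns the group, not the whole match.
# A group inside a repetition only keeps the text of its last iteration.
with_groups = re.findall(r"(\w+ ?)*", s)
print(with_groups[0])  # 'sentence'

# A non-capturing group (?:...) lets findall() return whole matches.
# Empty strings are filtered out because * can also match nothing.
whole = [m for m in re.findall(r"(?:\w+ ?)*", s) if m]
print(whole)  # ['this is a regular sentence']
```

The remaining entries in with_groups are empty matches, which is where the ('', '', '') tuples in the question come from.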

In this case it might make more sense to use \b in order to match word boundaries.

>>> re.findall(r'\w+\b', s)
['this', 'is', 'a', 'regular', 'sentence']
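
As for which part is the "preceding RE": a quantifier applies to the single item immediately before it, which is one character, one character class, or one parenthesised group. So in "(\w+ ?)*" the ? applies only to the space, and the * applies to the whole group. A small sketch to check this (the names space_optional and repeated are made up for the example):

```python
import re

# '?' applies only to the space immediately before it.
space_optional = re.compile(r"\w+ ?")
print(bool(space_optional.fullmatch("word ")))  # True
print(bool(space_optional.fullmatch("word")))   # True
print(bool(space_optional.fullmatch("")))       # False, \w+ is still required

# '*' applies to the whole parenthesised group before it.
repeated = re.compile(r"(?:\w+ ?)*")
print(repeated.fullmatch("one two three").group(0))  # 'one two three'
```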

Alternatively, you can match the entire sentence via re.match and call group(0) on the resulting match object to get the whole match:

>>> r = r"((\w+)(\s?))*"
>>> s = "this is a regular sentence."
>>> import re
>>> m = re.match(r, s)
>>> m.group(0)
'this is a regular sentence'
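
On the P.S. about limiting consecutive white space: one possible approach is to require exactly one space between words and to check the whole string with fullmatch. This is only a sketch. The name is_sentence and the choice of optional trailing punctuation are assumptions about what should count as a sentence, not something stated in the question.

```python
import re

# Hypothetical pattern: words separated by exactly one space, with optional
# trailing punctuation (an assumption about what counts as a sentence).
SENTENCE = re.compile(r"\w+(?: \w+)*[.!?]?")

def is_sentence(text):
    """Return True if the whole string is words joined by single spaces."""
    return SENTENCE.fullmatch(text) is not None

print(is_sentence("this is a regular sentence."))  # True
print(is_sentence("two  spaces"))                   # False
```

Unlike "[ \w]+", this pattern never allows two spaces in a row, because every space must be followed directly by a word.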
