Confusing Behaviour of regex in Python


I'm trying to match a specific pattern using the re module in Python. I wish to match a full sentence (more correctly, alphanumeric string sequences separated by spaces and/or punctuation).

For example:

  • "this is a regular sentence."
  • "this valid"
  • "so one"

I have tried various combinations of regular expressions, but I am unable to grasp how the patterns work; each expression gives me a different yet inexplicable result (I admit I am still a beginner).


I have tried:

  • "((\w+)(\s?))*"

    To the best of my knowledge, this should match one or more alphanumerics greedily, followed by either one or no white-space character, and it should match the entire pattern greedily. That is not what it seems to do, so I am clearly wrong, but I would like to know why. (I expected it to return the entire sentence as the result.) The result I get for the first sample string mentioned above is [('sentence', 'sentence', ''), ('', '', ''), ('', '', ''), ('', '', '')].

  • "(\w+ ?)*"

    I'm not sure how this one should work. The official documentation (python help('re')) says that *, + and ? match x or more (greedy) repetitions of the preceding RE. In that case, is the space alone the preceding RE for '?', or is '\w+ ' the preceding RE? And what is the preceding RE for the '*' operator? The output I get is ['sentence'].

  • Others such as "(\w+\s?)+)", "((\w*)(\s??))", etc., which are all variations of the same idea: a sentence is a set of alphanumerics followed by a single/finite number of white spaces, with this pattern repeated over and over.

Can someone tell me where I go wrong and why, and why the above expressions do not work the way I was expecting them to?


P.S. I eventually got "[ \w]+" to work for me, but with it I cannot limit the number of consecutive white-space characters.

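Your reasoning about the regex is correct; the problem comes from using capturing groups with *. When a pattern contains capturing groups, re.findall returns the groups rather than the whole match, and a repeated group keeps only its last repetition. Here's an alternative: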

    >>> s = "this is a regular sentence."
    >>> import re
    >>> re.findall(r'\w+\s?', s)
    ['this ', 'is ', 'a ', 'regular ', 'sentence']
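To see the effect of the capturing group in isolation, here is a small sketch that runs the same idea twice, once with a capturing group and once with a non-capturing group (?:...). The variable names are only for illustration, and the sample string is assumed to be the first sentence from the question.

```python
import re

s = "this is a regular sentence."

# Capturing group under *: findall reports the group, not the whole match,
# and a repeated group only remembers its last repetition ('sentence').
# Empty matches at the end of the string also show up as '' entries.
captured = re.findall(r"(\w+ ?)*", s)

# Non-capturing group with +: there is no group to report, so findall
# returns the whole match, and + rules out the empty matches.
whole = re.findall(r"(?:\w+ ?)+", s)

print(captured[0])  # sentence
print(whole)        # ['this is a regular sentence']
```

The number of empty strings after 'sentence' in the first result can differ between Python versions, which is why only the first element is printed here.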

In this case it might make more sense to use \b in order to match word boundaries.

    >>> re.findall(r'\w+\b', s)
    ['this', 'is', 'a', 'regular', 'sentence']

Alternatively, you can match the entire sentence via re.match and call group(0) on the returned match object to get the whole match:

    >>> r = r"((\w+)(\s?))*"
    >>> s = "this is a regular sentence."
    >>> import re
    >>> m = re.match(r, s)
    >>> m.group(0)
    'this is a regular sentence'
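As a rough sketch of the difference between group(0) and the numbered groups, and of one possible way to address the P.S. about limiting consecutive white space: the pattern SENTENCE and the helper leading_sentence below are hypothetical names introduced for this example, not something from the question.

```python
import re

s = "this is a regular sentence."

# The pattern from the question: group(0) is the whole match, while
# groups() holds only the last repetition of each capturing group.
m = re.match(r"((\w+)(\s?))*", s)
whole_match = m.group(0)
last_groups = m.groups()

# Hypothetical stricter pattern: words separated by exactly one
# white-space character, grouped without capturing via (?:...).
SENTENCE = re.compile(r"\w+(?:\s\w+)*")

def leading_sentence(text):
    """Return the run of singly separated words at the start of text, or None."""
    found = SENTENCE.match(text)
    return found.group(0) if found else None

print(whole_match)                      # this is a regular sentence
print(last_groups)                      # ('sentence', 'sentence', '')
print(leading_sentence("two  spaces"))  # two
```

Because \s must be followed directly by \w+, a run of two or more white-space characters ends the match, which is the limit that "[ \w]+" cannot express.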
