ruby - Find two phrases from the larger sentence with the least overlap
I have:
phrase = "will have buy online pass ea play online in perfect condition"
phrases = ["its", "perfect condition", "but its", "in perfect condition",
           "from ea", "buy online pass ea", "to play online in perfect condition",
           "online", "online pass", "play online in perfect condition",
           "online its", "ea",
           "will have buy online pass ea play online in perfect condition",
           "have buy online pass ea play online in perfect condition",
           "u", "pass", "to buy online pass ea"]
I need to find 2 phrases from the array that are within the 6-10 word limit and have the least overlap word-wise...
Something like:
result = ["to buy online pass ea", "play online in perfect condition"]
would be perfect. What is the best way to do it?
split_phrases = phrases.map { |phrase| phrase.split }

# find the number of words of overlap between 2 word vectors
def overlap(p1, p2)
  s1 = p1.size
  s2 = p2.size
  # make sure p1 is the longer phrase
  if s2 > s1
    s1, s2 = s2, s1
    p1, p2 = p2, p1
  end
  # check if p2 is entirely contained in p1
  return s2 if p1.each_cons(s2).any? { |p| p == p2 }
  longest_prefix = (s2 - 1).downto(0).find { |len| p1.first(len) == p2.last(len) }
  longest_suffix = (s2 - 1).downto(0).find { |len| p2.first(len) == p1.last(len) }
  [longest_prefix, longest_suffix].max
end

def best_two_phrases_with_minimal_overlap(wphrases, minlen=6, maxlen=10)
  # reject phrases that are too short or too long, evaluate every
  # combination, and order the pairs by word overlap
  scored_pairs = wphrases.
    select { |p| (minlen..maxlen).include? p.size }.
    combination(2).
    map { |pair| [overlap(*pair), pair] }.
    sort_by { |tuple| tuple.first }
  # consider only the pairs with the least word overlap
  least_overlap = scored_pairs.first.first
  least_overlap_pairs = scored_pairs.
    take_while { |tuple| tuple.first == least_overlap }.
    map { |tuple| tuple.last }
  # return the longest pair with minimal overlap
  least_overlap_pairs.sort_by { |pair| pair.first.size + pair.last.size }.last
end

puts best_two_phrases_with_minimal_overlap(split_phrases).map { |p| p.join ' ' }
# play online in perfect condition
# buy online pass ea
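The overlap helper above only counts words that match as a contiguous run (one phrase fully contained in the other, or a prefix/suffix match at the boundary). If "overlap word-wise" is instead read as the number of distinct words two phrases share anywhere, a minimal sketch could look like this. The names shared_word_count and least_shared_pair are hypothetical, the sample data is made up for illustration, and the 6-10 word limit is left out for brevity:

```ruby
# Hypothetical helper (not part of the answer above): counts the distinct
# words two phrases have in common, regardless of position.
def shared_word_count(a, b)
  (a.split & b.split).size
end

# Hypothetical helper: pick the pair with the fewest shared words; among
# equally good pairs, prefer the one with the most words in total.
def least_shared_pair(phrases)
  phrases.combination(2).min_by do |a, b|
    [shared_word_count(a, b), -(a.split.size + b.split.size)]
  end
end

# Small illustrative sample, not the data from the question.
sample = ["buy online pass ea", "online pass", "in perfect condition"]
p least_shared_pair(sample)
# => ["buy online pass ea", "in perfect condition"]
```

Sorting on a two-element key keeps the selection to a single pass over the pairs; a size filter like the select in the answer could be added before combination(2) if the word limit applies to each phrase.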