python - Better way to remove statistical outliers than this?
This code works, but I can't help feeling it's a hack, specifically the "offset" part. I had to put that in because otherwise the index values in deletes would be off by one more after every del operation.
    # remove outliers more than devs standard deviations from the mean
    devs = 1
    deletes = []
    for num, duration in enumerate(durations):
        if (duration > (mean_duration + (devs * std_dev_one_test))) or \
           (duration < (mean_duration - (devs * std_dev_one_test))):
            deletes.append(num)

    # each del shifts the remaining items left by one, so adjust the index
    offset = 0
    for delete in deletes:
        del durations[delete - offset]
        del dates[delete - offset]
        offset += 1
Any ideas on how to make this better?
Build a list of the keepers as you iterate over the list:
    def iskeeper(duration):
        # an outlier lies more than devs standard deviations from the mean
        if (duration > (mean_duration + (devs * std_dev_one_test))) or \
           (duration < (mean_duration - (devs * std_dev_one_test))):
            return False
        return True

    durations = [duration for duration in durations if iskeeper(duration)]
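Note that the question deletes from both durations and dates, so the two lists have to stay aligned. One possible way to do that with the keepers approach is to filter the pairs together with zip. This is only a sketch: the function name remove_outliers and the sample data are made up for illustration, and it assumes the standard deviation is the sample standard deviation from the statistics module, which the question does not specify.

```python
from statistics import mean, stdev


def remove_outliers(durations, dates, devs=1):
    """Return (durations, dates) with outlier entries dropped from both.

    An entry is kept when its duration lies within devs standard
    deviations of the mean. Hypothetical helper, not from the question.
    """
    mean_duration = mean(durations)
    # Assumption: sample standard deviation; needs at least two values.
    std_dev = stdev(durations)
    low = mean_duration - devs * std_dev
    high = mean_duration + devs * std_dev

    # Pair each duration with its date so both are filtered in one pass.
    kept = [(d, t) for d, t in zip(durations, dates) if low <= d <= high]
    kept_durations = [d for d, _ in kept]
    kept_dates = [t for _, t in kept]
    return kept_durations, kept_dates


# Made-up sample data: the last duration is far from the others.
durations = [10, 11, 9, 10, 50]
dates = ["mon", "tue", "wed", "thu", "fri"]
durations, dates = remove_outliers(durations, dates, devs=1)
```

Because new lists are built instead of deleting in place, no index ever shifts and the offset bookkeeping is not needed.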