Merging one-by-many CSV files in Python -
i have output of series of stochastic simulations in form of .csv file this:
run,id,var 1,1,7 1,2,9 1,3,4 2,1,3 2,2,4 2,3,8
etc.
along that, have data file, .csv, formatted so:
id, var2, var3 1,0.89,0.10 2,0.45,0.98 3,0.27,0.05 4,0.98,0.24
note: there values in data file do not appear in simulation file. i'd these ignored.
what i'd write script takes each value id
first .csv file, , finds var2 , var3 , puts together, end like:
run, id, var, var2, var3 1,1,7,0.89,0.10 1,2,9,0.45,0.98 1,3,4,0.27,0.05 2,1,3,0.89,0.10 2,2,4,0.45,0.98 2,3,8,0.27,0.05
any suggestions on way this? confess @ limits of understanding data handling in python. i'd got fair sense of how in sas, i'd prefer keep one-language task can processed single script.
ouput.csv:
run, id, var 1, 1, 7 1, 2, 9 1, 3, 4 2, 1, 3 2, 2, 4 2, 3, 8
data.csv:
id, var2, var3 1, 0.89, 0.10 2, 0.45, 0.98 3, 0.27, 0.05 8, 0.4, 0.5
note if have entries within data.csv, not present in ouput.csv won't affect end result, since while parse output.csv lookup id's know off output.csv, though opposite not true data.csv @ minimun must contain ids output.csv, though can taken care of if need to.
code:
import csv pprint import pprint data = dict([(row['id'], row) row in csv.dictreader(open('data.csv', 'rb'), skipinitialspace = true)]) values = [] row in csv.dictreader(open('output.csv', 'rb'), skipinitialspace = true): values.append(row) values[-1].update(data[row['id']]) >>> pprint(values) [{'id': '1', 'run': '1', 'var': '7', 'var2': '0.89', 'var3': '0.10'}, {'id': '2', 'run': '1', 'var': '9', 'var2': '0.45', 'var3': '0.98'}, {'id': '3', 'run': '1', 'var': '4', 'var2': '0.27', 'var3': '0.05'}, {'id': '1', 'run': '2', 'var': '3', 'var2': '0.89', 'var3': '0.10'}, {'id': '2', 'run': '2', 'var': '4', 'var2': '0.45', 'var3': '0.98'}, {'id': '3', 'run': '2', 'var': '8', 'var2': '0.27', 'var3': '0.05'}] >>>
now save csv file.
fieldnames = ['run', 'id', 'var', 'var2', 'var3'] f = open('combined.csv', 'wb') csvwriter = csv.dictwriter(f, fieldnames = fieldnames) csvwriter.writerow(dict((fn,fn) fn in fieldnames)) # 2.7 has writeheader, cleaner [csvwriter.writerow(row) row in values] f.close() $ cat combined.csv run,id,var,var2,var3 1,1,7,0.89,0.10 1,2,9,0.45,0.98 1,3,4,0.27,0.05 2,1,3,0.89,0.10 2,2,4,0.45,0.98 2,3,8,0.27,0.05
i hope helps.
Comments
Post a Comment