python - Parallel mapping functions in IPython w/ multiple parameters
I'm trying to use IPython's parallel environment and, so far, it's looking great, but I'm running into a problem. Let's say I have a function, defined in a library,

    def func(a, b):
        ...

that I use when I want to evaluate it on one value of a and a bunch of values of b:

    [func(myA, b) for b in myLongList]

Obviously, the real function is more complicated, but the essence of the matter is that it takes multiple parameters and I'd like to map over only one of them. The problem is that map, @dview.parallel, etc. map over all the arguments.
So let's say I want the answer to func(myA, myLongList). The obvious way to do this is to curry, either with functools.partial or with a lambda:

    dview.map_sync(lambda b: func(myA, b), myLongList)

However, this does not work correctly on remote machines. The reason is that when the lambda expression is pickled, the value of myA is not included; instead, the value of myA from the local scope on the remote machine is used. When closures get pickled, the variables they close over don't.
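For reference, here is a minimal sketch of the functools.partial spelling of the same currying. Whether it sidesteps the closure problem depends on func being importable on the engines, since a partial object pickles its bound arguments by value and the wrapped function by reference:

    from functools import partial

    # Bind myA up front; the partial object carries myA with it when it is
    # pickled and shipped to the engines.
    curried = partial(func, myA)
    result = dview.map_sync(curried, myLongList)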
Two ways I can think of doing this that will actually work are to manually construct lists for every argument and have map work over all of the arguments,

    dview.map_sync(func, [myA] * len(myLongList), myLongList)

or to, horrifically, use the data as default arguments to a function, forcing it to get pickled:

    # can't use a lambda here b/c lambdas don't use default arguments :(
    def parallelFunc(b, myA=myA):
        return func(myA, b)

    dview.map_sync(parallelFunc, myLongList)

Really, this all seems horribly contorted when the real function takes a lot of parameters and is more complicated. Is there some idiomatic way of doing this? Something like
    @parallel(mapOver='b')
    def bigLongFn(a, b):
        ...

but as far as I know, nothing like the 'mapOver' thing exists. I have an idea of how to implement it, but this feels like such a basic operation that there should already be support for it, so I want to check whether I'm missing something.
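A rough sketch of the kind of helper I have in mind, which just packages up the "replicate the constant arguments" trick from above (map_over is purely hypothetical, not part of IPython):

    # Hypothetical helper: map over one sequence while holding the other
    # arguments fixed, by replicating the fixed values into lists.
    def map_over(view, fn, fixed_before, seq, fixed_after=()):
        n = len(seq)
        args = ([[v] * n for v in fixed_before]
                + [list(seq)]
                + [[v] * n for v in fixed_after])
        return view.map_sync(fn, *args)

    # Equivalent to [func(myA, b) for b in myLongList], evaluated on the engines.
    result = map_over(dview, func, (myA,), myLongList)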
I can improve a bit on Batu's answer (which I think is a good one, but perhaps doesn't document in detail why you would use those options). The IPython documentation is also woefully inadequate on this point. So, say your function is of the form:

    def myfxn(a, b, c, d):
        ....
        return z

and it is stored in a file called mylib. Let's say b, c, and d are the same during your run, so you write a lambda function to reduce it to a one-parameter function.
    import mylib
    mylamfxn = lambda a: mylib.myfxn(a, b, c, d)

and you want to run:

    z = dview.map_sync(mylamfxn, iterable_of_a)

In a dream world, this would magically work just like that. However, first you'd get an error along the lines of "mylib not found," because the ipcluster processes haven't loaded mylib. Make sure the ipcluster processes have "mylib" in their Python path and are in the correct working directory for myfxn, if necessary. Then you need to add to your Python code:
    dview.execute('import mylib')

which runs the import mylib command on each process. If you try again, you'll get an error along the lines of "global variable b not defined," because while the variables are in your Python session, they aren't in the ipcluster processes. However, Python provides a method of copying a group of variables to the subprocesses. Continuing the example above:

    mydict = dict(b=b, c=c, d=d)
    dview.push(mydict)

Now all of the subprocesses have access to b, c, and d. Then you can just run:

    z = dview.map_sync(mylamfxn, iterable_of_a)

and it should now work as advertised. Anyway, I'm new to parallel computing with Python, found this thread useful, and thought I'd try to explain a few of the points that confused me a bit....
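As an aside, there are a couple of alternative spellings of those two setup steps that may read a little more cleanly (a sketch only; check them against your IPython.parallel version):

    # Run the import on every engine (and locally) instead of dview.execute:
    with dview.sync_imports():
        import mylib

    # Dictionary-style assignment pushes single variables to the engines:
    dview['b'] = b
    dview['c'] = c
    dview['d'] = d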
The final code would be:

    import mylib

    # set up the parallel processes; start ipcluster from the command line prior!
    from IPython.parallel import Client
    rc = Client()
    dview = rc[:]

    # ...do stuff to get iterable_of_a and b, c, d...

    mylamfxn = lambda a: mylib.myfxn(a, b, c, d)

    dview.execute('import mylib')
    mydict = dict(b=b, c=c, d=d)
    dview.push(mydict)
    z = dview.map_sync(mylamfxn, iterable_of_a)

This is probably the quickest and easiest way to make pretty much any embarrassingly parallel code run in parallel in Python....
UPDATE: You can also use dview to push all the data without loops and then use an lview (i.e., lview = rc.load_balanced_view(); lview.map(...)) to do the actual calculation in a load-balanced fashion.
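A sketch of that load-balanced variant, assuming the same mylib and b, c, d setup as above:

    # Push the shared data with the direct view, then farm the work out
    # through a load-balanced view.
    lview = rc.load_balanced_view()
    amr = lview.map(mylamfxn, iterable_of_a)  # AsyncMapResult unless lview.block is True
    z = amr.get()                             # wait for all tasks to finish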