python - Parallel mapping functions in IPython w/ multiple parameters


I'm trying to use IPython's parallel environment and, so far, it's looking great, but I'm running into a problem. Let's say I have a function, defined in a library:

def func(a, b):
    ...

that I use when I want to evaluate it on one value of a and a bunch of values of b:

[func(mya, b) for b in mylonglist]

Obviously, the real function is more complicated, but the essence of the matter is that it takes multiple parameters and I'd like to map over only one of them. The problem is that map, @dview.parallel, etc. map over all the arguments.

So let's say I want the answer to func(mya, mylonglist). The obvious way to do this is to curry, either with functools.partial or with a lambda:

dview.map_sync(lambda b: func(mya, b), mylonglist) 

However, this does not work correctly on remote machines. The reason is that when the lambda expression is pickled, the value of mya is not included; instead, the value of mya from the local scope on the remote machine is used. When closures are pickled, the variables they close over aren't.
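For what it's worth, functools.partial behaves differently from a lambda here: in modern Python 3, a partial object pickles its bound arguments along with a reference to the wrapped function, so mya would travel to the engines, provided func lives in a module the engines can import by name. A minimal sketch under that assumption (mylib here is a stand-in for wherever func is actually defined):

import functools
from mylib import func  # the engines must be able to import this module by name

pfunc = functools.partial(func, mya)   # mya gets pickled inside the partial
dview.map_sync(pfunc, mylonglist)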

Two ways I can think of to make this work are to manually construct lists for every argument and have map work over all of the arguments:

dview.map_sync(func, [mya]*len(mylonglist), mylonglist) 

or, horrifically, to use the data as default arguments to a function, forcing the values to be pickled along with it:

# can't use a lambda here b/c lambdas don't use default arguments :(
def parallelfunc(b, mya=mya):
    return func(mya, b)

dview.map_sync(parallelfunc, mylonglist)

Really, this all seems horribly contorted when the real function takes a lot of parameters and is more complicated. Is there some idiomatic way of doing this? Something like:

@parallel(mapover='b')
def biglongfn(a, b):
    ...

But as far as I know, nothing like this 'mapover' thing exists. I have some idea of how to implement it ... but it feels like such a basic operation that there should be existing support for it, so I want to check whether I'm missing something.
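For what it's worth, here is a rough sketch of how such a helper could be built; nothing here exists in IPython.parallel, and the names (MapOver, map_over) are made up. A small callable class stands in for the lambda, because instances pickle cleanly as long as the class itself is defined in a module the engines can import:

class MapOver(object):
    """Picklable stand-in for a lambda: freezes every parameter of func
    except one, which is supplied per call."""
    def __init__(self, func, name, fixed):
        self.func = func    # must be resolvable by name on the engines
        self.name = name    # name of the parameter being mapped over
        self.fixed = fixed  # dict of the frozen parameters

    def __call__(self, value):
        kwargs = dict(self.fixed)
        kwargs[self.name] = value
        return self.func(**kwargs)

def map_over(view, func, name, values, **fixed):
    # map func over `values` bound to parameter `name`,
    # holding everything in `fixed` constant
    return view.map_sync(MapOver(func, name, fixed), values)

# usage: map_over(dview, func, 'b', mylonglist, a=mya)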

I can improve a bit on batu's answer (which I think is a good one, but doesn't perhaps document in as much detail why you use those options). The IPython documentation is also woefully inadequate on this point. So your function is of the form:

def myfxn(a, b, c, d):
    ...
    return z

and is stored in a file called mylib. Let's say b, c, and d stay the same during your run, so you write a lambda function to reduce it to a one-parameter function:

import mylib
mylamfxn = lambda a: mylib.myfxn(a, b, c, d)

and you want to run:

z = dview.map_sync(mylamfxn, iterable_of_a)

In a dream world, it would all magically work like that. However, first you'd get an error of "mylib not found," because the ipcluster processes haven't loaded mylib. Make sure the ipcluster processes have "mylib" in their Python path and are in the correct working directory for myfxn, if necessary. Then you need to add to your Python code:

dview.execute('import mylib') 
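An equivalent spelling uses the sync_imports context manager on the view, which runs the import on every engine (and, by default, locally as well):

with dview.sync_imports():
    import mylib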

Either way, the import mylib statement is run on each process. If you try again, you'll get an error along the lines of "global variable b is not defined," because while the variables are in your Python session, they aren't in the ipcluster processes. However, IPython provides a method of copying a group of variables to the subprocesses. Continuing the example above:

mydict = dict(b=b, c=c, d=d)
dview.push(mydict)
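As a small aside, a DirectView also supports dictionary-style assignment, which pushes a single variable the same way:

dview['b'] = b  # shorthand for dview.push(dict(b=b))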

Now all of the subprocesses have access to b, c, and d, and you can run:

z = dview.map_sync(mylamfxn, iterable_of_a)

and it should work as advertised. Anyway, I'm new to parallel computing with Python, found this thread useful, and thought I'd try to explain a few of the points that confused me a bit....
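One point glossed over above: the path and working-directory setup for the engines can also be done from the client. A minimal sketch, where both paths are placeholders for wherever mylib actually lives on the engine machines:

dview.execute("import sys; sys.path.insert(0, '/path/to/dir/containing/mylib')")
dview.execute("import os; os.chdir('/path/to/working/dir')")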

The final code would be:

import mylib

# set up the parallel processes; start ipcluster from the command line prior to this!
from IPython.parallel import Client
rc = Client()
dview = rc[:]

# ...do stuff to generate iterable_of_a and b, c, d...

mylamfxn = lambda a: mylib.myfxn(a, b, c, d)

dview.execute('import mylib')
mydict = dict(b=b, c=c, d=d)
dview.push(mydict)
z = dview.map_sync(mylamfxn, iterable_of_a)

This is the quickest and easiest way to make pretty much any embarrassingly parallel code run in parallel in Python....

Update: You can also use dview to push all the data without loops, and then use lview (i.e. lview = rc.load_balanced_view(); lview.map(...)) to do the actual calculation in a load-balanced fashion.
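A minimal sketch of that load-balanced variant, reusing the names from the example above (the per-engine setup through dview is unchanged):

lview = rc.load_balanced_view()   # tasks go to whichever engine is free
dview.execute('import mylib')     # imports and data still go to every engine
dview.push(dict(b=b, c=c, d=d))
z = lview.map(mylamfxn, iterable_of_a, block=True)  # block=True behaves like map_sync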

