Original and TV-regularized images.
The CUDA implementation of both methods is about 4 times faster than the CPU version when running on a GeForce 9400M, and about 15 times faster when running on a GeForce GTX 285.
I must confess that I'm a little disappointed: I expected a ~100x speed-up with the GTX 285. Probably the speed-up is proportional to the coding skills :(
Neither algorithm has a stopping condition other than the number of iterations. For the same number of iterations, though, the Zhu and Chan algorithm converges faster than Chambolle's.
One of the most remarkable performance boosts came from using a structure of arrays. Basically, instead of storing a vector-valued image as [ (u0,v0), (u1,v1), (u2,v2), ... ], I store two arrays, [ u0, u1, u2, ... ] and [ v0, v1, v2, ... ], which favors coalesced memory access.
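To make the two layouts concrete, here is a minimal host-side C++ sketch. It is not taken from the downloadable code: the names `Vec2`, `Vec2Field`, `to_soa`, `u_stride_aos` and `u_stride_soa` are made up for illustration. It only shows how far apart the `u` components of neighbouring pixels sit in memory in each layout, which is what matters when thread i of a kernel reads pixel i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (hypothetical type): u and v of one pixel are interleaved.
struct Vec2 {
    float u;
    float v;
};

// Structure of arrays (hypothetical type): all u values are contiguous, and so are all v.
struct Vec2Field {
    std::vector<float> u;
    std::vector<float> v;
};

// Copy an interleaved field into the planar layout.
Vec2Field to_soa(const std::vector<Vec2>& aos) {
    Vec2Field soa;
    soa.u.reserve(aos.size());
    soa.v.reserve(aos.size());
    for (const Vec2& p : aos) {
        soa.u.push_back(p.u);
        soa.v.push_back(p.v);
    }
    return soa;
}

// Bytes between the u components of pixel 0 and pixel 1 in the interleaved layout.
std::ptrdiff_t u_stride_aos(const std::vector<Vec2>& aos) {
    assert(aos.size() >= 2);
    return reinterpret_cast<const char*>(&aos[1].u) -
           reinterpret_cast<const char*>(&aos[0].u);
}

// Bytes between the u components of pixel 0 and pixel 1 in the planar layout.
std::ptrdiff_t u_stride_soa(const Vec2Field& soa) {
    assert(soa.u.size() >= 2);
    return reinterpret_cast<const char*>(&soa.u[1]) -
           reinterpret_cast<const char*>(&soa.u[0]);
}
```

In the interleaved layout, consecutive `u` values are `sizeof(Vec2)` bytes apart, so a group of threads reading `u` touches memory with gaps between the values it needs. In the planar layout they are `sizeof(float)` bytes apart, so neighbouring threads read neighbouring words, which is the access pattern the hardware can coalesce.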
Download the code. Tested on OSX and Linux.