cuda - Does Alea GPU allow keeping LLVM IR code in the compilation chain?


NVIDIA does not allow access to the generated LLVM IR in the compilation flow of a GPU kernel written in CUDA C/C++. Does anyone know if this is possible when using Alea GPU? In other words, does the Alea GPU compilation procedure allow keeping the generated optimized/unoptimized LLVM IR code?

Yes, you are right: NVIDIA does not expose the LLVM IR, only the PTX code. Alea GPU, however, allows you to access the LLVM IR in several ways:

Method 1

If you use the workflow-based method to code a GPU module template, the compiler first compiles the template into an LLVM IR module, then links that IR module (optionally together with other IR modules) into a PTX module. Finally, the PTX module is loaded onto a GPU worker. While you hold the LLVM IR module, you can call its Dump() method to print the IR code to the console, or you can get the bitcode as a byte[].

I suggest reading more details here:

  1. Workflow-Based GPU Coding
  2. Workflows in Detail

The F# code looks like this:

let template = cuda {
    // define kernel functions or other GPU module stuff
    let! kernel = <@ fun .... @> |> Compiler.DefineKernel

    // return the entry point of the module,
    // similar to the main() function of a C program
    return Entry(fun program ->
        let worker = program.Worker
        let kernel = program.Apply kernel
        let main() = ....
        main ) }

let irModule = Compiler.Compile(template).IRModule
irModule.Dump() // dump the IR code

let ptxModule = Compiler.Link(irModule).PTXModule
ptxModule.Dump()

use program = worker.LoadProgram(ptxModule)
program.Run(...)

Method 2

If you are using the method-based or instance-based way to code a GPU module, you can add event handlers that fire when the LLVM IR code and the PTX code are generated, through Alea.CUDA.Events. The code in F# looks like:

let desktopFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
let (@@) a b = Path.Combine(a, b)
Events.Instance.IRCode.Add(fun irCode ->
    File.WriteAllBytes(desktopFolder @@ "module.ir", irCode))
Events.Instance.PTXCode.Add(fun ptxCode ->
    File.WriteAllBytes(desktopFolder @@ "module.ptx", ptxCode))

Method 3: Extend a GPU function using LLVM code

Finally, there is an undocumented way that lets you operate directly on the LLVM IR code to construct functions. It is done with an attribute that implements the IR-building interface. Here is a simple example: it accepts one parameter, prints it (at compile time), and returns it back:

[<AttributeUsage(AttributeTargets.Method, AllowMultiple = false)>]
type IdentityAttribute() =
    inherit Attribute()

    interface ICustomCallBuilder with
        member this.Build(ctx, irObject, info, irParams) =
            match irObject, irParams with
            | None, irParam :: [] ->
                // irParam is of type IRValue, from which you
                // can get the LLVM native handle via irParam.LLVM
                // also, you can get its type via irParam.Type,
                // which is of type IRType; again, you can get the
                // LLVMTypeRef handle via irParam.Type.LLVM
                // you can optionally construct LLVM instructions here.
                printfn "irParam: %A" irParam
                Some irParam
            | _ -> None

[<Identity>]
let identity(x:'T) : 'T = failwith "this is a device function, better not call it on the host"
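To show where such a function would be used, here is a minimal, hypothetical usage sketch: calling identity inside a quoted kernel, so that the attribute's Build method runs (and prints the IR parameter) while the kernel is compiled. The kernel body and names such as deviceptr and threadIdx.x are assumptions based on common Alea GPU kernel conventions and may differ in your version:

let template = cuda {
    let! kernel =
        <@ fun (output:deviceptr<int>) (input:deviceptr<int>) ->
            let tid = threadIdx.x
            // identity passes the value through unchanged;
            // its attribute emits/prints IR during compilation
            output.[tid] <- identity input.[tid] @>
        |> Compiler.DefineKernel
    ....
}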
