A bit of the algorithm is described here:
More specifically, Grapl defines a type of identity called a Session: an ID that is only valid for a span of time, such as a PID on every major OS.
Sessions are tracked, or otherwise guessed, based on logs such as process creation or termination logs. Because Grapl assumes that logs will be dropped, arrive out of order, or be extremely delayed, it makes an effort to "guess" at identities. It's been quite accurate in my experience, but the algorithm has many areas for improvement; it's a bit naive right now.
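To make the idea concrete, here is a rough sketch of session-style identity in Python. It is not Grapl's actual code or algorithm. The names (Session, SessionTable, on_create, on_terminate, resolve), the integer timestamps, and the specific guessing rules are all assumptions made up for illustration: an event maps to the session whose time range covers it, and when no such session exists one is guessed and later adopted if a delayed creation log shows up.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    pid: int
    start: int                     # earliest timestamp attributed to this session
    end: Optional[int] = None      # set once a termination log is seen
    start_is_guess: bool = True    # False once a creation log confirms the start
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class SessionTable:
    """Hypothetical sketch: maps (pid, timestamp) to a session ID."""

    def __init__(self) -> None:
        self._by_pid: Dict[int, List[Session]] = {}

    def _covering(self, pid: int, ts: int) -> Optional[Session]:
        # Sessions whose time range contains ts; prefer the latest start.
        hits = [s for s in self._by_pid.get(pid, [])
                if s.start <= ts and (s.end is None or ts <= s.end)]
        return max(hits, key=lambda s: s.start, default=None)

    def _next_guess_after(self, pid: int, ts: int) -> Optional[Session]:
        # The nearest later session, but only if its start was guessed.
        later = [s for s in self._by_pid.get(pid, []) if s.start > ts]
        nxt = min(later, key=lambda s: s.start, default=None)
        return nxt if nxt is not None and nxt.start_is_guess else None

    def _new(self, pid: int, ts: int, guess: bool) -> Session:
        s = Session(pid=pid, start=ts, start_is_guess=guess)
        self._by_pid.setdefault(pid, []).append(s)
        return s

    def on_create(self, pid: int, ts: int) -> str:
        cur = self._covering(pid, ts)
        if cur is not None and cur.start == ts:
            cur.start_is_guess = False        # duplicate or confirming log
            return cur.session_id
        if cur is not None and cur.end is None:
            cur.end = ts                      # assume a dropped termination log
        nxt = self._next_guess_after(pid, ts)
        if nxt is not None:
            nxt.start = ts                    # late creation log adopts the guess
            nxt.start_is_guess = False
            return nxt.session_id
        return self._new(pid, ts, guess=False).session_id

    def on_terminate(self, pid: int, ts: int) -> str:
        cur = self._covering(pid, ts) or self._new(pid, ts, guess=True)
        if cur.end is None:
            cur.end = ts
        return cur.session_id

    def resolve(self, pid: int, ts: int) -> str:
        """Identity for any other event (file write, connection, ...)."""
        cur = self._covering(pid, ts)
        if cur is not None:
            return cur.session_id
        nxt = self._next_guess_after(pid, ts)
        if nxt is not None:
            nxt.start = ts                    # stretch the guess back in time
            return nxt.session_id
        return self._new(pid, ts, guess=True).session_id
```

With this sketch, an event for PID 100 at time 50 followed by a delayed creation log at time 40 ends up in the same session, while an event at time 70, after a termination at 60, gets a fresh session because the PID is assumed to have been reused. Events that were already attributed before a late log arrives are not revisited, which is one of the ways a scheme like this stays naive.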
Happy to answer more questions about it though.
Based on what you seem to be interested in, I'd recommend CloudMapper by Scott Piper.
I tried running CloudMapper, but I think I would need to replace the backend with a graph database and scrap the UI parts. We've got hundreds of AWS accounts, and I'm having trouble just getting it to process all the resources in one of them.
Glad I could help.