Something he mentions but doesn't go into detail on, and that we've had luck with, is LTTng. We've started adding LTTng tracepoints to various modules. They're relatively lightweight and provide a common place for multiple processes to submit events (key for us). It also means we can take a "production image" and enable tracing to start getting data without needing special binaries.
It's not up and running yet, but we'll be measuring application startup times, per image, for all the applications in a default Ubuntu Phone image. That way we can watch for regressions/improvements over time. The data will be captured using LTTng tracepoints.
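
A rough sketch of how a regression check like that could look, once the trace has been parsed into plain tuples. Everything specific here is an assumption for illustration: the event names, the (timestamp, event, app) tuple layout, and the 10% threshold are hypothetical, not the actual tracepoints or tooling.

```python
from collections import defaultdict
from statistics import median

# Hypothetical event names; the real tracepoint names are not given above.
START_EVENT = "app:launch_requested"
READY_EVENT = "app:first_frame"


def startup_times(events):
    """Pair each start event with the next ready event for the same app.

    events: iterable of (timestamp_ns, event_name, app_id), in any order.
    Returns {app_id: [duration_ns, ...]}.
    """
    pending = {}                    # app_id -> timestamp of unmatched start
    durations = defaultdict(list)
    for ts, name, app in sorted(events):
        if name == START_EVENT:
            pending[app] = ts
        elif name == READY_EVENT and app in pending:
            durations[app].append(ts - pending.pop(app))
    return dict(durations)


def find_regressions(baseline, candidate, threshold=0.10):
    """Return {app_id: (old_median, new_median)} for apps whose median
    startup time grew by more than `threshold` (a fraction) between images."""
    regressions = {}
    for app, new_times in candidate.items():
        if app not in baseline:
            continue                # new app, nothing to compare against
        old_med = median(baseline[app])
        new_med = median(new_times)
        if new_med > old_med * (1 + threshold):
            regressions[app] = (old_med, new_med)
    return regressions
```

Running `startup_times` over the events from two images and passing both results to `find_regressions` gives the apps that got slower. Using the median rather than the mean keeps a single slow launch from flagging a regression.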
I keep meaning to write a blog post about this. Soon, I swear :-)