Using mprof for time profiling (The Mercury User’s Guide)

To display the graph profile information gathered from one or more profiling runs, just type ‘mprof’ or ‘mprof -c’. (For programs built with ‘--high-level-code’, you need to also pass the ‘--no-demangle’ option to ‘mprof’ as well.) Note that ‘mprof’ can take quite a while to execute (especially with ‘-c’), and will usually produce quite a lot of output, so you will usually want to redirect the output into a file with a command such as ‘mprof > mprof.out’.

The output of ‘mprof -c’ consists of three major sections. These are named the call graph profile, the flat profile and the alphabetic listing. The output of ‘mprof’ contains the flat profile and the alphabetic listing only.

The call graph profile presents the local call graph of each procedure. For each procedure it shows the parents (callers) and children (callees) of that procedure, and shows the execution time and call counts for each parent and child. It is sorted on the total amount of time spent in the procedure and all of its descendants (i.e. all of the procedures that it calls, directly or indirectly.)

The flat profile presents the just execution time spent in each procedure. It does not count the time spent in descendants of a procedure.

The alphabetic listing just lists the procedures in alphabetical order, along with their index number in the call graph profile, so that you can quickly find the entry for a particular procedure in the call graph profile.

The profiler works by interrupting the program at frequent intervals, and each time recording the currently active procedure and its caller. It uses these counts to determine the proportion of the total time spent in each procedure. This means that the figures calculated for these times are only a statistical approximation to the real values, and so they should be treated with some caution. In particular, if the profiler’s assumption that calls to a procedure from different callers have roughly similar costs is not true, the graph profile can be quite misleading.

The time spent in a procedure and its descendants is calculated by propagating the times up the call graph, assuming that each call to a procedure from a particular caller takes the same amount of time. This assumption is usually reasonable, but again the results should be treated with caution. (The deep profiler does not make such an assumption, and hence its output is significantly more reliable.)

Note that any time spent in a C function (e.g. time spent in ‘GC_malloc()’, which does memory allocation and garbage collection) is credited to the Mercury procedure that called that C function.

                                  called/total       parents
index  %time    self descendants  called+self    name           index
                                  called/total       children

                                                     <spontaneous>
[1]    100.0    0.00        0.75       0         call_engine_label [1]
                0.00        0.75       1/1           do_interpreter [3]

-----------------------------------------------

                0.00        0.75       1/1           do_interpreter [3]
[2]    100.0    0.00        0.75       1         io.run/0(0) [2]
                0.00        0.00       1/1           io.init_state/2(0) [11]
                0.00        0.74       1/1           main/2(0) [4]

-----------------------------------------------

                0.00        0.75       1/1           call_engine_label [1]
[3]    100.0    0.00        0.75       1         do_interpreter [3]
                0.00        0.75       1/1           io.run/0(0) [2]

-----------------------------------------------

                0.00        0.74       1/1           io.run/0(0) [2]
[4]     99.9    0.00        0.74       1         main/2(0) [4]
                0.00        0.74       1/1           sort/2(0) [5]
                0.00        0.00       1/1           print_list/3(0) [16]
                0.00        0.00       1/10          io.write_string/3(0) [18]

-----------------------------------------------

                0.00        0.74       1/1           main/2(0) [4]
[5]     99.9    0.00        0.74       1         sort/2(0) [5]
                0.05        0.65       1/1           list.perm/2(0) [6]
                0.00        0.09   40320/40320       sorted/1(0) [10]

-----------------------------------------------

                                       8             list.perm/2(0) [6]
                0.05        0.65       1/1           sort/2(0) [5]
[6]     86.6    0.05        0.65       1+8      list.perm/2(0) [6]
                0.00        0.60    5914/5914        list.insert/3(2) [7]
                                       8             list.perm/2(0) [6]

-----------------------------------------------

                0.00        0.60    5914/5914        list.perm/2(0) [6]
[7]     80.0    0.00        0.60    5914         list.insert/3(2) [7]
                0.60        0.60    5914/5914        list.delete/3(3) [8]

-----------------------------------------------

                                   40319             list.delete/3(3) [8]
                0.60        0.60    5914/5914        list.insert/3(2) [7]
[8]     80.0    0.60        0.60    5914+40319  list.delete/3(3) [8]
                                   40319             list.delete/3(3) [8]

-----------------------------------------------

                0.00        0.00       3/69283       tree234.set/4(0) [15]
                0.09        0.09   69280/69283       sorted/1(0) [10]
[9]     13.3    0.10        0.10   69283         compare/3(0) [9]
                0.00        0.00       3/3           __Compare___io__stream/0(0) [20]
                0.00        0.00   69280/69280       builtin_compare_int/3(0) [27]

-----------------------------------------------

                0.00        0.09   40320/40320       sort/2(0) [5]
[10]    13.3    0.00        0.09   40320         sorted/1(0) [10]
                0.09        0.09   69280/69283       compare/3(0) [9]

-----------------------------------------------

The first entry is ‘call_engine_label’ and its parent is ‘<spontaneous>’, meaning that it is the root of the call graph. (The first three entries, ‘call_engine_label’, ‘do_interpreter’, and ‘io.run/0’ are all part of the Mercury runtime; ‘main/2’ is the entry point to the user’s program.)

Each entry of the call graph profile consists of three sections, the parent procedures, the current procedure and the children procedures.

The predicate or function names are not just followed by their arity but also by their mode in brackets. A mode of zero corresponds to the first mode declaration of that predicate in the source code. For example, ‘list.delete/3(3)’ corresponds to the ‘(out, out, in)’ mode of ‘list.delete/3’.

Now for the parent and child procedures the self and descendant time have slightly different meanings. For the parent procedures the self and descendant time represent the proportion of the current procedure’s self and descendant time due to that parent. These times are obtained using the assumption that each call contributes equally to the total time of the current procedure.

8.4 Using mprof for time profiling