summaryrefslogtreecommitdiff
path: root/gnuradio-core/src/examples/volk_benchmark/README
blob: 516fc15bdd48af2edb8c25998c0e53e039bb1ad1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
VOLK Benchmarking Scripts

The Python programs in this directory are designed to help benchmark
and compare Volk enhancements to GNU Radio. There are two kinds of
scripts here: collecting data and displaying the data.

Data collection is done by running a Volk testing script that will
populate a SQLite database file (volk_results.db by default). The
plotting utility provided here reads from the database files and plots
bar graphs to compare the different installations.

These benchmarks can be used to compare previous versions of GNU
Radio to using Volk; they can be used to compare different Volk
proto-kernels, as well, by editing the volk_config file; or they could
be used to compare performance between different machines and/or
processors.


======================================================================
Volk Profiling

Before doing any kind of Volk benchmarking, it is important to run the
volk_profile program. The profiler will build a config file for the
best SIMD architecture for your processor. Run volk_profile that is
installed into $PREFIX/bin. This program tests all known Volk kernels
for each proto-kernel supported by the processor. When finished, it
will write to $HOME/.volk/volk_config the best architecture for the
VOLK function. This file is read when using a function to know the
best version of the function to execute.

The volk_config file contains a line for each kernel, where each line
looks like:

    volk_<KERNEL_NAME> <ARCHITECTURE>

The architecture will be something like (sse, sse2, sse3, avx, neon,
etc.), depending on your processor.


======================================================================
Benchmark Tests

There are currently two benchmark scripts defined for collecting
data. There is one that runs through the type conversions that have
been converted to Volk (volk_types.py) and the other runs through the
math operators converted to using Volk (volk_math.py).

Script prototypes
Both have the same structure for use:

----------------------------------------------------------------------
./volk_<test>.py [-h] -L LABEL [-D DATABASE] [-N NITEMS] [-I ITERATIONS]
                    [--tests [{0,1,2,3} [{0,1,2,3} ...]]] [--list]
                    [--all]

optional arguments:
  -h, --help            show this help message and exit
  -L LABEL, --label LABEL
                        Label of database table [default: None]
  -D DATABASE, --database DATABASE
                        Database file to store data in [default:
                        volk_results.db]
  -N NITEMS, --nitems NITEMS
                        Number of items per iterations [default: 1000000000.0]
  -I ITERATIONS, --iterations ITERATIONS
                        Number of iterations [default: 20]
  --tests [{0,1,2,3} [{0,1,2,3} ...]]
                        A list of tests to run; can be a single test or a
                        space-separated list.
  --list                List the available tests
  --all                 Run all tests
----------------------------------------------------------------------

To run, you specify the tests to run and a label to store along with
the results. To find out what the available tests are, use the
'--list' option.

To specify a subset of tests, use the '--tests' with space-separated
list of tests numbers (e.g., --tests 0 2 4 9).

Use the '--all' to run all tests.

The label specified is used as an identifier for the benchmarking
currently being done. This is required as it is important in
organizing the data in the database (each label is its own
table). Usually, the label will specify the type of run being done,
such as "volk_aligned" or "v3_5_1". In these cases, the "volk_aligned"
label says that this is for a benchmarking using the GNU Radio version
that uses the aligned scheduler and Volk calls in the work
functions. The "v3_5_1" label is if you were benchmarking an installed
version 3.5.1 of GNU Radio, which is pre-Volk. These will then be
plotted against each other to see the timing differences.

The 'database' option will output the results to a new database
file. This can be useful for separating the output of different runs
or of different benchmarks, such as the types versus the math scripts,
say, or to distinguish results from different computers.

If rerun using the same database and label, the entries in the table
will simply be replaced by the new results.

It is often useful to use the 'sqlitebrowser' program to interrogate
the database file farther, if you are interested in the structure or
the raw data.

Other parameters of this script set the number of items to process and
number of iterations to use when computing the benchmarking
data. These default to 1 billion samples per iteration over 20
iterations. Expect a default run to take a long time. Using the '-N'
and '-I' options can be used to change the runtime of the benchmarks
but are set high to remove problems of variance between iterations.

======================================================================
Plotting Results

The volk_plot.py script reads a given database file and plots the
results. The default behavior is to read all of the labels stored in
the database and plot them as data sets on a bar graph. This shows the
average time taken to process the number of items given.

The options for the plotting script are:

usage: volk_plot.py [-h] [-D DATABASE] [-E] [-P {mean,min,max}] [-% table]

Plot Volk performance results from a SQLite database. Run one of the volk
tests first (e.g, volk_math.py)

----------------------------------------------------------------------
optional arguments:
  -h, --help            show this help message and exit
  -D DATABASE, --database DATABASE
                        Database file to read data from [default:
                        volk_results.db]
  -E, --errorbars       Show error bars (1 standard dev.)
  -P {mean,min,max}, --plot {mean,min,max}
                        Set the type of plot to produce [default: mean]
  -% table, --percent table
                        Show percent difference to the given type [default:
                        None]
----------------------------------------------------------------------

This script allows you to specify the database used (-D), but will
always read all rows from all tables from it and display them. You can
also turn on plotting error bars (1 standard deviation the mean). Be
careful, though, as some older versions of Matplotlib might have an
issue with this option.

The mean time is only one possible statistic that we might be
interested in when looking at the data. It represents the average user
experience when running a given block. On the other hand, the minimum
runtime best represents the actual performance of a block given
minimal OS interruptions while running. Right now, the data collected
includes the mean, variance, min, and max over the number of
iterations given. Using the '-P' option, you can specify the type of
data to plot (mean, min, or max).

Another useful way of looking at the data is to compare the percent
improvement of a benchmark compared to another. This is done using the
'-%' option with the provided table (or label) as the baseline. So if
we were interested in comparing how much the 'volk_aligned' was over
'v3_5_1', we would specify '-% v3_5_1' to see this. The plot would
then only show the percent speedup observed using Volk for each of the
blocks.


======================================================================
Benchmarking Walkthrough

This will walk through an example of benchmarking the new Volk
implementation versus the pre-Volk GNU Radio. It also shows how to
look at the SIMD optimized versions versus the generic
implementations.

Since we introduced Volk in GNU Radio 3.5.2, we will use the following
labels for our data:

   1.) volk_aligned: v3.5.2 with volk_profile results in .volk/volk_config
   2.) v3_5_2: v3.5.2 with the generic (non-SIMD) calls to Volk
   3.) v3_5_1: an installation of GNU Radio from version v3.5.1

We assume that we have installed two versions of GNU Radio.

   v3.5.2 installed into /opt/gr-3_5_2
   v3.5.1 installed into /opt/gr-3_5_1

To test cases 1 and 2 above, we have to run GNU Radio from the v3.5.2
installation, so we set the following environmental variables. Note
that this is written for Ubuntu 11.10. These commands and directories
may have to be changed depending on your OS and versions.

    export LD_LIBRARY_PATH=/opt/gr-3_5_2/lib
    export LD_LIBRARY_PATH=/opt/gr-3_5_2/lib/python2.7/dist-packages

Now we can run the benchmark tests, so we will focus on the math
operators:

    ./volk_math.py -D volk_results_math.db --all -L volk_aligned

When this finishes, the 'volk_results_math.db' will contain our
results for this run.

We next want to run the generic, non-SIMD, calls. This can be done by
changing the Volk kernel settings in $HOME/.volk/volk_config. First,
make a backup of this file. Then edit it and change all architecture
calls (sse, sse2, etc.) to 'generic.' Now, Volk will only call the
generic versions of these functions. So we rerun the benchmark with:

    ./volk_math.py -D volk_results_math.db --all -L v3_5_2

Notice that the only thing changed here was the label to 'v3_5_2'.

Next, we want to collect data for the non-Volk version of GNU
Radio. This is important because some internals to GNU Radio were made
when adding support for Volk, so it is nice to know what the
differences do to our performance. First, we set the environmental
variables to point to the v3.5.1 installation:

    export LD_LIBRARY_PATH=/opt/gr-3_5_1/lib
    export LD_LIBRARY_PATH=/opt/gr-3_5_1/lib/python2.7/dist-packages

And when we run the test, we use the same command line, but the GNU
Radio libraries and Python files used come from v3.5.1. We also change
the label to indicate the different version to store.
    
    ./volk_math.py -D volk_results_math.db --all -L v3_5_1

We now have a database populated with three tables for the three
different labels. We can plot them all together by simply running:

    ./volk_plot.py -D volk_results_math.db

This will show the average run times for each of the three
configurations for all math functions tested. We might also be
interested to see the difference in performance from the v3.5.1
version, so we can run:

    ./volk_plot.py -D volk_results_math.db -% v3_5_1

That will plot both the 'volk_aligned' and 'v3_5_2' as a percentage
improvement over v3_5_1. A positive value indicates that this version
runs faster than the v3.5.1 version.


----------------------------------------------------------------------

Another interesting test case could be to compare results on different
processors. So if you have different generation Intels, AMD, or
whatever, you can simply pass the .db file around and run the Volk
benchmark script to populate the database with different results. For
this, you would specify a label like '-L i7_2620M' that indicates the
processor type to uniquely ID the data.