= GUID Tutorial for OML 2.10 =

[[TOC(heading=General Documentation, General/*, depth=1)]]

In this tutorial we discuss the use of the new GUID data type (oml_guid_t) which is introduced in OML 2.10. The name GUID is an acronym for "globally unique ID". Within OML we use GUIDs to group together a collection of related measurements and/or to uniquely identify a particular measurement (e.g. to provide a unique identity for a BLOB). To illustrate these uses we shall consider a simple RF monitor application which does some simple signal processing and records results using OML.

== Using GUIDs to group data within a measurement stream ==
In this example we assume that a software radio or other source is providing us with a time series of quadrature (I/Q) samples. The purpose of the RFMon example application is to collect a number of samples, perform an FFT on them and identify the peaks within the frequency domain. For every peak we discover, we record information about it using OML. Let's take a look at the application template for RFMon:

{{{
 1# simple rfmon application configuration template
 2#
 3defApplication('omf:app:rfmon', 'rfmon') do |app|
 4
 5  app.version(1,0)
 6  app.shortDescription = "RF monitor" 
 7  app.description = %{'rfmon' is an application that identifies peaks in the RF spectrum.}
 8
 9  app.defMeasurement("peaks") do |peak|
10    peak.defMetric('freq', :int32)
11    peak.defMetric('amplitude', :int32)
12    peak.defMetric('sample_id', :guid)
13  end
14
15end
}}}

Here we define a single "peaks" measurement point (at lines 9-13) to represent a peak in the frequency domain for the captured samples. Each measurement records the relative frequency and amplitude of the peak. There may be many different peaks in the frequency domain output produced by the FFT, so we assign a sample ID that lets us group together all of the measurements that came from the same FFT. The following fragment of the C code for this application shows how we generate a unique GUID and assign it to each of the related measurements.

{{{
 1void
 2run(opts_t* opts, oml_mps_t* oml_mps)
 3{
 4  for(;;) {
 5    /* perform peak detection */
 6    double _Complex samples[N_SAMPLES];
 7    read_samples(N_SAMPLES, samples);
 8    int32_t bins[N_BINS];
 9    fft(N_SAMPLES, samples, bins);
10    struct peak peaks[N_BINS];
11    size_t n_peaks = identify_peaks(N_BINS, bins, peaks);
12    /* record results */
13    if(n_peaks > 0) {
14      size_t i;
15      oml_guid_t id = omlc_guid_generate();
16      for(i = 0; i < n_peaks; i++) {
17        oml_inject_peaks(oml_mps->peaks, peaks[i].freq, peaks[i].ampl, id);
18      }
19    }
20  }
21}
}}}

Lines 6-11 of this code perform the peak extraction, which populates an array of struct peak. Lines 12-19 are responsible for injecting the resulting collection of peak values into OML. At line 15 the id is defined and initialized to the return value of omlc_guid_generate():

{{{
   oml_guid_t id = omlc_guid_generate();
}}}

This has the effect of generating a new unique GUID value which is then used across a number of subsequent calls to oml_inject_peaks() in lines 16-18. Every measurement injected in that loop shares the same value, which identifies them as all belonging to the same group. We can see this by executing a query against the database:

{{{
rfmon=# SELECT * FROM RFMON_PEAKS;
 id  | oml_sender_id | oml_seq | oml_ts_client | oml_ts_server |  freq  | amplitude |      sample_id
-----+---------------+---------+---------------+---------------+--------+-----------+----------------------
   1 |             1 |       1 |    4714.25265 |   4714.430439 | -87467 |       -29 |  1354744092159542987
   2 |             1 |       2 |   4714.252666 |    4714.43091 | -74973 |       -33 |  1354744092159542987
   3 |             1 |       3 |   4714.252668 |   4714.431136 | -49957 |       -42 |  1354744092159542987
   4 |             1 |       4 |   4714.252669 |   4714.431338 | -36089 |       -33 |  1354744092159542987
   5 |             1 |       5 |   4714.252671 |   4714.431536 | -30001 |       -26 |  1354744092159542987
   6 |             1 |       6 |   4714.252672 |   4714.431739 | -12488 |       -43 |  1354744092159542987
   7 |             1 |       7 |   4714.252677 |   4714.431935 |  -6085 |       -35 |  1354744092159542987
   8 |             1 |       8 |   4714.252678 |   4714.432135 |     13 |        39 |  1354744092159542987
   9 |             1 |       9 |   4714.252679 |   4714.432331 |  12490 |       -43 |  1354744092159542987
  10 |             1 |      10 |   4714.252681 |   4714.432527 |  18647 |       -33 |  1354744092159542987
  11 |             1 |      11 |   4714.252684 |   4714.432742 |  49961 |       -52 |  1354744092159542987
  12 |             1 |      12 |   4715.253405 |   4715.253506 | -87483 |       -26 |  1179636728160628604
  13 |             1 |      13 |   4715.253415 |   4715.264028 | -74968 |       -38 |  1179636728160628604
  14 |             1 |      14 |   4715.253417 |   4715.264234 | -49988 |       -44 |  1179636728160628604
  15 |             1 |      15 |   4715.253418 |   4715.264433 | -36108 |       -30 |  1179636728160628604
  16 |             1 |      16 |   4715.253419 |   4715.264631 | -30021 |       -25 |  1179636728160628604
  17 |             1 |      17 |   4715.253421 |    4715.26485 | -12528 |       -42 |  1179636728160628604
  ...
}}}
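Once the rows have been read back, the shared sample_id is all a consumer needs to reassemble the groups. The following is a minimal sketch of that idea, not part of OML itself: struct peak_row and both functions are hypothetical names introduced for illustration, and the GUID is assumed to be held as a plain 64-bit unsigned integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of one rfmon_peaks row. Holding the GUID as a
 * plain 64-bit integer is an assumption of this sketch. */
struct peak_row {
  int32_t  freq;
  int32_t  amplitude;
  uint64_t sample_id;
};

/* Count the rows that belong to the group identified by sample_id. */
size_t
count_peaks_in_group(const struct peak_row *rows, size_t n_rows, uint64_t sample_id)
{
  size_t i, count = 0;
  assert(rows != NULL || n_rows == 0);
  for (i = 0; i < n_rows; i++) {
    if (rows[i].sample_id == sample_id)
      count++;
  }
  return count;
}

/* Count the groups, assuming that the rows of one group are stored
 * contiguously, as they are in the query output above. */
size_t
count_groups(const struct peak_row *rows, size_t n_rows)
{
  size_t i, groups = 0;
  assert(rows != NULL || n_rows == 0);
  for (i = 0; i < n_rows; i++) {
    if (i == 0 || rows[i].sample_id != rows[i - 1].sample_id)
      groups++;
  }
  return groups;
}
```

Applied to the 17 rows listed above, count_peaks_in_group() would report 11 rows for the first sample_id, and count_groups() would report two groups.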

== Using GUIDs to associate different measurement streams ==

We can demonstrate how GUIDs can link different measurements by extending the RFMon application. As it stands, we record the fact that a peak was discovered but we do not keep the samples from which we extracted the peak. Let's modify the application to have a second measurement point that will record the samples and related information.

{{{
# simple rfmon application configuration
#
defApplication('omf:app:rfmon', 'rfmon') do |app|

  app.version(1,1)
  app.shortDescription = "RF monitor"
  app.description = %{'rfmon' is an application that identifies and records peaks in the RF spectrum.}

  app.defMeasurement("peaks") do |peak|
    peak.defMetric('freq', 'int32')
    peak.defMetric('amplitude', 'int32')
    peak.defMetric('sample_id', 'guid')
  end

  app.defMeasurement("samples") do |sample|
    sample.defMetric('sample_id', 'guid')
    sample.defMetric('device', 'string')
    sample.defMetric('center_freq', 'int64')
    sample.defMetric('sample', 'blob')
  end

end
}}}

In the listing above we have added a new "samples" measurement point to record our samples. This consists of a sample_id field of type GUID that identifies this measurement. The measurement also contains a number of additional fields that describe the device the samples were captured from, the center frequency of the capture device and finally the samples themselves, stored as a BLOB. The peak and sample measurements are linked by the GUID: for any sample_id in the peak measurements there will be a sample entry whose sample_id has that same value. The relationship between peaks and samples is, as we'd expect, many-to-one. So what difference does it make to the code? The listing below shows a revised version of RFMon which injects both the sample and the peaks.

{{{
 1void
 2run(opts_t* opts, oml_mps_t* oml_mps)
 3{
 4  for(;;) {
 5    /* perform peak detection */
 6    double _Complex samples[N_SAMPLES];
 7    read_samples(N_SAMPLES, samples);
 8    int32_t bins[N_BINS];
 9    fft(N_SAMPLES, samples, bins);
10    struct peak peaks[N_BINS];
11    size_t n_peaks = identify_peaks(N_BINS, bins, peaks);
12    /* record results */
13    if(n_peaks > 0) {
14      size_t i;
15      oml_guid_t id = omlc_guid_generate();
16      oml_inject_samples(oml_mps->samples, id, "n210", 434075000, samples, sizeof(samples));
17      for(i = 0; i < n_peaks; i++) {
18        oml_inject_peaks(oml_mps->peaks, peaks[i].freq, peaks[i].ampl, id);
19      }
20    }
21  }
22}
}}}

The only difference between the two programs is that we've inserted a call to oml_inject_samples() at line 16. This records the actual sample. Note that we record the sample before we record the peaks that refer to it. This ensures that the sample_id for any entry in the rfmon_peaks table will refer to a valid entry in the rfmon_samples table. Now that we have two sets of measurements linked by GUID, we can use it to perform database operations that link the tables. The following query, for example, lists the peaks in terms of their absolute frequency by adding the relative frequency from the rfmon_peaks table to the center_freq from the rfmon_samples table.

{{{
rfmon=# select rfmon_peaks.freq + rfmon_samples.center_freq from rfmon_peaks, rfmon_samples where rfmon_peaks.sample_id = rfmon_samples.sample_id;
 ?column?
-----------
 433987533
 434000027
 434025043
 434038911
 434044999
 434062512
 434068915
 434075013
 434087490
 434093647
 434124961
 433987517
 434000032
 434025012
 434038892
 434044979
 434062472
 434068894
 434075043
 434087461
 ...
}}}
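The same lookup can be done in application code once the rows are in memory. The sketch below mirrors the join for a single peak; struct peak_rec, struct sample_rec and absolute_freq() are hypothetical names introduced for illustration, and the GUID is again assumed to be held as a 64-bit unsigned integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copies of rows from rfmon_peaks and rfmon_samples.
 * Only the columns needed for the lookup are kept. */
struct peak_rec {
  int32_t  freq;        /* frequency relative to the center frequency */
  int32_t  amplitude;
  uint64_t sample_id;   /* assumption: GUID stored as a 64-bit integer */
};

struct sample_rec {
  uint64_t sample_id;
  int64_t  center_freq;
};

/* Find the sample a peak refers to and compute the peak's absolute frequency.
 * Returns 1 and writes *abs_freq on success, or 0 if no sample matches. */
int
absolute_freq(const struct peak_rec *peak, const struct sample_rec *samples,
              size_t n_samples, int64_t *abs_freq)
{
  size_t i;
  assert(peak != NULL && abs_freq != NULL);
  assert(samples != NULL || n_samples == 0);
  for (i = 0; i < n_samples; i++) {
    if (samples[i].sample_id == peak->sample_id) {
      *abs_freq = samples[i].center_freq + (int64_t)peak->freq;
      return 1;
    }
  }
  return 0;
}
```

For the first peak in the earlier listing (freq -87467) and a sample with center_freq 434075000, this yields 433987533, which matches the first row of the query output. A linear search is used only to keep the sketch short; a real consumer with many samples would more likely index them by sample_id.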

== Advanced issues ==
With the basic GUID mechanism we can link groups of measurements and link different kinds of measurements together. In this section we'll discuss two more advanced issues: NULL-valued GUIDs and the possibility of collisions.

=== Null or "non-existent" GUID values ===
It may be the case that we wish to record a "NULL" GUID to indicate that there is nothing else associated with a given measurement. This is straightforward and is achieved by assigning OMLC_GUID_NULL to the GUID as shown here:

{{{
   oml_guid_t id = OMLC_GUID_NULL;
}}}

Where a GUID is injected with this null value the database back-end will write a NULL-valued database entry.

=== Collisions ===
Internally the GUID is represented as a number in the range [1, 2^64^). GUIDs are created so as to be probabilistically unique rather than guaranteed unique: by the birthday bound, once on the order of 2^32^ (about 4.3 billion) GUIDs have been created, the probability that the same GUID has been assigned twice approaches 0.5. The likelihood of a collision is therefore very small even for large data sets, but it remains possible that a newly created GUID has been issued before, and this is worth bearing in mind when very large data sets are in use.
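To get a feel for the numbers, the collision probability can be bounded from above by the number of GUID pairs divided by the size of the GUID space, n(n-1)/(2N). This is a standard union bound, not something taken from the OML sources. The sketch below assumes GUIDs are drawn uniformly at random, and collision_upper_bound() is a hypothetical helper name.

```c
#include <assert.h>

/* Upper bound on the probability that at least two of n_guids values
 * collide when each is drawn uniformly at random from space_size values.
 * This is the union bound n(n-1)/(2N), capped at 1; the true probability
 * is somewhat lower, noticeably so once the bound is no longer small. */
double
collision_upper_bound(double n_guids, double space_size)
{
  double bound;
  assert(n_guids >= 0.0 && space_size > 0.0);
  if (n_guids < 2.0)
    return 0.0;               /* fewer than two GUIDs cannot collide */
  bound = n_guids * (n_guids - 1.0) / (2.0 * space_size);
  return bound < 1.0 ? bound : 1.0;
}
```

With a space of 2^64^ values the bound is just under 0.5 for 2^32^ GUIDs, and roughly 2.7e-8 for one million GUIDs.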