diff options
Diffstat (limited to 'docs/doxygen/other/metadata.dox')
-rw-r--r-- | docs/doxygen/other/metadata.dox | 330 |
1 files changed, 330 insertions, 0 deletions
diff --git a/docs/doxygen/other/metadata.dox b/docs/doxygen/other/metadata.dox new file mode 100644 index 000000000..f867a06f2 --- /dev/null +++ b/docs/doxygen/other/metadata.dox @@ -0,0 +1,330 @@ +/*! \page page_metadata Metadata Information + +\section Introduction + +Metadata files have extra information in the form of headers that +carry metadata about the samples in the file. Raw, binary files carry +no extra information and must be handled delicately. Any changes in +the system state such as sample rate or if a receiver's frequency are +not conveyed with the data in the file itself. Header of metadata +solve this problem. + +We write metadata files using gr::blocks::file_meta_sink and read metadata +files using gr::blocks::file_meta_source. + +Metadata files have headers that carry information about a segment of +data within the file. The header structure is described in detail in +the next section. A metadata file always starts with a header that +describes the basic structure of the data. It contains information +about the item size, data type, if it's complex, the sample rate of +the segment, the time stamp of the first sample of the segment, and +information regarding the header size and segment size. + +Headers have two main tags associated with them: + +- rx_rate: the sample rate of the stream. +- rx_time: the time stamp of the first item in the segment. + +These tags were inspired by the UHD tag format. + +The header gives enough information to process and handle the +data. One cautionary note, though, is that the data type should never +change within a file. There should be very little need for this, but +more importantly. GNU Radio blocks can only set the data type of their +IO signatures in the constructor, so changes in the data type +afterward will not be recognized. + +We also have an extra header segment that is option. This can be +loaded up at the beginning by the user specifying some extra metadata +that should be transmitted along with the data. It also grows whenever +it sees a stream tag, so the dictionary will contain and key:value +pairs out of tags from the flowgraph. + + +\subsection types Types of Metadata Files + +GNU Radio currently supports two types of metadata files: + +- inline: headers are inline with the data in the same file. +- detached: headers are in a separate header file from the data. + +The inline method is the standard version. When a detached header is +used, the headers are simply inserted back-to-back in the detached +header file. The dat file, then, is the standard raw binary format +with no interruptions in the data. + + +\subsection updating Updating Headers + +While there is always a header that starts a metadata file, they are +updated throughout as well. There are two events that trigger a new +header. We define a segment as the unit of data associated with the +last header. + +The first event that will trigger a new header is when enough samples +have been written for the given segment. This number is defined as the +maximum segment size and is a parameter we pass to the +file_meta_sink. It defaults to 1 million items (items, not +bytes). When that number of items is reached, a new header is +generated and a new segment is started. This makes it easier for us to +manipulate the data later and helps protect against catastrophic data +loss. + +The second event to trigger a new segment is if a new tag is +observed. If the tag is a standard tag in the header, the header value +is updated, the header and current extras are written to file, and the +segment begins again. If a tag from the extras is seen, the value +associated with that tag is updated; and if a new tag is seen, a new +key:value pair are added to the extras dictionary. + +When new tags are seen, we generate a new segment so that we make sure +that all samples in that segment are defined by the header. If the +sample rate changes, we create a new segment where all of the new +samples are at that new rate. Also, in the case of UHD devices, if a +segment loss is observed, it will generate a new timestamp as a tag of +'rx_time'. We create a new file segment that reflects this change to +keep the sample times exact. + + +\subsection implementation Implementation + +Metadata files are created using gr::blocks::file_meta_sink. The +default behavior is to create a single file with inline headers as +metadata. An option can be set to switch to detached header mode. + +Metadata file are read into a flowgraph using +gr::blocks::file_meta_source. This source reads a metadata file, +inline by default with a settable option to use detached headers. The +data from the segments is converted into a standard streaming +output. The 'rx_rate' and 'rx_time' and all key:value pairs in the +extra header are converted into tags and added to the stream tags +interface. + + +\section structure Structure + +The file metadata consists of a static mandatory header and a dynamic +optional extras header. Each header is a separate PMT +dictionary. Headers are created by building a PMT dictionary +(pmt::pmt_make_dict) of key:value pairs, then the dictionary is +serialized into a string to be written to file. The header is always +the same length that is predetermined by the version of the header +(this must be known already). The header will then indicate if there +is an extra data to be extracted as a separate serialized dictionary. + +To work with the PMTs for creating and extracting header information, +we use PMT operators. For example, we create a simplified version of +the header in C++ like this: + +\code + using namespace pmt; + const char METADATA_VERSION = 0x0; + pmt_t header; + header = pmt_make_dict(); + header = pmt_dict_add(header, mp("version"), mp(METADATA_VERSION)); + header = pmt_dict_add(header, mp("rx_rate"), mp(samp_rate)); + std::string hdr_str = pmt_serialize_str(header); +\endcode + +The call to pmt::pmt_dict_add adds a new key:value pair to the +dictionary. Notice that it both takes and returns the 'header' +variable. This is because we are actually creating a new dictionary +with this function, so we just assign it to the same variable. + +The 'mp' functions are convenience functions provided by the PMT +library. They interpret the data type of the value being inserted and +call the correct 'pmt_from_xxx' function. For more direct control over +the data type, see PMT functions in pmt.h, such as +pmt::pmt_from_uint64 or pmt::pmt_from_double. + +We finish this off by using pmt::pmt_serialize_str to convert the PMT +dictionary into a specialized string format that makes it easy to +write to a file. + +The header is always METADATA_HEADER_SIZE bytes long and a metadata +file always starts with a header. So to extract the header from a +file, we need to read in this many bytes from the beginning of the +file and deserialize it. An important note about this is that the +deserialize function must operate on a std::string. The serialized +format of a dictionary contains null characters, so normal C character +arrays (e.g., 'char *s') get confused. + +Assuming that 'std::string str' contains the full string as read from +a file, we can access the dictionary in C++ like this: + +\code + pmt_t hdr = pmt_deserialize_str(str); + if(pmt_dict_has_key(hdr, pmt_string_to_symbol("strt"))) { + pmt_t r = pmt_dict_ref(hdr, pmt_string_to_symbol("strt"), PMT_NIL); + uint64_t seg_start = pmt_to_uint64(r); + uint64_t extra_len = seg_start - METADATA_HEADER_SIZE; + } +\endcode + +This example first deserializes the string into a PMT dictionary +again. This will throw an error if the string is malformed and cannot +be deserialized correctly. We then want to get access to the item with +key 'strt'. As the next subsection will show, this value indicates at +which byte the data segment starts. We first check to make sure that +this key exists in the dictionary. If not, our header does not contain +the correct information and we might want to handle this as an error. + +Assuming the header is properly formatted, we then get the particular +item referenced by the key 'strt'. This is a uint64_t, so we use the +PMT function to extract and convert this value properly. We now know +if we have an extra header in the file by looking at the difference +between 'seg_start' and the static header size, +METADATA_HEADER_SIZE. If the 'extra_len' is greater than 0, we know we +have an extra header that we can process. Moreover, this also tells us +the size of the serialized PMT dictionary in bytes, so we can easily +read this many bytes from the file. We can then deserialize and parse +this header just like the first. + + +\subsection header Header Information + +The header is a PMT dictionary with a known structure. This structure +may change, but we version the headers, so all headers of version X +must be the same length and structure. As of now, we only have version +0 headers, which look like the following: + +- version: (char) version number (usually set to METADATA_VERSION) +- rx_rate: (double) Stream's sample rate +- rx_time: (pmt::pmt_t pair - (uint64_t, double)) Time stamp (format from UHD) +- size: (int) item size in bytes - reflects vector length if any. +- type: (int) data type (enum below) +- cplx: (bool) true if data is complex +- strt: (uint64_t) start of data relative to current header +- bytes: (uint64_t) size of following data segment in bytes + +The data types are indicated by an integer value from the following +enumeration type: + +\code +enum gr_file_types { + GR_FILE_BYTE=0, + GR_FILE_CHAR=0, + GR_FILE_SHORT=1, + GR_FILE_INT, + GR_FILE_LONG, + GR_FILE_LONG_LONG, + GR_FILE_FLOAT, + GR_FILE_DOUBLE, +}; +\endcode + +\subsection extras Extras Information + +The extras section is an optional segment of the header. If 'strt' == +METADATA_HEADER_SIZE, then there is no extras. Otherwise, it is simply +a PMT dictionary of key:value pairs. The extras header can contain +anything and can grow while a program is running. + +We can insert extra data into the header at the beginning if we +wish. All we need to do is use the pmt::pmt_dict_add function to insert +our hand-made metadata. This can be useful to add our own markers and +information. + +The main role of the extras header, though, is as a container to hold +any stream tags. When a stream tag is observed coming in, the tag's +key and value are added to the dictionary. Like a standard dictionary, +any time a key already exists, the value will be updated. If the key +does not exist, a new entry is created and the new key:value pair are +added together. So any new tags that the file metadata sink sees will +add to the dictionary. It is therefore important to always check the +'strt' value of the header to see if the length of the extras +dictionary has changed at all. + +When reading out data from the extras, we do not necessarily know the +data type of the PMT value. The key is always a PMT symbol, but the +value can be any other PMT type. There are PMT functions that allow us +to query the PMT to test if it is a particular type. We also have the +ability to do pmt::pmt_print on any PMT object to print it to +screen. Before converting from a PMT to it's natural data type, it is +necessary to know the data type. + + +\section Utilities + +GNU Radio comes with a couple of utilities to help in debugging and +manipulating metadata files. There is a general parser in Python that +will convert the PMT header and extra header into Python +dictionaries. This utility is: + +- gr-blocks/python/parse_file_metadata.py + +This program is installed into the Python directory under the +'gnuradio' module, so it can be accessed with: + +\code +from gnuradio.blocks import parse_file_metadata +\endcode + +It defines HEADER_LENGTH as the static length of the metadata header +size. It also has dictionaries that can be used to convert from the +file type to a string (ftype_to_string) and one to convert from the +file type to the size of the data type in bytes (ftype_to_size). + +The 'parse_header' takes in a PMT dictionary, parses it, and returns a +Python dictionary. An optional 'VERBOSE' bool can be set to print the +information to standard out. + +The 'parse_extra_dict' is similar in that it converts from a PMT +dictionary to a Python dictionary. The values are kept in their PMT +format since we do not necessarily know the native data type. + +A program called 'gr_read_file_metadata' is installed into the path +and can be used to read out all header information from a metadata +file. This program is just called with the file name as the first +command-line argument. An option '-D' will handle detached header +files where the file of headers is expected to be the file name of the +data with '.hdr' appended to it. + + +\section Examples + +Examples are located in: + +- gr-blocks/examples/metadata + +Currently, there are a few GRC example programs. + +- file_metadata_sink: create a metadata file from UHD samples. +- file_metadata_source: read the metadata file as input to a simple graph. +- file_metadata_vector_sink: create a metadata file from UHD samples. +- file_metadata_vector_source: read the metadata file as input to a simple graph. + +The file sink example can be switched to use a signal source instead +of a UHD source, but no extra tagged data is used in this mode. + +The file source example pushes the data stream to a new raw file while +a tag debugger block prints out any tags observed in the metedata +file. A QT GUI time sink is used to look at the signal as well. + +The versions with 'vector' in the name are similar except they use +vectors of data. + +The following shows a simple way of creating extra metadata for a +metadata file. This example is just showing how we can insert a date +into the metadata to keep track of later. The date in this case is +encoded as a vector of uint16 with [day, month, year]. + +\code + from gruel import pmt + from gnuradio import blocks + + key = pmt.pmt_intern("date") + val = pmt.pmt_init_u16vector(3, [13,12,2012]) + + extras = pmt.pmt_make_dict() + extras = pmt.pmt_dict_add(extras, key, val) + extras_str = pmt.pmt_serialize_str(extras) + self.sink = blocks.file_meta_sink(gr.sizeof_gr_complex, + "/tmp/metadat_file.out", + samp_rate, 1, + blocks.GR_FILE_FLOAT, True, + 1000000, extra_str, False) + +\endcode + +*/ |