Copyright Brian Starkey 2012-2014
bin2xml - A Generic Binary File Parser
10th March 2013
STRaND-1 Decoded Telemetry Data
Description

bin2xml is an attempt at a completely generic, configurable file parser, mainly aimed at binary input data. It uses an XML definition file to decode an input file into "packets", then wraps those up in XML structures for output. Optionally, it can apply an XSL stylesheet to transform the XML before it is output. By default, a generic "default.xsl" and "default.css" style combination is used, which generates the red tables shown on this page.

The definition file contains one or more "templates", each of which defines the data layout for one type of data packet. bin2xml tries each template in turn (in the order they appear in the definition file), and if one matches, the data is parsed into it. The output is an XML structure with the same layout as the matching template, but populated with the input data. The format of the definition file is described in full in writing_templates.txt.
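
The first-match behaviour can be sketched in a few lines of Python. This is only an illustration and not bin2xml's actual code: the `templates` list, its `(name, prefix)` representation and the `pick_template` function are all hypothetical, and real templates can fix values in any field rather than just at the start of the packet.

```python
# Hypothetical sketch: each template is reduced to a name plus the fixed
# bytes it expects at the start of a packet. Real bin2xml templates are
# richer than this, so treat it as an analogy only.
templates = [
    ("OBC Beacon TM", b"\xc0\x00\xdb\xdc\x80"),
    ("BITMAPINFOHEADER", b"BM"),
]


def pick_template(data, templates):
    """Return the name of the first template whose fixed bytes match."""
    for name, prefix in templates:  # tried in definition order
        if data.startswith(prefix):
            return name
    return None  # nothing matched


print(pick_template(b"BM\x36\x03\x00\x00", templates))
print(pick_template(b"\x00\x01\x02", templates))
```

Because the first match wins, the order of templates matters whenever two of them could accept the same input.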

bin2xml is written in Python, so it isn't going to break any records for sheer speed. The advantage is that it was easy to write (taking a few days) and is incredibly flexible.

    ./bin2xml.py -h
    usage: Process the binary input file according to the packet formats defined in the XML file
           [-h] [-i input_file] [-o output_file] [-s stylesheet] [-v] [-d]
           [-t --transform]
           template

           positional arguments:
               template        the XML format file to use

           optional arguments:
               -h, --help      show this help message and exit
               -i input_file   the binary file to process
               -o output_file  XML file to write parsed data to
               -s stylesheet   XSL stylesheet to use for the output
               -v              Increase output
               -d              Debug
               -t --transform  Perform an XSLT transform on the data before outputting
                

I think the functionality is best demonstrated by example, so the remainder of this page is devoted to usage and output examples. The code is available from the navigation bar on the left.

Broadly speaking, a bin2xml command will look something like this:
./bin2xml.py format_file.xml -i binary_data.log -t -o pretty_output.html


STRaND-1 Beacon Packets

The University of Surrey and SSTL recently launched a nanosatellite called STRaND-1, which carries a Google Nexus One as one of its payloads. The satellite (at the time of writing) is broadcasting a telemetry beacon which can be picked up by amateurs according to the details here. The broadcast is in KISS/HDLC packets, which are not human-readable, so bin2xml can be used to decode them.

The first thing is to write a format file, which describes the layout of the data. Here is a section of the one I'm using for STRaND, written using the telemetry format spreadsheet:

<?xml version="1.0" ?>

    <formats name="STRaND-1 Beacon Packets">
        
        <template name="OBC Beacon TM">
            <field name="tnc_flag" length="2">
                0xC0 0x00
            </field>
            <field name="hdlc_flag" length="3">
                0xDB 0xDC 0x80
            </field>
            <field name="counter" length="1" format='B'/>
            <field name="length" length="1" format="B"/>
            <field name="id" length="1" format="B">
                0x02
            </field>
            <field name="node_addr" length="1" format="B" preferhex="yes">
                <value name="EPS">0x2C</value>
                <value name="BATTERY">0x2D</value>
                <value name="SWITCH_BOARD">0x66</value>
            </field>
            <field name="node_channel" length="1" format="B" preferhex="yes"/>
            <field name="data_size" length="1" format="B"/>
            <field name="data" length="$data_size" format="B"/>
            <field name="tnc_tail" length="1">
                0xC0
            </field>
        </template>

        <!-- more templates go here for the other packet types -->

    </formats>

This describes the OBC beacon packet. Some fields have defined data (such as tnc_flag and tnc_tail) and others do not; the undefined fields are filled in from the input data if it matches the packet format. Also note that the data field is variable length, taking its length from the data_size field.

Given the binary input data:
'\xc0\x00\xdb\xdc\x80T\t\x02f\xac\x05\x00\x00\x00\x00\x00\xc0'
bin2xml will generate the following XML:

<packet name="OBC Beacon TM" length="17" numfields="10">
    <field name="tnc_flag" length="2" format="c">0xc0 0x0 </field>
    <field name="hdlc_flag" length="3" format="c">0xdb 0xdc 0x80 </field>
    <field name="counter" length="1" format="B">[84]</field>
    <field name="length" length="1" format="B">[9]</field>
    <field name="id" length="1" format="B">[2]</field>
    <field name="node_addr" length="1" format="B">0x66 = SWITCH_BOARD</field>
    <field name="node_channel" length="1" format="B">0xac</field>
    <field name="data_size" length="1" format="B">[5]</field>
    <field name="data" length="5" format="B">[0, 0, 0, 0, 0]</field>
    <field name="tnc_tail" length="1" format="c">0xc0 </field>
  </packet>

Formatted with bin2xml's default stylesheets, this generates the following HTML output:

OBC Beacon TM - 17 Bytes
Name Length Format Data
tnc_flag 2 c 0xc0 0x0
hdlc_flag 3 c 0xdb 0xdc 0x80
counter 1 B [84]
length 1 B [9]
id 1 B [2]
node_addr 1 B 0x66 = SWITCH_BOARD
node_channel 1 B 0xac
data_size 1 B [5]
data 5 B [0, 0, 0, 0, 0]
tnc_tail 1 c 0xc0

Which I think you'll agree is much more readable than '\xc0\x00\xdb\xdc\x80T\t\x02f\xac\x05\x00\x00\x00\x00\x00\xc0'.
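
For comparison, here is roughly what decoding the same packet by hand looks like with Python's standard `struct` module. It is a sketch of the work the template saves you, not an excerpt from bin2xml; the `HEADER` and `NODES` names are my own, and the little-endian `<` prefix is an assumption that makes no difference for single-byte fields.

```python
import struct

# The sample packet from this page.
packet = b"\xc0\x00\xdb\xdc\x80T\t\x02f\xac\x05\x00\x00\x00\x00\x00\xc0"

# Fixed-size part: 2-byte tnc_flag, 3-byte hdlc_flag, then six unsigned bytes.
HEADER = struct.Struct("<2s3s6B")
(tnc_flag, hdlc_flag, counter, length, pkt_id,
 node_addr, node_channel, data_size) = HEADER.unpack_from(packet, 0)

# Fields with defined values must match, otherwise this is a different packet.
if (tnc_flag, hdlc_flag, pkt_id) != (b"\xc0\x00", b"\xdb\xdc\x80", 0x02):
    raise ValueError("not an OBC beacon packet")

# The data field is variable length: its size comes from data_size.
data = list(packet[HEADER.size:HEADER.size + data_size])
tnc_tail = packet[HEADER.size + data_size]

# Named values for node_addr, copied from the template.
NODES = {0x2C: "EPS", 0x2D: "BATTERY", 0x66: "SWITCH_BOARD"}

print("counter      ", counter)
print("length       ", length)
print("node_addr    ", hex(node_addr), "=", NODES.get(node_addr, "?"))
print("node_channel ", hex(node_channel))
print("data         ", data)
print("tnc_tail     ", hex(tnc_tail))
```

Every new packet type needs another block like this, which is the repetition the XML templates are meant to remove.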


Bitmap Files

As an unrelated example, bitmap files are nothing but binary data packed in a relatively well documented format. bin2xml can thus be used to extract the header information from a bitmap file with a minimum of fuss. Here's a format definition for two types of bitmap:

<?xml version="1.0" ?>
<formats name="Bitmap File">
    <template name="BITMAPINFOHEADER">
        <!-- Header -->
        <field name="magic">
            BM
        </field>
        <field name="file_size" length="4" format="=I"/>
        <field name="reserved" length="4"/>
        <field name="data_offset" length="4" format="=I"/>
        
        <!-- DIB Header -->
        <field name="dib_size" length="4" format="=I">
            40
        </field>
        <field name="width" length="4" format="=I"/>
        <field name="height" length="4" format="=I"/>
        <field name="planes" length="2" format="=H">
            1
        </field>
        <field name="bpp" length="2" format="=H"/>
        <field name="compression" length="4" format="=I"/>
        <field name="data_size" length="4" format="=I"/>
        <field name="hres" length="4" format="=I"/>
        <field name="vres" length="4" format="=I"/>
        <field name="colours" length="4" format="=I"/>
        <field name="icolours" length="4" format="=I"/>
        
        <!-- Image Data -->
        <field name="data" length="$data_size"/>
    </template>

    <template name="BITMAPCOREHEADER">
        <!-- Header -->
        <field name="magic">
            BM
        </field>
        <field name="file_size" format="=I"/>
        <field name="reserved" length="4"/>
        <field name="data_offset" length="4" format="=I"/>
        
        <!-- DIB Header -->
        <field name="dib_size" length="4" format="=I">
            12
        </field>
        <field name="width" length="2" format="=H"/>
        <field name="height" length="2" format="=H"/>
        <field name="planes" length="2" format="=H">
            1
        </field>
        <field name="bpp" length="2" format="=H"/>
        
        <!-- Image Data -->
        <field name="data"/>
    </template>

</formats>
Example BMP file

Now if you throw the bitmap on the right at it, again using default stylesheets, you'll get this output:

BITMAPINFOHEADER - 822 Bytes
Name Length Format Data
magic 2 c BM
file_size 4 =I [822]
reserved 4 c 0x0 0x0 0x0 0x0
data_offset 4 =I [54]
dib_size 4 =I [40]
width 4 =I [16]
height 4 =I [16]
planes 2 =H [1]
bpp 2 =H [24]
compression 4 =I [0]
data_size 4 =I [768]
hres 4 =I [2835]
vres 4 =I [2835]
colours 4 =I [0]
icolours 4 =I [0]
data 768 c 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9f 0xff 0x0 0xff 0xff 0x0 0xff 0x9f 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x24 0xff 0x0 0x84 0xff 0x0 0xe5 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9f 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9f 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9f 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x83 0xff 0x0 0xe4 0xb9 0x0 
0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x57 0xff 0x0 0xb8 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x57 0xff 0x0 0xb8 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9f 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9f 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe4 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe4 0xb9 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb9 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3e 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3d 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb8 0x0 0xff 0x58 0x0 0xff 0x0 0x0 0xff 0x0 0x58 0xff 0x0 0xb8 0xff 0x0 0xff 0xe5 0x0 0xff 0x84 0x0 0xff 0x23 0x3d 0xff 0x0 0x9e 0xff 0x0 0xff 0xff 0x0 0xff 0x9e 0x0 0xff 0x3e 0x0 0xff 0x0 0x23 0xff 0x0 0x84 0xff 0x0 0xe5 0xb9 0x0 0xff 0x58 0x0 0xff

The actual image data is still basically unreadable, but the header is nicely decoded. (And you already knew what the picture looked like :P)
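
As a cross-check on the BITMAPINFOHEADER template, the same layout can be written as a single `struct` format string. The sketch below packs a synthetic header from the values in the table above and unpacks it again, so it needs no image file. The `BMP_HEADER` and `NAMES` identifiers are mine. It also uses an explicit little-endian `<` prefix where the template uses `=` (native byte order); the two agree on little-endian machines.

```python
import struct

# File header (magic, file_size, reserved, data_offset) followed by the
# 40-byte DIB header, with the field widths used in the template.
BMP_HEADER = struct.Struct("<2sI4sI" "IIIHHIIIIII")

NAMES = ["magic", "file_size", "reserved", "data_offset",
         "dib_size", "width", "height", "planes", "bpp", "compression",
         "data_size", "hres", "vres", "colours", "icolours"]

# Synthetic header built from the decoded values shown in the table.
header = BMP_HEADER.pack(b"BM", 822, b"\x00" * 4, 54,
                         40, 16, 16, 1, 24, 0, 768, 2835, 2835, 0, 0)

info = dict(zip(NAMES, BMP_HEADER.unpack(header)))

# The same fixed-value checks the template performs.
if info["magic"] != b"BM" or info["dib_size"] != 40 or info["planes"] != 1:
    raise ValueError("not a BITMAPINFOHEADER bitmap")

for name in NAMES:
    print(name, info[name])
```

The numbers are self-consistent: the header is 54 bytes long, which is the data_offset, and 54 plus the 768 bytes of image data gives the 822-byte file_size.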


Plaintext Log Files

bin2xml isn't limited to processing binary data. With the right template file, it can work with text too. Equally, the output doesn't have to be XML or HTML: use a different XSL sheet and get whatever output you like. For example, you could parse kernel log files into CSV format:

<?xml version="1.0" ?>
<formats name="Kernel Log">
    <template name="Kernel Message">
        
        <field name="timestamp" format="c" length="16"/>
        <field name="hostname" format="c"/>
        <field name="delim" format="s"> </field>
        <field name="label" format="s">kernel: </field>
        <field name="delim" format="s">[</field>
        <field name="ticks" format="c"/>
        <field name="delim" format="s">]</field>
        <field name="message" format="c"/>
        <field name="EOL" format="s">
</field>
    
    </template>
</formats>

With a CSV-generating XSL stylesheet, the output looks like this:

"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.362963",    "]",    "ata3.00: configured for UDMA/100",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.363209",    "]",    "sd 2:0:0:0: [sdb] Starting disk",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.410895",    "]",    "PM: resume of devices complete after 5439.932 msecs",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.410970",    "]",    "PM: Finishing wakeup.",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.410971",    "]",    "Restarting tasks ... done.",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.568320",    "]",    "r8169 0000:07:00.0: eth0: link down",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.568363",    "]",    "r8169 0000:07:00.0: eth0: link down",    ""
"Mar 10 11:30:48",    "eva",    "",    "kernel:",    "[",    "1481374.568481",    "]",    "IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready",    ""
"Mar 10 11:30:50",    "eva",    "",    "kernel:",    "[",    "1481376.165066",    "]",    "r8169 0000:07:00.0: eth0: link up",    ""
"Mar 10 11:30:50",    "eva",    "",    "kernel:",    "[",    "1481376.165073",    "]",    "IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready",    ""
                

Of course, there are much better tools for parsing text files (especially kernel logs!), but when all you have is a hammer... The point was to show how flexible bin2xml can be with the right configuration files.
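
The XML-to-CSV step doesn't have to go through XSLT either. Here is a minimal sketch that does it with Python's standard library instead, assuming packet XML in the same `<packet>`/`<field>` shape as the STRaND-1 output earlier on this page. The sample packet is hand-written for illustration, with attributes trimmed to `name` and only three of the fields kept, so it is not real bin2xml output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the <packet>/<field> shape (illustrative only).
xml_text = """<packet name="Kernel Message">
    <field name="timestamp">Mar 10 11:30:48</field>
    <field name="hostname">eva</field>
    <field name="message">PM: Finishing wakeup.</field>
</packet>"""

packet = ET.fromstring(xml_text)

# One CSV column per field, in document order; empty fields become "".
row = [(field.text or "").strip() for field in packet.findall("field")]

out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_ALL).writerow(row)
print(out.getvalue(), end="")
```

This swaps the stylesheet for ordinary code, which can be easier to debug, at the cost of no longer being selectable with the -s option.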