1. Preamble

The DFASDL specification contains the description of all currently valid elements of the Data Format and Semantics Description Language.

Version

1.0.1

Copyright (c) 2014 - 2017 Contributors as noted in the AUTHORS.md file

The DFASDL specification is distributed under the terms of the
Creative Commons Attribution 4.0 International license (CC BY 4.0).

2. Structure

  1. The root element of a DFASDL document is the element dfasdl.

2.1. Permitted nestings

Table 1 lists the permitted nestings of the elements within a DFASDL document. Each element in the top row can contain the elements marked in its column.

Table 1. Permitted nestings of the elements

celem

choice

cid

const

elem

fixseq

seq

bin

x

x

x

x

x

x

bin64

x

x

x

x

x

x

binHex

x

x

x

x

x

x

celem

x

x

choice

x

x

x

cid

x

x

x

x

const

x

x

x

date

x

x

x

x

datetime

x

x

x

x

elem

x

x

x

fixseq

x

x

x

formatnum

x

x

x

x

x

x

formatstr

x

x

x

x

x

x

formattime

x

x

x

x

num

x

x

x

x

x

x

ref

x

x

x

seq

x

x

x

str

x

x

x

x

x

x

time

x

x

x

x

2.2. Element groups

To make it easier to describe the elements, they are organized in specific groups.

2.2.1. Structure-Element-Group

Structure-Elements are used to describe the structure of the data.

Representatives:

2.2.2. Data-Element-Group

Data-Elements contain no other elements and serve as containers for the data.

Representatives:

2.2.3. Time-Element-Group

Time-Elements contain no other elements and serve as containers for time and date values.

Representatives:

2.2.4. Expression-Element-Group

Expression-Elements define final expressions or constructs that must be evaluated.

Representatives:

3. Elements

3.1. bin

An element that contains binary data.

  1. A byte order can be specified via byteOrder.

  2. A coding can be specified via encoding (e.g. Base32, Base64, Base85).

  3. A MIME media type can be specified via the mime attribute.

Definition
<bin byteOrder="littleEndian" id="ID1"/>
<bin encoding="Base64" id="ID2"/>
<bin mime="text/plain" id="ID3"/>

3.2. bin64

This element contains binary data that are encoded via Base64.
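A minimal declaration of a bin64 element, written by analogy with the bin definitions above. The id attachment is a hypothetical name chosen for illustration.

Example
<bin64 id="attachment"/>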

Allowed attributes

3.3. binHex

An element that contains hexadecimal encoded data.
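A minimal declaration of a binHex element, for example for a hexadecimal checksum value. The id checksum is hypothetical.

Example
<binHex id="checksum"/>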

Allowed attributes

3.4. celem

A choice-container-element defines the smallest possible entity within a choice element. It is recursively defined and can contain other elements.

  1. A simple choice-container-element does not contain a value!

  2. A simple choice-container-element can contain other elements.

  3. A choice-container-element can only occur directly below a choice element.

Definition
<choice id="card">
  <celem id="row" s="semantic">
    <num id="row_num"/>
    <str id="row_str"/>
  </celem>
</choice>
Allowed attributes

3.5. choice

An element that allows the construction of alternatives in the structure.

  1. A choice matches one of several alternative structures or elements.

  2. Within a choice, elements must be placed inside one or more celem elements.

  3. The order of the data elements determines the matching. Therefore, more specific data elements should be defined before a generic str element.

  4. The last data element within a choice should not define a stop-sign.

Definition
<choice id="card">
  <celem id="row1">
    <num id="row1_num" start-sign="\d" stop-sign=";"/>
    <str id="row1_str" start-sign="NAME" stop-sign=":"/>
  </celem>
  <celem id="row2">
    <num id="row2_num" start-sign="\d" stop-sign=";" />
    <str id="row2_str" start-sign="NAME"/>
  </celem>
</choice>
Allowed attributes
Allowed elements

3.6. cid

An element that can be used as a nesting element for a data element.

  1. A user-defined ID represents the nesting element for a string or numerical data element.

  2. A user-defined ID can define a class via the class attribute.

Definition
<elem id="someElement">
  <cid id="myCustomID" class="myCustomClass">
    <str/>
  </cid>
  <str id="ID"/>
</elem>

<seq id="someList" min="2">
  <elem id="structure">
    <cid id="anotherCustomID" class="nestedClass">
      <str id="ID"/>
    </cid>
    <str id="anotherID"/>
  </elem>
</seq>
Allowed attributes
Allowed elements

3.7. const

A constant is a nesting element for exactly one other element from the Data-Element-Group.

Definition
<const id="foo">
  <str id="fooStr">Foo</str>
</const>

<const id="bar">
  <num id="barNum">123</num>
</const>
Allowed attributes
Allowed elements

3.8. date

An element that describes a date. The date must be in the ISO format (yyyy-MM-dd)!

Definition
<date id="dateField"/>

3.9. datetime

An element that describes a complete date with time (timestamp). The timestamp must be in the ISO format!

Definition
<datetime id="dateTime"/>

3.10. dfasdl

The root element of a DFASDL document contains attributes that describe the document.

  1. It occurs exactly once in the document, at the uppermost level.

  2. The semantic space in use is defined in the semantic attribute.

  3. The default-encoding attribute sets a default for all elements that do not define their own encoding attribute.

Definition
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
  ...
</dfasdl>
Allowed attributes
Allowed elements

All elements except the root element (dfasdl).

3.11. elem

An element defines the smallest possible entity within a format. It is recursively defined and can contain other elements.

  1. A simple element does not contain a value!

  2. A simple element can contain other elements.

Definition
<elem id="foo">
  <seq id="bar" max="2">
    <str id="foobar"/>
  </seq>
</elem>
<elem id="empty"/>
Allowed attributes
Allowed elements

All elements except the root element (dfasdl).

3.12. fixseq

A fixed sequence specifies a repeating child structure with a finite set of elements.

  1. A fixseq has the same characteristics as a seq, except that it defines a fixed number of elements.

  2. The number of elements is defined by the count attribute.

  3. The stop-sign defines a character string that stops the sequence. If this stop-sign occurs in the data, the sequence is stopped and the next element after the sequence is processed.

Definition
<fixseq id="accountList" count="2">
  <elem id="account">
    <str id="number"/>
  </elem>
</fixseq>

3.13. formatnum

A numerical data element whose value must match the pattern given in the format attribute.

The following characters are valid within the data and the defaultnum attribute:

  1. minus (-)

  2. numbers (0-9)

  3. point (.)

  4. comma (,)

Definition
<formatnum format="(\d\d\d)" id="ID" max-digits="12" />
<formatnum decimal-separator="." format="([0-9]{1,3}\.\d{1,2})" id="ID2"
max-digits="3" max-precision="2" />
If no decimal-separator is specified, the comma (,) is used as the default.
The matching part of the format attribute must be within a group (...)!
If the decimal separator is to be retained, it must be specified via the decimal-separator attribute.

3.14. formatstr

An element for a string whose value must match the pattern given in the format attribute.

Definition
<formatstr id="formatA" format="(\w\w\d)"/>
<formatstr id="formatB" format="(\w{1,10})"/>
<formatstr id="formatC" format=".*?:(.*)"/>
The matching part must be within a group (...)!

3.15. formattime

An element for date and time values that do not conform to ISO notation. The format attribute must contain a pattern that can be processed by the Java DateTimeFormatter!

Definition
<formattime id="my-time-is-now" format="dd.MM.yyyy HH:mm:ss X"/>

3.16. num

A data element that contains a numerical value.

  1. A numerical element may only contain digits and can contain a minus sign as its first character.

  2. A numerical element can define an exact number of digits (length).

    1. The minus sign is not included in the calculation of the length.

  3. A numerical element can specify a maximum number of digits (max-digits) that should be considered.

    1. The minus sign is not counted as a digit.

  4. A numerical element can specify the number of digits after the decimal separator (precision).

  5. A numerical element can define a default value (defaultnum) that will be inserted for missing data values.

The following characters are valid in the data and in the defaultnum attribute:

  1. minus (-)

  2. numbers (0-9)

Definition
<num id="numberA" length="4"/>
<num id="numberB" max-digits="5"/>
<num id="Pi" length="10" precision="9" defaultnum="3141592653"/>

3.17. ref

A reference points to a data element elsewhere in the document; the data of that element is placed at the position of the reference.

  1. A reference must define a source ID (sid), that corresponds to the id of the referenced data element!

  2. The referenced data element must be before the reference in the DFASDL.

  3. If a reference is specified within a sequence, the reference must be at the end.

  4. Only one reference is allowed within a sequence.

  5. If no semantic meaning is defined for the reference (s), the semantic meaning of the referenced element is used.

Definition
<elem id="someBlockElement">
  <elem id="anotherID">
    <str id="firstname"/>
    <str id="lastname"/>
    <num id="mainNumber"/>
  </elem>
</elem>
<ref id="number" sid="mainNumber"/>
<!-- Referencing from within a sequence -->
<seq id="accountList" max="999">
  <elem id="account">
    <num id="account_id"/>
    <str id="name"/>
    <str id="account"/>
    <seq id="children">
      <elem id="alter">
        <num id="anzahl"/>
        <num id="age"/>
        <ref sid="account_id" id="children_account_id"/>
      </elem>
    </seq>
  </elem>
</seq>
Allowed attributes

3.18. seq

A sequence element defines a repeating structure.

  1. A sequence can define the following variants:

    1. a minimum count (min)

    2. a maximum count (max)

    3. a minimum and a maximum count (min and max)

    4. no specification (corresponds to an infinite sequence)

  2. Within a sequence, IDs are not copied during the conversion; instead, they are stored in the class attribute, where the ID foo becomes id:foo.

  3. To discard the IDs, set the keepID attribute to false.

  4. Data-Elements must be placed within an elem element within a sequence.

  5. The stop-sign defines a character string that can stop the sequence.

  6. The filter attribute allows filtering of the source data. Only data that satisfies the filter is used.

Definition
<seq id="accountList" min="42" max="999">
  <elem id="account">
    <str id="number" class="foo"/>
    <str id="name"/>
  </elem>
</seq>

<seq id="accountList2" keepID="false">
  <elem id="account">
    <str id="number" class="bar"/>
  </elem>
</seq>

<seq id="salaries" filter="salary > 20000">
  <elem id="employee">
    <str id="name"/>
    <num id="salary"/>
  </elem>
</seq>

3.19. str

A data element for character strings. It can be used as a generic container for nearly every kind of data (but should not be).

  1. A character element may only contain characters in the default or the explicitly defined encoding.

  2. A character element can define the encoding of the expected characters (encoding).

  3. A character element can define the exact number of allowed characters (length).

  4. A character element can define the maximum number of characters (max-length).

  5. A character element can define a regular expression that marks the end of its data (stop-sign).

  6. A character element can define a default value that is inserted for missing data values (defaultstr).

Definition
<str id="A" encoding="UTF-16"/>
<str id="B" length="3"/>
<str id="C" max-length="5"/>
<str id="possiblyEmpty" defaultstr="missingValue"/>
<str id="D" stop-sign="\n"/>

3.20. sxp

An element that represents a Scala expression.

This element will be removed!
Definition
<sxp id="expOne">
  <ul><![CDATA[{List(apple, banana, orange).map(i => <li>{i}</li>)}]]></ul>
</sxp>
Allowed attributes

3.21. time

A data element for time values that must conform to ISO notation.

Definition
<time id="high-noon"/>

4. Attributes

4.1. Root attributes

Root attributes are only allowed at the root element dfasdl.

4.1.1. default-encoding

A default encoding for the data being read. It must be a valid encoding name such as utf-8.

This attribute is useful if all or most elements use the same encoding.
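A minimal sketch of how default-encoding interacts with the encoding attribute, assuming (as stated for the dfasdl element) that an element with its own encoding attribute is not affected by the document default. The ids name, legacyName and record are hypothetical.

Example
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom" default-encoding="utf-8">
  <elem id="record">
    <!-- No encoding attribute: the default-encoding utf-8 applies. -->
    <str id="name"/>
    <!-- Explicit encoding attribute: the document default is not used here. -->
    <str id="legacyName" encoding="UTF-16"/>
  </elem>
</dfasdl>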

4.1.2. semantic

This attribute describes the semantic space of the document. Currently, the following values are allowed:

  1. custom

  2. niem

  3. udef

4.2. Generic attributes

Generic attributes are allowed on all elements except the root element.

4.2.1. class

Assigns a class to the element.

4.2.2. correct-offset

This attribute corrects the offset of the read-in data.

The offset can be corrected in the positive or the negative direction.

4.2.3. encoding

The encoding of the data. It must be a valid encoding name such as utf-8.

4.2.4. id

  1. An ID is a character string.

  2. An ID must start with an alphabetic character.

  3. An ID can contain letters from the ASCII alphabet, digits, underscores and minus signs.

  4. An ID is only allowed to exist once within the document.

  5. Normally, all elements must define an ID!

If no ID is defined, the software creates one automatically. You should nevertheless define your own IDs, because they are easier to read during the mapping process.
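The following sketch applies the ID rules above. The names are hypothetical, and the invalid declaration is kept inside an XML comment so that the fragment itself stays valid.

Example
<elem id="customer">
  <!-- Valid: starts with a letter and uses only ASCII letters, digits, underscores and minus signs. -->
  <str id="customer_name-2"/>
  <!-- Invalid, because the ID starts with a digit: <str id="2nd_name"/> -->
</elem>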

4.2.5. s

Describes a semantic meaning of the element.

  1. The semantic meaning is defined as character string.

  2. Only values of the defined semantic space are allowed.

4.2.6. start-sign

A regular expression that describes the beginning of the element.

A start-sign is not allowed to be empty!

4.2.7. stop-sign

A regular expression that describes the end of the element data.

The default stop-sign matches both UNIX and Windows line endings and is defined as follows: \r\n?|\n
A stop-sign is not allowed to be empty!
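A sketch of a semicolon-separated line described with stop-sign. It assumes that an element without its own stop-sign uses the default stop-sign, so the last value ends at the line break. The ids are hypothetical.

Example
<seq id="people">
  <elem id="person">
    <!-- Each of these values ends at the next semicolon. -->
    <str id="firstname" stop-sign=";"/>
    <str id="lastname" stop-sign=";"/>
    <!-- No stop-sign: the default \r\n?|\n ends the value at the line break. -->
    <num id="age"/>
  </elem>
</seq>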

4.3. Element specific attributes

Attributes that are only allowed at specific elements.

4.3.1. byteOrder

Defines the byte order of binary data. The following values are possible:

  1. bigEndian

  2. littleEndian

  3. middleEndian

4.3.2. count

Defines a quantity, e.g. the number of elements of a fixseq.

4.3.3. db-auto-inc

Marks the database column of the element as an auto-increment column, meaning that the value of the column is filled in automatically if no value is provided.

Allowed at the following elements
Because databases restrict the use of auto-increment columns, you should use this attribute only on a simple num element without the attributes precision and length!
Example
<seq id="companies">
  <elem id="companies-row">
    <num id="companies-row-id" db-column-name="id" db-auto-inc="true"/>
  </elem>
</seq>

4.3.4. db-column-name

The column name of the element in the database. If this is not set, the ID will be used as column name.

4.3.5. db-foreign-key

The foreign key definition of the database table described by the current element. You have to specify a comma-separated list of DFASDL element IDs that describe the referenced table columns.

Allowed at the following elements
Example
<seq id="companies">
  <elem id="companies-row">
    <num id="companies-row-id" db-column-name="id"/> <!-- (1) -->
    ...
  </elem>
</seq>

<seq id="contacts">
  <elem id="contacts-row">
    ...
    <num id="contacts-row-company-id" db-column-name="company_id" db-foreign-key="companies-row-id"/> <!-- (2) -->
  </elem>
</seq>

(1) The description of a data column. (2) The reference via db-foreign-key to the relevant column element.

4.3.6. db-insert

Allows the definition of database-specific INSERT statements. The syntax must follow the definition of Prepared Statements.

Example
INSERT INTO mytable (column1, column2) VALUES(?, ?)
Database-specific SQL syntax can be used.
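A DFASDL sketch that attaches an INSERT statement to a sequence, modelled on the example given for db-update. The table people and the ids are hypothetical, and it is assumed that the placeholders are filled in the order of the column elements.

Example
<seq id="people" db-insert="INSERT INTO people (id, name) VALUES(?, ?)">
  <elem id="people_row">
    <num db-column-name="id" id="id" max-digits="5"/>
    <str db-column-name="name" id="name" max-length="12"/>
  </elem>
</seq>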
Allowed at the following elements

4.3.7. db-primary-key

Defines a primary key for a database table. If the attribute is defined, it must contain one or multiple (separated by comma) column names.

The column name(s) must correspond to the name of the database columns.
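A sketch of a composite primary key made of two comma-separated column names. It assumes, as in the db-update example, that the attribute is placed on the seq that describes the table; the table layout and ids are hypothetical.

Example
<seq id="order_items" db-primary-key="order_id,line_no">
  <elem id="order_items_row">
    <num db-column-name="order_id" id="order_items_order_id"/>
    <num db-column-name="line_no" id="order_items_line_no"/>
    <str db-column-name="product" id="order_items_product"/>
  </elem>
</seq>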
Allowed at the following elements

4.3.8. db-select

Allows the execution of database-specific SELECT statements.

Example
SELECT
  x374 AS column1,
  y478 AS column2
FROM x2 JOIN y3 ON x2.id = y3.refId
WHERE x2.x23 = 1
ORDER BY y3.y1 ASC
Example DFASDL
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
  <seq id="people" db-select="SELECT t1.name, firstname, title, telephone, t2.name AS productname FROM `people` AS t1, `products` AS t2 WHERE t1.pid = t2.pid">
    <elem id="people_row">
      <str db-column-name="name" id="people_row_name" max-length="12"/>
      <str db-column-name="firstname" id="people_row_firstname" max-length="9"/>
      <str db-column-name="title" id="people_row_title" max-length="22"/>
      <str db-column-name="telephone" id="people_row_telephone" max-length="14"/>
      <str db-column-name="productname" id="productname"/>
    </elem>
  </seq>
</dfasdl>
Allowed at the following elements

4.3.9. db-update

Allows the execution of database-specific UPDATE statements. The syntax must follow the definition of Prepared Statements.

Example
UPDATE mytable SET id = ?, column1 = ?, column2 = ? WHERE id = ?
Example DFASDL
<?xml version="1.0" encoding="UTF-8"?>
<dfasdl xmlns="http://www.dfasdl.org/DFASDL" semantic="custom">
  <seq id="people" db-primary-key="id" db-update="UPDATE people SET id = ?, name = ?, time = now() WHERE id = ?">
    <elem id="people_row">
      <num db-column-name="id" id="id" max-digits="5"/>
      <str db-column-name="name" id="name" max-length="12"/>
    </elem>
  </seq>
</dfasdl>
Database-specific SQL syntax can be used.
Allowed at the following elements

4.3.10. decimal-separator

Defines a decimal separator for a numerical data element. The following values are allowed:

  1. point (.)

  2. comma (,)

  3. Momayyez (٫)

Allowed at the following elements

4.3.11. defaultnum

Defines a default value for a numerical data element that is inserted when the data is empty.

Allowed at the following elements

4.3.12. defaultstr

Defines a default character string for a data element that is inserted when the data is empty.

Allowed at the following elements

4.3.13. filter

Defines a filter expression that is used to limit the available source data.

Currently filtering is supported on databases only!
Special characters that may lead to problems with XML, like < and &, must be escaped properly!
Escaped characters in the filter expression
<seq id="foo" filter="my-column-data &lt; 1024">
  ...
</seq>
Allowed at the following elements

seq

4.3.14. format

Contains the format definition for the content of the data element.

The matching part must be within a group (...)!
Allowed at the following elements

4.3.15. length

Defines the exact length of a character string or, for a num element, the exact number of digits.

Allowed at the following elements

num, str

4.3.16. keepID

Defines whether the values of the id attribute are kept within sequences. Allowed values are true and false.

The default value of this attribute is true.
Allowed at the following elements

4.3.17. max

Defines the maximum count of a sequence as an integer.

Allowed at the following elements

seq

4.3.18. max-digits

Defines a maximum number of digits as an integer.

Allowed at the following elements

4.3.19. max-length

Defines the maximum length of a character string as an integer.

Allowed at the following elements

str

4.3.20. max-precision

Defines the maximum number of digits after the decimal separator for a numerical value.

Allowed at the following elements

4.3.21. mime

Defines the MIME type of binary data, e.g. application/postscript.

Allowed at the following elements

bin

4.3.22. min

Defines the minimum count of a sequence as an integer.

Allowed at the following elements

seq

4.3.23. precision

Defines the precision, i.e. the number of digits after the decimal separator of a numerical value.

Allowed at the following elements

num

4.3.24. sep

Defines a separator for the values of a data set.

This attribute is currently not used.

4.3.25. sid

Defines a source ID for a reference to another element.

Allowed at the following elements

ref

4.3.26. trim

Defines whether the read-in character string should be trimmed, i.e. whether spaces, tabs and line breaks are removed. The following values are possible:

left

Only at the beginning of the character string.

right

Only at the end of the character string.

both

At the beginning and the end of the character string.
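A sketch of the three trim values, assuming the attribute is used on str elements. The ids are hypothetical.

Example
<elem id="address">
  <!-- Removes leading spaces, tabs and line breaks only. -->
  <str id="street" trim="left"/>
  <!-- Removes trailing spaces, tabs and line breaks only. -->
  <str id="city" trim="right"/>
  <!-- Removes spaces, tabs and line breaks on both sides. -->
  <str id="zip" trim="both"/>
</elem>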

Allowed at the following elements

4.3.27. unique

The unique attribute indicates that a concrete value of the element must occur only once! In principle, this is the same as the UNIQUE constraint in relational databases. The attribute may be omitted or set to "false", in which case it is ignored. If it is set to "true", it takes effect. Currently this attribute is only allowed on numeric, string and time elements.
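A sketch that marks a customer number as unique within a sequence of rows, while a second column keeps the default behaviour. The ids are hypothetical.

Example
<seq id="customers">
  <elem id="customers_row">
    <!-- Each value of customer_number may occur only once. -->
    <num id="customer_number" unique="true"/>
    <!-- unique omitted: duplicate names are allowed. -->
    <str id="customer_name"/>
  </elem>
</seq>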

Allowed at the following elements

4.3.28. value

Defines a value for a data set.

Allowed at the following elements

ref

4.3.29. xml-attribute-name

Defines the name of the attribute of the XML element (defined via xml-attribute-parent). Allows data to be read from XML attributes.

Definition
<seq id="foo">
  <elem id="row">
    <num id="age" xml-attribute-name="age" xml-attribute-parent="raw-data"/>
    <num id="count" xml-attribute-name="count" xml-attribute-parent="raw-data"/>
  </elem>
</seq>

4.3.30. xml-attribute-parent

Defines the name of an XML element whose attributes should be read (see xml-attribute-name).

Definition
<seq id="foo">
  <elem id="row">
    <num id="age" xml-attribute-name="age" xml-attribute-parent="raw-data"/>
    <num id="count" xml-attribute-name="count" xml-attribute-parent="raw-data"/>
  </elem>
</seq>

4.3.31. xml-element-name

If the name of the XML element differs from the id, it can be defined with this attribute (analogous to db-column-name).

Definition
<str id="some-id" xml-element-name="an-xml-id"/>