Documentation / Witness, Evidence, Collecting evidence



A Witness represents one document, file, stream, or any other character input that is subject to a BetterDiff process.

For the caller, the content of a Witness, regardless of its origin or form, must always be a sequence of characters (a character string). Every character must represent exactly one symbol, letter, command, or other singular entity. Each such entity is presented to the caller as an array of bytes, and the whole content as a string. For more technical details, see the note that follows the examples.

Example 1:

The Witness represents a text file with two lines: "line 1" and "line 2". The content of the Witness will therefore be a string of 13 characters: "line 1" as characters 1 to 6, the new line as character 7, and "line 2" as characters 8 to 13.
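The character positions in Example 1 can be checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical and not part of BetterDiff, and it assumes the line break is a single "\n" character (a "\r\n" line break would add one more character).

```java
// Hypothetical illustration of Example 1; not part of the BetterDiff API.
final class TextWitnessExample {

    /** Content of the two-line text file, assuming '\n' as the line break. */
    static String content() {
        return "line 1" + "\n" + "line 2";
    }

    public static void main(String[] args) {
        String content = content();
        // The documentation counts characters from 1, Java indexes from 0.
        System.out.println(content.length());          // total number of characters
        System.out.println(content.substring(0, 6));   // characters 1 to 6
        System.out.println(content.charAt(6) == '\n'); // is character 7 the line break?
        System.out.println(content.substring(7, 13));  // characters 8 to 13
    }
}
```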

Example 2:

The Witness represents a DNA fragment of a gene. The content of the Witness will therefore be a string of nucleotides represented by their nucleobases, e.g. "AGCATATCGG", where every single nucleobase is represented by exactly 1 character.

Example 3:

The Witness represents a binary file without a known internal structure. The content of the Witness will therefore be a string of bytes, e.g. 15, 202, 156, 245, 31, where every single byte is represented by exactly 1 character.
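One possible way to obtain such a string in Java is to decode the bytes as ISO-8859-1, which maps every byte value 0 to 255 to exactly one char. The sketch below makes that assumption; the class and its methods are hypothetical and are not the BetterDiff API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of Example 3; not part of the BetterDiff API.
final class BinaryWitnessExample {

    /** Maps every byte to exactly one character (code points 0 to 255). */
    static String toContent(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Inverse mapping: recovers the original bytes from the content. */
    static byte[] toBytes(String content) {
        return content.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Java bytes are signed, so values above 127 need an explicit cast.
        byte[] raw = {15, (byte) 202, (byte) 156, (byte) 245, 31};
        String content = toContent(raw);
        System.out.println(content.length());        // one character per byte
        System.out.println((int) content.charAt(1)); // numeric value of the second byte
    }
}
```

ISO-8859-1 is chosen here only because the round trip between bytes and characters is lossless; any other one-to-one mapping would serve the same purpose.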

Example 4:

The Witness represents a thought of a human being. The content of the Witness will therefore be a string representation of the thought, e.g. the constellation of neurons and the synaptic activity, where every single neuron and every synaptic activity is represented by exactly 1 character.

Note. The exact representation of the content is a {@link String}, not an array of chars. This matters because in Java a char is always a single 16-bit (2-byte) UTF-16 code unit, whereas one character in a String may occupy more than one char: a supplementary Unicode character, for example, is stored as two chars (a surrogate pair). The number of bytes per character likewise depends on the encoding, e.g. 1 byte in ASCII, 2 or 4 bytes in UTF-16, and 4 bytes in UTF-32.

However, while technically a character is an array of bytes, in the context of a Witness a character is always an indivisible entity, regardless of the technical implementation of the String class.
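The difference between Java chars and indivisible characters can be illustrated as follows. The sketch assumes that one character corresponds to one Unicode code point, which is an interpretation and not something the documentation states; the class and method names are hypothetical.

```java
// Hypothetical illustration; assumes one Witness character = one Unicode code point.
final class CharacterCountExample {

    /** Number of characters (code points), which may be less than String.length(). */
    static int characterCount(String content) {
        return content.codePointCount(0, content.length());
    }

    /** Splits the content into one String per indivisible character. */
    static String[] characters(String content) {
        return content.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair of two chars), 'b'
        String content = "a\uD83D\uDE00b";
        System.out.println(content.length());        // counts chars (UTF-16 code units)
        System.out.println(characterCount(content)); // counts code points
    }
}
```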



Evidence is a set of Witnesses in a defined order.

Although the relation among Witnesses need not be a sequence and may instead be a tree or a graph, every Witness has its own number. This number must be unique, and together the numbers must form a sequence that starts at 1 and has no gaps. The number of a Witness is referred to as its ordinal number.
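The rule for ordinal numbers (unique, starting at 1, no gaps) can be expressed as a small check. This is a sketch only; OrdinalCheck and isValid are hypothetical names, and treating an empty collection as valid is an assumption.

```java
import java.util.Collection;

// Hypothetical helper; not part of the BetterDiff API.
final class OrdinalCheck {

    /**
     * Returns true if the ordinals are exactly the numbers 1..n in any order,
     * i.e. they are unique, start at 1, and have no gaps.
     * An empty collection is treated as valid (an assumption).
     */
    static boolean isValid(Collection<Integer> ordinals) {
        int n = ordinals.size();
        boolean[] seen = new boolean[n + 1];
        for (int ordinal : ordinals) {
            // n distinct values within 1..n necessarily cover 1..n without gaps.
            if (ordinal < 1 || ordinal > n || seen[ordinal]) {
                return false;
            }
            seen[ordinal] = true;
        }
        return true;
    }
}
```

For instance, the ordinals 3, 1, 2 pass the check, while 1, 3 (a gap) and 1, 1 (a duplicate) do not.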

Collecting evidence


Collecting Evidence is one of the core phases. During this phase, Witnesses are identified and collected into an Evidence.

Generally, this phase consists of the following steps:

  • Identification of Witnesses - During this step, it should be decided what will be used as a Witness: files, streams, or any other structures.
  • Internal representation of a Witness - Based on the first step, the internal representation should be established, and the identified sources should be read and their content transformed into this representation. Please note that the external representation of a Witness is always a string.
  • Relation among Witnesses and their order - The relation among Witnesses should be established, and every Witness should be assigned a sequence number, called its ordinal number. All of this should be stored within an Evidence.

The collected Evidence should not be modified once the phase has finished.
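The steps above might be sketched as follows. All types and methods here (EvidenceSketch, Witness, Evidence, collect) are hypothetical and are not the BetterDiff API. The sketch assumes that the sources have already been read into strings and that their order in the input list defines the ordinal numbers. The resulting Evidence is read-only, reflecting the rule that it should not change after collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the collecting phase; NOT the BetterDiff API.
final class EvidenceSketch {

    /** A Witness: an ordinal number plus its content exposed as a string. */
    static final class Witness {
        final int ordinal;
        final String content;

        Witness(int ordinal, String content) {
            this.ordinal = ordinal;
            this.content = content;
        }
    }

    /** An Evidence: an ordered, read-only collection of Witnesses. */
    static final class Evidence {
        private final List<Witness> witnesses;

        Evidence(List<Witness> witnesses) {
            // Defensive copy wrapped as unmodifiable, so later changes are rejected.
            this.witnesses = Collections.unmodifiableList(new ArrayList<>(witnesses));
        }

        List<Witness> witnesses() {
            return witnesses;
        }
    }

    /** Assigns ordinal numbers 1, 2, ... in input order and stores the result. */
    static Evidence collect(List<String> contents) {
        List<Witness> collected = new ArrayList<>();
        for (String content : contents) {
            collected.add(new Witness(collected.size() + 1, content));
        }
        return new Evidence(collected);
    }
}
```

Calling EvidenceSketch.collect with two strings yields Witnesses with ordinal numbers 1 and 2, and any later attempt to add to or remove from the returned list throws UnsupportedOperationException.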