3.2.4 Ansistrings

Ansistrings are strings that have no length limit. They are reference counted and are guaranteed to be null terminated. Internally, an ansistring is treated as a pointer: the actual content of the string is stored on the heap, and as much memory as needed to store the string content is allocated.

This is all handled transparently, i.e. ansistrings can be manipulated like normal short strings. Ansistrings can be defined using the predefined AnsiString type.

Remark: The null-termination does not mean that null characters (char(0) or #0) cannot be used: the null-termination is not used internally, but is there for convenience when dealing with external routines that expect a null-terminated string (as most C routines do).
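The remark above can be illustrated with a minimal sketch. The program name and the variable S are chosen for illustration only; StrLen is used here as a stand-in for any external routine that scans for a terminating null byte.

```pascal
program EmbeddedNull;

uses
  SysUtils;  // for StrLen

var
  S: AnsiString;
begin
  // #0 is an ordinary character inside an ansistring.
  S := 'abc';
  S := S + #0;
  S := S + 'def';
  // Length reports all characters, including the embedded #0.
  WriteLn(Length(S));          // 7
  // A C-style scan stops at the first null byte it meets.
  WriteLn(StrLen(PChar(S)));   // 3
end.
```

Pascal code sees the whole string, while a routine that relies on null-termination only sees the part before the first null character.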

If the {$H} switch is on, a string declared with the regular String keyword and without a length specifier is regarded as an ansistring as well. If a length specifier is present, a short string will be used, regardless of the {$H} setting.
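As a small sketch of the effect of the switch, the following program compares the size of two variables. The names A and B are illustrative, and the program assumes a compiler mode in which String with {$H+} maps to AnsiString (the default behaviour described above).

```pascal
program HSwitch;
{$H+}  // String without a length specifier is now an ansistring

var
  A: String;       // ansistring, because of {$H+}
  B: String[10];   // short string, regardless of {$H}
begin
  // An ansistring variable only holds a pointer.
  WriteLn(SizeOf(A) = SizeOf(Pointer));   // TRUE
  // A short string holds a length byte plus its characters.
  WriteLn(SizeOf(B));                     // 11
end.
```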

If the string is empty (''), then the internal pointer representation of the string is Nil. If the string is not empty, then the pointer points to a structure in heap memory.

The internal representation as a pointer and the automatic null-termination make it possible to typecast an ansistring to a PChar. If the string is empty (so the pointer is Nil), then the compiler makes sure that the resulting PChar points to a null byte.

Assigning one ansistring to another doesn’t involve moving the actual string. A statement

  S2:=S1;

results in the reference count of the string that S2 referred to being decreased by 1. The reference count of S1 is increased by 1, and finally S1 (as a pointer) is copied to S2. This is a significant speed-up compared to copying the string content.
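The sharing can be observed by comparing the two variables as pointers. This is a minimal sketch: the string is built at run time with StringOfChar so that it is an ordinary heap string rather than a constant, and the length of 1000 is arbitrary.

```pascal
program SharedContent;

var
  S1, S2: AnsiString;
begin
  // Build S1 at run time so that its content lives on the heap.
  S1 := StringOfChar('x', 1000);
  S2 := S1;
  // Both variables refer to the same heap block: no characters were copied.
  WriteLn(Pointer(S1) = Pointer(S2));   // TRUE
end.
```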

If the reference count of a string reaches zero, then the memory occupied by the string is deallocated automatically, and the pointer is set to Nil, so no memory leaks arise.

When an ansistring is declared, the Free Pascal compiler initially allocates just memory for a pointer, not more. This pointer is guaranteed to be Nil, meaning that the string is initially empty. This is true for local and global ansistrings or ansistrings that are part of a structure (arrays, records or objects).

This does introduce an overhead. For instance, declaring

Var  
  A : Array[1..100000] of string;

will copy the value Nil 100,000 times into A. When A goes out of scope, the reference count of each of the 100,000 strings is decreased by 1. All this happens invisibly to the programmer, but it matters when considering performance.

Memory for the string content will be allocated only when the string is assigned a value. If the string goes out of scope, then its reference count is automatically decreased by 1. If the reference count reaches zero, the memory reserved for the string is released.
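A minimal sketch of this behaviour inspects the variable as a pointer before and after it is given a value. The names are illustrative only.

```pascal
program LazyAllocation;

var
  S: AnsiString;
begin
  // Freshly declared: the string is empty and no heap memory is in use.
  WriteLn(Pointer(S) = nil);   // TRUE
  // Assigning a value allocates memory for the content.
  S := StringOfChar('a', 10);
  WriteLn(Pointer(S) = nil);   // FALSE
  // Making the string empty again releases the content.
  S := '';
  WriteLn(Pointer(S) = nil);   // TRUE
end.
```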

If a value is assigned to a character of a string that has a reference count greater than 1, such as in the following statements:

  S:=T;  { reference count for S and T is now 2 }  
  S[I]:='@';

then a copy of the string is created before the assignment. This is known as copy-on-write semantics. It is possible to force a string to have reference count equal to 1 with the UniqueString call:

  S:=T;  
  R:=T; // Reference count of T is at least 3  
  UniqueString(T);  
  // Reference count of T is guaranteed to be 1

It is recommended to do this, for example, when typecasting an ansistring to a PChar variable and passing it to a C routine that modifies the string.
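Copy-on-write and UniqueString can both be observed by comparing the variables as pointers. The following is a sketch under the assumption that the strings are built at run time (here with StringOfChar), so that they are ordinary reference counted heap strings.

```pascal
program CopyOnWrite;

var
  S, T: AnsiString;
begin
  T := StringOfChar('a', 5);
  S := T;                              // S and T share one heap block
  WriteLn(Pointer(S) = Pointer(T));    // TRUE
  S[1] := '@';                         // S gets a private copy first
  WriteLn(Pointer(S) = Pointer(T));    // FALSE
  WriteLn(S, ' ', T);                  // @aaaa aaaaa

  S := T;                              // share the content again
  UniqueString(S);                     // force a private copy without changing it
  WriteLn(Pointer(S) = Pointer(T));    // FALSE
end.
```

T keeps its original content throughout: only the variable that is written to, or passed to UniqueString, receives its own copy.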

The Length function must be used to get the length of an ansistring: the length is not stored at character 0 of the ansistring. The construct

 L:=ord(S[0]);

which was valid for Turbo Pascal shortstrings, is no longer correct for Ansistrings. The compiler will warn if such a construct is encountered.

To set the length of an ansistring, the SetLength procedure must be used. Constant ansistrings have a reference count of -1 and are treated specially. The same remark as for Length applies: the construct

  L:=12;  
  S[0]:=Char(L);

which was valid for Turbo Pascal shortstrings, is no longer correct for Ansistrings. The compiler will warn if such a construct is encountered.
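The replacements for both Turbo Pascal constructs can be sketched as follows. The variable names and the sample text are illustrative only.

```pascal
program LengthDemo;

var
  S: AnsiString;
  L: SizeInt;
begin
  S := 'Free Pascal';
  L := Length(S);     // instead of L := ord(S[0])
  WriteLn(L);         // 11
  SetLength(S, 4);    // instead of S[0] := Char(4)
  WriteLn(S);         // Free
  SetLength(S, 6);    // grow the string, then fill the new characters
  S[5] := '!';
  S[6] := '!';
  WriteLn(S);         // Free!!
end.
```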

Ansistrings are converted to short strings by the compiler if needed: this means that ansistrings and short strings can be mixed without problems.
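A minimal sketch of such mixing is shown below. One point to keep in mind, which the sketch makes visible, is that a short string holds at most 255 characters, so a longer ansistring is truncated when it is converted. The names and lengths are chosen for illustration.

```pascal
program MixedStrings;

var
  A: AnsiString;
  B: ShortString;
begin
  A := StringOfChar('x', 300);
  B := A;                // converted to a short string by the compiler
  WriteLn(Length(B));    // 255
  A := B;                // converted back to an ansistring
  A := A + '!';          // an ansistring can grow beyond 255 characters
  WriteLn(Length(A));    // 256
end.
```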

Ansistrings can be typecast to PChar or Pointer types:

Var P : Pointer;
    PC : PChar;
    S : AnsiString;

begin
  S :='This is an ansistring';
  PC:=PChar(S);
  P :=Pointer(S);
end.

There is a difference between the two typecasts. When an empty ansistring is typecast to a pointer, the pointer will be Nil. If an empty ansistring is typecast to a PChar, then the result will be a pointer to a zero byte (an empty string).
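The difference between the two typecasts can be checked with a short sketch. The names are illustrative only.

```pascal
program EmptyCast;

var
  S: AnsiString;
  P: Pointer;
  PC: PChar;
begin
  S := '';
  P := Pointer(S);
  PC := PChar(S);
  WriteLn(P = nil);     // TRUE: the pointer typecast yields Nil
  WriteLn(PC = nil);    // FALSE: the PChar typecast yields a valid pointer
  WriteLn(Ord(PC^));    // 0: it points to a zero byte
end.
```

The PChar typecast is therefore safe to pass to a routine that dereferences its argument, even when the string is empty.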

The result of such a typecast must be used with care. In general, it is best to consider the result of such a typecast as read-only, i.e. only suitable for passing to a procedure that needs a constant pchar argument.

It is therefore not advisable to typecast one of the following:

  1. Expressions.
  2. Strings that have a reference count larger than 1. In this case, UniqueString should be called first to ensure the string has reference count 1.
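The second case can be sketched as follows. UpcaseFirst is a hypothetical Pascal stand-in for an external C routine that writes to its argument; it is not part of any library.

```pascal
program SafeModify;

// Hypothetical stand-in for an external C routine that modifies its argument.
procedure UpcaseFirst(P: PChar);
begin
  if P^ <> #0 then
    P^ := UpCase(P^);
end;

var
  S, T: AnsiString;
begin
  T := StringOfChar('a', 3);
  S := T;                  // S and T share their content
  UniqueString(S);         // give S a private copy before exposing it
  UpcaseFirst(PChar(S));   // the routine writes through the pointer
  WriteLn(S, ' ', T);      // Aaa aaa
end.
```

Without the UniqueString call, the routine would write into the block shared by S and T, so T would be changed as well.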