The GNU C Library Reference Manual

The GNU C Library


Main Menu

This is The GNU C Library Reference Manual, for Version 2.16 of the GNU C Library.

Appendices

Indices

--- The Detailed Node Listing ---

Introduction

Standards and Portability

Using the Library

Error Reporting

Memory

Memory Allocation

Unconstrained Allocation

Allocation Debugging

Obstacks

Variable Size Automatic

Locking Pages

Character Handling

String and Array Utilities

Argz and Envz Vectors

Character Set Handling

Restartable multibyte conversion

Non-reentrant Conversion

Generic Charset Conversion

Locales

Locale Information

The Lame Way to Locale Data

Message Translation

Message catalogs a la X/Open

The Uniforum approach

Message catalogs with gettext

Searching and Sorting

Pattern Matching

Globbing

Regular Expressions

Word Expansion

I/O Overview

I/O Concepts

File Names

I/O on Streams

Unreading

Formatted Output

Customizing Printf

Formatted Input

Stream Buffering

Other Kinds of Streams

Custom Streams

Formatted Messages

Low-Level I/O

Stream/Descriptor Precautions

Asynchronous I/O

File Status Flags

File System Interface

Accessing Directories

File Attributes

Pipes and FIFOs

Sockets

Socket Addresses

Local Namespace

Internet Namespace

Host Addresses

Open/Close Sockets

Connections

Transferring Data

Datagrams

Inetd

Socket Options

Low-Level Terminal Interface

Terminal Modes

Special Characters

Pseudo-Terminals

Syslog

Submitting Syslog Messages

Mathematics

Pseudo-Random Numbers

Arithmetic

Floating Point Errors

Arithmetic Functions

Parsing of Numbers

Date and Time

Processor And CPU Time

Calendar Time

Parsing Date and Time

Resource Usage And Limitation

Priority

Traditional Scheduling

Memory Resources

Non-Local Exits

Signal Handling

Concepts of Signals

Standard Signals

Signal Actions

Defining Handlers

Atomic Data Access

Generating Signals

Blocking Signals

Waiting for a Signal

BSD Signal Handling

Program Basics

Program Arguments

Parsing Program Arguments

Environment Variables

Program Termination

Processes

Job Control

Implementing a Shell

Functions for Job Control

Name Service Switch

NSS Configuration File

NSS Module Internals

Extending NSS

Users and Groups

User Accounting Database

User Database

Group Database

Netgroup Database

System Management

Filesystem Handling

Mount Information

System Configuration

Sysconf

Cryptographic Functions

Debugging Support

Language Features

Variadic Functions

How Variadic

Data Type Measurements

Floating Type Macros

Installation

Maintenance

Source Layout

Porting

Platform



1 Introduction

The C language provides no built-in facilities for performing such common operations as input/output, memory management, string manipulation, and the like. Instead, these facilities are defined in a standard library, which you compile and link with your programs. The GNU C Library, described in this document, defines all of the library functions that are specified by the ISO C standard, as well as additional features specific to POSIX and other derivatives of the Unix operating system, and extensions specific to GNU systems.

The purpose of this manual is to tell you how to use the facilities of the GNU C Library. We have mentioned which features belong to which standards to help you identify things that are potentially non-portable to other systems. But the emphasis in this manual is not on strict portability.



1.1 Getting Started

This manual is written with the assumption that you are at least somewhat familiar with the C programming language and basic programming concepts. Specifically, familiarity with ISO standard C (see ISO C), rather than “traditional” pre-ISO C dialects, is assumed.

The GNU C Library includes several header files, each of which provides definitions and declarations for a group of related facilities; this information is used by the C compiler when processing your program. For example, the header file stdio.h declares facilities for performing input and output, and the header file string.h declares string processing utilities. The organization of this manual generally follows the same division as the header files.

If you are reading this manual for the first time, you should read all of the introductory material and skim the remaining chapters. There are a lot of functions in the GNU C Library and it's not realistic to expect that you will be able to remember exactly how to use each and every one of them. It's more important to become generally familiar with the kinds of facilities that the library provides, so that when you are writing your programs you can recognize when to make use of library functions, and where in this manual you can find more specific information about them.



1.2 Standards and Portability

This section discusses the various standards and other sources that the GNU C Library is based upon. These sources include the ISO C and POSIX standards, and the System V and Berkeley Unix implementations.

The primary focus of this manual is to tell you how to make effective use of the GNU C Library facilities. But if you are concerned about making your programs compatible with these standards, or portable to operating systems other than GNU, this can affect how you use the library. This section gives you an overview of these standards, so that you will know what they are when they are mentioned in other parts of the manual.

See Library Summary, for an alphabetical list of the functions and other symbols provided by the library. This list also states which standards each function or symbol comes from.



1.2.1 ISO C

The GNU C Library is compatible with the C standard adopted by the American National Standards Institute (ANSI): American National Standard X3.159-1989 (“ANSI C”), and later by the International Organization for Standardization (ISO): ISO/IEC 9899:1990, “Programming languages - C”. We here refer to the standard as ISO C since this is the more general standard in respect of ratification. The header files and library facilities that make up the GNU C Library are a superset of those specified by the ISO C standard.

If you are concerned about strict adherence to the ISO C standard, you should use the ‘-ansi’ option when you compile your programs with the GNU C compiler. This tells the compiler to define only ISO standard features from the library header files, unless you explicitly ask for additional features. See Feature Test Macros, for information on how to do this.

Being able to restrict the library to include only ISO C features is important because ISO C puts limitations on what names can be defined by the library implementation, and the GNU extensions don't fit these limitations. See Reserved Names, for more information about these restrictions.

This manual does not attempt to give you complete details on the differences between ISO C and older dialects. It gives advice on how to write programs to work portably under multiple C dialects, but does not aim for completeness.



1.2.2 POSIX (The Portable Operating System Interface)

The GNU C Library is also compatible with the ISO POSIX family of standards, known more formally as the Portable Operating System Interface for Computer Environments (ISO/IEC 9945). They were also published as ANSI/IEEE Std 1003. POSIX is derived mostly from various versions of the Unix operating system.

The library facilities specified by the POSIX standards are a superset of those required by ISO C; POSIX specifies additional features for ISO C functions, as well as specifying new additional functions. In general, the additional requirements and functionality defined by the POSIX standards are aimed at providing lower-level support for a particular kind of operating system environment, rather than general programming language support which can run in many diverse operating system environments.

The GNU C Library implements all of the functions specified in ISO/IEC 9945-1:1996, the POSIX System Application Program Interface, commonly referred to as POSIX.1. The primary extensions to the ISO C facilities specified by this standard include file system interface primitives (see File System Interface), device-specific terminal control functions (see Low-Level Terminal Interface), and process control functions (see Processes).

Some facilities from ISO/IEC 9945-2:1993, the POSIX Shell and Utilities standard (POSIX.2) are also implemented in the GNU C Library. These include utilities for dealing with regular expressions and other pattern matching facilities (see Pattern Matching).



1.2.3 Berkeley Unix

The GNU C Library defines facilities from some versions of Unix which are not formally standardized, specifically from the 4.2 BSD, 4.3 BSD, and 4.4 BSD Unix systems (also known as Berkeley Unix) and from SunOS (a popular 4.2 BSD derivative that includes some Unix System V functionality). These systems support most of the ISO C and POSIX facilities, and 4.4 BSD and newer releases of SunOS in fact support them all.

The BSD facilities include symbolic links (see Symbolic Links), the select function (see Waiting for I/O), the BSD signal functions (see BSD Signal Handling), and sockets (see Sockets).



1.2.4 SVID (The System V Interface Description)

The System V Interface Description (SVID) is a document describing the AT&T Unix System V operating system. It is to some extent a superset of the POSIX standard (see POSIX).

The GNU C Library defines most of the facilities required by the SVID that are not also required by the ISO C or POSIX standards, for compatibility with System V Unix and other Unix systems (such as SunOS) which include these facilities. However, many of the more obscure and less generally useful facilities required by the SVID are not included. (In fact, Unix System V itself does not provide them all.)

The supported facilities from System V include the methods for inter-process communication and shared memory, the hsearch and drand48 families of functions, fmtmsg, and several of the mathematical functions.



1.2.5 XPG (The X/Open Portability Guide)

The X/Open Portability Guide, published by the X/Open Company, Ltd., is a more general standard than POSIX. X/Open owns the Unix trademark and the XPG specifies the requirements for systems which are intended to be a Unix system.

The GNU C Library complies with the X/Open Portability Guide, Issue 4.2, with all extensions common to XSI (X/Open System Interface) compliant systems and also all X/Open UNIX extensions.

The additions on top of POSIX are mainly derived from functionality available in System V and BSD systems. Some of the really bad mistakes in System V systems were corrected, though. Since fulfilling the XPG standard with the Unix extensions is a precondition for getting the Unix brand, chances are good that the functionality is available on commercial systems.



1.3 Using the Library

This section describes some of the practical issues involved in using the GNU C Library.



1.3.1 Header Files

Libraries for use by C programs really consist of two parts: header files that define types and macros and declare variables and functions; and the actual library or archive that contains the definitions of the variables and functions.

(Recall that in C, a declaration merely provides information that a function or variable exists and gives its type. For a function declaration, information about the types of its arguments might be provided as well. The purpose of declarations is to allow the compiler to correctly process references to the declared variables and functions. A definition, on the other hand, actually allocates storage for a variable or says what a function does.)

In order to use the facilities in the GNU C Library, you should be sure that your program source files include the appropriate header files. This is so that the compiler has declarations of these facilities available and can correctly process references to them. Once your program has been compiled, the linker resolves these references to the actual definitions provided in the archive file.

Header files are included into a program source file by the ‘#include’ preprocessor directive. The C language supports two forms of this directive; the first,

     #include "header"

is typically used to include a header file header that you write yourself; this would contain definitions and declarations describing the interfaces between the different parts of your particular application. By contrast,

     #include <file.h>

is typically used to include a header file file.h that contains definitions and declarations for a standard library. This file would normally be installed in a standard place by your system administrator. You should use this second form for the C library header files.

Typically, ‘#include’ directives are placed at the top of the C source file, before any other code. If you begin your source files with some comments explaining what the code in the file does (a good idea), put the ‘#include’ directives immediately afterwards, following the feature test macro definition (see Feature Test Macros).

For more information about the use of header files and ‘#include’ directives, see Header Files.

The GNU C Library provides several header files, each of which contains the type and macro definitions and variable and function declarations for a group of related facilities. This means that your programs may need to include several header files, depending on exactly which facilities you are using.

Some library header files include other library header files automatically. However, as a matter of programming style, you should not rely on this; it is better to explicitly include all the header files required for the library facilities you are using. The GNU C Library header files have been written in such a way that it doesn't matter if a header file is accidentally included more than once; including a header file a second time has no effect. Likewise, if your program needs to include multiple header files, the order in which they are included doesn't matter.

Compatibility Note: Inclusion of standard header files in any order and any number of times works in any ISO C implementation. However, this has traditionally not been the case in many older C implementations.

Strictly speaking, you don't have to include a header file to use a function it declares; you could declare the function explicitly yourself, according to the specifications in this manual. But it is usually better to include the header file because it may define types and macros that are not otherwise available and because it may define more efficient macro replacements for some functions. It is also a sure way to have the correct declaration.



1.3.2 Macro Definitions of Functions

If we describe something as a function in this manual, it may have a macro definition as well. This normally has no effect on how your program runs, because the macro definition does the same thing as the function would. In particular, macro equivalents for library functions evaluate arguments exactly once, in the same way that a function call would. The main reason for these macro definitions is that sometimes they can produce an inline expansion that is considerably faster than an actual function call.

Taking the address of a library function works even if it is also defined as a macro. This is because, in this context, the name of the function isn't followed by the left parenthesis that is syntactically necessary to recognize a macro call.

You might occasionally want to avoid using the macro definition of a function, perhaps to make your program easier to debug. There are two ways you can do this:

  • You can avoid a macro definition in a specific use by enclosing the name of the function in parentheses. This works because the name of the function doesn't appear in a syntactic context where it is recognizable as a macro call.
  • You can suppress any macro definition for a whole source file by using the ‘#undef’ preprocessor directive, unless otherwise stated explicitly in the description of that facility.

For example, suppose the header file stdlib.h declares a function named abs with

     extern int abs (int);

and also provides a macro definition for abs. Then, in:

     #include <stdlib.h>
     int f (int *i) { return abs (++*i); }

the reference to abs might refer to either a macro or a function. On the other hand, in each of the following examples the reference is to a function and not a macro.

     #include <stdlib.h>
     int g (int *i) { return (abs) (++*i); }
     
     #undef abs
     int h (int *i) { return abs (++*i); }

Since macro definitions that double for a function behave in exactly the same way as the actual function version, there is usually no need for any of these methods. In fact, removing macro definitions usually just makes your program slower.



1.3.3 Reserved Names

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:

  • Other people reading your code could get very confused if you were using a function named exit to do something completely different from what the standard exit function does, for example. Preventing this situation helps to make your programs easier to understand and contributes to modularity and maintainability.
  • It avoids the possibility of a user accidentally redefining a library function that is called by other library functions. If redefinition were allowed, those other functions would not work properly.
  • It allows the compiler to do whatever special optimizations it pleases on calls to these functions, without the possibility that they may have been redefined by the user. Some library facilities, such as those for dealing with variadic arguments (see Variadic Functions) and non-local exits (see Non-Local Exits), actually require a considerable amount of cooperation on the part of the C compiler, and with respect to the implementation, it might be easier for the compiler to treat these as built-in parts of the language.

In addition to the names documented in this manual, reserved names include all external identifiers (global functions and variables) that begin with an underscore (‘_’) and all identifiers, regardless of use, that begin with either two underscores or an underscore followed by a capital letter. This is so that the library and header files can define functions, variables, and macros for internal purposes without risk of conflict with names in user programs.

Some additional classes of identifier names are reserved for future extensions to the C language or the POSIX.1 environment. While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of the C or POSIX standards, so you should avoid these names.

  • Names beginning with a capital ‘E’ followed by a digit or uppercase letter may be used for additional error code names. See Error Reporting.
  • Names that begin with either ‘is’ or ‘to’ followed by a lowercase letter may be used for additional character testing and conversion functions. See Character Handling.
  • Names that begin with ‘LC_’ followed by an uppercase letter may be used for additional macros specifying locale attributes. See Locales.
  • Names of all existing mathematics functions (see Mathematics) suffixed with ‘f’ or ‘l’ are reserved for corresponding functions that operate on float and long double arguments, respectively.
  • Names that begin with ‘SIG’ followed by an uppercase letter are reserved for additional signal names. See Standard Signals.
  • Names that begin with ‘SIG_’ followed by an uppercase letter are reserved for additional signal actions. See Basic Signal Handling.
  • Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter are reserved for additional string and array functions. See String and Array Utilities.
  • Names that end with ‘_t’ are reserved for additional type names.

In addition, some individual header files reserve names beyond those that they actually define. You only need to worry about these restrictions if your program includes that particular header file.

  • The header file dirent.h reserves names prefixed with ‘d_’.
  • The header file fcntl.h reserves names prefixed with ‘l_’, ‘F_’, ‘O_’, and ‘S_’.
  • The header file grp.h reserves names prefixed with ‘gr_’.
  • The header file limits.h reserves names suffixed with ‘_MAX’.
  • The header file pwd.h reserves names prefixed with ‘pw_’.
  • The header file signal.h reserves names prefixed with ‘sa_’ and ‘SA_’.
  • The header file sys/stat.h reserves names prefixed with ‘st_’ and ‘S_’.
  • The header file sys/times.h reserves names prefixed with ‘tms_’.
  • The header file termios.h reserves names prefixed with ‘c_’, ‘V’, ‘I’, ‘O’, and ‘TC’; and names prefixed with ‘B’ followed by a digit.



1.3.4 Feature Test Macros

The exact set of features available when you compile a source file is controlled by which feature test macros you define.

If you compile your programs using ‘gcc -ansi’, you get only the ISO C library features, unless you explicitly request additional features by defining one or more of the feature macros. See GNU CC Command Options, for more information about GCC options.

You should define these macros by using ‘#define’ preprocessor directives at the top of your source code files. These directives must come before any #include of a system header file. It is best to make them the very first thing in the file, preceded only by comments. You could also use the ‘-D’ option to GCC, but it's better if you make the source files indicate their own meaning in a self-contained way.

This system exists to allow the library to conform to multiple standards. Although the different standards are often described as supersets of each other, they are usually incompatible because larger standards require functions with names that smaller ones reserve to the user program. This is not mere pedantry; it has been a problem in practice. For instance, some non-GNU programs define functions named getline that have nothing to do with this library's getline. They would not be compilable if all features were enabled indiscriminately.

This system should not be used to verify that a program conforms to a limited standard. It is insufficient for this purpose, as it will not protect you from including header files outside the standard, or relying on semantics undefined within the standard.

— Macro: _POSIX_SOURCE

If you define this macro, then the functionality from the POSIX.1 standard (IEEE Standard 1003.1) is available, as well as all of the ISO C facilities.

The state of _POSIX_SOURCE is irrelevant if you define the macro _POSIX_C_SOURCE to a positive integer.

— Macro: _POSIX_C_SOURCE

Define this macro to a positive integer to control which POSIX functionality is made available. The greater the value of this macro, the more functionality is made available.

If you define this macro to a value greater than or equal to 1, then the functionality from the 1990 edition of the POSIX.1 standard (IEEE Standard 1003.1-1990) is made available.

If you define this macro to a value greater than or equal to 2, then the functionality from the 1992 edition of the POSIX.2 standard (IEEE Standard 1003.2-1992) is made available.

If you define this macro to a value greater than or equal to 199309L, then the functionality from the 1993 edition of the POSIX.1b standard (IEEE Standard 1003.1b-1993) is made available.

Greater values for _POSIX_C_SOURCE will enable future extensions. The POSIX standards process will define these values as necessary, and the GNU C Library should support them some time after they become standardized. The 1996 edition of POSIX.1 (ISO/IEC 9945-1: 1996) states that if you define _POSIX_C_SOURCE to a value greater than or equal to 199506L, then the functionality from the 1996 edition is made available.

— Macro: _BSD_SOURCE

If you define this macro, functionality derived from 4.3 BSD Unix is included as well as the ISO C, POSIX.1, and POSIX.2 material.

Some of the features derived from 4.3 BSD Unix conflict with the corresponding features specified by the POSIX.1 standard. If this macro is defined, the 4.3 BSD definitions take precedence over the POSIX definitions.

Due to the nature of some of the conflicts between 4.3 BSD and POSIX.1, you need to use a special BSD compatibility library when linking programs compiled for BSD compatibility. This is because some functions must be defined in two different ways, one of them in the normal C library, and one of them in the compatibility library. If your program defines _BSD_SOURCE, you must give the option ‘-lbsd-compat’ to the compiler or linker when linking the program, to tell it to find functions in this special compatibility library before looking for them in the normal C library.

— Macro: _SVID_SOURCE

If you define this macro, functionality derived from SVID is included as well as the ISO C, POSIX.1, POSIX.2, and X/Open material.

— Macro: _XOPEN_SOURCE

— Macro: _XOPEN_SOURCE_EXTENDED

If you define the macro _XOPEN_SOURCE, functionality described in the X/Open Portability Guide is included. This is a superset of the POSIX.1 and POSIX.2 functionality and in fact _POSIX_SOURCE and _POSIX_C_SOURCE are automatically defined.

As the unification of all Unices, functionality only available in BSD and SVID is also included.

If the macro _XOPEN_SOURCE_EXTENDED is also defined, even more functionality is available. The extra functions will make all functions available which are necessary for the X/Open Unix brand.

If the macro _XOPEN_SOURCE has the value 500 this includes all functionality described so far plus some new definitions from the Single Unix Specification, version 2.

— Macro: _LARGEFILE_SOURCE

If this macro is defined some extra functions are available which rectify a few shortcomings in all previous standards. Specifically, the functions fseeko and ftello are available. Without these functions the difference between the ISO C interface (fseek, ftell) and the low-level POSIX interface (lseek) would lead to problems.

This macro was introduced as part of the Large File Support extension (LFS).

— Macro: _LARGEFILE64_SOURCE

If you define this macro an additional set of functions is made available which enables 32 bit systems to use files of sizes beyond the usual limit of 2GB. This interface is not available if the system does not support files that large. On systems where the natural file size limit is greater than 2GB (i.e., on 64 bit systems) the new functions are identical to the replaced functions.

The new functionality is made available by a new set of types and functions which replace the existing ones. The names of these new objects contain 64 to indicate the intention, e.g., off_t vs. off64_t and fseeko vs. fseeko64.

This macro was introduced as part of the Large File Support extension (LFS). It is a transition interface for the period when 64 bit offsets are not generally used (see _FILE_OFFSET_BITS).

— Macro: _FILE_OFFSET_BITS

This macro determines which file system interface shall be used, one replacing the other. Whereas _LARGEFILE64_SOURCE makes the 64 bit interface available as an additional interface, _FILE_OFFSET_BITS allows the 64 bit interface to replace the old interface.

If _FILE_OFFSET_BITS is undefined, or if it is defined to the value 32, nothing changes. The 32 bit interface is used and types like off_t have a size of 32 bits on 32 bit systems.

If the macro is defined to the value 64, the large file interface replaces the old interface. I.e., the functions are not made available under different names (as they are with _LARGEFILE64_SOURCE). Instead the old function names now reference the new functions, e.g., a call to fseeko now indeed calls fseeko64.

This macro should only be selected if the system provides mechanisms for handling large files. On 64 bit systems this macro has no effect since the *64 functions are identical to the normal functions.

This macro was introduced as part of the Large File Support extension (LFS).

— Macro: _ISOC99_SOURCE

Until the revised ISO C standard is widely adopted the new features are not automatically enabled. The GNU C Library nevertheless has a complete implementation of the new standard and to enable the new features the macro _ISOC99_SOURCE should be defined.

— Macro: _GNU_SOURCE

If you define this macro, everything is included: ISO C89, ISO C99, POSIX.1, POSIX.2, BSD, SVID, X/Open, LFS, and GNU extensions. In the cases where POSIX.1 conflicts with BSD, the POSIX definitions take precedence.

If you want to get the full effect of _GNU_SOURCE but make the BSD definitions take precedence over the POSIX definitions, use this sequence of definitions:

          #define _GNU_SOURCE
          #define _BSD_SOURCE
          #define _SVID_SOURCE

Note that if you do this, you must link your program with the BSD compatibility library by passing the ‘-lbsd-compat’ option to the compiler or linker. NB: If you forget to do this, you may get very strange errors at run time.

— Macro: _REENTRANT

— Macro: _THREAD_SAFE

If you define one of these macros, reentrant versions of several functions get declared. Some of the functions are specified in POSIX.1c but many others are only available on a few other systems or are unique to the GNU C Library. The problem is the delay in the standardization of the thread safe C library interface.

Unlike on some other systems, no special version of the C library must be used for linking. There is only one version, but it must have been configured as thread safe when it was compiled.

We recommend you use _GNU_SOURCE in new programs. If you don't specify the ‘-ansi’ option to GCC and don't define any of these macros explicitly, the effect is the same as defining _POSIX_C_SOURCE to 2 and _POSIX_SOURCE, _SVID_SOURCE, and _BSD_SOURCE to 1.

When you define a feature test macro to request a larger class of features, it is harmless to define in addition a feature test macro for a subset of those features. For example, if you define _POSIX_C_SOURCE, then defining _POSIX_SOURCE as well has no effect. Likewise, if you define _GNU_SOURCE, then defining either _POSIX_SOURCE or _POSIX_C_SOURCE or _SVID_SOURCE as well has no effect.

Note, however, that the features of _BSD_SOURCE are not a subset of any of the other feature test macros supported. This is because it defines BSD features that take precedence over the POSIX features that are requested by the other macros. For this reason, defining _BSD_SOURCE in addition to the other feature test macros does have an effect: it causes the BSD features to take priority over the conflicting POSIX features.



1.4 Roadmap to the Manual

Here is an overview of the contents of the remaining chapters ofthis manual.

  • Error Reporting, describes how errors detected by the library are reported.
  • Language Features, contains information about library support for standard parts of the C language, including things like the sizeof operator and the symbolic constant NULL, how to write functions accepting variable numbers of arguments, and constants describing the ranges and other properties of the numerical types. There is also a simple debugging mechanism which allows you to put assertions in your code, and have diagnostic messages printed if the tests fail.
  • Memory, describes the GNU C Library's facilities for managing and using virtual and real memory, including dynamic allocation of virtual memory. If you do not know in advance how much memory your program needs, you can allocate it dynamically instead, and manipulate it via pointers.
  • Character Handling, contains information about character classification functions (such as isspace) and functions for performing case conversion.
  • String and Array Utilities, has descriptions of functions for manipulating strings (null-terminated character arrays) and general byte arrays, including operations such as copying and comparison.
  • I/O Overview, gives an overall look at the input and output facilities in the library, and contains information about basic concepts such as file names.
  • I/O on Streams, describes I/O operations involving streams (or FILE * objects). These are the normal C library functions from stdio.h.
  • Low-Level I/O, contains information about I/O operations on file descriptors. File descriptors are a lower-level mechanism specific to the Unix family of operating systems.
  • File System Interface, has descriptions of operations on entire files, such as functions for deleting and renaming them and for creating new directories. This chapter also contains information about how you can access the attributes of a file, such as its owner and file protection modes.
  • Pipes and FIFOs, contains information about simple interprocess communication mechanisms. Pipes allow communication between two related processes (such as between a parent and child), while FIFOs allow communication between processes sharing a common file system on the same machine.
  • Sockets, describes a more complicated interprocess communication mechanism that allows processes running on different machines to communicate over a network. This chapter also contains information about Internet host addressing and how to use the system network databases.
  • Low-Level Terminal Interface, describes how you can change the attributes of a terminal device. If you want to disable echo of characters typed by the user, for example, read this chapter.
  • Mathematics, contains information about the math library functions. These include things like random-number generators and remainder functions on integers as well as the usual trigonometric and exponential functions on floating-point numbers.
  • Low-Level Arithmetic Functions, describes functions for simple arithmetic, analysis of floating-point values, and reading numbers from strings.
  • Searching and Sorting, contains information about functions for searching and sorting arrays. You can use these functions on any kind of array by providing an appropriate comparison function.
  • Pattern Matching, presents functions for matching regular expressions and shell file name patterns, and for expanding words as the shell does.
  • Date and Time, describes functions for measuring both calendar time and CPU time, as well as functions for setting alarms and timers.
  • Character Set Handling, contains information about manipulating characters and strings using character sets larger than will fit in the usual char data type.
  • Locales, describes how selecting a particular country or language affects the behavior of the library. For example, the locale affects collation sequences for strings and how monetary values are formatted.
  • Non-Local Exits, contains descriptions of the setjmp and longjmp functions. These functions provide a facility for goto-like jumps which can jump from one function to another.
  • Signal Handling, tells you all about signals: what they are, how to establish a handler that is called when a particular kind of signal is delivered, and how to prevent signals from arriving during critical sections of your program.
  • Program Basics, tells how your programs can access their command-line arguments and environment variables.
  • Processes, contains information about how to start new processes and run programs.
  • Job Control, describes functions for manipulating process groups and the controlling terminal. This material is probably only of interest if you are writing a shell or other program which handles job control specially.
  • Name Service Switch, describes the services which are available for looking up names in the system databases, how to determine which service is used for which database, and how these services are implemented so that contributors can design their own services.
  • User Database, and Group Database, tell you how to access the system user and group databases.
  • System Management, describes functions for controlling and getting information about the hardware and software configuration your program is executing under.
  • System Configuration, tells you how you can get information about various operating system limits. Most of these parameters are provided for compatibility with POSIX.
  • Library Summary, gives a summary of all the functions, variables, and macros in the library, with complete data types and function prototypes, and says what standard or system each is derived from.
  • Installation, explains how to build and install the GNU C Library on your system, and how to report any bugs you might find.
  • Maintenance, explains how to add new functions or port the library to a new system.

If you already know the name of the facility you are interested in, you can look it up in Library Summary. This gives you a summary of its syntax and a pointer to where you can find a more detailed description. This appendix is particularly useful if you just want to verify the order and type of arguments to a function, for example. It also tells you what standard or system each function, variable, or macro is derived from.



2 Error Reporting

Many functions in the GNU C Library detect and report error conditions, and sometimes your programs need to check for these error conditions. For example, when you open an input file, you should verify that the file was actually opened correctly, and print an error message or take other appropriate action if the call to the library function failed.

This chapter describes how the error reporting facility works. Your program should include the header file errno.h to use this facility.



2.1 Checking for Errors

Most library functions return a special value to indicate that they have failed. The special value is typically -1, a null pointer, or a constant such as EOF that is defined for that purpose. But this return value tells you only that an error has occurred. To find out what kind of error it was, you need to look at the error code stored in the variable errno. This variable is declared in the header file errno.h.

— Variable: volatile int errno

The variable errno contains the system error number. You can change the value of errno.

Since errno is declared volatile, it might be changed asynchronously by a signal handler; see Defining Handlers. However, a properly written signal handler saves and restores the value of errno, so you generally do not need to worry about this possibility except when writing signal handlers.

The initial value of errno at program startup is zero. Many library functions are guaranteed to set it to certain nonzero values when they encounter certain kinds of errors. These error conditions are listed for each function. These functions do not change errno when they succeed; thus, the value of errno after a successful call is not necessarily zero, and you should not use errno to determine whether a call failed. The proper way to do that is documented for each function. If the call failed, you can examine errno.

Many library functions can set errno to a nonzero value as a result of calling other library functions which might fail. You should assume that any library function might alter errno when the function returns an error.
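The convention described above can be sketched as follows. This is only an illustration: the helper name open_error_code is hypothetical, and the sketch assumes that fopen sets errno on failure, as it does on POSIX systems. The function's own failure indicator is tested first, and errno is copied into a local variable before any other library call is made.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: try to open NAME for reading.
   Returns 0 on success, or the error code reported by fopen.  */
int
open_error_code (const char *name)
{
  FILE *stream = fopen (name, "r");

  if (stream == NULL)
    {
      /* Save errno at once; later library calls might alter it.  */
      int saved_errno = errno;
      fprintf (stderr, "cannot open %s (error code %d)\n",
               name, saved_errno);
      return saved_errno;
    }

  /* Success: errno is not meaningful here, so do not look at it.  */
  fclose (stream);
  return 0;
}
```

A caller would compare the result against the symbolic names from errno.h, for instance checking for ENOENT when the file may legitimately be absent.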

Portability Note: ISO C specifies errno as a “modifiable lvalue” rather than as a variable, permitting it to be implemented as a macro. For example, its expansion might involve a function call, like *__errno_location (). In fact, that is what it is on GNU/Linux and GNU/Hurd systems. The GNU C Library, on each system, does whatever is right for the particular system.

There are a few library functions, like sqrt and atan, that return a perfectly legitimate value in case of an error, but also set errno. For these functions, if you want to check to see whether an error occurred, the recommended method is to set errno to zero before calling the function, and then check its value afterward.
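The same idiom applies to any function whose failure value is also a legitimate result. The sketch below uses strtol rather than sqrt, purely so that it needs no math library at link time; strtol returns LONG_MAX or LONG_MIN on overflow, and both are also valid results. The name parse_long is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert TEXT to a long, storing it in *RESULT.
   Returns 0 on success or the errno value set by strtol.  */
int
parse_long (const char *text, long *result)
{
  errno = 0;                      /* Clear any stale error code.  */
  *result = strtol (text, NULL, 10);
  return errno;                   /* Nonzero only if strtol set it.  */
}
```

Without the assignment of zero, a nonzero errno left over from some earlier, unrelated call could be mistaken for a failure of this one.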

All the error codes have symbolic names; they are macros defined in errno.h. The names start with ‘E’ and an upper-case letter or digit; you should consider names of this form to be reserved names. See Reserved Names.

The error code values are all positive integers and are all distinct, with one exception: EWOULDBLOCK and EAGAIN are the same. Since the values are distinct, you can use them as labels in a switch statement; just don't use both EWOULDBLOCK and EAGAIN. Your program should not make any other assumptions about the specific values of these symbolic constants.
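A switch over error codes might look like the sketch below. The function classify_error and its category strings are hypothetical. The preprocessor test is an addition that goes beyond the advice above: it lets the same source also compile on systems where EWOULDBLOCK is a separate code, without producing a duplicate case label where the two are equal.

```c
#include <errno.h>

/* Hypothetical classifier that maps an error code to a category.  */
const char *
classify_error (int code)
{
  switch (code)
    {
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    /* Only compiled on systems where the two codes differ.  */
    case EWOULDBLOCK:
#endif
      return "retry";
    case ENOENT:
      return "missing";
    case EACCES:
    case EPERM:
      return "denied";
    default:
      return "other";
    }
}
```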

The value of errno doesn't necessarily have to correspond to any of these macros, since some library functions might return other error codes of their own for other situations. The only values that are guaranteed to be meaningful for a particular library function are the ones that this manual lists for that function.

Except on GNU/Hurd systems, almost any system call can return EFAULT if it is given an invalid pointer as an argument. Since this could only happen as a result of a bug in your program, and since it will not happen on GNU/Hurd systems, we have saved space by not mentioning EFAULT in the descriptions of individual functions.

In some Unix systems, many system calls can also return EFAULT if given as an argument a pointer into the stack, and the kernel for some obscure reason fails in its attempt to extend the stack. If this ever happens, you should probably try using statically or dynamically allocated memory instead of stack memory on that system.



2.2 Error Codes

The error code macros are defined in the header file errno.h. All of them expand into integer constant values. Some of these error codes can't occur on GNU systems, but they can occur using the GNU C Library on other systems.

— Macro: int EPERM

Operation not permitted; only the owner of the file (or other resource) or processes with special privileges can perform the operation.

— Macro: int ENOENT

No such file or directory. This is a “file doesn't exist” error for ordinary files that are referenced in contexts where they are expected to already exist.

— Macro: int ESRCH

No process matches the specified process ID.

— Macro: int EINTR

Interrupted function call; an asynchronous signal occurred and prevented completion of the call. When this happens, you should try the call again.

You can choose to have functions resume after a signal that is handled, rather than failing with EINTR; see Interrupted Primitives.
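Retrying an interrupted call can be sketched as a small wrapper around read. The name read_retry is hypothetical, and the sketch covers only the case in which the call fails with EINTR and is simply reissued.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: like read, but reissue the call whenever a
   signal interrupts it and it fails with EINTR.  */
ssize_t
read_retry (int fd, void *buffer, size_t size)
{
  ssize_t n;

  do
    n = read (fd, buffer, size);
  while (n == -1 && errno == EINTR);

  return n;
}
```

Any other failure, and any successful or partial read, is returned to the caller unchanged.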

— Macro: int EIO

Input/output error; usually used for physical read or write errors.

— Macro: int ENXIO

No such device or address. The system tried to use the device represented by a file you specified, and it couldn't find the device. This can mean that the device file was installed incorrectly, or that the physical device is missing or not correctly attached to the computer.

— Macro: int E2BIG

Argument list too long; used when the arguments passed to a new program being executed with one of the exec functions (see Executing a File) occupy too much memory space. This condition never arises on GNU/Hurd systems.

— Macro: int ENOEXEC

Invalid executable file format. This condition is detected by the exec functions; see Executing a File.

— Macro: int EBADF

Bad file descriptor; for example, I/O on a descriptor that has been closed or reading from a descriptor open only for writing (or vice versa).

— Macro: int ECHILD

There are no child processes. This error happens on operations that are supposed to manipulate child processes, when there aren't any processes to manipulate.

— Macro: int EDEADLK

Deadlock avoided; allocating a system resource would have resulted in a deadlock situation. The system does not guarantee that it will notice all such situations. This error means you got lucky and the system noticed; it might just hang. See File Locks, for an example.

— Macro: int ENOMEM

No memory available. The system cannot allocate more virtual memory because its capacity is full.

— Macro: int EACCES

Permission denied; the file permissions do not allow the attempted operation.

— Macro: int EFAULT

Bad address; an invalid pointer was detected. On GNU/Hurd systems, this error never happens; you get a signal instead.

— Macro: int ENOTBLK

A file that isn't a block special file was given in a situation that requires one. For example, trying to mount an ordinary file as a file system in Unix gives this error.

— Macro: int EBUSY

Resource busy; a system resource that can't be shared is already in use. For example, if you try to delete a file that is the root of a currently mounted filesystem, you get this error.

— Macro: int EEXIST

File exists; an existing file was specified in a context where it only makes sense to specify a new file.

— Macro: int EXDEV

An attempt to make an improper link across file systems was detected. This happens not only when you use link (see Hard Links) but also when you rename a file with rename (see Renaming Files).

— Macro: int ENODEV

The wrong type of device was given to a function that expects a particular sort of device.

— Macro: int ENOTDIR

A file that isn't a directory was specified when a directory is required.

— Macro: int EISDIR

File is a directory; you cannot open a directory for writing, or create or remove hard links to it.

— Macro: int EINVAL

Invalid argument. This is used to indicate various kinds of problems with passing the wrong argument to a library function.

— Macro: int EMFILE

The current process has too many files open and can't open any more. Duplicate descriptors do count toward this limit.

In BSD and GNU, the number of open files is controlled by a resource limit that can usually be increased. If you get this error, you might want to increase the RLIMIT_NOFILE limit or make it unlimited; see Limits on Resources.
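Raising the limit can be sketched with getrlimit and setrlimit. The helper name raise_open_file_limit is hypothetical, and the sketch only raises the soft limit as far as the current hard limit, which an unprivileged process is normally allowed to do.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft limit on open files to the
   hard limit.  Returns 0 on success, -1 on failure (errno is set).  */
int
raise_open_file_limit (void)
{
  struct rlimit limit;

  if (getrlimit (RLIMIT_NOFILE, &limit) == -1)
    return -1;

  limit.rlim_cur = limit.rlim_max;   /* Soft limit up to the hard limit.  */
  return setrlimit (RLIMIT_NOFILE, &limit);
}
```

A program might call this once after an open fails with EMFILE and then retry the open.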

— Macro: int ENFILE

There are too many distinct file openings in the entire system. Note that any number of linked channels count as just one file opening; see Linked Channels. This error never occurs on GNU/Hurd systems.

— Macro: int ENOTTY

Inappropriate I/O control operation, such as trying to set terminal modes on an ordinary file.

— Macro: int ETXTBSY

An attempt to execute a file that is currently open for writing, or write to a file that is currently being executed. Often using a debugger to run a program is considered having it open for writing and will cause this error. (The name stands for “text file busy”.) This is not an error on GNU/Hurd systems; the text is copied as necessary.

— Macro: int EFBIG

File too big; the size of a file would be larger than allowed by the system.

— Macro: int ENOSPC

No space left on device; write operation on a file failed because the disk is full.

— Macro: int ESPIPE

Invalid seek operation (such as on a pipe).

— Macro: int EROFS

An attempt was made to modify something on a read-only file system.

— Macro: int EMLINK

Too many links; the link count of a single file would become too large. rename can cause this error if the file being renamed already has as many links as it can take (see Renaming Files).

— Macro: int EPIPE

Broken pipe; there is no process reading from the other end of a pipe. Every library function that returns this error code also generates a SIGPIPE signal; this signal terminates the program if not handled or blocked. Thus, your program will never actually see EPIPE unless it has handled or blocked SIGPIPE.
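The interaction between SIGPIPE and EPIPE can be sketched as follows. The function name write_to_broken_pipe is hypothetical, and ignoring the signal with SIG_IGN is just one way to keep it from terminating the program; a real program might install a handler or block the signal instead.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demonstration: write to a pipe that has no reader.
   Returns the errno value from the failed write, 0 if the write
   unexpectedly succeeded, or -1 if the setup failed.  */
int
write_to_broken_pipe (void)
{
  int fds[2];
  int result;

  /* Ignore SIGPIPE so that write fails instead of killing us.  */
  if (signal (SIGPIPE, SIG_IGN) == SIG_ERR)
    return -1;
  if (pipe (fds) == -1)
    return -1;

  close (fds[0]);                       /* No reader is left.  */
  result = write (fds[1], "x", 1) == -1 ? errno : 0;
  close (fds[1]);
  return result;
}
```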

— Macro: int EDOM

Domain error; used by mathematical functions when an argument value does not fall into the domain over which the function is defined.

— Macro: int ERANGE

Range error; used by mathematical functions when the result value is not representable because of overflow or underflow.

— Macro: int EAGAIN

Resource temporarily unavailable; the call might work if you try again later. The macro EWOULDBLOCK is another name for EAGAIN; they are always the same in the GNU C Library.

This error can happen in a few different situations:

  • An operation that would block was attempted on an object that has non-blocking mode selected. Trying the same operation again will block until some external condition makes it possible to read, write, or connect (whatever the operation). You can use select to find out when the operation will be possible; see Waiting for I/O.

    Portability Note: In many older Unix systems, this condition was indicated by EWOULDBLOCK, which was a distinct error code different from EAGAIN. To make your program portable, you should check for both codes and treat them the same.

  • A temporary resource shortage made an operation impossible. fork can return this error. It indicates that the shortage is expected to pass, so your program can try the call again later and it may succeed. It is probably a good idea to delay for a few seconds before trying it again, to allow time for other processes to release scarce resources. Such shortages are usually fairly serious and affect the whole system, so usually an interactive program should report the error to the user and return to its command loop.

— Macro: int EWOULDBLOCK

In the GNU C Library, this is another name for EAGAIN (above). The values are always the same, on every operating system.

C libraries in many older Unix systems have EWOULDBLOCK as a separate error code.
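Checking for both names, as the portability note under EAGAIN advises, can be sketched like this. The helpers would_block and try_read are hypothetical, and the return value -2 is an arbitrary convention chosen for this illustration only.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return nonzero if CODE means "the operation
   would block", whichever name the system uses for it.  */
int
would_block (int code)
{
  return code == EAGAIN || code == EWOULDBLOCK;
}

/* Hypothetical helper: put FD into non-blocking mode and try one read.
   Returns the byte count, 0 at end of file, -2 if no data is ready
   yet, or -1 on any other error.  */
ssize_t
try_read (int fd, void *buffer, size_t size)
{
  int flags = fcntl (fd, F_GETFL);
  ssize_t n;

  if (flags == -1 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    return -1;

  n = read (fd, buffer, size);
  if (n == -1 && would_block (errno))
    return -2;
  return n;
}
```

When try_read reports that no data is ready, the caller can wait with select and then try again.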

— Macro: int EINPROGRESS

An operation that cannot complete immediately was initiated on an object that has non-blocking mode selected. Some functions that must always block (such as connect; see Connecting) never return EAGAIN. Instead, they return EINPROGRESS to indicate that the operation has begun and will take some time. Attempts to manipulate the object before the call completes return EALREADY. You can use the select function to find out when the pending operation has completed; see Waiting for I/O.

— Macro: int EALREADY

An operation is already in progress on an object that has non-blocking mode selected.

— Macro: int ENOTSOCK

A file that isn't a socket was specified when a socket is required.

— Macro: int EMSGSIZE

The size of a message sent on a socket was larger than the supported maximum size.

— Macro: int EPROTOTYPE

The socket type does not support the requested communications protocol.

— Macro: int ENOPROTOOPT

You specified a socket option that doesn't make sense for the particular protocol being used by the socket. See Socket Options.

— Macro: int EPROTONOSUPPORT

The socket domain does not support the requested communications protocol (perhaps because the requested protocol is completely invalid). See Creating a Socket.

— Macro: int ESOCKTNOSUPPORT

The socket type is not supported.

— Macro: int EOPNOTSUPP

The operation you requested is not supported. Some socket functions don't make sense for all types of sockets, and others may not be implemented for all communications protocols. On GNU/Hurd systems, this error can happen for many calls when the object does not support the particular operation; it is a generic indication that the server knows nothing to do for that call.

— Macro: int EPFNOSUPPORT

The socket communications protocol family you requested is not supported.

— Macro: int EAFNOSUPPORT

The address family specified for a socket is not supported; it is inconsistent with the protocol being used on the socket. See Sockets.

— Macro: int EADDRINUSE

The requested socket address is already in use. See Socket Addresses.

— Macro: int EADDRNOTAVAIL

The requested socket address is not available; for example, you tried to give a socket a name that doesn't match the local host name. See Socket Addresses.

— Macro: int ENETDOWN

A socket operation failed because the network was down.

— Macro: int ENETUNREACH

A socket operation failed because the subnet containing the remote host was unreachable.

— Macro: int ENETRESET

A network connection was reset because the remote host crashed.

— Macro: int ECONNABORTED

A network connection was aborted locally.

— Macro: int ECONNRESET

A network connection was closed for reasons outside the control of the local host, such as by the remote machine rebooting or an unrecoverable protocol violation.

— Macro: int ENOBUFS

The kernel's buffers for I/O operations are all in use. In GNU, this error is always synonymous with ENOMEM; you may get one or the other from network operations.

— Macro: int EISCONN

You tried to connect a socket that is already connected. See Connecting.

— Macro: int ENOTCONN

The socket is not connected to anything. You get this error when you try to transmit data over a socket, without first specifying a destination for the data. For a connectionless socket (for datagram protocols, such as UDP), you get EDESTADDRREQ instead.

— Macro: int EDESTADDRREQ

No default destination address was set for the socket. You get this error when you try to transmit data over a connectionless socket, without first specifying a destination for the data with connect.

— Macro: int ESHUTDOWN

The socket has already been shut down.

— Macro: int ETOOMANYREFS

???

— Macro: int ETIMEDOUT

A socket operation with a specified timeout received no response during the timeout period.

— Macro: int ECONNREFUSED

A remote host refused to allow the network connection (typically because it is not running the requested service).

— Macro: int ELOOP

Too many levels of symbolic links were encountered in looking up a file name. This often indicates a cycle of symbolic links.

— Macro: int ENAMETOOLONG

Filename too long (longer than PATH_MAX; see Limits for Files) or host name too long (in gethostname or sethostname; see Host Identification).

— Macro: int EHOSTDOWN

The remote host for a requested network connection is down.

— Macro: int EHOSTUNREACH

The remote host for a requested network connection is not reachable.

— Macro: int ENOTEMPTY

Directory not empty, where an empty directory was expected. Typically, this error occurs when you are trying to delete a directory.

— Macro: int EPROCLIM

This means that the per-user limit on new processes would be exceeded by an attempted fork. See Limits on Resources, for details on the RLIMIT_NPROC limit.

— Macro: int EUSERS

The file quota system is confused because there are too many users.

— Macro: int EDQUOT

The user's disk quota was exceeded.

— Macro: int ESTALE

Stale NFS file handle. This indicates an internal confusion in the NFS system which is due to file system rearrangements on the server host. Repairing this condition usually requires unmounting and remounting the NFS file system on the local host.

— Macro: int EREMOTE

An attempt was made to NFS-mount a remote file system with a file name that already specifies an NFS-mounted file. (This is an error on some operating systems, but we expect it to work properly on GNU/Hurd systems, making this error code impossible.)

— Macro: int EBADRPC

???

— Macro: int ERPCMISMATCH

???

— Macro: int EPROGUNAVAIL

???

— Macro: int EPROGMISMATCH

???

— Macro: int EPROCUNAVAIL

???

— Macro: int ENOLCK

No locks available. This is used by the file locking facilities; see File Locks. This error is never generated by GNU/Hurd systems, but it can result from an operation to an NFS server running another operating system.

— Macro: int EFTYPE

Inappropriate file type or format. The file was the wrong type for the operation, or a data file had the wrong format.

On some systems chmod returns this error if you try to set the sticky bit on a non-directory file; see Setting Permissions.

— Macro: int EAUTH

???

— Macro: int ENEEDAUTH

???

— Macro: int ENOSYS

Function not implemented. This indicates that the function called is not implemented at all, either in the C library itself or in the operating system. When you get this error, you can be sure that this particular function will always fail with ENOSYS unless you install a new version of the C library or the operating system.

— Macro: int ENOTSUP

Not supported. A function returns this error when certain parameter values are valid, but the functionality they request is not available. This can mean that the function does not implement a particular command or option value or flag bit at all. For functions that operate on some object given in a parameter, such as a file descriptor or a port, it might instead mean that only that specific object (file descriptor, port, etc.) is unable to support the other parameters given; different file descriptors might support different ranges of parameter values.

If the entire function is not available at all in the implementation, it returns ENOSYS instead.

— Macro: int EILSEQ

While decoding a multibyte character, the function encountered an invalid or incomplete sequence of bytes, or the given wide character is invalid.

— Macro: int EBACKGROUND

On GNU/Hurd systems, servers supporting the term protocol return this error for certain operations when the caller is not in the foreground process group of the terminal. Users do not usually see this error because functions such as read and write translate it into a SIGTTIN or SIGTTOU signal. See Job Control, for information on process groups and these signals.

— Macro: int EDIED

On GNU/Hurd systems, opening a file returns this error when the file is translated by a program and the translator program dies while starting up, before it has connected to the file.

— Macro: int ED

The experienced user will know what is wrong.

— Macro: int EGREGIOUS

You did what?

— Macro: int EIEIO

Go home and have a glass of warm, dairy-fresh milk.

— Macro: int EGRATUITOUS

This error code has no purpose.

— Macro: int EBADMSG

— Macro: int EIDRM

— Macro: int EMULTIHOP

— Macro: int ENODATA

— Macro: int ENOLINK

— Macro: int ENOMSG

— Macro: int ENOSR

— Macro: int ENOSTR

— Macro: int EOVERFLOW

— Macro: int EPROTO

— Macro: int ETIME

— Macro: int ECANCELED

Operation canceled; an asynchronous operation was canceled before it completed. See Asynchronous I/O. When you call aio_cancel, the normal result is for the operations affected to complete with this error; see Cancel AIO Operations.

The following error codes are defined by the Linux/i386 kernel. They are not yet documented.

— Macro: int ERESTART

— Macro: int ECHRNG

— Macro: int EL2NSYNC

— Macro: int EL3HLT

— Macro: int EL3RST

— Macro: int ELNRNG

— Macro: int EUNATCH

— Macro: int ENOCSI

— Macro: int EL2HLT

— Macro: int EBADE

— Macro: int EBADR

— Macro: int EXFULL

— Macro: int ENOANO

— Macro: int EBADRQC

— Macro: int EBADSLT

— Macro: int EDEADLOCK

— Macro: int EBFONT

— Macro: int ENONET

— Macro: int ENOPKG

— Macro: int EADV

— Macro: int ESRMNT

— Macro: int ECOMM

— Macro: int EDOTDOT

— Macro: int ENOTUNIQ

— Macro: int EBADFD

— Macro: int EREMCHG

— Macro: int ELIBACC

— Macro: int ELIBBAD

— Macro: int ELIBSCN

— Macro: int ELIBMAX

— Macro: int ELIBEXEC

— Macro: int ESTRPIPE

— Macro: int EUCLEAN

— Macro: int ENOTNAM

— Macro: int ENAVAIL

— Macro: int EISNAM

— Macro: int EREMOTEIO

— Macro: int ENOMEDIUM

— Macro: int EMEDIUMTYPE

— Macro: int ENOKEY

— Macro: int EKEYEXPIRED

— Macro: int EKEYREVOKED

— Macro: int EKEYREJECTED

— Macro: int EOWNERDEAD

— Macro: int ENOTRECOVERABLE

— Macro: int ERFKILL

— Macro: int EHWPOISON



2.3 Error Messages

The library has functions and variables designed to make it easy for your program to report informative error messages in the customary format about the failure of a library call. The functions strerror and perror give you the standard error message for a given error code; the variable program_invocation_short_name gives you convenient access to the name of the program that encountered the error.

— Function: char * strerror (int errnum)

The strerror function maps the error code (see Checking for Errors) specified by the errnum argument to a descriptive error message string. The return value is a pointer to this string.

The value errnum normally comes from the variable errno.

You should not modify the string returned by strerror. Also, if you make subsequent calls to strerror, the string might be overwritten. (But it's guaranteed that no library function ever calls strerror behind your back.)

The function strerror is declared in string.h.
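Because a later call may overwrite the string, a program that needs to keep a message can copy it immediately. The sketch below shows one way to do so; the name copy_error_message is hypothetical, and snprintf is used only to get a truncating, always-terminated copy.

```c
#include <errno.h>    /* For the E* macros that callers pass in.  */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for ERRNUM into BUF (of SIZE
   bytes, SIZE > 0), truncating if necessary, so that later strerror
   calls cannot overwrite it.  Returns BUF.  */
char *
copy_error_message (int errnum, char *buf, size_t size)
{
  snprintf (buf, size, "%s", strerror (errnum));
  return buf;
}
```

With two separate buffers, the messages for two different error codes can then be held at the same time.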

— Function: char * strerror_r (int errnum, char *buf, size_t n)

The strerror_r function works like strerror but instead of returning the error message in a statically allocated buffer shared by all threads in the process, it returns a private copy for the thread. This might be either some permanent global data or a message string in the user supplied buffer starting at buf with the length of n bytes.

At most n characters are written (including the NUL byte), so it is up to the user to select a buffer that is large enough.

This function should always be used in multi-threaded programs since there is no way to guarantee the string returned by strerror really belongs to the last call of the current thread.

This function strerror_r is a GNU extension and it is declared in string.h.

— Function: void perror (const char *message)

This function prints an error message to the stream stderr; see Standard Streams. The orientation of stderr is not changed.

If you call perror with a message that is either a null pointer or an empty string, perror just prints the error message corresponding to errno, adding a trailing newline.

If you supply a non-null message argument, then perror prefixes its output with this string. It adds a colon and a space character to separate the message from the error string corresponding to errno.

The function perror is declared in stdio.h.

strerror and perror produce the exact same message for any given error code; the precise text varies from system to system. With the GNU C Library, the messages are fairly short; there are no multi-line messages or embedded newlines. Each error message begins with a capital letter and does not include any terminating punctuation.

Compatibility Note: The strerror function was introduced in ISO C89. Many older C systems do not support this function yet.

Many programs that don't read input from the terminal are designed to exit if any system call fails. By convention, the error message from such a program should start with the program's name, sans directories. You can find that name in the variable program_invocation_short_name; the full file name is stored in the variable program_invocation_name.

— Variable: char * program_invocation_name

This variable's value is the name that was used to invoke the program running in the current process. It is the same as argv[0]. Note that this is not necessarily a useful file name; often it contains no directory names. See Program Arguments.

— Variable: char * program_invocation_short_name

This variable's value is the name that was used to invoke the program running in the current process, with directory names removed. (That is to say, it is the same as program_invocation_name minus everything up to the last slash, if any.)

The library initialization code sets up both of these variables before calling main.

Portability Note: These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of argv[0] in main, and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from main.
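Stripping the directory names by hand, as the portability note describes, can be sketched with strrchr. The function name short_program_name is hypothetical; a program would call it on argv[0] in main and save the result for its error-reporting code.

```c
#include <string.h>

/* Hypothetical portable substitute for program_invocation_short_name:
   return the part of ARGV0 after the last slash, or all of ARGV0 if
   it contains no slash.  */
const char *
short_program_name (const char *argv0)
{
  const char *slash = strrchr (argv0, '/');
  return slash != NULL ? slash + 1 : argv0;
}
```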

Here is an example showing how to handle failure to open a file correctly. The function open_sesame tries to open the named file for reading and returns a stream if successful. The fopen library function returns a null pointer if it couldn't open the file for some reason. In that situation, open_sesame constructs an appropriate error message using the strerror function, and terminates the program. If we were going to make some other library calls before passing the error code to strerror, we'd have to save it in a local variable instead, because those other library functions might overwrite errno in the meantime.

     #define _GNU_SOURCE   /* For program_invocation_short_name.  */
     #include <errno.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     
     FILE *
     open_sesame (char *name)
     {
       FILE *stream;
     
       errno = 0;
       stream = fopen (name, "r");
       if (stream == NULL)
         {
           fprintf (stderr, "%s: Couldn't open file %s; %s\n",
                    program_invocation_short_name, name, strerror (errno));
           exit (EXIT_FAILURE);
         }
       else
         return stream;
     }
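
The case mentioned above, where other library calls happen between the failure and the call to strerror, might look like the following sketch. The function open_and_report and its message text are illustrative assumptions; the point is only that errno is copied into a local variable before anything else runs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open NAME for reading.  On failure,
   print a message and return the saved error code; on success,
   store the stream in *STREAM and return 0.  */
int
open_and_report (const char *name, FILE **stream)
{
  *stream = fopen (name, "r");
  if (*stream == NULL)
    {
      /* Save errno first: the fputs and fflush calls below are
         library calls that may overwrite it.  */
      int saved_errno = errno;

      fputs ("warning: ", stderr);
      fflush (stderr);
      fprintf (stderr, "couldn't open file %s; %s\n",
               name, strerror (saved_errno));
      return saved_errno;
    }
  return 0;
}
```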

Using perror has the advantage that the function is portable and available on all systems implementing ISO C. But often the text perror generates is not what is wanted and there is no way to extend or change what perror does. The GNU coding standard, for instance, requires error messages to be preceded by the program name, and programs which read some input files should provide information about the input file name and the line number in case an error is encountered while reading the file. For these occasions there are two functions available which are widely used throughout the GNU project. These functions are declared in error.h.

— Function: void error (int status, int errnum, const char *format, ...)

The error function can be used to report general problems during program execution. The format argument is a format string just like those given to the printf family of functions. The arguments required for the format can follow the format parameter. Just like perror, error also can report an error code in textual form. But unlike perror the error value is explicitly passed to the function in the errnum parameter. This eliminates the problem mentioned above that the error reporting function must be called immediately after the function causing the error since otherwise errno might have a different value.

The error function first prints the program name. If the application defined a global variable error_print_progname and points it to a function, this function will be called to print the program name. Otherwise the string from the global variable program_name is used. The program name is followed by a colon and a space which in turn is followed by the output produced by the format string. If the errnum parameter is non-zero the format string output is followed by a colon and a space, followed by the error message for the error code errnum. In any case the output is terminated with a newline.

The output is directed to the stderr stream. If stderr wasn't oriented before the call it will be narrow-oriented afterwards.

The function will return unless the status parameter has a non-zero value. In this case the function will call exit with the status value for its parameter and therefore never return. If error returns, the global variable error_message_count is incremented by one to keep track of the number of errors reported.
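
A minimal sketch of a non-fatal report with error follows. The helper report_failure and its message are assumptions made for illustration; only error and error_message_count come from error.h.

```c
#include <errno.h>
#include <error.h>

/* Hypothetical helper: report that NAME could not be processed,
   passing the error code explicitly.  STATUS is 0, so error returns
   and the caller can keep going.  */
void
report_failure (const char *name, int errnum)
{
  error (0, errnum, "cannot process %s", name);
}
```

Because the error code is a parameter, a caller may do other work between detecting the failure and reporting it, as long as it has kept the code somewhere. Each call that returns adds one to error_message_count.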

— Function: void error_at_line (int status, int errnum, const char *fname, unsigned int lineno, const char *format, ...)

The error_at_line function is very similar to the error function. The only differences are the additional parameters fname and lineno. The handling of the other parameters is identical to that of error except that between the program name and the string generated by the format string additional text is inserted.

Directly following the program name a colon, followed by the file name pointed to by fname, another colon, and the value of lineno is printed.

This additional output of course is meant to be used to locate an error in an input file (like a programming language source code file etc).

If the global variable error_one_per_line is set to a non-zero value, error_at_line will avoid printing consecutive messages for the same file and line. Repetitions which do not directly follow each other are not caught.

Just like error, this function only returns if status is zero. Otherwise exit is called with the non-zero value. If error_at_line returns, the global variable error_message_count is incremented by one to keep track of the number of errors reported.

As mentioned above, the error and error_at_line functions can be customized by defining a variable named error_print_progname.

— Variable: void (*error_print_progname) (void)

If the error_print_progname variable is defined to a non-zero value, the function pointed to is called by error or error_at_line. It is expected to print the program name or do something similarly useful.

The function is expected to print to the stderr stream and must be able to handle whatever orientation the stream has.

The variable is global and shared by all threads.

— Variable: unsigned int error_message_count

The error_message_count variable is incremented whenever one of the functions error or error_at_line returns. The variable is global and shared by all threads.

— Variable: int error_one_per_line

The error_one_per_line variable influences only error_at_line. Normally the error_at_line function creates output for every invocation. If error_one_per_line is set to a non-zero value, error_at_line keeps track of the last file name and line number for which an error was reported and avoids directly following messages for the same file and line. This variable is global and shared by all threads.
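
The three variables can be combined as in the sketch below. The names print_tool_name and report_bad_line, and the prefix text, are hypothetical; a real program would normally set the two variables once at startup rather than on every call.

```c
#include <error.h>
#include <stdio.h>

/* Hypothetical replacement for the program-name prefix.  It writes
   to stderr, as error and error_at_line expect.  */
static void
print_tool_name (void)
{
  fputs ("mytool:", stderr);
}

/* Report a problem on line LINENO of FNAME.  STATUS is 0, so
   error_at_line returns to the caller.  */
void
report_bad_line (const char *fname, unsigned int lineno, const char *what)
{
  error_print_progname = print_tool_name;
  error_one_per_line = 1;   /* Collapse consecutive repeats.  */
  error_at_line (0, 0, fname, lineno, "%s", what);
}
```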

A program which reads some input file and reports errors in it could look like this:

     {
       char *line = NULL;
       size_t len = 0;
       unsigned int lineno = 0;
     
       error_message_count = 0;
       while (! feof_unlocked (fp))
         {
           ssize_t n = getline (&line, &len, fp);
           if (n <= 0)
             /* End of file or error.  */
             break;
           ++lineno;
     
           /* Process the line.  */
           ...
     
           if (Detect error in line)
             error_at_line (0, errval, filename, lineno,
                            "some error text %s", some_variable);
         }
     
       if (error_message_count != 0)
         error (EXIT_FAILURE, 0, "%u errors found", error_message_count);
     }

error and error_at_line are clearly the functions of choice and enable the programmer to write applications which follow the GNU coding standard. The GNU C Library additionally contains functions which are used in BSD for the same purpose. These functions are declared in err.h. It is generally advised to not use these functions. They are included only for compatibility.

— Function: void warn (const char *format, ...)

The warn function is roughly equivalent to a call like

            error (0, errno, format, the parameters)

except that the global variables error respects and modifies are not used.

— Function: void vwarn (const char *format, va_list ap)

The vwarn function is just like warn except that the parameters for the handling of the format string format are passed in as a value of type va_list.
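
The va_list variants exist so that a program's own variadic function can forward its arguments. A minimal sketch, assuming a hypothetical wrapper counted_warn and a hypothetical counter warnings_issued:

```c
#include <err.h>
#include <errno.h>
#include <stdarg.h>

/* Hypothetical counter, not part of the library.  */
unsigned int warnings_issued = 0;

/* Hypothetical wrapper: count the warning, then let vwarn format and
   print it.  vwarn reports the message for the current errno, so
   ERRNUM is stored there first.  */
void
counted_warn (int errnum, const char *format, ...)
{
  va_list ap;

  ++warnings_issued;
  va_start (ap, format);
  errno = errnum;
  vwarn (format, ap);
  va_end (ap);
}
```

A call such as counted_warn (ENOENT, "cannot read %s", name) behaves like warn with the same format, plus the bookkeeping.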

— Function: void warnx (const char *format, ...)

The warnx function is roughly equivalent to a call like

            error (0, 0, format, the parameters)

except that the global variables error respects and modifies are not used. The difference to warn is that no error number string is printed.

— Function: void vwarnx (const char *format, va_list ap)

The vwarnx function is just like warnx except that the parameters for the handling of the format string format are passed in as a value of type va_list.

— Function: void err (int status, const char *format, ...)

The err function is roughly equivalent to a call like

            error (status, errno, format, the parameters)

except that the global variables error respects and modifies are not used and that the program is exited even if status is zero.

— Function: void verr (int status, const char *format, va_list ap)

The verr function is just like err except that the parameters for the handling of the format string format are passed in as a value of type va_list.

— Function: void errx (int status, const char *format, ...)

The errx function is roughly equivalent to a call like

            error (status, 0, format, the parameters)

except that the global variables error respects and modifies are not used and that the program is exited even if status is zero. The difference to err is that no error number string is printed.

— Function: void verrx (int status, const char *format, va_list ap)

The verrx function is just like errx except that the parameters for the handling of the format string format are passed in as a value of type va_list.



3 Virtual Memory Allocation And Paging

This chapter describes how processes manage and use memory in a system that uses the GNU C Library.

The GNU C Library has several functions for dynamically allocating virtual memory in various ways. They vary in generality and in efficiency. The library also provides functions for controlling paging and allocation of real memory.

Memory mapped I/O is not discussed in this chapter. See Memory-mapped I/O.



3.1 Process Memory Concepts

One of the most basic resources a process has available to it is memory. There are a lot of different ways systems organize memory, but in a typical one, each process has one linear virtual address space, with addresses running from zero to some huge maximum. It need not be contiguous; i.e., not all of these addresses actually can be used to store data.

The virtual memory is divided into pages (4 kilobytes is typical). Backing each page of virtual memory is a page of real memory (called a frame) or some secondary storage, usually disk space. The disk space might be swap space or just some ordinary disk file. Actually, a page of all zeroes sometimes has nothing at all backing it – there's just a flag saying it is all zeroes.

The same frame of real memory or backing store can back multiple virtual pages belonging to multiple processes. This is normally the case, for example, with virtual memory occupied by GNU C Library code. The same real memory frame containing the printf function backs a virtual memory page in each of the existing processes that has a printf call in its program.
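
Since the page size differs between systems, a program that cares about it should ask rather than assume 4 kilobytes. The sketch below uses sysconf with _SC_PAGESIZE; the helper names page_size and pages_needed are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Return the page size in bytes, as reported by the system.  */
size_t
page_size (void)
{
  return (size_t) sysconf (_SC_PAGESIZE);
}

/* Hypothetical helper: how many whole pages are needed to hold
   BYTES bytes?  Rounds up.  */
size_t
pages_needed (size_t bytes)
{
  size_t page = page_size ();
  return bytes / page + (bytes % page != 0);
}
```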

In order for a program to access any part of a virtual page, the page must at that moment be backed by (“connected to”) a real frame. But because there is usually a lot more virtual memory than real memory, the pages must move back and forth between real memory and backing store regularly, coming into real memory when a process needs to access them and then retreating to backing store when not needed anymore. This movement is called paging.

When a program attempts to access a page which is not at that moment backed by real memory, this is known as a page fault. When a page fault occurs, the kernel suspends the process, places the page into a real page frame (this is called “paging in” or “faulting in”), then resumes the process so that from the process' point of view, the page was in real memory all along. In fact, to the process, all pages always seem to be in real memory. Except for one thing: the elapsed execution time of an instruction that would normally be a few nanoseconds is suddenly much, much, longer (because the kernel normally has to do I/O to complete the page-in). For programs sensitive to that, the functions described in Locking Pages can control it.

Within each virtual address space, a process has to keep track of what is at which addresses, and that process is called memory allocation. Allocation usually brings to mind meting out scarce resources, but in the case of virtual memory, that's not a major goal, because there is generally much more of it than anyone needs. Memory allocation within a process is mainly just a matter of making sure that the same byte of memory isn't used to store two different things.

Processes allocate memory in two major ways: by exec and programmatically. Actually, forking is a third way, but it's not very interesting. See Creating a Process.

Exec is the operation of creating a virtual address space for a process, loading its basic program into it, and executing the program. It is done by the “exec” family of functions (e.g. execl). The operation takes a program file (an executable), allocates space to load all the data in the executable, loads it, and transfers control to it. That data is most notably the instructions of the program (the text), but also literals and constants in the program and even some variables: C variables with the static storage class (see Memory Allocation and C).

Once that program begins to execute, it uses programmatic allocation to gain additional memory. In a C program with the GNU C Library, there are two kinds of programmatic allocation: automatic and dynamic. See Memory Allocation and C.

Memory-mapped I/O is another form of dynamic virtual memory allocation. Mapping memory to a file means declaring that the contents of a certain range of a process' addresses shall be identical to the contents of a specified regular file. The system makes the virtual memory initially contain the contents of the file, and if you modify the memory, the system writes the same modification to the file. Note that due to the magic of virtual memory and page faults, there is no reason for the system to do I/O to read the file, or allocate real memory for its contents, until the program accesses the virtual memory. See Memory-mapped I/O.

Just as it programmatically allocates memory, the program can programmatically deallocate (free) it. You can't free the memory that was allocated by exec. When the program exits or execs, you might say that all its memory gets freed, but since in both cases the address space ceases to exist, the point is really moot. See Program Termination.

A process' virtual address space is divided into segments. A segment is a contiguous range of virtual addresses. Three important segments are:

  • The text segment contains a program's instructions and literals and static constants. It is allocated by exec and stays the same size for the life of the virtual address space.
  • The data segment is working storage for the program. It can be preallocated and preloaded by exec and the process can extend or shrink it by calling functions as described in Resizing the Data Segment. Its lower end is fixed.
  • The stack segment contains a program stack. It grows as the stack grows, but doesn't shrink when the stack shrinks.



3.2 Allocating Storage For Program Data

This section covers how ordinary programs manage storage for their data, including the famous malloc function and some fancier facilities special to the GNU C Library and GNU Compiler.



3.2.1 Memory Allocation in C Programs

The C language supports two kinds of memory allocation through the variables in C programs:

  • Static allocation is what happens when you declare a static or global variable. Each static or global variable defines one block of space, of a fixed size. The space is allocated once, when your program is started (part of the exec operation), and is never freed.
  • Automatic allocation happens when you declare an automatic variable, such as a function argument or a local variable. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited. In GNU C, the size of the automatic storage can be an expression that varies. In other C implementations, it must be a constant.
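
The two kinds can be seen side by side in the following sketch. Both function names are hypothetical. The second function uses an array whose size is computed at run time, which GNU C accepts (ISO C99 also defines such variable-length arrays).

```c
/* Static allocation: COUNTER gets its space once, when the program
   starts, and keeps its value between calls.  */
int
next_ticket (void)
{
  static int counter = 0;
  return ++counter;
}

/* Automatic allocation: SCRATCH is allocated each time the function
   body is entered and freed when it is exited.  Its size depends on
   N, so it can differ from call to call.  */
int
sum_of_squares (int n)
{
  int scratch[n > 0 ? n : 1];
  int total = 0;

  for (int i = 0; i < n; ++i)
    scratch[i] = i * i;
  for (int i = 0; i < n; ++i)
    total += scratch[i];
  return total;
}
```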

A third important kind of memory allocation, dynamic allocation, is not supported by C variables but is available via GNU C Library functions.

3.2.1.1 Dynamic Memory Allocation

Dynamic memory allocation is a technique in which programs determine as they are running where to store some information. You need dynamic allocation when the amount of memory you need, or how long you continue to need it, depends on factors that are not known before the program runs.

For example, you may need a block to store a line read from an input file; since there is no limit to how long a line can be, you must allocate the memory dynamically and make it dynamically larger as you read more of the line.

Or, you may need a block for each record or each definition in the input data; since you can't know in advance how many there will be, you must allocate a new block for each record or definition as you read it.

When you use dynamic allocation, the allocation of a block of memory is an action that the program requests explicitly. You call a function or macro when you want to allocate space, and specify the size with an argument. If you want to free the space, you do so by calling another function or macro. You can do these things whenever you want, as often as you want.

Dynamic allocation is not supported by C variables; there is no storage class “dynamic”, and there can never be a C variable whose value is stored in dynamically allocated space. The only way to get dynamically allocated memory is via a system call (which is generally via a GNU C Library function call), and the only way to refer to dynamically allocated space is through a pointer. Because it is less convenient, and because the actual process of dynamic allocation requires more computation time, programmers generally use dynamic allocation only when neither static nor automatic allocation will serve.

For example, if you want to allocate dynamically some space to hold a struct foobar, you cannot declare a variable of type struct foobar whose contents are the dynamically allocated space. But you can declare a variable of pointer type struct foobar * and assign it the address of the space. Then you can use the operators ‘*’ and ‘->’ on this pointer variable to refer to the contents of the space:

     {
       struct foobar *ptr
          = (struct foobar *) malloc (sizeof (struct foobar));
       ptr->name = x;
       ptr->next = current_foobar;
       current_foobar = ptr;
     }



3.2.2 Unconstrained Allocation

The most general dynamic allocation facility is malloc. It allows you to allocate blocks of memory of any size at any time, make them bigger or smaller at any time, and free the blocks individually at any time (or never).


3.2.2.1 Basic Memory Allocation

To allocate a block of memory, call malloc. The prototype for this function is in stdlib.h.

— Function: void * malloc (size_t size)

This function returns a pointer to a newly allocated block size bytes long, or a null pointer if the block could not be allocated.

The contents of the block are undefined; you must initialize it yourself (or use calloc instead; see Allocating Cleared Space).

Normally you would cast the value as a pointer to the kind of object that you want to store in the block. Here we show an example of doing so, and of initializing the space with zeros using the library function memset (see Copying and Concatenation):

     struct foo *ptr;
     ...
     ptr = (struct foo *) malloc (sizeof (struct foo));
     if (ptr == 0) abort ();
     memset (ptr, 0, sizeof (struct foo));

You can store the result of malloc into any pointer variable without a cast, because ISO C automatically converts the type void * to another type of pointer when necessary. But the cast is necessary in contexts other than assignment operators or if you might want your code to run in traditional C.

Remember that when allocating space for a string, the argument to malloc must be one plus the length of the string. This is because a string is terminated with a null character that doesn't count in the “length” of the string but does need space. For example:

     char *ptr;
     ...
     ptr = (char *) malloc (length + 1);

See Representation of Strings, for more information about this.
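
Put together, copying a string into freshly allocated space might look like this sketch. The function name copy_string is hypothetical; it relies only on malloc, strlen and memcpy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of S, or a null
   pointer if the allocation fails.  The caller frees the result.  */
char *
copy_string (const char *s)
{
  size_t length = strlen (s);
  char *copy = malloc (length + 1);   /* One extra for the null.  */

  if (copy != NULL)
    memcpy (copy, s, length + 1);     /* Copies the terminator too.  */
  return copy;
}
```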


3.2.2.2 Examples of malloc

If no more space is available, malloc returns a null pointer. You should check the value of every call to malloc. It is useful to write a subroutine that calls malloc and reports an error if the value is a null pointer, returning only if the value is nonzero. This function is conventionally called xmalloc. Here it is:

     void *
     xmalloc (size_t size)
     {
       register void *value = malloc (size);
       if (value == 0)
         fatal ("virtual memory exhausted");
       return value;
     }

Here is a real example of using malloc (by way of xmalloc). The function savestring will copy a sequence of characters into a newly allocated null-terminated string:

     char *
     savestring (const char *ptr, size_t len)
     {
       register char *value = (char *) xmalloc (len + 1);
       value[len] = '\0';
       return (char *) memcpy (value, ptr, len);
     }

The block that malloc gives you is guaranteed to be aligned so that it can hold any type of data. On GNU systems, the address is always a multiple of eight on most systems, and a multiple of 16 on 64-bit systems. Only rarely is any higher boundary (such as a page boundary) necessary; for those cases, use memalign, posix_memalign or valloc (see Aligned Memory Blocks).

Note that the memory located after the end of the block is likely to be in use for something else; perhaps a block already allocated by another call to malloc. If you attempt to treat the block as longer than you asked for it to be, you are liable to destroy the data that malloc uses to keep track of its blocks, or you may destroy the contents of another block. If you have already allocated a block and discover you want it to be bigger, use realloc (see Changing Block Size).


3.2.2.3 Freeing Memory Allocated with malloc

When you no longer need a block that you got with malloc, use the function free to make the block available to be allocated again. The prototype for this function is in stdlib.h.

— Function: void free (void *ptr)

The free function deallocates the block of memory pointed at by ptr.

— Function: void cfree (void *ptr)

This function does the same thing as free. It's provided for backward compatibility with SunOS; you should use free instead.

Freeing a block alters the contents of the block. Do not expect to find any data (such as a pointer to the next block in a chain of blocks) in the block after freeing it. Copy whatever you need out of the block before freeing it! Here is an example of the proper way to free all the blocks in a chain, and the strings that they point to:

     struct chain
       {
         struct chain *next;
         char *name;
       };
     
     void
     free_chain (struct chain *chain)
     {
       while (chain != 0)
         {
           struct chain *next = chain->next;
           free (chain->name);
           free (chain);
           chain = next;
         }
     }

Occasionally, free can actually return memory to the operating system and make the process smaller. Usually, all it can do is allow a later call to malloc to reuse the space. In the meantime, the space remains in your program as part of a free-list used internally by malloc.

There is no point in freeing blocks at the end of a program, because all of the program's space is given back to the system when the process terminates.


3.2.2.4 Changing the Size of a Block

Often you do not know for certain how big a block you will ultimately need at the time you must begin to use the block. For example, the block might be a buffer that you use to hold a line being read from a file; no matter how long you make the buffer initially, you may encounter a line that is longer.

You can make the block longer by calling realloc. This function is declared in stdlib.h.

— Function: void * realloc (void *ptr, size_t newsize)

The realloc function changes the size of the block whose address is ptr to be newsize.

Since the space after the end of the block may be in use, realloc may find it necessary to copy the block to a new address where more free space is available. The value of realloc is the new address of the block. If the block needs to be moved, realloc copies the old contents.

If you pass a null pointer for ptr, realloc behaves just like ‘malloc (newsize)’. This can be convenient, but beware that older implementations (before ISO C) may not support this behavior, and will probably crash when realloc is passed a null pointer.

Like malloc, realloc may return a null pointer if no memory space is available to make the block bigger. When this happens, the original block is untouched; it has not been modified or relocated.
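
One consequence is that the result of realloc should go into a temporary pointer first, so the original block is not lost if the call fails. The growable buffer below is a minimal sketch; struct strbuf and strbuf_append are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, for illustration only.  */
struct strbuf
{
  char *data;     /* Null-terminated contents, or NULL when empty.  */
  size_t length;  /* Bytes used, not counting the terminator.  */
};

/* Append S to BUF.  Returns 0 on success and -1 if realloc fails,
   in which case BUF still owns its original, unmodified block.  */
int
strbuf_append (struct strbuf *buf, const char *s)
{
  size_t extra = strlen (s);
  /* Keep the result in a temporary: assigning it straight to
     buf->data would lose the old block if realloc returned null.  */
  char *grown = realloc (buf->data, buf->length + extra + 1);

  if (grown == NULL)
    return -1;
  memcpy (grown + buf->length, s, extra + 1);
  buf->data = grown;
  buf->length += extra;
  return 0;
}
```

Starting from a buffer whose data member is a null pointer works because realloc then behaves like malloc.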

In most cases it makes no difference what happens to the original block when realloc fails, because the application program cannot continue when it is out of memory, and the only thing to do is to give a fatal error message. Often it is convenient to write and use a subroutine, conventionally called xrealloc, that takes care of the error message as xmalloc does for malloc:

     void *
     xrealloc (void *ptr, size_t size)
     {
       register void *value = realloc (ptr, size);
       if (value == 0)
         fatal ("Virtual memory exhausted");
       return value;
     }

You can also use realloc to make a block smaller. The reason you would do this is to avoid tying up a lot of memory space when only a little is needed. In several allocation implementations, making a block smaller sometimes necessitates copying it, so it can fail if no other space is available.

If the new size you specify is the same as the old size, realloc is guaranteed to change nothing and return the same address that you gave.


3.2.2.5 Allocating Cleared Space

The function calloc allocates memory and clears it to zero. It is declared in stdlib.h.

— Function: void * calloc (size_t count, size_t eltsize)

This function allocates a block long enough to contain a vector of count elements, each of size eltsize. Its contents are cleared to zero before calloc returns.

You could define calloc as follows:

     void *
     calloc (size_t count, size_t eltsize)
     {
       size_t size = count * eltsize;
       void *value = malloc (size);
       if (value != 0)
         memset (value, 0, size);
       return value;
     }

But in general, it is not guaranteed that calloc calls malloc internally. Therefore, if an application provides its own malloc/realloc/free outside the C library, it should always define calloc, too.
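
The simple definition shown above multiplies count by eltsize without checking whether the product fits in a size_t. A more defensive variant could be sketched as follows; the name checked_calloc is hypothetical, and this is an illustration rather than the library's actual implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant that refuses requests whose total size does
   not fit in a size_t.  */
void *
checked_calloc (size_t count, size_t eltsize)
{
  size_t size;
  void *value;

  if (eltsize != 0 && count > SIZE_MAX / eltsize)
    return NULL;                /* count * eltsize would wrap around.  */
  size = count * eltsize;
  value = malloc (size);
  if (value != NULL)
    memset (value, 0, size);
  return value;
}
```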


3.2.2.6 Efficiency Considerations for malloc

As opposed to other versions, the malloc in the GNU C Library does not round up block sizes to powers of two, neither for large nor for small sizes. Neighboring chunks can be coalesced on a free no matter what their size is. This makes the implementation suitable for all kinds of allocation patterns without generally incurring high memory waste through fragmentation.

Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation. This has the great advantage that these chunks are returned to the system immediately when they are freed. Therefore, it cannot happen that a large chunk becomes “locked” in between smaller ones and even after calling free wastes memory. The size threshold for mmap to be used can be adjusted with mallopt. The use of mmap can also be disabled completely.


3.2.2.7 Allocating Aligned Memory Blocks

The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a multiple of a higher power of two than that, use memalign, posix_memalign, or valloc. memalign is declared in malloc.h and posix_memalign is declared in stdlib.h.

With the GNU C Library, you can use free to free the blocks that memalign, posix_memalign, and valloc return. That does not work in BSD, however; BSD does not provide any way to free such blocks.

— Function: void * memalign (size_t boundary, size_t size)

The memalign function allocates a block of size bytes whose address is a multiple of boundary. The boundary must be a power of two! The function memalign works by allocating a somewhat larger block, and then returning an address within the block that is on the specified boundary.

— Function: int posix_memalign (void **memptr, size_t alignment, size_t size)

The posix_memalign function is similar to the memalign function in that it returns a buffer of size bytes aligned to a multiple of alignment. But it adds one requirement to the parameter alignment: the value must be a power of two multiple of sizeof (void *).

If the function succeeds in allocating memory, a pointer to the allocated memory is returned in *memptr and the return value is zero. Otherwise the function returns an error value indicating the problem.

This function was introduced in POSIX 1003.1d.
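
Because posix_memalign reports failure through its return value rather than through the pointer, a small wrapper can make it easier to use alongside malloc-style code. The names alloc_aligned and is_multiple_of below are hypothetical; this is a sketch, not a recommended interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: return SIZE bytes aligned to ALIGNMENT, or a
   null pointer if posix_memalign reports an error.  */
void *
alloc_aligned (size_t alignment, size_t size)
{
  void *block = NULL;

  /* posix_memalign returns zero on success and an error value
     otherwise.  */
  if (posix_memalign (&block, alignment, size) != 0)
    return NULL;
  return block;
}

/* Hypothetical helper: is the address P a multiple of ALIGNMENT?  */
int
is_multiple_of (const void *p, size_t alignment)
{
  return (uintptr_t) p % alignment == 0;
}
```

With the GNU C Library, a block obtained this way is released with free, as described above.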

— Function: void * valloc (size_t size)

Using valloc is like using memalign and passing the page size as the value of the first argument. It is implemented like this:

          void *
          valloc (size_t size)
          {
            return memalign (getpagesize (), size);
          }

See Query Memory Parameters for more information about the memory subsystem.


3.2.2.8 Malloc Tunable Parameters

You can adjust some parameters for dynamic memory allocation with the mallopt function. This function is the general SVID/XPG interface, defined in malloc.h.

— Function: int mallopt (int param, int value)

When calling mallopt, the param argument specifies the parameter to be set, and value the new value to be set. Possible choices for param, as defined in malloc.h, are:

M_TRIM_THRESHOLD
This is the minimum size (in bytes) of the top-most, releasable chunk that will cause sbrk to be called with a negative argument in order to return memory to the system.
M_TOP_PAD
This parameter determines the amount of extra memory to obtain from the system when a call to sbrk is required. It also specifies the number of bytes to retain when shrinking the heap by calling sbrk with a negative argument. This provides the necessary hysteresis in heap size such that excessive amounts of system calls can be avoided.
M_MMAP_THRESHOLD
All chunks larger than this value are allocated outside the normal heap, using the mmap system call. This way it is guaranteed that the memory for these chunks can be returned to the system on free. Note that requests smaller than this threshold might still be allocated via mmap.
M_MMAP_MAX
The maximum number of chunks to allocate with mmap. Setting this to zero disables all use of mmap.
M_PERTURB
If non-zero, memory blocks are filled with values depending on some low order bits of this parameter when they are allocated (except when allocated by calloc) and freed. This can be used to debug the use of uninitialized or freed heap memory.
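
A program would typically set such parameters once, early in main. The following sketch is only an illustration: the function name tune_malloc is hypothetical and the sizes are arbitrary examples, not recommendations.

```c
#include <malloc.h>

/* Hypothetical tuning routine; the sizes are arbitrary examples.
   mallopt returns 1 on success and 0 on error, so the result is 1
   only if both settings were accepted.  */
int
tune_malloc (void)
{
  /* Serve requests above 256 KiB with mmap.  */
  int ok = mallopt (M_MMAP_THRESHOLD, 256 * 1024);

  /* Return memory to the system once 512 KiB at the top of the heap
     is free.  */
  ok = mallopt (M_TRIM_THRESHOLD, 512 * 1024) && ok;
  return ok;
}
```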


3.2.2.9 Heap Consistency Checking

You can ask malloc to check the consistency of dynamic memory by using the mcheck function. This function is a GNU extension, declared in mcheck.h.

— Function: int mcheck (void (*abortfn) (enum mcheck_status status))

Calling mcheck tells malloc to perform occasional consistency checks. These will catch things such as writing past the end of a block that was allocated with malloc.

The abortfn argument is the function to call when an inconsistency is found. If you supply a null pointer, then mcheck uses a default function which prints a message and calls abort (see Aborting a Program). The function you supply is called with one argument, which says what sort of inconsistency was detected; its type is described below.

It is too late to begin allocation checking once you have allocated anything with malloc. So mcheck does nothing in that case. The function returns -1 if you call it too late, and 0 otherwise (when it is successful).

The easiest way to arrange to call mcheck early enough is to usethe option ‘-lmcheck’ when you link your program; then you don'tneed to modify your program source at all. Alternatively you might usea debugger to insert a call to mcheck whenever the program isstarted, for example these gdb commands will automatically callmcheckwhenever the program starts:

          (gdb) break main
          Breakpoint 1, main (argc=2, argv=0xbffff964) at whatever.c:10
          (gdb) command 1
          Type commands for when breakpoint 1 is hit, one per line.
          End with a line saying just "end".
          >call mcheck(0)
          >continue
          >end
          (gdb) ...

This will however only work if no initialization function of any object involved calls any of the malloc functions, since mcheck must be called before the first such function.

— Function: enum mcheck_status mprobe (void *pointer)

The mprobe function lets you explicitly check for inconsistencies in a particular allocated block. You must have already called mcheck at the beginning of the program, to do its occasional checks; calling mprobe requests an additional consistency check to be done at the time of the call.

The argument pointer must be a pointer returned by malloc or realloc. mprobe returns a value that says what inconsistency, if any, was found. The values are described below.

— Data Type: enum mcheck_status

This enumerated type describes what kind of inconsistency was detected in an allocated block, if any. Here are the possible values:

MCHECK_DISABLED
mcheck was not called before the first allocation. No consistency checking can be done.
MCHECK_OK
No inconsistency detected.
MCHECK_HEAD
The data immediately before the block was modified. This commonly happens when an array index or pointer is decremented too far.
MCHECK_TAIL
The data immediately after the block was modified. This commonly happens when an array index or pointer is incremented too far.
MCHECK_FREE
The block was already freed.
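
The following sketch shows one way the pieces above could fit together: a handler suitable for passing to mcheck, and an explicit probe with mprobe. The names mcheck_status_name, report_and_abort, enable_heap_checking and check_block are hypothetical helpers introduced for illustration; the message texts are likewise invented.

```c
#include <assert.h>
#include <mcheck.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: turn a status value into readable text.  */
const char *
mcheck_status_name (enum mcheck_status status)
{
  switch (status)
    {
    case MCHECK_DISABLED:
      return "checking disabled (mcheck was called too late)";
    case MCHECK_OK:
      return "no inconsistency detected";
    case MCHECK_HEAD:
      return "data before the block was modified";
    case MCHECK_TAIL:
      return "data after the block was modified";
    case MCHECK_FREE:
      return "block was already freed";
    }
  return "unknown status";
}

/* A function with the signature expected for the abortfn
   argument of mcheck.  */
void
report_and_abort (enum mcheck_status status)
{
  fprintf (stderr, "heap check failed: %s\n",
           mcheck_status_name (status));
  abort ();
}

/* Call this before the first allocation.  It passes on the
   result of mcheck: 0 on success, -1 if it was too late.  */
int
enable_heap_checking (void)
{
  return mcheck (report_and_abort);
}

/* Request an additional check of one block right now.  PTR must
   have been returned by malloc or realloc.  */
enum mcheck_status
check_block (void *ptr)
{
  assert (ptr != NULL);
  return mprobe (ptr);
}
```

A program would call enable_heap_checking at the very start of main and could then call check_block at points where corruption is suspected.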

Another possibility to check for and guard against bugs in the use of malloc, realloc and free is to set the environment variable MALLOC_CHECK_. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.

There is one problem with MALLOC_CHECK_: in SUID or SGID binaries it could possibly be exploited since, diverging from the normal program's behavior, it now writes something to the standard error descriptor. Therefore the use of MALLOC_CHECK_ is disabled by default for SUID and SGID binaries. It can be enabled again by the system administrator by adding a file /etc/suid-debug (the content is not important; it could be empty).

So, what's the difference between using MALLOC_CHECK_ and linking with ‘-lmcheck’? MALLOC_CHECK_ is orthogonal with respect to ‘-lmcheck’. ‘-lmcheck’ has been added for backward compatibility. Both MALLOC_CHECK_ and ‘-lmcheck’ should uncover the same bugs, but using MALLOC_CHECK_ you don't need to recompile your application.


3.2.2.10 Memory Allocation Hooks

The GNU C Library lets you modify the behavior of malloc, realloc, and free by specifying appropriate hook functions. You can use these hooks to help you debug programs that use dynamic memory allocation, for example.

The hook variables are declared in malloc.h.

— Variable: __malloc_hook

The value of this variable is a pointer to the function that malloc uses whenever it is called. You should define this function to look like malloc; that is, like:

          void *function (size_t size, const void *caller)

The value of caller is the return address found on the stack when the malloc function was called. This value allows you to trace the memory consumption of the program.

— Variable: __realloc_hook

The value of this variable is a pointer to the function that realloc uses whenever it is called. You should define this function to look like realloc; that is, like:

          void *function (void *ptr, size_t size, const void *caller)

The value of caller is the return address found on the stack when the realloc function was called. This value allows you to trace the memory consumption of the program.

— Variable: __free_hook

The value of this variable is a pointer to the function that free uses whenever it is called. You should define this function to look like free; that is, like:

          void function (void *ptr, const void *caller)

The value of caller is the return address found on the stack when the free function was called. This value allows you to trace the memory consumption of the program.

— Variable: __memalign_hook

The value of this variable is a pointer to the function that memalign uses whenever it is called. You should define this function to look like memalign; that is, like:

          void *function (size_t alignment, size_t size, const void *caller)

The value of caller is the return address found on the stack when the memalign function was called. This value allows you to trace the memory consumption of the program.

You must make sure that the function you install as a hook for one of these functions does not call that function recursively without restoring the old value of the hook first! Otherwise, your program will get stuck in an infinite recursion. Before calling the function recursively, one should make sure to restore all the hooks to their previous value. When coming back from the recursive call, all the hooks should be resaved since a hook might modify itself.

— Variable: __malloc_initialize_hook

The value of this variable is a pointer to a function that is called once when the malloc implementation is initialized. This is a weak variable, so it can be overridden in the application with a definition like the following:

          void (*__malloc_initialize_hook) (void) = my_init_hook;

An issue to look out for is the time at which the malloc hook functions can be safely installed. If the hook functions call the malloc-related functions recursively, it is necessary that malloc has already properly initialized itself at the time when __malloc_hook etc. is assigned to. On the other hand, if the hook functions provide a complete malloc implementation of their own, it is vital that the hooks are assigned to before the very first malloc call has completed, because otherwise a chunk obtained from the ordinary, un-hooked malloc may later be handed to __free_hook, for example.

In both cases, the problem can be solved by setting up the hooks from within a user-defined function pointed to by __malloc_initialize_hook; then the hooks will be set up safely at the right time.

Here is an example showing how to use __malloc_hook and __free_hook properly. It installs a function that prints out information every time malloc or free is called. We just assume here that realloc and memalign are not used in our program.

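     /* Prototypes for __malloc_hook, __free_hook */
     #include <malloc.h>
     /* Prototype for printf */
     #include <stdio.h>
     
     /* Prototypes for our hooks.  */
     static void my_init_hook (void);
     static void *my_malloc_hook (size_t, const void *);
     static void my_free_hook (void*, const void *);
     
     /* Variables to save the original hooks.  */
     static void *(*old_malloc_hook) (size_t, const void *);
     static void (*old_free_hook) (void *, const void *);
     
     /* Override initializing hook from the C library. */
     void (*__malloc_initialize_hook) (void) = my_init_hook;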
     
     static void
     my_init_hook (void)
     {
       old_malloc_hook = __malloc_hook;
       old_free_hook = __free_hook;
       __malloc_hook = my_malloc_hook;
       __free_hook = my_free_hook;
     }
     
     static void *
     my_malloc_hook (size_t size, const void *caller)
     {
       void *result;
       /* Restore all old hooks */
       __malloc_hook = old_malloc_hook;
       __free_hook = old_free_hook;
       /* Call recursively */
       result = malloc (size);
       /* Save underlying hooks */
       old_malloc_hook = __malloc_hook;
       old_free_hook = __free_hook;
       /* printf might call malloc, so protect it too. */
       printf ("malloc (%u) returns %p\n", (unsigned int) size, result);
       /* Restore our own hooks */
       __malloc_hook = my_malloc_hook;
       __free_hook = my_free_hook;
       return result;
     }
     
     static void
     my_free_hook (void *ptr, const void *caller)
     {
       /* Restore all old hooks */
       __malloc_hook = old_malloc_hook;
       __free_hook = old_free_hook;
       /* Call recursively */
       free (ptr);
       /* Save underlying hooks */
       old_malloc_hook = __malloc_hook;
       old_free_hook = __free_hook;
       /* printf might call free, so protect it too. */
       printf ("freed pointer %p\n", ptr);
       /* Restore our own hooks */
       __malloc_hook = my_malloc_hook;
       __free_hook = my_free_hook;
     }
     
     int
     main (void)
     {
       ...
     }

The mcheck function (see Heap Consistency Checking) works by installing such hooks.


3.2.2.11 Statistics for Memory Allocation with malloc

You can get information about dynamic memory allocation by calling the mallinfo function. This function and its associated data type are declared in malloc.h; they are an extension of the standard SVID/XPG version.

— Data Type: struct mallinfo

This structure type is used to return information about the dynamic memory allocator. It contains the following members:

int arena
This is the total size of memory allocated with sbrk by malloc, in bytes.
int ordblks
This is the number of chunks not in use. (The memory allocator internally gets chunks of memory from the operating system, and then carves them up to satisfy individual malloc requests; see Efficiency and Malloc.)
int smblks
This field is unused.
int hblks
This is the total number of chunks allocated with mmap.
int hblkhd
This is the total size of memory allocated with mmap, in bytes.
int usmblks
This field is unused.
int fsmblks
This field is unused.
int uordblks
This is the total size of memory occupied by chunks handed out by malloc.
int fordblks
This is the total size of memory occupied by free (not in use) chunks.
int keepcost
This is the size of the top-most releasable chunk that normally borders the end of the heap (i.e., the high end of the virtual address space's data segment).

— Function: struct mallinfo mallinfo (void)

This function returns information about the current dynamic memory usage in a structure of type struct mallinfo.
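
As a rough sketch, the members described above can be read like this. The helper names bytes_in_use and print_malloc_stats are hypothetical, and adding uordblks to hblkhd to estimate the total amount handed out is an assumption of this sketch, not something the description above states.

```c
#include <assert.h>
#include <malloc.h>
#include <stdio.h>

/* Hypothetical helper: estimate of the bytes currently handed out
   by malloc, counting heap chunks (uordblks) and, as an assumption
   of this sketch, memory obtained with mmap (hblkhd).  */
long
bytes_in_use (void)
{
  struct mallinfo mi = mallinfo ();
  return (long) mi.uordblks + (long) mi.hblkhd;
}

/* Hypothetical helper: print the documented members to OUT.  */
void
print_malloc_stats (FILE *out)
{
  struct mallinfo mi;

  assert (out != NULL);
  mi = mallinfo ();
  fprintf (out, "arena    (bytes obtained with sbrk): %d\n", mi.arena);
  fprintf (out, "ordblks  (chunks not in use):        %d\n", mi.ordblks);
  fprintf (out, "hblks    (chunks from mmap):         %d\n", mi.hblks);
  fprintf (out, "hblkhd   (bytes from mmap):          %d\n", mi.hblkhd);
  fprintf (out, "uordblks (bytes handed out):         %d\n", mi.uordblks);
  fprintf (out, "fordblks (bytes in free chunks):     %d\n", mi.fordblks);
  fprintf (out, "keepcost (top-most releasable):      %d\n", mi.keepcost);
}
```

Calling print_malloc_stats (stderr) before and after a suspicious phase of the program gives a coarse picture of how the heap changed. Note that the members have type int, so the values are only meaningful while they fit in that type.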


3.2.2.12 Summary of malloc-Related Functions

Here is a summary of the functions that work with malloc:

void *malloc (size_t size)
Allocate a block of size bytes. See Basic Allocation.
void free (void *addr)
Free a block previously allocated by malloc. See Freeing after Malloc.
void *realloc (void *addr, size_t size)
Make a block previously allocated by malloc larger or smaller, possibly by copying it to a new location. See Changing Block Size.
void *calloc (size_t count, size_t eltsize)
Allocate a block of count * eltsize bytes using malloc, and set its contents to zero. See Allocating Cleared Space.
void *valloc (size_t size)
Allocate a block of size bytes, starting on a page boundary. See Aligned Memory Blocks.
void *memalign (size_t boundary, size_t size)
Allocate a block of size bytes, starting on an address that is a multiple of boundary. See Aligned Memory Blocks.
int mallopt (int param, int value)
Adjust a tunable parameter. See Malloc Tunable Parameters.
int mcheck (void (*abortfn) (enum mcheck_status status))
Tell malloc to perform occasional consistency checks on dynamically allocated memory, and to call abortfn when an inconsistency is found. See Heap Consistency Checking.
void *(*__malloc_hook) (size_t size, const void *caller)
A pointer to a function that malloc uses whenever it is called.
void *(*__realloc_hook) (void *ptr, size_t size, const void *caller)
A pointer to a function that realloc uses whenever it is called.
void (*__free_hook) (void *ptr, const void *caller)
A pointer to a function that free uses whenever it is called.
void *(*__memalign_hook) (size_t alignment, size_t size, const void *caller)
A pointer to a function that memalign uses whenever it is called.
struct mallinfo mallinfo (void)
Return information about the current dynamic memory usage. See Statistics of Malloc.



3.2.3 Allocation Debugging

A complicated task when programming with languages which do not use garbage collected dynamic memory allocation is to find memory leaks. Long running programs must ensure that dynamically allocated objects are freed at the end of their lifetime. If this does not happen the system runs out of memory, sooner or later.

The malloc implementation in the GNU C Library provides some simple means to detect such leaks and obtain some information to find the location. To do this the application must be started in a special mode which is enabled by an environment variable. There are no speed penalties for the program if the debugging mode is not enabled.


3.2.3.1 How to install the tracing functionality

— Function: void mtrace (void)

When the mtrace function is called it looks for an environment variable named MALLOC_TRACE. This variable is supposed to contain a valid file name. The user must have write access. If the file already exists it is truncated. If the environment variable is not set or it does not name a valid file which can be opened for writing, nothing is done. The behavior of malloc etc. is not changed. For obvious reasons this also happens if the application is installed with the SUID or SGID bit set.

If the named file is successfully opened, mtrace installs special handlers for the functions malloc, realloc, and free (see Hooks for Malloc). From then on, all uses of these functions are traced and logged to the file. There is now of course a speed penalty for all calls to the traced functions, so tracing should not be enabled during normal use.

This function is a GNU extension and generally not available on other systems. The prototype can be found in mcheck.h.

— Function: void muntrace (void)

The muntrace function can be called after mtrace was used to enable tracing the malloc calls. If no (successful) call of mtrace was made, muntrace does nothing.

Otherwise it deinstalls the handlers for malloc, realloc, and free and then closes the trace file. No calls are logged anymore and the program runs again at full speed.

This function is a GNU extension and generally not available on other systems. The prototype can be found in mcheck.h.


3.2.3.2 Example program excerpts

Even though the tracing functionality does not influence the runtime behavior of the program it is not a good idea to call mtrace in all programs. Just imagine that you debug a program using mtrace and all other programs used in the debugging session also trace their malloc calls. The output file would be the same for all programs and thus is unusable. Therefore one should call mtrace only if compiled for debugging. A program could therefore start like this:

     #include <mcheck.h>
     
     int
     main (int argc, char *argv[])
     {
     #ifdef DEBUGGING
       mtrace ();
     #endif
       ...
     }

This is all that is needed if you want to trace the calls during the whole runtime of the program. Alternatively you can stop the tracing at any time with a call to muntrace. It is even possible to restart the tracing again with a new call to mtrace. But this can cause unreliable results since there may be calls of the functions which are not traced. Please note that not only the application uses the traced functions; libraries (including the C library itself) use these functions as well.

This last point is also why it is not a good idea to call muntrace before the program terminates. The libraries are informed about the termination of the program only after the program returns from main or calls exit and so cannot free the memory they use before this time.

So the best thing one can do is to call mtrace as the very first function in the program and never call muntrace. So the program traces almost all uses of the malloc functions (except those calls which are executed by constructors of the program or used libraries).


3.2.3.3 Some more or less clever ideas

You know the situation. The program is prepared for debugging and in all debugging sessions it runs well. But once it is started without debugging the error shows up. A typical example is a memory leak that becomes visible only when we turn off the debugging. If you foresee such situations you can still win. Simply use something equivalent to the following little program:

     #include <mcheck.h>
     #include <signal.h>
     
     static void
     enable (int sig)
     {
       mtrace ();
       signal (SIGUSR1, enable);
     }
     
     static void
     disable (int sig)
     {
       muntrace ();
       signal (SIGUSR2, disable);
     }
     
     int
     main (int argc, char *argv[])
     {
       ...
     
       signal (SIGUSR1, enable);
       signal (SIGUSR2, disable);
     
       ...
     }

That is, the user can start the memory debugger at any time, provided the program was started with MALLOC_TRACE set in the environment. The output will of course not show the allocations which happened before the first signal, but if there is a memory leak this will show up nevertheless.


3.2.3.4 Interpreting the traces

If you take a look at the output it will look similar to this:

     = Start
      [0x8048209] - 0x8064cc8
      [0x8048209] - 0x8064ce0
      [0x8048209] - 0x8064cf8
      [0x80481eb] + 0x8064c48 0x14
      [0x80481eb] + 0x8064c60 0x14
      [0x80481eb] + 0x8064c78 0x14
      [0x80481eb] + 0x8064c90 0x14
     = End

What this all means is not really important since the trace file is not meant to be read by a human. Therefore no attention is given to readability. Instead there is a program which comes with the GNU C Library which interprets the traces and outputs a summary in a user-friendly way. The program is called mtrace (it is in fact a Perl script) and it takes one or two arguments. In any case the name of the file with the trace output must be specified. If an optional argument precedes the name of the trace file this must be the name of the program which generated the trace.

     drepper$ mtrace tst-mtrace log
     No memory leaks.

In this case the program tst-mtrace was run and it produced a trace file log. The message printed by mtrace shows there are no problems with the code; all allocated memory was freed afterwards.

If we call mtrace on the example trace given above we would get a different output:

     drepper$ mtrace errlog
     - 0x08064cc8 Free 2 was never alloc'd 0x8048209
     - 0x08064ce0 Free 3 was never alloc'd 0x8048209
     - 0x08064cf8 Free 4 was never alloc'd 0x8048209
     
     Memory not freed:
     -----------------
        Address     Size     Caller
     0x08064c48     0x14  at 0x80481eb
     0x08064c60     0x14  at 0x80481eb
     0x08064c78     0x14  at 0x80481eb
     0x08064c90     0x14  at 0x80481eb

We have called mtrace with only one argument and so the script has no chance to find out what is meant with the addresses given in the trace. We can do better:

     drepper$ mtrace tst errlog
     - 0x08064cc8 Free 2 was never alloc'd /home/drepper/tst.c:39
     - 0x08064ce0 Free 3 was never alloc'd /home/drepper/tst.c:39
     - 0x08064cf8 Free 4 was never alloc'd /home/drepper/tst.c:39
     
     Memory not freed:
     -----------------
        Address     Size     Caller
     0x08064c48     0x14  at /home/drepper/tst.c:33
     0x08064c60     0x14  at /home/drepper/tst.c:33
     0x08064c78     0x14  at /home/drepper/tst.c:33
     0x08064c90     0x14  at /home/drepper/tst.c:33

Suddenly the output makes much more sense and the user can see immediately where the function calls causing the trouble can be found.

Interpreting this output is not complicated. There are at most two different situations being detected. First, free was called for pointers which were never returned by one of the allocation functions. This is usually a very bad problem and what this looks like is shown in the first three lines of the output. Situations like this are quite rare and if they appear they show up very drastically: the program normally crashes.

The other situation, which is much harder to detect, is a memory leak. As you can see in the output, the mtrace program collects all this information and so can say that the program calls an allocation function from line 33 in the source file /home/drepper/tst.c four times without freeing this memory before the program terminates. Whether this is a real problem remains to be investigated.



3.2.4 Obstacks

An obstack is a pool of memory containing a stack of objects. You can create any number of separate obstacks, and then allocate objects in specified obstacks. Within each obstack, the last object allocated must always be the first one freed, but distinct obstacks are independent of each other.

Aside from this one constraint of order of freeing, obstacks are totally general: an obstack can contain any number of objects of any size. They are implemented with macros, so allocation is usually very fast as long as the objects are usually small. And the only space overhead per object is the padding needed to start each object on a suitable boundary.


3.2.4.1 Creating Obstacks

The utilities for manipulating obstacks are declared in the header file obstack.h.

— Data Type: struct obstack

An obstack is represented by a data structure of type struct obstack. This structure has a small fixed size; it records the status of the obstack and how to find the space in which objects are allocated. It does not contain any of the objects themselves. You should not try to access the contents of the structure directly; use only the functions described in this chapter.

You can declare variables of type struct obstack and use them as obstacks, or you can allocate obstacks dynamically like any other kind of object. Dynamic allocation of obstacks allows your program to have a variable number of different stacks. (You can even allocate an obstack structure in another obstack, but this is rarely useful.)

All the functions that work with obstacks require you to specify which obstack to use. You do this with a pointer of type struct obstack *. In the following, we often say “an obstack” when strictly speaking the object at hand is such a pointer.

The objects in the obstack are packed into large blocks called chunks. The struct obstack structure points to a chain of the chunks currently in use.

The obstack library obtains a new chunk whenever you allocate an object that won't fit in the previous chunk. Since the obstack library manages chunks automatically, you don't need to pay much attention to them, but you do need to supply a function which the obstack library should use to get a chunk. Usually you supply a function which uses malloc directly or indirectly. You must also supply a function to free a chunk. These matters are described in the following section.


3.2.4.2 Preparing for Using Obstacks

Each source file in which you plan to use the obstack functions must include the header file obstack.h, like this:

     #include <obstack.h>

Also, if the source file uses the macro obstack_init, it must declare or define two functions or macros that will be called by the obstack library. One, obstack_chunk_alloc, is used to allocate the chunks of memory into which objects are packed. The other, obstack_chunk_free, is used to return chunks when the objects in them are freed. These macros should appear before any use of obstacks in the source file.

Usually these are defined to use malloc via the intermediary xmalloc (see Unconstrained Allocation). This is done with the following pair of macro definitions:

     #define obstack_chunk_alloc xmalloc
     #define obstack_chunk_free free

Though the memory you get using obstacks really comes from malloc, using obstacks is faster because malloc is called less often, for larger blocks of memory. See Obstack Chunks, for full details.

At run time, before the program can use a struct obstack object as an obstack, it must initialize the obstack by calling obstack_init.

— Function: int obstack_init (struct obstack *obstack-ptr)

Initialize obstack obstack-ptr for allocation of objects. This function calls the obstack's obstack_chunk_alloc function. If allocation of memory fails, the function pointed to by obstack_alloc_failed_handler is called. The obstack_init function always returns 1 (Compatibility notice: Former versions of obstack returned 0 if allocation failed).

Here are two examples of how to allocate the space for an obstack and initialize it. First, an obstack that is a static variable:

     static struct obstack myobstack;
     ...
     obstack_init (&myobstack);

Second, an obstack that is itself dynamically allocated:

     struct obstack *myobstack_ptr
       = (struct obstack *) xmalloc (sizeof (struct obstack));
     
     obstack_init (myobstack_ptr);

— Variable: obstack_alloc_failed_handler

The value of this variable is a pointer to a function that obstack uses when obstack_chunk_alloc fails to allocate memory. The default action is to print a message and abort. You should supply a function that either calls exit (see Program Termination) or longjmp (see Non-Local Exits) and doesn't return.

          void my_obstack_alloc_failed (void)
          ...
          obstack_alloc_failed_handler = &my_obstack_alloc_failed;
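
Putting the steps of this section together, a minimal sketch might look as follows. It assumes plain malloc and free as the chunk functions instead of xmalloc, and the names demo_obstack and demo_setup_and_alloc as well as the error message are invented for illustration. The noreturn attribute is a GNU C extension used here to document that the handler does not return.

```c
#include <assert.h>
#include <obstack.h>
#include <stdio.h>
#include <stdlib.h>

/* obstack_init expands to uses of these two macros, so they must
   be defined before the first use of obstacks in this file.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Handler called when a chunk cannot be allocated.  It must not
   return, so it reports the problem and exits.  */
__attribute__ ((noreturn)) void
my_obstack_alloc_failed (void)
{
  fputs ("obstack: memory exhausted\n", stderr);
  exit (EXIT_FAILURE);
}

/* An obstack that is a static variable, as in the first example
   above.  */
struct obstack demo_obstack;

/* Hypothetical helper: install the handler, initialize the
   obstack and allocate one block of SIZE bytes in it.  */
void *
demo_setup_and_alloc (int size)
{
  void *block;

  obstack_alloc_failed_handler = my_obstack_alloc_failed;
  obstack_init (&demo_obstack);

  block = obstack_alloc (&demo_obstack, size);
  /* On failure the handler would have been called and would not
     have returned, so the result is usable here.  */
  assert (block != NULL);
  return block;
}
```

Installing the handler before calling obstack_init matters because obstack_init itself already allocates a chunk.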


3.2.4.3 Allocation in an Obstack

The most direct way to allocate an object in an obstack is with obstack_alloc, which is invoked almost like malloc.

— Function: void * obstack_alloc (struct obstack *obstack-ptr, int size)

This allocates an uninitialized block of size bytes in an obstack and returns its address. Here obstack-ptr specifies which obstack to allocate the block in; it is the address of the struct obstack object which represents the obstack. Each obstack function or macro requires you to specify an obstack-ptr as the first argument.

This function calls the obstack's obstack_chunk_alloc function if it needs to allocate a new chunk of memory; it calls obstack_alloc_failed_handler if allocation of memory by obstack_chunk_alloc failed.

For example, here is a function that allocates a copy of a string string in a specific obstack, which is in the variable string_obstack:

     struct obstack string_obstack;
     
     char *
     copystring (char *string)
     {
       size_t len = strlen (string) + 1;
       char *s = (char *) obstack_alloc (&string_obstack, len);
       memcpy (s, string, len);
       return s;
     }

To allocate a block with specified contents, use the function obstack_copy, declared like this:

— Function: void * obstack_copy (struct obstack *obstack-ptr, void *address, int size)

This allocates a block and initializes it by copying size bytes of data starting at address. It calls obstack_alloc_failed_handler if allocation of memory by obstack_chunk_alloc failed.

— Function: void * obstack_copy0 (struct obstack *obstack-ptr, void *address, int size)

Like obstack_copy, but appends an extra byte containing a null character. This extra byte is not counted in the argument size.

The obstack_copy0 function is convenient for copying a sequence of characters into an obstack as a null-terminated string. Here is an example of its use:

     char *
     obstack_savestring (char *addr, int size)
     {
       return obstack_copy0 (&myobstack, addr, size);
     }

Contrast this with the previous example of savestring using malloc (see Basic Allocation).


3.2.4.4 Freeing Objects in an Obstack

To free an object allocated in an obstack, use the function obstack_free. Since the obstack is a stack of objects, freeing one object automatically frees all other objects allocated more recently in the same obstack.

— Function: void obstack_free (struct obstack *obstack-ptr, void *object)

If object is a null pointer, everything allocated in the obstack is freed. Otherwise, object must be the address of an object allocated in the obstack. Then object is freed, along with everything allocated in obstack since object.

Note that if object is a null pointer, the result is an uninitialized obstack. To free all memory in an obstack but leave it valid for further allocation, call obstack_free with the address of the first object allocated on the obstack:

     obstack_free (obstack_ptr, first_object_allocated_ptr);

Recall that the objects in an obstack are grouped into chunks. When all the objects in a chunk become free, the obstack library automatically frees the chunk (see Preparing for Obstacks). Then other obstacks, or non-obstack allocation, can reuse the space of the chunk.
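
The stack discipline can be illustrated with a small sketch. The function name free_to_mark_demo, the object sizes and the string "keep" are made up for this example, and malloc and free are assumed as the chunk functions.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical demonstration: allocate three objects, free back
   to the second one, and check that the first is untouched.
   Returns 1 if the first object still holds its contents.  */
int
free_to_mark_demo (void)
{
  struct obstack ob;
  char *first, *mark, *later;
  int intact;

  obstack_init (&ob);

  first = obstack_copy0 (&ob, "keep", 4);
  mark = obstack_alloc (&ob, 64);
  later = obstack_alloc (&ob, 64);
  assert (first != NULL && mark != NULL && later != NULL);

  /* Frees MARK and everything allocated after it (here LATER).
     FIRST was allocated earlier and remains valid.  */
  obstack_free (&ob, mark);
  intact = strcmp (first, "keep") == 0;

  /* A null pointer frees everything; the obstack must be
     initialized again before it is reused.  */
  obstack_free (&ob, NULL);
  return intact;
}
```

After the first call to obstack_free, the pointers mark and later must no longer be used, in the same way as pointers passed to free.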


3.2.4.5 Obstack Functions and Macros

The interfaces for using obstacks may be defined either as functions or as macros, depending on the compiler. The obstack facility works with all C compilers, including both ISO C and traditional C, but there are precautions you must take if you plan to use compilers other than GNU C.

If you are using an old-fashioned non-ISO C compiler, all the obstack “functions” are actually defined only as macros. You can call these macros like functions, but you cannot use them in any other way (for example, you cannot take their address).

Calling the macros requires a special precaution: namely, the first operand (the obstack pointer) may not contain any side effects, because it may be computed more than once. For example, if you write this:

     obstack_alloc (get_obstack (), 4);

you will find that get_obstack may be called several times. If you use *obstack_list_ptr++ as the obstack pointer argument, you will get very strange results since the incrementation may occur several times.

In ISO C, each function has both a macro definition and a function definition. The function definition is used if you take the address of the function without calling it. An ordinary call uses the macro definition by default, but you can request the function definition instead by writing the function name in parentheses, as shown here:

     char *x;
     void *(*funcp) ();
     /* Use the macro.  */
     x = (char *) obstack_alloc (obptr, size);
     /* Call the function.  */
     x = (char *) (obstack_alloc) (obptr, size);
     /* Take the address of the function.  */
     funcp = obstack_alloc;

This is the same situation that exists in ISO C for the standard library functions. See Macro Definitions.

Warning: When you do use the macros, you must observe the precaution of avoiding side effects in the first operand, even in ISO C.

If you use the GNU C compiler, this precaution is not necessary, because various language extensions in GNU C permit defining the macros so as to compute each argument only once.
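
One way to stay safe with any compiler is to evaluate the side-effecting expression once, store the result in a local variable, and pass that variable to the macro. The following sketch illustrates this; the names pool, get_obstack_calls and portable_alloc are hypothetical, and get_obstack here is only a stand-in for the function of the same name used in the example above.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

static struct obstack pool;
static int pool_ready;

/* Counts how often get_obstack is evaluated.  */
int get_obstack_calls;

/* Stand-in for a function with a side effect: it counts its
   calls and initializes the obstack on first use.  */
struct obstack *
get_obstack (void)
{
  ++get_obstack_calls;
  if (!pool_ready)
    {
      obstack_init (&pool);
      pool_ready = 1;
    }
  return &pool;
}

/* Hypothetical wrapper: the obstack pointer is computed exactly
   once, so it does not matter how often the macro evaluates its
   first operand.  */
void *
portable_alloc (int size)
{
  struct obstack *ob = get_obstack ();

  assert (ob != NULL);
  return obstack_alloc (ob, size);
}
```

With this pattern, each call to portable_alloc evaluates get_obstack once regardless of how obstack_alloc is defined.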


3.2.4.6 Growing Objects

Because memory in obstack chunks is used sequentially, it is possible to build up an object step by step, adding one or more bytes at a time to the end of the object. With this technique, you do not need to know how much data you will put in the object until you come to the end of it. We call this the technique of growing objects. The special functions for adding data to the growing object are described in this section.

You don't need to do anything special when you start to grow an object. Using one of the functions to add data to the object automatically starts it. However, it is necessary to say explicitly when the object is finished. This is done with the function obstack_finish.

The actual address of the object thus built up is not known until the object is finished. Until then, it always remains possible that you will add so much data that the object must be copied into a new chunk.

While the obstack is in use for a growing object, you cannot use it for ordinary allocation of another object. If you try to do so, the space already added to the growing object will become part of the other object.

— Function: void obstack_blank (struct obstack *obstack-ptr, int size)

The most basic function for adding to a growing object is obstack_blank, which adds space without initializing it.

— Function: void obstack_grow (struct obstack *obstack-ptr, void *data, int size)

To add a block of initialized space, use obstack_grow, which is the growing-object analogue of obstack_copy. It adds size bytes of data to the growing object, copying the contents from data.

— Function: void obstack_grow0 (struct obstack *obstack-ptr, void *data, int size)

This is the growing-object analogue of obstack_copy0. It adds size bytes copied from data, followed by an additional null character.

— Function: void obstack_1grow (struct obstack *obstack-ptr, char c)

To add one character at a time, use the function obstack_1grow. It adds a single byte containing c to the growing object.

— Function: void obstack_ptr_grow (struct obstack *obstack-ptr, void *data)

To add the value of a pointer, use the function obstack_ptr_grow. It adds sizeof (void *) bytes containing the value of data.

— Function: void obstack_int_grow (struct obstack *obstack-ptr, int data)

A single value of type int can be added by using the obstack_int_grow function. It adds sizeof (int) bytes to the growing object and initializes them with the value of data.

— Function: void * obstack_finish (struct obstack *obstack-ptr)

When you are finished growing the object, use the function obstack_finish to close it off and return its final address.

Once you have finished the object, the obstack is available for ordinary allocation or for growing another object.

This function can return a null pointer under the same conditions as obstack_alloc (see Allocation in an Obstack).

When you build an object by growing it, you will probably need to know afterward how long it became. You need not keep track of this as you grow the object, because you can find out the length from the obstack just before finishing the object with the function obstack_object_size, declared as follows:

— Function: int obstack_object_size (struct obstack *obstack-ptr)

This function returns the current size of the growing object, in bytes. Remember to call this function before finishing the object. After it is finished, obstack_object_size will return zero.

If you have started growing an object and wish to cancel it, you should finish it and then free it, like this:

     obstack_free (obstack_ptr, obstack_finish (obstack_ptr));

This has no effect if no object was growing.

You can use obstack_blank with a negative size argument to make the current object smaller. Just don't try to shrink it beyond zero length; there's no telling what will happen if you do that.
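The functions in this section can be combined as in the following sketch, which builds one null-terminated string out of two pieces. The helper name join_words and its len_out parameter are made up for illustration, and the chunk allocator is assumed to be malloc. Note that the length is read before obstack_finish is called, as described above.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: chunks come from malloc and are released with free.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: grow "A B" (with a terminating null) as a
   single object in OB.  Stores the object's size in bytes, including
   the null, in *LEN_OUT and returns the finished object's address.  */
char *
join_words (struct obstack *ob, const char *a, const char *b, int *len_out)
{
  obstack_grow (ob, a, strlen (a));     /* starts the object */
  obstack_1grow (ob, ' ');              /* one separator byte */
  obstack_grow0 (ob, b, strlen (b));    /* second piece plus a null */
  *len_out = obstack_object_size (ob);  /* must precede obstack_finish */
  return (char *) obstack_finish (ob);
}
```

The caller is expected to have initialized the obstack with obstack_init and to release the string later with obstack_free.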


3.2.4.7 Extra Fast Growing Objects

The usual functions for growing objects incur overhead for checking whether there is room for the new growth in the current chunk. If you are frequently constructing objects in small steps of growth, this overhead can be significant.

You can reduce the overhead by using special “fast growth” functions that grow the object without checking. In order to have a robust program, you must do the checking yourself. If you do this checking in the simplest way each time you are about to add data to the object, you have not saved anything, because that is what the ordinary growth functions do. But if you can arrange to check less often, or check more efficiently, then you make the program faster.

The function obstack_room returns the amount of room available in the current chunk. It is declared as follows:

— Function: int obstack_room (struct obstack *obstack-ptr)

This returns the number of bytes that can be added safely to the current growing object (or to an object about to be started) in obstack obstack-ptr using the fast growth functions.

While you know there is room, you can use these fast growth functions for adding data to a growing object:

— Function: void obstack_1grow_fast (struct obstack *obstack-ptr, char c)

The function obstack_1grow_fast adds one byte containing the character c to the growing object in obstack obstack-ptr.

— Function: void obstack_ptr_grow_fast (struct obstack *obstack-ptr, void *data)

The function obstack_ptr_grow_fast adds sizeof (void *) bytes containing the value of data to the growing object in obstack obstack-ptr.

— Function: void obstack_int_grow_fast (struct obstack *obstack-ptr, int data)

The function obstack_int_grow_fast adds sizeof (int) bytes containing the value of data to the growing object in obstack obstack-ptr.

— Function: void obstack_blank_fast (struct obstack *obstack-ptr, int size)

The function obstack_blank_fast adds size bytes to the growing object in obstack obstack-ptr without initializing them.

When you check for space using obstack_room and there is not enough room for what you want to add, the fast growth functions are not safe. In this case, simply use the corresponding ordinary growth function instead. Very soon this will copy the object to a new chunk; then there will be lots of room available again.

So, each time you use an ordinary growth function, check afterward for sufficient space using obstack_room. Once the object is copied to a new chunk, there will be plenty of space again, so the program will start using the fast growth functions again.

Here is an example:

     void
     add_string (struct obstack *obstack, const char *ptr, int len)
     {
       while (len > 0)
         {
           int room = obstack_room (obstack);
           if (room == 0)
             {
               /* Not enough room. Add one character slowly,
                  which may copy to a new chunk and make room.  */
               obstack_1grow (obstack, *ptr++);
               len--;
             }
           else
             {
               if (room > len)
                 room = len;
               /* Add fast as much as we have room for. */
               len -= room;
               while (room-- > 0)
                 obstack_1grow_fast (obstack, *ptr++);
             }
         }
     }
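The same check-then-grow idea also works for whole blocks. The sketch below is one possible variant, not a library routine: when the block fits, it copies the data to the address reported by obstack_next_free (see Status of an Obstack) and then claims the space with obstack_blank_fast; otherwise it falls back to obstack_grow. The name add_block is hypothetical, and malloc is assumed as the chunk allocator.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: chunks come from malloc and are released with free.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: append LEN bytes from DATA to the object
   growing in OB, checking for room once per block.  */
void
add_block (struct obstack *ob, const void *data, int len)
{
  if (len <= 0)
    return;
  if ((long) obstack_room (ob) >= (long) len)
    {
      /* Enough room: write at the first free byte, then advance.  */
      memcpy (obstack_next_free (ob), data, len);
      obstack_blank_fast (ob, len);
    }
  else
    /* Not enough room: the ordinary function may move the object
       to a new, larger chunk.  */
    obstack_grow (ob, data, len);
}
```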


3.2.4.8 Status of an Obstack

Here are functions that provide information on the current status of allocation in an obstack. You can use them to learn about an object while still growing it.

— Function: void * obstack_base (struct obstack *obstack-ptr)

This function returns the tentative address of the beginning of the currently growing object in obstack-ptr. If you finish the object immediately, it will have that address. If you make it larger first, it may outgrow the current chunk, and then its address will change!

If no object is growing, this value says where the next object you allocate will start (once again assuming it fits in the current chunk).

— Function: void * obstack_next_free (struct obstack *obstack-ptr)

This function returns the address of the first free byte in the current chunk of obstack obstack-ptr. This is the end of the currently growing object. If no object is growing, obstack_next_free returns the same value as obstack_base.

— Function: int obstack_object_size (struct obstack *obstack-ptr)

This function returns the size in bytes of the currently growing object. This is equivalent to

          obstack_next_free (obstack-ptr) - obstack_base (obstack-ptr)
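As an illustration of inspecting an object while it is still growing, the following sketch looks at the most recently added byte. The helper name last_byte is hypothetical, and malloc is assumed as the chunk allocator. The cast is there because the pointer type returned by obstack_next_free has differed between library versions.

```c
#include <obstack.h>
#include <stdlib.h>

/* Assumption: chunks come from malloc and are released with free.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: return the last byte added to the object
   growing in OB, or -1 if the object is currently empty.  */
int
last_byte (struct obstack *ob)
{
  if (obstack_object_size (ob) == 0)
    return -1;
  /* The byte just before the first free byte ends the object.  */
  return ((unsigned char *) obstack_next_free (ob))[-1];
}
```

The result is only meaningful until the next growth call, since growing may move the object to a new chunk.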


3.2.4.9 Alignment of Data in Obstacks

Each obstack has an alignment boundary; each object allocated in the obstack automatically starts on an address that is a multiple of the specified boundary. By default, this boundary is aligned so that the object can hold any type of data.

To access an obstack's alignment boundary, use the macro obstack_alignment_mask, whose function prototype looks like this:

— Macro: int obstack_alignment_mask (struct obstack *obstack-ptr)

The value is a bit mask; a bit that is 1 indicates that the corresponding bit in the address of an object should be 0. The mask value should be one less than a power of 2; the effect is that all object addresses are multiples of that power of 2. The default value of the mask is a value that allows aligned objects to hold any type of data: for example, if its value is 3, any type of data can be stored at locations whose addresses are multiples of 4. A mask value of 0 means an object can start on any multiple of 1 (that is, no alignment is required).

The expansion of the macro obstack_alignment_mask is an lvalue, so you can alter the mask by assignment. For example, this statement:

          obstack_alignment_mask (obstack_ptr) = 0;

has the effect of turning off alignment processing in the specified obstack.

Note that a change in alignment mask does not take effect until after the next time an object is allocated or finished in the obstack. If you are not growing an object, you can make the new alignment mask take effect immediately by calling obstack_finish. This will finish a zero-length object and then do proper alignment for the next object.
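The following sketch shows one way the mask might be raised to request stricter alignment, here 64 bytes (mask 63). It relies on the behavior described above: with no object growing, obstack_finish finishes a zero-length object and applies the new mask to whatever is allocated next. The helper name alloc_aligned64 is hypothetical, and malloc is assumed as the chunk allocator.

```c
#include <obstack.h>
#include <stdlib.h>

/* Assumption: chunks come from malloc and are released with free.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: allocate SIZE bytes from OB at an address
   that is a multiple of 64.  Call it only while no object is
   growing in OB.  */
void *
alloc_aligned64 (struct obstack *ob, int size)
{
  obstack_alignment_mask (ob) = 63;  /* one less than a power of 2 */
  (void) obstack_finish (ob);        /* make the new mask take effect */
  return obstack_alloc (ob, size);
}
```

The mask stays in force for later allocations in the same obstack until it is assigned again.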


3.2.4.10 Obstack Chunks

Obstacks work by allocating space for themselves in large chunks, and then parceling out space in the chunks to satisfy your requests. Chunks are normally 4096 bytes long unless you specify a different chunk size. The chunk size includes 8 bytes of overhead that are not actually used for storing objects. Regardless of the specified size, longer chunks will be allocated when necessary for long objects.

The obstack library allocates chunks by calling the function obstack_chunk_alloc, which you must define. When a chunk is no longer needed because you have freed all the objects in it, the obstack library frees the chunk by calling obstack_chunk_free, which you must also define.

These two must be defined (as macros) or declared (as functions) in each source file that uses obstack_init (see Creating Obstacks). Most often they are defined as macros like this:

     #define obstack_chunk_alloc malloc
     #define obstack_chunk_free free

Note that these are simple macros (no arguments). Macro definitions with arguments will not work! It is necessary that obstack_chunk_alloc or obstack_chunk_free, alone, expand into a function name if it is not itself a function name.

If you allocate chunks with malloc, the chunk size should be a power of 2. The default chunk size, 4096, was chosen because it is long enough to satisfy many typical requests on the obstack yet short enough not to waste too much memory in the portion of the last chunk not yet used.

— Macro: int obstack_chunk_size (struct obstack *obstack-ptr)

This returns the chunk size of the given obstack.

Since this macro expands to an lvalue, you can specify a new chunk size by assigning it a new value. Doing so does not affect the chunks already allocated, but will change the size of chunks allocated for that particular obstack in the future. It is unlikely to be useful to make the chunk size smaller, but making it larger might improve efficiency if you are allocating many objects whose size is comparable to the chunk size. Here is how to do so cleanly:

     if (obstack_chunk_size (obstack_ptr) < new-chunk-size)
       obstack_chunk_size (obstack_ptr) = new-chunk-size;


3.2.4.11 Summary of Obstack Functions

Here is a summary of all the functions associated with obstacks. Each takes the address of an obstack (struct obstack *) as its first argument.

void obstack_init (struct obstack *obstack-ptr)
Initialize use of an obstack. See Creating Obstacks.
void *obstack_alloc (struct obstack *obstack-ptr, int size)
Allocate an object of size uninitialized bytes. See Allocation in an Obstack.
void *obstack_copy (struct obstack *obstack-ptr, void *address, int size)
Allocate an object of size bytes, with contents copied from address. See Allocation in an Obstack.
void *obstack_copy0 (struct obstack *obstack-ptr, void *address, int size)
Allocate an object of size+1 bytes, with size of them copied from address, followed by a null character at the end. See Allocation in an Obstack.
void obstack_free (struct obstack *obstack-ptr, void *object)
Free object (and everything allocated in the specified obstack more recently than object). See Freeing Obstack Objects.
void obstack_blank (struct obstack *obstack-ptr, int size)
Add size uninitialized bytes to a growing object. See Growing Objects.
void obstack_grow (struct obstack *obstack-ptr, void *address, int size)
Add size bytes, copied from address, to a growing object. See Growing Objects.
void obstack_grow0 (struct obstack *obstack-ptr, void *address, int size)
Add size bytes, copied from address, to a growing object, and then add another byte containing a null character. See Growing Objects.
void obstack_1grow (struct obstack *obstack-ptr, char data-char)
Add one byte containing data-char to a growing object. See Growing Objects.
void *obstack_finish (struct obstack *obstack-ptr)
Finalize the object that is growing and return its permanent address. See Growing Objects.
int obstack_object_size (struct obstack *obstack-ptr)
Get the current size of the currently growing object. See Growing Objects.
void obstack_blank_fast (struct obstack *obstack-ptr, int size)
Add size uninitialized bytes to a growing object without checking that there is enough room. See Extra Fast Growing.
void obstack_1grow_fast (struct obstack *obstack-ptr, char data-char)
Add one byte containing data-char to a growing object without checking that there is enough room. See Extra Fast Growing.
int obstack_room (struct obstack *obstack-ptr)
Get the amount of room now available for growing the current object. See Extra Fast Growing.
int obstack_alignment_mask (struct obstack *obstack-ptr)
The mask used for aligning the beginning of an object. This is an lvalue. See Obstacks Data Alignment.
int obstack_chunk_size (struct obstack *obstack-ptr)
The size for allocating chunks. This is an lvalue. See Obstack Chunks.
void *obstack_base (struct obstack *obstack-ptr)
Tentative starting address of the currently growing object. See Status of an Obstack.
void *obstack_next_free (struct obstack *obstack-ptr)
Address just after the end of the currently growing object. See Status of an Obstack.



3.2.5 Automatic Storage with Variable Size

The function alloca supports a kind of half-dynamic allocation in which blocks are allocated dynamically but freed automatically.

Allocating a block with alloca is an explicit action; you can allocate as many blocks as you wish, and compute the size at run time. But all the blocks are freed when you exit the function that alloca was called from, just as if they were automatic variables declared in that function. There is no way to free the space explicitly.

The prototype for alloca is in stdlib.h. This function is a BSD extension.

— Function: void * alloca (size_t size)

The return value of alloca is the address of a block of size bytes of memory, allocated in the stack frame of the calling function.

Do not use alloca inside the arguments of a function call; you will get unpredictable results, because the stack space for the alloca would appear on the stack in the middle of the space for the function arguments. An example of what to avoid is foo (x, alloca (4), y).


3.2.5.1 alloca Example

As an example of the use of alloca, here is a function that opens a file name made from concatenating two argument strings, and returns a file descriptor or minus one signifying failure:

     int
     open2 (char *str1, char *str2, int flags, int mode)
     {
       char *name = (char *) alloca (strlen (str1) + strlen (str2) + 1);
       stpcpy (stpcpy (name, str1), str2);
       return open (name, flags, mode);
     }

Here is how you would get the same results with malloc and free:

     int
     open2 (char *str1, char *str2, int flags, int mode)
     {
       char *name = (char *) malloc (strlen (str1) + strlen (str2) + 1);
       int desc;
       if (name == 0)
         fatal ("virtual memory exceeded");
       stpcpy (stpcpy (name, str1), str2);
       desc = open (name, flags, mode);
       free (name);
       return desc;
     }

As you can see, it is simpler with alloca. But alloca has other, more important advantages, and some disadvantages.


3.2.5.2 Advantages of alloca

Here are the reasons why alloca may be preferable to malloc:

  • Using alloca wastes very little space and is very fast. (It is open-coded by the GNU C compiler.)
  • Since alloca does not have separate pools for different sizes of block, space used for any size block can be reused for any other size. alloca does not cause memory fragmentation.
  • Nonlocal exits done with longjmp (see Non-Local Exits) automatically free the space allocated with alloca when they exit through the function that called alloca. This is the most important reason to use alloca.

    To illustrate this, suppose you have a function open_or_report_error which returns a descriptor, like open, if it succeeds, but does not return to its caller if it fails. If the file cannot be opened, it prints an error message and jumps out to the command level of your program using longjmp. Let's change open2 (see Alloca Example) to use this subroutine:

              int
              open2 (char *str1, char *str2, int flags, int mode)
              {
                char *name = (char *) alloca (strlen (str1) + strlen (str2) + 1);
                stpcpy (stpcpy (name, str1), str2);
                return open_or_report_error (name, flags, mode);
              }
    

    Because of the way alloca works, the memory it allocates is freed even when an error occurs, with no special effort required.

    By contrast, the previous definition of open2 (which uses malloc and free) would develop a memory leak if it were changed in this way. Even if you are willing to make more changes to fix it, there is no easy way to do so.


3.2.5.3 Disadvantages of alloca

These are the disadvantages of alloca in comparison with malloc:

  • If you try to allocate more memory than the machine can provide, you don't get a clean error message. Instead you get a fatal signal like the one you would get from an infinite recursion; probably a segmentation violation (see Program Error Signals).
  • Some non-GNU systems fail to support alloca, so it is less portable. However, a slower emulation of alloca written in C is available for use on systems with this deficiency.


3.2.5.4 GNU C Variable-Size Arrays

In GNU C, you can replace most uses of alloca with an array of variable size. Here is how open2 would look then:

     int open2 (char *str1, char *str2, int flags, int mode)
     {
       char name[strlen (str1) + strlen (str2) + 1];
       stpcpy (stpcpy (name, str1), str2);
       return open (name, flags, mode);
     }

But alloca is not always equivalent to a variable-sized array, for several reasons:

  • A variable size array's space is freed at the end of the scope of the name of the array. The space allocated with alloca remains until the end of the function.
  • It is possible to use alloca within a loop, allocating an additional block on each iteration. This is impossible with variable-sized arrays.

NB: If you mix use of alloca and variable-sized arrays within one function, exiting a scope in which a variable-sized array was declared frees all blocks allocated with alloca during the execution of that scope.
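The loop case can be illustrated with a small sketch that builds a temporary linked list, one alloca block per iteration; every block stays valid until the function returns. The names struct node and sum_via_stack_list are invented for this example, and it assumes the alloca.h header that the GNU C Library also provides. Since each block consumes stack space, this is only sensible for small element counts.

```c
#include <alloca.h>
#include <stddef.h>

/* Illustrative list node; not part of any library.  */
struct node
{
  int value;
  struct node *next;
};

/* Hypothetical helper: push N values onto a list whose nodes live in
   this function's stack frame, then sum them.  */
int
sum_via_stack_list (const int *values, int n)
{
  struct node *head = NULL;
  int i, sum = 0;

  for (i = 0; i < n; i++)
    {
      /* A fresh block each iteration; none is freed until return.  */
      struct node *nd = (struct node *) alloca (sizeof *nd);
      nd->value = values[i];
      nd->next = head;
      head = nd;
    }

  for (; head != NULL; head = head->next)
    sum += head->value;
  return sum;
}
```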



3.3 Resizing the Data Segment

The symbols in this section are declared in unistd.h.

You will not normally use the functions in this section, because the functions described in Memory Allocation are easier to use. Those are interfaces to a GNU C Library memory allocator that uses the functions below itself. The functions below are simple interfaces to system calls.

— Function: int brk (void *addr)

brk sets the high end of the calling process' data segment to addr.

The address of the end of a segment is defined to be the address of the last byte in the segment plus 1.

The function has no effect if addr is lower than the low end of the data segment. (This is considered success, by the way).

The function fails if it would cause the data segment to overlap another segment or exceed the process' data storage limit (see Limits on Resources).

The function is named for a common historical case where data storage and the stack are in the same segment. Data storage allocation grows upward from the bottom of the segment while the stack grows downward toward it from the top of the segment and the curtain between them is called the break.

The return value is zero on success. On failure, the return value is -1 and errno is set accordingly. The following errno values are specific to this function:

ENOMEM
The request would cause the data segment to overlap another segment or exceed the process' data storage limit.

— Function: void *sbrk (ptrdiff_t delta)

This function is the same as brk except that you specify the new end of the data segment as an offset delta from the current end, and on success the return value is the address of the previous end of the data segment (which is where any newly added space begins) instead of zero.

This means you can use ‘sbrk(0)’ to find out what the current end of the data segment is.
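As a rough illustration, the sketch below uses ‘sbrk(0)’ to observe the end of the data segment before and after moving it. The function name break_growth is hypothetical, and the _DEFAULT_SOURCE definition is an assumption about which feature test macro exposes sbrk when strict standard modes are in use. Real programs should leave the break to malloc.

```c
/* Assumption: needed for the sbrk declaration in strict modes.  */
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: move the end of the data segment by DELTA
   bytes, measure the change with sbrk (0), then move it back.
   Returns the measured change, or -1 if sbrk failed.  */
ptrdiff_t
break_growth (ptrdiff_t delta)
{
  char *before = (char *) sbrk (0);
  char *after;

  if (before == (char *) -1 || sbrk (delta) == (void *) -1)
    return -1;
  after = (char *) sbrk (0);
  (void) sbrk (-delta);          /* restore the original break */
  return after - before;
}
```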



3.4 Locking Pages

You can tell the system to associate a particular virtual memory page with a real page frame and keep it that way, i.e., cause the page to be paged in if it isn't already and mark it so it will never be paged out and consequently will never cause a page fault. This is called locking a page.

The functions in this chapter lock and unlock the calling process' pages.



3.4.1 Why Lock Pages

Because page faults cause paged out pages to be paged in transparently, a process rarely needs to be concerned about locking pages. However, there are two reasons people sometimes are:

  • Speed. A page fault is transparent only insofar as the process is not sensitive to how long it takes to do a simple memory access. Time-critical processes, especially realtime processes, may not be able to wait or may not be able to tolerate variance in execution speed. A process that needs to lock pages for this reason probably also needs priority among other processes for use of the CPU. See Priority.

    In some cases, the programmer knows better than the system's demand paging allocator which pages should remain in real memory to optimize system performance. In this case, locking pages can help.

  • Privacy. If you keep secrets in virtual memory and that virtual memory gets paged out, that increases the chance that the secrets will get out. If a password gets written out to disk swap space, for example, it might still be there long after virtual and real memory have been wiped clean.

Be aware that when you lock a page, that's one fewer page frame that can be used to back other virtual memory (by the same or other processes), which can mean more page faults, which means the system runs more slowly. In fact, if you lock enough memory, some programs may not be able to run at all for lack of real memory.



3.4.2 Locked Memory Details

A memory lock is associated with a virtual page, not a real frame. The paging rule is: If a frame backs at least one locked page, don't page it out.

Memory locks do not stack. I.e., you can't lock a particular page twice so that it has to be unlocked twice before it is truly unlocked. It is either locked or it isn't.

A memory lock persists until the process that owns the memory explicitly unlocks it. (But process termination and exec cause the virtual memory to cease to exist, which you might say means it isn't locked any more).

Memory locks are not inherited by child processes. (But note that on a modern Unix system, immediately after a fork, the parent's and the child's virtual address space are backed by the same real page frames, so the child enjoys the parent's locks). See Creating a Process.

Because of its ability to impact other processes, only the superuser can lock a page. Any process can unlock its own page.

The system sets limits on the amount of memory a process can have locked and the amount of real memory it can have dedicated to it. See Limits on Resources.

In Linux, locked pages aren't as locked as you might think. Two virtual pages that are not shared memory can nonetheless be backed by the same real frame. The kernel does this in the name of efficiency when it knows both virtual pages contain identical data, and does it even if one or both of the virtual pages are locked.

But when a process modifies one of those pages, the kernel must get it a separate frame and fill it with the page's data. This is known as a copy-on-write page fault. It takes a small amount of time and in a pathological case, getting that frame may require I/O. To make sure this doesn't happen to your program, don't just lock the pages. Write to them as well, unless you know you won't write to them ever. And to make sure you have pre-allocated frames for your stack, enter a scope that declares a C automatic variable larger than the maximum stack size you will need, set it to something, then return from its scope.
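The stack advice in the last sentence might look like the following sketch. The name prefault_stack and the 64 KiB figure are assumptions chosen for illustration; pick a size that covers your program's real worst case. The array is volatile so that the compiler does not discard the writes.

```c
#include <stddef.h>

/* Assumption: the program never needs more stack than this.  */
#define MAX_STACK_BYTES (64 * 1024)

/* Hypothetical helper: touch MAX_STACK_BYTES of stack so that frames
   are allocated for it, then return.  Returns the number of bytes
   written.  */
size_t
prefault_stack (void)
{
  volatile unsigned char dummy[MAX_STACK_BYTES];
  size_t i;

  for (i = 0; i < sizeof dummy; i++)
    dummy[i] = 0;               /* the write forces a private frame */
  return sizeof dummy;
}
```

Calling it once, early and from the outermost stack depth the program uses, is the intent.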



3.4.3 Functions To Lock And Unlock Pages

The symbols in this section are declared in sys/mman.h. These functions are defined by POSIX.1b, but their availability depends on your kernel. If your kernel doesn't allow these functions, they exist but always fail. They are available with a Linux kernel.

Portability Note: POSIX.1b requires that when the mlock and munlock functions are available, the file unistd.h define the macro _POSIX_MEMLOCK_RANGE and the file limits.h define the macro PAGESIZE to be the size of a memory page in bytes. It requires that when the mlockall and munlockall functions are available, the unistd.h file define the macro _POSIX_MEMLOCK. The GNU C Library conforms to this requirement.

— Function: int mlock (const void *addr, size_t len)

mlock locks a range of the calling process' virtual pages.

The range of memory starts at address addr and is len bytes long. Actually, since you must lock whole pages, it is the range of pages that include any part of the specified range.

When the function returns successfully, each of those pages is backed by (connected to) a real frame (is resident) and is marked to stay that way. This means the function may cause page-ins and have to wait for them.

When the function fails, it does not affect the lock status of any pages.

The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. errno values specific to this function are:

ENOMEM
  • At least some of the specified address range does not exist in the calling process' virtual address space.
  • The locking would cause the process to exceed its locked page limit.
EPERM
The calling process is not superuser.
EINVAL
len is not positive.
ENOSYS
The kernel does not provide mlock capability.

You can lock all of a process' memory with mlockall. You unlock memory with munlock or munlockall.

To avoid all page faults in a C program, you have to use mlockall, because some of the memory a program uses is hidden from the C code, e.g. the stack and automatic variables, and you wouldn't know what address to tell mlock.

— Function: int munlock (const void *addr, size_t len)

munlock unlocks a range of the calling process' virtual pages.

munlock is the inverse of mlock and functions completely analogously to mlock, except that there is no EPERM failure.
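For the privacy use case described under Why Lock Pages, a pair of calls might be wrapped as in the sketch below. The helper names lock_secret and release_secret are hypothetical. Because mlock can fail (for example with EPERM or ENOMEM, as listed above), the caller must check the result and decide whether to continue without the lock.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: lock the pages holding BUF so they are not
   paged out.  Returns 0 on success, or -1 with errno set.  */
int
lock_secret (void *buf, size_t len)
{
  return mlock (buf, len);
}

/* Hypothetical helper: overwrite the secret, then unlock its pages.
   The volatile pointer keeps the compiler from dropping the wipe.  */
void
release_secret (void *buf, size_t len)
{
  volatile unsigned char *p = (volatile unsigned char *) buf;
  size_t i;

  for (i = 0; i < len; i++)
    p[i] = 0;
  (void) munlock (buf, len);
}
```

Wiping before unlocking means the secret is gone from memory by the time the pages become eligible for paging again.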

— Function: int mlockall (int flags)

mlockall locks all the pages in a process' virtual memory address space, and/or any that are added to it in the future. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory mapped files.

flags is a string of single bit flags represented by the following macros. They tell mlockall which of its functions you want. All other bits must be zero.

MCL_CURRENT
Lock all pages which currently exist in the calling process' virtual address space.
MCL_FUTURE
Set a mode such that any pages added to the process' virtual address space in the future will be locked from birth. This mode does not affect future address spaces owned by the same process so exec, which replaces a process' address space, wipes out MCL_FUTURE. See Executing a File.

When the function returns successfully, and you specified MCL_CURRENT, all of the process' pages are backed by (connected to) real frames (they are resident) and are marked to stay that way. This means the function may cause page-ins and have to wait for them.

When the process is in MCL_FUTURE mode because it successfully executed this function and specified MCL_FUTURE, any system call by the process that requires space be added to its virtual address space fails with errno = ENOMEM if locking the additional space would cause the process to exceed its locked page limit. In the case that the address space addition that can't be accommodated is stack expansion, the stack expansion fails and the kernel sends a SIGSEGV signal to the process.

When the function fails, it does not affect the lock status of any pages or the future locking mode.

The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. errno values specific to this function are:

ENOMEM
  • At least some of the specified address range does not exist in the calling process' virtual address space.
  • The locking would cause the process to exceed its locked page limit.
EPERM
The calling process is not superuser.
EINVAL
Undefined bits in flags are not zero.
ENOSYS
The kernel does not provide mlockall capability.

You can lock just specific pages with mlock. You unlock pages with munlockall and munlock.

— Function: int munlockall (void)

munlockall unlocks every page in the calling process' virtual address space and turns off MCL_FUTURE future locking mode.

The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. The only way this function can fail is for generic reasons that all functions and system calls can fail, so there are no specific errno values.



4 Character Handling

Programs that work with characters and strings often need to classify a character (is it alphabetic, is it a digit, is it whitespace, and so on) and perform case conversion operations on characters. The functions in the header file ctype.h are provided for this purpose. Since the choice of locale and character set can alter the classifications of particular character codes, all of these functions are affected by the current locale. (More precisely, they are affected by the locale currently selected for character classification, the LC_CTYPE category; see Locale Categories.)

The ISO C standard specifies two different sets of functions. One set works on char type characters, the other on wchar_t wide characters (see Extended Char Intro).



4.1 Classification of Characters

This section explains the library functions for classifying characters. For example, isalpha is the function to test for an alphabetic character. It takes one argument, the character to test, and returns a nonzero integer if the character is alphabetic, and zero otherwise. You would use it like this:

     if (isalpha (c))
       printf ("The character `%c' is alphabetic.\n", c);

Each of the functions in this section tests for membership in a particular class of characters; each has a name starting with ‘is’. Each of them takes one argument, which is a character to test, and returns an int which is treated as a boolean value. The character argument is passed as an int, and it may be the constant value EOF instead of a real character; any other value must be representable as an unsigned char.

The attributes of any given character can vary between locales. See Locales, for more information on locales.

These functions are declared in the header file ctype.h.

— Function: int islower (int c)

Returns true if c is a lower-case letter. The letter need not be from the Latin alphabet; any alphabet representable is valid.

— Function: int isupper (int c)

Returns true if c is an upper-case letter. The letter need not be from the Latin alphabet; any alphabet representable is valid.

— Function: int isalpha (int c)

Returns true if c is an alphabetic character (a letter). If islower or isupper is true of a character, then isalpha is also true.

In some locales, there may be additional characters for which isalpha is true: letters which are neither upper case nor lower case. But in the standard "C" locale, there are no such additional characters.

— Function: int isdigit (int c)

Returns true if c is a decimal digit (‘0’ through ‘9’).

— Function: int isalnum (int c)

Returns true if c is an alphanumeric character (a letter or digit); in other words, if either isalpha or isdigit is true of a character, then isalnum is also true.

— Function: int isxdigit (int c)

Returns true if c is a hexadecimal digit. Hexadecimal digits include the normal decimal digits ‘0’ through ‘9’ and the letters ‘A’ through ‘F’ and ‘a’ through ‘f’.

— Function: int ispunct (int c)

Returns true if c is a punctuation character. This means any printing character that is not alphanumeric or a space character.

— Function: int isspace (int c)

Returns true if c is a whitespace character. In the standard "C" locale, isspace returns true for only the standard whitespace characters:

' '
space
'\f'
formfeed
'\n'
newline
'\r'
carriage return
'\t'
horizontal tab
'\v'
vertical tab

— Function: int isblank (int c)

Returns true if c is a blank character; that is, a space or a tab. This function was originally a GNU extension, but was added in ISO C99.

— Function: int isgraph (int c)

Returns true if c is a graphic character; that is, a character that has a glyph associated with it. The whitespace characters are not considered graphic.

— Function: int isprint (int c)

Returns true if c is a printing character. Printing characters include all the graphic characters, plus the space (‘ ’) character.

— Function: int iscntrl (int c)

Returns true if c is a control character (that is, a character that is not a printing character).

— Function: int isascii (int c)

Returns true if c is a 7-bit unsigned char value that fits into the US/UK ASCII character set. This function is a BSD extension and is also an SVID extension.
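
As a rough illustration of how the classification functions in this section might be combined, the following sketch counts the letters, digits, and whitespace characters in a string. The names class_counts and count_classes are made up for this example and are not part of the library. Each char is converted to unsigned char before the call, on the assumption that plain char may be signed on the target machine.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper type for this sketch, not part of the library.  */
struct class_counts
{
  size_t letters;
  size_t digits;
  size_t spaces;
};

/* Count the characters of S by class, using the locale currently
   selected for the LC_CTYPE category.  */
struct class_counts
count_classes (const char *s)
{
  struct class_counts n = { 0, 0, 0 };
  for (; *s != '\0'; ++s)
    {
      /* Convert to unsigned char so that a negative char value is
         never passed to the classification functions.  */
      unsigned char c = (unsigned char) *s;
      if (isalpha (c))
        ++n.letters;
      else if (isdigit (c))
        ++n.digits;
      else if (isspace (c))
        ++n.spaces;
    }
  return n;
}
```

In the standard "C" locale, calling count_classes ("ab 12\tc!") yields three letters, two digits, and two whitespace characters; the ‘!’ is punctuation and is not counted.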



4.2 Case Conversion

This section explains the library functions for performing conversions such as case mappings on characters. For example, toupper converts any character to upper case if possible. If the character can't be converted, toupper returns it unchanged.

These functions take one argument of type int, which is the character to convert, and return the converted character as an int. If the conversion is not applicable to the argument given, the argument is returned unchanged.

Compatibility Note: In pre-ISO C dialects, instead of returning the argument unchanged, these functions may fail when the argument is not suitable for the conversion. Thus for portability, you may need to write islower(c) ? toupper(c) : c rather than just toupper(c).

These functions are declared in the header file ctype.h.

— Function: int tolower (int c)

If c is an upper-case letter, tolower returns the corresponding lower-case letter. If c is not an upper-case letter, c is returned unchanged.

— Function: int toupper (int c)

If c is a lower-case letter, toupper returns the corresponding upper-case letter. Otherwise c is returned unchanged.

— Function: int toascii (int c)

This function converts c to a 7-bit unsigned char value that fits into the US/UK ASCII character set, by clearing the high-order bits. This function is a BSD extension and is also an SVID extension.

— Function: int _tolower (int c)

This is identical to tolower, and is provided for compatibility with the SVID. See SVID.

— Function: int _toupper (int c)

This is identical to toupper, and is provided for compatibility with the SVID.
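
The following minimal sketch shows one way toupper might be applied to a whole string. The name str_toupper is an assumption made for this example; it is not a library function. The cast to unsigned char guards against negative char values, and characters that have no upper-case counterpart are left unchanged because toupper returns them as they are.

```c
#include <ctype.h>

/* Convert the null-terminated string S to upper case in place and
   return S.  str_toupper is a made-up name for this sketch.  */
char *
str_toupper (char *s)
{
  char *p;
  for (p = s; *p != '\0'; ++p)
    *p = (char) toupper ((unsigned char) *p);
  return s;
}
```

The argument must point to modifiable storage, such as an array initialized from a string literal, not to the literal itself.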



4.3 Character class determination for wide characters

Amendment 1 to ISO C90 defines functions to classify wide characters. Although the original ISO C90 standard already defined the type wchar_t, no functions operating on it were defined.

The design of the classification functions for wide characters is more general than that of the char versions. It allows extensions to the set of available classifications, beyond those which are always available. The POSIX standard specifies how extensions can be made, and this is already implemented in the GNU C Library implementation of the localedef program.

The character class functions are normally implemented with bitsets, with a bitset per character. For a given character, the appropriate bitset is read from a table and a test is performed as to whether a certain bit is set. Which bit is tested for is determined by the class.

For the wide character classification functions this is made visible. There is a type for the classification value, a function to retrieve this value for a given class, and a function to test whether a given character is in this class, using the classification value. On top of this, the normal character classification functions as used for char objects can be defined.

— Data type: wctype_t

The wctype_t type can hold a value which represents a character class. The only defined way to generate such a value is by using the wctype function.

This type is defined in wctype.h.

— Function: wctype_t wctype (const char *property)

The wctype function returns a value representing a class of wide characters which is identified by the string property. Besides some standard properties, each locale can define its own. If no property with the given name is known for the current locale selected for the LC_CTYPE category, the function returns zero.

The properties known in every locale are:

"alnum"   "alpha"   "cntrl"   "digit"
"graph"   "lower"   "print"   "punct"
"space"   "upper"   "xdigit"

This function is declared in wctype.h.

To test the membership of a character to one of the non-standard classes the ISO C standard defines a completely new function.

— Function: int iswctype (wint_t wc, wctype_t desc)

This function returns a nonzero value if wc is in the character class specified by desc. desc must previously be returned by a successful call to wctype.

This function is declared in wctype.h.
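
A minimal sketch of how wctype and iswctype might be used together follows. The function name count_in_class is an assumption made for this example. The descriptor is looked up once, before the loop, and a zero descriptor (unknown class) is treated as "no character matches".

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the wide characters of WS that belong to the class named by
   PROPERTY.  Return 0 if the class is not known in the current
   locale.  count_in_class is a made-up name for this sketch.  */
size_t
count_in_class (const wchar_t *ws, const char *property)
{
  wctype_t desc = wctype (property);
  size_t n = 0;

  if (desc == (wctype_t) 0)
    return 0;

  for (; *ws != L'\0'; ++ws)
    if (iswctype ((wint_t) *ws, desc))
      ++n;
  return n;
}
```

For instance, in the standard "C" locale count_in_class (L"ab12 c", "alpha") counts three characters.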

To make it easier to use the commonly-used classification functions, they are defined in the C library. There is no need to use wctype if the property string is one of the known character classes. In some situations it is desirable to construct the property strings, and then it is important that wctype can also handle the standard classes.

— Function: int iswalnum (wint_t wc)

This function returns a nonzero value if wc is an alphanumeric character (a letter or number); in other words, if either iswalpha or iswdigit is true of a character, then iswalnum is also true.

This function can be implemented using

          iswctype (wc, wctype ("alnum"))

It is declared in wctype.h.

— Function: int iswalpha (wint_t wc)

Returns true if wc is an alphabetic character (a letter). If iswlower or iswupper is true of a character, then iswalpha is also true.

In some locales, there may be additional characters for which iswalpha is true: letters which are neither upper case nor lower case. But in the standard "C" locale, there are no such additional characters.

This function can be implemented using

          iswctype (wc, wctype ("alpha"))

It is declared in wctype.h.

— Function: int iswcntrl (wint_t wc)

Returns true if wc is a control character (that is, a character that is not a printing character).

This function can be implemented using

          iswctype (wc, wctype ("cntrl"))

It is declared in wctype.h.

— Function: int iswdigit (wint_t wc)

Returns true if wc is a digit (e.g., ‘0’ through ‘9’). Please note that this function does not only return a nonzero value for decimal digits, but for all kinds of digits. A consequence is that code like the following will not work unconditionally for wide characters:

          n = 0;
          while (iswdigit (*wc))
            {
              n *= 10;
              n += *wc++ - L'0';
            }

This function can be implemented using

          iswctype (wc, wctype ("digit"))

It is declared in wctype.h.

— Function: int iswgraph (wint_t wc)

Returns true if wc is a graphic character; that is, a character that has a glyph associated with it. The whitespace characters are not considered graphic.

This function can be implemented using

          iswctype (wc, wctype ("graph"))

It is declared in wctype.h.

— Function: int iswlower (wint_t wc)

Returns true if wc is a lower-case letter. The letter need not be from the Latin alphabet; any alphabet representable is valid.

This function can be implemented using

          iswctype (wc, wctype ("lower"))

It is declared in wctype.h.

— Function: int iswprint (wint_t wc)

Returns true if wc is a printing character. Printing characters include all the graphic characters, plus the space (‘ ’) character.

This function can be implemented using

          iswctype (wc, wctype ("print"))

It is declared in wctype.h.

— Function: int iswpunct (wint_t wc)

Returns true if wc is a punctuation character. This means any printing character that is not alphanumeric or a space character.

This function can be implemented using

          iswctype (wc, wctype ("punct"))

It is declared in wctype.h.

— Function: int iswspace (wint_t wc)

Returns true if wc is a whitespace character. In the standard "C" locale, iswspace returns true for only the standard whitespace characters:

L' '
space
L'\f'
formfeed
L'\n'
newline
L'\r'
carriage return
L'\t'
horizontal tab
L'\v'
vertical tab

This function can be implemented using

          iswctype (wc, wctype ("space"))

It is declared in wctype.h.

— Function: int iswupper (wint_t wc)

Returns true if wc is an upper-case letter. The letter need not be from the Latin alphabet; any alphabet representable is valid.

This function can be implemented using

          iswctype (wc, wctype ("upper"))

It is declared in wctype.h.

— Function: int iswxdigit (wint_t wc)

Returns true if wc is a hexadecimal digit. Hexadecimal digits include the normal decimal digits ‘0’ through ‘9’ and the letters ‘A’ through ‘F’ and ‘a’ through ‘f’.

This function can be implemented using

          iswctype (wc, wctype ("xdigit"))

It is declared in wctype.h.

The GNU C Library also provides a function which is not defined in the ISO C standard but which is available as a version for single-byte characters as well.

— Function: int iswblank (wint_t wc)

Returns true if wc is a blank character; that is, a space or a tab. This function was originally a GNU extension, but was added in ISO C99. It is declared in wctype.h.



4.4 Notes on using the wide character classes

The first note is probably not astonishing but still occasionally a cause of problems. The iswXXX functions can be implemented using macros and in fact, the GNU C Library does this. They are still available as real functions, but when the wctype.h header is included the macros will be used. This is the same as the char type versions of these functions.

The second note covers something new. It can be best illustrated by a (real-world) example. The first piece of code is an excerpt from the original code. It is truncated a bit but the intention should be clear.

     int
     is_in_class (int c, const char *class)
     {
       if (strcmp (class, "alnum") == 0)
         return isalnum (c);
       if (strcmp (class, "alpha") == 0)
         return isalpha (c);
       if (strcmp (class, "cntrl") == 0)
         return iscntrl (c);
       ...
       return 0;
     }

Now, with wctype and iswctype you can avoid the if cascades, but rewriting the code as follows is wrong:

     int
     is_in_class (int c, const char *class)
     {
       wctype_t desc = wctype (class);
       return desc ? iswctype ((wint_t) c, desc) : 0;
     }

The problem is that it is not guaranteed that the wide character representation of a single-byte character can be found using casting. In fact, usually this fails miserably. The correct solution to this problem is to write the code as follows:

     int
     is_in_class (int c, const char *class)
     {
       wctype_t desc = wctype (class);
       return desc ? iswctype (btowc (c), desc) : 0;
     }

See Converting a Character, for more information on btowc. Note that this change probably does not improve the performance of the program a lot, since the wctype function still has to make the string comparisons. It gets really interesting if the is_in_class function is called more than once for the same class name. In this case the variable desc could be computed once and reused for all the calls. Therefore the above form of the function is probably not the final one.
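
One possible shape for such a refinement, offered only as a sketch, is to split the lookup from the test so that the caller can keep the descriptor. The names class_lookup and is_in_desc are assumptions made for this example. The sketch also assumes that the locale selected for the LC_CTYPE category does not change between the lookup and the later tests, since a descriptor is only meaningful in the locale it was obtained in.

```c
#include <wchar.h>
#include <wctype.h>

/* Look up the class once.  A zero result means that the class is
   not known in the current locale.  Hypothetical helper.  */
wctype_t
class_lookup (const char *class_name)
{
  return wctype (class_name);
}

/* Test the single-byte character C against a descriptor obtained
   earlier from class_lookup.  Return 1 if C is in the class and 0
   otherwise, including when DESC is zero.  Hypothetical helper.  */
int
is_in_desc (int c, wctype_t desc)
{
  return desc ? iswctype (btowc (c), desc) != 0 : 0;
}
```

A caller that tests many characters against the same class would call class_lookup once, outside its loop, and pass the saved descriptor to is_in_desc for each character.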



4.5 Mapping of wide characters.

The mapping functions are also generalized by the ISO C standard. Instead of just allowing the two standard mappings, a locale can contain others. Again, the localedef program already supports generating such locale data files.

— Data Type: wctrans_t

This data type is defined as a scalar type which can hold a value representing the locale-dependent character mapping. There is no way to construct such a value apart from using the return value of the wctrans function.

This type is defined in wctype.h.

— Function: wctrans_t wctrans (const char *property)

The wctrans function has to be used to find out whether a named mapping is defined in the current locale selected for the LC_CTYPE category. If the returned value is non-zero, you can use it afterwards in calls to towctrans. If the return value is zero no such mapping is known in the current locale.

Besides locale-specific mappings there are two mappings which are guaranteed to be available in every locale:

"tolower"   "toupper"

These functions are declared in wctype.h.

— Function: wint_t towctrans (wint_t wc, wctrans_t desc)

towctrans maps the input character wc according to the rules of the mapping for which desc is a descriptor, and returns the value it finds. desc must be obtained by a successful call to wctrans.

This function is declared in wctype.h.

For the generally available mappings, the ISO C standard defines convenient shortcuts so that it is not necessary to call wctrans for them.

— Function: wint_t towlower (wint_t wc)

If wc is an upper-case letter, towlower returns the corresponding lower-case letter. If wc is not an upper-case letter, wc is returned unchanged.

towlower can be implemented using

          towctrans (wc, wctrans ("tolower"))

This function is declared in wctype.h.

— Function: wint_t towupper (wint_t wc)

If wc is a lower-case letter, towupper returns the corresponding upper-case letter. Otherwise wc is returned unchanged.

towupper can be implemented using

          towctrans (wc, wctrans ("toupper"))

This function is declared in wctype.h.

The same warnings given in the last section for the use of the wide character classification functions apply here. It is not possible to simply cast a char type value to a wint_t and use it as an argument to towctrans calls.
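
As a sketch of how wctrans and towctrans might be combined, the following function applies a named mapping to every element of a wide character string. The name wcs_map and its return convention are assumptions made for this example. Because the input is already a wide character string, converting each element to wint_t is safe here; the warning above concerns char values.

```c
#include <wchar.h>
#include <wctype.h>

/* Apply the mapping named by PROPERTY to every wide character of WS,
   in place.  Return 0 on success, or -1 if the mapping is not known
   in the current locale.  wcs_map is a made-up name.  */
int
wcs_map (wchar_t *ws, const char *property)
{
  wctrans_t desc = wctrans (property);

  if (desc == (wctrans_t) 0)
    return -1;

  for (; *ws != L'\0'; ++ws)
    *ws = (wchar_t) towctrans ((wint_t) *ws, desc);
  return 0;
}
```

With the two mappings that every locale provides, wcs_map (buf, "toupper") and wcs_map (buf, "tolower") behave like a loop over towupper and towlower respectively.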



5 String and Array Utilities

Operations on strings (or arrays of characters) are an important part of many programs. The GNU C Library provides an extensive set of string utility functions, including functions for copying, concatenating, comparing, and searching strings. Many of these functions can also operate on arbitrary regions of storage; for example, the memcpy function can be used to copy the contents of any kind of array.

It's fairly common for beginning C programmers to “reinvent the wheel” by duplicating this functionality in their own code, but it pays to become familiar with the library functions and to make use of them, since this offers benefits in maintenance, efficiency, and portability.

For instance, you could easily compare one string to another in two lines of C code, but if you use the built-in strcmp function, you're less likely to make a mistake. And, since these library functions are typically highly optimized, your program may run faster too.



5.1 Representation of Strings

This section is a quick summary of string concepts for beginning C programmers. It describes how character strings are represented in C and some common pitfalls. If you are already familiar with this material, you can skip this section.

A string is an array of char objects. But string-valued variables are usually declared to be pointers of type char *. Such variables do not include space for the text of a string; that has to be stored somewhere else: in an array variable, a string constant, or dynamically allocated memory (see Memory Allocation). It's up to you to store the address of the chosen memory space into the pointer variable. Alternatively you can store a null pointer in the pointer variable. The null pointer does not point anywhere, so attempting to reference the string it points to gets an error.

“string” normally refers to multibyte character strings as opposed to wide character strings. Wide character strings are arrays of type wchar_t and, as for multibyte character strings, usually pointers of type wchar_t * are used.

By convention, a null character, '\0', marks the end of a multibyte character string and the null wide character, L'\0', marks the end of a wide character string. For example, in testing to see whether the char * variable p points to a null character marking the end of a string, you can write !*p or *p == '\0'.

A null character is quite different conceptually from a null pointer, although both are represented by the integer 0.

String literals appear in C program source as strings of characters between double-quote characters (‘"’). Wide string literals are written the same way, except that the initial double-quote character is immediately preceded by a capital ‘L’ (ell) character (as in L"foo"). In ISO C, string literals can also be formed by string concatenation: "a" "b" is the same as "ab". For wide character strings one can use either L"a" L"b" or L"a" "b". Modification of string literals is not allowed by the GNU C compiler, because literals are placed in read-only storage.

Character arrays that are declared const cannot be modified either. It's generally good style to declare non-modifiable string pointers to be of type const char *, since this often allows the C compiler to detect accidental modifications as well as providing some amount of documentation about what your program intends to do with the string.

The amount of memory allocated for the character array may extend past the null character that normally marks the end of the string. In this document, the term allocated size is always used to refer to the total amount of memory allocated for the string, while the term length refers to the number of characters up to (but not including) the terminating null character.

A notorious source of program bugs is trying to put more characters in a string than fit in its allocated size. When writing code that extends strings or moves characters into a pre-allocated array, you should be very careful to keep track of the length of the text and make explicit checks for overflowing the array. Many of the library functions do not do this for you! Remember also that you need to allocate an extra byte to hold the null character that marks the end of the string.
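
To make the distinction between length and allocated size concrete, here is a minimal sketch that joins two strings into newly allocated storage. The name concat2 is made up for this example. The allocated size is the sum of the two lengths plus one byte for the terminating null character.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding A followed by B, or NULL
   if memory is exhausted.  The caller frees the result.
   concat2 is a made-up name for this sketch.  */
char *
concat2 (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  /* One extra byte for the terminating null character.  */
  char *result = malloc (alen + blen + 1);

  if (result == NULL)
    return NULL;

  memcpy (result, a, alen);
  /* Copying blen + 1 bytes also copies the null character of B.  */
  memcpy (result + alen, b, blen + 1);
  return result;
}
```

For the arguments "foo" and "bar" the result has length 6 and occupies 7 bytes of allocated storage.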

Originally strings were sequences of bytes where each byte represents a single character. This is still true today if the strings are encoded using a single-byte character encoding. Things are different if the strings are encoded using a multibyte encoding (for more information on encodings see Extended Char Intro). There is no difference in the programming interface for these two kinds of strings; the programmer has to be aware of this and interpret the byte sequences accordingly.

But since there is no separate interface taking care of these differences, the byte-based string functions are sometimes hard to use. Since the count parameters of these functions specify bytes, a call to strncpy could cut a multibyte character in the middle and put an incomplete (and therefore unusable) byte sequence in the target buffer.

To avoid these problems, later versions of the ISO C standard introduce a second set of functions which operate on wide characters (see Extended Char Intro). These functions don't have the problems the single-byte versions have, since every wide character is a legal, interpretable value. This does not mean that cutting wide character strings at arbitrary points is without problems. It normally is for alphabet-based languages (except for non-normalized text), but languages based on syllables still have the problem that more than one wide character is necessary to complete a logical unit. This is a higher-level problem which the C library functions are not designed to solve. But it is at least good that no invalid byte sequences can be created. Also, higher-level functions can operate much more easily on wide characters than on multibyte characters, so the general advice is to use wide characters internally whenever text is more than simply copied.

The remainder of this chapter will discuss the functions for handling wide character strings in parallel with the discussion of the multibyte character strings, since there is almost always an exact equivalent available.



5.2 String and Array Conventions

This chapter describes both functions that work on arbitrary arrays or blocks of memory, and functions that are specific to null-terminated arrays of characters and wide characters.

Functions that operate on arbitrary blocks of memory have names beginning with ‘mem’ and ‘wmem’ (such as memcpy and wmemcpy) and invariably take an argument which specifies the size (in bytes and wide characters respectively) of the block of memory to operate on. The array arguments and return values for these functions have type void * or wchar_t *. As a matter of style, the elements of the arrays used with the ‘mem’ functions are referred to as “bytes”. You can pass any kind of pointer to these functions, and the sizeof operator is useful in computing the value for the size argument. Parameters to the ‘wmem’ functions must be of type wchar_t *. These functions are not really usable with anything but arrays of this type.

In contrast, functions that operate specifically on strings and wide character strings have names beginning with ‘str’ and ‘wcs’ respectively (such as strcpy and wcscpy) and look for a null character to terminate the string instead of requiring an explicit size argument to be passed. (Some of these functions accept a specified maximum length, but they also check for premature termination with a null character.) The array arguments and return values for these functions have type char * and wchar_t * respectively, and the array elements are referred to as “characters” and “wide characters”.

In many cases, there are both ‘mem’ and ‘str’/‘wcs’ versions of a function. The one that is more appropriate to use depends on the exact situation. When your program is manipulating arbitrary arrays or blocks of storage, then you should always use the ‘mem’ functions. On the other hand, when you are manipulating null-terminated strings it is usually more convenient to use the ‘str’/‘wcs’ functions, unless you already know the length of the string in advance. The ‘wmem’ functions should be used for wide character arrays with known size.

Some of the memory and string functions take single characters as arguments. Since a value of type char is automatically promoted into a value of type int when used as a parameter, the functions are declared with int as the type of the parameter in question. For the wide character functions the situation is similar: the parameter type for a single wide character is wint_t and not wchar_t. For many implementations this would not be necessary, since wchar_t is large enough not to be automatically promoted, but since the ISO C standard does not require such a choice of types, the wint_t type is used.



5.3 String Length

You can get the length of a string using the strlen function. This function is declared in the header file string.h.

— Function: size_t strlen (const char *s)

The strlen function returns the length of the null-terminated string s in bytes. (In other words, it returns the offset of the terminating null character within the array.)

For example,

          strlen ("hello, world")
              ⇒ 12

When applied to a character array, the strlen function returns the length of the string stored there, not its allocated size. You can get the allocated size of the character array that holds a string using the sizeof operator:

          char string[32] = "hello, world";
          sizeof (string)
              ⇒ 32
          strlen (string)
              ⇒ 12

But beware, this will not work unless string is the character array itself, not a pointer to it. For example:

          char string[32] = "hello, world";
          char *ptr = string;
          sizeof (string)
              ⇒ 32
          sizeof (ptr)
              ⇒ 4  /* (on a machine with 4 byte pointers) */

This is an easy mistake to make when you are working with functions that take string arguments; those arguments are always pointers, not arrays.

It must also be noted that for multibyte encoded strings the return value does not have to correspond to the number of characters in the string. To get this value, the string can be converted to wide characters and wcslen can be used, or something like the following code can be used:

          /* The input is in string.
             The length is expected in n.  */
          {
            mbstate_t t;
            const char *scopy = string;
            /* In initial state.  */
            memset (&t, '\0', sizeof (t));
            /* Determine number of characters.  */
            n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
          }

This is cumbersome to do, so if the number of characters (as opposed to bytes) is needed often it is better to work with wide characters.

The wide character equivalent is declared in wchar.h.

— Function: size_t wcslen (const wchar_t *ws)

The wcslen function is the wide character equivalent to strlen. The return value is the number of wide characters in the wide character string pointed to by ws (this is also the offset of the terminating null wide character of ws).

Since there are no multi wide character sequences making up one character, the return value is not only the offset in the array, it is also the number of wide characters.

This function was introduced in Amendment 1 to ISO C90.

— Function: size_t strnlen (const char *s, size_t maxlen)

The strnlen function returns the length of the string s in bytes if this length is smaller than maxlen bytes. Otherwise it returns maxlen. Therefore this function is equivalent to (strlen (s) < maxlen ? strlen (s) : maxlen) but it is more efficient and works even if the string s is not null-terminated.

          char string[32] = "hello, world";
          strnlen (string, 32)
              ⇒ 12
          strnlen (string, 5)
              ⇒ 5

This function is a GNU extension and is declared in string.h.
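
One use of strnlen, sketched below under stated assumptions, is to check whether a fixed-size buffer actually contains a terminating null character before treating it as a string. The name is_terminated is made up for this example, and the _GNU_SOURCE definition is an assumption about how the declaration is requested on a given system; it must appear before any header is included.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* Assumed to be needed to declare strnlen.  */
#endif
#include <string.h>

/* Return nonzero if a null character occurs within the first SIZE
   bytes of BUF, that is, if BUF holds a null-terminated string.
   is_terminated is a made-up name for this sketch.  */
int
is_terminated (const char *buf, size_t size)
{
  /* strnlen never reads more than SIZE bytes, so this is safe even
     when BUF has no terminator.  */
  return strnlen (buf, size) < size;
}
```

A result of zero means that calling strlen on the buffer would read past its end.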

— Function: size_t wcsnlen (const wchar_t *ws, size_t maxlen)

wcsnlen is the wide character equivalent to strnlen. The maxlen parameter specifies the maximum number of wide characters.

This function is a GNU extension and is declared in wchar.h.



5.4 Copying and Concatenation

You can use the functions described in this section to copy the contents of strings and arrays, or to append the contents of one string to another. The ‘str’ and ‘mem’ functions are declared in the header file string.h while the ‘wcs’ and ‘wmem’ functions are declared in the file wchar.h.

A helpful way to remember the ordering of the arguments to the functions in this section is that it corresponds to an assignment expression, with the destination array specified to the left of the source array. Most of these functions return the address of the destination array.

Most of these functions do not work properly if the source and destination arrays overlap. For example, if the beginning of the destination array overlaps the end of the source array, the original contents of that part of the source array may get overwritten before it is copied. Even worse, in the case of the string functions, the null character marking the end of the string may be lost, and the copy function might get stuck in a loop trashing all the memory allocated to your program.

All functions that have problems copying between overlapping arrays are explicitly identified in this manual. In addition to functions in this section, there are a few others like sprintf (see Formatted Output Functions) and scanf (see Formatted Input Functions).

— Function: void * memcpy (void *restrict to, const void *restrict from, size_t size)

The memcpy function copies size bytes from the object beginning at from into the object beginning at to. The behavior of this function is undefined if the two arrays to and from overlap; use memmove instead if overlapping is possible.

The value returned by memcpy is the value of to.

Here is an example of how you might use memcpy to copy the contents of an array:

          struct foo *oldarray, *newarray;
          int arraysize;
          ...
          memcpy (newarray, oldarray, arraysize * sizeof (struct foo));

— Function: wchar_t * wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)

The wmemcpy function copies size wide characters from the object beginning at wfrom into the object beginning at wto. The behavior of this function is undefined if the two arrays wto and wfrom overlap; use wmemmove instead if overlapping is possible.

The following is a possible implementation of wmemcpy but there are more optimizations possible.

          wchar_t *
          wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                   size_t size)
          {
            return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
          }

The value returned by wmemcpy is the value of wto.

This function was introduced in Amendment 1 to ISO C90.

— Function: void * mempcpy (void *restrict to, const void *restrict from, size_t size)

The mempcpy function is nearly identical to the memcpy function. It copies size bytes from the object beginning at from into the object pointed to by to. But instead of returning the value of to it returns a pointer to the byte following the last written byte in the object beginning at to. I.e., the value is ((void *) ((char *) to + size)).

This function is useful in situations where a number of objects shall be copied to consecutive memory positions.

          void *
          combine (void *o1, size_t s1, void *o2, size_t s2)
          {
            void *result = malloc (s1 + s2);
            if (result != NULL)
              mempcpy (mempcpy (result, o1, s1), o2, s2);
            return result;
          }

This function is a GNU extension.

— Function: wchar_t * wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)

The wmempcpy function is nearly identical to the wmemcpy function. It copies size wide characters from the object beginning at wfrom into the object pointed to by wto. But instead of returning the value of wto it returns a pointer to the wide character following the last written wide character in the object beginning at wto. I.e., the value is wto + size.

This function is useful in situations where a number of objects shall be copied to consecutive memory positions.

The following is a possible implementation of wmempcpy, but further optimizations are possible.

          wchar_t *
          wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                    size_t size)
          {
            return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
          }

This function is a GNU extension.

— Function: void * memmove (void *to, const void *from, size_t size)

memmove copies the size bytes at from into the size bytes at to, even if those two blocks of space overlap. In the case of overlap, memmove is careful to copy the original values of the bytes in the block at from, including those bytes which also belong to the block at to.

The value returned by memmove is the value of to.
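As an illustration of a case where the blocks overlap, here is a minimal sketch that deletes a run of bytes from the middle of a buffer by sliding the tail down over it. The function name delete_bytes and its parameters are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* Remove COUNT bytes starting at offset POS from the LEN-byte buffer
   BUF by sliding the tail down over them.  The source and destination
   regions overlap whenever the tail is longer than COUNT, so memmove
   is needed here; memcpy would have undefined behavior.  Returns the
   new length of the buffer contents.  */
size_t
delete_bytes (char *buf, size_t len, size_t pos, size_t count)
{
  assert (pos <= len && count <= len - pos);
  memmove (buf + pos, buf + pos + count, len - pos - count);
  return len - count;
}
```

Called on an array holding "hello, world" (terminator included in LEN) with POS 5 and COUNT 2, this leaves "helloworld" in the array.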

— Function: wchar_t * wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)

wmemmove copies the size wide characters at wfrom into the size wide characters at wto, even if those two blocks of space overlap. In the case of overlap, wmemmove is careful to copy the original values of the wide characters in the block at wfrom, including those wide characters which also belong to the block at wto.

The following is a possible implementation of wmemmove, but further optimizations are possible.

          wchar_t *
          wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
          {
            return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
          }

The value returned by wmemmove is the value of wto.

This function was introduced in Amendment 1 to ISO C90.

— Function: void * memccpy (void *restrict to, const void *restrict from, int c, size_t size)

This function copies no more than size bytes from from to to, stopping if a byte matching c is found. The return value is a pointer into to one byte past where c was copied, or a null pointer if no byte matching c appeared in the first size bytes of from.
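The following sketch shows one way the return value might be used: copying the first line of a memory buffer, up to and including the newline. The name copy_first_line is hypothetical, and the sketch assumes the destination has room for at least len bytes.

```c
#define _GNU_SOURCE             /* Make sure memccpy is declared.  */
#include <assert.h>
#include <string.h>

/* Copy the first line of the LEN-byte buffer SRC, through the newline,
   into DST, which must have room for at least LEN bytes.  Returns the
   number of bytes in the line, or 0 if SRC contains no newline (in
   which case all LEN bytes have been copied anyway).  */
size_t
copy_first_line (char *dst, const char *src, size_t len)
{
  assert (dst != NULL && src != NULL);
  char *end = memccpy (dst, src, '\n', len);
  return end != NULL ? (size_t) (end - dst) : 0;
}
```

Because memccpy returns a pointer just past the copied delimiter, the length of the copied piece is simply the difference between that pointer and the start of the destination.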

— Function: void * memset (void *block, int c, size_t size)

This function copies the value of c (converted to an unsigned char) into each of the first size bytes of the object beginning at block. It returns the value of block.

— Function: wchar_t * wmemset (wchar_t *block, wchar_t wc, size_t size)

This function copies the value of wc into each of the first size wide characters of the object beginning at block. It returns the value of block.

— Function: char * strcpy (char *restrict to, const char *restrict from)

This copies characters from the string from (up to and including the terminating null character) into the string to. Like memcpy, this function has undefined results if the strings overlap. The return value is the value of to.

— Function: wchar_t * wcscpy (wchar_t *restrict wto, const wchar_t *restrict wfrom)

This copies wide characters from the string wfrom (up to and including the terminating null wide character) into the string wto. Like wmemcpy, this function has undefined results if the strings overlap. The return value is the value of wto.

— Function: char * strncpy (char *restrict to, const char *restrict from, size_t size)

This function is similar to strcpy but always copies exactly size characters into to.

If the length of from is more than size, then strncpy copies just the first size characters. Note that in this case there is no null terminator written into to.

If the length of from is less than size, then strncpy copies all of from, followed by enough null characters to add up to size characters in all. This behavior is rarely useful, but it is specified by the ISO C standard.

The behavior of strncpy is undefined if the strings overlap.

Using strncpy as opposed to strcpy is a way to avoid bugs relating to writing past the end of the allocated space for to. However, it can also make your program much slower in one common case: copying a string which is probably small into a potentially large buffer. In this case, size may be large, and when it is, strncpy will waste a considerable amount of time copying null characters.
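Since strncpy writes no null terminator when from is too long, callers usually have to add one themselves. The sketch below shows one common way of doing so; the name copy_truncated is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy FROM into the SIZE-byte array TO, truncating if necessary, and
   make sure the result is always null-terminated.  Returns TO.  */
char *
copy_truncated (char *to, const char *from, size_t size)
{
  assert (size > 0);
  /* Leave the last byte alone so that it can hold the terminator.  */
  strncpy (to, from, size - 1);
  to[size - 1] = '\0';
  return to;
}
```

Note that the padding behavior described above still applies: when from is short, the rest of the array is filled with null characters.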

— Function: wchar_t * wcsncpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)

This function is similar to wcscpy but always copies exactly size wide characters into wto.

If the length of wfrom is more than size, then wcsncpy copies just the first size wide characters. Note that in this case there is no null terminator written into wto.

If the length of wfrom is less than size, then wcsncpy copies all of wfrom, followed by enough null wide characters to add up to size wide characters in all. This behavior is rarely useful, but it is specified by the ISO C standard.

The behavior of wcsncpy is undefined if the strings overlap.

Using wcsncpy as opposed to wcscpy is a way to avoid bugs relating to writing past the end of the allocated space for wto. However, it can also make your program much slower in one common case: copying a string which is probably small into a potentially large buffer. In this case, size may be large, and when it is, wcsncpy will waste a considerable amount of time copying null wide characters.

— Function: char * strdup (const char *s)

This function copies the null-terminated string s into a newly allocated string. The string is allocated using malloc; see Unconstrained Allocation. If malloc cannot allocate space for the new string, strdup returns a null pointer. Otherwise it returns a pointer to the new string.

— Function: wchar_t * wcsdup (const wchar_t *ws)

This function copies the null-terminated wide character string ws into a newly allocated string. The string is allocated using malloc; see Unconstrained Allocation. If malloc cannot allocate space for the new string, wcsdup returns a null pointer. Otherwise it returns a pointer to the new wide character string.

This function is a GNU extension.

— Function: char * strndup (const char *s, size_t size)

This function is similar to strdup but always copies at most size characters into the newly allocated string.

If the length of s is more than size, then strndup copies just the first size characters and adds a closing null terminator. Otherwise all characters are copied and the string is terminated.

This function is different from strncpy in that it always terminates the destination string.

strndup is a GNU extension.
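A typical use of strndup is to extract a leading piece of a longer string. The following sketch, with the hypothetical name first_component, returns a copy of everything before the first delimiter.

```c
#define _GNU_SOURCE             /* Make sure strndup is declared.  */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the part of PATH that precedes the
   first occurrence of DELIM, or of all of PATH if DELIM does not
   occur.  Returns a null pointer if the allocation fails.  The caller
   must free the result.  */
char *
first_component (const char *path, char delim)
{
  assert (path != NULL);
  const char *end = strchr (path, delim);
  size_t len = end != NULL ? (size_t) (end - path) : strlen (path);
  return strndup (path, len);
}
```

For example, first_component ("/usr/bin:/bin", ':') yields a freshly allocated "/usr/bin".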

— Function: char * stpcpy (char *restrict to, const char *restrict from)

This function is like strcpy, except that it returns a pointer to the end of the string to (that is, the address of the terminating null character to + strlen (from)) rather than the beginning.

For example, this program uses stpcpy to concatenate ‘foo’ and ‘bar’ to produce ‘foobar’, which it then prints.

          #include <string.h>
          #include <stdio.h>
          
          int
          main (void)
          {
            char buffer[10];
            char *to = buffer;
            to = stpcpy (to, "foo");
            to = stpcpy (to, "bar");
            puts (buffer);
            return 0;
          }

This function is not part of the ISO or POSIX standards, and is not customary on Unix systems, but we did not invent it either. Perhaps it comes from MS-DOG.

Its behavior is undefined if the strings overlap. The function is declared in string.h.

— Function: wchar_t * wcpcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom)

This function is like wcscpy, except that it returns a pointer to the end of the string wto (that is, the address of the terminating null wide character wto + wcslen (wfrom)) rather than the beginning.

This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.

The behavior of wcpcpy is undefined if the strings overlap.

wcpcpy is a GNU extension and is declared in wchar.h.

— Function: char * stpncpy (char *restrict to, const char *restrict from, size_t size)

This function is similar to stpcpy but always copies exactly size characters into to.

If the length of from is more than size, then stpncpy copies just the first size characters and returns a pointer to the character directly following the one which was copied last. Note that in this case there is no null terminator written into to.

If the length of from is less than size, then stpncpy copies all of from, followed by enough null characters to add up to size characters in all. This behavior is rarely useful, but it is implemented to be useful in contexts where this behavior of strncpy is used. stpncpy returns a pointer to the first written null character.

This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.

Its behavior is undefined if the strings overlap. The function is declared in string.h.

— Function: wchar_t * wcpncpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)

This function is similar to wcpcpy but always copies exactly size wide characters into wto.

If the length of wfrom is more than size, then wcpncpy copies just the first size wide characters and returns a pointer to the wide character directly following the last non-null wide character which was copied. Note that in this case there is no null terminator written into wto.

If the length of wfrom is less than size, then wcpncpy copies all of wfrom, followed by enough null wide characters to add up to size wide characters in all. This behavior is rarely useful, but it is implemented to be useful in contexts where this behavior of wcsncpy is used. wcpncpy returns a pointer to the first written null wide character.

This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.

Its behavior is undefined if the strings overlap.

wcpncpy is a GNU extension and is declared in wchar.h.

— Macro: char * strdupa (const char *s)

This macro is similar to strdup but allocates the new string using alloca instead of malloc (see Variable Size Automatic). This means of course the returned string has the same limitations as any block of memory allocated using alloca.

For obvious reasons strdupa is implemented only as a macro; you cannot get the address of this function. Despite this limitation it is a useful function. The following code shows a situation where using malloc would be a lot more expensive.

          #include <paths.h>
          #include <string.h>
          #include <stdio.h>
          
          const char path[] = _PATH_STDPATH;
          
          int
          main (void)
          {
            char *wr_path = strdupa (path);
            char *cp = strtok (wr_path, ":");
          
            while (cp != NULL)
              {
                puts (cp);
                cp = strtok (NULL, ":");
              }
            return 0;
          }

Please note that calling strtok using path directly is invalid. It is also not allowed to call strdupa in the argument list of strtok, since strdupa uses alloca (see Variable Size Automatic), which can interfere with the parameter passing.

This function is only available if GNU CC is used.

— Macro: char * strndupa (const char *s, size_t size)

This function is similar to strndup but like strdupa it allocates the new string using alloca; see Variable Size Automatic. The same advantages and limitations of strdupa are valid for strndupa, too.

This function is implemented only as a macro, just like strdupa. Just like strdupa, this macro also must not be used inside the parameter list in a function call.

strndupa is only available if GNU CC is used.

— Function: char * strcat (char *restrict to, const char *restrict from)

The strcat function is similar to strcpy, except that the characters from from are concatenated or appended to the end of to, instead of overwriting it. That is, the first character from from overwrites the null character marking the end of to.

An equivalent definition for strcat would be:

          char *
          strcat (char *restrict to, const char *restrict from)
          {
            strcpy (to + strlen (to), from);
            return to;
          }

This function has undefined results if the strings overlap.

— Function: wchar_t * wcscat (wchar_t *restrict wto, const wchar_t *restrict wfrom)

The wcscat function is similar to wcscpy, except that the characters from wfrom are concatenated or appended to the end of wto, instead of overwriting it. That is, the first character from wfrom overwrites the null character marking the end of wto.

An equivalent definition for wcscat would be:

          wchar_t *
          wcscat (wchar_t *wto, const wchar_t *wfrom)
          {
            wcscpy (wto + wcslen (wto), wfrom);
            return wto;
          }

This function has undefined results if the strings overlap.

Programmers using the strcat or wcscat function (or the following strncat or wcsncat functions for that matter) can easily be recognized as lazy and reckless. In almost all situations the lengths of the participating strings are known (they had better be, since how else can one ensure that the allocated size of the buffer is sufficient?). Or at least, one could know them if one keeps track of the results of the various function calls. But then it is very inefficient to use strcat/wcscat. A lot of time is wasted finding the end of the destination string so that the actual copying can start. This is a common example:

     /* This function concatenates arbitrarily many strings.  The last
        parameter must be NULL.  */
     char *
     concat (const char *str, ...)
     {
       va_list ap, ap2;
       size_t total = 1;
       const char *s;
       char *result;
     
       va_start (ap, str);
       /* Keep a second copy of the argument list for the copying loop.  */
       va_copy (ap2, ap);
     
       /* Determine how much space we need.  */
       for (s = str; s != NULL; s = va_arg (ap, const char *))
         total += strlen (s);
     
       va_end (ap);
     
       result = (char *) malloc (total);
       if (result != NULL)
         {
           result[0] = '\0';
     
           /* Copy the strings.  */
           for (s = str; s != NULL; s = va_arg (ap2, const char *))
             strcat (result, s);
         }
     
       va_end (ap2);
     
       return result;
     }

This looks quite simple, especially the second loop where the strings are actually copied. But these innocent lines hide a major performance penalty. Just imagine that ten strings of 100 bytes each have to be concatenated. For the second string we search the already stored 100 bytes for the end of the string so that we can append the next string. For all strings in total, the comparisons necessary to find the end of the intermediate results add up to about 4500 (100 + 200 + ... + 900)! If we combine the copying with the search for the allocation we can write this function more efficiently:

     char *
     concat (const char *str, ...)
     {
       va_list ap;
       size_t allocated = 100;
       char *result = (char *) malloc (allocated);
     
       if (result != NULL)
         {
           char *newp;
           char *wp;
           const char *s;
     
           va_start (ap, str);
     
           wp = result;
           for (s = str; s != NULL; s = va_arg (ap, const char *))
             {
               size_t len = strlen (s);
     
               /* Resize the allocated memory if necessary.  */
               if (wp + len + 1 > result + allocated)
                 {
                   allocated = (allocated + len) * 2;
                   newp = (char *) realloc (result, allocated);
                   if (newp == NULL)
                     {
                       free (result);
                       va_end (ap);
                       return NULL;
                     }
                   wp = newp + (wp - result);
                   result = newp;
                 }
     
               wp = mempcpy (wp, s, len);
             }
     
           /* Terminate the result string.  */
           *wp++ = '\0';
     
           /* Resize memory to the optimal size.  */
           newp = realloc (result, wp - result);
           if (newp != NULL)
             result = newp;
     
           va_end (ap);
         }
     
       return result;
     }

With a bit more knowledge about the input strings one could fine-tune the memory allocation. The difference we are pointing to here is that we don't use strcat anymore. We always keep track of the length of the current intermediate result so we can save ourselves the search for the end of the string and use mempcpy. Please note that we also don't use stpcpy, which might seem more natural since we are handling strings. But this is not necessary since we already know the length of the string and therefore can use the faster memory copying function. The example would work for wide characters the same way.

Whenever a programmer feels the need to use strcat she or he should think twice and look through the program to see whether the code cannot be rewritten to take advantage of already calculated results. Again: it is almost always unnecessary to use strcat.

— Function: char * strncat (char *restrict to, const char *restrict from, size_t size)

This function is like strcat except that not more than size characters from from are appended to the end of to. A single null character is also always appended to to, so the total allocated size of to must be at least size + 1 bytes longer than its initial length.

The strncat function could be implemented like this:

          char *
          strncat (char *to, const char *from, size_t size)
          {
            to[strlen (to) + size] = '\0';
            strncpy (to + strlen (to), from, size);
            return to;
          }

The behavior of strncat is undefined if the strings overlap.

— Function: wchar_t * wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)

This function is like wcscat except that not more than size wide characters from wfrom are appended to the end of wto. A single null wide character is also always appended to wto, so the total allocated size of wto must be at least size + 1 wide characters longer than its initial length.

The wcsncat function could be implemented like this:

          wchar_t *
          wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
                   size_t size)
          {
            wto[wcslen (wto) + size] = L'\0';
            wcsncpy (wto + wcslen (wto), wfrom, size);
            return wto;
          }

The behavior of wcsncat is undefined if the strings overlap.

Here is an example showing the use of strncpy and strncat (the wide character version is equivalent). Notice how, in the call to strncat, the size parameter is computed to avoid overflowing the character array buffer.

     #include <string.h>
     #include <stdio.h>
     
     #define SIZE 10
     
     static char buffer[SIZE];
     
     int
     main (void)
     {
       strncpy (buffer, "hello", SIZE);
       puts (buffer);
       strncat (buffer, ", world", SIZE - strlen (buffer) - 1);
       puts (buffer);
     }

The output produced by this program looks like:

     hello
     hello, wo

— Function: void bcopy (const void *from, void *to, size_t size)

This is a partially obsolete alternative for memmove, derived from BSD. Note that it is not quite equivalent to memmove, because the arguments are not in the same order and there is no return value.

— Function: void bzero (void *block, size_t size)

This is a partially obsolete alternative for memset, derived from BSD. Note that it is not as general as memset, because the only value it can store is zero.


5.5 String/Array Comparison

You can use the functions in this section to perform comparisons on the contents of strings and arrays. As well as checking for equality, these functions can also be used as the ordering functions for sorting operations. See Searching and Sorting, for an example of this.

Unlike most comparison operations in C, the string comparison functions return a nonzero value if the strings are not equivalent rather than if they are. The sign of the value indicates the relative ordering of the first characters in the strings that are not equivalent: a negative value indicates that the first string is “less” than the second, while a positive value indicates that the first string is “greater”.

The most common use of these functions is to check only for equality. This is canonically done with an expression like ‘! strcmp (s1, s2)’.

All of these functions are declared in the header file string.h.
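Because a zero result means "equal", the return value is easy to misread when used directly as a truth value. The following sketch wraps the two common uses in small helpers; the names streq and string_order are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if S1 and S2 have the same contents.  */
bool
streq (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  return strcmp (s1, s2) == 0;
}

/* Return -1, 0 or 1 according to whether S1 sorts before, the same as
   or after S2.  Only the sign of the strcmp result is meaningful, so
   it is normalized here.  */
int
string_order (const char *s1, const char *s2)
{
  int r = strcmp (s1, s2);
  return (r > 0) - (r < 0);
}
```

Normalizing the result is only needed when the caller wants one of three fixed values; sorting functions such as qsort accept any negative, zero or positive value.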

— Function: int memcmp (const void *a1, const void *a2, size_t size)

The function memcmp compares the size bytes of memory beginning at a1 against the size bytes of memory beginning at a2. The value returned has the same sign as the difference between the first differing pair of bytes (interpreted as unsigned char objects, then promoted to int).

If the contents of the two blocks are equal, memcmp returns 0.

— Function: int wmemcmp (const wchar_t *a1, const wchar_t *a2, size_t size)

The function wmemcmp compares the size wide characters beginning at a1 against the size wide characters beginning at a2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in a1 is smaller or larger than the corresponding character in a2.

If the contents of the two blocks are equal, wmemcmp returns 0.

On arbitrary arrays, the memcmp function is mostly useful for testing equality. It usually isn't meaningful to do byte-wise ordering comparisons on arrays of things other than bytes. For example, a byte-wise comparison on the bytes that make up floating-point numbers isn't likely to tell you anything about the relationship between the values of the floating-point numbers.

wmemcmp is really only useful to compare arrays of type wchar_t since the function looks at sizeof (wchar_t) bytes at a time and this number of bytes is system dependent.

You should also be careful about using memcmp to compare objects that can contain “holes”, such as the padding inserted into structure objects to enforce alignment requirements, extra space at the end of unions, and extra characters at the ends of strings whose length is less than their allocated size. The contents of these “holes” are indeterminate and may cause strange behavior when performing byte-wise comparisons. For more predictable results, perform an explicit component-wise comparison.

For example, given a structure type definition like:

     struct foo
       {
         unsigned char tag;
         union
           {
             double f;
             long i;
             char *p;
           } value;
       };

you are better off writing a specialized comparison function to compare struct foo objects instead of comparing them with memcmp.
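Such a comparison function might look like the following sketch. The structure definition does not say how tag selects the active union member, so the FOO_DOUBLE, FOO_LONG and FOO_STRING values and the name foo_equal are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Hypothetical tag values naming the active member of VALUE.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

/* Return true if A and B carry the same tag and the same value.  Only
   the active union member is examined, so padding bytes and unused
   parts of the union never influence the result.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
{
  assert (a != NULL && b != NULL);
  if (a->tag != b->tag)
    return false;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return false;
    }
}
```

Comparing the string member with strcmp rather than by pointer value is a design choice; whether two distinct pointers to equal text should count as equal depends on the application.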

— Function: int strcmp (const char *s1, const char *s2)

The strcmp function compares the string s1 against s2, returning a value that has the same sign as the difference between the first differing pair of characters (interpreted as unsigned char objects, then promoted to int).

If the two strings are equal, strcmp returns 0.

A consequence of the ordering used by strcmp is that if s1 is an initial substring of s2, then s1 is considered to be “less than” s2.

strcmp does not take sorting conventions of the language the strings are written in into account. To get that one has to use strcoll.

— Function: int wcscmp (const wchar_t *ws1, const wchar_t *ws2)

The wcscmp function compares the wide character string ws1 against ws2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in ws1 is smaller or larger than the corresponding character in ws2.

If the two strings are equal, wcscmp returns 0.

A consequence of the ordering used by wcscmp is that if ws1 is an initial substring of ws2, then ws1 is considered to be “less than” ws2.

wcscmp does not take sorting conventions of the language the strings are written in into account. To get that one has to use wcscoll.

— Function: int strcasecmp (const char *s1, const char *s2)

This function is like strcmp, except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.

strcasecmp is derived from BSD.
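As a small illustration, the following sketch looks up a name in a table of keywords without regard to case. The function name find_keyword is hypothetical. The sketch includes strings.h, which is where POSIX specifies the declaration of strcasecmp.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Return the index of the entry in the COUNT-element table NAMES that
   matches NAME when case is ignored, or -1 if there is none.  */
int
find_keyword (const char *name, const char *const *names, size_t count)
{
  assert (name != NULL);
  for (size_t i = 0; i < count; i++)
    if (strcasecmp (name, names[i]) == 0)
      return (int) i;
  return -1;
}
```

In the "C" locale this matches only the ASCII letters case-insensitively, as described above.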

— Function: int wcscasecmp (const wchar_t *ws1, const wchar_t *ws2)

This function is like wcscmp, except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.

wcscasecmp is a GNU extension.

— Function: int strncmp (const char *s1, const char *s2, size_t size)

This function is similar to strcmp, except that no more than size characters are compared. In other words, if the two strings are the same in their first size characters, the return value is zero.
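One frequent use of strncmp is testing whether a string begins with a given prefix. Here is a minimal sketch; the name has_prefix is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the string S starts with the string PREFIX.  */
bool
has_prefix (const char *s, const char *prefix)
{
  assert (s != NULL && prefix != NULL);
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

If S is shorter than PREFIX, the comparison stops at the null character that terminates S, which differs from the corresponding character of PREFIX, so the function correctly returns false.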

— Function: int wcsncmp (const wchar_t *ws1, const wchar_t *ws2, size_t size)

This function is similar to wcscmp, except that no more than size wide characters are compared. In other words, if the two strings are the same in their first size wide characters, the return value is zero.

— Function: int strncasecmp (const char *s1, const char *s2, size_t n)

This function is like strncmp, except that differences in case are ignored. Like strcasecmp, it is locale dependent how uppercase and lowercase characters are related.

strncasecmp is a GNU extension.

— Function: int wcsncasecmp (const wchar_t *ws1, const wchar_t *ws2, size_t n)

This function is like wcsncmp, except that differences in case are ignored. Like wcscasecmp, it is locale dependent how uppercase and lowercase characters are related.

wcsncasecmp is a GNU extension.

Here are some examples showing the use of strcmp and strncmp (equivalent examples can be constructed for the wide character functions). These examples assume the use of the ASCII character set. (If some other character set, say EBCDIC, is used instead, then the glyphs are associated with different numeric codes, and the return values and ordering may differ.)

     strcmp ("hello", "hello")
         ⇒ 0    /* These two strings are the same. */
     strcmp ("hello", "Hello")
         ⇒ 32   /* Comparisons are case-sensitive. */
     strcmp ("hello", "world")
         ⇒ -15  /* The character 'h' comes before 'w'. */
     strcmp ("hello", "hello, world")
         ⇒ -44  /* Comparing a null character against a comma. */
     strncmp ("hello", "hello, world", 5)
         ⇒ 0    /* The initial 5 characters are the same. */
     strncmp ("hello, world", "hello, stupid world!!!", 5)
         ⇒ 0    /* The initial 5 characters are the same. */

— Function: int strverscmp (const char *s1, const char *s2)

The strverscmp function compares the string s1 against s2, considering them as holding indices/version numbers. The return value follows the same conventions as found in the strcmp function. In fact, if s1 and s2 contain no digits, strverscmp behaves like strcmp.

Basically, we compare strings normally (character by character), until we find a digit in each string; then we enter a special comparison mode, where each sequence of digits is taken as a whole. If we reach the end of these two parts without noticing a difference, we return to the standard comparison mode. There are two types of numeric parts: "integral" and "fractional" (those begin with a '0'). The types of the numeric parts affect the way we sort them:

  • integral/integral: we compare values as you would expect.
  • fractional/integral: the fractional part is less than the integral one. Again, no surprise.
  • fractional/fractional: things become a bit more complex. If the common prefix contains only leading zeroes, the longest part is less than the other one; else the comparison behaves normally.

          strverscmp ("no digit", "no digit")
              ⇒ 0    /* same behavior as strcmp. */
          strverscmp ("item#99", "item#100")
              ⇒ <0   /* same prefix, but 99 < 100. */
          strverscmp ("alpha1", "alpha001")
              ⇒ >0   /* fractional part inferior to integral one. */
          strverscmp ("part1_f012", "part1_f01")
              ⇒ >0   /* two fractional parts. */
          strverscmp ("foo.009", "foo.0")
              ⇒ <0   /* idem, but with leading zeroes only. */

This function is especially useful when dealing with filename sorting, because filenames frequently hold indices/version numbers.

strverscmp is a GNU extension.
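To illustrate the filename-sorting use, here is a sketch that sorts an array of names with qsort, using strverscmp as the ordering. The names compare_versions and sort_by_version are hypothetical.

```c
#define _GNU_SOURCE             /* strverscmp is a GNU extension.  */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: each argument points to an array
   element, which is itself a pointer to a string.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;
  return strverscmp (*p1, *p2);
}

/* Sort the COUNT strings in NAMES so that embedded numbers are
   ordered by value rather than character by character.  */
void
sort_by_version (const char **names, size_t count)
{
  assert (names != NULL || count == 0);
  qsort (names, count, sizeof *names, compare_versions);
}
```

With plain strcmp the name "log10.txt" would sort before "log2.txt"; with strverscmp it sorts after it, because 2 is less than 10.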

— Function: int bcmp (const void *a1, const void *a2, size_t size)

This is an obsolete alias for memcmp, derived from BSD.


5.6 Collation Functions

In some locales, the conventions for lexicographic ordering differ from the strict numeric ordering of character codes. For example, in Spanish most glyphs with diacritical marks such as accents are not considered distinct letters for the purposes of collation. On the other hand, the two-character sequence ‘ll’ is treated as a single letter that is collated immediately after ‘l’.

You can use the functions strcoll and strxfrm (declared in the header file string.h) and wcscoll and wcsxfrm (declared in the header file wchar.h) to compare strings using a collation ordering appropriate for the current locale. The locale used by these functions in particular can be specified by setting the locale for the LC_COLLATE category; see Locales. In the standard C locale, the collation sequence for strcoll is the same as that for strcmp. Similarly, wcscoll and wcscmp are the same in this situation.

Effectively, the way these functions work is by applying a mapping to transform the characters in a string to a byte sequence that represents the string's position in the collating sequence of the current locale. Comparing two such byte sequences in a simple fashion is equivalent to comparing the strings with the locale's collating sequence.

The functions strcoll and wcscoll perform this translation implicitly, in order to do one comparison. By contrast, strxfrm and wcsxfrm perform the mapping explicitly. If you are making multiple comparisons using the same string or set of strings, it is likely to be more efficient to use strxfrm or wcsxfrm to transform all the strings just once, and subsequently compare the transformed strings with strcmp or wcscmp.

— Function: int strcoll (const char *s1, const char *s2)

The strcoll function is similar to strcmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).

— Function: int wcscoll (const wchar_t *ws1, const wchar_t *ws2)

The wcscoll function is similar to wcscmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).

Here is an example of sorting an array of strings, using strcoll to compare them. The actual sort algorithm is not written here; it comes from qsort (see Array Sort Function). The job of the code shown here is to say how to compare the strings while sorting them. (Later on in this section, we will show a way to do this more efficiently using strxfrm.)

     /* This is the comparison function used with qsort. */
     
     int
     compare_elements (const void *v1, const void *v2)
     {
       char * const *p1 = v1;
       char * const *p2 = v2;
     
       return strcoll (*p1, *p2);
     }
     
     /* This is the entry point---the function to sort
        strings using the locale's collating sequence. */
     
     void
     sort_strings (char **array, int nstrings)
     {
       /* Sort array by comparing the strings. */
       qsort (array, nstrings,
              sizeof (char *), compare_elements);
     }

— Function: size_t strxfrm (char *restrict to, const char *restrict from, size_t size)

The function strxfrm transforms the string from using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array to. Up to size characters (including a terminating null character) are stored.

The behavior is undefined if the strings to and from overlap; see Copying and Concatenation.

The return value is the length of the entire transformed string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed string did not entirely fit in the array to. In this case, only as much of the string as actually fits was stored. To get the whole transformed string, call strxfrm again with a bigger output array.

The transformed string may be longer than the original string, and it may also be shorter.

If size is zero, no characters are stored in to. In this case, strxfrm simply returns the number of characters that would be the length of the transformed string. This is useful for determining what size the allocated array should be. It does not matter what to is if size is zero; to may even be a null pointer.
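The zero-size call lends itself to a two-pass idiom: ask strxfrm for the required length first, then allocate exactly that much and transform. The following is only a minimal sketch of that idea; the helper name xfrm_dup is hypothetical and not part of the library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of the
   collation-transformed form of S, or NULL if allocation fails.
   The caller frees the result.  */
char *
xfrm_dup (const char *s)
{
  /* With a size of zero nothing is stored, so a null pointer is an
     acceptable destination; only the required length is returned.  */
  size_t needed = strxfrm (NULL, s, 0);
  char *t = malloc (needed + 1);   /* +1 for the terminating null.  */

  if (t != NULL)
    strxfrm (t, s, needed + 1);
  return t;
}
```

Two strings transformed this way can then be compared with strcmp, which should order them the same way strcoll orders the originals.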

— Function: size_t wcsxfrm (wchar_t *restrict wto, const wchar_t *wfrom, size_t size)

The function wcsxfrm transforms wide character string wfrom using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array wto. Up to size wide characters (including a terminating null character) are stored.

The behavior is undefined if the strings wto and wfrom overlap; see Copying and Concatenation.

The return value is the length of the entire transformed wide character string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed wide character string did not entirely fit in the array wto. In this case, only as much of the wide character string as actually fits was stored. To get the whole transformed wide character string, call wcsxfrm again with a bigger output array.

The transformed wide character string may be longer than the original wide character string, and it may also be shorter.

If size is zero, no wide characters are stored in wto. In this case, wcsxfrm simply returns the number of wide characters that would be the length of the transformed wide character string. This is useful for determining what size the allocated array should be (remember to multiply by sizeof (wchar_t)). It does not matter what wto is if size is zero; wto may even be a null pointer.

Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the previous example, but much faster, because it has to transform each string only once, no matter how many times it is compared with other strings. Even the time needed to allocate and free storage is much less than the time we save, when there are many strings.

     struct sorter { char *input; char *transformed; };
     
     /* This is the comparison function used with qsort
        to sort an array of struct sorter. */
     
     int
     compare_elements (const void *v1, const void *v2)
     {
       const struct sorter *p1 = v1;
       const struct sorter *p2 = v2;
     
       return strcmp (p1->transformed, p2->transformed);
     }
     
     /* This is the entry point---the function to sort
        strings using the locale's collating sequence. */
     
     void
     sort_strings_fast (char **array, int nstrings)
     {
       struct sorter temp_array[nstrings];
       int i;
     
       /* Set up temp_array.  Each element contains
          one input string and its transformed string. */
       for (i = 0; i < nstrings; i++)
         {
           size_t length = strlen (array[i]) * 2;
           char *transformed;
           size_t transformed_length;
     
           temp_array[i].input = array[i];
     
           /* First try a buffer perhaps big enough.  */
           transformed = (char *) xmalloc (length);
     
           /* Transform array[i].  */
           transformed_length = strxfrm (transformed, array[i], length);
     
           /* If the buffer was not large enough, resize it
              and try again.  */
           if (transformed_length >= length)
             {
               /* Allocate the needed space. +1 for terminating
                  NUL character.  */
               transformed = (char *) xrealloc (transformed,
                                                transformed_length + 1);
     
               /* The return value is not interesting because we know
                  how long the transformed string is.  */
               (void) strxfrm (transformed, array[i],
                               transformed_length + 1);
             }
     
           temp_array[i].transformed = transformed;
         }
     
       /* Sort temp_array by comparing transformed strings. */
       qsort (temp_array, nstrings,
              sizeof (struct sorter), compare_elements);
     
       /* Put the elements back in the permanent array
          in their sorted order. */
       for (i = 0; i < nstrings; i++)
         array[i] = temp_array[i].input;
     
       /* Free the strings we allocated. */
       for (i = 0; i < nstrings; i++)
         free (temp_array[i].transformed);
     }

The interesting part of this code for the wide character version would look like this:

     void
     sort_strings_fast (wchar_t **array, int nstrings)
     {
       ...
           /* Transform array[i].  */
           transformed_length = wcsxfrm (transformed, array[i], length);
     
           /* If the buffer was not large enough, resize it
              and try again.  */
           if (transformed_length >= length)
             {
               /* Allocate the needed space. +1 for terminating
                  NUL character.  */
               transformed = (wchar_t *) xrealloc (transformed,
                                                   (transformed_length + 1)
                                                   * sizeof (wchar_t));
     
               /* The return value is not interesting because we know
                  how long the transformed string is.  */
               (void) wcsxfrm (transformed, array[i],
                               transformed_length + 1);
             }
       ...

Note the additional multiplication by sizeof (wchar_t) in the xrealloc call.

Compatibility Note: The string collation functions are a new feature of ISO C90. Older C dialects have no equivalent feature. The wide character versions were introduced in Amendment 1 to ISO C90.



5.7 Search Functions

This section describes library functions which perform various kinds of searching operations on strings and arrays. These functions are declared in the header file string.h.

— Function: void * memchr (const void *block, int c, size_t size)

This function finds the first occurrence of the byte c (converted to an unsigned char) in the initial size bytes of the object beginning at block. The return value is a pointer to the located byte, or a null pointer if no match was found.
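Because memchr takes an explicit size, it is suited to buffers that are not null-terminated. As a small sketch (the helper name first_line_length is hypothetical), the returned pointer can be turned into an offset by subtracting the start of the block:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes in BUF that precede
   the first newline, or SIZE if the buffer contains no newline.  */
size_t
first_line_length (const char *buf, size_t size)
{
  const char *nl = memchr (buf, '\n', size);

  return nl != NULL ? (size_t) (nl - buf) : size;
}
```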

— Function: wchar_t * wmemchr (const wchar_t *block, wchar_t wc, size_t size)

This function finds the first occurrence of the wide character wc in the initial size wide characters of the object beginning at block. The return value is a pointer to the located wide character, or a null pointer if no match was found.

— Function: void * rawmemchr (const void *block, int c)

Often the memchr function is used with the knowledge that the byte c is available in the memory block specified by the parameters. But this means that the size parameter is not really needed and that the tests performed with it at runtime (to check whether the end of the block is reached) are not needed.

The rawmemchr function exists for just this situation, which is surprisingly frequent. The interface is similar to memchr except that the size parameter is missing. The function will look beyond the end of the block pointed to by block in case the programmer made an error in assuming that the byte c is present in the block. In this case the result is unspecified. Otherwise the return value is a pointer to the located byte.

This function is of special interest when looking for the end of a string. Since all strings are terminated by a null byte, a call like

             rawmemchr (str, '\0')

will never go beyond the end of the string.

This function is a GNU extension.

— Function: void * memrchr (const void *block, int c, size_t size)

The function memrchr is like memchr, except that it searches backwards from the end of the block defined by block and size (instead of forwards from the front).

This function is a GNU extension.

— Function: char * strchr (const char *string, int c)

The strchr function finds the first occurrence of the character c (converted to a char) in the null-terminated string beginning at string. The return value is a pointer to the located character, or a null pointer if no match was found.

For example,

          strchr ("hello, world", 'l')
              ⇒ "llo, world"
          strchr ("hello, world", '?')
              ⇒ NULL

The terminating null character is considered to be part of the string, so you can use this function to get a pointer to the end of a string by specifying a null character as the value of the c argument.

When strchr returns a null pointer, it does not let you know the position of the terminating null character it has found. If you need that information, it is better (but less portable) to use strchrnul than to search for it a second time.
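A typical use of the pointer returned by strchr is to split a string at the located character. The sketch below assumes a writable string of the form NAME=VALUE; the helper name split_assignment is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a writable "NAME=VALUE" string at the
   first '=' by overwriting it with a null byte.  Returns a pointer to
   the value, or NULL if ENTRY contains no '='.  */
char *
split_assignment (char *entry)
{
  char *eq = strchr (entry, '=');

  if (eq == NULL)
    return NULL;
  *eq = '\0';
  return eq + 1;
}
```

After a successful call, entry itself holds only the name, and the return value points at the value part of the same buffer.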

— Function: wchar_t * wcschr (const wchar_t *wstring, wchar_t wc)

The wcschr function finds the first occurrence of the wide character wc in the null-terminated wide character string beginning at wstring. The return value is a pointer to the located wide character, or a null pointer if no match was found.

The terminating null character is considered to be part of the wide character string, so you can use this function to get a pointer to the end of a wide character string by specifying a null wide character as the value of the wc argument. It would be better (but less portable) to use wcschrnul in this case, though.

— Function: char * strchrnul (const char *string, int c)

strchrnul is the same as strchr except that if it does not find the character, it returns a pointer to string's terminating null character rather than a null pointer.

This function is a GNU extension.

— Function: wchar_t * wcschrnul (const wchar_t *wstring, wchar_t wc)

wcschrnul is the same as wcschr except that if it does not find the wide character, it returns a pointer to the wide character string's terminating null wide character rather than a null pointer.

This function is a GNU extension.

One useful, but unusual, use of the strchr function is when one wants to have a pointer pointing to the NUL byte terminating a string. This is often written in this way:

       s += strlen (s);

This is almost optimal, but the addition operation duplicates a bit of the work already done in the strlen function. A better solution is this:

       s = strchr (s, '\0');

There is no restriction on the second parameter of strchr, so it could very well also be the NUL character. Those readers thinking very hard about this might now point out that the strchr function is more expensive than the strlen function since we have two abort criteria. This is right. But in the GNU C Library the implementation of strchr is optimized in a special way so that strchr actually is faster.

— Function: char * strrchr (const char *string, int c)

The function strrchr is like strchr, except that it searches backwards from the end of the string string (instead of forwards from the front).

For example,

          strrchr ("hello, world", 'l')
              ⇒ "ld"

— Function: wchar_t * wcsrchr (const wchar_t *wstring, wchar_t c)

The function wcsrchr is like wcschr, except that it searches backwards from the end of the string wstring (instead of forwards from the front).

— Function: char * strstr (const char *haystack, const char *needle)

This is like strchr, except that it searches haystack for a substring needle rather than just a single character. It returns a pointer into the string haystack that is the first character of the substring, or a null pointer if no match was found. If needle is an empty string, the function returns haystack.

For example,

          strstr ("hello, world", "l")
              ⇒ "llo, world"
          strstr ("hello, world", "wo")
              ⇒ "world"
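Since strstr returns a pointer into haystack, the search can be resumed just past each match. The following sketch counts non-overlapping matches that way; the helper name count_occurrences and its treatment of an empty needle are choices made for this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of NEEDLE
   in HAYSTACK.  An empty needle would match everywhere, so it is
   reported as zero occurrences here (an arbitrary choice).  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  /* Restart each search immediately after the previous match.  */
  for (const char *p = strstr (haystack, needle);
       p != NULL;
       p = strstr (p + needle_len, needle))
    count++;

  return count;
}
```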

— Function: wchar_t * wcsstr (const wchar_t *haystack, const wchar_t *needle)

This is like wcschr, except that it searches haystack for a substring needle rather than just a single wide character. It returns a pointer into the string haystack that is the first wide character of the substring, or a null pointer if no match was found. If needle is an empty string, the function returns haystack.

— Function: wchar_t * wcswcs (const wchar_t *haystack, const wchar_t *needle)

wcswcs is a deprecated alias for wcsstr. This is the name originally used in the X/Open Portability Guide before Amendment 1 to ISO C90 was published.

— Function: char * strcasestr (const char *haystack, const char *needle)

This is like strstr, except that it ignores case in searching for the substring. Like strcasecmp, it is locale dependent how uppercase and lowercase characters are related.

For example,

          strcasestr ("hello, world", "L")
              ⇒ "llo, world"
          strcasestr ("hello, World", "wo")
              ⇒ "World"

— Function: void * memmem (const void *haystack, size_t haystack-len, const void *needle, size_t needle-len)

This is like strstr, but needle and haystack are byte arrays rather than null-terminated strings. needle-len is the length of needle and haystack-len is the length of haystack.

This function is a GNU extension.

— Function: size_t strspn (const char *string, const char *skipset)

The strspn (“string span”) function returns the length of the initial substring of string that consists entirely of characters that are members of the set specified by the string skipset. The order of the characters in skipset is not important.

For example,

          strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
              ⇒ 5

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.

— Function: size_t wcsspn (const wchar_t *wstring, const wchar_t *skipset)

The wcsspn (“wide character string span”) function returns the length of the initial substring of wstring that consists entirely of wide characters that are members of the set specified by the string skipset. The order of the wide characters in skipset is not important.

— Function: size_t strcspn (const char *string, const char *stopset)

The strcspn (“string complement span”) function returns the length of the initial substring of string that consists entirely of characters that are not members of the set specified by the string stopset. (In other words, it returns the offset of the first character in string that is a member of the set stopset.)

For example,

          strcspn ("hello, world", " \t\n,.;!?")
              ⇒ 5

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.
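The two span functions combine naturally: strspn skips a run of unwanted bytes, and strcspn measures the run that follows. A minimal sketch, with a hypothetical helper name first_word and a fixed set of blank characters chosen for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the first whitespace-delimited word in
   STRING.  Stores its length in *LEN and returns a pointer to its
   first byte; the length is zero if STRING holds no word.  */
const char *
first_word (const char *string, size_t *len)
{
  static const char blanks[] = " \t\n";

  /* Skip leading blanks, then measure the run of non-blanks.  */
  const char *start = string + strspn (string, blanks);

  *len = strcspn (start, blanks);
  return start;
}
```

Unlike strtok, this approach leaves the input string unmodified, at the cost of the caller having to handle an explicit length.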

— Function: size_t wcscspn (const wchar_t *wstring, const wchar_t *stopset)

The wcscspn (“wide character string complement span”) function returns the length of the initial substring of wstring that consists entirely of wide characters that are not members of the set specified by the string stopset. (In other words, it returns the offset of the first wide character in wstring that is a member of the set stopset.)

— Function: char * strpbrk (const char *string, const char *stopset)

The strpbrk (“string pointer break”) function is related to strcspn, except that it returns a pointer to the first character in string that is a member of the set stopset instead of the length of the initial substring. It returns a null pointer if no such character from stopset is found.

For example,

          strpbrk ("hello, world", " \t\n,.;!?")
              ⇒ ", world"

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.

— Function: wchar_t * wcspbrk (const wchar_t *wstring, const wchar_t *stopset)

The wcspbrk (“wide character string pointer break”) function is related to wcscspn, except that it returns a pointer to the first wide character in wstring that is a member of the set stopset instead of the length of the initial substring. It returns a null pointer if no such character from stopset is found.

5.7.1 Compatibility String Search Functions

— Function: char * index (const char *string, int c)

index is another name for strchr; they are exactly the same. New code should always use strchr since this name is defined in ISO C while index is a BSD invention which never was available on System V derived systems.

— Function: char * rindex (const char *string, int c)

rindex is another name for strrchr; they are exactly the same. New code should always use strrchr since this name is defined in ISO C while rindex is a BSD invention which never was available on System V derived systems.



5.8 Finding Tokens in a String

It's fairly common for programs to have a need to do some simple kinds of lexical analysis and parsing, such as splitting a command string up into tokens. You can do this with the strtok function, declared in the header file string.h.

— Function: char * strtok (char *restrict newstring, const char *restrict delimiters)

A string can be split into tokens by making a series of calls to the function strtok.

The string to be split up is passed as the newstring argument on the first call only. The strtok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same string are indicated by passing a null pointer as the newstring argument. Calling strtok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls strtok behind your back (which would mess up this internal state information).

The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.

On the next call to strtok, the searching begins at the next character beyond the one that marked the end of the previous token. Note that the set of delimiters delimiters does not have to be the same on every call in a series of calls to strtok.

If the end of the string newstring is reached, or if the remainder of the string consists only of delimiter characters, strtok returns a null pointer.

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.

— Function: wchar_t * wcstok (wchar_t *newstring, const wchar_t *delimiters)

A wide character string can be split into tokens by making a series of calls to the function wcstok.

The string to be split up is passed as the newstring argument on the first call only. The wcstok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same wide character string are indicated by passing a null pointer as the newstring argument. Calling wcstok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls wcstok behind your back (which would mess up this internal state information).

The delimiters argument is a wide character string that specifies a set of delimiters that may surround the token being extracted. All the initial wide characters that are members of this set are discarded. The first wide character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next wide character that is a member of the delimiter set. This wide character in the original wide character string newstring is overwritten by a null wide character, and the pointer to the beginning of the token in newstring is returned.

On the next call to wcstok, the searching begins at the next wide character beyond the one that marked the end of the previous token. Note that the set of delimiters delimiters does not have to be the same on every call in a series of calls to wcstok.

If the end of the wide character string newstring is reached, or if the remainder of the string consists only of delimiter wide characters, wcstok returns a null pointer.

Note that wcstok operates on whole wide characters, so the caveat about multibyte characters being split into separate bytes, which applies to strtok, does not arise here. The function is not locale-dependent.

Warning: Since strtok and wcstok alter the string they are parsing, you should always copy the string to a temporary buffer before parsing it with strtok/wcstok (see Copying and Concatenation). If you allow strtok or wcstok to modify a string that came from another part of your program, you are asking for trouble; that string might be used for other purposes after strtok or wcstok has modified it, and it would not have the expected value.

The string that you are operating on might even be a constant. Then when strtok or wcstok tries to modify it, your program will get a fatal signal for writing in read-only memory. See Program Error Signals. Even if the operation of strtok or wcstok would not require a modification of the string (e.g., if there is exactly one token) the string can (and in the GNU C Library case will) be modified.

This is a special case of a general principle: if a part of a program does not have as its purpose the modification of a certain data structure, then it is error-prone to modify the data structure temporarily.

The functions strtok and wcstok are not reentrant. See Nonreentrancy, for a discussion of where and why reentrancy is important.

Here is a simple example showing the use of strtok.

     #define _GNU_SOURCE         /* Needed for strdupa.  */
     #include <string.h>
     #include <stddef.h>
     
     ...
     
     const char string[] = "words separated by spaces -- and, punctuation!";
     const char delimiters[] = " .,;:!-";
     char *token, *cp;
     
     ...
     
     cp = strdupa (string);                /* Make writable copy.  */
     token = strtok (cp, delimiters);      /* token => "words" */
     token = strtok (NULL, delimiters);    /* token => "separated" */
     token = strtok (NULL, delimiters);    /* token => "by" */
     token = strtok (NULL, delimiters);    /* token => "spaces" */
     token = strtok (NULL, delimiters);    /* token => "and" */
     token = strtok (NULL, delimiters);    /* token => "punctuation" */
     token = strtok (NULL, delimiters);    /* token => NULL */

The GNU C Library contains two more functions for tokenizing a string which overcome the limitation of non-reentrancy. They are only available for multibyte character strings.

— Function: char * strtok_r (char *newstring, const char *delimiters, char **save_ptr)

Just like strtok, this function splits the string into several tokens which can be accessed by successive calls to strtok_r. The difference is that the information about the next token is stored in the space pointed to by the third argument, save_ptr, which is a pointer to a string pointer. Calling strtok_r with a null pointer for newstring and leaving save_ptr between the calls unchanged does the job without hindering reentrancy.

This function is defined in POSIX.1 and can be found on many systems which support multi-threading.

— Function: char * strsep (char **string_ptr, const char *delimiter)

This function has functionality similar to strtok_r with the newstring argument replaced by the save_ptr argument. The initialization of the moving pointer has to be done by the user. Successive calls to strsep move the pointer along the tokens separated by delimiter, returning the address of the next token and updating string_ptr to point to the beginning of the next token.

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row, strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.

This function was introduced in 4.3BSD and therefore is widely available.

Here is how the above example looks when strsep is used.

     #define _GNU_SOURCE         /* Needed for strdupa.  */
     #include <string.h>
     #include <stddef.h>
     
     ...
     
     const char string[] = "words separated by spaces -- and, punctuation!";
     const char delimiters[] = " .,;:!-";
     char *running;
     char *token;
     
     ...
     
     running = strdupa (string);
     token = strsep (&running, delimiters);    /* token => "words" */
     token = strsep (&running, delimiters);    /* token => "separated" */
     token = strsep (&running, delimiters);    /* token => "by" */
     token = strsep (&running, delimiters);    /* token => "spaces" */
     token = strsep (&running, delimiters);    /* token => "" */
     token = strsep (&running, delimiters);    /* token => "" */
     token = strsep (&running, delimiters);    /* token => "" */
     token = strsep (&running, delimiters);    /* token => "and" */
     token = strsep (&running, delimiters);    /* token => "" */
     token = strsep (&running, delimiters);    /* token => "punctuation" */
     token = strsep (&running, delimiters);    /* token => "" */
     token = strsep (&running, delimiters);    /* token => NULL */

— Function: char * basename (const char *filename)

The GNU version of the basename function returns the last component of the path in filename. This function is the preferred usage, since it does not modify the argument, filename, and respects trailing slashes. The prototype for basename can be found in string.h. Note, this function is overridden by the XPG version, if libgen.h is included.

Example of using GNU basename:

          #define _GNU_SOURCE    /* Needed for the GNU basename.  */
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          
          int
          main (int argc, char *argv[])
          {
            char *prog = basename (argv[0]);
          
            if (argc < 2)
              {
                fprintf (stderr, "Usage %s <arg>\n", prog);
                exit (1);
              }
          
            ...
          }

Portability Note: This function may produce different results on different systems.

— Function: char * basename (char *path)

This is the standard XPG defined basename. It is similar in spirit to the GNU version, but may modify the path by removing trailing '/' characters. If the path is made up entirely of '/' characters, then "/" will be returned. Also, if path is NULL or an empty string, then "." is returned. The prototype for the XPG version can be found in libgen.h.

Example of using XPG basename:

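          #define _GNU_SOURCE    /* Needed for strdupa.  */
          #include <libgen.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>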
          
          int
          main (int argc, char *argv[])
          {
            char *prog;
            char *path = strdupa (argv[0]);
          
            prog = basename (path);
          
            if (argc < 2)
              {
                fprintf (stderr, "Usage %s <arg>\n", prog);
                exit (1);
              }
          
            ...
          
          }

— Function: char * dirname (char *path)

The dirname function is the complement to the XPG version of basename. It returns the parent directory of the file specified by path. If path is NULL, an empty string, or contains no '/' characters, then "." is returned. The prototype for this function can be found in libgen.h.
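Since dirname and the XPG basename may both modify their argument, and may return a pointer either into it or to static storage, it is safest to hand each of them its own scratch copy. The following is a sketch under those assumptions; the helper name split_path and its buffer convention are hypothetical.

```c
#include <libgen.h>
#include <string.h>

/* Hypothetical helper: split PATH into directory and file name,
   copying the results into DIR and BASE (each SIZE bytes long).
   Returns 0 on success, -1 if the buffers are too small.  */
int
split_path (const char *path, char *dir, char *base, size_t size)
{
  /* Room is needed for PATH itself and for a possible "." result.  */
  if (size < 2 || strlen (path) >= size)
    return -1;

  /* Each function works on its own copy; the result is then moved
     to the start of the buffer, which may overlap the source.  */
  strcpy (dir, path);
  char *d = dirname (dir);
  memmove (dir, d, strlen (d) + 1);

  strcpy (base, path);
  char *b = basename (base);
  memmove (base, b, strlen (b) + 1);

  return 0;
}
```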



5.9 strfry

The function below addresses the perennial programming quandary: “How do I take good data in string form and painlessly turn it into garbage?” This is actually a fairly simple task for C programmers who do not use the GNU C Library string functions, but for programs based on the GNU C Library, the strfry function is the preferred method for destroying string data.

The prototype for this function is in string.h.

— Function: char * strfry (char *string)

strfry creates a pseudorandom anagram of a string, replacing the input with the anagram in place. For each position in the string, strfry swaps it with a position in the string selected at random (from a uniform distribution). The two positions may be the same.

The return value of strfry is always string.

Portability Note: This function is unique to the GNU C Library.
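On systems without strfry, the swapping procedure described above can be approximated with standard functions. The sketch below is an illustration only and is not the library's implementation; the name shuffle_in_place is hypothetical, and the use of rand with a modulo reduction gives only an approximately uniform choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration only.  Swaps every position
   of STRING with a pseudorandomly chosen position and returns
   STRING.  */
char *
shuffle_in_place (char *string)
{
  size_t len = strlen (string);

  for (size_t i = 0; i < len; i++)
    {
      size_t j = (size_t) rand () % len;   /* May equal i.  */
      char tmp = string[i];

      string[i] = string[j];
      string[j] = tmp;
    }
  return string;
}
```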



5.10 Trivial Encryption

The memfrob function converts an array of data to something unrecognizable and back again. It is not encryption in its usual sense since it is easy for someone to convert the encrypted data back to clear text. The transformation is analogous to Usenet's “Rot13” encryption method for obscuring offensive jokes from sensitive eyes and such. Unlike Rot13, memfrob works on arbitrary binary data, not just text. For true encryption, see Cryptographic Functions.

This function is declared in string.h.

— Function: void * memfrob (void *mem, size_t length)

memfrob transforms (frobnicates) each byte of the data structure at mem, which is length bytes long, by bitwise exclusive-ORing it with binary 00101010. It does the transformation in place and its return value is always mem.

Note that applying memfrob a second time to the same data structure returns it to its original state.

This is a good function for hiding information from someone who doesn't want to see it or doesn't want to see it very much. To really prevent people from retrieving the information, use stronger encryption such as that described in Cryptographic Functions.

Portability Note: This function is unique to the GNU C Library.
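The reason a second application restores the data is that exclusive-OR with a fixed value is its own inverse. A portable sketch of the transformation as described above (the name frob is hypothetical; binary 00101010 is 0x2A in hexadecimal):

```c
#include <stddef.h>

/* Hypothetical stand-in that mirrors the transformation described
   above: XOR every byte with binary 00101010 (0x2A, decimal 42).  */
void *
frob (void *mem, size_t length)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2A;
  return mem;
}
```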



5.11 Encode Binary Data

To store or transfer binary data in environments which only support text, one has to encode the binary data by mapping the input bytes to characters in the range allowed for storing or transferring. SVID systems (and nowadays XPG compliant systems) provide minimal support for this task.

— Function: char * l64a (long int n)

This function encodes a 32-bit input value using characters from thebasic character set. It returns a pointer to a 7 character buffer whichcontains an encoded version ofn. To encode a series of bytes theuser must copy the returned string to a destination buffer. It returnsthe empty string ifn is zero, which is somewhat bizarre butmandated by the standard.

Warning: Since a static buffer is used, this function should not be used in multi-threaded programs. There is no thread-safe alternative to this function in the C library.

Compatibility Note: The XPG standard states that the return value of l64a is undefined if n is negative. In the GNU implementation, l64a treats its argument as unsigned, so it will return a sensible encoding for any nonzero n; however, portable programs should not rely on this.

To encode a large buffer, l64a must be called in a loop, once for each 32-bit word of the buffer. For example, one could do something like this:

          char *
          encode (const void *buf, size_t len)
          {
            /* We know in advance how long the buffer has to be. */
            const unsigned char *in = buf;
            char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
            char *cp = out, *p;
          
            /* Give up early if the allocation failed. */
            if (out == NULL)
              return NULL;
          
            /* Encode the length. */
            /* Using `htonl' is necessary so that the data can be
               decoded even on machines with different byte order.
               `l64a' can return a string shorter than 6 bytes, so 
               we pad it with encoding of 0 ('.') at the end by 
               hand. */
          
            p = stpcpy (cp, l64a (htonl (len)));
            cp = mempcpy (p, "......", 6 - (p - cp));
          
            while (len > 3)
              {
                unsigned long int n = *in++;
                n = (n << 8) | *in++;
                n = (n << 8) | *in++;
                n = (n << 8) | *in++;
                len -= 4;
                p = stpcpy (cp, l64a (htonl (n)));
                cp = mempcpy (p, "......", 6 - (p - cp));
              }
            if (len > 0)
              {
                unsigned long int n = *in++;
                if (--len > 0)
                  {
                    n = (n << 8) | *in++;
                    if (--len > 0)
                      n = (n << 8) | *in;
                  }
                cp = stpcpy (cp, l64a (htonl (n)));
              }
            *cp = '\0';
            return out;
          }

It is strange that the library does not provide the complete functionality needed, but so be it.

To decode data produced with l64a, the following function should be used.