C/C++ Programming Tips

1. char** & const char**

   char* cp;
   const char* ccp;
   ccp = cp; // this is legal

   char** cpp;
   const char** ccpp;
   ccpp = cpp; // compile error!

char* and const char* are pointers to char and to const char. The pointed-to types are compatible and differ only in the const qualifier, so the assignment merely adds a qualifier to the pointed-to type and is allowed.

char** and const char**, however, are pointers to char* and to const char*, that is, to (char) * and (const char) *. Neither pointed-to type is a qualified version of the other: one is an unqualified pointer to char, and the other is an unqualified pointer to const char. They are different types, so the pointers to them are incompatible.

On the other hand, the following code is valid, because char* const* points to a const-qualified char* and the assignment only adds a qualifier to the pointed-to type. Note that const char** and char* const* are clearly different.

   char** cpp;
   char* const* cpcp;
   cpcp = cpp;

2. structure parameter passing

Parameters are passed in registers for speed where possible. Be aware that an int i may well be passed in a completely different manner from a struct s whose only member is an int. Even where an int parameter is typically passed in a register, a struct may instead be passed on the stack.

3. declaration & definition

A variable must have exactly one definition, and it may have multiple external declarations. A declaration is like a customs declaration: it is not the thing itself, merely a description of some baggage lying around somewhere. A definition is the special kind of declaration that fixes the storage for a variable.

  • definition: occurs in only one place. Specifies the type of a variable and reserves storage for it. e.g. int arr[100];
  • declaration: can occur multiple times. Describes the type of a variable and is used to refer to variables defined elsewhere. e.g. extern int arr[];

The declaration of an external variable tells the compiler its type and name, and that its memory is allocated somewhere else. For a multidimensional array, however, the sizes of all dimensions except the leftmost one must be provided.

4. arrays != pointers

When a[i] is accessed after char a[9] = "abcdefgh";, the compiler's symbol table records a as an address, say 1000, at compile time, as shown below. At run time, the program gets the value of i, adds it to 1000, and reads the contents of address (1000 + i).

[Figure: ArrayRef]

This is why extern char a[] is equivalent to extern char a[100]. The compiler does not need to know how long the array is in total, as it merely generates address offsets from the start. In contrast, extern char* p tells the compiler that p is a pointer and that the object it points to is a character. To get the character, the symbol table records p at an address, say 1234, at compile time, as shown below. At run time, the program first reads the contents of address 1234, say 5678, and then *p yields the contents of address 5678.

[Figure: PointerRef]

Differences between arrays and pointers can be summed up as follows.

| arrays | pointers |
|---|---|
| holds data | holds the address of data |
| data is accessed directly, so a[i] is the contents of the location i units past a | data is accessed indirectly, so *p is the contents found after first getting the contents of p. If the pointer has a subscript [i], the result is the contents of the location i units past the address held in p. |
| commonly used for a fixed number of elements of the same type | commonly used for dynamic data structures |
| implicitly allocated and deallocated | commonly used with malloc() and free() |

5. interpositioning

Interpositioning, or interposing, is the practice of replacing a library function with a user-written function of the same name, which is very dangerous. The interposed function replaces the library function not only for calls made from user code but also for calls made by the system libraries themselves.

[Figure: Interpositioning]

6. a few things about the stack

  • A stack frame might not be on the stack. Although a stack frame is said to be pushed on the stack, an activation record need not be there. It is actually faster and better to keep as much of the activation record as possible in registers.
  • On UNIX, the stack grows automatically as a process needs more space, so the programmer can assume that the stack is indefinitely large. The kernel normally handles a reference to an invalid address by sending a segmentation fault to the process, but a reference to the red zone, the region located just below the top of the stack, is not treated as a fault. Instead, the operating system increases the stack segment size by a good chunk.
  • The method of specifying the stack size varies from compiler to compiler.

7. array parameters & pointer parameters

#include <stdio.h>

char ga[] = "abcdefghijklm";

/* %p expects a void*, so every address is cast explicitly. */
void passArray(char ca[10])
{
   printf( " address of array parameter = %p \n", (void*)&ca );
   printf( " address (ca[0]) = %p \n", (void*)&(ca[0]) );
   printf( " address (ca[1]) = %p \n", (void*)&(ca[1]) );
   printf( " ++ca = %p \n\n", (void*)++ca );   /* legal: ca is really a pointer */
}

void passPointer(char* pa)
{
   printf( " address of pointer parameter = %p \n", (void*)&pa );
   printf( " address (pa[0]) = %p \n", (void*)&(pa[0]) );
   printf( " address (pa[1]) = %p \n", (void*)&(pa[1]) );
   printf( " ++pa = %p \n", (void*)++pa );
}

int main(void)
{
   printf( " address of global array = %p \n", (void*)&ga );
   printf( " address (ga[0]) = %p \n", (void*)&(ga[0]) );
   printf( " address (ga[1]) = %p \n\n", (void*)&(ga[1]) );
   passArray( ga );
   passPointer( ga );
   return 0;
}

The output of the above code could be as follows.

 address of global array = 0x81590010 
 address (ga[0]) = 0x81590010 
 address (ga[1]) = 0x81590011 

 address of array parameter = 0x9f295078 
 address (ca[0]) = 0x81590010 
 address (ca[1]) = 0x81590011 
 ++ca = 0x81590011 

 address of pointer parameter = 0x9f295078 
 address (pa[0]) = 0x81590010 
 address (pa[1]) = 0x81590011 
 ++pa = 0x81590011 

  • ga, ca, and pa all yield the same address. Only &ca and &pa differ from it, because they are the addresses of the parameters themselves.
  • ga represents the address of the first element of the array ga.
  • &ga represents the address of the array ga as a whole.
  • ga and &ga have the same value but different types. ga + 1 points to the second element of the array ga, whereas &ga + 1 points one whole array past ga, that is, past its end. Dereferencing that pointer is undefined behavior.
  • &ga and &ga[0] are the same address because the array object begins at its first element.
  • When the functions are called, the address of ga is copied into the parameters. ca and pa are therefore separate pointer variables, and their own addresses are not the same as the address of ga.

[Figure: PtrArrMem]

8. array and pointer parameters changed by the compiler

The “array name is rewritten as a pointer argument” rule is not recursive. An array of arrays is rewritten as a “pointer to array”, not as a “pointer to pointer”.

| argument | matched parameter |
|---|---|
| array of arrays such as char c[8][10]; | pointer to array such as char (*c)[10]; |
| array of pointers such as char *c[15]; | pointer to pointer such as char** c; |
| pointer to array such as char (*c)[64]; | does not change |
| pointer to pointer such as char** c; | does not change |

Note that char *c[15] is a vector of 15 pointers-to-char, and char (*c)[64] is a pointer to an array-of-64-chars. The reason char** argv appears is that argv is an array of pointers, char *argv[]. This decays into a pointer to its element type, namely a pointer to a pointer.

9. sizeof( long ) == 8?

In general, long is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system. However, this varies by platform: 64-bit Windows keeps long at 4 bytes. Fortunately, apart from long and pointers, the types below have the same size on all of these platforms.

| size in bytes | 32-bit Windows/Linux/Mac | 64-bit Windows | 64-bit Linux/Mac |
|---|---|---|---|
| pointer | 4 | 8 | 8 |
| sizeof( char ) | 1 | 1 | 1 |
| sizeof( short ) | 2 | 2 | 2 |
| sizeof( int ) | 4 | 4 | 4 |
| sizeof( long ) | 4 | 4 | 8 |
| sizeof( long long ) | 8 | 8 | 8 |
| sizeof( float ) | 4 | 4 | 4 |
| sizeof( double ) | 8 | 8 | 8 |

Although this is speculation, the inconsistency seems to stem from the DWORD type in Windows, which is declared as typedef unsigned long DWORD. Code that uses DWORD assumes it is 4 bytes, so widening long to 8 bytes would also widen DWORD, which would be a disaster for existing code worldwide.

10. char vs wchar_t

The basic idea behind Unicode is to assign every character or glyph from every language in common use around the globe to a unique hexadecimal code known as a code point. When storing a string of characters in memory, a particular encoding is selected among the following. Note that UTF-16 and UTF-32 encodings can be little-endian or big-endian.

  • UTF-32: Each Unicode code point is encoded into a 32-bit value, which is the simplest Unicode encoding.
  • UTF-8: Each Unicode code point is encoded as a sequence of 8-bit values, and many code points occupy more than one byte. This is known as a variable-length encoding, or a multibyte character set (MBCS), because each character in a string may take one or more bytes of storage. The first 128 Unicode code points correspond numerically to the old ASCII character codes and take a single byte each.
  • UTF-16: Each character in a UTF-16 string is represented by either one or two 16-bit values. This is known as a wide character set (WCS).

The char type is intended for use with legacy ANSI strings and with MBCS, including UTF-8. The wchar_t type is a wide character type, which is intended to be capable of representing any valid code point in a single integer. Its size is therefore compiler- and system-specific: it could be 16 bits for UTF-16 or 32 bits for UTF-32. Under Windows, however, the wchar_t type is used exclusively for UTF-16, and the char type is used for ANSI strings and legacy Windows code page string encodings. In the Windows API documents, the term ‘Unicode’ is always synonymous with WCS and UTF-16 encoding. This is a bit confusing because Unicode strings can in general be encoded in the non-wide multibyte UTF-8 format.

11. new throwing an exception, not returning nullptr

A new expression reports a failure to allocate storage by throwing std::bad_alloc; it does not return nullptr. However, a new expression accepts placement arguments, much like a function call, and new(std::nothrow) returns nullptr instead when the allocation fails.

#include <iostream>
#include <new>

int main()
{
   // Plain new: allocate until it fails, then catch the exception.
   try {
      while (true) new int[100000000ul];
   }
   catch (const std::bad_alloc& e) {
      std::cout << e.what() << '\n';
   }

   // new(std::nothrow): failure is reported by a null pointer instead.
   while (true) {
      int* p = new(std::nothrow) int[100000000ul];
      if (p == nullptr) {
         std::cout << "Allocation returned nullptr\n";
         break;
      }
   }
   return 0;
}
# Output 
std::bad_alloc
Allocation returned nullptr

