C/C++ Programming Tips
- 1. `char**` & `const char**`
- 2. structure parameter passing
- 3. declaration & definition
- 4. arrays != pointers
- 5. interpositioning
- 6. a few things about the stack
- 7. array parameters & pointer parameters
- 8. array and pointer parameters changed by the compiler
- 9. `sizeof( long ) == 8`?
- 10. `char` vs `wchar_t`
- 11. `new` throwing an exception, not returning `nullptr`
- 12. `reinterpret_cast` is portable
- References
1. `char**` & `const char**`
```c
char* cp;
const char* ccp;
ccp = cp; // this is legal
char** cpp;
const char** ccpp;
ccpp = cpp; // compile error! (C++ rejects it; a C compiler must at least issue a diagnostic)
```
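In the case of `char*` and `const char*`, they are pointers to `char` and `const char`, respectively. The pointed-to types are compatible and differ only in the `const` qualifier. The left-hand side's pointed-to type also has every qualifier of the right-hand side's, so the assignment is allowed.

In the case of `char**` and `const char**`, however, they are pointers to `char*` and `const char*`, more specifically `(char) *` and `(const char) *`. `const char*` is not a `const`-qualified version of `char*` (that would be `char* const`); it is a pointer to a differently qualified type. Since the pointed-to types `char*` and `const char*` are not compatible, the assignment is rejected. Allowing it would open a loophole through which a `const char` could be modified via a plain `char*`.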
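On the other hand, the following code is valid. Note that `const char**` and `char* const*` are clearly different. The former is a pointer to a pointer to `const char`, whereas the latter is a pointer to a `const` pointer to `char`. The assignment below therefore only adds `const` to the pointed-to type `char*`.

```c
char** cpp;
char* const* cpcp;
cpcp = cpp; // OK
```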
2. structure parameter passing
Parameters are passed in registers for speed where possible. Be aware that an `int i` may well be passed in a completely different manner from a `struct s` whose only member is an `int`. An `int` parameter is typically passed in a register, but `struct`s may instead be passed on the stack. The exact rules depend on the platform's calling convention (ABI).
3. declaration & definition
A variable must have exactly one definition, but it may have multiple external declarations. A declaration is like a customs declaration: it is not the thing itself, merely a description of some baggage that you say you have around somewhere. A definition, on the other hand, is a special kind of declaration that fixes the storage for a variable.
- definition: occurs in only one place; specifies the type of a variable and reserves storage for it, e.g. `int arr[100];`
- declaration: can occur multiple times; describes the type of a variable and is used to refer to variables defined elsewhere, e.g. `extern int arr[];`

The declaration of an external variable tells the compiler its type and name, and that its memory is allocated somewhere else. For a multidimensional array, however, the sizes of all dimensions except the leftmost one have to be provided.
4. arrays != pointers
Consider accessing `a[i]` after `char a[9] = "abcdefgh";`. Suppose the compiler symbol table records `a` as address `1000` at compile time. At run time, the program gets the value of `i`, adds it to `1000`, and fetches the contents of address `(1000 + i)`.

This is why `extern char a[]` works just as well as `extern char a[100]`. The compiler does not need to know the total length of the array, as it merely generates address offsets from the start. In contrast, `extern char* p` tells the compiler that `p` is a pointer and that the object it points to is a character. To get the character, suppose the compiler symbol table records `p` as address `1234` at compile time. At run time, the program first fetches the contents of address `1234`, say `5678`. It then fetches the character stored at address `5678`.
Differences between arrays and pointers can be summed up as follows.
arrays | pointers |
---|---|
holds data | holds the address of data |
data is accessed directly, so `a[i]` is the contents of the location `i` units past `a` | data is accessed indirectly: `*p` first fetches the contents of `p` (an address) and then the data at that address. If the pointer is subscripted as `p[i]`, the result is the contents of the location `i` units past the address stored in `p` |
commonly used for a fixed number of elements of the same data type | commonly used for dynamic data structures |
implicitly allocated and deallocated | commonly used with `malloc()` and `free()` |
5. interpositioning
Interpositioning (or interposing) is the practice of replacing a library function with a user-written function of the same name. It is very dangerous because the replacement takes effect for every call to that function. This includes calls made from within the system libraries, not just those in your own code.
6. a few things about the stack
- A stack frame might not be on the stack. Although we talk about a stack frame (activation record) being pushed on the stack, it need not be there. It is actually faster and better to keep as much of the activation record as possible in registers.
- On UNIX, the stack grows automatically as a process needs more space, so the programmer can usually treat it as very large. It is still bounded by a per-process limit (see `ulimit -s`). The kernel normally handles a reference to an invalid address by sending a segmentation fault signal (`SIGSEGV`) to the process. A reference to the red zone region, located just below the top of the stack, is not considered a fault. Instead, the operating system increases the stack segment size by a good chunk.
- The method of specifying the stack size varies from one compiler vendor to another.
7. array parameters & pointer parameters
```c
#include <stdio.h>

char ga[] = "abcdefghijklm";

/* Written as an array parameter, but the compiler treats it as char* ca. */
void passArray(char ca[10])
{
    printf( " address of array parameter = %p \n", (void*)&ca );
    printf( " address (ca[0]) = %p \n", (void*)&(ca[0]) );
    printf( " address (ca[1]) = %p \n", (void*)&(ca[1]) );
    printf( " ++ca = %p \n\n", (void*)++ca );  /* legal: ca is a pointer */
}

void passPointer(char* pa)
{
    printf( " address of pointer parameter = %p \n", (void*)&pa );
    printf( " address (pa[0]) = %p \n", (void*)&(pa[0]) );
    printf( " address (pa[1]) = %p \n", (void*)&(pa[1]) );
    printf( " ++pa = %p \n", (void*)++pa );
}

int main(void)
{
    /* %p with a void* argument is the portable way to print an address. */
    printf( " address of global array = %p \n", (void*)&ga );
    printf( " address (ga[0]) = %p \n", (void*)&(ga[0]) );
    printf( " address (ga[1]) = %p \n\n", (void*)&(ga[1]) );
    passArray( ga );
    passPointer( ga );
    return 0;
}
```
The output of the above code could be as follows.

```
address of global array = 0x81590010
address (ga[0]) = 0x81590010
address (ga[1]) = 0x81590011

address of array parameter = 0x9f295078
address (ca[0]) = 0x81590010
address (ca[1]) = 0x81590011
++ca = 0x81590011

address of pointer parameter = 0x9f295078
address (pa[0]) = 0x81590010
address (pa[1]) = 0x81590011
++pa = 0x81590011
```
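- The values of `&ga`, `ca`, and `pa` are the same (`&ca[0]` and `&pa[0]` print that address); only `&ca` and `&pa` differ.
  - `ga` represents the address of the first element of the array `ga`, and `&ga` represents the address of the array `ga` as a whole.
  - `ga` and `&ga` therefore have the same address value but different types. `ga + 1` points to the second element of `ga`, while `&ga + 1` points one whole array (`sizeof(ga)`, i.e. 14 bytes) past the start of `ga`.
  - Forming `&ga + 1` is allowed, but dereferencing it is undefined behavior.
- The addresses of `ga` and `ga[0]` are the same, since an array begins with its first element.
- When the functions are called, the address of `ga[0]` is copied into the parameters `ca` and `pa`, which are local pointer variables in their own right.
  - That is why `&ca` and `&pa` differ from the address of `ga`, while `ca` and `pa` themselves hold the address of `ga`.
  - It is also why `++ca` compiles: `ca` is really a pointer, not an array.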
8. array and pointer parameters changed by the compiler
The “array name is rewritten as a pointer argument” rule is not recursive. An array of arrays is rewritten as a “pointer to array”, not as a “pointer to pointer”.
argument | matched parameter |
---|---|
array of arrays, such as `char c[8][10];` | pointer to array, such as `char (*c)[10];` |
array of pointers, such as `char *c[15];` | pointer to pointer, such as `char** c;` |
pointer to array, such as `char (*c)[64];` | does not change |
pointer to pointer, such as `char** c;` | does not change |
Note that `char *c[15]` is a vector of 15 pointers-to-`char`, and `char (*c)[64]` is a pointer to an array of 64 `char`s. The reason `char** argv` appears is that `argv` is an array of pointers, namely `char *argv[]`. As a parameter, this is rewritten as a pointer to its element type, namely a pointer to a pointer.
9. `sizeof( long ) == 8`?
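In general, the `long` type is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. However, 64-bit Windows keeps `long` at 4 bytes, so its size varies by platform and cannot be assumed. Fortunately, on the platforms below, the other fundamental types have the same sizes everywhere; only `long` (and the pointer size) differs. The C and C++ standards themselves guarantee only minimum sizes, e.g. at least 32 bits for `long`.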
32-bit Windows/Linux/Mac | 64-bit Windows | 64-bit Linux/Mac |
---|---|---|
pointer size is 4 | pointer size is 8 | pointer size is 8 |
sizeof( char ) is 1 | sizeof( char ) is 1 | sizeof( char ) is 1 |
sizeof( short ) is 2 | sizeof( short ) is 2 | sizeof( short ) is 2 |
sizeof( int ) is 4 | sizeof( int ) is 4 | sizeof( int ) is 4 |
sizeof( long ) is 4 | sizeof( long ) is 4 | sizeof( long ) is 8 |
sizeof( long long ) is 8 | sizeof( long long ) is 8 | sizeof( long long ) is 8 |
sizeof( float ) is 4 | sizeof( float ) is 4 | sizeof( float ) is 4 |
sizeof( double ) is 8 | sizeof( double ) is 8 | sizeof( double ) is 8 |
Although this is speculation, the inconsistency seems to be due to the `DWORD` type in Windows, which is declared as `typedef unsigned long DWORD;`. Code everywhere uses `DWORD` assuming it is 4 bytes. If `long` were changed to 8 bytes, the size of `DWORD` would change too, which would be a disaster for existing code worldwide.
10. `char` vs `wchar_t`
The basic idea behind Unicode is to assign every character or glyph from every language in common use around the globe a unique number, known as a code point. Code points are usually written in hexadecimal, e.g. U+0041 for `A`. When storing a string of characters in memory, one of the following encodings is selected. Note that the UTF-16 and UTF-32 encodings can be either little-endian or big-endian.
- UTF-32: Each Unicode code point is encoded as a single 32-bit value. This is the simplest Unicode encoding.
- UTF-8: Each Unicode code point is encoded as one to four 8-bit values (bytes). This is known as a variable-length encoding, or a multibyte character set (MBCS), because each character in a string may take one or more bytes of storage. The first 128 Unicode code points (0 to 127) correspond numerically to the old ASCII character codes and are each encoded as a single byte.
- UTF-16: Each character in a UTF-16 string is represented by either one or two 16-bit values; code points above U+FFFF need two (a surrogate pair). This is known as a wide character set (WCS).
The `char` type is intended for use with legacy ANSI strings and with MBCS, including UTF-8. The `wchar_t` type is a wide character type, intended to be capable of representing any valid code point in a single integer. Its size is compiler- and system-specific: it could be 16 bits for UTF-16 or 32 bits for UTF-32. For example, GCC and Clang on Linux and Mac use 32 bits, while Windows uses 16 bits. Under Windows, the `wchar_t` type is used exclusively for UTF-16, and the `char` type is used for ANSI strings and legacy Windows code page string encodings. In the Windows API documentation, the term ‘Unicode’ is always synonymous with WCS and UTF-16 encoding. This is a bit confusing, because Unicode strings can in general also be encoded in the non-wide multibyte UTF-8 format.
11. `new` throwing an exception, not returning `nullptr`
A `new` expression reports a failure to allocate storage by throwing an exception (`std::bad_alloc`); it does not return `nullptr`. However, `new` accepts extra arguments, which it passes on to the allocation function (`operator new`) as in a function call. So `new(std::nothrow)` can be used to get `nullptr` instead when allocation fails.
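```cpp
#include <iostream>
#include <new>

int main()
{
    // Plain new: failure is reported by throwing std::bad_alloc.
    // (The loop deliberately leaks memory until allocation fails.)
    try {
        while (true) new int[100000000ul];
    }
    catch (const std::bad_alloc& e) {
        std::cout << e.what() << '\n';
    }

    // new(std::nothrow): failure is reported by returning nullptr instead.
    while (true) {
        int* p = new(std::nothrow) int[100000000ul];
        if (p == nullptr) {
            std::cout << "Allocation returned nullptr\n";
            break;
        }
    }
    return 0;
}
```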
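The output could be as follows. The message returned by `e.what()` is implementation-defined; libstdc++ prints `std::bad_alloc`.

```
std::bad_alloc
Allocation returned nullptr
```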
12. `reinterpret_cast` is portable
This cast might seem to cause undefined behavior, or to be not fully specified by the C++ Standard and therefore not guaranteed to be portable across platforms. In fact, as of C++17, the C++ Standard explicitly guarantees that casting between pointers to pointer-interconvertible objects works as intended on all platforms. An example is a standard-layout class object and its first non-static data member. According to the C++17 Standard, section 6.9.2, paragraph 4, p. 82:
> Two objects `a` and `b` are pointer-interconvertible if (…) one is a standard-layout class object and the other is the first non-static data member of that object (…) If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a `reinterpret_cast` (…)
References
[1] Peter van der Linden. 1994. Expert C Programming: Deep C Secrets. Prentice-Hall, Inc., USA.
[2] J. Gregory. Game Engine Architecture, Third Edition. CRC Press.
[3] 전상현. 2018. 크로스 플랫폼 핵심 모듈 설계의 기술 (Techniques for Designing Cross-Platform Core Modules). 로드북 (Roadbook).
[4] J. Lakos. Large-Scale C++ Volume I: Process and Architecture. Addison-Wesley Professional.