c - Is it possible to have a 32-bit pointer on x86-64 without undefined behavior? - Stack Overflow

admin2025-04-18  4

Generally pointers on x86-64 are defined to be 8 bytes. However, if you are certain that you have data that will only ever be in the first 4GB of the address space, then a 32-bit value is technically sufficient. Sometimes people will create a <4GB buffer, and store 32-bit offsets into that buffer, but I am asking about directly storing a memory address in 32-bits, NOT storing an offset into some other buffer.

You could do something like:

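#include <stddef.h>
#include <stdint.h>

struct small_pointer {
    uint32_t p;
};

/* Reads the byte at `index` past the address stored in the 32-bit field. */
char load_small_pointer(struct small_pointer p, size_t index)
{
    return *((char *)p.p + index);
}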

However, this involves converting from a uint32_t to a (char*), which is probably undefined behavior. The standard allows round trips through uintptr_t, but I don't know of any special allowance for other types, even though, when our conditions are met, the conversion from uintptr_t to uint32_t should be lossless. Is there any standards-compliant way to do this? If not, do GCC/Clang provide any implementation-specific guarantees?

In case you're wondering why you would want this: the 'pointers' take up half as much data cache, and it's not unusual for operating systems to store special data in either the high or low end of the address range. Needing a different pointer type for some objects is already natural in these situations. Expressing it as an offset into some buffer allocated at runtime would involve an extra load (loading the buffer's base address from a pointer, then the actual load from the offset location).

Asked Jan 29 at 23:05 by Joseph Garvin; edited Jan 30 at 4:22.
  • 4 Re “I don't see any reason to expect the answer is different across the two languages”: (a) The two languages have different rules, so explaining why behavior is or is not defined in one or the other requires citing different passages in different documents. (b) The two languages have different facilities for doing conversions and other operations, so there may be relevant code that is available in one language and not the other. – Eric Postpischil Commented Jan 30 at 11:36
  • 2 (c) Stack Overflow does not exist primarily for giving you an answer to your question. It is intended to be a durable repository of questions and answers for future users. To that end, it is desired not to dilute search results; somebody searching for results for C should get results primarily focused on C and somebody searching for results for C++ should get results primarily focused on C++. Conflating C and C++ makes it harder for future users to find the results they want. – Eric Postpischil Commented Jan 30 at 11:38
  • 2 (d) Both the C and C++ tags request they not be conflated. That is long-standing policy, and it has proven useful. Experience has shown that answers for the different languages often do diverge in spite of your lack of expectation. If you disagree with policy, you can argue it on meta.stackexchange.com. You are benefitting from a free service, and you can support that by respecting the rules. – Eric Postpischil Commented Jan 30 at 11:41
  • 2 @JosephGarvin The site never "survived without" this kind of rule. Users just ignored it more often in the past. You can always ask your question on Reddit if you want an anything-goes environment. – TylerH Commented Jan 30 at 20:33
  • 2 @JosephGarvin Or you have a problem of perspective. There are ~100 people actively curating the site (in general, not every day) and yet there are several thousand questions asked every single day. The people working to keep the site clean and make sure questions are answerable don't have enough time in the day to check every single one. It has always been a rule on SO that a question asking about some specific language feature needs to be about a specific language. My advice: don't waste energy being upset about such an inherently logical rule and just abide by it. – TylerH Commented Jan 30 at 21:29

3 Answers

This is well defined on GCC, given the constraint on the pointer value.

The rules for pointer/integer conversions are spelled out in section 6.3.2.3, p5 and p6, of the C standard:

5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Additionally, the GCC documentation states the following about the above implementation-defined behavior:

The result of converting a pointer to an integer or vice versa (C90 6.3.4, C99 and C11 6.3.2.3).

A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends if the pointer representation is smaller than the integer type, otherwise the bits are unchanged.

A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

So, given the above, converting from a char * to a uint32_t is well defined on GCC provided that no bits above the low 32 are set in the original pointer value. The documentation also establishes that the round-trip conversion is valid and yields a pointer to the original object.

An intermediate cast through uintptr_t and/or void * is not required, given GCC's implementation-defined behavior.
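
As a rough illustration of the round trip described above, here is a minimal sketch, assuming GCC on x86-64 and its documented pointer/integer conversion rules. The function names small_pointer_try_from and small_pointer_to are hypothetical and do not come from the answer. The range check is an added safeguard so that an address above 4 GiB is rejected rather than silently truncated, and the casts go through uintptr_t only to avoid GCC's size-mismatch warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the struct in the question. */
struct small_pointer {
    uint32_t p;
};

/* Hypothetical helper: stores ptr in 32 bits if its address fits.
   Returns false (and leaves *out untouched) when any bit above the
   low 32 is set, so no information is ever discarded. */
bool small_pointer_try_from(const void *ptr, struct small_pointer *out)
{
    uintptr_t addr = (uintptr_t)ptr;
    if (addr > UINT32_MAX)
        return false;
    out->p = (uint32_t)addr;
    return true;
}

/* Hypothetical helper: widens the stored value back to a pointer.
   uint32_t is unsigned, so the widening zero-extends. */
char *small_pointer_to(struct small_pointer sp)
{
    return (char *)(uintptr_t)sp.p;
}
```

On a typical x86-64 Linux process, stack and heap addresses lie above 4 GiB, so small_pointer_try_from returns false for ordinary objects. It only succeeds for memory deliberately placed in the low part of the address space.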

This is well defined, and it is the reason the x32 ABI exists: it gives you 32-bit pointers while targeting an x86-64 processor.

On Windows you can get the correct binary behavior by clearing the LARGEADDRESSAWARE flag in the PE executable, but I'm not aware of any available tools you can use to build such binaries. (This was a problem early in the porting of code to 64-bit, when people cast between DWORD and pointers and expected their code to work after a plain recompile. Clearing the flag made the code run. It did not magically change the compile-time pointers to 4 bytes; it only ensured that the high bits would always be zero.)

In theory a pointer type modifier could exist to say "this pointer points into the low 2GB of RAM"; however, you would still be looking at a toolchain change to get it to compile.

If only some pointers matter, you can allocate the memory arena yourself using VirtualAlloc (Windows) or mmap (Unix), passing arguments that select the virtual address of the allocation. That makes the casts well defined, because you know the high bits are zero. In theory you should be able to say bottom 4GB rather than bottom 2GB, but history says that is harder to do because of how the compiler works. In fact, this blew up on Windows before we even went 64-bit, because some compilers treated pointers as signed types.

This is the key restriction: you need an API guaranteed to give you a (readable and writable) pointer with bits 31..63 all clear, so that the pointer points into the bottom 2GB of the address space. If you have that API, the cast is well defined. If you do not, the cast isn't well defined.
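
One way to approximate such an API on a Unix-like system is sketched below, assuming a POSIX mmap that supports anonymous mappings. The names alloc_low_arena, arena_addr32, and LOW_ARENA_HINT are hypothetical, and the hint address is an arbitrary illustrative value. Because the address passed without MAP_FIXED is only a hint, the sketch verifies where the mapping actually landed and gives up if it is not entirely below 4 GiB. On Linux x86-64, the MAP_32BIT flag is another option for requesting a mapping in the first 2 GiB.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Arbitrary illustrative hint, well below 2 GiB (an assumption). */
#define LOW_ARENA_HINT ((void *)0x10000000)

/* Hypothetical helper: maps len bytes of read/write memory and returns
   it only if the whole range lies below 4 GiB; otherwise returns NULL. */
void *alloc_low_arena(size_t len)
{
    if (len == 0 || len > UINT32_MAX)
        return NULL;

    void *p = mmap(LOW_ARENA_HINT, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The kernel may ignore the hint, so check the actual placement. */
    uintptr_t first = (uintptr_t)p;
    if (first > UINT32_MAX || len - 1 > UINT32_MAX - first) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Hypothetical helper: the 32-bit form of an address inside the arena. */
uint32_t arena_addr32(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    assert(addr <= UINT32_MAX); /* holds for pointers from alloc_low_arena */
    return (uint32_t)addr;
}
```

A caller that needs the stricter bottom-2GB guarantee from the answer could compare against INT32_MAX instead of UINT32_MAX in both checks.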

Is there any standards compliant way to do this?

No, the C standard makes conversions from pointers to integers implementation-defined, per C 2024 6.3.3.3, so there is no way to guarantee based on the standard alone that pointers into the low 4 GiB of address space can be stored using 32-bit integers (see footnotes 1 and 2).

If not, do GCC/Clang provide any implementation specific guarantees?

For GCC, there is barely a guarantee, when it is completed with additional platform documentation. For Clang, documentation seems lacking.

The GCC 14.2 Manual documents in clause 4.7 that “A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends if the pointer representation is smaller than the integer type, otherwise the bits are unchanged” and “A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.”

Thus, if a pointer representation contains only zero bits above the low 32 bits, you can convert it directly to a uint32_t in GCC without losing any information, and you can convert it back to the pointer type to restore the original pointer value.

We still need to know how pointers are represented. They are not necessarily plain hardware addresses. Pointer representation is nominally covered in the GCC manual by 4.15, which says the bytes encoding an object, other than as specified by the C standard, are “Determined by ABI.”

Thus, if you are using a platform that uses plain hardware addresses (per its ABI) and are using GCC 14.2 (or any other version documented as described above), then pointers to locations in the low 4 GiB of the address space can be converted to unsigned 32-bit integers, stored as such, and converted back to the original pointer type to restore the original value.

Note that you should use an unsigned type. The conversion of an address in the second 2 GiB would set the sign bit of a signed 32-bit integer, and then the conversion back to the pointer type would sign-extend that, producing an address different from the original.
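
The difference between the two widening rules can be sketched with plain integers, using uint64_t to stand in for a 64-bit pointer representation so that no invalid pointer is ever formed. This is only a model of the extension rule quoted from the GCC manual, not a test of GCC's pointer casts themselves, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models widening an unsigned 32-bit value to 64 bits: zero-extension,
   so the upper 32 bits of the result are always zero. */
uint64_t widen_from_unsigned(uint32_t stored)
{
    return (uint64_t)stored;
}

/* Models widening a signed 32-bit value to 64 bits: sign-extension,
   so a set bit 31 is copied into all of the upper 32 bits. */
uint64_t widen_from_signed(int32_t stored)
{
    return (uint64_t)(int64_t)stored;
}
```

For an address in the second 2 GiB, such as 0x80001000, the unsigned path gives back 0x0000000080001000, while the signed path yields 0xFFFFFFFF80001000, which is a different address.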

I do not see that the Clang documentation defines conversions between pointers and integers, although it could be buried in the documentation somewhere.

Footnotes

1 Actually, the question is improperly phrased. Any source code that is accepted by at least one C implementation is conforming to the C standard, even if its behavior varies between C implementations or is not defined at all by the C standard. The desired question is whether there is strictly conforming code that is guaranteed to store the desired pointers using 32-bit integers, that is, code that works in all C implementations, not just one.

2 Some conversions of pointers to integers are undefined rather than implementation-defined, namely when the result of the conversion would not fit in the destination type. However, this would be easily avoided by converting to uintptr_t and then to uint32_t, provided these optional types are available.
