I was looking at ReadDirectoryChangesW and came across something I've not seen in the MS documentation.
Quote: [out] lpBuffer
A pointer to the DWORD-aligned formatted buffer in which the read results are to be returned.
I thought Windows was pretty relaxed about buffer alignment, but later
Quote: ReadDirectoryChangesW fails with ERROR_NOACCESS when the buffer is not aligned on a DWORD boundary.
Wonder why align 4 is so important?
The alignment is important because the FILE_NOTIFY_INFORMATION structure is variable in length, because of its last member, FileName.
The other structure members are DWORDs, so you must align at 4 bytes (at least?).
8-byte and 16-byte alignment should also work?! But this is just a guess on my part.
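To put the guess about 8 and 16 into code: any address that is a multiple of 16 is also a multiple of 8 and of 4, so over-aligning cannot hurt. This is only a minimal portable sketch; `is_aligned` and `notify_buffer` are made-up names for illustration, not part of any Windows API.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p sits on a multiple of `alignment` (must be a power of two). */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical notification buffer; 16-byte alignment implies 8 and 4. */
static alignas(16) unsigned char notify_buffer[4096];
```

Passing `notify_buffer` would satisfy the DWORD rule; passing `notify_buffer + 1` would not.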
Quote from: Greenhorn on December 19, 2024, 07:12:59 AM
The alignment is important because the FILE_NOTIFY_INFORMATION structure is variable in length, because of its last member, FileName.
The other structure members are DWORDs, so you must align at 4 bytes (at least?).
8 byte and 16 byte alignment should also work ?! But this is just a guess by myself.
There was a discussion a couple of years ago about this. Marinus suggested that I align to 16 and 32 for performance in certain code.
I just checked, out of curiosity, and I aligned to 32 in the most important loop, and to 16 in some other ones. Apart from that, I used 4.
Well, nobody has yet answered sinsi's question:
Quote: Wonder why align 4 is so important?
My guess®™ is that somewhere in the code for ReadDirectoryChangesW() they explicitly check the buffer alignment, I'm guessing in the interest of speed.
Otherwise, how could a function possibly fail if data wasn't DWORD aligned?
Are there any x86/x64 instructions that would fail under that circumstance?
Quote from: NoCforMe on March 30, 2025, 11:21:41 AM
Well, nobody has yet answered sinsi's question:
Quote: Wonder why align 4 is so important?
Maybe nobody here knows for certain.
Quote: My guess®™ is that somewhere in the code for ReadDirectoryChangesW() they explicitly check the buffer alignment, I'm guessing in the interest of speed.
https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-readdirectorychangesw
ERROR_NOACCESS when buffer is not aligned on a DWORD boundary.
But of course, it does not explain why. :badgrin:
The ultimate answer: Because Microsoft says so. :tongue:
Raymond Chen might know, he seems pretty knowledgeable about MS inner workings.
Misaligning the buffer by 1 and following the code in a debugger I get to NtNotifyChangeDirectoryFileEx.
That fails with NTSTATUS 0x80000002 which, funnily enough, is
Quote: 0x80000002
STATUS_DATATYPE_MISALIGNMENT
{EXCEPTION} Alignment Fault
A data type misalignment was detected in a load or store instruction.
Still not sure why it causes a fault, but I need a break from debugging.
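As a side note on reading that value: the top two bits of an NTSTATUS are its severity, so 0x80000002 is a warning-class code rather than an error-class one, although it still counts as a failure for the usual "less than zero" test. A small portable sketch follows; the helper names are invented here and only mirror the NT_SUCCESS / NT_ERROR macros from the Windows headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DATATYPE_MISALIGNMENT 0x80000002u

/* Top two bits: 0 = success, 1 = informational, 2 = warning, 3 = error. */
static unsigned nt_severity(uint32_t status)
{
    return status >> 30;
}

/* Same test as NT_SUCCESS: the value is non-negative when read as signed. */
static bool nt_success(uint32_t status)
{
    return (int32_t)status >= 0;
}

/* Same test as NT_ERROR: severity bits are both set. */
static bool nt_error(uint32_t status)
{
    return nt_severity(status) == 3;
}
```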
Quote from: sinsi on March 30, 2025, 03:04:09 PM
Misaligning the buffer by 1 and following the code in a debugger I get to NtNotifyChangeDirectoryFileEx.
That fails with NTSTATUS 0x80000002 which, funnily enough, is
Quote: 0x80000002
STATUS_DATATYPE_MISALIGNMENT
{EXCEPTION} Alignment Fault
A data type misalignment was detected in a load or store instruction.
Still not sure why it causes a fault, but I need a break from debugging.
Hmm; so is that a bona fide hardware fault, or something generated by Microsoft's code in the function?
What load or store instructions will fault because of misalignment?
Not just generic MOVs, I don't think.
Quote from: NoCforMe on March 30, 2025, 05:51:33 PM
Hmm; so is that a bona fide hardware fault, or something generated by Microsoft's code in the function?
What load or store instructions will fault because of misalignment?
Not just generic MOVs, I don't think.
It's exception 0x11 (#AC, alignment check). It requires bit 18 (AM) to be set in register CR0; then setting flag AC (also bit 18) in EFLAGS should activate the check. All MOVs, PUSHes, POPs and BTx instructions are affected, but only in ring 3.
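The three conditions above can be written down as a simple predicate. This is only a model of the rule as described (CR0.AM set, EFLAGS.AC set, running at privilege level 3); it does not read or change any real CPU register, and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (UINT32_C(1) << 18)  /* CR0 alignment mask bit */
#define EFLAGS_AC (UINT32_C(1) << 18)  /* EFLAGS alignment check flag */

/* True when a misaligned access would raise #AC under the rule above. */
static bool alignment_check_armed(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}
```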
Quote from: NoCforMe on March 30, 2025, 11:21:41 AM
Are there any x86/x64 instructions that would fail under that circumstance?
Not that I am aware of, but I could be wrong. There are plenty of SIMD instructions that require align 16, but align 4? Never seen.
Quote from: _japheth on March 30, 2025, 06:04:56 PM
Quote from: NoCforMe on March 30, 2025, 05:51:33 PM
Hmm; so is that a bona fide hardware fault, or something generated by Microsoft's code in the function?
What load or store instructions will fault because of misalignment?
Not just generic MOVs, I don't think.
It's exception 0x11; requires bit 18 to be set in register CR0; then setting flag AC ( also bit 18 ) in register EFL should activate the check. All MOVs, PUSHs, POPs, BTxs are affected, but just in ring 3.
So, are you saying that this exception must be enabled by setting those bits? Otherwise it won't occur?
Why would anyone do such a thing?
Seems to me (unless I'm quite wrong here) that Micro$oft is being a bit anal by penalizing the caller for having a misaligned buffer. Why not just let it be aligned wherever, and if the caller takes a performance penalty for a misaligned buffer, then it's on them?
And I forget just which rings things execute in: does the kernel run in ring 3?
Quote from: NoCforMe on March 31, 2025, 09:55:30 AM
So, are you saying that this exception must be enabled by setting those bits? otherwise it won't occur?
Yes.
Quote: Why would anyone do such a thing?
To detect misaligned memory accesses, perhaps? Those cause performance penalties.
Quote: Seems to me (unless I'm quite wrong here) that Micro$oft is being a bit anal by penalizing the caller for having a misaligned buffer.
For Windows apps, the alignment check is off in MS Windows - I guess too many apps would crash if it were on.
Quote: And I forget just which rings things execute in: does the kernel run in ring 3?
No, your app runs in ring 3, the kernel in ring 0.
Quote from: _japheth on March 31, 2025, 01:26:01 PM
Quote from: NoCforMe on March 31, 2025, 09:55:30 AM
Seems to me (unless I'm quite wrong here) that Micro$oft is being a bit anal by penalizing the caller for having a misaligned buffer.
For Windows Apps alignment check is off in MS Windows - I guess, too many apps would crash if it's on.
But but but ... now we're back to square one:
If that's true, then why did sinsi get a fault back there in this post (https://masm32.com/board/index.php?msg=137697) (#5)?
Again: was that an actual hardware fault caused by a CPU instruction, or something somehow generated by the Win32 function? (Or a hardware fault that was caught by C++ code?)
Very confused here ...
Quote: Again: was that an actual hardware fault caused by a CPU instruction, or something somehow generated by the Win32 function? (Or a hardware fault that was caught by C++ code?)
if (((ULONG_PTR)buffer) & 3)
{
    /* buffer is not on a 4-byte boundary */
    ExRaiseDatatypeMisalignment();
}
That's what happens in the kernel for pretty much every native function that takes a struct parameter.
You don't often see it fail like that in 32-bit apps on 64-bit Windows, because if you send in a misaligned buffer, you're only sending it to WoW, and WoW will call the kernel without dodgy buffers. With native-bitness apps, it's pretty easy to do. Take the struct passed to VirtualQuery, for instance: it needs 8-byte alignment on 64-bit because the first member of the struct is a pointer rather than a DWORD, as in the struct for ReadDirectoryChangesW.
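The difference between the two structs can be seen with C11 `alignof`. The two types below are cut-down stand-ins written for this example (fixed-width integers instead of DWORD/WCHAR, and only the first pointer member of the VirtualQuery struct), so treat them as an illustration rather than the real Windows definitions.

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for FILE_NOTIFY_INFORMATION: nothing wider than a 32-bit DWORD. */
struct notify_info_like {
    uint32_t NextEntryOffset;
    uint32_t Action;
    uint32_t FileNameLength;
    uint16_t FileName[1];
};

/* Stand-in for a struct whose first member is a pointer. */
struct pointer_first_like {
    void    *BaseAddress;
    uint32_t Protect;
};
```

On a 64-bit build `alignof(struct pointer_first_like)` follows the pointer size, while `alignof(struct notify_info_like)` stays at the DWORD alignment.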
As for why, this:
Quote: To detect misaligned memory accesses, perhaps? Those cause performance penalties
NT has been ported to loads of different architectures; it doesn't just run on PCs. Some of those don't do misaligned memory reads (Itanium for one). Every platform does aligned reads, so you write one check to make sure the buffer is aligned and then you don't have to care about what the platform does if it isn't.
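A user-mode model of that "one check up front" idea might look like the following. It returns a status value instead of raising an exception, and `check_user_buffer` is a hypothetical name, so this is a sketch of the principle and not what the kernel literally does.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_SUCCESS               0x00000000u
#define STATUS_DATATYPE_MISALIGNMENT 0x80000002u

/* Reject the buffer before touching it if it is not on the required
   boundary; `alignment` must be a power of two (4 for a DWORD struct). */
static uint32_t check_user_buffer(const void *buffer, size_t alignment)
{
    if (((uintptr_t)buffer & (alignment - 1)) != 0)
        return STATUS_DATATYPE_MISALIGNMENT;
    return STATUS_SUCCESS;
}
```

After this single test, the rest of the code can read and write DWORD fields without caring whether the CPU tolerates misaligned access.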
Thanks for the excellent explanation.
a that buffer store linked lists, so it isn't BYTE nor WCHAR buffer.
Quote from: TimoVJL on March 31, 2025, 08:01:34 PM
a that buffer store linked lists, so it isn't BYTE nor WCHAR buffer.
Care to state that in English?
That API function parameter is a variable-size FILE_NOTIFY_INFORMATION struct, or several of them.
The struct's native alignment is DWORD (4).
Simple, isn't it?
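To make "several of them" concrete, here is a portable sketch of how such records chain through NextEntryOffset. The `notify_record` type is a stand-in for FILE_NOTIFY_INFORMATION and `put_record` / `count_records` are helper names invented for this example; rounding each record up to a multiple of 4 is an assumption made so that every header stays DWORD-aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FILE_NOTIFY_INFORMATION (DWORD = uint32_t, WCHAR = uint16_t). */
typedef struct {
    uint32_t NextEntryOffset;   /* bytes to the next record, 0 on the last */
    uint32_t Action;
    uint32_t FileNameLength;    /* in bytes, no terminator */
    uint16_t FileName[1];
} notify_record;

#define HEADER_BYTES offsetof(notify_record, FileName)

/* Write one record at `offset`; returns its size rounded up to 4 bytes. */
static size_t put_record(uint32_t *buf, size_t offset, uint32_t action,
                         const uint16_t *name, uint32_t name_bytes, int last)
{
    unsigned char *base = (unsigned char *)buf + offset;
    size_t size = (HEADER_BYTES + name_bytes + 3) & ~(size_t)3;
    uint32_t header[3] = { last ? 0u : (uint32_t)size, action, name_bytes };

    memcpy(base, header, sizeof header);
    memcpy(base + HEADER_BYTES, name, name_bytes);
    return size;
}

/* Follow NextEntryOffset from record to record and count the records. */
static size_t count_records(const uint32_t *buf)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t n = 0;

    for (;;) {
        uint32_t next;
        memcpy(&next, p, sizeof next);  /* NextEntryOffset is the first field */
        n++;
        if (next == 0)
            break;
        p += next;
    }
    return n;
}
```

Because the storage is an array of `uint32_t` and every record size is a multiple of 4, each record header lands on a DWORD boundary.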
This function is just a call to ReadDirectoryChangesExW (kernelbase.dll) whose last parameter is always set to 1, where 1 = ReadDirectoryNotifyInformation from _READ_DIRECTORY_NOTIFY_INFORMATION_CLASS (https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ne-minwinbase-read_directory_notify_information_class).
In Windows 10, the function as built for 32-bit apps looks like this:
BOOL __stdcall ReadDirectoryChangesW(
HANDLE hDirectory,
LPVOID lpBuffer,
DWORD nBufferLength,
BOOL bWatchSubtree,
DWORD dwNotifyFilter,
LPDWORD lpBytesReturned,
LPOVERLAPPED lpOverlapped,
LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine)
{
return ReadDirectoryChangesExW(
hDirectory,
(int)lpBuffer,
nBufferLength,
bWatchSubtree,
dwNotifyFilter,
(int)lpBytesReturned,
(int)lpOverlapped,
(int)lpCompletionRoutine,
1);
}
Since the function calls ReadDirectoryChangesExW, which can alternatively use the FILE_NOTIFY_EXTENDED_INFORMATION structure (which is bigger than FILE_NOTIFY_INFORMATION), it would be better to declare the buffer as an array of FILE_NOTIFY_EXTENDED_INFORMATION. That way it will be large enough to hold the array of FILE_NOTIFY_INFORMATION structures and probably won't need explicit alignment.
So, the buffer may consist of a pointer to FILE_NOTIFY_EXTENDED_INFORMATION
typedef struct _FILE_NOTIFY_EXTENDED_INFORMATION {
DWORD NextEntryOffset;
DWORD Action;
LARGE_INTEGER CreationTime;
LARGE_INTEGER LastModificationTime;
LARGE_INTEGER LastChangeTime;
LARGE_INTEGER LastAccessTime;
LARGE_INTEGER AllocatedLength;
LARGE_INTEGER FileSize;
DWORD FileAttributes;
union {
DWORD ReparsePointTag;
DWORD EaSize;
} DUMMYUNIONNAME;
LARGE_INTEGER FileId;
LARGE_INTEGER ParentFileId;
DWORD FileNameLength;
WCHAR FileName[1];
} FILE_NOTIFY_EXTENDED_INFORMATION, *PFILE_NOTIFY_EXTENDED_INFORMATION;
typedef struct _FILE_NOTIFY_INFORMATION {
DWORD NextEntryOffset;
DWORD Action;
DWORD FileNameLength;
WCHAR FileName[1];
} FILE_NOTIFY_INFORMATION, *PFILE_NOTIFY_INFORMATION;
Out of curiosity: the API also uses NtNotifyChangeDirectoryFileEx / ZwNotifyChangeDirectoryFileEx (https://learn.microsoft.com/en-us/previous-versions/mt812581%28v%3dvs.85%29). It doesn't seem to be a huge function, so perhaps it can be rewritten completely.
If anyone is interested in doing this, here is the C code (untested) that can be used to rebuild it later
BOOL __stdcall ReadDirectoryChangesW(
HANDLE hDirectory,
LPVOID lpBuffer,
DWORD nBufferLength,
BOOL bWatchSubtree,
DWORD dwNotifyFilter,
LPDWORD lpBytesReturned,
LPOVERLAPPED lpOverlapped,
LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine)
{
return ReadDirectoryChangesExW(
hDirectory,
(int)lpBuffer,
nBufferLength,
bWatchSubtree,
dwNotifyFilter,
lpBytesReturned,
lpOverlapped,
lpCompletionRoutine,
1);
}
int __stdcall ReadDirectoryChangesExW(
HANDLE hDirectory,
int lpBuffer,
int nBufferLength,
int bWatchSubtree,
int dwNotifyFilter,
_DWORD *lpBytesReturned,
_DWORD *lpOverlapped,
_DWORD *lpCompletionRoutine,
int ReadDirectoryNotifyInformationClass)
{
int v9; // ebx
PVOID v10; // edi
int v12; // eax
_DWORD *v13; // ecx
_DWORD *v14; // esi
NTSTATUS v15; // eax
int (__stdcall *v16)(int, int, int); // edx
NTSTATUS v17; // esi
int v18; // eax
int v19; // [esp-4h] [ebp-24h]
_DWORD v20[2]; // [esp+Ch] [ebp-14h] BYREF
int v21; // [esp+14h] [ebp-Ch]
int v22; // [esp+18h] [ebp-8h]
PVOID P; // [esp+1Ch] [ebp-4h] BYREF
v9 = 1;
v10 = 0;
P = 0;
if ( ReadDirectoryNotifyInformationClass == 1 )
{
v12 = 1;
v22 = 1;
}
else
{
if ( ReadDirectoryNotifyInformationClass != 2 )
{
RtlSetLastWin32Error(0x57u);
return 0;
}
v12 = 2;
v22 = 2;
}
v13 = lpOverlapped;
if ( !lpOverlapped )
{
v18 = NtNotifyChangeDirectoryFileEx(
hDirectory,
0,
0,
0,
v20,
lpBuffer,
nBufferLength,
dwNotifyFilter,
bWatchSubtree,
v12);
if ( v18 == 259 )
{
v18 = NtWaitForSingleObject(hDirectory, 0, 0);
if ( v18 < 0 )
goto LABEL_24;
v18 = v20[0];
}
if ( v18 >= 0 )
{
*lpBytesReturned = v20[1];
return v9;
}
LABEL_24:
BaseSetLastNTError(v18);
return 0;
}
v14 = lpCompletionRoutine;
if ( lpCompletionRoutine )
{
v21 = 0;
v15 = BasepAllocateActivationContextActivationBlock(lpOverlapped, &P);
if ( v15 < 0 )
{
BaseSetLastNTError(v15);
return 0;
}
v10 = P;
v13 = lpOverlapped;
if ( P )
{
v16 = (int (__stdcall *)(int, int, int))BasepIoCompletion;
v14 = P;
}
else
{
v16 = BasepIoCompletionSimple;
}
}
else
{
v16 = 0;
v21 = lpOverlapped[4];
v14 = (v21 & 1) == 0 ? lpOverlapped : 0;
}
v19 = v22;
*v13 = 259;
v17 = NtNotifyChangeDirectoryFileEx(
hDirectory,
v21,
v16,
v14,
v13,
lpBuffer,
nBufferLength,
dwNotifyFilter,
bWatchSubtree,
v19);
if ( (v17 & 0xC0000000) == 0xC0000000 )
{
if ( v10 )
BasepFreeActivationContextActivationBlock(v10);
BaseSetLastNTError(v17);
return 0;
}
return v9;
}
ULONG __thiscall BaseSetLastNTError(NTSTATUS Status)
{
ULONG v1; // esi
v1 = RtlNtStatusToDosError(Status);
RtlSetLastWin32Error(v1);
return v1;
}
int __fastcall BasepAllocateActivationContextActivationBlock(int a1, int a2, int a3, _DWORD *a4)
{
char v4; // bl
int v5; // esi
NTSTATUS v6; // eax
_DWORD *Heap; // eax
HANDLE pvBuffer; // [esp+Ch] [ebp-Ch] BYREF
int v10; // [esp+10h] [ebp-8h]
int v11; // [esp+14h] [ebp-4h]
v4 = a1;
v11 = a2;
pvBuffer = 0;
v10 = 0;
if ( a4 )
*a4 = 0;
if ( (a1 & 0xFFFFFFFC) != 0 )
return 0xC00000EF;
if ( !a4 )
return 0xC00000F2;
v6 = RtlQueryInformationActivationContext(1u, 0, 0, 1u, &pvBuffer, 8u, 0);
v5 = v6;
if ( v6 < 0 )
{
_DbgPrintEx(
0x33u,
0,
"SXS: %s - Failure getting active activation context; ntstatus %08lx\n",
"BasepAllocateActivationContextActivationBlock",
v6);
goto LABEL_19;
}
if ( (v10 & 1) != 0 )
{
RtlReleaseActivationContext(pvBuffer);
pvBuffer = 0;
}
if ( (v4 & 2) == 0 || pvBuffer )
{
Heap = RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, KernelBaseGlobalData, 0x10u);
*a4 = Heap;
if ( !Heap )
{
v5 = -1073741801;
goto LABEL_19;
}
*Heap = 0;
Heap[3] = pvBuffer;
pvBuffer = 0;
if ( (v4 & 1) != 0 )
*Heap |= 1u;
Heap[1] = v11;
Heap[2] = a3;
}
v5 = 0;
LABEL_19:
if ( pvBuffer )
RtlReleaseActivationContext(pvBuffer);
return v5;
}
void __stdcall BasepIoCompletion(PVOID ApcContext, PIO_STATUS_BLOCK IoStatusBlock, ULONG Reserved)
{
ULONG v3; // edi
ULONG_PTR Information; // ebx
struct _RTL_CALLER_ALLOCATED_ACTIVATION_CONTEXT_STACK_FRAME_EXTENDED Frame; // [esp+10h] [ebp-44h] BYREF
void (__thiscall *v6)(_DWORD, ULONG, ULONG_PTR, PIO_STATUS_BLOCK); // [esp+34h] [ebp-20h]
PVOID Context; // [esp+38h] [ebp-1Ch]
CPPEH_RECORD ms_exc; // [esp+3Ch] [ebp-18h]
Frame.Size = 36;
Frame.Format = 1;
memset(&Frame.Frame, 0, 0x1Cu);
if ( (IoStatusBlock->Status & 0xC0000000) == 0xC0000000 )
{
v3 = RtlNtStatusToDosError(IoStatusBlock->Status);
Information = 0;
}
else
{
v3 = 0;
Information = IoStatusBlock->Information;
}
Context = (PVOID)*((_DWORD *)ApcContext + 3);
v6 = (void (__thiscall *)(_DWORD, ULONG, ULONG_PTR, PIO_STATUS_BLOCK))*((_DWORD *)ApcContext + 1);
if ( (*(_BYTE *)ApcContext & 1) == 0 )
BasepFreeActivationContextActivationBlock((HANDLE *)ApcContext);
RtlActivateActivationContextUnsafeFast(&Frame, Context);
ms_exc.registration.TryLevel = 0;
v6(v6, v3, Information, IoStatusBlock);
ms_exc.registration.TryLevel = -2;
RtlDeactivateActivationContextUnsafeFast(&Frame);
}
BOOLEAN __thiscall BasepFreeActivationContextActivationBlock(HANDLE *P)
{
BOOLEAN result; // al
if ( P )
{
if ( P[3] )
{
RtlReleaseActivationContext(P[3]);
P[3] = 0;
}
return RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, P);
}
return result;
}
int __stdcall BasepIoCompletionSimple(int (__thiscall *a1)(_DWORD, ULONG, NTSTATUS, NTSTATUS *), NTSTATUS *a2, int a3)
{
ULONG v3; // eax
NTSTATUS v4; // ecx
if ( (*a2 & 0xC0000000) == 0xC0000000 )
{
v3 = RtlNtStatusToDosError(*a2);
v4 = 0;
}
else
{
v4 = a2[1];
v3 = 0;
}
return a1(a1, v3, v4, a2);
}
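For anyone reading the listing above, a few of the bare numbers correspond to well-known constants: 259 is STATUS_PENDING, -1073741801 is 0xC0000017 (STATUS_NO_MEMORY), 0xC00000EF and 0xC00000F2 are STATUS_INVALID_PARAMETER_1 and STATUS_INVALID_PARAMETER_4, and the 0x57 passed to RtlSetLastWin32Error is the Win32 error ERROR_INVALID_PARAMETER. The lookup helper below is a made-up convenience for illustration only and covers just the NTSTATUS values.

```c
#include <stdint.h>

/* Name of a few NTSTATUS values that appear in the listing, or "?". */
static const char *nt_status_name(uint32_t status)
{
    switch (status) {
    case 0x00000103u: return "STATUS_PENDING";             /* 259 */
    case 0xC0000017u: return "STATUS_NO_MEMORY";           /* -1073741801 */
    case 0xC00000EFu: return "STATUS_INVALID_PARAMETER_1";
    case 0xC00000F2u: return "STATUS_INVALID_PARAMETER_4";
    default:          return "?";
    }
}
```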
The noticeable speed difference is when I change movsb/stosb to movsd/stosd when running 16-bit DOS.
I have treasure-box pop-up code using xmm regs with a 16-char string = 8 + 8 chars: it randomly chooses an 8-char material and an 8-char object that you find in the treasure chest, together with a random amount of coin, capped by which level you are on.
Like this: "mythril" + " sword"
"daydreamer": what the hell does this have to do with buffer alignment?
And what's with that blond girl picture? Not you, we know ...