GlobalAlloc probably uses the default process heap (or maybe a heap of its own), which usually has 1 MB reserved at process start (set by the linker's /HEAP switch). So when you allocate small pieces, it just grabs the first suitable block from that 1 MB and returns it, with no need to "physically" (re)allocate the heap.
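To make the "just grabs the first suitable block" idea concrete, here is a minimal first-fit sketch in portable C. It is only an illustration under stated assumptions, not how the Windows heap manager is actually implemented: the names `Arena`, `arena_init`, `arena_alloc` and `arena_free` are hypothetical, and the fixed 1 MB buffer merely stands in for the pre-reserved heap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE ((size_t)1 << 20)  /* 1 MB, standing in for the pre-reserved heap */
#define MAX_BLOCKS 64                 /* hypothetical limit, keeps the sketch simple */

typedef struct {
    size_t offset;   /* start of the block inside the arena */
    size_t size;     /* block length in bytes */
    int    used;     /* nonzero while the block is handed out */
} Block;

typedef struct {
    unsigned char memory[ARENA_SIZE]; /* obtained once, up front */
    Block  blocks[MAX_BLOCKS];        /* block list, sorted by offset */
    size_t count;
} Arena;

/* Start with one free block that covers the whole arena. */
void arena_init(Arena *a)
{
    a->blocks[0].offset = 0;
    a->blocks[0].size   = ARENA_SIZE;
    a->blocks[0].used   = 0;
    a->count = 1;
}

/* First fit: return the first free block that is large enough.
   No request goes to the OS, which is why this path is cheap. */
void *arena_alloc(Arena *a, size_t size)
{
    if (size == 0 || size > ARENA_SIZE) return NULL;
    size_t need = (size + 15u) & ~(size_t)15u;   /* round up to 16 bytes */

    for (size_t i = 0; i < a->count; i++) {
        Block *b = &a->blocks[i];
        if (b->used || b->size < need) continue;

        /* Split off the unused tail if there is room in the block list. */
        if (b->size > need && a->count < MAX_BLOCKS) {
            memmove(&a->blocks[i + 2], &a->blocks[i + 1],
                    (a->count - i - 1) * sizeof(Block));
            a->blocks[i + 1].offset = b->offset + need;
            a->blocks[i + 1].size   = b->size - need;
            a->blocks[i + 1].used   = 0;
            b->size = need;
            a->count++;
        }
        b->used = 1;
        return a->memory + b->offset;
    }
    /* Nothing fits: a real heap would have to grow here (the slow path). */
    return NULL;
}

/* Mark the block free and merge it with a free successor, if any. */
void arena_free(Arena *a, void *p)
{
    if (p == NULL) return;
    size_t off = (size_t)((unsigned char *)p - a->memory);

    for (size_t i = 0; i < a->count; i++) {
        if (a->blocks[i].used && a->blocks[i].offset == off) {
            a->blocks[i].used = 0;
            if (i + 1 < a->count && !a->blocks[i + 1].used) {
                a->blocks[i].size += a->blocks[i + 1].size;
                memmove(&a->blocks[i + 1], &a->blocks[i + 2],
                        (a->count - i - 2) * sizeof(Block));
                a->count--;
            }
            return;
        }
    }
    assert(!"pointer was not allocated from this arena");
}
```

In this sketch a small request costs only a short scan of the block list. A request that does not fit returns NULL, which marks the point where a real heap would have to ask the OS for more memory.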
Results:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
879 cycles per kByte for HeapAlloc 00001000h bytes (4 kB)
4478 cycles per kByte for VirtualAlloc 00001000h bytes (4 kB)
565 cycles per kByte for GlobalAlloc 00001000h bytes (4 kB)
494 cycles per kByte for HeapAlloc 00004000h bytes (16 kB)
2954 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
145 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
5771 cycles per kByte for HeapAlloc 00010000h bytes (64 kB)
2951 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
726 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
1102 cycles per kByte for HeapAlloc 00040000h bytes (256 kB)
3142 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
18 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
3066 cycles per kByte for HeapAlloc 00100000h bytes (1 MB)
3199 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
3110 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
3067 cycles per kByte for HeapAlloc 00400000h bytes (4 MB)
3045 cycles per kByte for VirtualAlloc 00400000h bytes (4 MB)
3036 cycles per kByte for GlobalAlloc 00400000h bytes (4 MB)
Memory was touched
Edit: the above is really true only for the heap functions in general. VirtualAlloc is slow because it has to reserve/commit the memory, while the heap functions just take a suitable piece from the existing block; reallocation occurs only when the block is too small for the requested piece.
As for the difference between GlobalAlloc and HeapAlloc, below are the results after the code was changed to pass GMEM_ZEROINIT to GlobalAlloc:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
636 cycles per kByte for HeapAlloc 00001000h bytes (4 kB)
4639 cycles per kByte for VirtualAlloc 00001000h bytes (4 kB)
762 cycles per kByte for GlobalAlloc 00001000h bytes (4 kB)
443 cycles per kByte for HeapAlloc 00004000h bytes (16 kB)
2599 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
475 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
5648 cycles per kByte for HeapAlloc 00010000h bytes (64 kB)
2818 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
1748 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
1007 cycles per kByte for HeapAlloc 00040000h bytes (256 kB)
3049 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
965 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
3088 cycles per kByte for HeapAlloc 00100000h bytes (1 MB)
3090 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
3025 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
3193 cycles per kByte for HeapAlloc 00400000h bytes (4 MB)
3056 cycles per kByte for VirtualAlloc 00400000h bytes (4 MB)
2950 cycles per kByte for GlobalAlloc 00400000h bytes (4 MB)
Memory was touched