Friday, May 6, 2011

Large buffers vs Large static buffers, is there an advantage?

Hello. Consider the following code.

Is DoSomething1() faster than DoSomething2() over 1000 consecutive executions? I would assume that calling DoSomething1() 1000 times would be faster than calling DoSomething2() 1000 times.

Is there any disadvantage to making all my large buffers static?

#include <string.h>

#define MAX_BUFFER_LENGTH (1024*5)

void DoSomething1()
{
    static char buf[MAX_BUFFER_LENGTH];   /* static storage: allocated once for the whole program */
    memset( buf, 0, MAX_BUFFER_LENGTH );
}

void DoSomething2()
{
    char buf[MAX_BUFFER_LENGTH];          /* automatic storage: lives on the stack for this call */
    memset( buf, 0, MAX_BUFFER_LENGTH );
}

Thank you for your time.

From Stack Overflow
  • Disadvantages of static buffers:

    • If you need to be thread-safe, then using static buffers probably isn't a good idea.
    • Memory won't be freed until the end of your program, which makes your memory consumption higher.

    Advantages of static buffers:

    • There are fewer allocations with static buffers; you don't need to allocate on the stack each time.
    • With a static buffer, there is less chance of a stack overflow from too large an allocation.
    David Thornley: Stack allocation is quick, so I wouldn't worry about that. Stack overflow is something I would seriously worry about, though.
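
    A hypothetical illustration of those points (my example, not from the answer): a function that hands out its static buffer keeps that memory alive for the whole program, and a later call silently overwrites the result an earlier caller still holds.

    // Sketch only: a classic hazard of returning a pointer to a static buffer.
    #include <cstdio>

    static const char *FormatId(int id) {
        static char buf[32];                 // lives for the whole program
        std::snprintf(buf, sizeof buf, "id-%d", id);
        return buf;                          // caller gets a pointer to shared storage
    }

    int main() {
        const char *a = FormatId(1);
        const char *b = FormatId(2);
        std::printf("%s %s\n", a, b);        // prints "id-2 id-2", not "id-1 id-2"
    }
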
  • There will hardly be any speed difference at all between the two. Allocating a buffer on the stack is very fast -- all it does is decrement the stack pointer by a value. If you allocate a very large buffer on the stack, though, there is a chance you could overflow your stack and cause a segfault/access violation. Conversely, if you have a lot of static buffers, you'll increase your program's working set size considerably, although this will be mitigated somewhat if you have good locality of reference.

    Another major difference is that stack buffers are thread-safe and reentrant, whereas static buffers are neither thread-safe nor reentrant.
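
    As a middle ground (not in the original answer, and assuming a C++11 compiler), thread_local keeps the one-time, static-like allocation while removing the data race: each thread gets its own copy of the buffer, though the function is still not reentrant within a single thread.

    #include <cstring>

    #define MAX_BUFFER_LENGTH (1024*5)   // same constant as in the question

    void DoSomething3()
    {
        // One buffer per thread: static storage duration, but no sharing across threads.
        thread_local static char buf[MAX_BUFFER_LENGTH];
        std::memset( buf, 0, MAX_BUFFER_LENGTH );
    }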

  • The stack allocation is a little more expensive if you have /GS enabled in the VC++ compiler, which enables security checks for buffer overruns (/GS is on by default). Really, you should profile the two options and see which is faster; a rough timing sketch follows the listings below. It's possible that something like cache locality of the static memory versus the stack could make the difference.

    Here's the non-static version, compiled with the VC++ compiler at /O2:

    _main   PROC      ; COMDAT
    ; Line 5
        mov eax, 5124    ; 00001404H
        call __chkstk
        mov eax, DWORD PTR ___security_cookie
        xor eax, esp
        mov DWORD PTR __$ArrayPad$[esp+5124], eax
    ; Line 7
        push 5120     ; 00001400H
        lea eax, DWORD PTR _buf$[esp+5128]
        push 0
        push eax
        call _memset
    ; Line 9
        mov ecx, DWORD PTR __$ArrayPad$[esp+5136]
        movsx eax, BYTE PTR _buf$[esp+5139]
        add esp, 12     ; 0000000cH
        xor ecx, esp
        call @__security_check_cookie@4
        add esp, 5124    ; 00001404H
        ret 0
    _main   ENDP
    _TEXT   ENDS
    

    And here's the static version:

    ;   COMDAT _main
    _TEXT   SEGMENT
    _main   PROC      ; COMDAT
    ; Line 7
        push 5120     ; 00001400H
        push 0
        push OFFSET ?buf@?1??main@@9@4PADA
        call _memset
    ; Line 8
        movsx eax, BYTE PTR ?buf@?1??main@@9@4PADA+3
        add esp, 12     ; 0000000cH
    ; Line 9
        ret 0
    _main   ENDP
    _TEXT   ENDS
    END
    
    Daniel Earwicker: +1 for cache locality and profiling.
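
    For the profiling suggested above, here is a minimal timing sketch (my addition, not part of the original answer; it assumes a C++11 compiler plus the two functions from the question, and the numbers will depend heavily on the machine and on flags such as /O2 and /GS):

    #include <chrono>
    #include <cstdio>

    void DoSomething1();   // static buffer version from the question
    void DoSomething2();   // stack buffer version from the question

    template <typename F>
    long long TimeIt(F f, int iterations) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            f();
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }

    int main() {
        const int N = 1000;
        std::printf("static buffer: %lld us\n", TimeIt(DoSomething1, N));
        std::printf("stack buffer:  %lld us\n", TimeIt(DoSomething2, N));
    }
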
  • You could also consider putting your code into a class, e.g. something like:

    #include <cstring>

    const int MAX_BUFFER_LENGTH = 1024*5;
    class DoSomethingEngine {
      private:
        char *buffer;
      public:
        DoSomethingEngine() {
          buffer = new char[MAX_BUFFER_LENGTH];   // allocate the buffer once, on the heap
        }
        virtual ~DoSomethingEngine() {
           delete[] buffer;                       // new[] must be paired with delete[], not free()
        }
        void DoItNow() {
           memset(buffer, 0, MAX_BUFFER_LENGTH);
           // ...
        }
    };
    

    This is thread-safe if each thread just allocates its own engine. It avoids allocating a large quantity of memory on the stack. The allocation on the heap is a small overhead, but it is negligible if you reuse instances of the class many times.
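
    For illustration (my sketch, not part of the original answer), each thread allocates and reuses its own engine, so no locking is needed:

    #include <thread>

    void Worker() {
        DoSomethingEngine engine;        // uses the class above: one engine per thread, one allocation
        for (int i = 0; i < 1000; ++i)
            engine.DoItNow();            // reuses the same heap buffer on every call
    }

    int main() {
        std::thread t1(Worker), t2(Worker);
        t1.join();
        t2.join();
    }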

  • Am I the only one here working on multi-threaded software? Static buffers are an absolute no-no in that situation, unless you want to commit yourself to lots of performance-crippling locking and unlocking.

  • As others have said, stack allocation is very fast. The speedup from not having to reallocate each time is probably greater for more complex objects such as an ArrayList or HashTable (now List<> and Dictionary<,> in the generic world), where there is construction code to run each time; also, if the capacities are not set correctly, you get unwanted reallocations whenever the container reaches capacity and has to allocate new memory and copy the contents from the old memory to the new. For this reason I often have working List<> objects that I allow to grow to whatever size is required, and I reuse them by calling Clear() - which leaves the allocated memory/capacity intact. However, you should be wary of memory leaks if a rogue call that happens only rarely, or only once, allocates a lot of memory.
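
    The same reuse pattern in C++ (my sketch, not from the comment above): std::vector::clear() destroys the elements but keeps the allocated capacity, so a working buffer can grow once and then be reused without further allocations.

    #include <cstddef>
    #include <vector>

    void ProcessBatch(std::vector<char> &scratch, std::size_t needed) {
        scratch.clear();                  // size -> 0, capacity unchanged
        scratch.resize(needed, 0);        // reallocates only if needed exceeds capacity
        // ... use scratch as the working buffer ...
    }

    int main() {
        std::vector<char> scratch;
        for (int i = 0; i < 1000; ++i)
            ProcessBatch(scratch, 1024 * 5);   // after the first call, no new allocation
    }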
