Friday, May 6, 2011

Large buffers vs Large static buffers, is there an advantage?

Hello. Consider the following code.

Is DoSomething1() faster than DoSomething2() over 1000 consecutive executions? I would assume that calling DoSomething1() 1000 times would be faster than calling DoSomething2() 1000 times.

Is there any disadvantage to making all my large buffers static?

#include <string.h>

#define MAX_BUFFER_LENGTH (1024*5)

void DoSomething1()
{
    static char buf[MAX_BUFFER_LENGTH];   /* static storage: allocated once for the whole program */
    memset( buf, 0, MAX_BUFFER_LENGTH );
}

void DoSomething2()
{
    char buf[MAX_BUFFER_LENGTH];          /* automatic storage: lives on the stack for this call */
    memset( buf, 0, MAX_BUFFER_LENGTH );
}

Thank you for your time.

From Stack Overflow
  • Disadvantages of static buffers:

    • If you need to be thread-safe, then using static buffers probably isn't a good idea.
    • Memory won't be freed until the end of your program, which makes your memory consumption higher.

    Advantages of static buffers:

    • There are fewer allocations with static buffers; you don't need to allocate on the stack each time.
    • With a static buffer, there is less chance of a stack overflow from too large an allocation.
    David Thornley: Stack allocation is quick, so I wouldn't worry about that. Stack overflow is something I would seriously worry about, though.
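
    A hypothetical illustration of those points (my example, not from the answer): a function that hands out its static buffer keeps that memory alive for the whole program, and a later call silently overwrites the result an earlier caller still holds.

    // Sketch only: a classic hazard of returning a pointer to a static buffer.
    #include <cstdio>

    static const char *FormatId(int id) {
        static char buf[32];                 // lives for the whole program
        std::snprintf(buf, sizeof buf, "id-%d", id);
        return buf;                          // caller gets a pointer to shared storage
    }

    int main() {
        const char *a = FormatId(1);
        const char *b = FormatId(2);
        std::printf("%s %s\n", a, b);        // prints "id-2 id-2", not "id-1 id-2"
    }
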
  • There will hardly be any speed difference at all between the two. Allocating a buffer on the stack is very fast -- all it does is decrement the stack pointer by a value. If you allocate a very large buffer on the stack, though, there is a chance you could overflow your stack and cause a segfault/access violation. Conversely, if you have a lot of static buffers, you'll increase your program's working set size considerably, although this will be mitigated somewhat if you have good locality of reference.

    Another major difference is that stack buffers are thread-safe and reentrant, whereas static buffers are neither thread-safe nor reentrant.
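
    As a middle ground (not in the original answer, and assuming a C++11 compiler), thread_local keeps the one-time, static-like allocation while removing the data race: each thread gets its own copy of the buffer, though the function is still not reentrant within a single thread.

    #include <cstring>

    #define MAX_BUFFER_LENGTH (1024*5)   // same constant as in the question

    void DoSomething3()
    {
        // One buffer per thread: static storage duration, but no sharing across threads.
        thread_local static char buf[MAX_BUFFER_LENGTH];
        std::memset( buf, 0, MAX_BUFFER_LENGTH );
    }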

  • The stack allocation is a little more expensive if you have /GS enabled in the VC++ compiler, which enables security checks for buffer overruns (/GS is on by default). Really, you should profile the two options and see which is faster; a rough timing sketch follows the listings below. It's possible that something like cache locality of the static memory versus the stack could make the difference.

    Here's the non-static version, compiled with the VC++ compiler at /O2:

    _main   PROC      ; COMDAT
    ; Line 5
        mov eax, 5124    ; 00001404H
        call __chkstk
        mov eax, DWORD PTR ___security_cookie
        xor eax, esp
        mov DWORD PTR __$ArrayPad$[esp+5124], eax
    ; Line 7
        push 5120     ; 00001400H
        lea eax, DWORD PTR _buf$[esp+5128]
        push 0
        push eax
        call _memset
    ; Line 9
        mov ecx, DWORD PTR __$ArrayPad$[esp+5136]
        movsx eax, BYTE PTR _buf$[esp+5139]
        add esp, 12     ; 0000000cH
        xor ecx, esp
        call @__security_check_cookie@4
        add esp, 5124    ; 00001404H
        ret 0
    _main   ENDP
    _TEXT   ENDS
    

    And here's the static version:

    ;   COMDAT _main
    _TEXT   SEGMENT
    _main   PROC      ; COMDAT
    ; Line 7
        push 5120     ; 00001400H
        push 0
        push OFFSET ?buf@?1??main@@9@4PADA
        call _memset
    ; Line 8
        movsx eax, BYTE PTR ?buf@?1??main@@9@4PADA+3
        add esp, 12     ; 0000000cH
    ; Line 9
        ret 0
    _main   ENDP
    _TEXT   ENDS
    END
    
    Daniel Earwicker: +1 for cache locality and profiling.
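
    For the profiling suggested above, here is a minimal timing sketch (my addition, not part of the original answer; it assumes a C++11 compiler plus the two functions from the question, and the numbers will depend heavily on the machine and on flags such as /O2 and /GS):

    #include <chrono>
    #include <cstdio>

    void DoSomething1();   // static buffer version from the question
    void DoSomething2();   // stack buffer version from the question

    template <typename F>
    long long TimeIt(F f, int iterations) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            f();
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }

    int main() {
        const int N = 1000;
        std::printf("static buffer: %lld us\n", TimeIt(DoSomething1, N));
        std::printf("stack buffer:  %lld us\n", TimeIt(DoSomething2, N));
    }
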
  • You could also consider putting your code into a class, e.g. something like:

    #include <cstring>

    const int MAX_BUFFER_LENGTH = 1024*5;
    class DoSomethingEngine {
      private:
        char *buffer;
      public:
        DoSomethingEngine() {
          buffer = new char[MAX_BUFFER_LENGTH];   // allocate the buffer once, on the heap
        }
        virtual ~DoSomethingEngine() {
           delete[] buffer;                       // new[] must be paired with delete[], not free()
        }
        void DoItNow() {
           memset(buffer, 0, MAX_BUFFER_LENGTH);
           // ...
        }
    };
    

    This is thread-safe if each thread just allocates its own engine. It avoids allocating a large quantity of memory on the stack. The allocation on the heap is a small overhead, but it is negligible if you reuse instances of the class many times.
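
    For illustration (my sketch, not part of the original answer), each thread allocates and reuses its own engine, so no locking is needed:

    #include <thread>

    void Worker() {
        DoSomethingEngine engine;        // uses the class above: one engine per thread, one allocation
        for (int i = 0; i < 1000; ++i)
            engine.DoItNow();            // reuses the same heap buffer on every call
    }

    int main() {
        std::thread t1(Worker), t2(Worker);
        t1.join();
        t2.join();
    }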

  • Am I the only one here working on multi-threaded software? Static buffers are an absolute no-no in that situation, unless you want to commit yourself to lots of performance-crippling locking and unlocking.

  • As others have said, stack allocation is very fast. The speedup from not having to reallocate each time is probably greater for more complex objects such as an ArrayList or HashTable (now List<> and Dictionary<,> in the generic world), where there is construction code to run each time; also, if the capacities are not set correctly, you get unwanted reallocations whenever the container reaches capacity and has to allocate new memory and copy the contents from the old memory to the new. For this reason I often have working List<> objects that I allow to grow to whatever size is required, and I reuse them by calling Clear() - which leaves the allocated memory/capacity intact. However, you should be wary of memory leaks if a rogue call that happens only rarely, or only once, allocates a lot of memory.
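
    The same reuse pattern in C++ (my sketch, not from the comment above): std::vector::clear() destroys the elements but keeps the allocated capacity, so a working buffer can grow once and then be reused without further allocations.

    #include <cstddef>
    #include <vector>

    void ProcessBatch(std::vector<char> &scratch, std::size_t needed) {
        scratch.clear();                  // size -> 0, capacity unchanged
        scratch.resize(needed, 0);        // reallocates only if needed exceeds capacity
        // ... use scratch as the working buffer ...
    }

    int main() {
        std::vector<char> scratch;
        for (int i = 0; i < 1000; ++i)
            ProcessBatch(scratch, 1024 * 5);   // after the first call, no new allocation
    }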
