Thursday, September 22, 2016

Lazy Initialization Algorithm

This post presents my own understanding and analysis of lazy initialization algorithm explained in the book Elements of Programming Interviews, Chapter 6.

Say you are writing a program to manage memory and its initialization. When a user asks for a very large chunk of memory, say 1GB of size, and initialize all this big chuck of memory to some value, you are worried that it will take too long time to write out to the big block of memory address by address.

Is there any way to avoid brute-force initialization of the memory? Here is one way to avoid it and trick the user into believing as if the memory has been initialized without actually doing it.

1. If the user asks to read the value at specific address, you can first check whether that address has ever been written to since allocation of the memory (how? read on). If so, read the value at that address and return as requested. If not, don't even bother reading the value at that address---it will be just some garbage anyway as it is not initialized---and return what it should have been initialized to.

2. If the user asks to write some value at specific address, again check whether the address has been ever written to (how? read on). If so, just write out the value as requested to that address. If not, mark that location for "has been written" (how? read on) and write out the value as requested.

OK, you would agree that the scenario above makes sense, but there is one problem. How would you implement a method to mark which memory addresses have been written to?

The simplest method I can think of is to maintain a Boolean table for each address and mark, say 0 for not-written and 1 for written. Well, one problem with this method is that it requires initialization on its own, as you would have to start off with all-zero table, so it kind of defeats the whole purpose.

The second method is to store all the addresses where data have been written to in a hash table. The thing is that we do not know a priori the specific patterns of write addresses, so we may need to rehash often to spread out.

Finally, here is the alternative method to those mentioned above. I do not claim that this method is the best, as each implementation has its own pros and cons. I thought that this implementation is very clever that I should share.

Say the user asks for memory allocation for n elements with initialization. We then allocate (but not initialize) two more chunks of memory for n unsigned integers for indexes. The first will be called P, which will keep track of the number of indices thus far written to, which is maintained by a counter, say t. Another array will be called S, which will hold the index of write operation.

For each first-time write to index i, we write the counter value t into P[i] and write the index value i into S[P[i]], and increment the counter. To determine whether memory at index j has been written, first examine P[j] to see if it is between 0 to t-1. If so, it is likely that it has been written already. To verify with certainty, we double check to see if S[P[j]] is indeed j. If this holds, then it must be the case that this specific address has been already written. Otherwise, the address has not been written yet.

For better understanding, here is the actual code:


template <typename T>
class LazyArray {
private:
  unsigned size;
  T* A;
  unsigned *P;
  unsigned *S;
  unsigned t;
  T initVal;
  bool isWritten(unsigned idx) {
    if (P[idx] >= 0 && P[idx] < t)
      if (S[P[idx]] == idx)
        return true;
    return false;
  }

public:
  LazyArray(unsigned n, T initVal) {
    size = n;
    A = new T[n];
    P = new unsigned[2*n];
    S = &P[n];
    this->initVal = initVal;
    t = 0;
  }
  LazyArray(LazyArray &that) {
  }
  LazyArray& operator= (LazyArray& that) {
  }
  ~LazyArray() {
    delete[] A;
    delete[] P;
  }

  bool read(unsigned idx, T *pval) {
    if (idx<0 || idx>=size)
      return false;
    if (isWritten(idx))
      *pval = A[idx];
    else
      *pval = initVal;
    return true;
  }

  bool write(unsigned idx, T val) {
    if (idx<0 || idx>=size)
      return false;
    A[idx] = val;
    if (!isWritten(idx)) {
      P[idx] = t++;
      S[P[idx]] = idx;
    }
    return true;
  }

};


No comments:

Post a Comment