So, we all know how to generate arbitrary length K combinations from an arbitrary amount of different characters belonging to a set S. Here is a particularly simple algorithm. Something that is perfectly fine to use in a prototype.

const char data[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$"; const int MAX_LEVEL = 5; const int MAX_INDEX = 66; void go_deep(char *string, int level) { if (level == MAX_LEVEL) { // print or store return; } for (int i = 0; i < MAX_INDEX; i++) { string[level] = data[i]; go_deep(string, level+1); } }

Now that we have a way to generate all the combinations, what do we do with this? Store the combinations in memory? Store them in a file? If you do the math, you'll realize that even using small lengths for our strings results in several GBs of data. Clearly we need to serialize the production of each combination. And it would also be useful if we could stop and resume generating combinations whenever we want. That pretty much rules out any recursion based algorithm like the one above. Aha ! But we've all learnt that using a stack is the key to eliminating recursion. Yes, that is true, however using a simple stack and transforming call points into stack operations gains us nothing. The whole point is to avoid storing massive amounts of data. Looking further into our algorithm, we can re-imagine the sample space as a search space.

root ____(*)____ / | \ / | \ / | \ a b c ... / | \ / | \ / | \ a b c... a b c... a b c.. .. ..

So, trivially, what we're looking to do is implement DFS where each leaf node represents a unique combination. Another observation that we can make here is that the number of children and their order are fixed.

Using the algorithm above and applying it to this search space - lets note some key observations:

1. We need to maintain a 'bread crumb' history of the index 'i' whenever we travel down a level.

2. We don't need to store any characters at all. Given an index and the depth level, it is trivial to reconstruct its parent nodes right
up to root given a history of indexes.

That's all we need ! Here is a simplified algorithm that does accomplishes all our goals.

char data[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$"; const int MAX_LEVEL = 5; const int MAX_CHARS = 66; struct State { int *stk; // history of indexes from 0 to MAX_LEVEL for current state int lvl; // current level int idx; // current index State(int *_stk, int _lvl, int _idx) : stk(_stk), lvl(_lvl), idx(_idx) { } }; bool is_leaf(State *s) { return (s->lvl == MAX_LEVEL); } bool dfs_step_from(State *s) { if(s->idx < MAX_CHARS) { s->stk[s->lvl] = s->idx; // Save current, index before going 'down' s->idx=0; // Start a-fresh at child 0 for the level below us s->lvl++; // We're onto the next Level.. return true; // return true - we are still within the data range } else { return false; // return false. We're through all the current 'children', so go back up } } bool dfs_step_back(State *s) { s->lvl--; // Previous level s->idx = s->stk[s->lvl]; // get index from our stack s->idx++; // We've returned from exhausting the level below child N, move to next child N+1 if (s->idx >= MAX_CHARS && s->lvl == 0) return true; // We're at root **AND** no more children to process else return false; } void print_state(State *s) { for (int i = 0; i < MAX_LEVEL; i++) { printf("%c", data[s->stk[i]]); } printf("\n"); } void one_combination(State *s) { while(1) { if (!is_leaf(s)) { if(!dfs_step_from(s)) { if (dfs_step_back(s)) { //Set 'all done' flag break; } } } else { print_state(s); // do something with generated combination if (dfs_step_back(s)) { //Set 'all done' flag break; } break; } } } void go() { int indexes[MAX_LEVEL] = {0}; State *s = new State(indexes, 0, 0); one_combination(s); }

Let us look at what we have accomplished. The memory footprint of this simple design is tiny. We have eliminated recursion. We have used a fairly small data structure to store our execution state. We've architected the algorithm so that given any state we can begin execution from that point onwards. We pass around the *same* state object which means execution state can be stored at any time and reused later. So what can we do with this? As a simple design - We call our combination generator (using saved state) any time we need a few combinations pumped into a queue. At the other end a dispatcher sends each combination to a separate thread to process. A sophisticated approach can involve culling the search space into multiple non-overlapping areas and distribute the work over several compute resources.

Performance wise, there are some low hanging fruits. Most of the functions can be inlined and tweaked to reduce branching. The stack DS while small can be reduced further by replacing the integers with smaller data types since indexes wont be getting too large.