Updated

We want to write a function that takes a non-empty array of distinct integers and an integer representing a target sum. If any two numbers in the input array sum up to the target sum, the function should return them in an array, in any order. If no two numbers sum up to the target sum, the process should return an empty array.

Intended Audience

This article teaches you how to approach and solve the two summation challenge. Therefore, you should be familiar with some basic understanding of programming constructs. And also, we’d be covering the time and space complexity of this algorithm.


First Approach

My first approach would be to go through the array of integers in a brute force manner. Suppose we have an array of numbers [ 1, 2, 3 ]. We need to figure out all the two-element combinations it can have. If we think about it, we would probably end up doing something like this:

Array combination yields the set of pairs (1,2), (1,3), (2,3)
Figure 1: Array combination yields { (1,2),(1,3),(2,3) }\{\ (1,2), (1,3), (2,3) \ \}

Conceptually in this approach, we try to achieve a reducing set of combinations for two numbers and do some calculation with it. If we can align this approach as a solution to our challenge statement, we can write a brute force algorithm like the following:

  • An outer loop which goes through each of element until n1n-1
  • An inner loop which goes through n+1n+1
  • A condition to check the summation

Program Input — Say we have an array [ -1, 5, -4, 8, 7, 1, 3, 11 ] and a target sum 14. Now let’s transform above steps into a pseudocode.

function solve(array, targetSum) {
  for (let x = 0; x < array.length - 1; x++) {
    for (let y = x + 1; y < array.length; y++) {
      if (array[x] + array[y] === targetSum) {
        return [array[x], array[y]];
      }
    }
  }

  return [];
}

solve([-1, 5, -4, 8, 7, 1, 3, 11], 14) // [3, 11]

If we were to execute this algorithm, what are the different combinations we go through? Let’s write down the iterations and their respective combinations manually.

IterationChecked Combinations
1(-1,5), (-1,-4), (-1,8), (-1,7), (-1,1), (-1,3), (-1,11)
2(5,-4), (5,8), (5,7), (5,1), (5,3), (5,11)
3(-4,8), (-4,7), (-4,1), (-4,3), (-4,11)
4(8,7), (8,1), (8,3), (8,11)
5(7,1), (7,3), (7,11)
6(1,3), (1,11)
7(3,11)

Did you see that? In worst-case scenario we had to evaluate 2828 pairs and in the 77th iteration we found our matching number pair (3,11)(3, 11). However, it’s not the same always. It will change based on the indices and therefore combinations will be lesser if we break out of the loop after a successful match.

Combinations

In mathematics, combinations are the number of possible arrangements in a collection of items where the order of the selection does not matter.

What we just did is very much similar to enumerative combinatorics. You can see that our approach is awfully inefficient in doing that. There are techniques that one could use to optimize combination problems. But simply for our first approach, it is not needed at all. Instead, I’m leaving this here because this is just one way we could think of the problem.

If you are interested in learning combinatorics, you can refer to Sal’s explanation on Khan Academy. If you’re super serious about it, I’d recommend you read Concrete Mathematics and Enumerative Combinatorics books to solidify your knowledge in it.

Now let’s do a quick analysis of our first solution.

Problem Analysis

While this works pretty well for smaller arrays, it won’t scale nicely for larger arrays. The algorithm we just wrote is pretty slow.

Time Complexity — So, to reiterate, we have an outer loop that goes through the length(array)1\text{length}(\text{array}) - 1 and an inner loop that goes until length(array)\text{length}(\text{array}) to check the combination of two numbers. This means that we would need to iterate over n×nn \times n.

(n2)=x=0n1 (y=x+1nf(x, y))\large{{n \choose 2} = \sum_{x = 0}^{n - 1} \ \Biggl( \sum_{y = x+1}^{n} f(x,\ y) \Biggr)}

In worst-case scenario, we do (n2){n \choose 2}1 iterations just to check whether we have a matching pair.

This algorithm has a polynomial time complexity where it grows proportionally to the square of the input size nn. Unfortunately our algorithm takes O(n2)O(n^2) time to complete.

Space Complexity — We only use two for loops which involves constant variables as indexes xx and yy. And also, we don’t use any extra space. So, our algorithm’s space complexity will always be O(1)O(1).

We know that first approach is bad but can we improve the algorithm and make it a bit faster? What happens if we first sort the array?


Second Approach

There’s a second way of solving this problem. And it’s slightly better than the first one. Initially, in the challenge statement, I didn’t mention whether the array is sorted or not. So, what if we sort the array first in ascending order and then figure out a way to solve this?

Program Input — Say we are given a new array [ -4, 13, 1, 3, 5, 6, -1, 11 ] and a target sum of 10. Let’s use these inputs for our second approach.

First Operation

First, we have to sort the array in ascending order2. In order for the algorithm to work this must be done and then only we can continue.

Sorting the array in ascending order
Figure 2: Two summation approach 2 step 1 — sorting the array in ascending order

Second Operation

Then we can allocate two pointers from left and right to walk through the elements n1n - 1 times and operate on these two numbers.

Allocating pointers
Figure 3: Set xx to index 00 and yy to the last index n1n - 1

This way we can solve the problem more optimally instead of using two for loops. With a reasonable sorting algorithm like mergesort or quicksort we could sort the array in n log(n)n \ \log(n) time. But remember we still have to walk through our n1n - 1 times which is equivalent to O(n)O(n).

Third Operation (doing the summation)

So far, now we know the array must be sorted first, and we need two pointers to compare. The core logic of this approach is to drive algorithm’s state using three predicates. We need to check whether the sum of A+BA + B:

  1. Is it equal to target sum?
  2. Is it less than target sum?
  3. Is it greater than target sum?

Let’s try to write down the algorithm. Remember that, up to this point, we assume that we have already sorted the array and allocated the two pointers. Now it is time to evaluate the above conditions against each pair in every nnth iteration.

Our loop starts from x=0x = 0 and y=7y = 7. At this point our xx‘s element is 4-4 and yy‘s element is 1313 (see figure 4). If we add up those two numbers together, we get a total of 99 which is less than our target sum 1010. In this case we move the xx pointer to the right side. Basically, incrementing xx‘s index by 11. That way we can guarantee that in next iteration we would always get a sum >9> 9.

First iteration
Figure 4: Two summation approach 2 step 3 (loop) — first iteration

Alright, in the last iteration we moved xx by 11 and now we are at x=1x = 1 and y=7y = 7 (see figure 5). Once again, if we sum up 1-1 and 1313 we get a total of 1212. Now, this is larger than our expected target sum. In this case we move the yy pointer to left side. Saying that we want to decrement our yy pointer by 11.

Second iteration
Figure 5: Two summation approach 2 step 3 (loop) — second iteration

Got the point? We do this iteratively until matching the target sum or until xx and yy meets together at the same index.

Third iteration
Figure 6: Two summation approach 2 step 3 (loop) — third iteration

Well, would you look at that? We reduced the number of iterations we have to go through! We have accomplished significant progress in making this algorithm a bit faster. Now that we found our number pair, we can finally return the result and halt the algorithm. Let’s write the JavaScript code for this algorithm now.

function solve(array, targetSum) {
  const sorted = array.sort((a, b) => a - b);
  let x = 0;
  let y = sorted.length - 1;

  while (x < y) {
    if (sorted[x] + sorted[y] === targetSum) {
      return [sorted[x], sorted[y]];
    } else if (sorted[x] + sorted[y] < targetSum) {
      x++;
    } else {
      y--;
    }
  }

  return [];
}

While this approach is slightly better than the first, we are back to square one. Why? Well, it’s the same reason as before; it does not scale well enough for larger arrays. Let me show you the problem.

Problem Analysis

First, we need to understand the cost of sorting things. Browsers have implemented stable and well optimized algorithms for Array.sort(). But assume that the sort function chose mergesort as our sorting algorithm.

It is guaranteed that mergesort runs in O(n log(n))O(n \ \log(n)) time using O(n)O(n) space. And since we only used one while loop to drive the state of xx and yy pointers, we get O(n log(n))+O(n)O(n \ \log(n)) + O(n) which is equivalent to O(n log(n))O(n \ \log(n)) in total time3 and O(n)O(n) for space.

Big O time complexity graph
Figure 7: This graph is borrowed from the famous big o cheatsheet website by Eric Rowell

The algorithm we wrote runs in linearithmic time which tells us its complexity grows proportionally to the array input size with a logarithmic factor. What can we do about it? Can we solve it in linear time?


Dynamic Approach

Up until now, all the approaches we have taken is not very optimal from a time standpoint. Fortunately, there’s one other way of solving this problem in a much cooler way. You might have thought about this already from the previous approach. But first, let’s list down the things we already know:

  • We know what our target sum is (let it be zz).
  • We already know one of our addends4 (let it be xx).

So, basically we have two variables at hand before even doing any operations. So, we could write an equation like x+y=zx + y = z to represent it (where yy is unknown). Where we can isolate the unknown variable. Say for example, (x+y)=z(x + y) = z     \iff y=(zx)y = (z - x)

Inverse Operations

To isolate a variable, we cancel out (or undo) operations on the same side of the equation as the variable of interest while maintaining the equality of the equation.

This can be done by performing inverse operations on the terms that need to be removed so that the variable of interest is isolated. Subtraction cancels out addition, and vice versa, and multiplication cancels out division, and vice versa.

Now we can find the unknown variable yy without any combinations or two pointers. The only caveat is that we need a way to memorize this calculated value as we go through the array.

Solution

Using some extra space is okay as long as its complexity grows in a reasonable size. Now what do you think? For our solution should we use a hashmap? What about a set?

You’d see a lot of examples of two summation problem’s dynamic approach in the internet uses a hashmap auxiliary space implementation. But we really do not need key-value pairs for our solution. Instead, simply we can use a set of numbers to track the inversion results.

Formal Proof

R={iASAi}R = \{ i \in A \mid S - A_i \} iA,  P(i): (AiR AiR)\therefore \exists i \in A, \ \ P(i): \ (A_i \in R \ \lor A_i \notin R)

There exists ii such that may or may not be a member of RR.

Elaboration

We need a loop i=0n1\sum_{i=0}^{n - 1} that goes through each element of the array starting from index 00. We need to calculate the inverse set within the loop5 so, we create an empty set R=R = \emptyset and then for each element we calculate y=zAxy = z - A_x6 and now we can place a predicate that checks AxA_x existence in inverse set RR like P(x):(AxR)P(x): (A_x \in R) then return {Ax,y}\{ A_x, y \} if P(x)P(x) is true. Otherwise, ¬P(x)\neg P(x) we union our inverse set with the calculated yy value where RR {y}R \gets R \ \cup \{ y \} and we keep on looping until n1n - 1.

Switching to New Inputs — For this approach let’s use the array [ -7, -5, -3, -1, 0, 1, 3, 5, 7 ] and the target sum -5.

First iteration
Figure 8: First iteration of two summation dynamic approach

As illustrated, in the first iteration we start off with an empty set named RR. Initially our loop starts from index 00 where ii is the index variable. In the first iteration we don’t have any elements. So, therefore we immediately add the calculated value 2=5(7)2 = -5 - (-7) to the set RR and move on to the next element.

Second iteration
Figure 9: Second iteration of two summation dynamic approach

In the second iteration, first we check whether element at index 11 is an element of RR. We can see that 5R-5 \notin R so, we add our inverse calculation 0=5(5)0 = -5 - (-5) to set RR and continue.

Third iteration
Figure 10: Third iteration of two summation dynamic approach

In the third iteration, again we check whether element at index 22 is an element of RR. We can see that 3R-3 \notin R so, we do our inverse calculation 2=5(3)-2 = -5 - (-3) and add it to set RR and continue.

Fourth iteration
Figure 11: Fourth iteration of two summation dynamic approach

Fourth iteration already! Again we check whether element at index 33 is an element of RR. We can see that 1R-1 \notin R so, we do our inverse calculation 4=5(1)-4 = -5 - (-1) and add it to set RR and continue.

Fifth iteration
Figure 12: Fifth iteration of two summation dynamic approach

We are in the fifth iteration! And would you look at that! We just found 00 in our set RR. This means our inverse got a match! Now we can return these two elements like {Ax,5Ax}\{ A_x, -5 - A_x \} where AxA_x is 00.

Woohoo now we have an idea on how it works, let’s write the pseudocode.

function solve(array, targetSum) {
  const records = new Set();

  for (let x = 0; x < array.length; x++) {
    let y = targetSum - array[x];
    if (records.has(array[x])) {
      return [array[x], y];
    }
    records.add(y);
  }

  return [];
}

Time & Space Complexity

In this approach we solely rely on dynamic programming techniques. And we were able to solve it in O(n)O(n) time and O(n)O(n) space complexity. This is the optimal way of solving this problem.


Summary

Overall, I think even though two summation is a very easy challenge, we can learn a lot from it. How simple algebraic equations can help to solve complex problems more elegantly.

Until next time. Thanks for reading!

Footnotes

  1. Binomial derived from the combinations formula n!r!(nr)!\frac{n!}{r! \left ( n-r \right )!} — we can say that nn is the number of elements we have to check, and we have to choose rr of them without any order to check the summation.

  2. You can do it either way. We can sort in descending or in ascending order. But we have to make sure we swap the comparing indexes along with the conditions. But for now let’s stick to ascending order to make things simpler.

  3. Remember, in time and space complexity analysis we always pick the dominant term.

  4. What I’m referring to is the current element while walking through the array.

  5. We could also do a pre-computation of the inverse set before the loop. Since it does not affect the running time of the algorithm, it’s up to you to decide!

  6. xAx \in A

Well, now what?

You can navigate to more writings from here. Connect with me on LinkedIn for a chat.