Big O of String Concatenation

While studying Cracking the Code Interview book I came across a mathematical formula for Big O Notation when it describing the Big O Notation of String Concatenation in Java. Many of the principles in programming have mathematical backgrounds such as Big O Notation, methods, binary and the underlying science behind how programming languages work. This article is a guide for understanding the Big O Notation of string concatenation in Java. It contains excerpts from Cracking the Code Interview Book (Chapter 1).

Code for String Concatenation

The above code in Java describes a function that takes in array of string words (String [] words) as an argument to the function. The code block within the brackets {}indicates what we will do with those words. The first part indicates that we will create a variable named sentence that contains strings. The “” indicates that the variable starts off as an empty string. The for loop indicates that we will loop through array of words (each word is called a w (String w) and it is pushed into the sentence variable by adding it to the empty string sentence using the + w (sentence = sentence + w). Once the loop is finished going through the array it will return the sentence (return sentence part of code) with all of the words. In the Cracking the Code interview book Gayle describes the process for how each string is concatenated.

For those not familiar with Java, strings are considered immutable data types. A good article on what this means can be found here. When we concatenate a string a copy of the actual string is created and each character of the string is copied over. When looking at the Big O Notation which described the worst case scenario for the runtime of this code the Big O is described as O(xn²). If you are unfamiliar with Big O notation a great article can be found here. This is the part that puzzled me as I did not understand how x + 2x +… +nx leads to a Big O Notation of O(xn²). As someone who likes to deep dive into the mathematical principles of programming I decided to deep dive into mathematics behind this Big O. Researching this big O notation led me to a math website that described the Triangular Number Sequence. The images are below are taken from this website. The triangular number sequence describes the number of dots in a triangle.

We start off with one dot and keep adding rows to create the triangle. The first triangle with two rows has 3 dots in total because we added two more dots to the second row. When we move on to the next triangle we add a third row with 3 dots and the total number of dots equals to six. If you look carefully we can find a pattern related to counting the number of dots in a triangle. If you work backwards you see a pattern emerging for quickly finding the number of dots. If you have 4 rows then you can add the total number of dots by counting backwards. 4 + 3 +2 +1 = 10 dots. This rule for finding the dots applies to any number provided. For example, if you want to find the total number of dots a triangle with a base of 5 you will simply follow the same rule (5 + 4 + 3 + 2 +1) = 15 dots. Lets deep dive even further and figure out the mathematical formula calculating the number of dots for any number given such as 5 or 4 or 1,000,000.

N represents the number of dots we start off with (green dots). We double the dots with new dots (yellow dots) to form a rectangle. n becomes the length of the rectangle and the width will always be n +1. For example, if n is 3(length) then the width will be n +1 or (4). In order to find the area of the rectangular we multiple n * n +1. If we continue with the example we will get 4 * 3 or 12 as the area. However, we are concerned with the green dots only as the yellow dots were only used to find the area of the rectangle. In order to find only the total number of green dots we will divide n * n +1 by 1/2. This leads us to finding the total number of green dots as 6 for n = 3. This also lines up with the sum rule of 3 + 2 +1 or 6. We can use this rule now for calculating the total number of dots for any number using this formula. For example, if n was 60 you can use the formula to calculate that there will be 1830 dots.

This mathematical formula is related to string concatenation because on each concatenation a copy of the string is created and the strings are copied over. We can think of each green dot representing a string. If we add each concatenation then it follows as x +2x + 3x which follows the triangular number sequence formula. In Big O Notation we are concerned with input size and how the function performs with bigger input sizes. For concatenating a string no matter how big the array size the Big O will always follow this formula. Big O looks at the total number of operations or steps based on the input size. If the array was one million strings or just ten the Big O will follow this formula. In Big O notation we drop the lower order values so n(n +1)/2 is equal to (n² +n )/2 or just n². Which leads to string concatenation having a Big O of O(n²). In Java in order to avoid this situation with string concatenation you can use something called StringBuilder. An article on this can be found here.

I hope this article helped you to understand this mathematical concept when it comes to understanding Big O of string concatenation. I highly recommend bookmarking the websites listed in this article and buying the go to book for understanding coding interview questions “Cracking the Coding Interview Book”. (No I am not a paid sponsor for this book, I just love learning from it!!)

Aspiring Software Engineer and Gamer.