Image transformation with Python

Updated: Nov 20, 2020

Digital transformation can mean many things to many people. In this post, I avoid the bigger question and demonstrate how we can transform an image to provide more of a bird's eye view. All we need is Python, OpenCV and a little trial and error. Sometimes it's good to see things from a different perspective.

The challenge

My aim is to provide a way to create orthogonal images or basically to 'square up' images that have been taken from a different perspective. I took the first image (above) of a dummy credit card as a test case. I wanted a simple image to start with that has clearly defined boundaries. The text on the card provides a way to assess how well the overall transformation works.

The use case is largely for whiteboard images so a 2D image should not constrain us too much. There are commercial applications that will do this for you - but where's the fun in that! This prototype uses OpenCV for processing images.

OpenCV is an open source computer vision and machine learning software library. It is freely downloadable and has many features that will help with this challenge. The final image (image 6 above) is a result only of applying the Python code - with no further post-processing.

The entire application is a little less than 80 lines of code - with many of these lines added purely to show what's happening. Without the narrative, it should be possible to reduce this considerably. As an initial attempt, the outcome is reasonably accurate, and the smaller text seems at least as readable in the final image.

The processing workflow

Figure 1 shows the basic steps to transform the image of a credit card without technical details. The full code is available on GitHub and suggestions are welcome as always.

Step 1: Loading your image of choice

The load_image function allows us to read an image into memory and resize it as necessary.

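import cv2

def load_image(image_path, width, height):
    """Loads an image from the specified file and changes its size.

    :param image_path: The relative path & name of the image file.
    :param width: The required width of the image.
    :param height: The required height of the image.
    :return: A re-sized object representing the image.
    """
    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None instead of raising when it cannot read the file
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return cv2.resize(img, (width, height))  # dsize is (width, height), not (rows, cols)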

# Load an image and resize it.
test_image = load_image('card3.jpg', 800, 600)

Step 2: Creating an image mask

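The OpenCV threshold function allows us to create a mask for our image once it has been converted to greyscale with cvtColor. The COLOR_BGR2GRAY constant is one of many colour conversion codes that OpenCV supplies (OpenCV stores colour images in BGR channel order by default).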

def create_binary_image(image):
    """Creates a binary threshold of the image by
    classifying each pixel as either black or white.

    :param image: The (BGR) image to be processed.
    :return: A classified image.
    """
    # Convert to a single-channel greyscale image first.
    new_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Pixels brighter than 225 become 0 (black); all other pixels become 255 (white).
    _, new_image = cv2.threshold(new_image, 225, 255, cv2.THRESH_BINARY_INV)
    return new_image

threshold_image = create_binary_image(test_image)

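The cv2.THRESH_BINARY_INV constant determines how pixels are processed. With cv2.THRESH_BINARY, a pixel value greater than the threshold (225 here) is set to the maximum value (255) and every other pixel is set to 0; the _INV variant reverses this. The bright background therefore becomes black and anything darker, such as the card, becomes white. The thresholding process you apply will depend on the nature of your image.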
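To make these two steps concrete, here is a minimal NumPy-only sketch of what cvtColor and threshold do to a tiny made-up image. It assumes the weights OpenCV documents for COLOR_BGR2GRAY (Y = 0.299 R + 0.587 G + 0.114 B); OpenCV itself uses fixed-point arithmetic, so its grey values can differ by one in places. The helpers bgr_to_gray and threshold_binary_inv are hypothetical and exist only for illustration.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Weighted sum of the B, G and R channels, rounded back to 8-bit values."""
    b = image_bgr[..., 0].astype(np.float64)
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def threshold_binary_inv(gray, thresh, max_value):
    """Mimics THRESH_BINARY_INV: pixels above thresh -> 0, all others -> max_value."""
    return np.where(gray > thresh, 0, max_value).astype(np.uint8)


# A hypothetical 2 x 2 BGR image: white, near-white, mid grey and a dark blue pixel.
image = np.array([[[255, 255, 255], [230, 230, 230]],
                  [[128, 128, 128], [120, 20, 10]]], dtype=np.uint8)

gray = bgr_to_gray(image)                   # gray.tolist() -> [[255, 230], [128, 28]]
mask = threshold_binary_inv(gray, 225, 255)  # mask.tolist() -> [[0, 0], [255, 255]]
print(gray)
print(mask)
```

Both bright pixels fall above 225 and turn black, while the grey and dark blue pixels turn white, which is exactly the "background black, object white" mask the contour step relies on.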

Step 3: Adding contours

A contour is essentially a curve joining all the continuous points along a boundary that have the same colour or intensity. Contouring provides a useful tool for shape analysis and object detection. We are interested only in the object's boundary, and the binary image created in step 2 should work well enough for that.

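# Get an array of contours from the binary image we previously created.
# (OpenCV 4.x returns two values here; OpenCV 3.x returns three.)
contours, _ = cv2.findContours(threshold_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Keep only the largest contour by area, which should be the card's outline.
contours_restricted = sorted(contours, key=cv2.contourArea, reverse=True)[0]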

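The code above creates contours for our binary image. The cv2.CHAIN_APPROX_SIMPLE parameter removes redundant points (it keeps only the end points of horizontal, vertical and diagonal segments), so less memory is required. Sorting the contours by area and taking the first one keeps only the largest, which we expect to be the card.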
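cv2.contourArea is documented as using Green's formula, which for a polygon reduces to the familiar shoelace formula. The plain-Python sketch below, with hypothetical names (polygon_area, card, speck) and made-up coordinates, shows why ranking contours by area picks out the card rather than a small speck of noise.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as ordered (x, y) vertices."""
    twice_area = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Two hypothetical contours: a skewed card outline and a tiny 3 x 2 speck of noise.
card = [(10, 10), (310, 20), (300, 210), (5, 200)]
speck = [(50, 50), (53, 50), (53, 52), (50, 52)]

print(polygon_area(card))   # 56600.0
print(polygon_area(speck))  # 6.0

# Same result as sorted(contours, key=area, reverse=True)[0], without a full sort.
largest = max([speck, card], key=polygon_area)
```

Using max() instead of sorting is a small design choice: only the single largest contour is needed, so there is no reason to order the rest.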

Step 4: Finding the corners

This step uses something called the Ramer–Douglas–Peucker algorithm to find the corners of the object in question. It approximates a contour with a simpler shape that has fewer vertices, depending on the precision (epsilon) we specify. If we choose a good value for epsilon, it should work reasonably well.

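import numpy as np

def get_start_image_corners(source_image, image_contours):
    """Returns the start coordinates for the corners of the image.
    :param source_image: The image the contour came from (not used in this excerpt).
    :param image_contours: The contour of the object (here, the largest contour).
    :return: A list of [x, y] corner coordinates, sorted by x and then by y.
    """
    # Use 5% of the closed contour's perimeter as the approximation precision.
    epsilon = 0.05 * cv2.arcLength(image_contours, True)
    approx_corners = cv2.approxPolyDP(image_contours, epsilon, True)
    # approxPolyDP returns an N x 1 x 2 array; flatten it to a list of [x, y] pairs.
    approx_corners = sorted(np.concatenate(approx_corners).tolist())
    return approx_corners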

There is no need to understand this algorithm fully; a value of epsilon equal to 5% of the object's perimeter works well enough here.
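For the curious, here is a minimal plain-Python sketch of the Ramer–Douglas–Peucker idea applied to a closed contour, using the same 5% rule. It is not OpenCV's implementation: cv2.approxPolyDP chooses its own starting points for closed curves, so results on real contours can differ slightly. The helper names and the slightly wobbly rectangle are made up for illustration.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (or to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    # Find the interior point furthest from the chord joining the two end points.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        # Every point is close enough to the chord: keep only the end points.
        return [points[0], points[-1]]
    # Otherwise split at the furthest point and simplify both halves recursively.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def perimeter(points):
    """Perimeter of a closed polygon, like cv2.arcLength(contour, True)."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def simplify_closed(contour, fraction=0.05):
    """Close the contour, simplify it, then drop the repeated start point."""
    epsilon = fraction * perimeter(contour)
    return rdp(contour + [contour[0]], epsilon)[:-1]


# A hypothetical, slightly uneven rectangle traced as eight contour points.
contour = [(0, 0), (50, 1), (100, 0), (101, 30), (100, 60), (50, 59), (0, 60), (1, 30)]
corners = simplify_closed(contour)
print(corners)  # [(0, 0), (100, 0), (100, 60), (0, 60)]
```

With a perimeter of roughly 320 pixels, epsilon is about 16, so the one-pixel wobbles along each edge are discarded and only the four true corners survive.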

Step 5: Transform the image

In this step, the corner coordinates for the transformed image are calculated before being passed to OpenCV, which does all the heavy lifting. The code uses Pythagoras' theorem to calculate the width and height of the transformed image. It then finds the coordinates of each corner by using (0, 0) as the top left-hand corner and offsetting the other corners appropriately. In practice, I added an arbitrary offset to move the final image away from the edge of its canvas.

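def transform_image(source_image, start_pos, end_pos):
    """This is where the magic happens. OpenCV provides the 2 functions required to
    transform the image once the start and end corner coordinates are provided.

    :param source_image: The image to be transformed.
    :param start_pos: The corner coordinates in the original image.
    :param end_pos: The matching corner coordinates in the transformed image.
    :return: The transformed image.
    """
    h, w = source_image.shape[:2]
    # Convert the corner lists to float32 NumPy arrays, which findHomography accepts.
    start_pos = np.asarray(start_pos, dtype=np.float32)
    end_pos = np.asarray(end_pos, dtype=np.float32)
    # Estimate the 3x3 perspective matrix that maps start_pos onto end_pos.
    homography, _ = cv2.findHomography(start_pos, end_pos, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    # Warp every pixel; the output canvas keeps the size of the source image.
    transform = cv2.warpPerspective(source_image, homography, (w, h), flags=cv2.INTER_LINEAR)
    return transform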
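findHomography does considerably more than this (RANSAC outlier rejection and refinement), but for exactly four point pairs the underlying maths is a small linear system. The NumPy sketch below solves it directly with the direct linear transform, fixing the bottom-right entry of the 3 x 3 matrix at 1. The function names and sample corners are hypothetical; OpenCV's own cv2.getPerspectiveTransform computes the same four-point matrix without RANSAC.

```python
import numpy as np


def homography_from_four_points(src, dst):
    """Direct linear transform for H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h0..h7
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(a, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(h, point):
    """Map (x, y) through H in homogeneous coordinates, then divide by w."""
    x, y, w = h @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Hypothetical skewed card corners (TL, TR, BR, BL) and an upright 300 x 200 target.
src = [(10, 10), (310, 20), (300, 210), (5, 200)]
dst = [(0, 0), (300, 0), (300, 200), (0, 200)]

H = homography_from_four_points(src, dst)
for s, d in zip(src, dst):
    print(s, "->", apply_homography(H, s), "expected", d)
```

The division by w is what makes this a perspective (rather than affine) mapping: it lets parallel lines in the squared-up result converge in the original photo.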

The full code also has some simple utility functions that help to calculate the corner coordinates and image dimensions.
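Those utilities are not reproduced here, but a minimal sketch of the idea might look like the following. It assumes the corners must first be put in top-left, top-right, bottom-right, bottom-left order (the x-then-y sort in step 4 does not guarantee this) and uses the common x + y / y - x heuristic to do so. It then takes the longer of each pair of opposite edges as the output size. All function names and the 20-pixel offset are hypothetical.

```python
import math


def edge_length(p, q):
    """Pythagoras' theorem: straight-line distance between two (x, y) points."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def order_corners(points):
    """Order four (x, y) points as TL, TR, BR, BL using x + y and y - x."""
    tl = min(points, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    tr = min(points, key=lambda p: p[1] - p[0])  # large x, small y
    bl = max(points, key=lambda p: p[1] - p[0])  # small x, large y
    return [tl, tr, br, bl]


def destination_corners(corners, offset=20):
    """Upright rectangle corners (TL, TR, BR, BL), shifted away from the canvas edge."""
    tl, tr, br, bl = corners
    # Use the longer of each pair of opposite edges so no detail is squashed.
    width = round(max(edge_length(tl, tr), edge_length(bl, br)))
    height = round(max(edge_length(tl, bl), edge_length(tr, br)))
    return [(offset, offset),
            (offset + width, offset),
            (offset + width, offset + height),
            (offset, offset + height)]


# Hypothetical corners, sorted by x then y as in step 4.
sorted_corners = sorted([[10, 10], [310, 10], [300, 210], [5, 200]])
ordered = order_corners(sorted_corners)
targets = destination_corners(ordered)
print(ordered)  # [[10, 10], [310, 10], [300, 210], [5, 200]]
print(targets)  # [(20, 20), (320, 20), (320, 220), (20, 220)]
```

The x + y / y - x heuristic assumes the object is not rotated by close to 45 degrees; for strongly rotated shapes, ordering the points by angle around their centroid is more robust.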

A couple of other tests

I snapped a couple more images in my office to see how well the code handled them by comparing the before and after versions. First was the picture on my wall.

A much harder test was my whiteboard, which is really more of a textured rock board, as you can see below. It is actually two boards side by side, which leads to more skew in the original picture. The lighting and reflections also played their part.

I was still reasonably pleased with the outcome given the constraints and the limited time spent with the code.

Closing thoughts

The code shown here will hopefully point you in the right direction if you are interested in similar challenges. It is not perfect and there are many changes you can make to improve the way it works. I am thinking of extending my Sudoku solution to read images from a puzzle book, and the code here will certainly help. It won't handle the OCR needed to convert parts of the image into digits, but that's a challenge for another day.

I'm sure it would not be too hard to fool the code presented here. Still, this is meant as a guide to what is possible rather than a production-strength solution. Also, the code performs much better with a flat background, and I admit to blanking out the space behind my whiteboard before letting the algorithm loose on it. Pre-processing was not a goal of this exercise.

If you are looking at a larger digital transformation within your business then I'd be delighted to help. Please feel free to get in touch. My company, Objectivity, has been helping our clients for almost 30 years to derive business value from technology.
