Hacking images for your CNN

Convolutional neural networks (CNNs) are the state-of-the-art models for image classification, and when you get started with them, you usually build models on a famous dataset like MNIST or CIFAR.  If you want to create a model that does something novel, you'll actually have to collect your own data (if you're lucky, you can scrape it), and often these images won't be in the same compression format or the same size.  This post walks through one process I used to deal with this dilemma for a project I was doing with Python and TensorFlow:

  1. Collect data
  2. Organize data
  3. Import data near the model
  4. Pad or scale
  5. Crop
  6. Grey*
  7. Get tensor
  8. Add color channels*
  9. Batch generator

A lot of the techniques I used were inspired by Alex Krizhevsky’s implementation of AlexNet.

AlexNet homage

Originally, I attempted to use TensorFlow's built-in functions for this process.  I didn't have any luck, and my Python kernel kept crashing, which is why I went with PIL.  This method is probably slower than what TensorFlow offers, so keep that in mind when implementing your own models.

Collect images

Siraj Raval showed a neat Chrome extension called Fatkun Batch Download Image that lets you do a Google image search and collect a decent number of the returned images.  It turns out that all the images were JPEGs, which took care of the formatting issue.  It also downloads all the images into one folder, which will be useful in the next step.  Don't worry if the number of images is not the same across classes (cropping will allow us to amplify our dataset).

Organize data

It’ll be good to keep all images of a specific class in one directory; you can name each sub-directory the name of the class it is representing.  You also might want to split the data into a test and train set; I did this by hand.  So you should have two directories, one for train and one for test, each possessing sub-directories for each class.

The train directory holds 13 sub-directories (one folder for each class), where each class is a brand owned by Expedia, Inc., and inside each brand directory are the JPEG images of that brand's logo.
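If you'd rather not split the data by hand, a small script can do it.  This is a hedged sketch (the 80/20 ratio is just an assumption, and split_class_folder is a hypothetical helper, not part of the pipeline below):

```python
import os
import random
import shutil

def split_class_folder(src_dir, train_dir, test_dir, test_frac=0.2, seed=0):
    """Copy images from one class folder into train/test folders."""
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_frac)
    # first n_test shuffled files go to test, the rest to train
    for dest, names in [(test_dir, files[:n_test]), (train_dir, files[n_test:])]:
        os.makedirs(dest, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), os.path.join(dest, name))
```

You'd call this once per class folder, pointing train_dir and test_dir at same-named sub-directories under train/ and test/.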

Import data near the model

I was using Python for my project, so the next step was to figure out a way to access the images through Python.  This is where I diverged from the TensorFlow method, and I'm curious how much less efficient this method is in comparison (possible future post). We'll use glob to meander through the directories and PIL to import the images (we'll also use NumPy and collections, so import them now).

from PIL import Image
import glob
import numpy as np
import collections

We'll need to create a few helper functions and variables before we get to the meat of this section.  To start, create a helper function that gets the class of an image from its directory's path.

def getClass(path):
    return path.split('/')[-1]

Create a dictionary that maps the class to the class directory.

lab2path = {}
for pat in glob.glob('Data/Brands/train/*'):
    lab2path[getClass(pat)] = pat

Get a list of the classes from that dictionary’s keys.

classes = list(lab2path.keys())

Create a mapping from integers to classes.

int2class = {}
for i, c in enumerate(classes):
    int2class[i] = c

Now that we have some tools to work with, let’s create a function that returns PIL images from one class.

#given a class let it randomly select num_samples images
def getIMsamples(lab, num_samples=1):
    class_folder = lab2path[lab]

    result = []

    #gather every image path in the class folder
    image_files = []
    for f in glob.glob(class_folder + '/*'):
        image_files.append(f)

    samples = np.random.choice(image_files, size=num_samples)
    for s in samples:
        im = Image.open(s)
        result.append(im)
    return result

Pad or scale

Not all images are the same size, but they had better be if you're going to feed them into TensorFlow. What makes matters more complicated is that certain images might be tall or wide rectangles, and the long side of the rectangle might be on the same order of size as the other images' heights and widths.  For example, let's say that most of the images in our dataset are around 250×250 pixels and we have an image that is 150×250 pixels (a wide rectangle).  Let's also say that we require all images going into our model to be 200×200 pixels; we can ensure this with cropping as long as the images are at least that big.

If we stretch the image out so it's 200×250, this may cause some undesired distortion.

The left cat has been stretched and is distorted

Instead, we could simply pad a border (of zero-valued pixels) on the top and bottom of the image until the newly padded image meets the minimum size requirements.

Padding of two pixels on the top, bottom, left, and right of an image

This won't distort the image and will only add a 25-pixel-tall band to the top and bottom.

#pad the image with zero-valued pixels until it reaches size
def zeroPad(im, size):
    o_size = im.size

    #check width
    if size[0] > o_size[0]:
        new_im = Image.new("RGB", (size[0], o_size[1]))
        new_im.paste(im, ((size[0] - o_size[0]) // 2, 0))
    else:
        new_im = im

    o_size = new_im.size

    #check height
    if size[1] > o_size[1]:
        new_im2 = Image.new("RGB", (o_size[0], size[1]))
        new_im2.paste(new_im, (0, (size[1] - o_size[1]) // 2))
    else:
        new_im2 = new_im

    return new_im2

It’s unclear whether the stretching or padding should be implemented.  Maybe a combination of both is preferable.  Maybe neither should be implemented, and all the images should be around the same size beforehand.  This definitely requires more research.  For my project, I just stuck with padding.
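If you'd rather experiment with stretching, PIL's resize does it in one call.  A minimal sketch (the 200×200 target is just the example size from above, and stretchIM is a hypothetical helper):

```python
from PIL import Image

def stretchIM(im, size=(200, 200)):
    # resize forces the exact target size but can distort the aspect ratio
    return im.resize(size)
```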


Crop

The easiest way to ensure that all the images are the same size is with cropping.  We'll also be able to generate lots of data by doing random cropping.  For example, if we have a 250×250 pixel image and we want a 100×100 pixel crop, we can get (250-100+1)*(250-100+1) = 151^2 = 22,801 unique crops from that single image!

#given an image let it randomly crop to a fixed size
#size = [width,height]
def cropIM(im, size=[93, 37]):
    w, h = im.size
    w_max, h_max = w - size[0], h - size[1]
    #pick a random top-left corner for the crop
    l = np.random.randint(0, w_max + 1)
    d = np.random.randint(0, h_max + 1)
    return im.crop(box=(l, d, l + size[0], d + size[1]))
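As a quick sanity check on how many distinct crops an image allows (a small standalone sketch; the +1 is because both endpoint offsets are valid corner positions):

```python
def num_crops(im_size, crop_size):
    # count distinct top-left corners for a crop_size crop inside im_size
    w_positions = im_size[0] - crop_size[0] + 1
    h_positions = im_size[1] - crop_size[1] + 1
    return w_positions * h_positions
```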



Grey*

If you don't want to use color images in your model, it's best to turn the image into a grey-scale image now.  If you choose this option, you won't need to add color channels.

#given an image let it greyscale
def toGrey(im):
    return im.convert('L')

Get tensor


The next step is to turn the image from a PIL object into a tensor.  This can be done simply with NumPy.

#given an image let it numpy
def toArray(im):
    return np.array(im)

Add color channels*


(If you chose to grey-scale your images skip this step.)

If you examine the shapes of all your images, you’ll probably notice that not all of them have three dimensions.

This happens because some JPEGs are stored as single-channel grey-scale images, and you'll have to ensure that your tensor has three dimensions and that the last dimension has three channels if you want to take advantage of your color images.  To do this, create a 3D tensor and copy the 2D tensor into the 3 channels; this distributes pixel intensities evenly across all three color channels.
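You can see the mismatch by converting both kinds of image to arrays (a standalone sketch using synthetic PIL images; note that NumPy reports shape as (height, width), the reverse of PIL's (width, height) size):

```python
import numpy as np
from PIL import Image

# a single-channel ("L" mode) image yields a 2D array...
grey_shape = np.array(Image.new("L", (50, 40))).shape   # (40, 50)
# ...while an RGB image yields a 3D array with three channels
rgb_shape = np.array(Image.new("RGB", (50, 40))).shape  # (40, 50, 3)
```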

#If the image is not in color, fix that by evenly distributing across channels
def addChannels(a):
    if len(a.shape) != 3:
        td = a / 3.0
        result = np.ndarray(shape=(td.shape[0], td.shape[1], 3))
        for i in range(3):
            result[:, :, i] = td
        return result
    return a

Batch Generator

Now that we have all the tools to get images into the format we wanted them in, let’s create a function that generates batches of images and their labels.  First, let’s create a function that gets some number of transformed images from a specific class.

def getSamples(lab, k=10, c_size=[100, 100], grey=False):
    if grey == False:
        result = np.ndarray(shape=[k, c_size[1], c_size[0], 3])
    else:
        result = np.ndarray(shape=[k, c_size[1], c_size[0]])
    imgs = getIMsamples(lab, num_samples=k)
    for i, img in enumerate(imgs):
        if grey:
            result[i, :, :] = toArray(toGrey(cropIM(zeroPad(img, c_size), c_size)))
        else:
            img = toArray(cropIM(zeroPad(img, c_size), c_size))
            img = addChannels(img)
            result[i, :, :, :] = img
    return result

Next, let's create the function that builds batches of images.

def get_batch(batch_size, c_size=[100, 100]):
    #randomly assign each slot in the batch to a class label
    distr = np.random.randint(0, len(classes), size=batch_size)
    counts = collections.Counter(distr)
    for k in counts.keys():
        lab = int2class[k]
        rx = getSamples(lab, k=counts[k], c_size=c_size)
        ry = np.ones(counts[k]) * k

        if not 'X' in vars():
            X = rx
            y = ry
        else:
            X = np.concatenate((X, rx), axis=0)
            y = np.concatenate((y, ry), axis=0)
    return X, y

The last thing to do is create a function that takes the labels from get_batch and one-hot encodes them.

def onehot(y):
    result = np.ndarray(shape=[len(y),len(classes)])
    for i,x in enumerate(y):
        result[i]=(np.arange(len(classes)) == x).astype(np.float32)
    return result
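For instance, the integer label 2 out of four classes becomes a row with a single 1 in position 2.  A standalone sketch of the same encoding logic, with the class count passed explicitly rather than taken from the classes list (onehot_demo is a hypothetical stand-in, not part of the pipeline):

```python
import numpy as np

def onehot_demo(y, n_classes):
    # same encoding as onehot above, but with n_classes as a parameter
    result = np.ndarray(shape=[len(y), n_classes])
    for i, x in enumerate(y):
        result[i] = (np.arange(n_classes) == x).astype(np.float32)
    return result
```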

BAM.  You're ready to send your images into a TensorFlow model.  This was just one process for preparing images before they're fed into a model.  It's probably not the best method, but it will let you get started and do experimentation of your own.
