Using Custom Datasets with Tensorflow's Object Detection API

I've been working on a project for work recently involving tensorflow. Up to this point I've been using the pet detector tutorial and its code to create a setup that lets me train any pretrained model I want to detect things, but now has come the time to train on a custom-made dataset of the things work has asked me to detect, and I ran into issues with the approach from my earlier posts. Apparently create_pet_tf_record.py requires that you have trimaps for your dataset, even if you don't use them, and even with a little if/else bypassing it wasn't making much of a difference, because the problem is much more "ingrained" than I had imagined. So I had to look elsewhere, and thus here I am!


Setup

If you haven't been following along with my series, then you may not know the directory structure of the project I'm working on. If you have, yours should be very similar, though it might have changed a little. It should look something like this:

 

[my workspace]
+ images
    + raw
    + train
    + test
+ annotations
    + xmls
+ data
+ models
    + model

 

Each '+' is a folder, and indented folders are subfolders of the parent they are indented under; all folders are subfolders of the [my workspace] folder. I've stored all of the images in the dataset in the images/raw directory. Whenever I mention running a script and don't mention where to run it from, I'm talking about the [my workspace] directory.
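If you're starting fresh, the whole skeleton can be created with a couple of lines of Python (a sketch; it assumes your current working directory is [my workspace]):

```python
import pathlib

# Creates the directory skeleton described above, relative to [my workspace]
for d in ["images/raw", "images/train", "images/test",
          "annotations/xmls", "data", "models/model"]:
    pathlib.Path(d).mkdir(parents=True, exist_ok=True)
```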

If you're new to tensorflow and don't really know how to set up a model, but you know you need to use a custom dataset, that's fine: start here. Don't worry, it'll send you back here when the time is right.

Finding Images

Finding data can be pretty tedious, especially if you have to do it yourself. Unfortunately, I did, but some of you will probably be able to take advantage of this tool I found for automatically downloading images from google.

You can find it here. You can pass it key terms and such, and it will automatically search through google and download roughly 100 images; you can go higher than that by installing additional software drivers. For more information, view the tutorial here. As the blog states, we need roughly 260 images to create a usable dataset.

Annotating the Images

trimap segmentation example

A simple trimap example.

So if you're coming from the pet detector, you'll notice when you extract their images and annotations that, aside from the basic fact that the annotations and the images come in two different files, the rest of it is just kind of weird. If you look at the python scripts written to convert and train it, you'll notice that this "weirdness" is the reason everything used in the tutorial is labeled 'pet' at some point. Now, don't get me wrong, trimaps are awesome, but most of us don't need them; we just want bounding boxes, and when it comes to making custom datasets, we don't want to take the painstaking time to make trimaps if all we need is bounding boxes. So what is a trimap? Well, if we take a look at the pet dataset, it's fairly obvious: with bounding boxes you're training a model to see a general region, with no real focus, while with trimaps you're training the model to see the actual shape of an object via pixel shading (see above).

Now, for annotation, I used LabelImg. It's a great tool for drawing bounding boxes on images, and the bounding boxes are automatically converted into xml documents which you can then use for tensorflow.

Using LabelImg

Using LabelImg can be a bit of a mystery at first, but as long as you know these tips, you should get along just fine:

  • The buttons do what they say they do

  • You need to press W before you can draw a box on an image

  • You can press A and D to change between pictures in a directory that you've opened

  • You can turn on auto-save in the settings to allow you not to have to press Ctrl+S after every image

  • You can use Ctrl and the mouse wheel to zoom in and out instead of the + and - buttons

  • You can use the arrow keys to fine-tune adjust the placement of your boxes.

If you want to use the code that I've written for these tutorials, then you'll want to save the xml files you make to your ./annotations/xmls/ directory.
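For reference, each annotation LabelImg saves is a Pascal VOC style xml file. A hand-written example (the filename, class name, and coordinates here are made up) looks roughly like this:

```xml
<annotation>
    <folder>raw</folder>
    <filename>example_001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>widget</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
</annotation>
```

The order of the tags matters later, since the conversion code indexes into the `<size>` and `<object>` children by position.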

Creating TFRecords Compatible with Training

Now for the difficult part (not really, it's just long-ish). We need to make TFRecords so that we can use them during training. There are a couple of ways to do this: some people prefer to convert their xml annotations into single csv files and use separate scripts to turn the csv files into TFRecords, but I prefer to do it all in one go. When you convert the xml into a csv and then open it in tensorflow, it just turns back into the same data structure it was before you saved it, which means you're writing data to the disk only to read it right back. Anyway, I'm going to show you step by step what I did when creating my dataset; if you want to skip this part and just get the entire file, you can find it here. You'll also want this one, which needs to be run before create_tf_records.py, because it splits the images into train and test categories, which create_tf_records.py relies on to find the training and testing xml files (if you already have them split into these two folders, then you can skip this).

Splitting the "raw" Images into Train and Test Categories

First thing I did was focus on how I could split a directory of images into two categories without requiring a text document like the pet dataset does, and this is what I came up with. It searches for images in the images/raw directory, accumulates a list of them, splits the list randomly into two categories, and copies the corresponding files to their new folders.

from random import shuffle
import os, fnmatch, pathlib, shutil

def read_train_val():
    """Reads the names of all image files from the ./images/raw/ directory and then
    splits it up into multiple parts to be used by training and testing
    int the typical 90%/10% ratio."""

    # Maps the directory for image files and reads in all of the lines and then shuffles them
    files = list(map(lambda x: "./images/raw/" + x, fnmatch.filter(os.listdir("./images/raw/"), "*.jpg")))
    shuffle(files)

    for name in files:
        print(name)

    # Splits the files into two arrays of train values, and test values
    split_loc = int(len(files) * 0.9)
    train_values = files[:split_loc]
    test_values = files[split_loc:]
    return (train_values, test_values)

def copy_files(file_list, dst):
    """Copies a list of files to another directory"""
    for file in file_list:
        print("Copying " + file + " to " + dst)
        shutil.copy(file, dst)

if __name__ == '__main__':
    """Separates the image files in the ./images/raw/ directory into two categories
    then moves them into two directories, ./images/train/, and ./images/test/."""

    # Reads in the image file names and shuffles them,
    # then splits them into training and testing values
    (train, test) = read_train_val()

    # Creates the train folder if it doesn't exist yet.
    pathlib.Path("./images/train/").mkdir(parents=True, exist_ok=True)

    # Creates the test folder if it doesn't exist yet.
    pathlib.Path("./images/test/").mkdir(parents=True, exist_ok=True)

    # Copies the files into their respective directories
    copy_files(train, "./images/train/")
    copy_files(test, "./images/test/")

Basically, it looks for any file in the images/raw directory that ends in .jpg, compiles that list, shuffles it, then splits it using the preferred ratio of 90% training samples to 10% validation samples, which is normal for a small training set like this one (source). The script then copies the files to the train and test folders.
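Just to make the split arithmetic concrete, here's a tiny self-contained version of the same logic with made-up filenames and a fixed seed so it's repeatable:

```python
from random import seed, shuffle

# Twenty fake image names stand in for the contents of images/raw
files = ["img_%02d.jpg" % i for i in range(20)]
seed(42)      # fixed seed so the example is repeatable
shuffle(files)

# Same 90/10 split as read_train_val above
split_loc = int(len(files) * 0.9)
train_values, test_values = files[:split_loc], files[split_loc:]
print(len(train_values), len(test_values))  # 18 2
```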

Converting the XML Annotations to Pandas Dataframes

This is the script that I'm going to be talking about here. I'm not going to touch on all of it, because it's really long and kind of boring, but I'll try to hit the major parts.

So there's this script out there that just about every single tutorial uses, which claims to convert xml files to csv, but it doesn't actually do that. The script returns a pandas dataframe, which the tutorials then save as a csv. Here, I'll show you:

def xml_to_csv(xml_file_list):
    """Converts the given xml files into a pandas dataframe"""

    # Parses the xml files
    xml_list = []
    for xml_file in xml_file_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            # member[0] is the <name> tag; member[4] is the <bndbox> tag
            value = (root.find('filename').text,
                int(root.find('size')[0].text),   # <width>
                int(root.find('size')[1].text),   # <height>
                member[0].text,                   # class name
                int(member[4][0].text),           # <xmin>
                int(member[4][1].text),           # <ymin>
                int(member[4][2].text),           # <xmax>
                int(member[4][3].text)            # <ymax>
                )
            xml_list.append(value)

    # Sets up the pandas dataframe
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

You'll notice that aside from the function name, the term csv does not appear in this function, but if you take the return value and call:

xml_df.to_csv([output file], index=None)

THEN you get a csv file (see, I pulled a sneaky one on you *wink*). We don't actually need that part, though, because in the tfrecord code, it calls:

examples = pd.read_csv(FLAGS.csv_input)

WHICH EXACTLY DOES THE UNDOING OF THE LAST CODE SNIPPET. So we're skipping it, and we'll just pass the pandas dataframe directly to the TFRecord generator.
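If you want to convince yourself that the csv round-trip really is a no-op, here's a quick sanity check with one fake annotation row, in the same column layout xml_to_csv produces:

```python
import io

import pandas as pd

# One made-up annotation row, in the same column layout as xml_to_csv
columns = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
df = pd.DataFrame([('a.jpg', 640, 480, 'cat', 10, 20, 30, 40)], columns=columns)

# Write it out as csv and immediately read it back in
buf = io.StringIO()
df.to_csv(buf, index=None)
buf.seek(0)
df2 = pd.read_csv(buf)

print(df.equals(df2))  # True: we got back exactly what we started with
```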

Here's what we do instead. First we need to find the xml files that correspond to the images we sorted above (with the image sorter); to do that, I created this function:

def find_xml_files():
    """Locates all of the image files in the two ./images/train/ and ./images/test/
    directories and finds the corresponding xml files in the ./annotations/xmls/ directory"""

    # Finds the xml files
    train_files = convert_jpg_to_xml("./images/train/")
    test_files = convert_jpg_to_xml("./images/test/")

    # Checks to make sure that the xml files exist, and if they don't then skips them
    # Train files
    remove_non_existant_files(train_files)
    # Test files
    remove_non_existant_files(test_files)

    print("Training files:")
    for file in train_files:
        print(file)
    print("Testing files:")
    for file in test_files:
        print(file)

    # Returns the located xml files
    return (train_files, test_files)

This function calls two helper functions. The first finds all of the .jpg filenames in a directory and swaps the .jpg extension for an .xml extension:

def convert_jpg_to_xml(path: str):
    """Converts a list of file names in the given path from .jpg to .xml"""

    # Finds all jpg files
    files = fnmatch.filter(os.listdir(path), "*.jpg")

    # Converts the .jpg to .xml and adds the ./annotations/xmls prefix
    xmls = map(lambda x: "./annotations/xmls/" + x[:-4] + ".xml", files)

    return list(xmls)
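The x[:-4] slice works because ".jpg" is exactly four characters long. A slightly more robust way to do the same swap, if you'd rather not count characters, is os.path.splitext (a sketch with a made-up filename, not the code the script uses):

```python
import os

def jpg_name_to_xml_path(name):
    # Split off whatever extension is there, then point at the annotations directory
    stem, _ext = os.path.splitext(name)
    return "./annotations/xmls/" + stem + ".xml"

print(jpg_name_to_xml_path("photo_001.jpg"))  # ./annotations/xmls/photo_001.xml
```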

Then it checks that all of the xml files exist, and removes them from the list if they don't:

def remove_non_existant_files(file_list):
    """Iterates through the list and removes any files that don't exist"""
    # Iterates backwards so that deleting an entry doesn't shift the indices
    # we haven't visited yet
    for i in range(len(file_list) - 1, -1, -1):
        if not os.path.isfile(file_list[i]):
            print("File " + file_list[i] + " does not exist, removing...")
            del file_list[i]
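The backwards range is what makes deleting in place safe: removing an entry only shifts the indices we've already visited. A quick demonstration with one throwaway file and one path that doesn't exist (names are made up):

```python
import os
import tempfile

# One file that really exists, and one path that doesn't
fd, real_file = tempfile.mkstemp(suffix=".xml")
os.close(fd)
files = [real_file, "./annotations/xmls/does_not_exist.xml"]

# Same backwards loop as remove_non_existant_files
for i in range(len(files) - 1, -1, -1):
    if not os.path.isfile(files[i]):
        del files[i]

print(files == [real_file])  # True: only the missing file was dropped
os.remove(real_file)
```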

 

Now that the image files are read in and the xml files are found, we can convert them to a pandas dataframe with the xml_to_csv function shown earlier.

Then all that's left to do is to generate the TFRecords, but this is where things change a little bit. I think the sharded files that tensorflow's pet detector tutorial's TFRecord generator created are important, so I re-wrote the scripts from here and here to include sharding. After consulting another source and another source, it appears that sharding isn't actually all that beneficial, but it does make things easier when it comes to spreading the dataset across multiple machines, which could be nice.

def gen_tfrecord(panda_df, output_path, num_shards = 10):
    """Creates a TFRecord of the current dataframe into the output file"""
    with contextlib2.ExitStack() as tf_record_close_stack:
        writer = tf_record_creation_util.open_sharded_output_tfrecords(
            tf_record_close_stack, output_path, num_shards)
        grouped = split(panda_df, 'filename')
        for idx, group in enumerate(grouped):
            if idx % 100 == 0:
                print("On image " + str(idx) + " of " + str(len(grouped)))
            tf_example = create_tf_example(group, "./images/raw")
            shard_idx = idx % num_shards
            writer[shard_idx].write(tf_example.SerializeToString())

    print("Successfully creates the TFRecords: {}".format(output_path))

You'll notice that this script also has to create the tf_example, which is a kind of data structure used by tensorflow to write TFRecords. You can find that and the split function in the script mentioned above, and right here.

Note: something to remember here is that the .config file is what specifies the naming template for the .record files. If you change the name, or you aren't sharding, then you need to go change the config file (which I put in the ./models/model/ directory).

Thanks for reading, if you haven't read my tutorial on reusing frozen inference graphs then check it out!
