DETR Failure

I'm working on a project to bring advanced document intelligence to the construction industry.

After building an initial prototype, I uncovered several sub-problems worth digging into. One is how to cross-link references in construction drawings (think hyperlinking across pages). Procore, the industry standard, no doubt already has this capability, but I'm not using Procore.

The Goal

Train a custom object detection model using the DETR architecture to detect reference symbols in construction drawings.

Example reference symbol in a construction drawing

Motivation

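I read that DETR is elegant because it treats object detection as a direct set-prediction problem and hands most of the work to a transformer, dropping hand-designed components like anchor boxes and non-maximum suppression. That's appealing: transformers are being used for seemingly everything else, and maybe this will help me understand them better.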

There's a blog post by Facebook AI Research that explains the simplified architecture, and I was excited to try it out.

I found this Hugging Face tutorial on how to fine-tune a DETR model which felt very approachable. Just bring your COCO dataset and go to town!

The Start

First, I needed to create a COCO dataset. I have a few IFC (Issued for Construction) drawings that I could use, but I needed to annotate them.

There are plenty of existing tools to annotate a dataset, but I had already started building a workbench tool to help with document processing and content extraction. The functionality to annotate images and export the annotations in COCO format would be a fun addition.

After a few days of work I had a tool that could load an image, draw bounding boxes, and export the annotations in COCO format. I was ready to start annotating. Just a couple of hours of manual labor and I had created my first dataset.
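For reference, a COCO detection file is just JSON with three lists: `images`, `annotations`, and `categories`. The sketch below shows roughly what an export step like this has to produce. It is not my tool's actual code: the `to_coco` helper, the single hypothetical `reference_symbol` category, and the sample sheet are illustrative assumptions. The one detail that trips people up is that COCO stores boxes as `[x, y, width, height]` from the top-left corner, not as two corner points.

```python
import json

# Hypothetical label set: a single "reference_symbol" category.
CATEGORIES = [{"id": 1, "name": "reference_symbol"}]


def to_coco(pages):
    """Build a COCO-style object detection dict.

    `pages` is a list of (file_name, width, height, boxes) tuples, where each
    box is (x_min, y_min, x_max, y_max) in pixels, as drawn in an annotation UI.
    COCO stores boxes as [x, y, width, height] with (x, y) the top-left corner.
    """
    images, annotations = [], []
    ann_id = 1
    for image_id, (file_name, width, height, boxes) in enumerate(pages, start=1):
        images.append({"id": image_id, "file_name": file_name,
                       "width": width, "height": height})
        for x_min, y_min, x_max, y_max in boxes:
            w, h = x_max - x_min, y_max - y_min
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,  # every box is a reference symbol here
                "bbox": [x_min, y_min, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return {"images": images, "annotations": annotations,
            "categories": CATEGORIES}


# Usage with a made-up sheet and one 60 x 60 px box.
coco = to_coco([("sheet-A101.png", 10800, 7200, [(5000, 3000, 5060, 3060)])])
print(json.dumps(coco["annotations"][0]["bbox"]))  # [5000, 3000, 60, 60]
```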

The Failure

I was ready to train the model. I followed the Hugging Face tutorial, modifying it to work with my local data. Finally, the big moment: I started the training process.

Since my dataset was minuscule, training went quickly. After 3 minutes, out popped a model that I could use to make predictions. I loaded up a previously unseen drawing from my test set and ran it through the model.

The model predicted nothing. Not a single bounding box.

I tried a few more drawings. Nothing.
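One hypothesis worth checking, offered as an assumption rather than a diagnosis: DETR always outputs a fixed number of query predictions (100 in the original model), and each query has an extra "no object" class alongside the real ones. Hugging Face's `post_process_object_detection` keeps only queries whose best real-class probability exceeds its `threshold` argument. An undertrained model can leave every query below that threshold, which would look exactly like "nothing". The sketch below re-implements that filtering idea in plain Python (it is not the library's code), and the logits are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_detections(query_logits, threshold=0.5):
    """DETR-style filtering: the last logit of each query is "no object".
    A query survives only if its best *real* class probability > threshold.
    Returns (query_index, label, score) tuples."""
    kept = []
    for i, logits in enumerate(query_logits):
        probs = softmax(logits)
        real = probs[:-1]  # drop the no-object probability
        score = max(real)
        label = real.index(score)
        if score > threshold:
            kept.append((i, label, score))
    return kept


# Made-up logits for 3 queries: [reference_symbol, no_object].
undertrained = [[0.5, 2.0], [1.0, 1.2], [0.2, 3.0]]
print(keep_detections(undertrained))                      # []
print(len(keep_detections(undertrained, threshold=0.4)))  # 1
```

If lowering the threshold on the real model surfaces boxes in roughly the right places, the model learned something and simply isn't confident yet; if the surfaced boxes are random, the problem is further upstream.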

Action Items

I need to investigate further why this experiment failed.

- The images I used were high resolution and had to be downscaled heavily before going into the model. That could wipe out much of the detail in the comparatively small reference symbols.
  - Action: Investigate Deformable DETR or other architectures designed to handle small objects.
- It's possible I just need more training data, or need to train for longer. This is where my lack of experience in training object detection models is showing.
  - Action: Collect more data and try again.
- More traditional object detection architectures like YOLO or Faster R-CNN may be a better fit.
  - Action: Try out Detectron2, which includes Faster R-CNN.
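To put rough numbers on the downscaling hypothesis, the sketch below approximates the resize rule that Hugging Face's `DetrImageProcessor` applies by default: scale the shortest edge to 800 px, but cap the longest edge at 1333 px. The sheet size (ARCH D, 36 x 24 in), the 300 DPI rasterization, and the 60 px symbol are hypothetical values, not measurements from my drawings. DETR's ResNet backbone hands the transformer a feature map downsampled by 32, so the last line expresses the symbol as a fraction of one feature-map cell.

```python
def detr_resize_scale(width, height, shortest_edge=800, longest_edge=1333):
    """Approximate DETR-style resizing: scale the short side to `shortest_edge`,
    unless that would push the long side past `longest_edge`."""
    short_side, long_side = min(width, height), max(width, height)
    scale = shortest_edge / short_side
    if long_side * scale > longest_edge:
        scale = longest_edge / long_side
    return scale


# Hypothetical ARCH D sheet (36 x 24 in) rasterized at 300 DPI.
scale = detr_resize_scale(36 * 300, 24 * 300)
symbol_px = 60 * scale   # a hypothetical 60 px reference symbol after resizing
cells = symbol_px / 32   # fraction of one stride-32 backbone feature cell
print(round(scale, 4), round(symbol_px, 1), round(cells, 2))  # 0.1111 6.7 0.21
```

If the real numbers are anywhere near these, each symbol shrinks to a few pixels and sits well inside a single feature-map cell, which fits the small-object explanation. Deformable DETR attends over multi-scale feature maps, including higher-resolution stride-8 and stride-16 levels, which is part of why it's on the list.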

Now that I have a COCO dataset and a tool to annotate more images, I can try out different architectures and see if I can get a model to work.
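Before feeding the same COCO file to other architectures, a quick sanity check can rule out annotation bugs that would silently corrupt training targets for any model. The `check_coco` helper below is a hypothetical sketch, not part of any library, and nothing here suggests my file actually has these problems. It flags annotations pointing at unknown images or categories, boxes with non-positive size, and boxes that spill outside the image (a common symptom of writing corner coordinates where COCO expects width and height). Images with no boxes are reported too, though they may be intentional negatives.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO detection dict."""
    problems = []
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    boxed_images = set()
    for ann in coco["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        boxed_images.add(img["id"])
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]  # COCO order: top-left x, top-left y, width, height
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box outside image bounds")
    # Unannotated images are only a warning: they may be deliberate negatives.
    for image_id in sorted(images.keys() - boxed_images):
        problems.append(f"image {image_id}: no annotations")
    return problems
```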

Stay tuned!
