DETR Failure
I'm working on a project to bring advanced document intelligence to the construction industry.
After building an initial prototype, I uncovered several sub-problems to dig into. One is how to cross-link references in construction drawings (think hyperlinking across pages). No doubt this capability already exists in Procore, the industry standard, but I'm not using Procore.
The Goal
Train a custom object detection model using the DETR architecture to detect reference symbols in construction drawings.
Motivation
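I read that DETR is elegant because it treats object detection as a set-prediction problem handled largely by a transformer, dropping hand-designed components like anchor generation and non-maximum suppression. That's great: transformers are being used for seemingly everything else, and maybe this will help me understand them better.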
There's a blog post by Facebook AI Research that explains the simplified architecture and I was excited to try it out.
I found this Hugging Face tutorial on how to fine-tune a DETR model which felt very approachable. Just bring your COCO dataset and go to town!
The Start
First I needed to create a COCO dataset. I have a few IFC (Issued for Construction) drawings I could use, but I needed to annotate them.
There are plenty of existing tools to annotate a dataset, but I had already started building a workbench tool to help with document processing and content extraction. The functionality to annotate images and export the annotations in COCO format would be a fun addition.
After a few days of work I had a tool that could load an image, draw bounding boxes, and export the annotations in COCO format. I was ready to start annotating. Just a couple of hours of manual labor and I had created my first dataset.
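The COCO export boils down to three lists (images, annotations, categories) linked by integer ids, with each box stored as `[x, y, width, height]` in pixels measured from the image's top-left corner. Here is a minimal sketch of that conversion, assuming the annotation tool records boxes as corner coordinates. The input shape, the `to_coco` name, the `A-101.png` file name, and the `reference_symbol` category are hypothetical, not the workbench's actual code.

```python
import json


def to_coco(images, category_names):
    """Build a minimal COCO object-detection dict.

    images: list of dicts shaped like (hypothetical input format)
        {"file_name": str, "width": int, "height": int,
         "boxes": [(x_min, y_min, x_max, y_max, category_name), ...]}
    """
    # COCO category ids are conventionally 1-based.
    categories = [{"id": i + 1, "name": n} for i, n in enumerate(category_names)]
    cat_ids = {c["name"]: c["id"] for c in categories}

    coco = {"images": [], "annotations": [], "categories": categories}
    ann_id = 1
    for img_id, img in enumerate(images, start=1):
        coco["images"].append({
            "id": img_id,
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
        })
        for x0, y0, x1, y1, name in img["boxes"]:
            w, h = x1 - x0, y1 - y0
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[name],
                "bbox": [x0, y0, w, h],  # COCO boxes: [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Usage: one hypothetical sheet with a single 40 x 40 px symbol.
sheet = {"file_name": "A-101.png", "width": 7200, "height": 4800,
         "boxes": [(100, 200, 140, 240, "reference_symbol")]}
coco_json = json.dumps(to_coco([sheet], ["reference_symbol"]), indent=2)
```

Converting corners to width and height at export time is the step that is easiest to get wrong, since many drawing tools natively store `(x_min, y_min, x_max, y_max)`.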
The Failure
I was ready to train the model. I followed the Hugging Face tutorial, modifying it to work with my local data. Finally, the big moment: I started the training process.
Since my dataset was minuscule, the training went quickly. After 3 minutes, out popped a model that I could use to make predictions. I loaded a previously unseen drawing from my test set and ran it through the model.
The model predicted nothing. Not a single bounding box.
I tried a few more drawings. Nothing.
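One way the output can come back empty, offered as an unverified hypothesis rather than a diagnosis: DETR always emits a fixed number of query predictions, each with a class distribution that includes an extra "no object" class, and post-processing keeps only queries whose best real-class probability exceeds a threshold (the Hugging Face image processors expose this as the `threshold` argument of `post_process_object_detection`). A model trained briefly on very little data may put most of its probability on "no object" for every query, so nothing survives the cut. The sketch below mirrors that filtering step in plain Python on made-up logits; `filter_queries` is a hypothetical helper, not the library's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_queries(query_logits, threshold=0.5):
    """Keep queries whose best real-class probability exceeds threshold.

    Each row holds one query's class logits; the LAST entry is the
    "no object" class, as in DETR.
    Returns (query_index, class_index, score) tuples.
    """
    kept = []
    for i, logits in enumerate(query_logits):
        probs = softmax(logits)
        real = probs[:-1]  # drop the "no object" probability
        best = max(range(len(real)), key=real.__getitem__)
        if real[best] > threshold:
            kept.append((i, best, real[best]))
    return kept


# Made-up logits: one real class ("reference_symbol") plus "no object".
logits = [[0.0, 3.0],   # confident "no object"
          [2.0, 0.0],   # confident symbol, score ~0.88
          [0.5, 0.0]]   # weak symbol, score ~0.62
detections = filter_queries(logits)
```

If every query looks like the first row, the filtered list is empty even though the model did produce boxes. Lowering the threshold at inference is a quick way to check whether there is any signal at all, though low-confidence boxes are not necessarily useful.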
Action Items
I need to investigate further why this experiment failed.
- The images I used were high resolution and had to be downscaled to the model's input size during preprocessing. This could shrink the comparatively small reference symbols to a handful of pixels and strip away most of their detail.
- Action: Investigate Deformable DETR or other architectures that are designed to handle small objects.
- It's possible I just need more training data, or need to train for longer. This is where my lack of experience in training object detection models is showing.
- Action: Collect more data and try again.
- Investigate more traditional object detection architectures like YOLO or Faster R-CNN.
- Action: Try out Detectron2.
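To get a feel for the downscaling concern in the first action item, here is a back-of-the-envelope calculation. It assumes the resize rule commonly used for DETR preprocessing (scale the shorter side to 800 px unless that pushes the longer side past 1333 px, which I believe matches the Hugging Face `DetrImageProcessor` defaults), plus a hypothetical 36 × 24 in ARCH D sheet rasterized at 200 DPI with a 0.25 in reference symbol. DETR's ResNet backbone downsamples by a factor of 32 before the transformer, so each feature cell covers roughly 32 × 32 px of the resized image. The `detr_resize_scale` name is hypothetical.

```python
def detr_resize_scale(width, height, shortest_edge=800, longest_edge=1333):
    """Scale factor for 'shorter side -> shortest_edge, capped by longest_edge'."""
    short_side, long_side = min(width, height), max(width, height)
    scale = shortest_edge / short_side
    if long_side * scale > longest_edge:
        scale = longest_edge / long_side
    return scale


DPI = 200                               # hypothetical rasterization resolution
sheet_w, sheet_h = 36 * DPI, 24 * DPI   # ARCH D sheet: 7200 x 4800 px
symbol_px = 0.25 * DPI                  # hypothetical 0.25 in symbol: 50 px

scale = detr_resize_scale(sheet_w, sheet_h)
resized_symbol = symbol_px * scale
backbone_stride = 32                    # ResNet C5 features used by DETR

print(f"scale={scale:.4f}, symbol={resized_symbol:.1f}px, "
      f"{resized_symbol / backbone_stride:.2f} feature cells wide")
# prints: scale=0.1667, symbol=8.3px, 0.26 feature cells wide
```

Under these assumptions a 50 px symbol ends up about a quarter of a feature cell wide. The real numbers depend on the actual sheet size, DPI, and processor settings, and the actual resize also rounds to whole pixels.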
Now that I have a COCO dataset and a tool to annotate more images, I can try out different architectures and see if I can get a model to work.
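Before comparing architectures, it may help to summarize the dataset itself: how many annotations there are per category (relevant to the "more data" question) and how many boxes fall into COCO's small (area < 32² px) and medium (area < 96² px) ranges. The `scale` parameter is my own addition for approximating box sizes after downscaling; `summarize` is a hypothetical helper operating on a parsed COCO JSON dict, and it uses the box width times height rather than the annotation's `area` field.

```python
from collections import Counter


def summarize(coco, scale=1.0):
    """Count annotations per category and bucket boxes by COCO size range.

    scale: optional resize factor applied to box sides before bucketing,
    to approximate sizes after preprocessing (an assumption, not COCO's rule).
    """
    names = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(names[a["category_id"]] for a in coco["annotations"])

    sizes = Counter()
    for a in coco["annotations"]:
        w, h = a["bbox"][2] * scale, a["bbox"][3] * scale
        area = w * h
        if area < 32 ** 2:
            sizes["small"] += 1
        elif area < 96 ** 2:
            sizes["medium"] += 1
        else:
            sizes["large"] += 1
    return per_category, sizes
```

Run once with `scale=1.0` and once with the preprocessing scale factor: boxes that move from "medium" to "small" are the ones most likely to suffer from downscaling.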
Stay tuned!
© Mike Surowiec