Semantic Segmentation with SegFormer

Model Prediction on Drone Dataset Image

Image segmentation is the process of classifying each pixel in an image. It is a computer vision task whose main goal is to detect the regions of an image that belong to each object.

The SegFormer architecture doesn’t depend on positional encodings, unlike Vision Transformers, and hence handles inference on images of different resolutions better. The other thing that sets SegFormer apart from its counterparts is the design of its decoder. Unlike segmentation architectures that use upsampling or deconvolutions, SegFormer uses a lightweight MLP decoder, which is faster and more efficient. We will discuss the architecture in a bit more detail in the following sections.

Transformers were originally built to solve sequence-to-sequence problems such as text generation and translation. The Transformer is a novel architecture for transforming one sequence into another using an encoder and a decoder together with the self-attention mechanism.

Figure 1: From ‘Attention Is All You Need’ by Vaswani et al.

SegFormer is based on the Transformer architecture, with an encoder-decoder design in which the encoder makes use of self-attention.

Unlike other, more complex decoders, SegFormer applies a simple MLP decoder that aggregates information from different layers, combining both local and global attention to render powerful representations.

Figure 2: SegFormer architecture

The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from nadir (bird’s eye) view acquired at an altitude of 5 to 30 meters above the ground. A high-resolution camera was used to acquire images at a size of 6000x4000px (24Mpx). The training set contains 400 publicly available images.

Training the SegFormer model is as easy as training any other model in PyTorch. The steps we will follow are: prepare the dataset and feature extractor, load the pre-trained model, set up the optimizer, train, and run inference.

We will be using HuggingFace’s feature extractor which takes care of handling the segmentation labels directly from the loaded mask.
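A minimal sketch of a PyTorch dataset that wires the images and masks through the feature extractor (the directory layout, file naming, and the `SegformerFeatureExtractor` defaults here are assumptions, not the exact notebook code):

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from transformers import SegformerFeatureExtractor

class DroneDataset(Dataset):
    """Pairs each image with its mask and lets the feature extractor
    resize, normalize, and encode the segmentation labels."""
    def __init__(self, image_dir, mask_dir):
        self.image_dir, self.mask_dir = image_dir, mask_dir
        self.files = sorted(os.listdir(image_dir))
        self.feature_extractor = SegformerFeatureExtractor(reduce_labels=False)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        image = Image.open(os.path.join(self.image_dir, self.files[idx]))
        mask = Image.open(os.path.join(self.mask_dir, self.files[idx]))
        encoded = self.feature_extractor(image, mask, return_tensors="pt")
        # drop the extractor's batch dimension; the DataLoader adds its own
        return {k: v.squeeze(0) for k, v in encoded.items()}
```

The extractor resizes both inputs to its default 512x512 and returns `pixel_values` for the model and `labels` for the loss.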

We have the labels CSV which can be used for plotting the colors on the segmented image.
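The labels CSV can be parsed into a class-id to RGB mapping with a few lines of standard-library code (a sketch assuming `name,r,g,b` columns; the two sample rows are illustrative, not the full Drone label set):

```python
import csv
from io import StringIO

def load_palette(csv_text):
    """Parse a labels CSV with name,r,g,b columns into {class_id: (r, g, b)}."""
    reader = csv.DictReader(StringIO(csv_text))
    return {i: (int(row["r"]), int(row["g"]), int(row["b"]))
            for i, row in enumerate(reader)}

sample = "name,r,g,b\nunlabeled,0,0,0\npaved-area,128,64,128\n"
palette = load_palette(sample)
# palette[1] == (128, 64, 128)
```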

We will take HuggingFace’s pre-trained SegFormer model and fine-tune it on our own dataset. The transformers library by HuggingFace makes it really easy to fine-tune a pre-trained model on a custom dataset. Loading the pre-trained model takes barely a couple of lines.
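It really is only a couple of lines (the `nvidia/mit-b0` checkpoint and the truncated two-entry `id2label` below are assumptions for illustration; the real mapping comes from the labels CSV):

```python
from transformers import SegformerForSemanticSegmentation

# truncated for the example; the Drone dataset defines more classes
id2label = {0: "unlabeled", 1: "paved-area"}
label2id = {name: i for i, name in id2label.items()}

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",            # smallest SegFormer backbone variant
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
)
```

Passing `num_labels` replaces the checkpoint's classification head with a fresh one sized for our classes.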

We will be using the default hyperparameters used to train SegFormer when training the model on the Drone Dataset. We use the AdamW optimizer, with slight changes to handle training the pre-trained HuggingFace model.
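The optimizer setup can be sketched as follows (the 6e-5 learning rate follows the SegFormer paper's default; the `nn.Linear` stand-in only keeps the snippet self-contained, in practice you pass the SegFormer model's parameters):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)   # stand-in for the SegFormer model from the previous step
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5)
```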

And with that, we are ready to train the model. We train it for only 10 epochs for the sake of this blog, since training takes a long time on Colab, and 10 epochs already give decent results.
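The training loop itself can be sketched like this (a minimal fine-tuning loop, assuming a DataLoader that yields batches with `pixel_values` and `labels` keys; HuggingFace models compute the loss internally when labels are passed):

```python
import torch

def train(model, dataloader, optimizer, epochs=10, device="cpu"):
    """Minimal fine-tuning loop for a HuggingFace segmentation model."""
    model.to(device)
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        for batch in dataloader:
            optimizer.zero_grad()
            outputs = model(pixel_values=batch["pixel_values"].to(device),
                            labels=batch["labels"].to(device))
            outputs.loss.backward()   # cross-entropy computed by the model
            optimizer.step()
            total_loss += outputs.loss.item()
        print(f"epoch {epoch}: loss {total_loss / len(dataloader):.4f}")
```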

The above model, trained on the Drone Dataset, has been pushed to the HuggingFace Model Hub.

Let’s look at the inference code to test our model.

First we get the colors for the palette that we will use to draw the predicted classes on the image.

As we saw earlier, loading the pre-trained model is very easy with HuggingFace.
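A sketch of the inference step (the `predict` function name is mine; the upsampling is needed because SegFormer's logits come out at 1/4 of the input resolution):

```python
import torch

def predict(model, feature_extractor, image):
    """Return an (H, W) tensor of predicted class ids for a PIL image."""
    inputs = feature_extractor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits            # (1, num_labels, h/4, w/4)
    # upsample the logits back to the original image size; PIL size is (W, H)
    upsampled = torch.nn.functional.interpolate(
        logits, size=image.size[::-1], mode="bilinear", align_corners=False)
    return upsampled.argmax(dim=1)[0]
```

The returned class-id map can then be colored using the palette built from the labels CSV.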

And that’s it. We did it! Pat yourself on the back: you trained a transformer model all on your own!

We implemented a Transformer-based semantic segmentation model that has achieved state-of-the-art results on multiple benchmarks. We also saw how the architecture was designed with each aspect of the problem in mind: the hierarchical encoder helps propagate features at multiple scales, and the MLP decoder speeds up the forward pass, significantly improving the FPS of the model. We also saw how HuggingFace makes it remarkably easy to train and experiment with Transformer-based models.

I hope you take something away from this blog. Please do try out the Colab notebook and share your experiences in the comments below.
