Skip to main content

7 posts tagged with "deep learning"

View All Tags

· 18 min read



AutoEncoders are a type of artificial neural network used primarily for unsupervised learning tasks, particularly in the field of dimensionality reduction and feature learning. Their general purpose is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction or noise reduction.

The main components of an autoencoder are:

  1. Encoder: This part of the network compresses the input into a latent space representation. It encodes the input data as an encoded representation in a reduced dimension. The encoder layer is typically composed of a series of layers that gradually decrease in size.

  2. Bottleneck: This is the layer that contains the encoded representation of the input data. It is the heart of the network, which holds the compressed knowledge of the input data. The bottleneck is where the dimensionality reduction takes place.

  3. Decoder: The decoder network performs the reverse operation of the encoder. It takes the encoded data from the bottleneck and reconstructs the input data as closely as possible. This part of the network is typically symmetrical to the encoder, with layers increasing in size.

The objective of an autoencoder is to minimize the difference between the original input and its reconstruction, typically measured by a loss function like mean squared error. By learning to reconstruct the input data, the network learns valuable properties about the data and its structure. AutoEncoders are used in various applications like anomaly detection, image denoising, and as a pre-training step for deep learning models.


The objective of an autoencoder is to minimize the difference between the original input and its reconstruction, typically measured by a loss function like mean squared error. By learning to reconstruct the input data, the network learns valuable properties about the data and its structure.

Get full code before continue

Code available at gist :


This is a single file approach consists of model architecture, training code, testing code, and inference code. We will talk about each part of the code separately later. If you're in hurry, run this single file python script and open localhost:20202 to see the result.


Please make a new directory for putting this file, once you run this file, it automatically download MNIST dataset to the relative path ./data. You might not want the code download data into unwanted place.

· 14 min read


The main purpose of this article is to use basic self-attention blocks to build a simple large language model for learning purposes. Due to limitations in model scale and embedding methods, the model built in this article will not be very effective, but this does not affect the ability to learn various basic concepts of language models similar to Transformer through the code provided in this article.

What happens on this page:

  • get full code of a basic Large(?) Language Model (data preparation, model architecture, model training and predicting)
  • understand the general architecture of transformer with a few illustrations
  • understand how self regressive training works
  • understand how to load very large text dataset into limited memory (OpenWebTextCorpus)
  • train and observe the training procedure
  • load the trained model into a simple ask-and-answer interactive script

Get full code

Code available at


Download the code via git clone before continue.

· 10 min read


I'm working on some semantic segmentation related code, where I need to enhance segmentation accuracy on boundaries. Therefore, I tried to use boundary loss to assist model training. This article is my attempt and codes.


Our purpose is clear. In Cityscapes, we have indexed image that represents pixels of each class named gtFine_labelIds. What we want is to generate class level boundary from gtFine_labelIds so that we can use it to optimize boundary regions for specific class.

· 22 min read

I look back at the assisted driving systems project I created during my undergraduate studies. As we continue to improve it, it is gradually becoming more bloated and difficult to scale. As I continued to learn, I found that an end-to-end approach would be the right solution. In this page, I will record my review of this project and my gradual understanding of the end-to-end approach as a beginner.


In this page, I first review the assisted driving project I once created and explain how it was structured. Then I will introduce World Model as a novice and introduce the advantages of LLM for autonomous driving. Next, I will share a paper on how to integrate LLM into an autonomous driving system. Finally, I will go deeper into Augmented Language Models and share how I begin.

· 42 min read


1. Attention是什么


In humans, Attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources, attention mechanisms select, modulate, and focus on the information most relevant to behavior.

上面这段文字摘自Alana de Santana Correia, and Esther Luna Colombini的论文 ATTENTION, PLEASE ! A SURVEY OF NEURAL ATTENTION MODELS IN DEEP LEARNING。你应该注意到了,在你的视野中,只有一部分区域是很清晰的。对于视野周围的场景,你往往需要转转眼珠,把视野朝向它,才能完全看清。或者,你还发现,比起历史老师开始强调重点,你似乎对下课铃声的响起更加敏感——这就是注意力。你所处的环境包含着远超你的处理能力的信息,而注意力机制让你的大脑集中精力处理你视野中心的场景,或是你“更应该”关心的事物。

Attention机制听上去是一个很高大上的词汇,实际上,Attention在不经意间就会被使用。例如,循环神经网络中每一步计算都依赖于上一步计算结果的过程就可以被视为一种Attention:在 Attention 机制引入之前,有一个问题大家一直很苦恼:长距离的信息会被弱化,就好像记忆能力弱的人,记不住过去的事情是一样的。





· 5 min read

论文作者:Yongming Rao, Wenliang Zhao, Zheng Zhu Jiwen Lu, Jie Zhou。论文原文点此URL

本文提出概念简洁、计算性能优异的 Global Filter Network (GFNet),该模型主要结构框架基于 Vision Transformer ,在频域中学习空间长距离(long-term spatial dependencies)关系,其具有较小的对数线性复杂性。


图:Global Filter Network 的整体结构

其主要创新是使用 Global Filter Layer 替换了 Vision Transformer 中的 self-attention 层中的每一个子层。该模型取得了可喜的高精度,同时仅具有 CNN 的复杂度。

· 5 min read

ResNet 算得上是超经典的backbone了,其网络提出了残差结构,可有效缓解随网络层数的加深而导致的梯度消失和梯度爆炸现象。结构和设计在这里有讨论。这里主要尝试一下复现。ResNet的常见形式有:

NameTop-1 errorTop-5 error