ArcFace: Facial Recognition Model

ArcFace is an open-source, state-of-the-art model for facial recognition. Jiankang Deng et al. introduced it in the 2018 paper “ArcFace: Additive Angular Margin Loss for Deep Face Recognition”. It reports an accuracy of 99.82% on the LFW dataset. This article provides a brief analysis and explanation of the paper.


Deep convolutional neural networks (DCNNs) have been used for facial recognition for quite some time, and numerous research papers propose models for identifying and verifying faces. Some train a multi-class classifier with a softmax loss, while others, such as FaceNet, learn face embeddings directly. However, a major issue with many of these methods is that their loss functions lack the discriminative power needed to tell similar faces apart.

Previous work such as SphereFace proposed that the weights of the last fully connected layer of a DCNN can serve as representations of the class centres, one per identity, in an angular space. This was leveraged to develop a loss function that encourages ‘intra-class compactness and inter-class discrepancy’. However, computing that loss required a series of approximations, which led to unstable training of the network. CosFace went a step further and made the loss function more efficient by adding a margin directly to the cosine of the target class, but it also suffers from inconsistency.

In contrast, ArcFace applies an additive angular margin penalty to the angle between the DCNN feature and the target-class weight of the last fully connected layer; the cosine of that angle is the dot product of the two once both are normalised. This mitigates the problems of the previous work and, according to the authors of the paper, makes ArcFace Engaging, Effective, Easy to implement and Efficient.


The most widely used classification loss function, softmax loss, is presented as follows:

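Softmax loss function:

$$L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$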

where ‘x_i’ denotes the deep feature of the i-th sample, which belongs to class ‘y_i’, ‘b’ is the bias, ‘N’ is the batch size, ‘n’ is the number of classes, ‘W’ holds the weights of the last layer, and the embedding feature dimension is 512. This loss is not optimised for distinguishing between highly similar embeddings of different classes, which results in a performance gap. That is where ArcFace comes in: it changes the loss function to close this gap.
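To make the notation concrete, here is a minimal sketch of the plain softmax loss in dependency-free Python. The function name `softmax_loss` and the use of nested lists in place of tensors are assumptions made for illustration; a real training pipeline would use a tensor library.

```python
import math

def softmax_loss(features, labels, weights, biases):
    """Mean softmax cross-entropy over a batch.

    features: N feature vectors x_i (each of length d)
    labels:   N class indices y_i in the range [0, n)
    weights:  n weight vectors W_j (each of length d), one per class
    biases:   n bias terms b_j
    """
    total = 0.0
    for x, y in zip(features, labels):
        # The logit for class j is the dot product W_j . x plus the bias b_j.
        logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability of the true class y.
        total += log_norm - logits[y]
    return total / len(features)
```

With all weights and biases set to zero, every class is equally likely and the loss equals log(n), which is a handy sanity check.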

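ArcFace loss function:

$$L_3 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$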

The ArcFace loss function starts from the dot product of the weight ‘w’ and the feature ‘x’, which equals ‖w‖‖x‖cos θ, where θ is the angle between ‘w’ and ‘x’. ‘w’ is normalised with the L2 norm, and ‘x’ is L2-normalised and then rescaled by a factor ‘s’. This makes the predictions rely only on the angle θ, that is, on the cosine similarity between the weights and the feature. A margin penalty ‘m’ is then added to the angle of the target class, so its logit changes from s·cos θ to s·cos(θ + m). The entire process is visualised below.
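The steps described above (normalise, take the cosine, add the margin to the target angle, rescale) can be sketched for a single sample as follows. This is an illustrative, dependency-free sketch rather than the authors' implementation: the names `l2_normalize` and `arcface_logits` are hypothetical, the defaults s = 64 and m = 0.5 follow the settings reported in the paper, and the case where θ + m exceeds π, which practical implementations treat specially, is not handled here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def arcface_logits(x, weights, label, s=64.0, m=0.5):
    """Return the scaled logits for one sample under the ArcFace margin.

    x:       feature vector of the sample
    weights: one weight vector per class
    label:   index of the sample's true class
    """
    x_hat = l2_normalize(x)
    logits = []
    for j, w in enumerate(weights):
        w_hat = l2_normalize(w)
        # With both vectors at unit length, the dot product is cos(theta_j).
        cos_theta = sum(a * b for a, b in zip(w_hat, x_hat))
        # Clamp to guard acos against rounding slightly outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if j == label:
            # Add the angular margin m to the target-class angle only.
            cos_theta = math.cos(math.acos(cos_theta) + m)
        # Rescale by s so the logits have a useful range for softmax.
        logits.append(s * cos_theta)
    return logits
```

Feeding these logits into an ordinary softmax cross-entropy gives the ArcFace loss for that sample. Because the margin lowers only the target logit, the network has to pull the feature closer to its class centre to reach the same loss.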

Image from original paper


The loss function proposed by the paper yields a clear separation between classes and reduces the spread within each class, which leads it to outperform previously proposed methods. The paper also includes an extensive evaluation of recent facial recognition methods in comparison with ArcFace over a variety of benchmark datasets.

Image from original paper

The graphs above visualise the decision margins of different loss functions in the binary classification case. The dashed line represents the decision boundary, and the grey areas are the decision margins. Furthermore, the following table details the performance of ArcFace against a few other methods on three benchmark datasets.
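The difference between these margins can also be seen numerically. The paper combines the three margin types into one target logit, cos(m1·θ + m2) − m3, where m1 is the multiplicative angular margin of SphereFace, m2 the additive angular margin of ArcFace and m3 the additive cosine margin of CosFace. The sketch below evaluates that expression before scaling by ‘s’; the function names and the specific margin values are assumptions chosen for illustration.

```python
import math

def target_logit(theta, m1=1.0, m2=0.0, m3=0.0):
    """Target-class logit cos(m1 * theta + m2) - m3, before scaling by s."""
    return math.cos(m1 * theta + m2) - m3

# Illustrative margin settings; the values are assumptions for this sketch.
MARGINS = {
    "softmax (normalised)": dict(),
    "SphereFace": dict(m1=1.35),
    "ArcFace": dict(m2=0.5),
    "CosFace": dict(m3=0.35),
}

def compare(theta):
    """Return the target logit of each loss at angle theta (in radians)."""
    return {name: target_logit(theta, **kwargs) for name, kwargs in MARGINS.items()}

if __name__ == "__main__":
    for name, value in compare(1.0).items():
        print(f"{name:22s} {value:.4f}")
```

For an angle of 1 radian, each margin variant produces a smaller target logit than the normalised softmax, so each one demands a smaller angle to the class centre before a sample counts as confidently classified.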

Image from original paper


This article condenses the key concepts of ArcFace and explains its significance. The original paper is linked here and the code can be found here. An excellent PyTorch implementation is available in this repo.
