Computer Vision Models (Vision API)

Native Lux Models

Boltz.Vision.AlexNet Type
julia
AlexNet(; kwargs...)

Create an AlexNet model (Krizhevsky et al., 2012).

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

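The following is a minimal usage sketch, assuming the standard Lux workflow; the 224x224 input size and the variable names are illustrative rather than part of the API.
julia
using Boltz, Lux, LuxCore, Random

model = Boltz.Vision.AlexNet()
ps, st = LuxCore.setup(Random.default_rng(), model)

# Dummy batch of one 224x224 RGB image (WHCN layout)
x = rand(Float32, 224, 224, 3, 1)
y, _ = model(x, ps, st)  # class scores, one column per image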

Boltz.Vision.VGG Type
julia
VGG(imsize; config, inchannels, batchnorm = false, nclasses, fcsize, dropout)

Create a VGG model (Simonyan and Zisserman, 2014).

Arguments

  • imsize: input image width and height as a tuple

  • config: the configuration for the convolution layers

  • inchannels: number of input channels

  • batchnorm: set to true to use batch normalization after each convolution

  • nclasses: number of output classes

  • fcsize: intermediate fully connected layer size

  • dropout: dropout level between fully connected layers

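A sketch of this constructor is shown below. It assumes that config is a sequence of (output channels, number of convolutions) pairs laid out like the standard VGG-11 architecture; the configuration and hyperparameter values here are illustrative.
julia
using Boltz, LuxCore, Random

# Hypothetical VGG-11-style configuration: (output channels, convolution count) per block
config = ((64, 1), (128, 1), (256, 2), (512, 2), (512, 2))
model = Boltz.Vision.VGG((224, 224); config=config, inchannels=3, nclasses=10,
    fcsize=4096, dropout=0.5f0)
ps, st = LuxCore.setup(Random.default_rng(), model)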

julia
VGG(depth::Int; batchnorm::Bool=false, pretrained::Bool=false)

Create a VGG model (Simonyan and Zisserman, 2014) with the ImageNet configuration.

Arguments

  • depth::Int: the depth of the VGG model. Choices: {11, 13, 16, 19}.

Keyword Arguments

  • batchnorm = false: set to true to use batch normalization after each convolution.

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.VisionTransformer Type
julia
VisionTransformer(name::Symbol; pretrained=false)

Creates a Vision Transformer model with the specified configuration.

Arguments

  • name::Symbol: name of the Vision Transformer model to create. The following models are available – :tiny, :small, :base, :large, :huge, :giant, :gigantic.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

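For example, a minimal sketch (any of the configuration names listed above can be substituted):
julia
using Boltz, LuxCore, Random

vit = Boltz.Vision.VisionTransformer(:tiny)
ps, st = LuxCore.setup(Random.default_rng(), vit)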

Imported from Metalhead.jl

Load Metalhead

You need to load Metalhead before using these models.
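
For example, a minimal sketch that loads Metalhead and then constructs one of the wrappers documented below:
julia
using Boltz, Lux, LuxCore, Metalhead, Random

model = Boltz.Vision.ResNet(18)
ps, st = LuxCore.setup(Random.default_rng(), model)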

Boltz.Vision.ConvMixer Function
julia
ConvMixer(name::Symbol; pretrained::Bool=false)

Create a ConvMixer model (Trockman and Kolter, 2022).

Arguments

  • name::Symbol: The name of the ConvMixer model. Must be one of :base, :small, or :large.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.DenseNet Function
julia
DenseNet(depth::Int; pretrained::Bool=false)

Create a DenseNet model (Huang et al., 2017).

Arguments

  • depth::Int: The depth of the DenseNet model. Must be one of 121, 161, 169, or 201.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.GoogLeNet Function
julia
GoogLeNet(; pretrained::Bool=false)

Create a GoogLeNet model (Szegedy et al., 2015).

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.MobileNet Function
julia
MobileNet(name::Symbol; pretrained::Bool=false)

Create a MobileNet model (Howard, 2017; Sandler et al., 2018; Howard et al., 2019).

Arguments

  • name::Symbol: The name of the MobileNet model. Must be one of :v1, :v2, :v3_small, or :v3_large.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.ResNet Function
julia
ResNet(depth::Int; pretrained::Bool=false)

Create a ResNet model (He et al., 2016).

Arguments

  • depth::Int: The depth of the ResNet model. Must be one of 18, 34, 50, 101, or 152.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.ResNeXt Function
julia
ResNeXt(depth::Int; cardinality=32, base_width=nothing, pretrained::Bool=false)

Create a ResNeXt model (Xie et al., 2017).

Arguments

  • depth::Int: The depth of the ResNeXt model. Must be one of 50, 101, or 152.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

  • cardinality: The cardinality of the ResNeXt model. Defaults to 32.

  • base_width: The base width of the ResNeXt model. Defaults to 8 for depth 101 and 4 otherwise.

Boltz.Vision.SqueezeNet Function
julia
SqueezeNet(; pretrained::Bool=false)

Create a SqueezeNet model (Iandola et al., 2016).

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Boltz.Vision.WideResNet Function
julia
WideResNet(depth::Int; pretrained::Bool=false)

Create a WideResNet model (Zagoruyko and Komodakis, 2017).

Arguments

  • depth::Int: The depth of the WideResNet model. Must be one of 18, 34, 50, 101, or 152.

Keyword Arguments

  • pretrained::Bool=false: If true, loads pretrained weights when LuxCore.setup is called.

Pretrained Models

Load JLD2

You need to load JLD2 before pretrained weights can be loaded.

Load Pretrained Weights

Pass pretrained=true to the model constructor to load the pretrained weights.

MODEL                                       TOP 1 ACCURACY (%)  TOP 5 ACCURACY (%)
AlexNet()                                   54.48               77.72
VGG(11)                                     67.35               87.91
VGG(13)                                     68.40               88.48
VGG(16)                                     70.24               89.80
VGG(19)                                     71.09               90.27
VGG(11; batchnorm=true)                     69.09               88.94
VGG(13; batchnorm=true)                     69.66               89.49
VGG(16; batchnorm=true)                     72.11               91.02
VGG(19; batchnorm=true)                     72.95               91.32
ResNet(18)                                  -                   -
ResNet(34)                                  -                   -
ResNet(50)                                  -                   -
ResNet(101)                                 -                   -
ResNet(152)                                 -                   -
ResNeXt(50; cardinality=32, base_width=4)   -                   -
ResNeXt(101; cardinality=32, base_width=8)  -                   -
ResNeXt(101; cardinality=64, base_width=4)  -                   -
SqueezeNet()                                -                   -
WideResNet(50)                              -                   -
WideResNet(101)                             -                   -
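
For example, a minimal sketch that loads one of the pretrained entries above (with JLD2 loaded, as noted):
julia
using Boltz, JLD2, Lux, LuxCore, Random

model = Boltz.Vision.VGG(19; batchnorm=true, pretrained=true)
ps, st = LuxCore.setup(Random.default_rng(), model)  # pretrained weights are loaded here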

Pretrained Models from Metalhead

For models imported from Metalhead.jl, pretrained weights can be loaded if they are available in Metalhead. Refer to the Metalhead.jl docs for the list of available pretrained models.

Preprocessing

All the pretrained models require that the images be normalized with the parameters mean = [0.485f0, 0.456f0, 0.406f0] and std = [0.229f0, 0.224f0, 0.225f0].
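
A minimal sketch of this normalization for a Float32 image batch in WHCN layout (the input array here is a stand-in):
julia
# Per-channel statistics quoted above, reshaped to broadcast over the channel dimension
mean_rgb = reshape([0.485f0, 0.456f0, 0.406f0], 1, 1, 3, 1)
std_rgb = reshape([0.229f0, 0.224f0, 0.225f0], 1, 1, 3, 1)

x = rand(Float32, 224, 224, 3, 1)  # stand-in for an image batch scaled to [0, 1]
x_normalized = (x .- mean_rgb) ./ std_rgb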


Bibliography

  • He, K.; Zhang, X.; Ren, S. and Sun, J. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 770–778.

  • Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. and others (2019). Searching for mobilenetv3. In: Proceedings of the IEEE/CVF international conference on computer vision; pp. 1314–1324.

  • Howard, A. G. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv preprint arXiv:1704.04861.

  • Huang, G.; Liu, Z.; Van Der Maaten, L. and Weinberger, K. Q. (2017). Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 4700–4708.

  • Iandola, F. N.; Han, S.; Moskewicz, M. W.; Ashraf, K.; Dally, W. J. and Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, arXiv:1602.07360 [cs.CV].

  • Krizhevsky, A.; Sutskever, I. and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25.

  • Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A. and Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 4510–4520.

  • Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.

  • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V. and Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 1–9.

  • Trockman, A. and Kolter, J. Z. (2022). Patches are all you need? arXiv preprint arXiv:2201.09792.

  • Xie, S.; Girshick, R.; Dollár, P.; Tu, Z. and He, K. (2017). Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; pp. 1492–1500.

  • Zagoruyko, S. and Komodakis, N. (2017). Wide Residual Networks, arXiv:1605.07146 [cs.CV].