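from torchvision import models

# Load VGG-16 with ImageNet-pretrained weights (downloaded on first use).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
print(vgg)  # print the layer-by-layer architecture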
| Model | Description |
| AlexNet | |
| ConvNeXt | |
| DenseNet | |
| EfficientNet | |
| EfficientNetV2 | |
| GoogLeNet | |
| Inception V3 | |
| MaxVit | |
| MNASNet | |
| MobileNet V2 | |
| MobileNet V3 | |
| RegNet | |
| ResNet | |
| ResNeXt | |
| ShuffleNet V2 | |
| SqueezeNet | |
| SwinTransformer | |
| VGG | |
| VisionTransformer | |
| Wide ResNet | |

| Model | Description |
| DeepLabV3 | |
| FCN | |
| LRASPP | |

| Model | Description |
| Faster R-CNN | |
| FCOS | |
| RetinaNet | |
| SSD | |
| SSDlite | |

| Model | Description |
| Mask R-CNN | |

| Model | Description |
| Keypoint R-CNN | |

| Model | Description |
| Video MViT | |
| Video ResNet | |
| Video S3D | |
| Video SwinTransformer | |

| Model | Description |
| RAFT | |
| Model | Description |
| Conformer | Conformer architecture introduced in //Conformer: Convolution-augmented Transformer for Speech Recognition// [Gulati //et al.//, 2020]. |
| ConvTasNet | Conv-TasNet architecture introduced in //Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation// [Luo and Mesgarani, 2019]. |
| DeepSpeech | DeepSpeech architecture introduced in //Deep Speech: Scaling up end-to-end speech recognition// [Hannun //et al.//, 2014]. |
| Emformer | Emformer architecture introduced in //Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition// [Shi //et al.//, 2021]. |
| HDemucs | Hybrid Demucs model from //Hybrid Spectrogram and Waveform Source Separation// [Défossez, 2021]. |
| HuBERTPretrainModel | HuBERT model used for pretraining in //HuBERT// [Hsu //et al.//, 2021]. |
| RNNT | Recurrent neural network transducer (RNN-T) model. |
| RNNTBeamSearch | Beam search decoder for RNN-T model. |
| SquimObjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts objective metric scores for speech enhancement (e.g., STOI, PESQ, and SI-SDR). |
| SquimSubjective | Speech Quality and Intelligibility Measures (SQUIM) model that predicts subjective metric scores for speech enhancement (e.g., Mean Opinion Score (MOS)). |
| Tacotron2 | Tacotron2 model from //Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions// [Shen //et al.//, 2018] based on the implementation from Nvidia Deep Learning Examples. |
| Wav2Letter | Wav2Letter model architecture from //Wav2Letter: an End-to-End ConvNet-based Speech Recognition System// [Collobert //et al.//, 2016]. |
| Wav2Vec2Model | Acoustic model used in //wav2vec 2.0// [Baevski //et al.//, 2020]. |
| WaveRNN | WaveRNN model from //Efficient Neural Audio Synthesis// [Kalchbrenner //et al.//, 2018] based on the implementation from fatchord/WaveRNN. |