configs/rec/crnn/README.md (20 changes: 11 additions & 9 deletions)
@@ -39,19 +39,21 @@ According to our experiments, the training (following the steps in [Model Traini

<div align="center">

-| **Model** | **Context** | **Backbone** | **Train Dataset** | **Model Params** | **Batch size per card** | **Graph train 8P (s/epoch)** | **Graph train 8P (ms/step)** | **Graph train 8P (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
-| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
-| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| **Model** | **Context** | **Backbone** | **Train Dataset** | **Num Classes** | **Model Params** | **Batch size per card** | **Graph train (s/epoch)** | **Graph train (ms/step)** | **Graph train (FPS)** | **Avg Eval Accuracy** | **Recipe** | **Download** |
+| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 37 | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
+| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 37 | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| CRNN | D910x4-MS2.0-G | ResNet34_vd | MJ+ST | 96 | 24.51 M | 64 | 4292.18 | 76.08 | 3364.72 | 83.50% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_server.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c-55748731.mindir) |
</div>
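The throughput columns in the table above are mutually consistent under the usual definitions: FPS is the global batch size (batch size per card times the number of cards in the Context column) divided by the step time, and s/epoch divided by the step time gives the number of steps per epoch. The Python sketch below recomputes them from the table. The formulas are our reading of the numbers rather than definitions stated in this README, and the helper names are hypothetical.

```python
# Sketch: recompute the throughput columns of the training table.
# Assumptions (ours, not stated in the README):
#   FPS         = batch size per card * number of cards / step time
#   steps/epoch = (s/epoch) / (step time)


def throughput_fps(batch_per_card: int, num_cards: int, ms_per_step: float) -> float:
    """Images per second over all cards."""
    return batch_per_card * num_cards * 1000.0 / ms_per_step


def samples_per_epoch(s_per_epoch: float, ms_per_step: float, global_batch: int) -> float:
    """Training samples per epoch implied by the epoch time and the step time."""
    steps_per_epoch = s_per_epoch * 1000.0 / ms_per_step
    return steps_per_epoch * global_batch


# (row, batch per card, cards, s/epoch, ms/step, reported FPS), copied from the table
ROWS = [
    ("VGG7, D910x8", 16, 8, 2488.82, 22.06, 5802.71),
    ("ResNet34_vd, D910x8", 64, 8, 2157.18, 76.48, 6694.84),
    ("ResNet34_vd (96 classes), D910x4", 64, 4, 4292.18, 76.08, 3364.72),
]

for name, bs, cards, s_epoch, ms_step, reported in ROWS:
    fps = throughput_fps(bs, cards, ms_step)
    n = samples_per_epoch(s_epoch, ms_step, bs * cards)
    print(f"{name}: {fps:.1f} FPS computed vs {reported} reported, ~{n / 1e6:.2f} M samples/epoch")
```

All three rows imply roughly 14.44 M training samples per epoch, which is a handy sanity check when comparing runs that use different card counts.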

- Detailed accuracy results for each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):
<div align="center">

-| **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
-| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
-| CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
-| CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+| **Model** | **Backbone** | **Num Classes** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
+| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
+| CRNN | VGG7 | 37 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
+| CRNN | ResNet34_vd | 37 | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+| CRNN | ResNet34_vd | 96 | 94.65% | 94.70% | 94.28% | 93.20% | 72.5% | 63.94% | 87.63% | 86.09% | 74.42% | 73.61% | 83.50% |
</div>
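The **Average** column matches an unweighted mean of the ten per-benchmark accuracies: each subset counts equally, regardless of how many images it contains. The quick check below copies the data from the table. The helper name is ours, and a size-weighted average would give different numbers.

```python
from statistics import mean

# Per-benchmark accuracies (%) copied from the table, in column order:
# IC03_860, IC03_867, IC13_857, IC13_1015, IC15_1811, IC15_2077,
# IIIT5k_3000, SVT, SVTP, CUTE80
RESULTS = {
    ("VGG7", 37): [94.53, 94.00, 92.18, 90.74, 71.95, 66.06, 84.10, 83.93, 73.33, 69.44],
    ("ResNet34_vd", 37): [94.42, 94.23, 93.35, 92.02, 75.92, 70.15, 87.73, 86.40, 76.28, 73.96],
    ("ResNet34_vd", 96): [94.65, 94.70, 94.28, 93.20, 72.5, 63.94, 87.63, 86.09, 74.42, 73.61],
}
REPORTED_AVERAGE = {("VGG7", 37): 82.03, ("ResNet34_vd", 37): 84.45, ("ResNet34_vd", 96): 83.50}


def average_accuracy(per_benchmark):
    """Unweighted mean: every benchmark subset counts equally, whatever its size."""
    return mean(per_benchmark)


for key, accs in RESULTS.items():
    print(key, f"{average_accuracy(accs):.3f} computed vs {REPORTED_AVERAGE[key]} reported")
```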

### Inference Perf.
@@ -70,7 +72,7 @@ The inference performance is tested on Mindspore Lite, please take a look at [Mi
**Notes:**
- Context: the training context, denoted as {device}x{number of devices}-MS{version}-{mode}, where the MindSpore mode is either G (graph mode) or F (PyNative mode with ms_function). For example, D910x8-MS1.8-G means training on 8 Ascend 910 NPUs in graph mode with MindSpore 1.8.
- To reproduce the result on other contexts, please ensure the global batch size is the same.
-- The characters supported by the models are the lowercase English letters a to z and the digits 0 to 9. For more on the dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
+- The number of classes of a model is determined by the dictionary used for training. The default dictionary contains the lowercase English letters a to z and the digits 0 to 9. For more on the dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
- The models are trained from scratch without any pre-training. For details on the training and evaluation datasets, please refer to the [Dataset Download & Dataset Usage](#312-dataset-download) section.
- The exported MindIR files of CRNN_VGG7 and CRNN_ResNet34_vd both have the input shape (1, 3, 32, 100).
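The Context naming rule and the global-batch-size note can be combined in a small helper. The sketch below is hypothetical: MindOCR does not ship `parse_context` or `per_card_batch`. It assumes that the global batch size is simply the batch size per card times the number of cards.

```python
import re
from dataclasses import dataclass

# Hypothetical helpers (not part of MindOCR) for the Context naming rule
# {device}x{number of devices}-MS{version}-{mode}, e.g. "D910x8-MS1.8-G".
_CONTEXT_RE = re.compile(
    r"^(?P<device>[A-Za-z]+\d+)x(?P<cards>\d+)-MS(?P<version>[\d.]+)-(?P<mode>[GF])$"
)


@dataclass(frozen=True)
class Context:
    device: str      # e.g. "D910" for Ascend 910
    cards: int       # number of devices used for training
    ms_version: str  # MindSpore version, e.g. "1.8"
    mode: str        # "G" = graph mode, "F" = PyNative mode with ms_function


def parse_context(text: str) -> Context:
    match = _CONTEXT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised context string: {text!r}")
    return Context(match["device"], int(match["cards"]), match["version"], match["mode"])


def per_card_batch(reference_context: str, reference_batch_per_card: int, target_cards: int) -> int:
    """Batch size per card that keeps the reference run's global batch size."""
    global_batch = parse_context(reference_context).cards * reference_batch_per_card
    if global_batch % target_cards:
        raise ValueError(f"global batch {global_batch} cannot be split evenly over {target_cards} cards")
    return global_batch // target_cards


print(parse_context("D910x4-MS2.0-G"))
print(per_card_batch("D910x8-MS1.8-G", 64, 4))
```

For example, reproducing the 8-card ResNet34_vd run (64 per card, global batch 512) on 4 cards would need 128 per card. The 96-class ResNet34_vd row was trained on D910x4 with 64 per card, which is a global batch of 256. It is therefore a different setup, not a reproduction of the 8-card run.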

configs/rec/crnn/README_CN.md (20 changes: 11 additions & 9 deletions)
@@ -39,20 +39,22 @@ Table Format:

<div align="center">

-| **Model** | **Environment** | **Backbone** | **Training Set** | **Params** | **Batch Size per Card** | **Graph Mode 8-Card Training (s/epoch)** | **Graph Mode 8-Card Training (ms/step)** | **Graph Mode 8-Card Training (FPS)** | **Avg. Eval Accuracy** | **Config File** | **Model Weights Download** |
-| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
-| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
-| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| **Model** | **Environment** | **Backbone** | **Training Set** | **Num Classes** | **Params** | **Batch Size per Card** | **Graph Mode Training (s/epoch)** | **Graph Mode Training (ms/step)** | **Graph Mode Training (FPS)** | **Avg. Eval Accuracy** | **Config File** | **Model Weights Download** |
+| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+| CRNN | D910x8-MS1.8-G | VGG7 | MJ+ST | 37 | 8.72 M | 16 | 2488.82 | 22.06 | 5802.71 | 82.03% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_vgg7.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c-573dbd61.mindir) |
+| CRNN | D910x8-MS1.8-G | ResNet34_vd | MJ+ST | 37 | 24.48 M | 64 | 2157.18 | 76.48 | 6694.84 | 84.45% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07-eb10a0c9.mindir) |
+| CRNN | D910x4-MS2.0-G | ResNet34_vd | MJ+ST | 96 | 24.51 M | 64 | 4292.18 | 76.08 | 3364.72 | 83.50% | [yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_server.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_server-e0d66c0c-55748731.mindir) |
</div>

- Accuracy on each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):

<div align="center">

-| **Model** | **Backbone** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average Accuracy** |
-| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
-| CRNN | VGG7 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
-| CRNN | ResNet34_vd | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+| **Model** | **Backbone** | **Num Classes** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average Accuracy** |
+| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
+| CRNN | VGG7 | 37 | 94.53% | 94.00% | 92.18% | 90.74% | 71.95% | 66.06% | 84.10% | 83.93% | 73.33% | 69.44% | 82.03% |
+| CRNN | ResNet34_vd | 37 | 94.42% | 94.23% | 93.35% | 92.02% | 75.92% | 70.15% | 87.73% | 86.40% | 76.28% | 73.96% | 84.45% |
+| CRNN | ResNet34_vd | 96 | 94.65% | 94.70% | 94.28% | 93.20% | 72.5% | 63.94% | 87.63% | 86.09% | 74.42% | 73.61% | 83.50% |
</div>


@@ -72,7 +74,7 @@ Table Format:
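**Notes:**
- Environment: the training environment is denoted as {processor}x{number of processors}-MS{version}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode). For example, D910x8-MS1.8-G denotes training on 8 Ascend 910 NPUs in graph mode with MindSpore 1.8.
- To reproduce the training results in another environment, please make sure the global batch size is the same as in the original config file.
-- The characters the models can recognize follow the default setting, i.e. all lowercase English letters a to z and the digits 0 to 9. For details, see [4. Character Dictionary](#4-字符词典).
+- The number of classes of a model is determined by the dictionary used for training. The default dictionary contains the lowercase English letters a to z and the digits 0 to 9. For details, see [4. Character Dictionary](#4-字符词典).
- All models are trained from scratch without any pre-training. For details on the training and test datasets, please refer to the [Dataset Download & Usage](#312-数据集下载) section.
- The MindIR files of CRNN_VGG7 and CRNN_ResNet34_vd were both exported with the input shape (1, 3, 32, 100).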
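The notes above tie the number of classes to the training dictionary. The sketch below makes three assumptions: each dictionary character is one class, `use_space_char` appends a space, and CTC adds one extra blank class (consistent with the `num_chars_in_dict+1` comment in `crnn_resnet34_server.yaml`). Under those assumptions, the default 36-character set gives the 37 classes listed in the tables. `num_classes` and `load_dict` are hypothetical helpers, not MindOCR APIs.

```python
import string

# Hypothetical helpers illustrating how the class count follows from the dictionary.
# Assumptions: one class per dictionary character, use_space_char appends " ",
# and CTC needs one extra "blank" class.


def load_dict(path: str) -> list:
    """Read a character dictionary with one character per line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.rstrip("\r\n")]


def num_classes(chars, use_space_char: bool = False) -> int:
    charset = list(chars)
    if use_space_char and " " not in charset:
        charset.append(" ")
    return len(charset) + 1  # + 1 for the CTC blank


DEFAULT_CHARS = string.digits + string.ascii_lowercase  # "0-9" and "a-z": 36 characters
print(num_classes(DEFAULT_CHARS))
```

For a custom dictionary, `num_classes(load_dict(path), use_space_char=True)` would give the value for `num_classes` under these assumptions. We have not checked the contents of `mindocr/utils/dict/en_dict.txt` here, so verify the 96 against the actual file.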

configs/rec/crnn/crnn_icdar15.yaml (8 changes: 2 additions & 6 deletions)
@@ -96,16 +96,12 @@ train:
           character_dict_path: *character_dict_path
           use_space_char: *use_space_char
           lower: True
-      - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescales to [-1, 1] after resize)
+      - RecResizeNormImg:
           image_shape: [32, 100] # H, W
           infer_mode: *infer_mode
           character_dict_path: *character_dict_path
           padding: False # aspect ratio will be preserved if true.
-      - NormalizeImage: # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
-          bgr_to_rgb: True
-          is_hwc: True
-          mean : [127.0, 127.0, 127.0]
-          std : [127.0, 127.0, 127.0]
+          norm_before_pad: False
       - ToCHWImage:
     # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
     output_columns: ['image', 'text_seq'] #, 'length'] #'img_path']
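In `RecResizeNormImg`, the `padding` flag decides whether the text crop keeps its aspect ratio (`# aspect ratio will be preserved if true`). The sketch below assumes a common PaddleOCR-style rule: scale the crop to the target height, cap the width at the target width, then right-pad the rest. Under that assumption, the resized width can be computed as follows. `resized_width` is a hypothetical helper, and MindOCR's exact rounding may differ.

```python
import math

# Hypothetical sketch of the width rule behind RecResizeNormImg's `padding` flag.
# Assumption: PaddleOCR-style behaviour (resize to the target height, keep the
# aspect ratio, cap at the target width, right-pad the rest).


def resized_width(img_h: int, img_w: int, out_h: int, out_w: int, keep_ratio: bool) -> int:
    """Width of the text crop after resizing, before any right-padding to out_w."""
    if not keep_ratio:
        return out_w  # padding: False -> stretch or squeeze to the full target width
    return min(out_w, math.ceil(out_h * img_w / img_h))


# image_shape: [32, 100] as in the config above
print(resized_width(64, 128, 32, 100, keep_ratio=True))   # short word: 64 px wide, then padded to 100
print(resized_width(32, 400, 32, 100, keep_ratio=True))   # long line: capped at 100 (squeezed)
print(resized_width(64, 128, 32, 100, keep_ratio=False))  # no padding: always 100
```

With `padding: False`, as in this config, every crop is stretched or squeezed to 100 px, so short words are widened. With `padding: True`, the characters keep their proportions and the remaining width is padded.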
configs/rec/crnn/crnn_resnet34.yaml (10 changes: 3 additions & 7 deletions)
@@ -80,23 +80,19 @@ train:
     shuffle: True
     transform_pipeline:
       - DecodeImage:
-          img_mode: BGR
+          img_mode: RGB
           to_float32: False
       - RecCTCLabelEncode:
           max_text_len: *max_text_len
           character_dict_path: *character_dict_path
           use_space_char: *use_space_char
           lower: True
-      - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescales to [-1, 1] after resize)
+      - RecResizeNormImg:
           image_shape: [32, 100] # H, W
           infer_mode: *infer_mode
           character_dict_path: *character_dict_path
           padding: False # aspect ratio will be preserved if true.
-      - NormalizeImage: # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
-          bgr_to_rgb: True
-          is_hwc: True
-          mean : [127.0, 127.0, 127.0]
-          std : [127.0, 127.0, 127.0]
+          norm_before_pad: False
       - ToCHWImage:
     # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
     output_columns: ['image', 'text_seq'] #, 'length'] #'img_path']
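This change decodes images as RGB directly instead of decoding them as BGR and flipping channels in `NormalizeImage` (`bgr_to_rgb: True`). The sketch below checks on single pixels that both orderings give the same normalized values. It also shows that, because the mean and std are the same for every channel (127.0), the channel flip and the normalization commute. The helper names are hypothetical, and the pixel tuples stand in for real images.

```python
# Hypothetical helpers; a pixel is a 3-tuple of channel values in 0..255.
MEAN = (127.0, 127.0, 127.0)
STD = (127.0, 127.0, 127.0)


def normalize(pixel, mean=MEAN, std=STD):
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


def flip_channels(pixel):
    """BGR <-> RGB: reverse the channel order."""
    return tuple(reversed(pixel))


def old_pipeline(bgr_pixel):
    # DecodeImage(img_mode=BGR) ... NormalizeImage(bgr_to_rgb=True)
    return normalize(flip_channels(bgr_pixel))


def new_pipeline(rgb_pixel):
    # DecodeImage(img_mode=RGB) ... normalization without a channel flip
    return normalize(rgb_pixel)


rgb = (200, 120, 30)
bgr = flip_channels(rgb)
print(old_pipeline(bgr) == new_pipeline(rgb))
```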
configs/rec/crnn/crnn_resnet34_ch.yaml (14 changes: 5 additions & 9 deletions)
@@ -84,7 +84,7 @@ train:
     max_text_len: *max_text_len
     transform_pipeline:
       - DecodeImage:
-          img_mode: BGR
+          img_mode: RGB
           to_float32: False
       - RecCTCLabelEncode:
           max_text_len: *max_text_len
@@ -94,16 +94,12 @@ train:
       - Rotate90IfVertical:
           threshold: 2.0
           direction: counterclockwise
-      - RecResizeImg:
-          image_shape: [32, 320]
+      - RecResizeNormImg:
+          image_shape: [32, 320] # H, W
           infer_mode: *infer_mode
           character_dict_path: *character_dict_path
-          padding: True
-      - NormalizeImage:
-          bgr_to_rgb: True
-          is_hwc: True
-          mean: [127.0, 127.0, 127.0]
-          std: [127.0, 127.0, 127.0]
+          padding: True # aspect ratio will be preserved if true.
+          norm_before_pad: False
       - ToCHWImage:
     output_columns: ["image", "text_seq"]
     net_input_column_index: [0]
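The Chinese config keeps a `Rotate90IfVertical` step (`threshold: 2.0`, `direction: counterclockwise`) ahead of the resize. Our reading of the parameter names is that crops much taller than they are wide get rotated so that vertical text lies horizontally. The sketch below assumes the rule "rotate when height / width >= threshold"; the real transform's comparison may differ. `rotate_if_vertical` is a hypothetical helper operating on a list of pixel rows.

```python
# Hypothetical re-implementation for illustration only; the real transform's exact
# comparison (">" vs ">=", h / w vs w / h) is an assumption here.


def rotate90_ccw(img):
    """Rotate an image given as a list of rows by 90 degrees counterclockwise."""
    return [list(col) for col in zip(*img)][::-1]


def rotate_if_vertical(img, threshold=2.0):
    """Rotate crops that are at least `threshold` times taller than wide."""
    h, w = len(img), len(img[0])
    return rotate90_ccw(img) if h / w >= threshold else img


tall = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 x 2 crop, h / w = 2.0
print(rotate_if_vertical(tall))
```

Rotating before resizing matters here because `image_shape: [32, 320]` is a wide, short target. A vertical crop squeezed into it without rotation would lose most of its characters.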
configs/rec/crnn/crnn_resnet34_server.yaml (150 changes: 150 additions & 0 deletions)
@@ -0,0 +1,150 @@
system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: True
  amp_level: 'O3'
  seed: 42
  log_interval: 100
  val_while_train: True
  drop_overflow_update: False

common:
  character_dict_path: &character_dict_path mindocr/utils/dict/en_dict.txt
  num_classes: &num_classes 96 # num_chars_in_dict+1, TODO: retrieve it from dict or check correctness
  max_text_len: &max_text_len 24
  infer_mode: &infer_mode False
  use_space_char: &use_space_char True
  lower: &lower False
  batch_size: &batch_size 64

model:
  type: rec
  transform: null
  backbone:
    name: rec_resnet34
    pretrained: False
  neck:
    name: RNNEncoder
    hidden_size: 256
  head:
    name: CTCHead
    weight_init: crnn_customised
    bias_init: crnn_customised
    out_channels: *num_classes

postprocess:
  name: RecCTCLabelDecode
  character_dict_path: *character_dict_path
  use_space_char: *use_space_char

metric:
  name: RecMetric
  main_indicator: acc
  character_dict_path: *character_dict_path
  ignore_space: True
  print_flag: False

loss:
  name: CTCLoss
  pred_seq_len: 25 # TODO: retrieve from the network output shape.
  max_label_len: *max_text_len # this value should be smaller than pred_seq_len
  batch_size: *batch_size

scheduler:
  scheduler: warmup_cosine_decay
  min_lr: 0.000001
  lr: 0.001
  num_epochs: 30
  warmup_epochs: 2
  decay_epochs: 28

optimizer:
  opt: adamw
  filter_bias_and_bn: True
  momentum: 0.95
  weight_decay: 0.0001
  nesterov: False

loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2.0
  scale_window: 1000

train:
  ckpt_save_dir: './crnn_resnet34_server'
  pred_cast_fp32: False # let CTCLoss cast internally
  ema: True # added
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: /path/to/data_lmdb_release/
    data_dir: training/
    # label_file: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: True
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: *lower
      - RecResizeNormImg:
          image_shape: [32, 100] # H, W
          infer_mode: *infer_mode
          character_dict_path: *character_dict_path
          padding: True # aspect ratio will be preserved if true.
          norm_before_pad: True
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'text_seq'] #, 'length'] #'img_path']
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1] # input indices marked as label
    #keys_for_loss: 4 # num labels for loss func

  loader:
    shuffle: True
    batch_size: *batch_size
    drop_remainder: True
    max_rowsize: 12
    num_workers: 8

eval:
  ckpt_load_path: ./crnn_resnet34_server/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: /path/to/data_lmdb_release/
    data_dir: validation/
    # label_file: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: False
    transform_pipeline:
      - DecodeImage:
          img_mode: RGB
          to_float32: False
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: *lower
      - RecResizeNormImg:
          image_shape: [32, 100] # H, W
          infer_mode: *infer_mode
          character_dict_path: *character_dict_path
          padding: True # aspect ratio will be preserved if true.
          norm_before_pad: True
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'text_padded', 'text_length'] # TODO return text string padding w/ fixed length, and a scalar to indicate the length
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1, 2] # input indices marked as label

  loader:
    shuffle: False # TODO: tbc
    batch_size: 64
    drop_remainder: False
    max_rowsize: 12
    num_workers: 8
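In the `loss` section, `pred_seq_len: 25` is hard-coded (with a TODO to read it from the network output) and `max_label_len` (24) is said to be smaller than it. Two points are worth spelling out. First, 25 is what a 100-pixel-wide input gives if the backbone reduces the width by a factor of 4; this stride is an assumption, not something this config states. Second, CTC needs an extra blank frame between identical neighbouring characters, so in the worst case a 24-character label needs more than 25 frames. A sketch with hypothetical helpers:

```python
# Hypothetical helpers for the pred_seq_len / max_label_len relation in CTCLoss.


def ctc_time_steps(input_width: int, width_stride: int = 4) -> int:
    """Output sequence length, ASSUMING the backbone reduces the width by `width_stride`."""
    return input_width // width_stride


def min_ctc_frames(label: str) -> int:
    """Fewest CTC frames that can emit `label`: one per character, plus one blank
    between each pair of identical neighbours (otherwise CTC would merge them)."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def label_fits(label: str, pred_seq_len: int = 25, max_label_len: int = 24) -> bool:
    return len(label) <= max_label_len and min_ctc_frames(label) <= pred_seq_len


print(ctc_time_steps(100))       # image_shape [32, 100] -> 25 under the stride-4 assumption
print(label_fits("bookkeeper"))  # 10 chars + 3 doubled letters -> 13 frames
print(label_fits("a" * 24))      # 24 chars, but 47 frames needed
```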
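The `scheduler` section asks for `warmup_cosine_decay` with `lr: 0.001`, `min_lr: 0.000001`, 2 warmup epochs and 28 decay epochs, which add up to the 30 `num_epochs`. The sketch below uses the textbook form of that schedule: a linear warmup to `lr`, then a half-cosine down to `min_lr`. MindOCR's implementation may differ in details, such as the warmup start value or per-step versus per-epoch updates, so treat this as an illustration only.

```python
import math


def warmup_cosine_lr(epoch: float, lr=0.001, min_lr=1e-6, warmup_epochs=2, decay_epochs=28):
    """Learning rate at a (possibly fractional) epoch.

    Linear warmup from 0 to `lr` over `warmup_epochs`, then cosine decay from
    `lr` to `min_lr` over `decay_epochs` (textbook form, assumed here).
    """
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    t = min(epoch - warmup_epochs, decay_epochs) / decay_epochs
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * t))


for e in (0, 1, 2, 16, 30):
    print(e, f"{warmup_cosine_lr(e):.6f}")
```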
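With `amp_level: 'O3'`, the `loss_scaler` section enables dynamic loss scaling: start at 512, divide by `scale_factor` (2.0) whenever gradients overflow, and multiply by it again after `scale_window` (1000) consecutive overflow-free steps. The class below is a minimal sketch of that common rule, modelled on how dynamic loss scale managers usually behave; it is not MindOCR's code. It tracks only the scale value. Whether the weight update is skipped on an overflowing step is controlled separately (`drop_overflow_update` in `system`) and is not modelled here.

```python
class DynamicLossScale:
    """Minimal sketch of dynamic loss scaling with the config's parameters."""

    def __init__(self, loss_scale=512.0, scale_factor=2.0, scale_window=1000):
        self.scale = loss_scale
        self.factor = scale_factor
        self.window = scale_window
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow: bool) -> float:
        if overflow:
            # shrink the scale (never below 1) and restart the window
            self.scale = max(self.scale / self.factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.window:
                # a full window without overflow: try a larger scale
                self.scale *= self.factor
                self.good_steps = 0
        return self.scale


scaler = DynamicLossScale()
print(scaler.update(overflow=True))  # one overflow halves the scale
```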
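`ema: True` in the `train` section turns on an exponential moving average of the model weights. The config does not give a decay value, so the 0.9999 below is a placeholder rather than MindOCR's default. The sketch shows the per-parameter update, with plain floats standing in for tensors. `ema_update` is a hypothetical helper.

```python
# Hypothetical sketch of weight EMA. The decay MindOCR uses is not given in this
# config; 0.9999 below is a placeholder, not the library default.


def ema_update(ema_params: dict, params: dict, decay: float) -> dict:
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * current."""
    return {name: decay * ema_params[name] + (1.0 - decay) * params[name] for name in ema_params}


ema = {"w": 0.0}
for _ in range(3):
    current = {"w": 1.0}  # stand-in for the weights after an optimizer step
    ema = ema_update(ema, current, decay=0.9999)
print(ema)
```

A decay close to 1 makes the averaged weights move slowly, smoothing out step-to-step noise late in training.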
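The server config sets `padding: True` together with `norm_before_pad: True`, whereas the `crnn_resnet34_ch.yaml` change above uses `norm_before_pad: False`. This sketch makes two assumptions: the padded area is filled with zeros, and `RecResizeNormImg` normalizes with the same mean/std of 127 that the removed `NormalizeImage` step used. Under those assumptions, the flag decides whether padded pixels end up at 0.0 or at -1.0. The one-row example below uses a hypothetical helper.

```python
def norm_and_pad_row(row, out_w, mean=127.0, std=127.0, norm_before_pad=True):
    """Right-pad one resized row of pixel values to out_w and normalize it.

    Assumption: padding uses zeros. With norm_before_pad=True the zeros are
    added after normalization, so padded pixels stay at 0.0 (the value a
    mid-grey pixel of 127 maps to). With norm_before_pad=False the zeros are
    normalized too and become (0 - 127) / 127 = -1.0 (black).
    """
    if norm_before_pad:
        normed = [(v - mean) / std for v in row]
        return normed + [0.0] * (out_w - len(row))
    padded = list(row) + [0.0] * (out_w - len(row))
    return [(v - mean) / std for v in padded]


print(norm_and_pad_row([254, 127], 4, norm_before_pad=True))
print(norm_and_pad_row([254, 127], 4, norm_before_pad=False))
```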
configs/rec/crnn/crnn_vgg7.yaml (10 changes: 3 additions & 7 deletions)
@@ -81,23 +81,19 @@ train:
     shuffle: True
     transform_pipeline:
       - DecodeImage:
-          img_mode: BGR
+          img_mode: RGB
           to_float32: False
       - RecCTCLabelEncode:
           max_text_len: *max_text_len
           character_dict_path: *character_dict_path
           use_space_char: *use_space_char
           lower: True
-      - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescales to [-1, 1] after resize)
+      - RecResizeNormImg:
           image_shape: [32, 100] # H, W
           infer_mode: *infer_mode
           character_dict_path: *character_dict_path
           padding: False # aspect ratio will be preserved if true.
-      - NormalizeImage: # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
-          bgr_to_rgb: True
-          is_hwc: True
-          mean : [127.0, 127.0, 127.0]
-          std : [127.0, 127.0, 127.0]
+          norm_before_pad: False
       - ToCHWImage:
     # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
     output_columns: ['image', 'text_seq'] #, 'length'] #'img_path']