A memo on how to handle DPU-supported vs. unsupported final layers (sigmoid and softmax) in U-Net when building an application.
Table of contents
1. U-Net with sigmoid as the final-layer activation
2. U-Net with softmax as the final-layer activation
3. Addendum: softmax also works with SeparableConv2D
1. U-Net with sigmoid as the final-layer activation

The final layer of the U-Net:
```python
c10 = Conv2D(filters=nClasses, kernel_size=1,
             data_format=IMAGE_ORDERING, activation="sigmoid")(c10)
model = Model(inputs=img_input, outputs=c10)
```
Compiler messages when compiling the U-Net with a sigmoid final layer:
```
$ ./compile.sh
>>>>
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[VAI_C][Warning] layer [conv2d_23_Sigmoid] (type: Sigmoid) is not supported in DPU, deploy it in CPU instead.
Kernel topology "unet2_kernel_graph.jpg" for network "unet2"
kernel list info for network "unet2"
                               Kernel ID : Name
                                       0 : unet2_0
                                       1 : unet2_1

Kernel Name : unet2_0
--------------------------------------------------------------------------------
Kernel Type : DPUKernel
Code Size : 1.16MB
Param Size : 29.63MB
Workload MACs : 83685.54MOPS
IO Memory Space : 17.04MB
Mean Value : 0, 0, 0,
Total Tensor Count : 40
Boundary Input Tensor(s) (H*W*C)
input_1:0(0) : 224*224*3
Boundary Output Tensor(s) (H*W*C)
conv2d_23_convolution:0(0) : 224*224*11
Total Node Count : 35
Input Node(s) (H*W*C)
conv2d_1_convolution(0) : 224*224*3
Output Node(s) (H*W*C)
conv2d_23_convolution(0) : 224*224*11

Kernel Name : unet2_1
--------------------------------------------------------------------------------
Kernel Type : CPUKernel
Boundary Input Tensor(s) (H*W*C)
conv2d_23_Sigmoid:0(0) : 224*224*11
Boundary Output Tensor(s) (H*W*C)
conv2d_23_Sigmoid:0(0) : 224*224*11
Input Node(s) (H*W*C)
conv2d_23_Sigmoid : 224*224*11
Output Node(s) (H*W*C)
conv2d_23_Sigmoid : 224*224*11
```
Caveat when running on the Ultra96v2

Because sigmoid is not supported by the DPU, the network is split into a DPUKernel and a CPUKernel. So in the application source "src/fpc_main.cc", the `#define` that names the DPU kernel to load has to be changed:

```c
#define KERNEL_CONV "unet2_0" // changed from #define KERNEL_CONV "unet2"
```

Without this change it will not run on the Ultra96v2.
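Since the sigmoid lands in the CPUKernel, the application has to apply it to the raw DPU output itself. A minimal NumPy sketch of that post-processing step (the zero-filled `logits` array is just a stand-in for the real conv2d_23_convolution output):

```python
import numpy as np

def sigmoid(x):
    # elementwise sigmoid, computed on the CPU side
    return 1.0 / (1.0 + np.exp(-x))

# stand-in for the DPU output tensor conv2d_23_convolution (H*W*C = 224*224*11)
logits = np.zeros((224, 224, 11), dtype=np.float32)

probs = sigmoid(logits)        # what the unet2_1 CPUKernel computes
pred = probs.argmax(axis=-1)   # per-pixel class map
print(pred.shape)              # (224, 224)
```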
2. U-Net with softmax as the final-layer activation

When softmax is used with Conv2D:
```python
c10 = Conv2D(filters=nClasses, kernel_size=1,
             data_format=IMAGE_ORDERING, activation="softmax")(c10)
model = Model(inputs=img_input, outputs=c10)
```
Compiling it as-is produces the following error:
```
$ ./compile.sh
>>>>
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
〜
[VAI_C][Warning] Operator [ name: conv2d_transpose_4/strided_slice_2/stack_2, op: Const] will be deleted by dnnc because it's not parent or child of any other operator.
[VAI_C][Error] 'Const' op should be fused with current op [Max] by DECENT.
```
The same error occurs with the TensorFlow wrapper `tf.keras.layers.Conv2DTranspose`, and compilation still fails.
How to use softmax as the U-Net activation

Imitating the FCN8 code and using Conv2DTranspose() in place of Conv2D() made it work. The modified final layer:
```python
FINE_N_CLASSES = 5

def create_fintune_model(model, nClasses=FINE_N_CLASSES):
    IMAGE_ORDERING = "channels_last"
    inputs_ = model.inputs
    dense = model.get_layer(index=-2).output
    o1 = Conv2DTranspose(nClasses, kernel_size=(1, 1), strides=(1, 1),
                         use_bias=False, data_format=IMAGE_ORDERING)(dense)
    o = Activation("softmax")(o1)
    models = Model(inputs=inputs_, outputs=o)
    return models
```
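The head swap can be checked end-to-end. A runnable sketch using tf.keras with a tiny stand-in backbone (an assumption for illustration; the model in this post is a full U-Net):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

FINE_N_CLASSES = 5

def create_fintune_model(model, nClasses=FINE_N_CLASSES):
    # replace the last layer of `model` with Conv2DTranspose + softmax
    IMAGE_ORDERING = "channels_last"
    inputs_ = model.inputs
    dense = model.get_layer(index=-2).output
    o1 = layers.Conv2DTranspose(nClasses, kernel_size=(1, 1), strides=(1, 1),
                                use_bias=False, data_format=IMAGE_ORDERING)(dense)
    o = layers.Activation("softmax")(o1)
    return Model(inputs=inputs_, outputs=o)

# stand-in backbone: input -> conv -> old sigmoid head to be replaced
inp = layers.Input(shape=(304, 480, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
old_head = layers.Conv2D(11, 1, activation="sigmoid")(x)
base = Model(inp, old_head)

finetuned = create_fintune_model(base)
print(finetuned.output_shape)  # (None, 304, 480, 5)
```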
Input/output node results for FCN8 and the U-Net:
```
# Unet
TF input node name: [<tf.Tensor 'input_1:0' shape=(?, 304, 480, 3) dtype=float32>]
TF output node name: [<tf.Tensor 'activation_19/truediv:0' shape=(?, ?, ?, 5) dtype=float32>]
# fcn8
TF input node name: [<tf.Tensor 'input_1:0' shape=(?, 256, 256, 3) dtype=float32>]
TF output node name: [<tf.Tensor 'activation_1/truediv:0' shape=(?, ?, ?, 5) dtype=float32>]
```
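The node names the freeze/quantize scripts need are just these tensor names with the `:0` suffix stripped. A small pure-Python helper (names copied from the printout above):

```python
def node_name(tensor_name: str) -> str:
    # freeze_tf_graphs.sh / quantize.sh want the node name, not the tensor name
    return tensor_name.split(":")[0]

print(node_name("input_1:0"))                # input_1
print(node_name("activation_19/truediv:0"))  # activation_19/truediv
```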
Node names for the freeze and quantize scripts:
```sh
# node name in freeze_tf_graphs.sh
OUTPUT_NODE='activation_19/truediv'
# node names in quantize.sh
INPUT_NODE="input_1"
Q_OUTPUT_NODE="conv2d_transpose_1/conv2d_transpose" # output node of quantized CNN
```
Tail of the U-Net's model.summary() output:
```
model.summary()
>>>>
〜〜〜〜〜
batch_normalization_18 (BatchNo (None, 304, 480, 64) 256       conv2d_26[0][0]
__________________________________________________________________________________________________
activation_18 (Activation)      (None, 304, 480, 64) 0         batch_normalization_18[0][0]
__________________________________________________________________________________________________
conv2d_transpose_1 (Conv2DTrans (None, 304, 480, 5)  320       activation_18[0][0]
__________________________________________________________________________________________________
activation_19 (Activation)      (None, 304, 480, 5)  0         conv2d_transpose_1[0][0]
==================================================================================================
Total params: 32,449,152
Trainable params: 32,437,376
Non-trainable params: 11,776
```
Compiler output. The DPU hwh file (dcf file) was generated with softmax support enabled, and this time everything fits into a single DPU kernel:
```
$ ./compile.sh
〜〜〜
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
custom.json
[VAI_C][Warning] Operator [ name: conv2d_transpose_1/strided_slice_1/stack, op: Const] will be deleted by dnnc because it's not parent or child of any other operator.
[VAI_C][Warning] Operator [ name: conv2d_transpose_1/strided_slice_1/stack_1, op: Const] will be deleted by dnnc because it's not parent or child of any other operator.
〜〜〜〜〜〜〜
Kernel topology "unet_kernel_graph.jpg" for network "unet"
kernel list info for network "unet"
                               Kernel ID : Name
                                       0 : unet

Kernel Name : unet
--------------------------------------------------------------------------------
Kernel Type : DPUKernel
Code Size : 2.89MB
Param Size : 30.92MB
Workload MACs : 248040.66MOPS
IO Memory Space : 45.01MB
Mean Value : 0, 0, 0,
Total Tensor Count : 36
Boundary Input Tensor(s) (H*W*C)
input_1:0(0) : 304*480*3
Boundary Output Tensor(s) (H*W*C)
conv2d_transpose_1_conv2d_transpose:0(0) : 304*480*5
Total Node Count : 35
Input Node(s) (H*W*C)
conv2d_1_convolution(0) : 304*480*3
Output Node(s) (H*W*C)
conv2d_transpose_1_conv2d_transpose(0) : 304*480*5
```
The application then also built successfully with `make`.
3. Addendum: softmax also works with SeparableConv2D

softmax also worked with SeparableConv2D. That said, with kernel_size=(1,1) and strides=(1,1) the layer just passes the tensor through without changing its shape, so in practice it may not be doing anything very different. Depending on the network structure, Conv2DTranspose gave slightly better accuracy.
```python
FINE_N_CLASSES = 5

def create_fintune_model(model, nClasses=FINE_N_CLASSES):
    IMAGE_ORDERING = "channels_last"
    inputs_ = model.inputs
    dense = model.get_layer(index=-2).output
    o1 = SeparableConv2D(nClasses, kernel_size=(1, 1), strides=(1, 1),
                         use_bias=False, data_format=IMAGE_ORDERING)(dense)
    o = Activation("softmax")(o1)
    models = Model(inputs=inputs_, outputs=o)
    return models
```
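With a 1x1 kernel the two heads are almost the same size, which fits the observation that they behave similarly. A quick parameter-count check (assuming the 64-channel input shown in the model.summary() output above, and the default depth multiplier of 1):

```python
# parameters of a bias-free 1x1 head mapping C_in=64 channels to nClasses=5
c_in, n_classes = 64, 5

# Conv2DTranspose(5, (1,1), use_bias=False): one weight per (in, out) channel pair
transpose_params = 1 * 1 * c_in * n_classes         # matches the 320 in model.summary()

# SeparableConv2D(5, (1,1), use_bias=False): 1x1 depthwise + 1x1 pointwise
separable_params = 1 * 1 * c_in + c_in * n_classes  # 64 + 320

print(transpose_params, separable_params)
```

So the separable variant only adds a per-channel scaling (the 1x1 depthwise pass) in front of the same pointwise projection.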
```sh
INPUT_NODE="input_1"
OUTPUT_NODE='activation_19/truediv'
Q_OUTPUT_NODE="separable_conv2d_1/separable_conv2d" # output node of quantized CNN
```
```
$ ./compile.sh
〜〜〜
Code Size : 2.93MB
Param Size : 30.92MB
Workload MACs : 248059.33MOPS
IO Memory Space : 45.01MB
Mean Value : 0, 0, 0,
Total Tensor Count : 36
Boundary Input Tensor(s) (H*W*C)
input_1:0(0) : 304*480*3
Boundary Output Tensor(s) (H*W*C)
separable_conv2d_1_separable_conv2d:0(0) : 304*480*5
Total Node Count : 35
Input Node(s) (H*W*C)
conv2d_1_convolution(0) : 304*480*3
Output Node(s) (H*W*C)
separable_conv2d_1_separable_conv2d(0) : 304*480*5
```
Reference sites
・run_fcn8.sh
・'Const' op should be fused with current op Conv2DBackpropInput by DECENT