Running CenterNet Inference with Caffe (Part 2)

From the OLDPAN blog: plain talk about artificial intelligence and carefully prepared original articles.


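This post continues from Running CenterNet Inference with Caffe (Part 1) and implements CenterNet inference in Caffe. Caffe is not the best way to run CenterNet from C++, but some scenarios still call for it.

In the previous post, CenterNet's post-processing was implemented by attaching an external libpostprocess.so shared library. That works, but it is not elegant: frequently allocating and freeing GPU memory can hurt stability and throughput during inference. It therefore makes sense to run the post-processing as a Caffe layer.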

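Moving post-processing into a Caffe layer

Moving it into Caffe means adding a new layer of our own. The new layer's parameters are defined with protobuf, so the first step is to modify caffe.proto.

Modifying caffe.proto

Here I define a CenternetOutput layer to act as CenterNet's post-processing stage. Add the following to caffe.proto at the appropriate places: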

// Inside message LayerParameter (the field number must not collide with an existing one):
optional CenternetOutputParameter centernet_output_param = 209;

// At the top level of caffe.proto:
message CenternetOutputParameter {
  // Number of classes that are actually predicted. Required!
  optional uint32 num_classes = 1;
  optional uint32 kernel_size = 2 [default = 3];
  optional float vis_threshold = 3 [default = 0.3];
  optional bool apply_nms = 4 [default = false];
  optional uint32 feature_map_h = 5 [default = 0];
  optional uint32 feature_map_w = 6 [default = 0];
}

Then append the following to the end of the res50.prototxt from the previous post. The three bottoms are CenterNet's last three outputs: hm, wh, and reg:

layer {
  name: "centernet_output"
  type: "CenternetOutput"
  bottom: "conv_blob55"
  bottom: "conv_blob57"
  bottom: "conv_blob59"
  top: "result_out"
  centernet_output_param {
    num_classes: 2
    kernel_size: 3
    vis_threshold: 0.3
  }
}

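After the prototxt is modified, the model ends with the CenternetOutput layer, which is the post-processing layer we just defined.

After changing caffe.proto, remember to regenerate caffe.pb.cc and caffe.pb.h. Otherwise loading the model fails with: Error parsing text-format caffe.NetParameter: 2715:26: Message type "caffe.LayerParameter" has no field named "centernet_output_param". It is best to run make clean and then rebuild.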

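A post-processing layer like this only needs a forward pass, not a backward pass, so the backward methods are simply declared as not implemented: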

virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  NOT_IMPLEMENTED;
}
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  NOT_IMPLEMENTED;
}

The layer itself is declared in centernet_output_layer.hpp.

Next come the two source files that implement this header, one for the CPU and one for the GPU.

The CPU version of the layer
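To make the CPU path concrete, here is a minimal standalone sketch of the decoding step such a layer performs. It is not the layer's actual source: it uses plain std::vector instead of Caffe blobs, and the names Detection, DecodeCenterNet, and the stride parameter are hypothetical. It also assumes the heatmap has already been passed through a sigmoid, that wh and reg are stored as [2, h, w], and that each detection is written as class id, score, x1, y1, x2, y2.

```cpp
#include <vector>

// One detection, laid out like the 6 floats the inference code reads back.
// Putting the class id in the first slot is an assumption of this sketch.
struct Detection {
  float cls, score, x1, y1, x2, y2;
};

// Decode CenterNet heads on the CPU.
//   hm  : [num_classes, h, w] heatmap, assumed already sigmoid-activated
//   wh  : [2, h, w] box width and height in feature-map cells
//   reg : [2, h, w] sub-cell offset of the box center
//   stride (hypothetical) scales feature-map cells back to input pixels.
std::vector<Detection> DecodeCenterNet(const std::vector<float>& hm,
                                       const std::vector<float>& wh,
                                       const std::vector<float>& reg,
                                       int num_classes, int h, int w,
                                       int kernel_size, float vis_threshold,
                                       float stride) {
  std::vector<Detection> dets;
  const int pad = (kernel_size - 1) / 2;
  const int area = h * w;
  for (int c = 0; c < num_classes; ++c) {
    for (int y = 0; y < h; ++y) {
      for (int x = 0; x < w; ++x) {
        const float score = hm[c * area + y * w + x];
        if (score < vis_threshold) continue;
        // Keep the cell only if no neighbour in its kernel_size window
        // scores higher. This local-maximum test replaces box NMS.
        bool is_peak = true;
        for (int dy = -pad; dy <= pad && is_peak; ++dy) {
          for (int dx = -pad; dx <= pad; ++dx) {
            const int ny = y + dy;
            const int nx = x + dx;
            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
            if (hm[c * area + ny * w + nx] > score) {
              is_peak = false;
              break;
            }
          }
        }
        if (!is_peak) continue;
        // Refine the center with reg, then expand it by half of wh.
        const int idx = y * w + x;
        const float cx = x + reg[idx];
        const float cy = y + reg[area + idx];
        const float bw = wh[idx];
        const float bh = wh[area + idx];
        Detection d;
        d.cls = static_cast<float>(c);
        d.score = score;
        d.x1 = (cx - bw / 2) * stride;
        d.y1 = (cy - bh / 2) * stride;
        d.x2 = (cx + bw / 2) * stride;
        d.y2 = (cy + bh / 2) * stride;
        dets.push_back(d);
      }
    }
  }
  return dets;
}
```

In a real layer, the same loop would read from the three bottom blobs in Forward_cpu and write the detections into the top blob, so the result can be fetched with a single cpu_data() call.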

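The GPU version borrows its post-processing from the https://github.com/CaoWGG/TensorRT-CenterNet repository on GitHub.

With the CUDA post-processing moved into a Caffe layer, the inference code changes as follows: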

std::vector<std::vector<float> > CenterNet_Detector::Detect(const cv::Mat& img) {
  typedef std::chrono::duration<double, std::ratio<1, 1000> > ms;

  int count = 1;  // number of forward passes to run (raise it for benchmarking)
  std::vector<std::vector<float> > rlt;

  Blob<float>* input_layer = net_->input_blobs()[0];

  auto total_t0 = std::chrono::high_resolution_clock::now();

  auto t0 = std::chrono::high_resolution_clock::now();
  // Set the network input size from the actual width and height of the image.
  input_layer->Reshape(1, num_channels_, img.rows, img.cols);
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
  net_->Reshape();

  auto t1 = std::chrono::high_resolution_clock::now();
  double reshape_time = std::chrono::duration_cast<ms>(t1 - t0).count();
  std::cout << "Caffe Reshape time: " << std::fixed << std::setprecision(2)
            << reshape_time << " ms" << std::endl;

  t0 = std::chrono::high_resolution_clock::now();
  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels);
  cv::Mat tm = Preprocess(img, &input_channels);
  t1 = std::chrono::high_resolution_clock::now();
  double preprocess_time = std::chrono::duration_cast<ms>(t1 - t0).count();
  std::cout << "Preprocess time: " << std::fixed << std::setprecision(2)
            << preprocess_time << " ms" << std::endl;

  while (count--) {
    t0 = std::chrono::high_resolution_clock::now();
    net_->Forward();
    t1 = std::chrono::high_resolution_clock::now();
    double net_time = std::chrono::duration_cast<ms>(t1 - t0).count();
    std::cout << "Net processing time: " << std::fixed << std::setprecision(2)
              << net_time << " ms" << std::endl;

    // The CenternetOutput layer has already decoded the boxes on the device.
    Blob<float>* result_blob = net_->output_blobs()[0];
    const float* result = result_blob->cpu_data();
    std::cout << "result shape: " << result_blob->shape_string() << std::endl;

    const int num_det = result_blob->num();
    std::cout << "num_det: " << num_det << std::endl;

    // Each detection is 6 floats: result[1] is the score and
    // result[2..5] are x1, y1, x2, y2.
    std::vector<std::vector<float> > predictions;
    for (int i = 0; i < num_det; ++i) {
      std::vector<float> prediction(result, result + 6);
      printf("score: %f  ", prediction[1]);
      predictions.push_back(prediction);
      result += 6;
    }
    std::cout << "boxes nums " << predictions.size() << std::endl;

    double total_time = std::chrono::duration_cast<ms>(t1 - total_t0).count();
    std::cout << "Total time: " << std::fixed << std::setprecision(2)
              << total_time << " ms" << std::endl;

    // Draw on a copy so the caller's image is left untouched.
    cv::Mat temp = img.clone();
    for (size_t i = 0; i < predictions.size(); ++i) {
      const std::vector<float>& prediction = predictions[i];
      cv::rectangle(temp,
                    cv::Point((int)prediction[2], (int)prediction[3]),
                    cv::Point((int)prediction[4], (int)prediction[5]),
                    cv::Scalar(0, 0, 255), 1, 1, 0);
    }
    cv::imwrite("image_result.jpg", temp);

    // The original listing returned rlt without ever filling it.
    rlt = predictions;
  }
  return rlt;
}
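The loop that reads the output blob can also be factored into a small helper that is easy to test without Caffe. The sketch below is only an illustration: Box and ParseDetections are hypothetical names, and the layout (6 floats per detection, score at index 1, corners at indices 2 to 5) is taken from how the inference code above indexes the result.

```cpp
#include <vector>

// A decoded box with its confidence score.
struct Box {
  float score, x1, y1, x2, y2;
};

// Convert a flat buffer of num_det * 6 floats into boxes,
// dropping every detection whose score is below min_score.
std::vector<Box> ParseDetections(const float* data, int num_det,
                                 float min_score) {
  std::vector<Box> boxes;
  for (int i = 0; i < num_det; ++i) {
    const float* d = data + i * 6;  // start of the i-th detection
    if (d[1] < min_score) continue;
    Box b = {d[1], d[2], d[3], d[4], d[5]};
    boxes.push_back(b);
  }
  return boxes;
}
```

With such a helper, Detect could call ParseDetections(result_blob->cpu_data(), result_blob->num(), 0.3f) and keep the drawing code separate from the parsing code.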

And that's it.

If compiling the .cu file fails with identifier "nullptr" is undefined, it is because the .cu code uses nullptr, which nvcc only accepts when the -std=c++11 flag is passed. In Caffe's Makefile, append -std=c++11 to NVCCFLAGS:

NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS) -std=c++11
