tiny_cmdline: A Lightweight Argument-Parsing Library

2024-08-13

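Summary: The author designed tiny_cmdline, a lightweight C++ argument-parsing library that is easy to read and customize. It wraps getopt_long, targets Linux, and supports C++11. Its interface lets users add several kinds of options (with a value, without a value, or with a custom parse function) and relies on templates and type conversion for flexible extension. Options are stored in a container, parsing calls getopt_long, and help text can be generated automatically. The overall goal is to be simple and practical, avoid complex edge cases, and emphasize user customization.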

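Writing command-line software usually means handling command-line arguments. A web search turns up a number of argument-parsing libraries, but for many needs they are too large, and in practice they can only be used as black boxes.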

So I wrote a lightweight argument-parsing library, tiny_cmdline. The goal is to keep it light, so that users can easily read and customize it.

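The original plan was to stay within 100 lines, but with comments added it ended up well over 100. The whole thing is currently under 200 lines, which I think is still easy to read.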

Project: https://github.com/caibingcheng/tiny_cmdline

Ideas

To stay light, the library has to cut down on corner-case requirements and avoid reinventing wheels. The principles I had in mind were:

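Linux only; command-line software seems relatively rare on Windows
Use getopt_long as the underlying parser, so there is no need to handle argument parsing myself; tiny_cmdline can be seen as a C++ wrapper around getopt_long
The interface needs to be modern, otherwise one might as well use getopt_long directly
Target C++11 only. For one thing, in my experience most production environments fully support C++11, while newer standards are not guaranteed; for another, C++ is backward compatible, so C++11 code still builds under newer standards
No need to consider performance. What performance does an argument-parsing module need?
No need to consider security. What attacks would there be?
No argument validation. Parsing only parses; checks (such as range checks) are the user's job
No need to store parsed results. The user should provide the "containers" that hold them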
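For context, this is roughly what using getopt_long directly looks like, which is the interface tiny_cmdline sets out to wrap. It is only a sketch: the option names (`port`, `verbose`) and the `RawArgs` / `parse_raw` names are made up for illustration and do not come from the library.

```cpp
#include <getopt.h>

#include <cstdlib>

// Hypothetical result holder for this sketch.
struct RawArgs {
  int port = 0;
  bool verbose = false;
  bool ok = true;
};

// Parse "--port N" / "-p N" and "--verbose" / "-v" with plain getopt_long.
RawArgs parse_raw(int argc, char *argv[]) {
  static const struct option long_opts[] = {
      {"port", required_argument, nullptr, 'p'},
      {"verbose", no_argument, nullptr, 'v'},
      {nullptr, 0, nullptr, 0},  // terminator required by getopt_long
  };
  RawArgs args;
  optind = 1;  // restart scanning in case getopt_long was used before
  int c = 0;
  while ((c = getopt_long(argc, argv, "p:v", long_opts, nullptr)) != -1) {
    switch (c) {
      case 'p': args.port = std::atoi(optarg); break;
      case 'v': args.verbose = true; break;
      default: args.ok = false; break;  // unknown option or missing value
    }
  }
  return args;
}
```

Every option has to be declared twice (once in the short-option string, once in the long-option table) and handled again in the switch, which is the repetition a wrapper can hide.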

The interface I had in mind (a sketch, not the final result):

// The constructor does nothing explicit
TinyCmdline cmdline;

// User containers
int a = 0;
std::string b;
double c = 0.0;

// Add options: long option, short option, user container, description
cmdline.add_argument("arg_a", 'a', a, "int argument");
cmdline.add_argument("arg_b", 'b', b, "string argument");
cmdline.add_argument("arg_c", 'c', c, "double argument");

// Parse the arguments
cmdline.parse(argc, argv);

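The interface above already covers most needs, and each call reads naturally. For "arg_a", it says: store the value that follows "arg_a" or "a" in the variable a, and this option means "int argument". Notice that from the user's point of view, no type is ever spelled out.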

Then there are switch-like options, which carry no value and only matter by being present or absent. Those need an interface like this:

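// User container
bool d = false;
// Add a switch option: long option, short option, user container, value assigned to the container when the option is present, description
cmdline.add_argument("arg_d", 'd', d, true, "switch argument");
// Note: because of overloading concerns, the actual interface takes both the default value (option absent) and the value assigned when the option is present
cmdline.add_argument("arg_d", 'd', d, false, true, "switch argument");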

Taking the second form as an example, it says: first set d to false; if "arg_d" or "d" is present, set d to true; this option means "switch argument".

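That covers options with and without values, but I have run into more complicated needs in practice. Well, user requirements never end, so why not let users define the parsing themselves! Hence an interface that takes a custom parse function: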

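// User containers
int e = 0, f = 0;
// Add a custom parse function: long option, short option, parse function, description
// (both containers must be captured, since the lambda writes to e and f)
cmdline.add_argument("arg_e", 'e', [&e, &f](const char* optarg) {
    if (sscanf(optarg, "%d,%d", &e, &f) != 2) {
        throw std::runtime_error("custom argument parse error");
    }
}, "custom argument");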

Similarly, if you only want a function to run when an option appears, without storing any value, you can write:

// Add a custom function: long option, short option, function, description
cmdline.add_argument("arg_f", 'f', []() {
    std::cout << "custom function" << std::endl;
}, "custom function");

Implementation

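Among the interfaces sketched above, it is easy to see that the custom-parse-function interface is the most fundamental one, because all the others can be built on top of it. So that interface is defined first.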

  template <typename T>
  void add_argument(const std::string &long_name, char short_name, T &&f, Argument type, const std::string &help = "");

Here T is a custom function that either takes the option value or takes nothing, and Argument is an enum built on getopt_long's constants. It currently has three values:

  enum class Argument {
    none = no_argument,
    required = required_argument,
    optional = optional_argument,
  };
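To make the link to getopt_long concrete, the sketch below shows how each enum value could translate into the two things getopt_long consumes: the suffix in the short-option string and the `has_arg` field of `struct option`. The helper names `short_opt_spec` and `make_long_opt` are hypothetical and are not part of tiny_cmdline; the enum is repeated only so the block compiles on its own.

```cpp
#include <getopt.h>

#include <string>

// Same shape as the enum above; redefined here so the sketch stands alone.
enum class Argument {
  none = no_argument,
  required = required_argument,
  optional = optional_argument,
};

// Hypothetical helper: build the short-option fragment for one option.
// getopt uses "x" for no value, "x:" for a required value and
// "x::" (a GNU extension) for an optional value.
std::string short_opt_spec(char short_name, Argument type) {
  std::string spec(1, short_name);
  if (type == Argument::required) spec += ":";
  if (type == Argument::optional) spec += "::";
  return spec;
}

// Hypothetical helper: fill one entry of the long-option table.
// `val` is the value getopt_long returns when this option is seen.
struct option make_long_opt(const char *long_name, Argument type, int val) {
  struct option opt = {long_name, static_cast<int>(type), nullptr, val};
  return opt;
}
```

Because the enumerators reuse getopt_long's own constants, the cast to `int` is all that is needed to fill `has_arg`.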

The other two interfaces can be expressed as calls to the one above, so they can be implemented right away:

  // Option with a value
  template <typename T>
  void add_argument(const std::string &long_name, char short_name, T &value, const std::string &help = "") {
    auto operator_f = [&value](const char *optarg) { value = convert<T>::to(optarg); };
    add_argument(long_name, short_name, operator_f, Argument::required, help);
  }

  // Option without a value
  template <typename T, typename U>
  void add_argument(const std::string &long_name, char short_name, T &value, const U &default_val, const U &placed_val,
                    const std::string &help = "") {
    value = static_cast<T>(default_val);
    // The parameter is unnamed, so no [[maybe_unused]] (a C++17 attribute) is needed under C++11
    auto operator_f = [&value, placed_val](const char *) { value = static_cast<T>(placed_val); };
    add_argument(long_name, short_name, operator_f, Argument::none, help);
  }

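In the no-value interface a plain assignment is enough, and the static_cast doubles as a type check. The value interface needs a type conversion from the const char* option value to the type of the user's container, so an extra helper class handles the conversion: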

  template <typename T> struct convert {
    static T to(const char *optarg) { return static_cast<T>(std::stoll(optarg)); }
  };

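By default, the value is first converted to long long and then cast to the container's type. There are of course other cases, such as converting to double or std::string. This is where the "fewer corner cases" principle kicks in (really, I was being lazy): non-default conversions are left for users to implement: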

  // For example, in main.cpp
  template <> struct convert<double> {
    static double to(const char *optarg) { return std::stod(optarg); }
  };

  template <> struct convert<std::string> {
    static std::string to(const char *optarg) { return std::string(optarg); }
  };

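This is reasonable, because I cannot know which conversions are common: some say std::string, others say double. So users implement their own, and I just provide the one I consider common.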
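As a rough illustration of what the default conversion does and how a user might extend it, here is a standalone sketch. The primary template mirrors the one shown above but lives at namespace scope here for simplicity; the `convert<bool>` specialization and the spellings it accepts are assumptions made up for this example.

```cpp
#include <stdexcept>
#include <string>

// Primary template as described above: go through long long, then cast.
template <typename T> struct convert {
  static T to(const char *optarg) { return static_cast<T>(std::stoll(optarg)); }
};

// Hypothetical user-side specialization: accept a few spellings for bool.
template <> struct convert<bool> {
  static bool to(const char *optarg) {
    const std::string s(optarg);
    if (s == "1" || s == "true" || s == "on") return true;
    if (s == "0" || s == "false" || s == "off") return false;
    throw std::invalid_argument("not a boolean: " + s);
  }
};

// Hypothetical helper: convert without letting exceptions escape.
// std::stoll throws std::invalid_argument or std::out_of_range on bad input.
template <typename T> bool try_convert(const char *optarg, T &out) {
  try {
    out = convert<T>::to(optarg);
    return true;
  } catch (const std::exception &) {
    return false;
  }
}
```

Two behaviors of the default are worth knowing: std::stoll stops at the first non-digit, so "3.9" becomes 3 rather than an error, and the final static_cast silently narrows values that do not fit in T.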

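That leaves the first add_argument interface. add_argument only records the options the user wants, and the actual parsing happens in parse, so a container is needed to store them. std::unordered_map comes to mind: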

  struct operator_option {
    char short_name;
    std::string long_name;
    operator_t op;  // operator function, takes the argument value as a parameter
    std::string help;
    Argument type;
  };
  std::unordered_map<int32_t, operator_option> operators_;

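The key follows getopt_long's rules (it is the value getopt_long returns for that option), and the value holds the user's option information. With that, add_argument can be implemented: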

  template <typename T>
  void add_argument(const std::string &long_name, char short_name, T &&f, Argument type, const std::string &help = "") {
    using decay_f = typename std::decay<T>::type;
    constexpr bool is_operator_f = std::is_convertible<decay_f, operator_t>::value;
    constexpr bool is_void_operator_f = std::is_convertible<decay_f, void_operator_t>::value;
    static_assert(is_operator_f || is_void_operator_f, "The operator function must be operator_t or void_operator_t.");

    // Handle options that have only a long name or only a short name
    const auto opt_val = static_cast<int32_t>((short_name == '\0') ? opt_val_++ : short_name);
    auto operator_f = convert_operator_f(std::forward<T>(f));
    if (!operators_.emplace(opt_val, operator_option{short_name, long_name, operator_f, help, type}).second) {
      fprintf(stderr, "duplicate option -%c, --%s\n", short_name, long_name.c_str());
    }
  }
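The article does not show `convert_operator_f`. One way such an adapter could be written in C++11 is sketched below: overload resolution picks a wrapper depending on whether the callable accepts a `const char *`, and a no-argument callable is wrapped so that it ignores the option value. The `adapt` and `to_operator` names are hypothetical, and the two `std::function` aliases are assumed to match the library's `operator_t` and `void_operator_t`.

```cpp
#include <functional>
#include <utility>

// Assumed to match the library's aliases.
using operator_t = std::function<void(const char *)>;
using void_operator_t = std::function<void()>;

// Preferred overload: viable only if f can be called with a const char*.
template <typename F>
auto adapt(F &&f, int) -> decltype(f(static_cast<const char *>(nullptr)), operator_t()) {
  return operator_t(std::forward<F>(f));
}

// Fallback overload: viable only if f can be called with no arguments.
// The option value is dropped before calling the wrapped function.
template <typename F>
auto adapt(F &&f, long) -> decltype(f(), operator_t()) {
  void_operator_t g(std::forward<F>(f));
  return [g](const char *) { g(); };
}

// Entry point: the literal 0 is an int, so the first overload wins
// whenever both would be viable.
template <typename F>
operator_t to_operator(F &&f) {
  return adapt(std::forward<F>(f), 0);
}
```

With everything normalized to one function type, the map only ever has to store `operator_t`, regardless of which add_argument overload the user called.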

parse calls getopt_long and then, based on its return value, invokes the matching user-supplied handler; I won't go into detail here.
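Since the parse step is not shown, here is a minimal sketch of what it could look like under the design described above. It is not the library's actual code: `OptionEntry` is a simplified stand-in for `operator_option`, `parse_with_getopt` is a made-up name, and the keys used for long-only options are simply assumed to be integers outside the `char` range.

```cpp
#include <getopt.h>

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for the article's operator_option.
struct OptionEntry {
  char short_name;        // '\0' when the option is long-only
  std::string long_name;  // empty when the option is short-only
  int has_arg;            // no_argument / required_argument / optional_argument
  std::function<void(const char *)> op;
};

// Returns false on an unknown option or a missing value.
bool parse_with_getopt(int argc, char *argv[], const std::unordered_map<int32_t, OptionEntry> &ops) {
  std::string short_opts;
  std::vector<struct option> long_opts;
  for (const auto &kv : ops) {
    const OptionEntry &e = kv.second;
    if (e.short_name != '\0') {
      short_opts += e.short_name;
      if (e.has_arg == required_argument) short_opts += ":";
      if (e.has_arg == optional_argument) short_opts += "::";
    }
    if (!e.long_name.empty()) {
      // The map key is what getopt_long hands back for this option.
      long_opts.push_back({e.long_name.c_str(), e.has_arg, nullptr, kv.first});
    }
  }
  long_opts.push_back({nullptr, 0, nullptr, 0});  // required terminator

  optind = 1;  // restart scanning
  int c = 0;
  while ((c = getopt_long(argc, argv, short_opts.c_str(), long_opts.data(), nullptr)) != -1) {
    const auto it = ops.find(c);
    if (it == ops.end()) return false;  // '?' signals an unknown option or missing value
    it->second.op(optarg);              // optarg is null for options without a value
  }
  return true;
}
```

The point of keying the map by getopt_long's return value is visible in the loop: dispatch is a single lookup, with no switch statement to maintain.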

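In addition, tiny_cmdline can generate help text from the descriptions users provide, and it also supports user-defined help text, so "-h" and "--help" are reserved by default. To customize the help output, users can either use a different option or override "-h" / "--help". One way to write it: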

  // Add a custom help option
  cmdline.add_argument("help", 'h', []() {
    user_defined_help();
    exit(0);
  }, Argument::none);
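For the automatically generated help text mentioned above, a formatter along the following lines would be enough. This is a sketch under assumptions: the output layout, the `HelpEntry` record, and the `format_help` function are invented for illustration and are not taken from tiny_cmdline.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified record: just what a help line needs.
struct HelpEntry {
  char short_name;        // '\0' when there is no short option
  std::string long_name;  // empty when there is no long option
  std::string help;
};

// Render one "  -x, --long<TAB>description" line per entry.
std::string format_help(const std::string &program, const std::vector<HelpEntry> &entries) {
  std::ostringstream out;
  out << "Usage: " << program << " [options]\n";
  for (const HelpEntry &e : entries) {
    out << "  ";
    if (e.short_name != '\0') out << '-' << e.short_name;
    if (e.short_name != '\0' && !e.long_name.empty()) out << ", ";
    if (!e.long_name.empty()) out << "--" << e.long_name;
    out << "\t" << e.help << "\n";
  }
  return out.str();
}
```

A vector is used here so the lines come out in registration order. Iterating a std::unordered_map directly gives an unspecified order, so an implementation that stores options in such a map may want to record insertion order separately.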

Usage

Copied directly from README.md:

#include "tiny_cmdline.h"

struct ParsedArgs {
  std::string filename;
  std::string ip;
  int32_t port;
};

// convert char* to std::string should be specialized
template <> struct tiny_cmdline::TinyCmdline::convert<std::string> {
  static std::string to(const char *optarg) { return std::string(optarg); }
};

int main(int argc, char* argv[]) {
    using namespace tiny_cmdline;

    ParsedArgs args;
    TinyCmdline cmd;
    cmd.add_argument("file", 'f', args.filename, "The file to be loaded.");
    cmd.add_argument("ip", 'i', args.ip, "The IP address to connect to.");
    cmd.add_argument("port", 'p', args.port, "The port to connect to.");
    cmd.parse(argc, argv);
}

Summary

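The implementation is not complicated overall; most of the effort went into interface design. With C++14 or C++17 support, it could be even more concise.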

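Looking back, a few things were not thought through. For example, is the design of the convert class reasonable? Users have no obvious way to know that they can specialize convert to customize conversions; they have to read the source or the documentation. Still, I find this design interesting; the inspiration was the Pimpl idiom.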