My Take on How to Write a Webserver (Part 1)

Date: 2021-05-03  Author: grassroot72

Plenty of experts have already written about how to build a Webserver. With the May Day holiday here, I figured I would chip in as well.

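My reason for writing a Webserver goes back to the question of how to parse messages sent over HTTP. At the time, I only wanted to understand how HTTP messages are parsed, so that I could make HTTP-based message handling more efficient. I searched the web and found that many people use nodejs's http-parser. Maybe my IQ has hit its ceiling ^_^!, but I could not quite follow the logic of that code. Later I also studied the parser in the h2o project, and sadly I still failed to grasp the masters' thinking ^_^!.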

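What to do... After struggling for a while, I finally decided to bite the bullet and write my own HTTP message parser. And that is how the story of my Maestro Webserver began.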

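Since we are already on the subject of HTTP message parsing, let that be the topic of this first post. HTTP has a long history; I am no history teacher, and there are many excellent explanations online, so I will not go over it here. Documents such as RFC 2616 and RFC 7231 also spell out what the protocol means. Still, I have to grumble a little about how dense and hard to read the RFCs are...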

Let me quote a relatively clear passage on HTTP messages from the RFC (RFC 2616, section 4.1):

   HTTP messages consist of requests from client to server and responses
   from server to client.

       HTTP-message   = Request | Response     ; HTTP/1.1 messages

   Request (section 5) and Response (section 6) messages use the generic
   message format of RFC 822 [9] for transferring entities (the payload
   of the message). Both types of message consist of a start-line, zero
   or more header fields (also known as "headers"), an empty line (i.e.,
   a line with nothing preceding the CRLF) indicating the end of the
   header fields, and possibly a message-body.

        generic-message = start-line
                          *(message-header CRLF)
                          CRLF
                          [ message-body ]
        start-line      = Request-Line | Status-Line

From this passage we can see that, whether it is a request or a response, an HTTP message has three parts: the start-line, the message headers, and the message body.
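To make the three-part layout concrete, here is a minimal sketch that locates the start-line, the header block, and the body inside a buffer by searching for the CRLF markers from the grammar above. The names `msg_parts_t` and `split_message` are hypothetical and are not taken from Maestro; the sketch also assumes the buffer is a NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: ranges that point into the original buffer. */
typedef struct {
  const char *startline;
  size_t len_startline;   /* start-line length, CRLF excluded */
  const char *headers;
  size_t len_headers;     /* all header lines, each with its CRLF */
  const char *body;       /* first byte after the empty line */
} msg_parts_t;

/* Returns 0 on success, -1 if the start-line or the empty line is missing.
 * Assumes buf is NUL-terminated. */
int split_message(const char *buf, msg_parts_t *out)
{
  /* The first CRLF ends the start-line. */
  const char *eol = strstr(buf, "\r\n");
  if (eol == NULL) return -1;

  /* CRLF CRLF = end of the last header line plus the empty line.
   * With zero headers this match starts at eol itself. */
  const char *end = strstr(eol, "\r\n\r\n");
  if (end == NULL) return -1;

  out->startline = buf;
  out->len_startline = (size_t)(eol - buf);
  out->headers = eol + 2;
  out->len_headers = (size_t)(end + 2 - out->headers);
  out->body = end + 4;
  return 0;
}
```

Because the result only stores pointers and lengths into the original buffer, nothing is copied at this stage; a later step can decide which pieces are worth duplicating.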

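So when designing my message struct (by the way, I develop in C), I include these three parts. I did not write the parser as one monolithic function; instead, I broke it down into a set of smaller functions that can be called repeatedly. As for encapsulation, I did not follow the principle of hiding the data structure as much as possible either. My goal is simple: the code should be easy to understand and easy to call (will my teacher give me a beating for this? :-).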

Let's take a look at the data structure I defined.

typedef struct {
  int method;    /* GET/POST... */
  char *path;
  int ver_major;
  int ver_minor;
  int code;      /* status code */
  char *status;  /* status text */

  sllist_t *headers;

  int len_startline;
  int len_headers;

  unsigned char *body;    
  unsigned char *body_zipped;
  unsigned char *body_s;  /* point to the range start of the body */
  size_t len_body;
} httpmsg_t;

Ignore the body-related fields for now; I will cover them in later posts on writing a Webserver (they involve body compression, resumable transfers, and so on).

Here are the related functions:

int msg_parse(sllist_t *headers,
              unsigned char **startline,
              unsigned char **body,
              size_t *len_body,
              const unsigned char *buf);

This function parses the underlying message. On top of it, I wrapped two functions, used to parse an HTTP request and an HTTP response respectively.

httpmsg_t *http_parse_req(const unsigned char *buf);
httpmsg_t *http_parse_rep(const unsigned char *buf);
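What separates the two wrappers is mainly how the start-line is read: a request carries a Request-Line, a response a Status-Line. As a rough illustration, and not the code in Maestro, the sketch below parses a Request-Line into the pieces that `httpmsg_t` stores (`method`, `path`, `ver_major`, `ver_minor`). The function name `parse_request_line`, the `METHOD_*` constants, and the 256-byte path limit are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical method codes; the project's actual values may differ. */
enum { METHOD_UNKNOWN = 0, METHOD_GET, METHOD_POST, METHOD_HEAD };

/* Parse "METHOD SP path SP HTTP/major.minor".
 * path must point to a buffer of at least 256 bytes.
 * Returns 0 on success, -1 on a malformed line. */
int parse_request_line(const char *line, int *method, char *path,
                       int *ver_major, int *ver_minor)
{
  char name[16];

  /* The field widths keep sscanf from overrunning name and path. */
  if (sscanf(line, "%15s %255s HTTP/%d.%d",
             name, path, ver_major, ver_minor) != 4)
    return -1;

  if (strcmp(name, "GET") == 0)       *method = METHOD_GET;
  else if (strcmp(name, "POST") == 0) *method = METHOD_POST;
  else if (strcmp(name, "HEAD") == 0) *method = METHOD_HEAD;
  else                                *method = METHOD_UNKNOWN;
  return 0;
}
```

A Status-Line parser would follow the same pattern with the fields in a different order: the version first, then the numeric status code, then the status text.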

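My principle in writing these low-level functions is to reuse the results of earlier steps as much as possible and avoid repeated computation. For example, the length of the same string should not be computed with strlen more than once. Hopefully this improves performance, if only negligibly ^_^!.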
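One way to apply this principle, sketched here with hypothetical helpers `line_len` and `dup_range` (they are not functions from the project): the scan that finds the end of a line already yields its length, so the copy step can take that length as a parameter instead of calling strlen again.

```c
#include <stdlib.h>
#include <string.h>

/* Length of the current line, CRLF excluded. The scan that finds the
 * line end yields the length as a by-product. */
size_t line_len(const char *p)
{
  const char *eol = strstr(p, "\r\n");
  return eol ? (size_t)(eol - p) : strlen(p);
}

/* Copy exactly len bytes into a new NUL-terminated string.
 * The caller already knows len, so no strlen is needed here.
 * Returns NULL on allocation failure; the caller frees the result. */
char *dup_range(const char *p, size_t len)
{
  char *s = malloc(len + 1);
  if (s == NULL) return NULL;
  memcpy(s, p, len);
  s[len] = '\0';
  return s;
}
```

The length returned by `line_len` can be passed straight to `dup_range` and also used to advance the read pointer past the line, so each line is scanned only once.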

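In the httpmsg_t struct above, I use a singly linked list to manage the HTTP headers. Since a message carries only a handful of headers, a linear scan over a singly linked list is actually faster.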
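The API of `sllist_t` is not shown in this post, so the following is only a sketch of the idea using a hypothetical node type `hdr_node_t`: headers are pushed onto a singly linked list and looked up by walking it from the head. The comparison uses the POSIX function `strcasecmp`, since HTTP header names are case-insensitive.

```c
#include <stdlib.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical node type; it stands in for the project's sllist_t. */
typedef struct hdr_node {
  const char *key;
  const char *value;
  struct hdr_node *next;
} hdr_node_t;

/* Push a header onto the front of the list. Returns the new head,
 * or NULL on allocation failure (the old list is left untouched).
 * key and value are borrowed, not copied. */
hdr_node_t *hdr_push(hdr_node_t *head, const char *key, const char *value)
{
  hdr_node_t *n = malloc(sizeof *n);
  if (n == NULL) return NULL;
  n->key = key;
  n->value = value;
  n->next = head;
  return n;
}

/* Linear scan; header names compare case-insensitively. */
const char *hdr_find(const hdr_node_t *head, const char *key)
{
  for (; head != NULL; head = head->next)
    if (strcasecmp(head->key, key) == 0)
      return head->value;
  return NULL;
}

/* Free the nodes only; the strings belong to the caller. */
void hdr_free(hdr_node_t *head)
{
  while (head != NULL) {
    hdr_node_t *next = head->next;
    free(head);
    head = next;
  }
}
```

With only a few nodes, the whole lookup is a short loop of pointer hops and string compares, with no hashing and no table to size or rebuild.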

As for how these functions are implemented, interested readers can visit my GitHub project at https://github.com/grassroot72/Maestro2.

Discussion is welcome.

In the second post I will cover how sockets and epoll are used in a Webserver...
