A Deep Dive into the Internals of Go variant

Date: 2022-11-24  Author: 杨杰

variant

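Today I was studying the OpenTelemetry benchmark suite github.com/zdyj3170101…, written by the expert tigrannajaryan. It compares the CPU usage and throughput of different network protocols (grpc, grpc-stream, http, websocket) when sending payloads of different sizes.

Out of curiosity I browsed his other GitHub repositories and found another equally impressive library.

It is also a benchmark suite: it measures the CPU and memory cost of different ways of implementing a multi-type variable in Go.

The goal is a data structure that can hold values of several types with minimal CPU and memory overhead.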

github.com/tigrannajar…

benchmarks

interface: values stored in an interface
struct: a struct with multiple fields, one per stored type
variant: the implementation provided by this library
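
For comparison, the interface-based approach from the list above can be sketched roughly as follows. This is an illustration only, not code from the benchmark repository; the function name describe is hypothetical.

```go
package main

import "fmt"

// describe inspects a value stored in an empty interface using a type switch.
// The name is hypothetical and exists only for this illustration.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int:%d", x)
	case float64:
		return fmt.Sprintf("float64:%g", x)
	case string:
		return "string:" + x
	case []byte:
		return fmt.Sprintf("bytes:%d", len(x))
	default:
		return "unknown"
	}
}

func main() {
	// Each element is boxed into an interface value (type word + data word).
	values := []interface{}{42, 1.5, "abc", []byte("xy")}
	for _, v := range values {
		fmt.Println(describe(v))
	}
}
```

Boxing a non-pointer value into an interface can force a heap allocation, which is the kind of cost the struct and variant approaches try to avoid.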

struct

struct is a plain Go struct: typ records which type the value currently holds, and a separate field stores the value for each supported type.

type Variant struct {
   typ   variant.Type
   bytes []byte
   str   string
   i     int
   f     float64
}

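The struct approach comes in two flavors.

Depending on whether the constructor returns a pointer, they are called plainstruct and ptrstruct.

Compared with plainstruct, ptrstruct costs one extra memory allocation and more CPU time (a stack allocation only has to adjust the stack pointer, which takes a few instructions).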

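// plainstruct: the Variant is returned by value.
func StringVariant(v string) Variant {
  return Variant{typ: variant.TypeString, str: v}
}

// ptrstruct: a pointer is returned, so the Variant typically ends up on the heap.
func StringVariant(v string) *Variant {
  return &Variant{typ: variant.TypeString, str: v}
}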
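
The allocation difference between the two constructors can be observed with a small program. This is a minimal sketch under stated assumptions: the struct is reduced to two fields, the names plainString, ptrString and the sink variables are hypothetical, and testing.AllocsPerRun is used instead of the repository's benchmarks.

```go
package main

import (
	"fmt"
	"testing"
)

type Type int

const (
	TypeEmpty Type = iota
	TypeString
)

// Variant is a reduced stand-in for the struct-based variant shown above.
type Variant struct {
	typ Type
	str string
}

// plainString returns the Variant by value (the plainstruct style).
func plainString(v string) Variant {
	return Variant{typ: TypeString, str: v}
}

// ptrString returns a pointer to a new Variant (the ptrstruct style).
func ptrString(v string) *Variant {
	return &Variant{typ: TypeString, str: v}
}

// Package-level sinks keep the results alive so the compiler cannot
// discard the constructor calls.
var (
	plainSink Variant
	ptrSink   *Variant
)

// allocsPlain reports the average number of heap allocations per call.
func allocsPlain() float64 {
	return testing.AllocsPerRun(1000, func() { plainSink = plainString("abc") })
}

// allocsPtr does the same for the pointer-returning constructor.
func allocsPtr() float64 {
	return testing.AllocsPerRun(1000, func() { ptrSink = ptrString("abc") })
}

func main() {
	fmt.Println("plain allocs per call:", allocsPlain())
	fmt.Println("ptr allocs per call:  ", allocsPtr())
}
```

Because the pointer is stored in a package-level variable, the Variant it points to has to live on the heap, while the by-value result is simply copied into the sink.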

The benchmark shows that plainstruct already allocates 0 bytes in the Get benchmarks, and I could not think of any further way to optimize it.

yangjie05-mac:plainstruct jie.yang05$ go test -bench=. -benchmem plainstruct_test.go  plainstruct.go
Variant size=64 bytes
goos: darwin
goarch: amd64
cpu: VirtualApple @ 2.50GHz
BenchmarkVariantIntGet-10                       1000000000               0.3111 ns/op          0 B/op          0 allocs/op
BenchmarkVariantFloat64Get-10                   1000000000               0.3117 ns/op          0 B/op          0 allocs/op
BenchmarkVariantIntTypeAndGet-10                1000000000               0.3189 ns/op          0 B/op          0 allocs/op
BenchmarkVariantStringTypeAndGet-10             141588165                8.435 ns/op           0 B/op          0 allocs/op
BenchmarkVariantBytesTypeAndGet-10              140932470                8.465 ns/op           0 B/op          0 allocs/op
BenchmarkVariantIntSliceGetAll-10                7293846               165.7 ns/op           640 B/op          1 allocs/op
BenchmarkVariantIntSliceTypeAndGetAll-10         7491408               170.6 ns/op           640 B/op          1 allocs/op
BenchmarkVariantStringSliceTypeAndGetAll-10      7061575               170.1 ns/op           640 B/op          1 allocs/op

variant

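A variant consists of three fields: ptr, a pointer to the actual data; lenAndType, a compact field that encodes both the length and the type, with a layout tuned to the platform word size; and capOrVal, which holds the capacity for slice-based types and the value itself for all other types.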

On 32-bit systems, type takes 3 bits and len uses the remaining 29 bits.
On 64-bit systems, type takes 3 bits and len uses the remaining 61 bits.

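The Variant layout is driven mainly by the need to store both float64 and string. Because of float64, there must be a 64-bit field (here an int64) to hold the float64 value. The len of a string, on the other hand, is an int, so it does not need an int64.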

type Variant struct {
  // Pointer to the slice start for slice-based types.
  ptr unsafe.Pointer

  // Len and Type fields.
  // Type uses `typeFieldBitCount` least significant bits, Len uses the rest.
  // Len is used only for the slice-based types.
  lenAndType int

  // Capacity for slice-based types, or the value for other types. For Float64Val type
  // contains the 64 bits of the floating point value.
  capOrVal int64
}

For example, when a string is created, ptr holds the pointer to the string data, and lenAndType holds the string's length together with the type.

// NewString creates a Variant of TypeString type.
func NewString(v string) Variant {
  hdr := (*reflect.StringHeader)(unsafe.Pointer(&v))
  if hdr.Len > maxSliceLen {
    panic("maximum len exceeded")
  }

  return Variant{
    ptr:        unsafe.Pointer(hdr.Data),
    lenAndType: (hdr.Len << typeFieldBitCount) | int(TypeString),
  }
}
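
Reading the string back is the reverse operation: rebuild a string from ptr and the length bits. The following is a minimal sketch, assuming Go 1.20 or later so that unsafe.StringData and unsafe.String can be used in place of reflect.StringHeader; the String and Type accessors and the maxSliceLen definition are assumptions of this sketch, not necessarily the library's code.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Type int

const (
	TypeEmpty Type = iota
	TypeInt
	TypeFloat64
	TypeString
)

const (
	typeFieldBitCount = 3
	typeFieldMask     = (1 << typeFieldBitCount) - 1
	// Assumed limit: largest length that survives the left shift in a signed int.
	maxSliceLen = int(^uint(0)>>1) >> typeFieldBitCount
)

type Variant struct {
	ptr        unsafe.Pointer
	lenAndType int
	capOrVal   int64
}

// NewString keeps a pointer to the string's bytes plus its length and type.
func NewString(s string) Variant {
	if len(s) > maxSliceLen {
		panic("maximum len exceeded")
	}
	return Variant{
		ptr:        unsafe.Pointer(unsafe.StringData(s)),
		lenAndType: (len(s) << typeFieldBitCount) | int(TypeString),
	}
}

// Type extracts the type from the low bits of lenAndType.
func (v Variant) Type() Type { return Type(v.lenAndType & typeFieldMask) }

// String rebuilds the string without copying the underlying bytes.
func (v Variant) String() string {
	if v.Type() != TypeString {
		panic("variant does not hold a string")
	}
	return unsafe.String((*byte)(v.ptr), v.lenAndType>>typeFieldBitCount)
}

func main() {
	v := NewString("hello")
	fmt.Println(v.Type() == TypeString, v.String())
}
```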

Why is variant faster than plainstruct?

Benchmark how variant and plainstruct each create a string:

func createVariantString() Variant { // to keep the compiler from optimizing this away?
   for i := 0; i < 1; i++ {
      return StringVariant(testutil.StrMagicVal)
   }
   return StringVariant("def")
}
func BenchmarkVariantStringTypeAndGet(b *testing.B) {
   for i := 0; i < b.N; i++ {
      v := createVariantString()
      if v.Type() == variant.TypeString {
         if v.String() == "" {
            panic("empty string")
         }
      } else {
         panic("invalid type")
      }
   }
}
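
If the single-iteration loop in createVariantString is indeed meant to stop the compiler from inlining or eliminating the call, a more explicit alternative is the //go:noinline directive combined with a package-level sink. This is a sketch under that assumption, with a reduced Variant type and hypothetical names benchmarkCreate and sink; it is not the repository's benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// Variant is a reduced stand-in for the struct-based variant.
type Variant struct {
	typ int
	str string
}

// createVariantString is kept out of line so every iteration pays for a real call.
//
//go:noinline
func createVariantString() Variant {
	return Variant{typ: 1, str: "abc"}
}

// sink keeps the result observable so the call cannot be dropped as dead code.
var sink Variant

// benchmarkCreate is the benchmark body; it stores each result in sink.
func benchmarkCreate(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = createVariantString()
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchmarkCreate)
	fmt.Println(res.String(), res.MemString())
}
```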

Run the benchmark with the go tool and inspect the CPU profile for plainstruct:

go test -o=bin -bench=. -v -test.cpuprofile=cpuprofile plainstruct_test.go plainstruct.go
go tool pprof -http=:  bin cpuprofile

Likewise for variant:

go test -o=bin -bench=. -v -test.cpuprofile=cpuprofile variant_test.go variant.go variant_64.go
go tool pprof -http=:  bin cpuprofile

Assembly for variant: [image not preserved]

Assembly for plainstruct: [image not preserved]

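The main difference is that plainstruct executes many more instructions, because its struct has more fields.

Possible optimizations for variant?

There is one more direction in which variant could be optimized: storing a float64 on a 32-bit machine. The float64 could be split into two 32-bit halves, stored in ptr and capOrVal respectively. On 32-bit systems capOrVal could then shrink from int64 to int, saving 4 bytes.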

type Variant struct {
  // Pointer to the slice start for slice-based types.
  ptr unsafe.Pointer
  // Len and Type fields.
  // Type uses `typeFieldBitCount` least significant bits, Len uses the rest.
  // Len is used only for the slice-based types.
  lenAndType int
  capOrVal int
}
