Merged
1 change: 1 addition & 0 deletions agent/metrics_agent.go
@@ -27,6 +27,7 @@ import (
_ "flashcat.cloud/categraf/inputs/dcgm"
_ "flashcat.cloud/categraf/inputs/disk"
_ "flashcat.cloud/categraf/inputs/diskio"
_ "flashcat.cloud/categraf/inputs/dmesg"
Copilot AI Mar 11, 2026

Blank-importing the dmesg input will break non-Linux builds unless the input package is guarded by Linux build tags and provides a //go:build !linux stub file. Please ensure the dmesg input follows the same cross-platform pattern as other Linux-only inputs (e.g., inputs/conntrack).

Suggested change
_ "flashcat.cloud/categraf/inputs/dmesg"

_ "flashcat.cloud/categraf/inputs/dns_query"
_ "flashcat.cloud/categraf/inputs/docker"
_ "flashcat.cloud/categraf/inputs/elasticsearch"
241 changes: 241 additions & 0 deletions inputs/dmesg/dmesg.go
@@ -0,0 +1,241 @@
package dmesg

import (
"bytes"
"errors"
"log"
"os"
"strconv"
"strings"
"syscall"

"flashcat.cloud/categraf/config"
"flashcat.cloud/categraf/inputs"
"flashcat.cloud/categraf/types"
)
Comment on lines +1 to +15
Copilot AI Mar 11, 2026

This input reads /dev/kmsg and uses Linux-only syscall flags (e.g., O_NONBLOCK). Without Linux build tags plus a !linux stub file, the repo will fail to compile on non-Linux platforms (and the blank import in agent/metrics_agent.go will also break builds). Please add //go:build linux to this file and add a corresponding dmesg_* !linux stub (see inputs/conntrack/conntrack.go and inputs/conntrack/conntrack_nolinux.go for the established pattern).
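
A rough sketch of the Linux half of that pattern follows. The helper name `openNonblocking` and the `main` wrapper are illustrative and not taken from the PR, which opens `/dev/kmsg` inline in `Init`. Because the file carries a `//go:build linux` constraint, other platforms never compile the `syscall.O_NONBLOCK` reference; a sibling file constrained with `//go:build !linux` would then supply do-nothing versions of the same package-level names.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// openNonblocking opens path read-only in non-blocking mode.
// Hypothetical helper for illustration only.
func openNonblocking(path string) (*os.File, error) {
	return os.OpenFile(path, syscall.O_RDONLY|syscall.O_NONBLOCK, 0)
}

func main() {
	// Reading /dev/kmsg usually needs elevated privileges, so a failure
	// here is reported rather than treated as fatal.
	f, err := openNonblocking("/dev/kmsg")
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```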


const inputName = "dmesg"

const (
defaultBufSize = uint32(1 << 14) // 16KB by default
levelMask = uint64(1<<3 - 1)
)

const (
OomError = "Out of memory"
NfConntrackTableFull = "nf_conntrack: table full"
DropPacket = "dropping packet"
WillResetAdapter = "will reset adapter"
MemoryError = "memory error"
ResetSuccessfulForScsi = "Reset successful for scsi"
CallTrace = "Call Trace"
Segfault = "segfault"
NicLinkDown = "NIC Link is Down"
Ext4FsError = "EXT4-fs error"
MediumError = "Medium Error"
PackageTemperatureAboveThreshold = "Package temperature above threshold"
)

type Msg struct {
Level uint64 // SYSLOG lvel
Copilot AI Mar 11, 2026

Spelling: comment says "SYSLOG lvel"; should be "SYSLOG level".

Suggested change
Level uint64 // SYSLOG lvel
Level uint64 // SYSLOG level

Facility uint64 // SYSLOG facility
Seq uint64 // Message sequence number
TsUsec int64 // Timestamp in microsecond
Caller string // Message caller
IsFragment bool // This message is a fragment of an early message which is not a fragment
Text string // Log text
DeviceInfo map[string]string // Device info
}

type Instance struct {
config.InstanceConfig

ExternalKeywords []string `toml:"external_keywords"`

conn syscall.RawConn
file *os.File

errorList map[string]int
}

func (ins *Instance) Init() error {

var err error

f, err := os.OpenFile("/dev/kmsg", syscall.O_RDONLY|syscall.O_NONBLOCK, 0)
if err != nil {
log.Println("Error opening /dev/kmsg:", err)
return err
}

ins.conn, err = f.SyscallConn()
if err != nil {
f.Close()
log.Println("Error getting raw connection:", err)
return err
}

ins.errorList = map[string]int{
OomError: 0,
NfConntrackTableFull: 0,
DropPacket: 0,
WillResetAdapter: 0,
MemoryError: 0,
ResetSuccessfulForScsi: 0,
CallTrace: 0,
Segfault: 0,
NicLinkDown: 0,
Ext4FsError: 0,
MediumError: 0,
PackageTemperatureAboveThreshold: 0,
}

for _, keyword := range ins.ExternalKeywords {
ins.errorList[keyword] = 0
}

ins.file = f

return nil
}

type Dmesg struct {
config.PluginConfig
Instances []*Instance `toml:"instances"`
}

func init() {
inputs.Add(inputName, func() inputs.Input {
return &Dmesg{}
})
}

func (d *Dmesg) Clone() inputs.Input {
return &Dmesg{}
}

func (d *Dmesg) Name() string {
return inputName
}

func (d *Dmesg) GetInstances() []inputs.Instance {
ret := make([]inputs.Instance, len(d.Instances))
for i := 0; i < len(d.Instances); i++ {
ret[i] = d.Instances[i]
}
return ret
}

func (ins *Instance) Gather(slist *types.SampleList) {

msgs := make([]Msg, 0)

var syscallError error = nil
err := ins.conn.Read(func(fd uintptr) bool {
for {
buf := make([]byte, defaultBufSize)
_, err := syscall.Read(int(fd), buf)

Check failure on line 137 in inputs/dmesg/dmesg.go (GitHub Actions / Go Build Check):
cannot use int(fd) (value of type int) as syscall.Handle value in argument to syscall.Read
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}

msg := parseData(buf)
Comment on lines +137 to +146
Copilot AI Mar 11, 2026

The read loop ignores the byte count returned by syscall.Read and always passes the full fixed-size buffer to parseData. That means parseData is parsing trailing zero bytes and can mis-handle end-of-line logic. Capture n and call parseData(buf[:n]) (and consider skipping parse on non-nil errors like EINVAL if the message is known-truncated).

Suggested change
_, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if !errors.Is(err, syscall.EINVAL) {
return true
}
}
msg := parseData(buf)
n, err := syscall.Read(int(fd), buf)
if err != nil {
syscallError = err
// EINVAL means buf is not enough, data would be truncated, but still can continue.
if errors.Is(err, syscall.EINVAL) {
// Skip parsing known-truncated data.
continue
}
return true
}
if n <= 0 {
continue
}
msg := parseData(buf[:n])

if msg == nil {
continue
}
msgs = append(msgs, *msg)
}
})

// EAGAIN means no more data, should be treated as normal.
if syscallError != nil && !errors.Is(syscallError, syscall.EAGAIN) {
err = syscallError
}

if err != nil {
log.Println("Error reading from /dev/kmsg:", err)
slist.PushFront(types.NewSample(inputName, "up", 0, nil))
return
}

slist.PushFront(types.NewSample(inputName, "up", 1, nil))
for _, d := range msgs {
for keyword := range ins.errorList {
if strings.Contains(d.Text, keyword) {
ins.errorList[keyword]++
}
}
}
Comment on lines +131 to +172
Copilot AI Mar 11, 2026

The Gather loop allocates a new 16KB buffer and appends Msg structs for every kmsg line, then does a second pass just to count keyword hits. This creates avoidable allocations and memory growth under heavy kernel logging. Consider reusing a single buffer (or a pool) and incrementing counters as each message is parsed, without storing all messages in msgs.
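
One way to read that suggestion, sketched with hypothetical names (`keywordCounter`, `observe`) that do not appear in the PR: keep only the running counters and update them as each message text arrives, so no slice of parsed messages is ever built.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keywordCounter keeps a running hit count per keyword.
type keywordCounter struct {
	hits map[string]int
}

// newKeywordCounter registers every keyword with an initial count of zero.
func newKeywordCounter(keywords []string) *keywordCounter {
	c := &keywordCounter{hits: make(map[string]int, len(keywords))}
	for _, k := range keywords {
		c.hits[k] = 0
	}
	return c
}

// observe increments the count of every keyword contained in text.
// Nothing about the message is retained after the call returns.
func (c *keywordCounter) observe(text string) {
	for k := range c.hits {
		if strings.Contains(text, k) {
			c.hits[k]++
		}
	}
}

func main() {
	c := newKeywordCounter([]string{"Out of memory", "segfault"})
	for _, text := range []string{
		"Out of memory: Killed process 4242 (example)",
		"example[4242]: segfault at 0 ip 0000000000000000",
		"eth0: NIC Link is Up",
	} {
		c.observe(text)
	}

	// Print in a stable order, since map iteration order is random.
	keys := make([]string, 0, len(c.hits))
	for k := range c.hits {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%d\n", k, c.hits[k])
	}
}
```

In the plugin itself, the call to `observe` would sit where the loop currently appends to `msgs`.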

for keyword, count := range ins.errorList {
slist.PushFront(types.NewSample(inputName, "hit_keyword", count, map[string]string{
"keyword": keyword,
}))
}

}

func (ins *Instance) Cleanup() {
if ins.file != nil {
ins.file.Close()
}
}

func parseData(data []byte) *Msg {
msg := Msg{}

dataLen := len(data)
prefixEnd := bytes.IndexByte(data, ';')
if prefixEnd == -1 {
return nil
}

for index, prefix := range bytes.Split(data[:prefixEnd], []byte(",")) {
switch index {
case 0:
val, _ := strconv.ParseUint(string(prefix), 10, 64)
msg.Level = val & levelMask
msg.Facility = val & (^levelMask)
case 1:
val, _ := strconv.ParseUint(string(prefix), 10, 64)
msg.Seq = val
case 2:
val, _ := strconv.ParseInt(string(prefix), 10, 64)
msg.TsUsec = val
case 3:
msg.IsFragment = prefix[0] != '-'
Copilot AI Mar 11, 2026

There is a potential panic in parseData: msg.IsFragment = prefix[0] != '-' assumes the 4th prefix field is non-empty. If the field is empty/malformed, this will index out of range. Add a length check before accessing prefix[0] and treat empty as non-fragment (or return nil).

Suggested change
msg.IsFragment = prefix[0] != '-'
if len(prefix) > 0 {
msg.IsFragment = prefix[0] != '-'
} else {
// Treat empty or malformed fragment field as non-fragment.
msg.IsFragment = false
}

case 4:
msg.Caller = string(prefix)
}
}

textEnd := bytes.IndexByte(data, '\n')
if textEnd == -1 || textEnd <= prefixEnd {
return nil
}

msg.Text = string(data[prefixEnd+1 : textEnd])
if textEnd == dataLen-1 {
return nil
}

msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
Copilot AI Mar 11, 2026

There is a potential panic in device-info parsing: if info[0] != ' ' will panic if info is empty (e.g., consecutive newlines or an empty trailer slice). Guard len(info) > 0 before indexing.

Suggested change
if info[0] != ' ' {
if len(info) == 0 || info[0] != ' ' {

continue
}

kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}

msg.DeviceInfo[string(kv[0])] = string(kv[1])
Comment on lines +221 to +237
Copilot AI Mar 11, 2026

parseData returns nil when the log line ends right after the message text (textEnd == dataLen-1). That drops the common /dev/kmsg case where there is no device-info trailer, so keyword detection will miss most messages. This function should return the parsed Msg even when there is no device info section.

Suggested change
if textEnd == dataLen-1 {
return nil
}
msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if info[0] != ' ' {
continue
}
kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}
msg.DeviceInfo[string(kv[0])] = string(kv[1])
// Only parse device info if there is data after the message text.
if textEnd < dataLen-1 {
msg.DeviceInfo = make(map[string]string, 2)
deviceInfo := bytes.Split(data[textEnd+1:dataLen-1], []byte("\n"))
for _, info := range deviceInfo {
if len(info) == 0 || info[0] != ' ' {
continue
}
kv := bytes.Split(info, []byte("="))
if len(kv) != 2 {
continue
}
msg.DeviceInfo[string(kv[0])] = string(kv[1])
}

}

return &msg
}
Comment on lines +187 to +241
Copilot AI Mar 11, 2026

This new parser/keyword matching logic isn’t covered by tests. Since it’s easy to regress and interacts with tricky /dev/kmsg formatting (prefix parsing, no-device-info lines, empty segments), please add unit tests for parseData (at least: normal line without device info, line with device info, and malformed/empty segments).
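
As a starting point for such tests, here is a self-contained sketch rather than the PR's `parseData`. The `record` type and `parseRecord` function are simplified, hypothetical stand-ins that fold in the length checks and the no-device-info fix suggested in this review. Unlike the PR, the sketch strips the leading space from device-info keys. The `cases` table in `main` covers the three situations listed above and could be moved into a `_test.go` file largely as is.

```go
package main

import (
	"bytes"
	"fmt"
)

// record is a trimmed-down stand-in for the PR's Msg type.
type record struct {
	Text       string
	IsFragment bool
	DeviceInfo map[string]string
}

// parseRecord parses one record shaped like
// "<prio>,<seq>,<ts>,<flags>;<text>\n" followed by optional " KEY=value\n"
// lines. It returns nil if the ';' separator or the text newline is missing.
func parseRecord(data []byte) *record {
	prefixEnd := bytes.IndexByte(data, ';')
	if prefixEnd == -1 {
		return nil
	}
	rest := data[prefixEnd+1:]
	textEnd := bytes.IndexByte(rest, '\n')
	if textEnd == -1 {
		return nil
	}
	rec := &record{Text: string(rest[:textEnd])}

	// Guard against a missing or empty flags field before indexing it.
	fields := bytes.Split(data[:prefixEnd], []byte(","))
	if len(fields) > 3 && len(fields[3]) > 0 {
		rec.IsFragment = fields[3][0] != '-'
	}

	// Device info is optional; empty segments are skipped, not indexed.
	for _, line := range bytes.Split(rest[textEnd+1:], []byte("\n")) {
		if len(line) == 0 || line[0] != ' ' {
			continue
		}
		eq := bytes.IndexByte(line, '=')
		if eq == -1 {
			continue
		}
		if rec.DeviceInfo == nil {
			rec.DeviceInfo = make(map[string]string, 2)
		}
		rec.DeviceInfo[string(line[1:eq])] = string(line[eq+1:])
	}
	return rec
}

func main() {
	cases := []struct {
		name  string
		input string
	}{
		{"no device info", "6,339,5140900,-;NET: Registered protocol family 10\n"},
		{"with device info", "4,10,200,-;usb 1-1: new device\n SUBSYSTEM=usb\n DEVICE=+usb:1-1\n"},
		{"empty flags and empty segment", "6,11,300,;text\n\n"},
		{"missing separator", "not a kmsg record"},
	}
	for _, tc := range cases {
		rec := parseRecord([]byte(tc.input))
		if rec == nil {
			fmt.Printf("%s: nil\n", tc.name)
			continue
		}
		fmt.Printf("%s: text=%q fragment=%v info=%v\n",
			tc.name, rec.Text, rec.IsFragment, rec.DeviceInfo)
	}
}
```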
