On 3/3/23 9:12 AM, Dave Horsfall wrote:
> I can't help but provide an extract from my antispam log summariser
> (AWK):
>
> # Yes, I have a warped sense of humour here.
> /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / \
> {
>     date = sprintf("%4d/%.2d/%.2d",
>         year, months[substr($0, 1, 3)], substr($0, 5, 2))

Thank you for sharing that, Dave.

> Etc.  The idea is not to validate so much as to grab a line of interest
> to me and extract the bits that I want.

Fair enough.

Using bracket expressions for the three letters is definitely another
idea that I hadn't considered.  But I believe I prefer what I'm going to
describe as the more precise alternation, listing out each month.
(Jan|Feb|Mar...

Such an alternation is not going to match Jer like the three bracket
expressions will.

I also believe that the alternation will be easier to maintain in the
future, especially by someone other than me who has less experience
with REs.

> In this case I trust the source (the Sendmail log), but of course
> that is not always the case...

I trust that syslog will produce consistent line beginnings more than I
trust the data that is provided to syslog.

But I'd still like to be able to detect "Jer" or "Dot" if syslog ever
tosses its cookies.

> When doing things like this, you need to ask yourself at least the
> following questions:
>
> 1) What exactly am I trying to do?  This is fairly important :-)

Filter out known-to-be-okay log entries.

> 2) Can I trust the data?  Bobby Tables, Reflections on Trusting
> Trust...

Given that I'm effectively negating things and filtering out log entries
that I don't want to see (because they are okay), I'm comfortable with
trusting the data from syslog.

Brown M&Ms come to mind.

> 3) Etc.
>
> And let's not get started on the difference betwixt "trusted" and
> "trustworthy" (that distinction keeps security bods awake at night).

ACK



-- 
Grant. . . .
unix || die
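
To make the "Jer" / "Dot" point above concrete, here is a minimal sketch (sh wrapping awk) that tries the month alternation first and falls back to the three bracket expressions, so that a prefix the brackets accept but the alternation rejects gets flagged instead of silently passing.  The classify function name, the output labels, and the sample log lines are all made up for illustration; none of them come from Dave's summariser or from a real log.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): label each input line by
# which month pattern its syslog-style prefix matches.
classify() {
    awk '
    # Strict rule: alternation listing the twelve month abbreviations.
    /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0123][0-9] / {
        print "month: " substr($0, 1, 3)
        next
    }
    # Loose rule: the three bracket expressions from the quoted extract.
    # Only reached when the strict rule above did not match.
    /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / {
        print "suspect: " substr($0, 1, 3)
        next
    }
    # Anything else does not look like a syslog line beginning at all.
    { print "other" }
    '
}

# Illustrative input only (invented lines, not real log data).
printf '%s\n' \
    'Mar  3 09:12:00 host sendmail[1]: fine' \
    'Jer  3 09:12:00 host sendmail[1]: odd' \
    'Dot 12 09:12:00 host sendmail[1]: odd' | classify
```

With the sample input this should print "month: Mar", then "suspect: Jer" and "suspect: Dot".  Ordering the rules this way keeps the alternation as the filter and uses the bracket expressions only as a tripwire for near-miss month names.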