mailing list of musl libc
 help / color / mirror / code / Atom feed
* [PATCH v10] Build process uses script to add CFI directives to x86 asm
@ 2015-07-10 13:03 Alex Dowad
  2015-07-25  4:10 ` Rich Felker
  2015-08-26 20:29 ` Rich Felker
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Dowad @ 2015-07-10 13:03 UTC (permalink / raw)
  To: musl

Some functions implemented in asm need to use EBP for purposes other than acting
as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
CFI information from a .debug_frame or .eh_frame ELF section.

Rather than littering our asm with ugly .cfi directives, use an awk script to
insert them in the right places during the build process, so GDB can keep track of
where the current stack frame is relative to the stack pointer. This means GDB can
produce beautiful stack traces at any given point when single-stepping through asm
functions.

Additionally, when registers are saved on the stack and later overwritten, emit
.cfi directives so GDB will know where they were saved relative to the stack
pointer. This way, when you look back up the stack from within an asm function,
you can still reliably print the values of local variables in the caller.

If this awk script were to understand every possible wild and crazy contortion that
an asm programmer can do with the stack and registers, and always emit the exact
.cfi directives needed for GDB to know what the register values were in the
preceding stack frame, it would necessarily be as complex as a full x86 emulator.
That way lies madness.

Hence, we assume that the stack pointer will _only_ ever be adjusted using push/pop
or else add/sub with a constant. We do not attempt to detect every possible way that
a register value could be saved for later use, just the simple and common ways.

Thanks to Szabolcs Nagy for suggesting numerous improvements to this code.
---

Dear musl devs,

The Makefile now uses an AS_CMD variable, which is set conditionally.

The script no longer mistakenly treats __cp_begin and __cp_end as "functions".

Thanks,
Alex Dowad

 Makefile               |  12 ++-
 configure              |  17 ++++
 tools/add-cfi.i386.awk | 227 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 254 insertions(+), 2 deletions(-)
 create mode 100644 tools/add-cfi.i386.awk

diff --git a/Makefile b/Makefile
index 5f74005..620c1fc 100644
--- a/Makefile
+++ b/Makefile
@@ -119,11 +119,19 @@ $(dir $(patsubst %/,%,$(dir $(1))))$(notdir $(1:.s=.o)): $(1)
 endef
 $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 
+# Choose invocation of assembler to be used
+# $(1) is input file, $(2) is output file, $(3) is assembler flags
+ifeq ($(ADD_CFI),yes)
+	AS_CMD = LC_ALL=C awk -f tools/add-cfi.$(ARCH).awk $< | $(CC) -x assembler -c -o $@ -
+else
+	AS_CMD = $(CC) -c -o $@ $<
+endif
+
 %.o: $(ARCH)$(ASMSUBARCH)/%.sub
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.o: $(ARCH)/%.s
-	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
+	$(AS_CMD) $(CFLAGS_ALL_STATIC)
 
 %.o: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
@@ -132,7 +140,7 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.lo: $(ARCH)/%.s
-	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
+	$(AS_CMD) $(CFLAGS_ALL_SHARED)
 
 %.lo: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
diff --git a/configure b/configure
index beed406..70b77fb 100755
--- a/configure
+++ b/configure
@@ -351,6 +351,22 @@ tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
 test "$debug" = yes && CFLAGS_AUTO=-g
 
 #
+# Preprocess asm files to add extra debugging information if debug is
+# enabled, our assembler supports the needed directives, and the
+# preprocessing script has been written for our architecture.
+#
+printf "checking whether we should preprocess assembly to add debugging information... "
+if fnmatch '-g*|*\ -g*' "$CFLAGS_AUTO $CFLAGS" &&
+   test -f "tools/add-cfi.$ARCH.awk" &&
+   printf ".file 1 \"srcfile.s\"\n.line 1\n.cfi_startproc\n.cfi_endproc" | $CC -g -x assembler -c -o /dev/null 2>/dev/null -
+then
+  ADD_CFI=yes
+else
+  ADD_CFI=no
+fi
+printf "%s\n" "$ADD_CFI"
+
+#
 # Possibly add a -O option to CFLAGS and select modules to optimize with
 # -O3 based on the status of --enable-optimize and provided CFLAGS.
 #
@@ -606,6 +622,7 @@ LIBCC = $LIBCC
 OPTIMIZE_GLOBS = $OPTIMIZE_GLOBS
 ALL_TOOLS = $tools
 TOOL_LIBS = $tool_libs
+ADD_CFI = $ADD_CFI
 EOF
 test "x$static" = xno && echo "STATIC_LIBS ="
 test "x$shared" = xno && echo "SHARED_LIBS ="
diff --git a/tools/add-cfi.i386.awk b/tools/add-cfi.i386.awk
new file mode 100644
index 0000000..4a4a3b6
--- /dev/null
+++ b/tools/add-cfi.i386.awk
@@ -0,0 +1,227 @@
+# Insert GAS CFI directives ("control frame information") into x86-32 asm input
+#
+# CFI directives tell the assembler how to generate "stack frame" debug info
+# This information can tell a debugger (like gdb) how to find the current stack
+#   frame at any point in the program code, and how to find the values which
+#   various registers had at higher points in the call stack
+# With this information, the debugger can show a backtrace, and you can move up
+#   and down the call stack and examine the values of local variables
+
+BEGIN {
+  # don't put CFI data in the .eh_frame ELF section (which we don't keep)
+  print ".cfi_sections .debug_frame"
+
+  # only emit CFI directives inside a function
+  in_function = 0
+
+  # emit .loc directives with line numbers from original source
+  printf ".file 1 \"%s\"\n", ARGV[1]
+  line_number = 0
+
+  # used to detect "call label; label:" trick
+  called = ""
+}
+
+function hex2int(str,   i) {
+  str = tolower(str)
+
+  for (i = 1; i <= 16; i++) {
+    char = substr("0123456789abcdef", i, 1)
+    lookup[char] = i-1
+  }
+
+  result = 0
+  for (i = 1; i <= length(str); i++) {
+    result = result * 16
+    char   = substr(str, i, 1)
+    result = result + lookup[char]
+  }
+  return result
+}
+
+function parse_const(str) {
+  sign = sub(/^-/, "", str)
+  hex  = sub(/^0x/, "", str)
+  if (hex)
+    n = hex2int(str)
+  else
+    n = str+0
+  return sign ? -n : n
+}
+
+function get_const1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is constant)
+  match($0, /-?(0x[0-9a-fA-F]+|[0-9]+),/)
+  return parse_const(substr($0, RSTART, RLENGTH-1))
+}
+function get_reg() {
+  # only use if you already know there is 1 and only 1 register
+  match($0, /%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is register)
+  match($0, /%e(ax|bx|cx|dx|si|di|bp),/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg2() {
+  # for instructions with 2 operands, get 2nd operand (assuming it is register)
+  match($0, /,%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+RLENGTH-3, 3)
+}
+
+function adjust_sp_offset(delta) {
+  if (in_function)
+    printf ".cfi_adjust_cfa_offset %d\n", delta
+}
+
+{
+  line_number = line_number + 1
+
+  # clean the input up before doing anything else
+  # delete comments
+  gsub(/(#|\/\/).*/, "")
+
+  # canonicalize whitespace
+  gsub(/[ \t]+/, " ") # mawk doesn't understand \s
+  gsub(/ *, */, ",")
+  gsub(/ *: */, ": ")
+  gsub(/ $/, "")
+  gsub(/^ /, "")
+}
+
+# check for assembler directives which we care about
+/^\.(section|data|text)/ {
+  # a .cfi_startproc/.cfi_endproc pair should be within the same section
+  # otherwise, clang will choke when generating ELF output
+  if (in_function) {
+    print ".cfi_endproc"
+    in_function = 0
+  }
+}
+/^\.type [a-zA-Z0-9_]+,\@function/ {
+  functions[substr($2, 1, length($2)-10)] = 1
+}
+# not interested in assembler directives beyond this, just pass them through
+/^\./ {
+  print
+  next
+}
+
+/^[a-zA-Z0-9_]+:/ {
+  label = substr($1, 1, length($1)-1) # drop trailing :
+
+  if (called == label) {
+    # note adjustment of stack pointer from "call label; label:"
+    adjust_sp_offset(4)
+  }
+
+  if (functions[label]) {
+    if (in_function)
+      print ".cfi_endproc"
+
+    in_function = 1
+    print ".cfi_startproc"
+
+    for (register in saved)
+      delete saved[register]
+    for (register in dirty)
+      delete dirty[register]
+  }
+
+  # an instruction may follow on the same line, so continue processing
+}
+
+/^$/ { next }
+
+{
+  called = ""
+  printf ".loc 1 %d\n", line_number
+  print
+}
+
+# KEEPING UP WITH THE STACK POINTER
+# We do NOT attempt to understand foolish and ridiculous tricks like stashing
+#   the stack pointer and then using %esp as a scratch register, or bitshifting
+#   it or taking its square root or anything stupid like that.
+# %esp should only be adjusted by pushing/popping or adding/subtracting constants
+#
+/pushl?/ {
+  if (match($0, / %(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(2)
+  else
+    adjust_sp_offset(4)
+}
+/popl?/ {
+  if (match($0, / %(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(-2)
+  else
+    adjust_sp_offset(-4)
+}
+/addl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(-get_const1()) }
+/subl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(get_const1()) }
+
+/call/ {
+  if (match($0, /call [0-9]+f/)) # "forward" label
+    called = substr($0, RSTART+5, RLENGTH-6)
+  else if (match($0, /call [0-9a-zA-Z_]+/))
+    called = substr($0, RSTART+5, RLENGTH-5)
+}
+
+# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
+#
+/pushl? %e(ax|bx|cx|dx|si|di|bp)/ { # don't match "push (%reg)"
+  # if a register is being pushed, and its value has not changed since the
+  #   beginning of this function, the pushed value can be used when printing
+  #   local variables at the next level up the stack
+  # emit '.cfi_rel_offset' for that
+
+  if (in_function) {
+    register = get_reg()
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,0\n", register
+      saved[register] = 1
+    }
+  }
+}
+
+/movl? %e(ax|bx|cx|dx|si|di|bp),-?(0x[0-9a-fA-F]+|[0-9]+)?\(%esp\)/ {
+  if (in_function) {
+    register = get_reg()
+    if (match($0, /-?(0x[0-9a-fA-F]+|[0-9]+)\(%esp\)/)) {
+      offset = parse_const(substr($0, RSTART, RLENGTH-6))
+    } else {
+      offset = 0
+    }
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,%d\n", register, offset
+      saved[register] = 1
+    }
+  }
+}
+
+# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
+# ...then we want to know about it.
+#
+function trashed(register) {
+  if (in_function && !saved[register] && !dirty[register]) {
+    printf ".cfi_undefined %s\n", register
+  }
+  dirty[register] = 1
+}
+# this does NOT exhaustively check for all possible instructions which could
+# overwrite a register value inherited from the caller (just the common ones)
+/mov.*,%e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg2()) }
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr) %e(ax|bx|cx|dx|si|di|bp),/ {
+  trashed(get_reg1())
+}
+/i?mul [^,]*$/                    { trashed("eax"); trashed("edx") }
+/i?mul %e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
+/i?div/                           { trashed("eax"); trashed("edx") }
+/(dec|inc|not|neg|pop) %e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg()) }
+/cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
+
+END {
+  if (in_function)
+    print ".cfi_endproc"
+}
-- 
2.0.0.GIT



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v10] Build process uses script to add CFI directives to x86 asm
  2015-07-10 13:03 [PATCH v10] Build process uses script to add CFI directives to x86 asm Alex Dowad
@ 2015-07-25  4:10 ` Rich Felker
  2015-08-26 20:29 ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2015-07-25  4:10 UTC (permalink / raw)
  To: musl

On Fri, Jul 10, 2015 at 03:03:24PM +0200, Alex Dowad wrote:
> The Makefile now uses an AS_CMD variable, which is set conditionally.
> 
> The script no longer mistakenly treats __cp_begin and __cp_end as "functions".

I tested this version and it seems to be working! Thanks again for
your patience -- I've been really busy the past couple weeks. I'll
commit soon if nothing else comes up.

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v10] Build process uses script to add CFI directives to x86 asm
  2015-07-10 13:03 [PATCH v10] Build process uses script to add CFI directives to x86 asm Alex Dowad
  2015-07-25  4:10 ` Rich Felker
@ 2015-08-26 20:29 ` Rich Felker
  2015-08-27  3:38   ` Alex
  1 sibling, 1 reply; 5+ messages in thread
From: Rich Felker @ 2015-08-26 20:29 UTC (permalink / raw)
  To: musl

On Fri, Jul 10, 2015 at 03:03:24PM +0200, Alex Dowad wrote:
> Some functions implemented in asm need to use EBP for purposes other than acting
> as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
> 6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
> CFI information from a .debug_frame or .eh_frame ELF section.
> [...]

Committed, finally!

Sorry for the long delay. I was using this locally without problem and
somehow forgot that I had not actually committed it despite saying I
would do so if no problems were found.

The only change I made was re-folding the commit message to fit well
in 80 columns.

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v10] Build process uses script to add CFI directives to x86 asm
  2015-08-26 20:29 ` Rich Felker
@ 2015-08-27  3:38   ` Alex
  2015-08-27  3:47     ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Alex @ 2015-08-27  3:38 UTC (permalink / raw)
  To: musl

On Wed, Aug 26, 2015 at 10:29 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Jul 10, 2015 at 03:03:24PM +0200, Alex Dowad wrote:
>> Some functions implemented in asm need to use EBP for purposes other than acting
>> as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
>> 6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
>> CFI information from a .debug_frame or .eh_frame ELF section.
>> [...]
>
> Committed, finally!
>
> Sorry for the long delay. I was using this locally without problem and
> somehow forgot that I had not actually committed it despite saying I
> would do so if no problems were found.
>
> The only change I made was re-folding the commit message to fit well
> in 80 columns.

So I guess a script for x86-64 is needed now?

AD


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v10] Build process uses script to add CFI directives to x86 asm
  2015-08-27  3:38   ` Alex
@ 2015-08-27  3:47     ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2015-08-27  3:47 UTC (permalink / raw)
  To: musl

On Thu, Aug 27, 2015 at 05:38:12AM +0200, Alex wrote:
> On Wed, Aug 26, 2015 at 10:29 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Jul 10, 2015 at 03:03:24PM +0200, Alex Dowad wrote:
> >> Some functions implemented in asm need to use EBP for purposes other than acting
> >> as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
> >> 6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
> >> CFI information from a .debug_frame or .eh_frame ELF section.
> >> [...]
> >
> > Committed, finally!
> >
> > Sorry for the long delay. I was using this locally without problem and
> > somehow forgot that I had not actually committed it despite saying I
> > would do so if no problems were found.
> >
> > The only change I made was re-folding the commit message to fit well
> > in 80 columns.
> 
> So I guess a script for x86-64 is needed now?

Yes, that would be great. I'm trying to package up 1.1.11 now, so
maybe we can plan to get the x86_64 CFI committed somewhere early in
the next release cycle?

Thanks again for your work on this!

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-08-27  3:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-10 13:03 [PATCH v10] Build process uses script to add CFI directives to x86 asm Alex Dowad
2015-07-25  4:10 ` Rich Felker
2015-08-26 20:29 ` Rich Felker
2015-08-27  3:38   ` Alex
2015-08-27  3:47     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).