mailing list of musl libc
 help / color / mirror / code / Atom feed
* [PATCH] Build process uses script to add CFI directives to x86 asm
@ 2015-05-19  9:31 Alex Dowad
  2015-05-19 11:16 ` u-wsnj
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Dowad @ 2015-05-19  9:31 UTC (permalink / raw)
  To: musl

Some functions implemented in asm need to use EBP for purposes other than acting
as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
CFI information from a .debug_frame or .eh_frame ELF section.

Rather than littering our asm with ugly .cfi directives, use an awk script to
insert them in the right places during the build process, so GDB can keep track of
where the current stack frame is relative to the stack pointer. This means GDB can
produce beautiful stack traces at any given point when single-stepping through asm
functions.

Additionally, when registers are saved on the stack and later overwritten, emit
.cfi directives so GDB will know where they were saved relative to the stack
pointer. This way, when you look back up the stack from within an asm function,
you can still reliably print the values of local variables in the caller.

If this awk script were to understand every possible wild and crazy contortion that
an asm programmer can do with the stack and registers, and always emit the exact
.cfi directives needed for GDB to know what the register values were in the
preceding stack frame, it would necessarily be as complex as a full x86 emulator.
That way lies madness.

Hence, we assume that the stack pointer will _only_ ever be adjusted using push/pop
or else add/sub with a constant. We do not attempt to detect every possible way that
a register value could be saved for later use, just the simple and common ways.

Thanks to Szabolcs Nagy for suggesting numerous improvements to this code.
---

Dear muslmen,

Tested on a 32-bit machine with a BusyBox-based distro. BusyBox awk gobbled up the
awk code with not a word of complaint. The only trouble was with the use of "echo"
in the configure script -- BusyBox ash's "echo" is different from bash's.

Tried debugging the resulting binaries with GDB, the results were good.

I also tried building with clang. I take it you don't normally use clang?

clang doesn't like the ".file" directive *if* the assembler was invoked with -g.
You can run it without -g, or with -g and with no ".file" directives. In either
case, it produces binaries which do contain some DWARF data, but GDB can't
display backtraces which debugging those binaries.

Sorry, but I still haven't found a better way of writing the Makefile rules.
Attempts to apply your suggestions re: the aswrap script have not resulted in
anything much better.

It was also suggested that the Makefile rules for "subarch" asm files should
be updated, but we don't have an "add-cfi" script for any arch which actually
has subarchs.

Thanks,
Alex Dowad

 Makefile               |   8 ++
 configure              |  16 ++++
 tools/add-cfi.i386.awk | 207 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+)
 create mode 100644 tools/add-cfi.i386.awk

diff --git a/Makefile b/Makefile
index 6559295..1bb2a0a 100644
--- a/Makefile
+++ b/Makefile
@@ -118,7 +118,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.o: $(ARCH)/%.s
+ifeq ($(ADD_CFI),yes)
+	LC_ALL=C awk -f tools/add-cfi.$(ARCH).awk $< | $(CC) $(CFLAGS_ALL_STATIC) -x assembler -c -o $@ -
+else
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
+endif
 
 %.o: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
@@ -127,7 +131,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.lo: $(ARCH)/%.s
+ifeq ($(ADD_CFI),yes)
+	LC_ALL=C awk -f tools/add-cfi.$(ARCH).awk $< | $(CC) $(CFLAGS_ALL_SHARED) -x assembler -c -o $@ -
+else
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
+endif
 
 %.lo: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
diff --git a/configure b/configure
index 143dc92..84f8f47 100755
--- a/configure
+++ b/configure
@@ -317,6 +317,21 @@ tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
 test "$debug" = yes && CFLAGS_AUTO=-g
 
 #
+# Preprocess asm files to add extra debugging information if debug is
+# enabled, our assembler supports the needed directives, and the
+# preprocessing script has been written for our architecture.
+#
+if expr "$CFLAGS_AUTO" : "-g" >/dev/null &&
+   test -f "tools/add-cfi.$ARCH.awk" &&
+   echo ".cfi_startproc
+.cfi_endproc" | $CC -x assembler -c -o /dev/null -
+then
+  ADD_CFI=yes
+else
+  ADD_CFI=no
+fi
+
+#
 # Possibly add a -O option to CFLAGS and select modules to optimize with
 # -O3 based on the status of --enable-optimize and provided CFLAGS.
 #
@@ -570,6 +585,7 @@ LDFLAGS = $LDFLAGS_AUTO $LDFLAGS
 CROSS_COMPILE = $CROSS_COMPILE
 LIBCC = $LIBCC
 OPTIMIZE_GLOBS = $OPTIMIZE_GLOBS
+ADD_CFI = $ADD_CFI
 EOF
 test "x$static" = xno && echo "STATIC_LIBS ="
 test "x$shared" = xno && echo "SHARED_LIBS ="
diff --git a/tools/add-cfi.i386.awk b/tools/add-cfi.i386.awk
new file mode 100644
index 0000000..53ae798
--- /dev/null
+++ b/tools/add-cfi.i386.awk
@@ -0,0 +1,207 @@
+# Insert GAS CFI directives ("control frame information") into x86-32 asm input
+#
+# CFI directives tell the assembler how to generate "stack frame" debug info
+# This information can tell a debugger (like gdb) how to find the current stack
+#   frame at any point in the program code, and how to find the values which
+#   various registers had at higher points in the call stack
+# With this information, the debugger can show a backtrace, and you can move up
+#   and down the call stack and examine the values of local variables
+
+BEGIN {
+  # don't put CFI data in the .eh_frame ELF section (which we don't keep)
+  print ".cfi_sections .debug_frame"
+
+  # only emit CFI directives inside a function
+  in_function = 0
+
+  # emit .loc directives with line numbers from original source
+  printf ".file 1 \"%s\"\n", ARGV[1]
+  line_number = 0
+}
+
+function hex2int(str,   i) {
+  str = tolower(str)
+
+  for (i = 1; i <= 16; i++) {
+    char = substr("0123456789abcdef", i, 1)
+    lookup[char] = i-1
+  }
+
+  result = 0
+  for (i = 1; i <= length(str); i++) {
+    result = result * 16
+    char   = substr(str, i, 1)
+    result = result + lookup[char]
+  }
+  return result
+}
+
+function parse_const(str) {
+  sign = sub(/^-/, "", str)
+  hex  = sub(/^0x/, "", str)
+  if (hex)
+    n = hex2int(str)
+  else
+    n = str+0
+  return sign ? -n : n
+}
+
+function get_const1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is constant)
+  match($0, /-?(0x[0-9a-fA-F]+|[0-9]+),/)
+  return parse_const(substr($0, RSTART, RLENGTH-1))
+}
+function get_reg() {
+  # only use if you already know there is 1 and only 1 register
+  match($0, /%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is register)
+  match($0, /%e(ax|bx|cx|dx|si|di|bp),/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg2() {
+  # for instructions with 2 operands, get 2nd operand (assuming it is register)
+  match($0, /,%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+RLENGTH-3, 3)
+}
+
+function adjust_sp_offset(delta) {
+  if (in_function)
+    printf ".cfi_adjust_cfa_offset %d\n", delta
+}
+
+{
+  line_number = line_number + 1
+
+  # clean the input up before doing anything else
+  # delete comments
+  gsub("/(#|//).*/", "")
+
+  # canonicalize whitespace
+  gsub(/\s+/, " ")
+  gsub(/ *, */, ",")
+  gsub(/ *: */, ": ")
+  gsub(/ $/, "")
+  gsub(/^ /, "")
+}
+
+# check for assembler directives which we care about
+/^\.section/ {
+  if (in_function) {
+    print ".cfi_endproc"
+    in_function = 0
+  }
+}
+/^\.globa?l\s+\w+/ {
+  globals[$2] = 1
+}
+# not interested in assembler directives beyond this, just pass them through
+/^\./ {
+  print
+  next
+}
+
+/^\w+:/ {
+  label = substr($1, 1, length($1)-1) # drop trailing :
+
+  if (globals[label]) {
+    if (in_function)
+      print ".cfi_endproc"
+
+    in_function = 1
+    print ".cfi_startproc"
+
+    for (register in saved)
+      delete saved[register]
+    for (register in dirty)
+      delete dirty[register]
+  }
+}
+
+/^$/ { next }
+
+{
+  printf ".loc 1 %d\n", line_number
+  print
+}
+
+# KEEPING UP WITH THE STACK POINTER
+# We do NOT attempt to understand foolish and ridiculous tricks like stashing
+#   the stack pointer and then using %esp as a scratch register, or bitshifting
+#   it or taking its square root or anything stupid like that.
+# %esp should only be adjusted by pushing/popping or adding/subtracting constants
+#
+/pushl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(2)
+  else
+    adjust_sp_offset(4)
+}
+/popl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(-2)
+  else
+    adjust_sp_offset(-4)
+}
+/addl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(-get_const1()) }
+/subl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(get_const1()) }
+
+# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
+#
+/pushl? %e(ax|bx|cx|dx|si|di|bp)/ { # don't match "push (%reg)"
+  # if a register is being pushed, and its value has not changed since the
+  #   beginning of this function, the pushed value can be used when printing
+  #   local variables at the next level up the stack
+  # emit '.cfi_rel_offset' for that
+
+  if (in_function) {
+    register = get_reg()
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,0\n", register
+      saved[register] = 1
+    }
+  }
+}
+
+/movl? %e(ax|bx|cx|dx|si|di|bp),-?(0x[0-9a-fA-F]+|[0-9]+)?\(%esp\)/ {
+  if (in_function) {
+    register = get_reg()
+    if (match($0, /-?(0x[0-9a-fA-F]+|[0-9]+)\(%esp\)/)) {
+      offset = parse_const(substr($0, RSTART, RLENGTH-6))
+    } else {
+      offset = 0
+    }
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,%d\n", register, offset
+      saved[register] = 1
+    }
+  }
+}
+
+# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
+# ...then we want to know about it.
+#
+function trashed(register) {
+  if (in_function && !saved[register] && !dirty[register]) {
+    printf ".cfi_undefined %s\n", register
+  }
+  dirty[register] = 1
+}
+# this does NOT exhaustively check for all possible instructions which could
+# overwrite a register value inherited from the caller (just the common ones)
+/mov.*,%e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg2()) }
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr) %e(ax|bx|cx|dx|si|di|bp),/ {
+  trashed(get_reg1())
+}
+/i?mul [^,]*$/                    { trashed("eax"); trashed("edx") }
+/i?mul %e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
+/i?div/                           { trashed("eax"); trashed("edx") }
+/(dec|inc|not|neg|pop) %e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg()) }
+/cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
+
+END {
+  if (in_function)
+    print ".cfi_endproc"
+}
-- 
2.0.0.GIT



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
  2015-05-19  9:31 [PATCH] Build process uses script to add CFI directives to x86 asm Alex Dowad
@ 2015-05-19 11:16 ` u-wsnj
  2015-05-19 11:43   ` Szabolcs Nagy
  0 siblings, 1 reply; 8+ messages in thread
From: u-wsnj @ 2015-05-19 11:16 UTC (permalink / raw)
  To: musl

On Tue, May 19, 2015 at 11:31:14AM +0200, Alex Dowad wrote:
> awk code with not a word of complaint. The only trouble was with the use of "echo"
> in the configure script -- BusyBox ash's "echo" is different from bash's.

Indeed, portable sh scripts should be using printf instead of echo,
every time.

echo(){ printf '%s\n' "$@"; }

yields a well-behaving "plain" echo variant.

If one needs some special tricks, printf covers more cases than echo anyway.

I may be wrong but I am not aware of a "nowadays practically useful"
platform where printf is not available (even if we do not consider a
possibility to make it available).

Rune



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
  2015-05-19 11:16 ` u-wsnj
@ 2015-05-19 11:43   ` Szabolcs Nagy
  0 siblings, 0 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2015-05-19 11:43 UTC (permalink / raw)
  To: musl

* u-wsnj@aetey.se <u-wsnj@aetey.se> [2015-05-19 13:16:22 +0200]:
> On Tue, May 19, 2015 at 11:31:14AM +0200, Alex Dowad wrote:
> > awk code with not a word of complaint. The only trouble was with the use of "echo"
> > in the configure script -- BusyBox ash's "echo" is different from bash's.
> 
> Indeed, portable sh scripts should be using printf instead of echo,
> every time.
> 
> echo(){ printf '%s\n' "$@"; }
> 
> yields a well-behaving "plain" echo variant.
> 

configure already uses its own echo definition

so ash's echo should not be used

how did the config script end up calling ash's echo?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
@ 2015-05-20 17:08 Alex Dowad
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Dowad @ 2015-05-20 17:08 UTC (permalink / raw)
  To: musl

> Writing portable shell script is a little hard, but I've found a good
> approach is to make something ancient like pdksh a symlink for /bin/sh and
> even /bin/bash - typically in a small VM. This makes it much more likely
> that your code will work without big changes on AIX, BSDs, etc. The other
> way is to use dash, not BusyBox ash, as BusyBox ash includes a few bashisms
> depending on the configure options at build time.

Just tried swapping all references to /bin/sh to /bin/pdksh instead -- the
config script ran just fine and musl was built correctly (with the added CFI
data). Same with /bin/dash.

Thanks for the idea -- I will keep it in mind next time I am writing shell.

Maintainers: is there still anything blocking this patch from getting merged
in? Please let me know so I can fix any remaining problems.

Thanks,
Alex Dowad


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
@ 2015-05-19 16:22 Alex Dowad
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Dowad @ 2015-05-19 16:22 UTC (permalink / raw)
  To: musl

> configure already uses its own echo definition

> so ash's echo should not be used

> how did the config script end up calling ash's echo?

OK, thanks for pointing this out. I shouldn't have said that the problem was
with ash's echo.

The original configure script worked when I ran it on Linux Mint + bash, and didn't
work on Alpine Linux + BusyBox ash. I didn't notice echo() in the configure script,
did some experimentation with bash's echo and ash's, found a difference which seemed
to explain the observed problem, and worked around it.

Now it looks like the issue was something else. I can't seem to duplicate the problem
right this moment; however, the configure script as it is right now works on both
platforms.

Mysteries, mysteries...

AD


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
  2015-05-16 13:54 Alex Dowad
  2015-05-18 13:34 ` Szabolcs Nagy
@ 2015-05-18 15:52 ` Rich Felker
  1 sibling, 0 replies; 8+ messages in thread
From: Rich Felker @ 2015-05-18 15:52 UTC (permalink / raw)
  To: musl

On Sat, May 16, 2015 at 03:54:31PM +0200, Alex Dowad wrote:
> diff --git a/configure b/configure
> index 143dc92..72349a2 100755
> --- a/configure
> +++ b/configure
> @@ -317,6 +317,18 @@ tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
>  test "$debug" = yes && CFLAGS_AUTO=-g
>  
>  #
> +# Preprocess asm files to add extra debugging information if debug is
> +# enabled, our assembler supports the needed directives, and the
> +# preprocessing script has been written for our architecture.
> +#
> +if (test "$debug" = yes) &&
> +   (test -f "tools/add-cfi.awk.$ARCH") &&
> +   (echo ".cfi_startproc\n.cfi_endproc" | $CC -x assembler -c -o /dev/null -)
> +then
> +  ADD_CFI=yes
> +fi
> +

I think CFLAGS should be checked for -g rather than using $debug here.
I'm not really a fan of --enable-debug despite having been the one who
added it, and I usually just put -g in CFLAGS.

Also, minor nit: it's best to avoid () in shell scripts when not
needed, since it usually leads to forking. {} can be used for grouping
but it's not needed here either as far as I can tell.

> +END {
> +  if (in_function)
> +    print ".cfi_endproc"
> +}
> \ No newline at end of file

Note that you still have a missing newline at the end of the awk
script.

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Build process uses script to add CFI directives to x86 asm
  2015-05-16 13:54 Alex Dowad
@ 2015-05-18 13:34 ` Szabolcs Nagy
  2015-05-18 15:52 ` Rich Felker
  1 sibling, 0 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2015-05-18 13:34 UTC (permalink / raw)
  To: musl

* Alex Dowad <alexinbeijing@gmail.com> [2015-05-16 15:54:31 +0200]:
> 
> configure script now checks whether debugging is enabled, whether the assembler
> understands CFI directives, whether it understands '-x assembler', and whether
> there the AWK script is present for the current arch.
> 
> aswrap.sh is gone -- the problem is that the arguments it wants (and the order it
>   wants them in) is different from $(CC). So it's hard to use an AS_CMD variable
>   as suggested by RF to run whichever command is appropriate.
> 
> Suggestions from shell/Makefile scripting wizards on how to do this better
>   are appreciated!
> 

it is possible to parse the options but not trivial

may be have the arguments in an order that's good for the wrapper

> The processed asm now contains .file and .loc directives, so when you debug it in
> GDB, it will show you the original source file, and highlight the correct lines.
> 

nice

> I haven't tested with busybox awk yet, just with GNU awk. If there are no further
>   issues with the code, I will do that next.
> 

i see one issue: \s, \w may not be supported (eg mawk does not have them
nor [[:space:]] char classes)

> 
>  Makefile               |  10 ++-
>  configure              |  13 ++++
>  tools/add-cfi.awk.i386 | 207 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 229 insertions(+), 1 deletion(-)
>  create mode 100644 tools/add-cfi.awk.i386
> 
> diff --git a/Makefile b/Makefile
> index 6559295..6f17cc7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -118,7 +118,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
>  	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
>  
>  %.o: $(ARCH)/%.s
> +ifeq ($(ADD_CFI),yes)
> +	LC_ALL=C awk -f tools/add-cfi.awk.$(ARCH) $< | $(CC) $(CFLAGS_ALL_STATIC) -x assembler -c -o $@ -
> +else
>  	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
> +endif
>  

this may work, with a reasonably simple aswrap script:

%.o: $(ARCH)/%.s
	$(ASWRAP) $(ARCH) $< $(CC) $(CFLAGS_ALL_STATIC) -c -o $@

i don't know which approach is less ugly.

the %.sub rules should be handled too..

> diff --git a/configure b/configure
> index 143dc92..72349a2 100755
> --- a/configure
> +++ b/configure
> @@ -317,6 +317,18 @@ tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
>  test "$debug" = yes && CFLAGS_AUTO=-g
>  
>  #
> +# Preprocess asm files to add extra debugging information if debug is
> +# enabled, our assembler supports the needed directives, and the
> +# preprocessing script has been written for our architecture.
> +#
> +if (test "$debug" = yes) &&
> +   (test -f "tools/add-cfi.awk.$ARCH") &&
> +   (echo ".cfi_startproc\n.cfi_endproc" | $CC -x assembler -c -o /dev/null -)
> +then
> +  ADD_CFI=yes
> +fi
> +

else ADD_CFI=no

(otherwise an ADD_CFI env var may influence configure behaviour)

> +#
>  # Possibly add a -O option to CFLAGS and select modules to optimize with
>  # -O3 based on the status of --enable-optimize and provided CFLAGS.
>  #
> @@ -570,6 +582,7 @@ LDFLAGS = $LDFLAGS_AUTO $LDFLAGS
>  CROSS_COMPILE = $CROSS_COMPILE
>  LIBCC = $LIBCC
>  OPTIMIZE_GLOBS = $OPTIMIZE_GLOBS
> +ADD_CFI = $ADD_CFI
>  EOF
>  test "x$static" = xno && echo "STATIC_LIBS ="
>  test "x$shared" = xno && echo "SHARED_LIBS ="


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Build process uses script to add CFI directives to x86 asm
@ 2015-05-16 13:54 Alex Dowad
  2015-05-18 13:34 ` Szabolcs Nagy
  2015-05-18 15:52 ` Rich Felker
  0 siblings, 2 replies; 8+ messages in thread
From: Alex Dowad @ 2015-05-16 13:54 UTC (permalink / raw)
  To: musl

Some functions implemented in asm need to use EBP for purposes other than acting
as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
CFI information from a .debug_frame or .eh_frame ELF section.

Rather than littering our asm with ugly .cfi directives, use an awk script to
insert them in the right places during the build process, so GDB can keep track of
where the current stack frame is relative to the stack pointer. This means GDB can
produce beautiful stack traces at any given point when single-stepping through asm
functions.

Additionally, when registers are saved on the stack and later overwritten, emit
.cfi directives so GDB will know where they were saved relative to the stack
pointer. This way, when you look back up the stack from within an asm function,
you can still reliably print the values of local variables in the caller.

If this awk script were to understand every possible wild and crazy contortion that
an asm programmer can do with the stack and registers, and always emit the exact
.cfi directives needed for GDB to know what the register values were in the
preceding stack frame, it would necessarily be as complex as a full x86 emulator.
That way lies madness.

Hence, we assume that the stack pointer will _only_ ever be adjusted using push/pop
or else add/sub with a constant. We do not attempt to detect every possible way that
a register value could be saved for later use, just the simple and common ways.

Thanks to Szabolcs Nagy for suggesting numerous improvements to this code.
---

Wash, rinse, and repeat...

configure script now checks whether debugging is enabled, whether the assembler
understands CFI directives, whether it understands '-x assembler', and whether
there the AWK script is present for the current arch.

aswrap.sh is gone -- the problem is that the arguments it wants (and the order it
  wants them in) is different from $(CC). So it's hard to use an AS_CMD variable
  as suggested by RF to run whichever command is appropriate.

Suggestions from shell/Makefile scripting wizards on how to do this better
  are appreciated!

The processed asm now contains .file and .loc directives, so when you debug it in
GDB, it will show you the original source file, and highlight the correct lines.

I haven't tested with busybox awk yet, just with GNU awk. If there are no further
  issues with the code, I will do that next.

Thanks,
Alex Dowad

 Makefile               |  10 ++-
 configure              |  13 ++++
 tools/add-cfi.awk.i386 | 207 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 tools/add-cfi.awk.i386

diff --git a/Makefile b/Makefile
index 6559295..6f17cc7 100644
--- a/Makefile
+++ b/Makefile
@@ -118,7 +118,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.o: $(ARCH)/%.s
+ifeq ($(ADD_CFI),yes)
+	LC_ALL=C awk -f tools/add-cfi.awk.$(ARCH) $< | $(CC) $(CFLAGS_ALL_STATIC) -x assembler -c -o $@ -
+else
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
+endif
 
 %.o: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
@@ -127,7 +131,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.lo: $(ARCH)/%.s
+ifeq ($(ADD_CFI),yes)
+	LC_ALL=C awk -f tools/add-cfi.awk.$(ARCH) $< | $(CC) $(CFLAGS_ALL_SHARED) -x assembler -c -o $@ -
+else
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
+endif
 
 %.lo: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_SHARED) -c -o $@ $<
diff --git a/configure b/configure
index 143dc92..72349a2 100755
--- a/configure
+++ b/configure
@@ -317,6 +317,18 @@ tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
 test "$debug" = yes && CFLAGS_AUTO=-g
 
 #
+# Preprocess asm files to add extra debugging information if debug is
+# enabled, our assembler supports the needed directives, and the
+# preprocessing script has been written for our architecture.
+#
+if (test "$debug" = yes) &&
+   (test -f "tools/add-cfi.awk.$ARCH") &&
+   (echo ".cfi_startproc\n.cfi_endproc" | $CC -x assembler -c -o /dev/null -)
+then
+  ADD_CFI=yes
+fi
+
+#
 # Possibly add a -O option to CFLAGS and select modules to optimize with
 # -O3 based on the status of --enable-optimize and provided CFLAGS.
 #
@@ -570,6 +582,7 @@ LDFLAGS = $LDFLAGS_AUTO $LDFLAGS
 CROSS_COMPILE = $CROSS_COMPILE
 LIBCC = $LIBCC
 OPTIMIZE_GLOBS = $OPTIMIZE_GLOBS
+ADD_CFI = $ADD_CFI
 EOF
 test "x$static" = xno && echo "STATIC_LIBS ="
 test "x$shared" = xno && echo "SHARED_LIBS ="
diff --git a/tools/add-cfi.awk.i386 b/tools/add-cfi.awk.i386
new file mode 100644
index 0000000..647b6a1
--- /dev/null
+++ b/tools/add-cfi.awk.i386
@@ -0,0 +1,207 @@
+# Insert GAS CFI directives ("control frame information") into x86-32 asm input
+#
+# CFI directives tell the assembler how to generate "stack frame" debug info
+# This information can tell a debugger (like gdb) how to find the current stack
+#   frame at any point in the program code, and how to find the values which
+#   various registers had at higher points in the call stack
+# With this information, the debugger can show a backtrace, and you can move up
+#   and down the call stack and examine the values of local variables
+
+BEGIN {
+  # don't put CFI data in the .eh_frame ELF section (which we don't keep)
+  print ".cfi_sections .debug_frame"
+
+  # only emit CFI directives inside a function
+  in_function = 0
+
+  # emit .loc directives with line numbers from original source
+  printf ".file 1 \"%s\"\n", ARGV[1]
+  line_number = 0
+}
+
+function hex2int(str,   i) {
+  str = tolower(str)
+
+  for (i = 1; i <= 16; i++) {
+    char = substr("0123456789abcdef", i, 1)
+    lookup[char] = i-1
+  }
+
+  result = 0
+  for (i = 1; i <= length(str); i++) {
+    result = result * 16
+    char   = substr(str, i, 1)
+    result = result + lookup[char]
+  }
+  return result
+}
+
+function parse_const(str) {
+  sign = sub(/^-/, "", str)
+  hex  = sub(/^0x/, "", str)
+  if (hex)
+    n = hex2int(str)
+  else
+    n = str+0
+  return sign ? -n : n
+}
+
+function get_const1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is constant)
+  match($0, /-?(0x[0-9a-fA-F]+|[0-9]+),/)
+  return parse_const(substr($0, RSTART, RLENGTH-1))
+}
+function get_reg() {
+  # only use if you already know there is 1 and only 1 register
+  match($0, /%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is register)
+  match($0, /%e(ax|bx|cx|dx|si|di|bp),/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg2() {
+  # for instructions with 2 operands, get 2nd operand (assuming it is register)
+  match($0, /,%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+RLENGTH-3, 3)
+}
+
+function adjust_sp_offset(delta) {
+  if (in_function)
+    printf ".cfi_adjust_cfa_offset %d\n", delta
+}
+
+{
+  line_number = line_number + 1
+
+  # clean the input up before doing anything else
+  # delete comments
+  gsub("/(#|//).*/", "")
+
+  # canonicalize whitespace
+  gsub(/\s+/, " ")
+  gsub(/ *, */, ",")
+  gsub(/ *: */, ": ")
+  gsub(/ $/, "")
+  gsub(/^ /, "")
+}
+
+# check for assembler directives which we care about
+/^\.section/ {
+  if (in_function) {
+    print ".cfi_endproc"
+    in_function = 0
+  }
+}
+/^\.globa?l\s+\w+/ {
+  globals[$2] = 1
+}
+# not interested in assembler directives beyond this, just pass them through
+/^\./ {
+  print
+  next
+}
+
+/^\w+:/ {
+  label = substr($1, 1, length($1)-1) # drop trailing :
+
+  if (globals[label]) {
+    if (in_function)
+      print ".cfi_endproc"
+
+    in_function = 1
+    print ".cfi_startproc"
+
+    for (register in saved)
+      delete saved[register]
+    for (register in dirty)
+      delete dirty[register]
+  }
+}
+
+/^$/ { next }
+
+{
+  printf ".loc 1 %d\n", line_number
+  print
+}
+
+# KEEPING UP WITH THE STACK POINTER
+# We do NOT attempt to understand foolish and ridiculous tricks like stashing
+#   the stack pointer and then using %esp as a scratch register, or bitshifting
+#   it or taking its square root or anything stupid like that.
+# %esp should only be adjusted by pushing/popping or adding/subtracting constants
+#
+/pushl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(2)
+  else
+    adjust_sp_offset(4)
+}
+/popl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(-2)
+  else
+    adjust_sp_offset(-4)
+}
+/addl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(-get_const1()) }
+/subl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%esp/ { adjust_sp_offset(get_const1()) }
+
+# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
+#
+/pushl? %e(ax|bx|cx|dx|si|di|bp)/ { # don't match "push (%reg)"
+  # if a register is being pushed, and its value has not changed since the
+  #   beginning of this function, the pushed value can be used when printing
+  #   local variables at the next level up the stack
+  # emit '.cfi_rel_offset' for that
+
+  if (in_function) {
+    register = get_reg()
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,0\n", register
+      saved[register] = 1
+    }
+  }
+}
+
+/movl? %e(ax|bx|cx|dx|si|di|bp),-?(0x[0-9a-fA-F]+|[0-9]+)?\(%esp\)/ {
+  if (in_function) {
+    register = get_reg()
+    if (match($0, /-?(0x[0-9a-fA-F]+|[0-9]+)\(%esp\)/)) {
+      offset = parse_const(substr($0, RSTART, RLENGTH-6))
+    } else {
+      offset = 0
+    }
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,%d\n", register, offset
+      saved[register] = 1
+    }
+  }
+}
+
+# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
+# ...then we want to know about it.
+#
+function trashed(register) {
+  if (in_function && !saved[register] && !dirty[register]) {
+    printf ".cfi_undefined %s\n", register
+  }
+  dirty[register] = 1
+}
+# this does NOT exhaustively check for all possible instructions which could
+# overwrite a register value inherited from the caller (just the common ones)
+/mov.*,%e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg2()) }
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr) %e(ax|bx|cx|dx|si|di|bp),/ {
+  trashed(get_reg1())
+}
+/i?mul [^,]*$/                    { trashed("eax"); trashed("edx") }
+/i?mul %e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
+/i?div/                           { trashed("eax"); trashed("edx") }
+/(dec|inc|not|neg|pop) %e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg()) }
+/cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
+
+END {
+  if (in_function)
+    print ".cfi_endproc"
+}
\ No newline at end of file
-- 
2.0.0.GIT



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-20 17:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-19  9:31 [PATCH] Build process uses script to add CFI directives to x86 asm Alex Dowad
2015-05-19 11:16 ` u-wsnj
2015-05-19 11:43   ` Szabolcs Nagy
  -- strict thread matches above, loose matches on Subject: below --
2015-05-20 17:08 Alex Dowad
2015-05-19 16:22 Alex Dowad
2015-05-16 13:54 Alex Dowad
2015-05-18 13:34 ` Szabolcs Nagy
2015-05-18 15:52 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).