* [PATCH 4/5] In i386 CFI script, binary ops like ADD, AND, etc. modify the 2nd operand, not 1st
2015-10-02 11:32 [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file Alex Dowad
2015-10-02 11:32 ` [PATCH 2/5] When generating CFI for i386 asm, don't mistake an FDIV instruction for DIV Alex Dowad
2015-10-02 11:32 ` [PATCH 3/5] When generating CFI for i386 asm, don't mistake an FMUL instruction for MUL Alex Dowad
@ 2015-10-02 11:32 ` Alex Dowad
2015-10-02 11:32 ` [PATCH 5/5] Add script to add CFI directives to asm files in debug builds of x86_64 Alex Dowad
2015-10-08 21:37 ` [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file Rich Felker
4 siblings, 0 replies; 6+ messages in thread
From: Alex Dowad @ 2015-10-02 11:32 UTC (permalink / raw)
To: musl
Thanks to Rich Felker for noticing my mistake here!
---
tools/add-cfi.i386.awk | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/add-cfi.i386.awk b/tools/add-cfi.i386.awk
index 5dc8794..fc0d8cf 100644
--- a/tools/add-cfi.i386.awk
+++ b/tools/add-cfi.i386.awk
@@ -185,12 +185,12 @@ function trashed(register) {
# this does NOT exhaustively check for all possible instructions which could
# overwrite a register value inherited from the caller (just the common ones)
/mov.*,%e(ax|bx|cx|dx|si|di|bp)/ { trashed(get_reg2()) }
-/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr) %e(ax|bx|cx|dx|si|di|bp),/ {
- trashed(get_reg1())
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr).*,%e(ax|bx|cx|dx|si|di|bp)/ {
+ trashed(get_reg2())
}
-/^i?mul [^,]*$/ { trashed("eax"); trashed("edx") }
-/^i?mul %e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
-/^i?div/ { trashed("eax"); trashed("edx") }
+/^i?mul [^,]*$/ { trashed("eax"); trashed("edx") }
+/^i?mul.*,%e(ax|bx|cx|dx|si|di|bp)/ { trashed(get_reg2()) }
+/^i?div/ { trashed("eax"); trashed("edx") }
/(dec|inc|not|neg|pop) %e(ax|bx|cx|dx|si|di|bp)/ { trashed(get_reg()) }
/cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
--
2.0.0.GIT
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 5/5] Add script to add CFI directives to asm files in debug builds of x86_64
2015-10-02 11:32 [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file Alex Dowad
` (2 preceding siblings ...)
2015-10-02 11:32 ` [PATCH 4/5] In i386 CFI script, binary ops like ADD, AND, etc. modify the 2nd operand, not 1st Alex Dowad
@ 2015-10-02 11:32 ` Alex Dowad
2015-10-08 21:37 ` [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file Rich Felker
4 siblings, 0 replies; 6+ messages in thread
From: Alex Dowad @ 2015-10-02 11:32 UTC (permalink / raw)
To: musl
This makes debugging more pleasant for the odd times when you have to dig into
files implemented in x86_64 assembly.
It turns out that the parts we care about are almost exactly like i386, except
for the register names!
---
tools/add-cfi.x86_64.awk | 185 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 185 insertions(+)
create mode 100644 tools/add-cfi.x86_64.awk
diff --git a/tools/add-cfi.x86_64.awk b/tools/add-cfi.x86_64.awk
new file mode 100644
index 0000000..8c82ea3
--- /dev/null
+++ b/tools/add-cfi.x86_64.awk
@@ -0,0 +1,185 @@
+# Insert GAS CFI directives ("control frame information") into x86-64 asm input
+
+BEGIN {
+ # don't put CFI data in the .eh_frame ELF section (which we don't keep)
+ print ".cfi_sections .debug_frame"
+
+ # only emit CFI directives inside a function
+ in_function = 0
+
+ # emit .loc directives with line numbers from original source
+ printf ".file 1 \"%s\"\n", ARGV[1]
+ line_number = 0
+
+ # used to detect "call label; label:" trick
+ called = ""
+}
+
+function get_const1() {
+ # for instructions with 2 operands, get 1st operand (assuming it is constant)
+ match($0, /-?(0x[0-9a-fA-F]+|[0-9]+),/)
+ return parse_const(substr($0, RSTART, RLENGTH-1))
+}
+function get_reg() {
+ # only use if you already know there is 1 and only 1 register
+ match($0, /%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/)
+ return substr($0, RSTART+1, RLENGTH-1)
+}
+function get_reg1() {
+ # for instructions with 2 operands, get 1st operand (assuming it is register)
+ match($0, /%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15),/)
+ return substr($0, RSTART+1, RLENGTH-2)
+}
+function get_reg2() {
+ # for instructions with 2 operands, get 2nd operand (assuming it is register)
+ match($0, /,%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/)
+ return substr($0, RSTART+2, RLENGTH-2)
+}
+
+function adjust_sp_offset(delta) {
+ if (in_function)
+ printf ".cfi_adjust_cfa_offset %d\n", delta
+}
+
+{
+ line_number = line_number + 1
+
+ # clean the input up before doing anything else
+ # delete comments
+ gsub(/(#|\/\/).*/, "")
+
+ # canonicalize whitespace
+ gsub(/[ \t]+/, " ") # mawk doesn't understand \s
+ gsub(/ *, */, ",")
+ gsub(/ *: */, ": ")
+ gsub(/ $/, "")
+ gsub(/^ /, "")
+}
+
+# check for assembler directives which we care about
+/^\.(section|data|text)/ {
+ # a .cfi_startproc/.cfi_endproc pair should be within the same section
+ # otherwise, clang will choke when generating ELF output
+ if (in_function) {
+ print ".cfi_endproc"
+ in_function = 0
+ }
+}
+/^\.type [a-zA-Z0-9_]+,\@function/ {
+ functions[substr($2, 1, length($2)-10)] = 1
+}
+# not interested in assembler directives beyond this, just pass them through
+/^\./ {
+ print
+ next
+}
+
+/^[a-zA-Z0-9_]+:/ {
+ label = substr($1, 1, length($1)-1) # drop trailing :
+
+ if (called == label) {
+ # note adjustment of stack pointer from "call label; label:"
+ adjust_sp_offset(8)
+ }
+
+ if (functions[label]) {
+ if (in_function)
+ print ".cfi_endproc"
+
+ in_function = 1
+ print ".cfi_startproc"
+
+ for (register in saved)
+ delete saved[register]
+ for (register in dirty)
+ delete dirty[register]
+ }
+
+ # an instruction may follow on the same line, so continue processing
+}
+
+/^$/ { next }
+
+{
+ called = ""
+ printf ".loc 1 %d\n", line_number
+ print
+}
+
+# KEEPING UP WITH THE STACK POINTER
+# %rsp should only be adjusted by pushing/popping or adding/subtracting constants
+#
+/pushl?/ {
+ adjust_sp_offset(8)
+}
+/popl?/ {
+ adjust_sp_offset(-8)
+}
+/addl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%rsp/ { adjust_sp_offset(-get_const1()) }
+/subl? \$-?(0x[0-9a-fA-F]+|[0-9]+),%rsp/ { adjust_sp_offset(get_const1()) }
+
+/call/ {
+ if (match($0, /call [0-9]+f/)) # "forward" label
+ called = substr($0, RSTART+5, RLENGTH-6)
+ else if (match($0, /call [0-9a-zA-Z_]+/))
+ called = substr($0, RSTART+5, RLENGTH-5)
+}
+
+# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
+#
+/pushl? %r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/ { # don't match "push (%reg)"
+ # if a register is being pushed, and its value has not changed since the
+ # beginning of this function, the pushed value can be used when printing
+ # local variables at the next level up the stack
+ # emit '.cfi_rel_offset' for that
+
+ if (in_function) {
+ register = get_reg()
+ if (!saved[register] && !dirty[register]) {
+ printf ".cfi_rel_offset %s,0\n", register
+ saved[register] = 1
+ }
+ }
+}
+
+/movl? %r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15),-?(0x[0-9a-fA-F]+|[0-9]+)?\(%rsp\)/ {
+ if (in_function) {
+ register = get_reg()
+ if (match($0, /-?(0x[0-9a-fA-F]+|[0-9]+)\(%rsp\)/)) {
+ offset = parse_const(substr($0, RSTART, RLENGTH-6))
+ } else {
+ offset = 0
+ }
+ if (!saved[register] && !dirty[register]) {
+ printf ".cfi_rel_offset %s,%d\n", register, offset
+ saved[register] = 1
+ }
+ }
+}
+
+# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
+# ...then we want to know about it.
+#
+function trashed(register) {
+ if (in_function && !saved[register] && !dirty[register]) {
+ printf ".cfi_undefined %s\n", register
+ }
+ dirty[register] = 1
+}
+# this does NOT exhaustively check for all possible instructions which could
+# overwrite a register value inherited from the caller (just the common ones)
+/mov.*,%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/ { trashed(get_reg2()) }
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr).*,%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/ {
+ trashed(get_reg2())
+}
+/^i?mul [^,]*$/ { trashed("rax"); trashed("rdx") }
+/^i?mul.*,%r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/ { trashed(get_reg2()) }
+/^i?div/ { trashed("rax"); trashed("rdx") }
+
+/(dec|inc|not|neg|pop) %r(ax|bx|cx|dx|si|di|bp|8|9|10|11|12|13|14|15)/ { trashed(get_reg()) }
+/cpuid/ { trashed("rax"); trashed("rbx"); trashed("rcx"); trashed("rdx") }
+
+END {
+ if (in_function)
+ print ".cfi_endproc"
+}
--
2.0.0.GIT
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file
2015-10-02 11:32 [PATCH 1/5] Pull a couple common AWK functions for CFI scripts into separate file Alex Dowad
` (3 preceding siblings ...)
2015-10-02 11:32 ` [PATCH 5/5] Add script to add CFI directives to asm files in debug builds of x86_64 Alex Dowad
@ 2015-10-08 21:37 ` Rich Felker
4 siblings, 0 replies; 6+ messages in thread
From: Rich Felker @ 2015-10-08 21:37 UTC (permalink / raw)
To: musl
On Fri, Oct 02, 2015 at 01:32:32PM +0200, Alex Dowad wrote:
> There is a lot which could be common between i386 and x86_64, but none of it
> will be useful for any other arch. These should be useful for all archs,
> however.
> ---
Thanks! I'm commenting on these as a set rather than breaking this
down into separate replies.
I think you still have errors in your instruction parsing which will
lead to incorrect clobbers, just not as bad as before. For example,
movl $0,(%esp,%esi) gets marked as clobbering %esi. Using .* in the
patterns is probably a bad idea. I think the x86_64 patch is also
affected, but I tested it and otherwise it basically works.
In any case I'm going ahead and applying patches 1-3, with 2 and 3
merged into one commit. In general, try to merge changes that are
conceptually related and where putting them together in one patch
doesn't obscure what was changed; for example, merging 4 with 2-3
would obscure changes IMO since there would be pattern changes and
operand order changes in the same lines. I've also edited commit
messages to match capitalization used in shortlog and to make git log
fit in 80 columns (generally messages should be wrapped at 70-72
chars, like email).
Could you follow up with fixes for patches 4 and 5? I'd like to try to
get these into the next release (soon).
Rich
^ permalink raw reply [flat|nested] 6+ messages in thread