Discussion:
Alignment of small memory allocations returned by malloc(3)
Nick Hudson
2014-01-30 12:10:54 UTC
Hi,

Standards and modern tools, e.g. gcc[1] 4.8, expect malloc to return
memory with alignment that is different to our current jemalloc. Attached
is a suggested diff to fix the problem for most platforms.

I've probably missed other alignment requirements.

Comments?

Nick

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59958
Joerg Sonnenberger
2014-01-31 20:48:15 UTC
Post by Nick Hudson
> Standards and modern tools, e.g. gcc[1] 4.8, expect malloc to return
> memory with alignment that is different to our current
> jemalloc. Attached is a suggested diff to fix the problem for most
> platforms.
I still don't see the point of the GCC behavior and I don't agree with
the standard interpretation. The referenced DR covers a quite different
problem (pointer casts). It doesn't make sense to justify the larger
alignment if accessing the storage with any such type is UB because it
is an access beyond the end of the allocation. As such, I do strongly
consider this an overeager optimisation.
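
To make the size argument concrete, here is a minimal C sketch. It is an illustration only, not taken from the attached diff or from GCC, and the helper name usable_alignment is hypothetical. It relies on the fact that sizeof(T) is always a multiple of _Alignof(T), so no object that fits inside an allocation of a given size can require more alignment than that size, and it assumes alignments are powers of two capped at _Alignof(max_align_t).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: largest alignment that any object fitting entirely
 * inside an allocation of `size` bytes could require. It is the largest
 * power of two that is <= size, capped at the maximum fundamental
 * alignment. A size of 0 yields 1.
 */
static size_t usable_alignment(size_t size)
{
    size_t max = _Alignof(max_align_t);
    size_t a = 1;

    while (a * 2 <= size && a * 2 <= max)
        a *= 2;
    return a;
}
```

Under this reading a 2-byte allocation only ever needs 2-byte alignment, because a 4-byte access to it would already run past the end of the storage.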

Joerg
Martin Husemann
2014-01-31 20:55:13 UTC
Permalink
Post by Joerg Sonnenberger
> I still don't see the point of the GCC behavior and I don't agree with
> the standard interpretation.
For alpha, at least, the change makes a lot of sense and improves code
significantly.

The standard language is poor, but the answer to the DR clearly states
"any object". The real point here would be to tell them "this is an error
in the standard", and once they agree and change it, we can reconsider.

Martin
Joerg Sonnenberger
2014-02-01 19:01:09 UTC
Post by Martin Husemann
> Post by Joerg Sonnenberger
> > I still don't see the point of the GCC behavior and I don't agree with
> > the standard interpretation.
> For alpha, at least, the change makes a lot of sense and improves code
> significantly.
> The standard language is poor, but the answer to the DR clearly states
> "any object". The real point here would be to tell them "this is an error
> in the standard", and once they agree and change it, we can reconsider.
Yes, the standard language is poor. But that is no excuse for making
bogus assumptions on the part of GCC, especially if it is knowingly
breaking existing software. IMO that part of tree-ssa-ccp.c should be
backed out until a conclusion from the C WG comes in.

Joerg
David Sainty
2014-02-01 00:06:24 UTC
Post by Joerg Sonnenberger
> Post by Nick Hudson
> > Standards and modern tools, e.g. gcc[1] 4.8, expect malloc to return
> > memory with alignment that is different to our current
> > jemalloc. Attached is a suggested diff to fix the problem for most
> > platforms.
> I still don't see the point of the GCC behavior and I don't agree with
> the standard interpretation. The referenced DR covers a quite different
> problem (pointer casts). It doesn't make sense to justify the larger
> alignment if accessing the storage with any such type is UB because it
> is an access beyond the end of the allocation. As such, I do strongly
> consider this an overeager optimisation.
Surely it makes (the most) sense on any platform that can't write a unit
that small without writing to the surrounding bytes anyway. Isn't that
the case with the (original) Alpha?

I.e. If a two byte write has to be implemented in terms of an aligned 4
byte read-modify-write, then you wouldn't ever want to allocate a pair
of two byte allocations contiguous in the same four bytes anyway - at
least not if the code might be multi-threaded. So you may as well both
align and assume everything aligned on 4 bytes.
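
The hazard described above can be replayed deterministically in a small C sketch. This is an illustration under stated assumptions, not code for any real Alpha toolchain: the helper names rmw_load, rmw_store and lost_update_demo are hypothetical, the two "writers" are simulated in a single thread, and the placement of the two halfwords in the low and high 16 bits of one word is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1 of an emulated 16-bit store: load the enclosing 32-bit word. */
static uint32_t rmw_load(const uint32_t *word)
{
    return *word;
}

/*
 * Step 2: merge the new halfword into the previously loaded copy and write
 * the whole 32-bit word back. half_index 0 selects the low 16 bits,
 * 1 selects the high 16 bits.
 */
static void rmw_store(uint32_t *word, uint32_t loaded,
                      unsigned half_index, uint16_t value)
{
    unsigned shift = half_index * 16u;
    uint32_t mask = (uint32_t)0xFFFFu << shift;

    *word = (loaded & ~mask) | ((uint32_t)value << shift);
}

/* Replays one unlucky interleaving of two writers sharing a word. */
static uint32_t lost_update_demo(void)
{
    uint32_t word = 0;
    uint32_t seen_by_a = rmw_load(&word);   /* A loads 0x00000000 */
    uint32_t seen_by_b = rmw_load(&word);   /* B loads 0x00000000 */

    rmw_store(&word, seen_by_a, 0, 0x1111); /* word is now 0x00001111 */
    rmw_store(&word, seen_by_b, 1, 0x2222); /* stale copy: 0x22220000 */
    return word;
}
```

In this interleaving writer A's halfword is silently overwritten by B's stale copy, which is why two separate 2-byte allocations should not share one 4-byte word on such a machine.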
David Laight
2014-02-01 13:16:54 UTC
Post by David Sainty
> Post by Joerg Sonnenberger
> > Post by Nick Hudson
> > > Standards and modern tools, e.g. gcc[1] 4.8, expect malloc to return
> > > memory with alignment that is different to our current
> > > jemalloc. Attached is a suggested diff to fix the problem for most
> > > platforms.
> > I still don't see the point of the GCC behavior and I don't agree with
> > the standard interpretation. The referenced DR covers a quite different
> > problem (pointer casts). It doesn't make sense to justify the larger
> > alignment if accessing the storage with any such type is UB because it
> > is an access beyond the end of the allocation. As such, I do strongly
> > consider this an overeager optimisation.
> Surely it makes (the most) sense on any platform that can't write a unit
> that small without writing to the surrounding bytes anyway. Isn't that
> the case with the (original) Alpha?
> I.e. If a two byte write has to be implemented in terms of an aligned 4
> byte read-modify-write, then you wouldn't ever want to allocate a pair
> of two byte allocations contiguous in the same four bytes anyway - at
> least not if the code might be multi-threaded. So you may as well both
> align and assume everything aligned on 4 bytes.
That is certainly a justification for forcing 4 byte alignment on alpha.

Indeed x86 also needs 4 byte alignment because the BTC/BTR/BTS instructions
are likely to do a RMW cycle on the 32bit word.

I'm not sure about arm; pre-v4 there were no 16bit accesses. gcc will
do 32bit accesses for some 16bit values on later cpus, but only because
of the limited addressing modes - and they can't affect a 2 byte allocation.

But gcc is assuming the maximal alignment for 'normal' items - which
ends up being 16 bytes for long double on some systems.
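
A quick way to see whether a given libc meets that assumption is to measure it. The following C sketch is only an illustration: the helper names ptr_alignment and small_alloc_is_max_aligned are hypothetical, and _Alignof(max_align_t) from C11 is used here as a stand-in for "the maximal alignment for normal items". The result is platform- and allocator-dependent, and a single sample proves nothing about other allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two dividing the address; 0 for a null pointer. */
static size_t ptr_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    /* a & (~a + 1) isolates the lowest set bit of the address. */
    return (size_t)(a & (~a + 1));
}

/*
 * Returns 1 if one allocation of `size` bytes happens to be aligned for
 * max_align_t, 0 if it is not, and -1 if malloc failed.
 */
static int small_alloc_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    int ok;

    if (p == NULL)
        return -1;
    ok = ptr_alignment(p) >= _Alignof(max_align_t);
    free(p);
    return ok;
}
```

Calling small_alloc_is_max_aligned(2) on an allocator that packs tiny requests tightly may return 0, which is the mismatch between compiler assumption and allocator behaviour under discussion.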

David
--
David Laight: ***@l8s.co.uk
Martin Husemann
2014-02-01 13:20:16 UTC
Post by David Laight
> But gcc is assuming the maximal alignment for 'normal' items - which
> ends up being 16 bytes for long double on some systems.
This is not about gcc, but the C standard and our malloc - Nick's patch
makes most ports use 4 byte alignment for small malloc sizes.

The gcc macro for this came in "late" and many target configurations
have not properly overridden it; the default is there to err on the
safe side.

We will, of course, fix the in-tree gcc to assume the same alignment
our malloc provides.

Martin
Joerg Sonnenberger
2014-02-01 19:05:16 UTC
Post by David Laight
> But gcc is assuming the maximal alignment for 'normal' items - which
> ends up being 16 bytes for long double on some systems.
My problem with the approach is that it special cases a very limited set
of functions and this magical alignment property is lost very, very
soon, so it shouldn't change code generation that much anyway. E.g.
whenever a pointer value is not provably direct from malloc/strdup it
has lost this extra alignment already. The x86 rmw cycles for some ops
are irrelevant -- you get this for access to short fields in structures
all the time. Whether or not malloc should use the compact allocation is
another completely separate discussion.

Joerg