Correctness is a Constraint, Performance is a Goal
The recent brouhaha over the change in glibc memcpy behaviour
(which broke some notable bits of non-open-source code, like Adobe
Flash) suggests that while leaving overlapping copies undefined might
have made some loose sense "back in the day", the thing that should
have been done (and that we should do now) is to make that
"undefined" behaviour simply be abort().  (Note that "the day" was
before memmove was introduced at all; my recollection is that the
first memcpy that wasn't overlap-safe was an early SunOS for SPARC
version, which caused lots of pain when porting 68k-SunOS code in the
early 1990s.)
In the days of SPECINT92, the raw speed of very short memcpy calls
might have mattered enough to strip off the tests - but we're talking
two compares (which should have very good branch prediction
properties, and which compilers that are already inlining should be
able to resolve in many cases anyhow).  I do appreciate having memcpy
be efficient per byte moved, but only after having it not scramble
data.
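
To make "two compares" concrete, here is a minimal sketch of an
abort-on-overlap copy.  The names regions_overlap and checked_memcpy
are hypothetical (they are not glibc's, and not from the bug report),
and the sketch assumes a flat address space in which src + n and
dst + n do not wrap around.  Note that it also treats dst == src with
a nonzero length as an overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if [dst, dst+n) and [src, src+n)
 * share at least one byte.  These are the two compares; with n == 0
 * both ranges are empty and the result is 0. */
int regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    return d < s + n && s < d + n;
}

/* Hypothetical wrapper: abort() instead of silently scrambling data
 * when the regions overlap, otherwise defer to the library memcpy. */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n))
        abort();
    return memcpy(dst, src, n);
}
```

A caller that really does need overlapping copies gets a core dump
pointing at the offending call, and the fix is to call memmove there.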
For those who believe the speed matters - well, we could leave behind
a career_limiting_memcpy that doesn't have the checks, and see
who's willing to justify using it...
Exercise for the reader: take Linus' sample mymemcpy from bug 638477
and add the abort-on-overlap test to it - then run chrome, firefox, or
another large project of your choice.  Note that every time it aborts
is probably a reportable bug -- but it's a lot cheaper than running it
under full-scale valgrind.