the required bits aren't in the hardware page tables, but you can either emulate them in software or do copy on reference. if you use a paging algorithm with a local policy instead of the global one berkeley uses, you can also prevent some pathetic effects under load, and remove complex data structures and code (or never need them in the first place). this was well studied during the late 1960s and early 1970s. the BSD approach is poor. unfortunately, people mimic it. i can only assume that one or more textbooks wrote it up.
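
to make the two ideas concrete, here is a rough sketch, not anybody's actual kernel code. it assumes the missing bits are per-page reference bits, and it emulates them by marking a resident page invalid so the next touch traps. it also gives each process its own frame quota and its own clock hand, which is one simple form of local policy. every name in it (Page, Process, quota, touch) is made up for illustration, and copy on reference is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    vpn: int
    valid: bool = False       # stand-in for the hardware valid bit
    referenced: bool = False  # software-only bit; assumed absent from the hardware pte


class Process:
    """one process with a fixed frame quota and a private clock hand (local policy)."""

    def __init__(self, quota):
        self.quota = quota
        self.resident = []    # pages this process holds, in clock order
        self.hand = 0
        self.hard_faults = 0  # page had to be brought in
        self.soft_faults = 0  # page was resident; the trap only recorded a reference

    def touch(self, vpn):
        page = next((p for p in self.resident if p.vpn == vpn), None)
        if page is not None and page.valid:
            return            # valid page: no trap, software never sees the access
        if page is None:
            self.hard_faults += 1
            page = Page(vpn)
            if len(self.resident) < self.quota:
                self.resident.append(page)
            else:
                self._replace_victim(page)
        else:
            self.soft_faults += 1
        # the fault handler is the only place the emulated bit gets set
        page.valid = True
        page.referenced = True

    def _replace_victim(self, new_page):
        # clock sweep over this process's own pages only
        while True:
            victim = self.resident[self.hand]
            if victim.referenced:
                # second chance: clear the emulated bit and force a trap on next use
                victim.referenced = False
                victim.valid = False
                self.hand = (self.hand + 1) % len(self.resident)
            else:
                self.resident[self.hand] = new_page
                self.hand = (self.hand + 1) % len(self.resident)
                return


if __name__ == "__main__":
    a, b = Process(quota=3), Process(quota=3)
    for v in (1, 2, 3):
        a.touch(v)
    for v in range(100):      # b scans a lot of pages
        b.touch(v)
    for v in (1, 2, 3):       # a's pages are still there
        a.touch(v)
    print(a.hard_faults, b.hard_faults)
```

the point of the usage at the bottom is that b's scan only ever evicts b's own pages, so a takes no extra faults. under a global policy the same scan could push a's pages out. the price of emulating the bit is the extra soft faults on pages the sweep has invalidated.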