Hello.
I've removed an unnecessary loop and made the difference between v2 and v3 more visible: v2 is basically just v3 with the last bits of Kc zeroed. There are also some small readability improvements.
I've also added a test suite with test vectors from the original Python implementation.
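
To illustrate the v2/v3 relationship described above, here is a minimal Python sketch. It is not the code from the patch. The helper names `zero_last_kc_bits` and `v2_from_v3` are hypothetical, and the default of 10 zeroed bits is an assumption based on how COMP128-2 is commonly described (a 54-bit effective Kc); the message above only says "last bits".

```python
from typing import Tuple


def zero_last_kc_bits(kc: bytes, n_bits: int = 10) -> bytes:
    """Return a copy of an 8-byte Kc with its n_bits least significant bits cleared.

    n_bits=10 is an assumption (not stated in the message above).
    """
    if len(kc) != 8:
        raise ValueError("Kc must be exactly 8 bytes")
    if not 0 <= n_bits <= 64:
        raise ValueError("n_bits must be between 0 and 64")
    value = int.from_bytes(kc, "big")
    # Keep all 64 bits except the lowest n_bits.
    mask = ((1 << 64) - 1) ^ ((1 << n_bits) - 1)
    return (value & mask).to_bytes(8, "big")


def v2_from_v3(sres: bytes, kc: bytes) -> Tuple[bytes, bytes]:
    """Derive a v2-style (SRES, Kc) pair from a v3 result: SRES is unchanged,
    only the tail of Kc is zeroed."""
    return sres, zero_last_kc_bits(kc)
```

With this shape, a v2 entry point can be a thin wrapper around the v3 computation, which is what makes the difference between the two versions easy to see.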