PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). llvm-svn: 265005