527198 Commits

Author SHA1 Message Date
Christian Sigg
ec056f5458 [llvm][bazel] Adjust to HAVE_SYS_AUXV_H > HAVE_GETAUXVAL in 89d636ba91 2025-02-13 08:08:52 +01:00
AZero13
ffd2633061 [InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407)
Alive2 Proofs:
https://alive2.llvm.org/ce/z/aJnxyp
https://alive2.llvm.org/ce/z/dyeGEv
2025-02-13 14:25:09 +08:00
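The fold in ffd2633061 can be sanity-checked by brute force at a small bit width. The sketch below is only an illustration, not the InstCombine code: it models a logical (unsigned) shift at a hypothetical 8-bit width, and the helper names `mul_form`, `add_form`, and `fold_holds` are made up for this example. The Alive2 links above remain the authoritative proofs.

```python
def mul_form(x: int, n: int, bits: int) -> int:
    """(x >> n) * (2**n + 1), truncated to `bits` bits."""
    mask = (1 << bits) - 1
    return ((x >> n) * ((1 << n) + 1)) & mask


def add_form(x: int, n: int, bits: int) -> int:
    """x + (x >> n), truncated to `bits` bits."""
    mask = (1 << bits) - 1
    return (x + (x >> n)) & mask


def fold_holds(bits: int = 8) -> bool:
    """Check that both forms agree for every exact shift at this width."""
    for n in range(bits):
        for x in range(1 << bits):
            if x & ((1 << n) - 1):
                continue  # not an exact shift: some of the low n bits are set
            if mul_form(x, n, bits) != add_form(x, n, bits):
                return False
    return True
```

The `exact` flag matters: with x = 12 and N = 2 both forms give 15, but with x = 13 the shift drops a set bit and the two forms disagree (15 versus 16).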
sakria9
7050e7d2a3 [clang] [ASTDump] Add support for structural value template arguments in TextNodeDumper (#126341)
It was missed in 5518a9d which introduced this new template argument kind.
2025-02-13 14:06:45 +08:00
Vitaly Buka
e76739eeb9 [libclang] Always Dup in createRef(StringRef) (#125020)
We can't guarantee that the underlying string is
0-terminated and that [String.size()] is even in the
same allocation.


https://lab.llvm.org/buildbot/#/builders/94/builds/4152/steps/17/logs/stdio
```
==c-index-test==1846256==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0  in clang::cxstring::createRef(llvm::StringRef) llvm-project/clang/tools/libclang/CXString.cpp:96:36
    #1  in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:521:39
    #2  in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:674:7
    #3  in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:674:7
    #4  in DumpCXComment llvm-project/clang/tools/c-index-test/c-index-test.c:685:3
    #5  in PrintCursorComments llvm-project/clang/tools/c-index-test/c-index-test.c:768:7

  Memory was marked as uninitialized
    #0  in __msan_allocated_memory llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1023:5
    #1  in Allocate llvm-project/llvm/include/llvm/Support/Allocator.h:172:7
    #2  in Allocate llvm-project/llvm/include/llvm/Support/Allocator.h:216:12
    #3  in Allocate llvm-project/llvm/include/llvm/Support/AllocatorBase.h:53:43
    #4  in Allocate<char> llvm-project/llvm/include/llvm/Support/AllocatorBase.h:76:29
    #5  in convertCodePointToUTF8 llvm-project/clang/lib/AST/CommentLexer.cpp:42:30
    #6  in clang::comments::Lexer::resolveHTMLDecimalCharacterReference(llvm::StringRef) const llvm-project/clang/lib/AST/CommentLexer.cpp:76:10
    #7  in clang::comments::Lexer::lexHTMLCharacterReference(clang::comments::Token&) llvm-project/clang/lib/AST/CommentLexer.cpp:615:16
    #8  in consumeToken llvm-project/clang/include/clang/AST/CommentParser.h:62:9
    #9  in clang::comments::Parser::parseParagraphOrBlockCommand() llvm-project/clang/lib/AST/CommentParser.cpp
    #10 in clang::comments::Parser::parseFullComment() llvm-project/clang/lib/AST/CommentParser.cpp:925:22
    #11 in clang::RawComment::parse(clang::ASTContext const&, clang::Preprocessor const*, clang::Decl const*) const llvm-project/clang/lib/AST/RawCommentList.cpp:221:12
    #12 in clang::ASTContext::getCommentForDecl(clang::Decl const*, clang::Preprocessor const*) const llvm-project/clang/lib/AST/ASTContext.cpp:714:35
    #13 in clang_Cursor_getParsedComment llvm-project/clang/tools/libclang/CXComment.cpp:36:35
    #14 in PrintCursorComments llvm-project/clang/tools/c-index-test/c-index-test.c:756:25
 ```
2025-02-12 22:05:19 -08:00
Vitaly Buka
1032df6f60 [LTO][Pipelines][Coro] Handle coroutines in LTO pipeline (#126168)
ThinLTO delays handling of coroutines to ThinLTO backend.
However it's usually possible to use ThinLTO prelink objects for FullLTO.

In this case we have left-over coroutines which crash in codegen.

Issue #104525.
2025-02-12 21:39:32 -08:00
Razvan Lupusoru
7b473dfe84 [flang][acc] Implement type categorization for FIR types (#126964)
The OpenACC type interfaces have been updated to require that a type
self-identify which type category it belongs to. Ensure that FIR types
are able to provide this self identification.

In addition to implementing the new API, the PointerLikeType interface
attachment was moved to the FIROpenACCSupport library, like MappableType,
to ensure all type interfaces and their implementations are now in the same
spot.
2025-02-12 21:09:59 -08:00
Lang Hames
9456e7fcdd [ORC] Silence unused variable warnings. 2025-02-13 15:24:43 +11:00
NAKAMURA Takumi
9bd836adbb [bazel] Introduce HAVE_SYS_AUXV_H for #126863 2025-02-13 13:07:19 +09:00
NAKAMURA Takumi
cdf45447ef Orc: Suppress a warning in #126691 2025-02-13 13:07:19 +09:00
Brad Smith
89d636ba91 [Support] Fix building on FreeBSD and OpenBSD (#127005)
Fix building after a6f7cb54d3.

Check for the function getauxval() instead of just the sys/auxv.h
header.
2025-02-12 22:55:22 -05:00
Koakuma
30a9941624 [SPARC][IAS] Add IAS flag handling for ISA levels
Add IAS flag handling for ISA levels we support in LLVM.

Reviewers: MaskRay, rorth, brad0, s-barannikov

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/125151
2025-02-13 10:22:31 +07:00
Jonas Devlieghere
73ab0c0762 [lldb-dap] Upgrade @types/node to fix TS2386 in node/module.d.ts (#126994)
Upgrade @types/node to work around an issue in TypeScript [1] that
caused our "publish to VSCode Marketplace" github action [2] to fail:

```
node_modules/@types/node/module.d.ts:290:13 - error TS2386: Overload signatures must all be optional or required.

290             resolve?(specified: string, parent?: string | URL): Promise<string>;
```

[1] https://github.com/microsoft/TypeScript/pull/59259#issuecomment-2228833941
[2] https://github.com/llvm/vscode-lldb/actions/runs/13298213337/job/37134713009
2025-02-12 19:09:09 -08:00
Thurston Dang
df07121d54 [hwasan][NFCI] Rename ClRandomSkipRate to ClRandomKeepRate (#126990)
The meaning of ClRandomSkipRate was inverted in
https://github.com/llvm/llvm-project/pull/88070 but the variable name
was not changed. This patch fixes it to avoid confusion.

Additionally, it elaborates the flag description to mention the
interaction between the random keep rate and hotness cutoff.
2025-02-12 18:43:00 -08:00
Florian Mayer
8ed36373a2 [NFC] [sanitizer] allow getauxval in symbolizer 2025-02-12 17:20:28 -08:00
Longsheng Mou
3e223e3a20 [mlir][vector] Fix out-of-bounds access (#126734)
This PR fixes an out-of-bounds bug that occurs when there are no overlap
dimensions between the `sizes` and source of
`vector.extract_strided_slice`, causing access to `sizes` to go out of
bounds. Fixes #126196.
2025-02-13 09:17:43 +08:00
Uday Bondhugula
8421ad7f45 [MLIR][Affine] Fix sibling fusion - missing check (#126626)
Fix sibling fusion for slice maximality check. Producer-consumer fusion
had this check but not sibling fusion. Sibling fusion shouldn't be
performed if the slice isn't "maximal" (i.e., if it isn't the whole of
the source).


Fixes: https://github.com/llvm/llvm-project/issues/48703
2025-02-13 06:21:03 +05:30
Tristan Ross
a6f7cb54d3 [Support] Prefer AUX vector for page size (#126863)
Prefer taking the page size from the AUX vector, since `getpagesize` was
removed in POSIX.1-2001. Also add a couple of asserts to ensure the
page size is a valid value.
2025-02-13 11:39:49 +11:00
Thurston Dang
51d8255203 [msan] Handle Arm NEON saturating extract and narrow (#125742)
This handles NEON saturating extract and narrow (Intrinsic::aarch64_neon_{sqxtn, sqxtun, uqxtn}) by (ab)using handleShadowOr() to perform the shadow cast. Previously, these were unknown intrinsics handled suboptimally by visitInstruction.

Updates the tests from https://github.com/llvm/llvm-project/pull/125288 and https://github.com/llvm/llvm-project/pull/125140
2025-02-12 16:22:49 -08:00
Jonas Devlieghere
1b582ef3c0 [lldb-dap] Bump the version number for publishing in the Marketplace 2025-02-12 16:14:00 -08:00
Da-Viper
4238238684 [lldb-dap] Fix: Could not find DAP in path (#126903)
Fixes #120839
2025-02-12 16:11:43 -08:00
Uday Bondhugula
4078b11daa [MLIR][Affine] Fix fusion crash for non-int/fp memref elt types (#126829)
Fix assumption on memref elt types being int or float during private
memref creation in affine fusion.

Fixes: https://github.com/llvm/llvm-project/issues/121020
2025-02-13 05:27:48 +05:30
Florian Mayer
6936fadfc3 [compiler-rt] [sanitizer] avoid UB in allocator (#126977) 2025-02-12 15:49:55 -08:00
Matthew Bastien
105b3a92a7 [lldb-dap] add debugAdapterExecutable property to launch configuration (#126803)
The Swift extension for VS Code requires that the `lldb-dap` executable
come from the Swift toolchain which may or may not be configured in
`PATH`. At the moment, this can be configured via LLDB DAP's extension
settings, but experience has shown that modifying other extensions'
settings on behalf of the user (especially those subject to change
whenever a new toolchain is selected) causes issues. Instead, it would
be easier to have this configurable in the launch configuration and let
the Swift extension (or any other extension that wanted to, really)
configure the path to `lldb-dap` that way. This allows the Swift
extension to have its own launch configuration type that delegates to
the LLDB DAP extension in order to provide a more seamless debugging
experience for Swift executables.

This PR adds a new property to the launch configuration object called
`debugAdapterExecutable` which allows overriding the `lldb-dap`
executable path for a specific debug session.
2025-02-12 15:49:38 -08:00
Robert Imschweiler
bcba3117c0 [AMDGPU] SelDAG: fix lowering of undefined workitem intrinsics (#126058)
GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, but
SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in
`AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load
missing argument"`. This commit changes the behavior of SelDAG to
use a zero constant instead.

This LLVM defect was identified via the AMD Fuzzing project.
2025-02-12 18:41:41 -05:00
Andrzej Warzyński
5586541d22 [mlir][tensor] Make useful Tensor utilities public (#126802)
1. Extract the main logic from `foldTensorCastPrecondition` into a
   dedicated helper hook: `hasFoldableTensorCastOperand`. This allows
   for reusing the corresponding checks.

2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for
   better clarity and documentation of its functionality.

3. These updated hooks will be reused in:
   * https://github.com/llvm/llvm-project/pull/123902. This PR makes
     them public.

**Note:** Moving these hooks to `Tensor/Utils` is not feasible because
`MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If
these hooks were moved to `Utils`, it would create a dependency of
`MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular
dependency.
2025-02-12 23:12:14 +00:00
vporpo
1c207f1b6e [SandboxVec][DAG] Fix DAG when old interval is mem free (#126983)
This patch fixes a bug in `DependencyGraph::extend()` when the old
interval contains no memory instructions. When this is the case we
should do a full dependency scan of the new interval.
2025-02-12 15:06:30 -08:00
Amir Bishara
51c847d8f3 [mlir][tosa]-Edit the verifier of tosa constShapeOp (#126962)
Add verification for rank 1 for the elements' attribute of the tosa
const_shape operation.
2025-02-12 15:06:00 -08:00
Louis Dionne
5953e5a3c6 [libc++] Simplify the apple-system-hardened CI configuration (#126911)
It was basically a copy-paste of the non-hardened version of the same
job, and it's easy to remove the duplication.
2025-02-12 23:58:14 +01:00
Louis Dionne
dbfb29fd45 [libc++] Add a link to __builtin_verbose_trap from the hardening docs (#126930) 2025-02-12 23:57:37 +01:00
vporpo
31cb807537 [SandboxVec][BottomUpVec] Fix diamond shuffle with multiple vector inputs (#126965)
When the operand comes from multiple inputs then we need additional
packing code. When the operands are scalar then we can use a single
InsertElementInst. But when the operands are vectors then we need a
chain of ExtractElementInst and InsertElementInst instructions to insert
the vector value into the destination vector. This is what this patch
implements.
2025-02-12 14:33:05 -08:00
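The packing described in 31cb807537 can be pictured with a small model. This is a sketch under assumptions, not the vectorizer's code: vectors are plain Python lists, `pack_operands` is a hypothetical helper, and the strings it records only name the kind of instruction that would be emitted for each step.

```python
def pack_operands(operands):
    """Pack scalar or vector operands into one destination vector.

    Returns the packed lanes plus a log of the modelled instructions.
    """
    dest = []
    ops = []
    for operand in operands:
        if isinstance(operand, list):
            # Vector operand: one extract/insert pair per element.
            for k, value in enumerate(operand):
                ops.append(f"extractelement {k}")
                ops.append(f"insertelement {len(dest)}")
                dest.append(value)
        else:
            # Scalar operand: a single insert is enough.
            ops.append(f"insertelement {len(dest)}")
            dest.append(operand)
    return dest, ops
```

In this model, packing two 2-element vectors records four extract/insert pairs, while packing two scalars records only two inserts.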
Nick Desaulniers
3e02069afe [libc][pthread] fix -Wmissing-field-initializers (#126314)
Fixes:


llvm-project/libc/test/integration/src/pthread/pthread_rwlock_test.cpp:59:29:
    warning: missing field '__preference' initializer
    [-Wmissing-field-initializers]
       59 |   pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
          |                             ^

Also, add a test that demonstrates the same issue for
PTHREAD_MUTEX_INITIALIZER, and fix that, too.

PTHREAD_ONCE_INIT does not have this issue and does have test coverage.
2025-02-12 14:28:29 -08:00
LLVM GN Syncbot
37952ef75f [gn build] Port 92f916faba 2025-02-12 22:22:01 +00:00
Thurston Dang
c6a39697a9 [hwasan][NFCI] Add more test cases to llvm/test/Instrumentation/HWAddressSanitizer/pgo-opt-out.ll (#126980)
Add more combinations of parameters to test that the skip conditions are
OR'ed together
2025-02-12 14:20:23 -08:00
Nikhil Kalra
f3a29906aa [mlir] BytecodeWriter: invoke reserveExtraSpace (#126953)
Update `BytecodeWriter` to invoke `reserveExtraSpace` on the stream
before writing to it. This will give clients implementing custom output
streams the opportunity to allocate an appropriately sized buffer for
the write.
2025-02-12 14:17:30 -08:00
Razvan Lupusoru
ceb00c0702 [mlir][acc] Clean up TypedValue builders (#126968)
When MappableType was introduced alongside PointerLikeType, the data
clause operation builders were duplicated to accept a `TypedValue` of
one of the two type options. However, the underlying builder takes a
`Value` and this difference is not relevant for it. The only difference
is that `varType` is set differently depending on the type.

Having two duplicated builders can lead to clunky building since a
`Value` must always be cast to one of the two options. Thus, simply
clean this up - the verifier already checks that it is a type that
implements one of the two interfaces.
2025-02-12 14:13:45 -08:00
Jeffrey Byrnes
c5a4512d85 [AMDGPU] iglp.opt does not clobber memory operands (#126976)
I think it was an accident that this wasn't included.
2025-02-12 14:11:02 -08:00
Shubham Sandeep Rastogi
92f916faba Add a pass to collect dropped var statistics for MIR (#126686)
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.

Namely:

1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.

2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h

The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directory, by moving the base class,
`class DroppedVariableStats` to the llvm/IR directory.

The expensive include in MachineFunctionPass.h has been changed to
contain forward declarations instead of other header includes which was
pulling a ton of code into MachineFunctionPass.h and should resolve any
issues when it comes to compile time increase.
2025-02-12 14:08:18 -08:00
Nikhil Kalra
65ed4fa57e [mlir] Python: Parse ModuleOp from file path (#126572)
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.

Re-lands 4e14b8afb4.
2025-02-12 14:02:41 -08:00
Jason Molenda
fa71238da8 [lldb] inserted a typo when checking in a suggested fix 2025-02-12 14:00:41 -08:00
Jason Molenda
cbb4e99f36 [lldb] Update ThreadPlanStepOut to handle new breakpoint behavior (#126838)
I will be changing breakpoint hitting behavior soon, where currently
lldb reports a breakpoint as being hit when a thread is *at* a
BreakpointSite, but possibly has not executed the breakpoint instruction
and trapped yet, to having lldb only report a breakpoint hit when the
breakpoint instruction has actually been executed.

One corner case bug with this change is that when you are stopped at a
breakpoint (that has been hit) on the last instruction of a function,
and you do `finish`, a ThreadPlanStepOut is pushed to the thread's plan
stack to put a breakpoint on the return address and resume execution.
And when the thread is asked to resume, it sees that it is at a
BreakpointSite that has been hit, and pushes a
ThreadPlanStepOverBreakpoint on the thread. The StepOverBreakpoint
plan sees that the thread's state is eStateRunning (not eStateStepping),
so it marks itself as "auto continue" -- so once the breakpoint has
been stepped over, we will resume execution on the thread.

With current lldb stepping behavior ("a thread *at* a BreakpointSite is
said to have stopped with a breakpoint-hit stop reason, even if the
breakpoint hasn't been executed yet"),
`ThreadPlanStepOverBreakpoint::DoPlanExplainsStop` has a special bit of
code which detects when the thread stops with an eStopReasonBreakpoint.
It first checks if the pc is the same as when we started -- did our
"step instruction" not actually step? -- and if so says the stop reason is
explained. Otherwise it sets auto-continue to false (because we've hit
an *unexpected* breakpoint, and we have advanced past our original pc),
and returns false - the stop reason is not explained.

So we do the "finish", lldb instruction steps, we stop *at* the
return-address breakpoint and lldb sets the thread's stop reason to
breakpoint-hit. ThreadPlanStepOverBreakpoint sees an
eStopReasonBreakpoint, sets its auto-continue to false, and says we
stopped for some reason other than this plan (and it will also report
`IsPlanStale()==true` so it will remove itself). Meanwhile the
ThreadPlanStepOut sees that it has stopped in the StackID it wanted to
run to, and returns success.

This all changes when stopping at a breakpoint site doesn't report
breakpoint-hit until we actually execute the instruction. Now the
ThreadPlanStepOverBreakpoint looks at the thread's stop reason, sees that it's
eStopReasonTrace (we've instruction stepped), and so it leaves its
auto-continue set to `true`. ThreadPlanStepOut sees that it has reached its
goal StackID, removes its breakpoint, and says it is done.
Thread::ShouldStop thinks the auto-continue == yes vote from
ThreadPlanStepOverBreakpoint wins, and we lose control of the process.

This patch changes ThreadPlanStepOut to require that *both* (1) we are
at the StackID of the caller function, where we wanted to end up, and
(2) we have actually hit the breakpoint that we inserted.

This in effect means that now lldb instruction-steps over the breakpoint
in the callee function and stops at the return address of the caller
function. StepOverBreakpoint has completed. StepOut is still running,
and we continue the thread again. We immediately hit the breakpoint
(that we're sitting at), and now ThreadPlanStepOut marks itself as
completed, and we return control to the user.

Jim suggests that ThreadPlanStepOverBreakpoint is a bit unusual because
it's not something pushed on the stack by a higher-order thread plan
that "owns" it, it is inserted by the Thread as it is about to resume,
if we're at a BreakpointSite. It has no connection to the thread plans
above it, but tries to set the auto-continue mode based on the state of
the thread when it is inserted (and tries to detect an unexpected
breakpoint and unset that auto-continue it previously decided on,
because it now realizes it should not influence execution control any
more). Instead maybe the ThreadPlanStepOverBreakpoint should be inserted
as a child plan of whatever the lowest plan is on the stack at the point
it is added.

I added an API test that will catch this bug in the new thread
breakpoint algorithm.
2025-02-12 13:48:01 -08:00
S. Bharadwaj Yadavalli
f2650c54c9 [DirectX] Set Shader Flag DisableOptimizations (#126813)
- Set the shader flag `DisableOptimizations` based on `optnone`
attribute of shader entry functions.

- Add DXIL Metadata Analysis pass as pre-requisite for Shader Flags pass
to obtain entry function information collected therein.

- Named module metadata `dx.disable_optimizations` is intended to
indicate disabling optimizations (`-O0`) via a command-line flag. However,
its intent is fulfilled by the `optnone` attribute of shader entry functions,
as implemented in a recent change, and it is thus not needed. Delete
generation of the named metadata and the corresponding test file
`disable_opt.ll`.

- Add tests to verify correctness of setting shader flag.

Closes #112263
2025-02-12 16:45:01 -05:00
Thurston Dang
0d95631a3a [msan] Handle llvm.[us]cmp (starship operator) (#125804)
Apply handleShadowOr to llvm.[us]cmp. Previously, llvm.[us]cmp was correctly handled heuristically when each parameter type was the same as the return type (e.g., `call i8 @llvm.ucmp.i8.i8(i8 %x, i8 %y)`) but handled incorrectly by visitInstruction when the return type was different (e.g., `call i8 @llvm.ucmp.i8.i62(i62 %x, i62 %y)`, `call <4 x i8> @llvm.ucmp.v4i8.v4i32(<4 x i32> %x, <4 x i32> %y)`).

Updates the tests from https://github.com/llvm/llvm-project/pull/125790
2025-02-12 13:38:45 -08:00
Thurston Dang
e9e6ba6a5e [msan] Handle single-parameter Arm NEON vector convert intrinsics (#126136)
This handles the following llvm.aarch64.neon intrinsics, which were suboptimally handled by visitInstruction:
- fcvtas, fcvtau
- fcvtms, fcvtmu
- fcvtns, fcvtnu
- fcvtps, fcvtpu
- fcvtzs, fcvtzu

The old instrumentation checked that the shadow of every element of the input vector was fully initialized, and aborted otherwise. The new instrumentation propagates the shadow: for each element of the output, the shadow is initialized iff the corresponding element of the input is *fully* initialized (since these are floating-point to integer conversions).

Updates the tests from https://github.com/llvm/llvm-project/pull/126095
2025-02-12 13:20:22 -08:00
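The shadow propagation rule stated in e9e6ba6a5e can be written down as a tiny model. This is a sketch, not the MemorySanitizer instrumentation: it assumes the usual MSan convention that a set shadow bit means "uninitialized", treats each element's shadow as a Python integer, and marks a poisoned output element with an all-ones shadow, which is a modelling choice. `convert_shadow` and `out_bits` are hypothetical names.

```python
def convert_shadow(input_shadows: list[int], out_bits: int = 32) -> list[int]:
    """Per-element shadow for a float-to-int vector convert.

    An output element is clean (shadow 0) only if the matching input
    element is fully initialized; otherwise the whole output element
    is marked uninitialized.
    """
    all_poisoned = (1 << out_bits) - 1
    return [all_poisoned if shadow != 0 else 0 for shadow in input_shadows]
```

For example, an input whose second element has a single uninitialized bit yields an output whose second element is entirely uninitialized, while the other elements stay clean. Under the old instrumentation described above, the same input would have aborted instead.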
Florian Hahn
82605285b8 [LAA] Also clear CheckingGroups in RuntimePointerChecking::reset.
This fixes a crash when trying to print access-info in the newly added
test cases.
2025-02-12 21:49:22 +01:00
Vasileios Porpodas
e75e61728e [SandboxVec] Fix warnings introduced by 7a7f9190d0 2025-02-12 12:43:24 -08:00
Philip Reames
859c871184 [RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608)
This change introduces a default schedule model for the RISCV target
which leaves everything unchanged except the MicroOpBufferSize. The
default value of this flag in NoSched is 0. Both configurations
represent in-order cores (i.e. no reorder window); the difference
between them comes down to whether heuristics other than latency are
allowed to apply. (Implementation details below.)

I left the processor models which explicitly set MicroOpBufferSize=0
unchanged in this patch, but strongly suspect we should change those
too. Honestly, I think the LLVM-wide default for this flag should be
changed, but I don't have the energy to manage the updates for all
targets.

Implementation-wise, the effect of this change is that schedule units
which are ready to run *except that* one of their predecessors may not
have completed yet are added to the Available list, not the Pending one.
The result of this is that it becomes possible to choose to schedule a
node before its ready cycle if the heuristics prefer. This is
essentially choosing to insert a resource stall instead of e.g.
increasing register pressure.

Note that I was initially concerned there might be a correctness aspect
(as in some kind of exposed pipeline design), but the generic scheduler
doesn't seem to know how to insert noop instructions. Without that, a
program wouldn't be guaranteed to schedule on an exposed pipeline
depending on the program and schedule model in question.

The effect of this is that we sometimes prefer register pressure in
codegen results. This is mostly churn (or small wins) on scalar because
we have many more registers, but is of major importance on vector -
particularly high LMUL - because we effectively have many fewer
registers and the relative cost of spilling is much higher. This is a
significant improvement on high LMUL code quality for default rva23u
configurations - or any non -mcpu vector configuration for that matter.

Fixes #107532
2025-02-12 12:31:39 -08:00
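The Available/Pending distinction described in 859c871184 can be sketched as follows. This is a deliberately simplified model, not the LLVM scheduler: it ignores hazards and any limit on the ready list, and `SchedUnit` and `release` are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class SchedUnit:
    name: str
    ready_cycle: int  # first cycle at which all predecessors have completed


def release(units, curr_cycle: int, micro_op_buffer_size: int):
    """Split released units into (available, pending) name lists."""
    buffered = micro_op_buffer_size != 0
    available, pending = [], []
    for su in units:
        if not buffered and su.ready_cycle > curr_cycle:
            # Unbuffered model: the unit must wait for its ready cycle.
            pending.append(su.name)
        else:
            # Buffered model: heuristics may pick it early and accept a stall.
            available.append(su.name)
    return available, pending
```

With a unit whose ready cycle is 3 and a current cycle of 1, a buffer size of 0 leaves it pending, whereas a buffer size of 1 makes it available so that other heuristics, such as register pressure, can weigh in.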
vporpo
7a7f9190d0 [SandboxVec][Legality] Fix mask on diamond reuse with shuffle (#126963)
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
2025-02-12 12:29:09 -08:00
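The mask fix in 7a7f9190d0 comes down to enumerating lanes rather than operands. The sketch below is an illustration only: `expand_mask` is a hypothetical helper, and it assumes all operands have the same number of lanes.

```python
def expand_mask(operand_order: list[int], lanes_per_operand: int) -> list[int]:
    """Turn a per-operand order into a per-lane shuffle mask."""
    mask = []
    for operand in operand_order:
        base = operand * lanes_per_operand
        # Each vector operand contributes one index per lane it holds.
        mask.extend(range(base, base + lanes_per_operand))
    return mask
```

Swapping two <2 x float> operands inside a <4 x float> therefore needs the four-index mask `[2, 3, 0, 1]`, not the two-index mask `[1, 0]`.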
Philip Reames
9478822f4f [RISCV] Decompose single source shuffles (without exact VLEN) (#126951)
(This is a re-apply for what was 8374d42. The bug there was fairly 
major - despite the comments and review description, the code was 
using each register in the source register group, not only the first 
register. This was completely wrong.)

This is a continuation of the work started in
https://github.com/llvm/llvm-project/pull/125735 to lower selected VLA
shuffles in linear m1 components instead of generating O(LMUL^2) or
O(LMUL*Log2(LMUL)) high LMUL shuffles.

This pattern focuses on shuffles where all the elements being used
across the entire destination register group come from a single register
in the source register group. Such cases come up fairly frequently via
e.g. spread(N), and repeat(N) idioms.

One subtlety to this patch is the handling of the index vector for
vrgatherei16.vv. Because the index and source registers can have
different EEW, the index vector for the Nth chunk of the destination is
not guaranteed to be register aligned. In fact, it is common for e.g. an
EEW=64 shuffle to have EEW=16 indices which are four chunks per source
register. Given this, we have to pay a cost for extracting these chunks
into the low position before performing each shuffle.

I'd initially expressed this as a naive extract sub-vector for each data
parallel piece. However, at high LMUL, this quickly caused register
pressure problems since we could at worst need 4x the temporary
registers for the index. Instead, this patch uses a repeating slidedown
chained from previous iterations. This increases critical path by at
worst 3 slides (SEW=64 is the worst case), but reduces register pressure
to at worst 2x - and only if the original index vector is reused
elsewhere. I view this as arguably a bit of a workaround (since our
scheduling should have done better with the plain extract variant), but
a probably necessary one.
2025-02-12 12:10:35 -08:00
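The index-chunk arithmetic in 9478822f4f can be made concrete with a short calculation. This is a sketch under assumptions, not the lowering code: it covers only data element widths that are a multiple of the 16-bit index width used by vrgatherei16.vv, and the helper names are hypothetical.

```python
INDEX_EEW = 16  # vrgatherei16.vv always uses 16-bit indices


def chunks_per_index_register(data_eew: int) -> int:
    """How many destination registers' worth of indices fit in one index register."""
    if data_eew % INDEX_EEW != 0:
        raise ValueError("this sketch only models data EEW of 16, 32 or 64")
    return data_eew // INDEX_EEW


def index_chunk_location(dest_reg: int, data_eew: int = 64):
    """Return (index register, chunk within that register) for a destination register."""
    chunks = chunks_per_index_register(data_eew)
    return dest_reg // chunks, dest_reg % chunks
```

At EEW=64 there are four chunks per index register, so only every fourth chunk starts at a register boundary. The other three have to be moved into the low position first, which is consistent with the "at worst 3 slides" figure quoted above.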
Peter Rong
53c618c071 [clang] run clang-format on some CGObjC files (#126644)
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without massive clang-format changes.

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
2025-02-12 11:52:49 -08:00
vporpo
6d7a84d72b [SandboxVec][Scheduler] Fix top of schedule (#126820)
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
2025-02-12 11:52:01 -08:00