clang-p2996

Author	SHA1	Message	Date
Christian Sigg	ec056f5458	[llvm][bazel] Adjust to HAVE_SYS_AUXV_H > HAVE_GETAUXVAL in `89d636ba91`	2025-02-13 08:08:52 +01:00
AZero13	ffd2633061	[InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407 ) Alive2 Proofs: https://alive2.llvm.org/ce/z/aJnxyp https://alive2.llvm.org/ce/z/dyeGEv	2025-02-13 14:25:09 +08:00
sakria9	7050e7d2a3	[clang] [ASTDump] Add support for structural value template arguments in TextNodeDumper (#126341 ) It was missed in `5518a9d` which introduced this new template argument kind.	2025-02-13 14:06:45 +08:00
Vitaly Buka	e76739eeb9	[libclang] Always Dup in createRef(StringRef) (#125020 ) We can't guaranty that underlying string is 0-terminated and [String.size()] is even in the same allocation. https://lab.llvm.org/buildbot/#/builders/94/builds/4152/steps/17/logs/stdio ``` ==c-index-test==1846256==WARNING: MemorySanitizer: use-of-uninitialized-value #0 in clang::cxstring::createRef(llvm::StringRef) llvm-project/clang/tools/libclang/CXString.cpp:96:36 #1 in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:521:39 #2 in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:674:7 #3 in DumpCXCommentInternal llvm-project/clang/tools/c-index-test/c-index-test.c:674:7 #4 in DumpCXComment llvm-project/clang/tools/c-index-test/c-index-test.c:685:3 #5 in PrintCursorComments llvm-project/clang/tools/c-index-test/c-index-test.c:768:7 Memory was marked as uninitialized #0 in __msan_allocated_memory llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1023:5 #1 in Allocate llvm-project/llvm/include/llvm/Support/Allocator.h:172:7 #2 in Allocate llvm-project/llvm/include/llvm/Support/Allocator.h:216:12 #3 in Allocate llvm-project/llvm/include/llvm/Support/AllocatorBase.h:53:43 #4 in Allocate<char> llvm-project/llvm/include/llvm/Support/AllocatorBase.h:76:29 #5 in convertCodePointToUTF8 llvm-project/clang/lib/AST/CommentLexer.cpp:42:30 #6 in clang::comments::Lexer::resolveHTMLDecimalCharacterReference(llvm::StringRef) const llvm-project/clang/lib/AST/CommentLexer.cpp:76:10 #7 in clang::comments::Lexer::lexHTMLCharacterReference(clang::comments::Token&) llvm-project/clang/lib/AST/CommentLexer.cpp:615:16 #8 in consumeToken llvm-project/clang/include/clang/AST/CommentParser.h:62:9 #9 in clang::comments::Parser::parseParagraphOrBlockCommand() llvm-project/clang/lib/AST/CommentParser.cpp #10 in clang::comments::Parser::parseFullComment() llvm-project/clang/lib/AST/CommentParser.cpp:925:22 #11 in clang::RawComment::parse(clang::ASTContext 
const&, clang::Preprocessor const, clang::Decl const) const llvm-project/clang/lib/AST/RawCommentList.cpp:221:12 #12 in clang::ASTContext::getCommentForDecl(clang::Decl const, clang::Preprocessor const) const llvm-project/clang/lib/AST/ASTContext.cpp:714:35 #13 in clang_Cursor_getParsedComment llvm-project/clang/tools/libclang/CXComment.cpp:36:35 #14 in PrintCursorComments llvm-project/clang/tools/c-index-test/c-index-test.c:756:25 ```	2025-02-12 22:05:19 -08:00
Vitaly Buka	1032df6f60	[LTO][Pipelines][Coro] Handle coroutines in LTO pipeline (#126168 ) ThinLTO delays handling of coroutines to ThinLTO backend. However it's usually possible to use ThinLTO prelink objects for FullLTO. In this case we have left-over coroutines which crash in codegen. Issue #104525.	2025-02-12 21:39:32 -08:00
Razvan Lupusoru	7b473dfe84	[flang][acc] Implement type categorization for FIR types (#126964 ) The OpenACC type interfaces have been updated to require that a type self-identify which type category it belongs to. Ensure that FIR types are able to provide this self identification. In addition to implementing the new API, the PointerLikeType interface attachment was moved to FIROpenACCSupport library like MappableType to ensure all type interfaces and their implementation are now in the same spot.	2025-02-12 21:09:59 -08:00
Lang Hames	9456e7fcdd	[ORC] Silence unused variable warnings.	2025-02-13 15:24:43 +11:00
NAKAMURA Takumi	9bd836adbb	[bazel] Introduce HAVE_SYS_AUXV_H for #126863	2025-02-13 13:07:19 +09:00
NAKAMURA Takumi	cdf45447ef	Orc: Suppress a warning in #126691	2025-02-13 13:07:19 +09:00
Brad Smith	89d636ba91	[Support] Fix building on FreeBSD and OpenBSD (#127005 ) Fix building after `a6f7cb54d3`. Check for the function getauxval() instead of just the sys/auxv.h header.	2025-02-12 22:55:22 -05:00
Koakuma	30a9941624	[SPARC][IAS] Add IAS flag handling for ISA levels Add IAS flag handling for ISA levels we support in LLVM. Reviewers: MaskRay, rorth, brad0, s-barannikov Reviewed By: MaskRay Pull Request: https://github.com/llvm/llvm-project/pull/125151	2025-02-13 10:22:31 +07:00
Jonas Devlieghere	73ab0c0762	[lldb-dap] Upgrade @types/node to fix TS2386 in node/module.d.ts (#126994 ) Upgrade @types/node to work around an issue in TypeScript [1] that caused our "publish to VSCode Marketplace" github action [2] to fail: ``` node_modules/@types/node/module.d.ts:290:13 - error TS2386: Overload signatures must all be optional or required. 290 resolve?(specified: string, parent?: string | URL): Promise<string>; ``` [1] https://github.com/microsoft/TypeScript/pull/59259#issuecomment-2228833941 [2] https://github.com/llvm/vscode-lldb/actions/runs/13298213337/job/37134713009	2025-02-12 19:09:09 -08:00
Thurston Dang	df07121d54	[hwasan][NFCI] Rename ClRandomSkipRate to ClRandomKeepRate (#126990 ) The meaning of ClRandomSkipRate was inverted in https://github.com/llvm/llvm-project/pull/88070 but the variable name was not changed. This patch fixes it to avoid confusion. Additionally, it elaborates the flag description to mention the interaction between the random keep rate and hotness cutoff.	2025-02-12 18:43:00 -08:00
Florian Mayer	8ed36373a2	[NFC] [sanitizer] allow getauxval in symbolizer	2025-02-12 17:20:28 -08:00
Longsheng Mou	3e223e3a20	[mlir][vector] Fix out-of-bounds access (#126734 ) This PR fixes an out-of-bounds bug that occurs when there are no overlap dimensions between the `sizes` and source of `vector.extract_strided_slice`, causing access to `sizes` to go out of bounds. Fixes #126196.	2025-02-13 09:17:43 +08:00
Uday Bondhugula	8421ad7f45	[MLIR][Affine] Fix sibling fusion - missing check (#126626 ) Fix sibling fusion for slice maximality check. Producer-consumer fusion had this check but not sibling fusion. Sibling fusion shouldn't be performed if the slice isn't "maximal" (i.e., if it isn't the whole of the source). Fixes: https://github.com/llvm/llvm-project/issues/48703	2025-02-13 06:21:03 +05:30
Tristan Ross	a6f7cb54d3	[Support] Prefer AUX vector for page size (#126863 ) Prefers the page size to come from the AUX vector, `getpagesize` is removed from POSIX.1-2001. Also throws in a couple asserts to ensure the page size is a valid value.	2025-02-13 11:39:49 +11:00
Thurston Dang	51d8255203	[msan] Handle Arm NEON saturating extract and narrow (#125742 ) This handles NEON saturating extract and narrow (Intrinsic::aarch64_neon_{sqxtn, sqxtun, uqxtn}) by (ab)using handleShadowOr() to perform the shadow cast. Previously, these were unknown intrinsics handled suboptimally by visitInstruction. Updates the tests from https://github.com/llvm/llvm-project/pull/125288 and https://github.com/llvm/llvm-project/pull/125140	2025-02-12 16:22:49 -08:00
Jonas Devlieghere	1b582ef3c0	[lldb-dap] Bump the version number for publishing in the Marketplace	2025-02-12 16:14:00 -08:00
Da-Viper	4238238684	[lldb-dap] Fix: Could not find DAP in path (#126903 ) Fixes #120839	2025-02-12 16:11:43 -08:00
Uday Bondhugula	4078b11daa	[MLIR][Affine] Fix fusion crash for non-int/fp memref elt types (#126829 ) Fix assumption on memref elt types being int or float during private memref creation in affine fusion. Fixes: https://github.com/llvm/llvm-project/issues/121020	2025-02-13 05:27:48 +05:30
Florian Mayer	6936fadfc3	[compiler-rt] [sanitizer] avoid UB in allocator (#126977 )	2025-02-12 15:49:55 -08:00
Matthew Bastien	105b3a92a7	[lldb-dap] add `debugAdapterExecutable` property to launch configuration (#126803 ) The Swift extension for VS Code requires that the `lldb-dap` executable come from the Swift toolchain which may or may not be configured in `PATH`. At the moment, this can be configured via LLDB DAP's extension settings, but experience has shown that modifying other extensions' settings on behalf of the user (especially those subject to change whenever a new toolchain is selected) causes issues. Instead, it would be easier to have this configurable in the launch configuration and let the Swift extension (or any other extension that wanted to, really) configure the path to `lldb-dap` that way. This allows the Swift extension to have its own launch configuration type that delegates to the LLDB DAP extension in order to provide a more seamless debugging experience for Swift executables. This PR adds a new property to the launch configuration object called `debugAdapterExecutable` which allows overriding the `lldb-dap` executable path for a specific debug session.	2025-02-12 15:49:38 -08:00
Robert Imschweiler	bcba3117c0	[AMDGPU] SelDAG: fix lowering of undefined workitem intrinsics (#126058 ) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project.	2025-02-12 18:41:41 -05:00
Andrzej Warzyński	5586541d22	[mlir][tensor] Make useful Tensor utilities public (#126802 ) 1. Extract the main logic from `foldTensorCastPrecondition` into a dedicated helper hook: `hasFoldableTensorCastOperand`. This allows for reusing the corresponding checks. 2. Rename `getNewOperands` to `getUpdatedOperandsAfterCastOpFolding` for better clarity and documentation of its functionality. 3. These updated hooks will be reused in: * https://github.com/llvm/llvm-project/pull/123902. This PR makes them public. Note: Moving these hooks to `Tensor/Utils` is not feasible because `MLIRTensorUtils` depends on `MLIRTensorDialect` (CMake targets). If these hooks were moved to `Utils`, it would create a dependency of `MLIRTensorDialect` on `MLIRTensorUtils`, leading to a circular dependency.	2025-02-12 23:12:14 +00:00
vporpo	1c207f1b6e	[SandboxVec][DAG] Fix DAG when old interval is mem free (#126983 ) This patch fixes a bug in `DependencyGraph::extend()` when the old interval contains no memory instructions. When this is the case we should do a full dependency scan of the new interval.	2025-02-12 15:06:30 -08:00
Amir Bishara	51c847d8f3	[mlir][tosa]-Edit the verifier of tosa constShapeOp (#126962 ) Add verification for rank 1 for the elements' attribute of the tosa const_shape operation.	2025-02-12 15:06:00 -08:00
Louis Dionne	5953e5a3c6	[libc++] Simplify the apple-system-hardened CI configuration (#126911 ) It was basically a copy-paste of the non-hardened version of the same job, and it's easy to remove the duplication.	2025-02-12 23:58:14 +01:00
Louis Dionne	dbfb29fd45	[libc++] Add a link to __builtin_verbose_trap from the hardening docs (#126930 )	2025-02-12 23:57:37 +01:00
vporpo	31cb807537	[SandboxVec][BottomUpVec] Fix diamond shuffle with multiple vector inputs (#126965 ) When the operand comes from multiple inputs then we need additional packing code. When the operands are scalar then we can use a single InsertElementInst. But when the operands are vectors then we need a chain of ExtractElementInst and InsertElementInst instructions to insert the vector value into the destination vector. This is what this patch implements.	2025-02-12 14:33:05 -08:00
Nick Desaulniers	3e02069afe	[libc][pthread] fix -Wmissing-field-initializers (#126314 ) Fixes: llvm-project/libc/test/integration/src/pthread/pthread_rwlock_test.cpp:59:29: warning: missing field '__preference' initializer [-Wmissing-field-initializers] 59 | pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER; | ^ Also, add a test that demonstrates the same issue for PTHREAD_MUTEX_INITIALIZER, and fix that, too. PTHREAD_ONCE_INIT does not have this issue and does have test coverage.	2025-02-12 14:28:29 -08:00
LLVM GN Syncbot	37952ef75f	[gn build] Port `92f916faba`	2025-02-12 22:22:01 +00:00
Thurston Dang	c6a39697a9	[hwasan][NFCI] Add more test cases to llvm/test/Instrumentation/HWAddressSanitizer/pgo-opt-out.ll (#126980 ) Add more combinations of parameters to test that the skip conditions are OR'ed together	2025-02-12 14:20:23 -08:00
Nikhil Kalra	f3a29906aa	[mlir] BytecodeWriter: invoke `reserveExtraSpace` (#126953 ) Update `BytecodeWriter` to invoke `reserveExtraSpace` on the stream before writing to it. This will give clients implementing custom output streams the opportunity to allocate an appropriately sized buffer for the write.	2025-02-12 14:17:30 -08:00
Razvan Lupusoru	ceb00c0702	[mlir][acc] Clean up TypedValue builders (#126968 ) When MappableType was introduced alongside PointerLikeType, the data clause operation builders were duplicated to accept a `TypedValue` of one of the two type options. However, the underlying builder takes a `Value` and this difference is not relevant for it. The only difference is that `varType` is set differently depending on the type. Having two duplicated builders can lead to clunky building since a `Value` must always be cast to one of the two options. Thus, simply clean this up - the verifier already checks that it is a type that implements one of the two interfaces.	2025-02-12 14:13:45 -08:00
Jeffrey Byrnes	c5a4512d85	[AMDGPU] iglp.opt does not clobber memory operands (#126976 ) I think it was an accident that this wasn't included.	2025-02-12 14:11:02 -08:00
Shubham Sandeep Rastogi	92f916faba	Add a pass to collect dropped var statistics for MIR (#126686 ) This patch attempts to reland https://github.com/llvm/llvm-project/pull/120780 while addressing the issues that caused the patch to be reverted. Namely: 1. The patch had included code from the llvm/Passes directory in the llvm/CodeGen directory. 2. The patch increased the backend compile time by 2% due to adding a very expensive include in MachineFunctionPass.h The patch has been re-structured so that there is no dependency between the llvm/Passes and llvm/CodeGen directory, by moving the base class, `class DroppedVariableStats` to the llvm/IR directory. The expensive include in MachineFunctionPass.h has been changed to contain forward declarations instead of other header includes which was pulling a ton of code into MachineFunctionPass.h and should resolve any issues when it comes to compile time increase.	2025-02-12 14:08:18 -08:00
Nikhil Kalra	65ed4fa57e	[mlir] Python: Parse ModuleOp from file path (#126572 ) For extremely large models, it may be inefficient to load the model into memory in Python prior to passing it to the MLIR C APIs for deserialization. This change adds an API to parse a ModuleOp directly from a file path. Re-lands `4e14b8afb4`.	2025-02-12 14:02:41 -08:00
Jason Molenda	fa71238da8	[lldb] inserted a typo when checking in a suggested fix	2025-02-12 14:00:41 -08:00
Jason Molenda	cbb4e99f36	[lldb] Update ThreadPlanStepOut to handle new breakpoint behavior (#126838 ) I will be changing breakpoint hitting behavior soon, where currently lldb reports a breakpoint as being hit when a thread is at a BreakpointSite, but possibly has not executed the breakpoint instruction and trapped yet, to having lldb only report a breakpoint hit when the breakpoint instruction has actually been executed. One corner case bug with this change is that when you are stopped at a breakpoint (that has been hit) on the last instruction of a function, and you do `finish`, a ThreadPlanStepOut is pushed to the thread's plan stack to put a breakpoint on the return address and resume execution. And when the thread is asked to resume, it sees that it is at a BreakpointSite that has been hit, and pushes a ThreadPlanStepOverBreakpoint on the thread. The StepOverBreakpoint plan sees that the thread's state is eStateRunning (not eStateStepping), so it marks itself as "auto continue" -- so once the breakpoint has been stepped over, we will resume execution on the thread. With current lldb stepping behavior ("a thread at a BreakpointSite is said to have stopped with a breakpoint-hit stop reason, even if the breakpoint hasn't been executed yet"), `ThreadPlanStepOverBreakpoint::DoPlanExplainsStop` has a special bit of code which detects when the thread stops with an eStopReasonBreakpoint. It first checks if the pc is the same as when we started -- did our "step instruction" not actually step? -- and if so says the stop reason is explained. Otherwise it sets auto-continue to false (because we've hit an unexpected breakpoint, and we have advanced past our original pc), and returns false - the stop reason is not explained. So we do the "finish", lldb instruction steps, we stop at the return-address breakpoint and lldb sets the thread's stop reason to breakpoint-hit. 
ThreadPlanStepOverBreakpoint sees an eStopReasonBreakpoint, sets its auto-continue to false, and says we stopped for some reason other than this plan. (and it will also report `IsPlanStale()==true` so it will remove itself) Meanwhile the ThreadPlanStepOut sees that it has stopped in the StackID it wanted to run to, and returns success. This all changes when stopping at a breakpoint site doesn't report breakpoint-hit until we actually execute the instruction. Now the ThreadPlanStepOverBreakpoint looks at the thread's stop reason, it's eStopReasonTrace (we've instruction stepped), and so it leaves its auto-continue set to `true`. ThreadPlanStepOut sees that it has reached its goal StackID, removes its breakpoint, and says it is done. Thread::ShouldStop thinks the auto-continue == yes vote from ThreadPlanStepOverBreakpoint wins, and we lose control of the process. This patch changes ThreadPlanStepOut to require that both (1) we are at the StackID of the caller function, where we wanted to end up, and (2) we have actually hit the breakpoint that we inserted. This in effect means that now lldb instruction-steps over the breakpoint in the callee function and stops at the return address of the caller function. StepOverBreakpoint has completed. StepOut is still running, and we continue the thread again. We immediately hit the breakpoint (that we're sitting at), and now ThreadPlanStepOut marks itself as completed, and we return control to the user. Jim suggests that ThreadPlanStepOverBreakpoint is a bit unusual because it's not something pushed on the stack by a higher-order thread plan that "owns" it, it is inserted by the Thread as it is about to resume, if we're at a BreakpointSite. 
It has no connection to the thread plans above it, but tries to set the auto-continue mode based on the state of the thread when it is inserted (and tries to detect an unexpected breakpoint and unset that auto-continue it previously decided on, because it now realizes it should not influence execution control any more). Instead maybe the ThreadPlanStepOverBreakpoint should be inserted as a child plan of whatever the lowest plan is on the stack at the point it is added. I added an API test that will catch this bug in the new thread breakpoint algorithm.	2025-02-12 13:48:01 -08:00
S. Bharadwaj Yadavalli	f2650c54c9	[DirectX] Set Shader Flag DisableOptimizations (#126813 ) - Set the shader flag `DisableOptimizations` based on `optnone` attribute of shader entry functions. - Add DXIL Metadata Analysis pass as pre-requisite for Shader Flags pass to obtain entry function information collected therein. - Named module metadata `dx.disable_optimizations` is intended to indicate disabling optimizations (`-O0`) via commandline flag. However, its intent is fulfilled by `optnone` attribute of shader entry functions as implemented in a recent change, and thus not needed. Delete generation of named metadata and corresponding test file `disable_opt.ll`. - Add tests to verify correctness of setting shader flag. Closes #112263	2025-02-12 16:45:01 -05:00
Thurston Dang	0d95631a3a	[msan] Handle llvm.[us]cmp (starship operator) (#125804 ) Apply handleShadowOr to llvm.[us]cmp. Previously, llvm.[us]cmp was correctly handled heuristically when each parameter type is the same as the return type (e.g., `call i8 @llvm.ucmp.i8.i8(i8 %x, i8 %y)`) but handled incorrectly by visitInstruction when the return type is different (e.g., `call i8 @llvm.ucmp.i8.i62(i62 %x, i62 %y)`, `call <4 x i8> @llvm.ucmp.v4i8.v4i32(<4 x i32> %x, <4 x i32> %y)`). Updates the tests from https://github.com/llvm/llvm-project/pull/125790	2025-02-12 13:38:45 -08:00
Thurston Dang	e9e6ba6a5e	[msan] Handle single-parameter Arm NEON vector convert intrinsics (#126136 ) This handles the following llvm.aarch64.neon intrinsics, which were suboptimally handled by visitInstruction: - fcvtas, fcvtau - fcvtms, fcvtmu - fcvtns, fcvtnu - fcvtps, fcvtpu - fcvtzs, fcvtzu The old instrumentation checked that the shadow of every element of the input vector was fully initialized, and aborted otherwise. The new instrumentation propagates the shadow: for each element of the output, the shadow is initialized iff the corresponding element of the input is fully initialized (since these are floating-point to integer conversions). Updates the tests from https://github.com/llvm/llvm-project/pull/126095	2025-02-12 13:20:22 -08:00
Florian Hahn	82605285b8	[LAA] Also clear CheckingGroups in RuntimePointerChecking::reset. This fixes a crash when trying to print access-info in the newly added test cases.	2025-02-12 21:49:22 +01:00
Vasileios Porpodas	e75e61728e	[SandboxVec] Fix warnings introduced by `7a7f9190d0`	2025-02-12 12:43:24 -08:00
Philip Reames	859c871184	[RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608 ) This change introduces a default schedule model for the RISCV target which leaves everything unchanged except the MicroOpBufferSize. The default value of this flag in NoSched is 0. Both configurations represent in-order cores (i.e. no reorder window), the difference between them comes down to whether heuristics other than latency are allowed to apply. (Implementation details below) I left the processor models which explicitly set MicroOpBufferSize=0 unchanged in this patch, but strongly suspect we should change those too. Honestly, I think the LLVM wide default for this flag should be changed, but don't have the energy to manage the updates for all targets. Implementation wise, the effect of this change is that schedule units which are ready to run except that one of their predecessors may not have completed yet are added to the Available list, not the Pending one. The result of this is that it becomes possible to choose to schedule a node before its ready cycle if the heuristics prefer. This is essentially choosing to insert a resource stall instead of e.g. increasing register pressure. Note that I was initially concerned there might be a correctness aspect (as in some kind of exposed pipeline design), but the generic scheduler doesn't seem to know how to insert noop instructions. Without that, a program wouldn't be guaranteed to schedule on an exposed pipeline depending on the program and schedule model in question. The effect of this is that we sometimes prefer register pressure in codegen results. This is mostly churn (or small wins) on scalar because we have many more registers, but is of major importance on vector - particularly high LMUL - because we effectively have many fewer registers and the relative cost of spilling is much higher. 
This is a significant improvement on high LMUL code quality for default rva23u configurations - or any non -mcpu vector configuration for that matter. Fixes #107532	2025-02-12 12:31:39 -08:00
vporpo	7a7f9190d0	[SandboxVec][Legality] Fix mask on diamond reuse with shuffle (#126963 ) This patch fixes a bug in the creation of shuffle masks when vectorizing vectors in case of a diamond reuse with shuffle. The mask needs to enumerate all elements of a vector, not treat the original vector value as a single element. That is: if vectorizing two <2 x float> vectors into a <4 x float> the mask needs to have 4 indices, not just 2.	2025-02-12 12:29:09 -08:00
Philip Reames	9478822f4f	[RISCV] Decompose single source shuffles (without exact VLEN) (#126951 ) (This is a re-apply for what was `8374d42`. The bug there was fairly major - despite the comments and review description, the code was using each register in the source register group, not only the first register. This was completely wrong.) This is a continuation of the work started in https://github.com/llvm/llvm-project/pull/125735 to lower selected VLA shuffles in linear m1 components instead of generating O(LMUL^2) or O(LMUL*Log2(LMUL)) high LMUL shuffles. This pattern focuses on shuffles where all the elements being used across the entire destination register group come from a single register in the source register group. Such cases come up fairly frequently via e.g. spread(N), and repeat(N) idioms. One subtlety to this patch is the handling of the index vector for vrgatherei16.vv. Because the index and source registers can have different EEW, the index vector for the Nth chunk of the destination is not guaranteed to be register aligned. In fact, it is common for e.g. an EEW=64 shuffle to have EEW=16 indices which are four chunks per source register. Given this, we have to pay a cost for extracting these chunks into the low position before performing each shuffle. I'd initially expressed this as a naive extract sub-vector for each data parallel piece. However, at high LMUL, this quickly caused register pressure problems since we could at worst need 4x the temporary registers for the index. Instead, this patch uses a repeating slidedown chained from previous iterations. This increases critical path by at worst 3 slides (SEW=64 is the worst case), but reduces register pressure to at worst 2x - and only if the original index vector is reused elsewhere. I view this as arguably a bit of a workaround (since our scheduling should have done better with the plain extract variant), but a probably necessary one.	2025-02-12 12:10:35 -08:00
Peter Rong	53c618c071	[clang] run clang-format on some CGObjC files (#126644 ) These files are relatively old and don't conform to our formatting rules. It's hard to change them without massive clang-format changes. --------- Signed-off-by: Peter Rong <PeterRong@meta.com>	2025-02-12 11:52:49 -08:00
vporpo	6d7a84d72b	[SandboxVec][Scheduler] Fix top of schedule (#126820 ) This patch fixes the way the top-of-schedule variable gets set and updated. Before this patch it used to get updated whenever we scheduled a bundle, which is wrong, as the top-of-schedule needs to be maintained across scheduling attempts. It should get reset only when we clear the schedule or when we destroy the current schedule and re-schedule.	2025-02-12 11:52:01 -08:00
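
The InstCombine fold listed above (commit `ffd2633061`) rewrites `mul (shr exact (X, N)), 2^N + 1` as `add (X, shr exact (X, N))`. The reasoning is that an exact shift loses no bits, so `(X >> N) * 2^N == X`, and therefore `(X >> N) * (2^N + 1) == X + (X >> N)`. The Python sketch below is not taken from the patch; it is a brute-force check of that identity under stated assumptions: 8-bit wrapping arithmetic, unsigned values, and a logical shift only. The names `fold_holds` and `check_all` are illustrative. The Alive2 links in the commit message remain the authoritative proofs.

```python
# Brute-force check of the arithmetic identity behind the fold,
# modelled on 8-bit unsigned integers with wrap-around (an assumption
# made for illustration; the real transform works on LLVM IR types).
BITS = 8
MASK = (1 << BITS) - 1


def fold_holds(x: int, n: int) -> bool:
    """Return True if mul (x >> n), (2^n + 1) equals add x, (x >> n) mod 2^BITS."""
    shifted = x >> n
    before = (shifted * ((1 << n) + 1)) & MASK  # original expression
    after = (x + shifted) & MASK                # folded expression
    return before == after


def check_all() -> bool:
    """Check every 8-bit x and shift amount for which the shift is exact."""
    for n in range(1, BITS):
        low_bits = (1 << n) - 1
        for x in range(1 << BITS):
            if x & low_bits:
                continue  # shift would drop set bits, so it is not "exact"
            if not fold_holds(x, n):
                return False
    return True


if __name__ == "__main__":
    print("identity holds for all exact shifts:", check_all())
```

The `exact` precondition matters: with `x = 13` and `n = 2` the shift drops set bits, and the two expressions differ (15 versus 16), which is why the sketch skips such inputs.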

527198 Commits