[llvm] Fix typos in documentation (#140844)
@@ -1222,7 +1222,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 argument should be wavefront-uniform; the global pointer need not be.
 The LDS pointer is implicitly offset by 4 * lane_id bytes for sies <= 4 bytes
 and 16 * lane_id bytes for larger sizes. This lowers to `global_load_lds`,
-`buffer_load_* ... lds`, or `global_load__* ... lds` depnedening on address
+`buffer_load_* ... lds`, or `global_load__* ... lds` depending on address
 space and architecture. `amdgcn.global.load.lds` has the same semantics as
 `amdgcn.load.to.lds.p1`.
 llvm.amdgcn.readfirstlane Provides direct access to v_readfirstlane_b32. Returns the value in
@@ -1354,7 +1354,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 - 0x0020: VMEM read instructions may be scheduled across sched_barrier.
 - 0x0040: VMEM write instructions may be scheduled across sched_barrier.
 - 0x0080: All DS instructions may be scheduled across sched_barrier.
-- 0x0100: All DS read instructions may be scheduled accoss sched_barrier.
+- 0x0100: All DS read instructions may be scheduled across sched_barrier.
 - 0x0200: All DS write instructions may be scheduled across sched_barrier.
 - 0x0400: All Transcendental (e.g. V_EXP) instructions may be scheduled across sched_barrier.

@@ -1383,7 +1383,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 | ``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``

 llvm.amdgcn.iglp.opt An **experimental** intrinsic for instruction group level parallelism. The intrinsic
-implements predefined intruction scheduling orderings. The intrinsic applies to the
+implements predefined instruction scheduling orderings. The intrinsic applies to the
 surrounding scheduling region. The intrinsic takes a value that specifies the
 strategy. The compiler implements two strategies.

@@ -733,7 +733,7 @@ enabled sub-projects. Nearly all of these variable names begin with
 On Windows, allows embedding a different C runtime allocator into the LLVM
 tools and libraries. Using a lock-free allocator such as the ones listed below
 greatly decreases ThinLTO link time by about an order of magnitude. It also
-midly improves Clang build times, by about 5-10%. At the moment, rpmalloc,
+mildly improves Clang build times, by about 5-10%. At the moment, rpmalloc,
 snmalloc and mimalloc are supported. Use the path to `git clone` to select
 the respective allocator, for example:

@@ -40,7 +40,7 @@ parallel environment. To eliminate this assumption:
 instruction can be examined for uniformity across multiple threads only if the
 corresponding executions of that instruction are converged.

-This document decribes a static analysis for determining convergence at each
+This document describes a static analysis for determining convergence at each
 instruction in a function. The analysis extends previous work on divergence
 analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described
 analysis is used in LLVM to implement a UniformityAnalysis that determines the
@@ -3413,11 +3413,11 @@ memory before the call, the call may capture two components of the pointer:
 whether only the fact that the address is/isn't null is captured.
 * The provenance of the pointer, which is the ability to perform memory
 accesses through the pointer, in the sense of the :ref:`pointer aliasing
-rules <pointeraliasing>`. We further distinguish whether only read acceses
+rules <pointeraliasing>`. We further distinguish whether only read accesses
 are allowed, or both reads and writes.

 For example, the following function captures the address of ``%a``, because
-it is compared to a pointer, leaking information about the identitiy of the
+it is compared to a pointer, leaking information about the identity of the
 pointer:

 .. code-block:: llvm
@@ -3472,7 +3472,7 @@ through the return value only:
 However, we always consider direct inspection of the pointer address
 (e.g. using ``ptrtoint``) to be location-independent. The following example
 is *not* considered a return-only capture, even though the ``ptrtoint``
-ultimately only contribues to the return value:
+ultimately only contributes to the return value:

 .. code-block:: llvm

@@ -17041,12 +17041,12 @@ and IEEE-754-2008: the result of ``minnum(-0.0, +0.0)`` may be either -0.0 or +0

 Some architectures, such as ARMv8 (FMINNM), LoongArch (fmin), MIPSr6 (min.fmt), PowerPC/VSX (xsmindp),
 have instructions that match these semantics exactly; thus it is quite simple for these architectures.
-Some architectures have similiar ones while they are not exact equivalent. Such as x86 implements ``MINPS``,
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MINPS``,
 which implements the semantics of C code ``a<b?a:b``: NUM vs qNaN always return qNaN. ``MINPS`` can be used
 if ``nsz`` and ``nnan`` are given.

 For existing libc implementations, the behaviors of fmin may be quite different on sNaN and signed zero behaviors,
-even in the same release of a single libm implemention.
+even in the same release of a single libm implementation.

 .. _i_maxnum:

@@ -17101,12 +17101,12 @@ and IEEE-754-2008: the result of maxnum(-0.0, +0.0) may be either -0.0 or +0.0.

 Some architectures, such as ARMv8 (FMAXNM), LoongArch (fmax), MIPSr6 (max.fmt), PowerPC/VSX (xsmaxdp),
 have instructions that match these semantics exactly; thus it is quite simple for these architectures.
-Some architectures have similiar ones while they are not exact equivalent. Such as x86 implements ``MAXPS``,
+Some architectures have similar ones while they are not exact equivalent. Such as x86 implements ``MAXPS``,
 which implements the semantics of C code ``a>b?a:b``: NUM vs qNaN always return qNaN. ``MAXPS`` can be used
 if ``nsz`` and ``nnan`` are given.

 For existing libc implementations, the behaviors of fmin may be quite different on sNaN and signed zero behaviors,
-even in the same release of a single libm implemention.
+even in the same release of a single libm implementation.

 .. _i_minimum:

@@ -17,7 +17,7 @@ This document is an outline of the tooling and APIs facilitating MLGO.

 Note that tools for orchestrating ML training are not part of LLVM, as they are
 dependency-heavy - both on the ML infrastructure choice, as well as choices of
-distrubuted computing. For the training scenario, LLVM only contains facilities
+distributed computing. For the training scenario, LLVM only contains facilities
 enabling it, such as corpus extraction, training data extraction, and evaluation
 of models during training.

@@ -212,7 +212,7 @@ decisions.
 For a specific optimization problem - i.e. inlining, or regalloc eviction - we
 first separate correctness - preserving decisions from optimization decisions.
 For example, not inlining functions marked "no inline" is an example of the
-former. Same is not evicting an unevictable live range. An exmple of the latter
+former. Same is not evicting an unevictable live range. An example of the latter
 is deciding to inline a function that will bloat the caller size, just because
 we have reason to believe that later, the effect will be some constant
 propagation that will actually reduce the size (or dynamic instruction count).
@@ -338,7 +338,7 @@ In the above figure, ``X`` and ``Y`` are atomic operations on a
 location in the ``global`` address space. If ``X`` synchronizes with
 ``Y``, then ``B`` happens-before ``C`` in the ``local`` address
 space. But no such statement can be made about operations ``A`` and
-``D``, although they are peformed on a location in the ``global``
+``D``, although they are performed on a location in the ``global``
 address space.

 Implementation Example: Adding Address Space Information to Fences
@@ -360,7 +360,7 @@ changed.
 Within a few days, someone should start the review. They may add
 themselves as a reviewer, or simply start leaving comments. You'll get
 another email any time the review is updated. For more detail see the
-:ref:`Code Review Poilicy <code_review_policy>`.
+:ref:`Code Review Policy <code_review_policy>`.

 Comments
 ~~~~~~~~
@@ -385,7 +385,7 @@ Semantics:
 """"""""""

 Before the absolute value is taken, the input is flushed to sign preserving
-zero if it is a subnormal. In addtion, unlike '``llvm.fabs.*``', a NaN input
+zero if it is a subnormal. In addition, unlike '``llvm.fabs.*``', a NaN input
 yields an unspecified NaN output.


@@ -473,7 +473,7 @@ Overview:

 The '``llvm.nvvm.fshl.clamp``' family of intrinsics performs a clamped funnel
 shift left. These intrinsics are very similar to '``llvm.fshl``', except the
-shift ammont is clamped at the integer width (instead of modulo it). Currently,
+shift amount is clamped at the integer width (instead of modulo it). Currently,
 only ``i32`` is supported.

 Semantics:
@@ -501,7 +501,7 @@ Overview:

 The '``llvm.nvvm.fshr.clamp``' family of intrinsics perform a clamped funnel
 shift right. These intrinsics are very similar to '``llvm.fshr``', except the
-shift ammont is clamped at the integer width (instead of modulo it). Currently,
+shift amount is clamped at the integer width (instead of modulo it). Currently,
 only ``i32`` is supported.

 Semantics:
@@ -71,7 +71,7 @@ Pointee types provide some value to frontends because the IR verifier uses types
 to detect straightforward type confusion bugs. However, frontends also have to
 deal with the complexity of inserting bitcasts everywhere that they might be
 required. The community consensus is that the costs of pointee types
-outweight the benefits, and that they should be removed.
+outweigh the benefits, and that they should be removed.

 Many operations do not actually care about the underlying type. These
 operations, typically intrinsics, usually end up taking an arbitrary pointer
@@ -306,7 +306,7 @@ Supported
 .. _riscv-zacas-note:

 ``Zacas``
-The compiler will not generate amocas.d on RV32 or amocas.q on RV64 due to ABI compatibilty. These can only be used in the assembler.
+The compiler will not generate amocas.d on RV32 or amocas.q on RV64 due to ABI compatibility. These can only be used in the assembler.

 Atomics ABIs
 ============
@@ -346,7 +346,7 @@ instruction is available.
 }

 Many of these math functions are only vectorizable if the file has been built
-with a specified target vector library that provides a vector implemention
+with a specified target vector library that provides a vector implementation
 of that math function. Using clang, this is handled by the "-fveclib" command
 line option with one of the following vector libraries:
 "accelerate,libmvec,massv,svml,sleef,darwin_libsystem_m,armpl,amdlibm"
@@ -35,7 +35,7 @@ produced the trace file.
 Header Section
 ==============

-A trace file begins with a 32 byte header.
+A trace file begins with a 32-byte header.

 +-------------------+-----------------+----------------------------------------+
 | Field | Size (bytes) | Description |
@@ -119,7 +119,7 @@ attempt to pad for alignment, and it is not seekable.
 Function Records
 ----------------

-Function Records have an 8 byte layout. This layout encodes information to
+Function Records have an 8-byte layout. This layout encodes information to
 reconstruct a call stack of instrumented function and their durations.

 +---------------+--------------+-----------------------------------------------+
@@ -178,7 +178,7 @@ records for each of the logged args follow the function record in the stream.
 Metadata Records
 ----------------

-Interspersed throughout the buffer are 16 byte Metadata records. For typically
+Interspersed throughout the buffer are 16-byte Metadata records. For typically
 instrumented binaries, they will be sparser than Function records, and they
 provide a fuller picture of the binary execution state.
