[chore] Refresh refcache, fix external links with invalid fragments (#6206)

Patrice Chalin 2025-02-06 06:35:44 -05:00 committed by GitHub
parent fe623719bc
commit 3f5742fb4c
17 changed files with 1302 additions and 1162 deletions

@ -111,9 +111,9 @@ that our Kafka installation is working as expected.
### Export metrics to Prometheus
The metrics can be exported by any of the supported metric exporters, to a
backend of your choice. The full list of exporters and their configuration
options can be found
[here](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md#exporters).
backend of your choice. For the full list of exporters and their configuration
options, see
[Properties: exporters](/docs/languages/java/configuration/#properties-exporters).
For instance, you can export the metrics to an OTel collector using the OTLP
exporter, perform some processing and then consume the metrics on a backend of
your choice. In this example for the sake of simplicity, we are directly

@ -65,7 +65,7 @@ wanted to focus on:
OTLP exporter.
2. Individual Go modules that the Collector components rely upon must also be
marked as stable as per the project's
[versioning guidelines](https://github.com/open-telemetry/opentelemetry-collector/blob/main/VERSIONING.md#public-api-expectations).
[versioning guidelines](https://github.com/open-telemetry/opentelemetry-collector/blob/main/VERSIONING.md#general-go-api-considerations).
Aside from this, there were a few areas the contributors wanted to improve based
on user feedback:

@ -47,7 +47,7 @@ To become a code owner of one of the modules, you need to be a member of the
OpenTelemetry organization and have a good working knowledge of the code you
seek to maintain. To become a member of OpenTelemetry in GitHub, see the
requirements in
[Community membership](https://github.com/open-telemetry/community/blob/main/community-membership.md#requirements).
[Community membership](https://github.com/open-telemetry/community/blob/main/guides/contributor/membership.md#requirements).
If you satisfy all requirements,
[open an issue](https://github.com/open-telemetry/opentelemetry-go-contrib/issues/new?assignees=&labels=&projects=&template=owner.md&title=).

@ -285,7 +285,7 @@ The result is a configurable option unique to OpenTelemetry Java called
what their memory mode is based on whether they read metric state concurrently
or not. Right now you opt into the optimized memory behavior (which we call
`MemoryMode.reusable_data`) via an
[environment variable](https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/autoconfigure#exporters).
[environment variable](/docs/languages/java/configuration/#properties-exporters).
In the future, the optimized memory mode will be enabled by default, since only
exceptional cases need concurrent access to the metric state. It turns out that
the objects holding the metric state (`MetricData` in OpenTelemetry Java terms)

@ -10,8 +10,7 @@ cSpell:ignore: otlphttp spanmetrics tracetest tracetesting
## Prerequisites
- Docker
- [Docker Compose](https://docs.docker.com/compose/install/#install-compose)
v2.0.0+
- [Docker Compose](https://docs.docker.com/compose/install/) v2.0.0+
- Make (optional)
- 6 GB of RAM for the application

@ -329,7 +329,7 @@ See the full `OpenTelemetryCollector`
### Did you configure a ServiceMonitor (or PodMonitor) selector?
If you configured a
[`ServiceMonitor`](https://observability.thomasriley.co.uk/prometheus/configuring-prometheus/using-service-monitors/#:~:text=The%20ServiceMonitor%20is%20used%20to,build%20the%20required%20Prometheus%20configuration.)
[`ServiceMonitor`](https://observability.thomasriley.co.uk/prometheus/configuring-prometheus/using-service-monitors/)
selector, it means that the Target Allocator only looks for `ServiceMonitors`
having a `metadata.label` that matches the value in
[`serviceMonitorSelector`](https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr-1).

@ -245,10 +245,10 @@ value=8192, exemplars=[]}], monotonic=false, aggregationTemporality=CUMULATIVE}}
For more:
- Run this example with another [exporter][] for telemetry data.
- Run this example with another [exporter] for telemetry data.
- Try [zero-code instrumentation](/docs/zero-code/java/agent/) on one of your
own apps.
- For light-weight customized telemetry, try [annotations][].
- For light-weight customized telemetry, try [annotations].
- Learn about [manual instrumentation][] and try out more
[examples](../examples/).
- Take a look at the [OpenTelemetry Demo](/docs/demo/), which includes Java
@ -260,10 +260,8 @@ For more:
[logs]: /docs/concepts/signals/logs/
[annotations]: /docs/zero-code/java/agent/annotations/
[configure the java agent]: /docs/zero-code/java/agent/configuration/
[console exporter]:
https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md#logging-exporter
[exporter]:
https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/README.md#exporters
[console exporter]: /docs/languages/java/configuration/#properties-exporters
[exporter]: /docs/languages/java/configuration/#properties-exporters
[java-vers]:
https://github.com/open-telemetry/opentelemetry-java/blob/main/VERSIONING.md#language-version-compatibility
[manual instrumentation]: ../instrumentation

@ -12,7 +12,6 @@ cSpell:ignore: mbstring opcache
## Further Reading
- [OpenTelemetry for PHP on GitHub](https://github.com/open-telemetry/opentelemetry-php)
- [Installation](https://github.com/open-telemetry/opentelemetry-php#installation)
- [Examples](https://github.com/open-telemetry/opentelemetry-php/tree/main/examples)
## Requirements

@ -189,7 +189,7 @@ a few more features that will allow you gain even deeper insights!
[traces]: /docs/concepts/signals/traces/
[instrumentations]:
https://github.com/open-telemetry/opentelemetry-ruby#instrumentation-libraries
https://github.com/open-telemetry/opentelemetry-ruby-contrib/tree/main/instrumentation
[config]: ../libraries/#configuring-specific-instrumentation-libraries
[exporters]: ../exporters/
[context propagation]: ../instrumentation/#context-propagation

@ -36,7 +36,7 @@ project. For example, if you import the `spring-boot-dependencies` BOM, you have
to declare it after the OpenTelemetry BOMs.
Gradle selects the
[latest version](https://docs.gradle.org/current/userguide/dependency_resolution.html#sec:version-conflict)
[latest version](https://docs.gradle.org/current/userguide/dependency_resolution.html#2_perform_conflict_resolution)
of a dependency when multiple BOMs declare it, so the order of BOMs is not important.
{{% /alert %}}
@ -106,7 +106,7 @@ with the `io.spring.dependency-management` plugin.
Add the dependency given below to enable the OpenTelemetry starter.
The OpenTelemetry starter uses OpenTelemetry Spring Boot
[autoconfiguration](https://docs.spring.io/spring-boot/docs/current/reference/html/using.html#using.auto-configuration).
[autoconfiguration](https://docs.spring.io/spring-boot/reference/using/auto-configuration.html).
{{< tabpane text=true >}} {{% tab header="Maven (`pom.xml`)" lang=Maven %}}

@ -157,7 +157,7 @@ that require the assemblies used to instrument .NET Framework applications, the
ones under the `netfx` folder of the installation directory, to be also
installed into the Global Assembly Cache (GAC):
1. [**Monkey patch instrumentation**](https://en.wikipedia.org/wiki/Monkey_patch#:~:text=Monkey%20patching%20is%20a%20technique,Python%2C%20Groovy%2C%20etc.)
1. [**Monkey patch instrumentation**](https://en.wikipedia.org/wiki/Monkey_patch)
of assemblies loaded as domain-neutral.
2. Assembly redirection for strong-named applications if the app also ships
different versions of some assemblies also shipped in the `netfx` folder.

@ -212,7 +212,7 @@
commercial: true
- name: Logz.io
nativeOTLP: false
url: https://docs.logz.io/shipping/tracing-sources/opentelemetry.html#overview
url: https://docs.logz.io/docs/shipping/other/opentelemetry-data/
contact:
oss: false
commercial: true
@ -344,7 +344,7 @@
commercial: true
- name: SolarWinds
nativeOTLP: true
url: https://documentation.solarwinds.com/en/success_center/observability/default.htm#cshid=third-otel-integration
url: https://documentation.solarwinds.com/en/success_center/observability/content/intro/otel.htm
contact:
oss: false
commercial: true

@ -16,6 +16,6 @@ authors:
url: https://www.cisco.com/
urls:
website: https://www.cisco.com/c/en/us/products/cloud-systems-management/network-services-orchestrator/index.html
docs: https://developer.cisco.com/docs/nso/#!observability-exporter/
docs: https://developer.cisco.com/docs/nso/observability-exporter/
createdAt: '2024-08-06'
isFirstParty: true

@ -118,6 +118,7 @@ sub patchSemConv1_30_0() {
s|(docs/specs/otel/logs/api.md#emit-a)n-event|$1-logrecord|;
s|\[semantic-convention-groups\]|[group-stability]|;
s|\Q../../docs/|../|g; # https://github.com/open-telemetry/semantic-conventions/pull/1843
s|\Qhttps://wikipedia.org/wiki/Where_(SQL)#IN|https://wikipedia.org/wiki/SQL_syntax#Operators|g;
}
sub getVersFromSubmodule() {

@ -2,10 +2,37 @@
import fs from 'fs/promises';
import { getUrlStatus, isHttp2XX } from './get-url-status.mjs';
import { exit } from 'process';
const CACHE_FILE = 'static/refcache.json';
const GOOGLE_DOCS_URL = 'https://docs.google.com/';
let checkForFragments = false;
let maxNumEntriesToUpdate = 3;
const cratesIoURL = 'https://crates.io/crates/';
// Magic numbers that we use to determine if a URL with a fragment has been
// checked with this script. Since we can't add new fields to the cache, we
// encode "magic" values in the LastSeen field.
const fragSecondsOk = 12;
const fragMillisecondsOk = 345;
const fragSecondsInvalid = 59;
const fragMillisecondsInvalid = 999;
function isHttp2XXForFragments(StatusCode, lastSeenDate) {
return (
isHttp2XX(StatusCode) &&
lastSeenDate.getSeconds() === fragSecondsOk &&
lastSeenDate.getMilliseconds() === fragMillisecondsOk
);
}
function is4XXForFragments(StatusCode, lastSeenDate) {
return (
lastSeenDate.getSeconds() === fragSecondsInvalid &&
lastSeenDate.getMilliseconds() === fragMillisecondsInvalid
);
}
async function readRefcache() {
try {
const data = await fs.readFile(CACHE_FILE, 'utf8');
@ -18,42 +45,154 @@ async function readRefcache() {
async function writeRefcache(cache) {
await fs.writeFile(CACHE_FILE, JSON.stringify(cache, null, 2) + '\n', 'utf8');
console.log(`Updated ${CACHE_FILE} with fixed links.`);
console.log(`Wrote updated ${CACHE_FILE}.`);
}
// Retry HTTP status check for refcache URLs with non-200s and not 404
async function retry400sAndUpdateCache() {
console.log(`Checking ${CACHE_FILE} for 4XX status URLs ...`);
const cache = await readRefcache();
let updated = false;
let updatedCount = 0;
let entriesCount = 0;
let urlWithFragmentCount = 0;
let urlWithInvalidFragCount = 0;
let statusCounts = {};
for (const [url, details] of Object.entries(cache)) {
entriesCount++;
const parsedUrl = new URL(url);
if (parsedUrl.hash) urlWithFragmentCount++;
const { StatusCode, LastSeen } = details;
if (isHttp2XX(StatusCode)) continue;
if (StatusCode === 404 && !url.startsWith(cratesIoURL)) {
console.log(`Skipping 404: ${url} (last seen ${LastSeen}).`);
const lastSeenDate = new Date(LastSeen);
countStatuses(StatusCode, parsedUrl, lastSeenDate, statusCounts);
if (
checkForFragments && parsedUrl.hash
? isHttp2XXForFragments(StatusCode, lastSeenDate)
: isHttp2XX(StatusCode)
) {
// process.stdout.write('.');
continue;
}
process.stdout.write(`Checking: ${url} (was ${StatusCode}) ... `);
const verbose = false;
const status = await getUrlStatus(url, verbose);
if (
(StatusCode === 404 &&
// Handles special case of crates.io. For details, see:
// https://github.com/rust-lang/crates.io/issues/788
!url.startsWith(cratesIoURL)) ||
(parsedUrl.hash && is4XXForFragments(StatusCode, lastSeenDate))
) {
console.log(
`Skipping ${StatusCode}: ${url} (last seen ${lastSeenDate.toLocaleDateString()})${
is4XXForFragments(StatusCode, lastSeenDate) ? ' INVALID FRAGMENT' : ''
}`,
);
if (parsedUrl.hash) urlWithInvalidFragCount++;
continue;
}
if (url.startsWith(GOOGLE_DOCS_URL)) {
// console.log(`Skipping Google Docs URL (for now): ${url}.`);
// process.stdout.write('.');
continue;
/*
URLs are of the form:
https://docs.google.com/document/d/15vR7D1x2tKd7u3zaTF0yH1WaHkUr2T4hhr7OyiZgmBg/edit?tab=t.0#heading=h.4xuru5ljcups
We can simply check for the presence of the heading query parameter value in the page.
"ps_hdid":"h.4xuru5ljcups" # cSpell:disable-line
*/
}
if (maxNumEntriesToUpdate && updatedCount >= maxNumEntriesToUpdate) {
console.log(`Updated max of ${maxNumEntriesToUpdate} entries, exiting.`);
break;
}
process.stdout.write(
`Checking${
parsedUrl.hash ? ` for fragment in` : `:`
} ${url} (was ${StatusCode}) ... `,
);
let status = await getUrlStatus(url);
console.log(`${status}.`);
if (!isHttp2XX(status)) continue;
let now = new Date();
if (parsedUrl.hash) {
if (isHttp2XX(status)) {
// Encode that the fragment was checked and is valid.
now.setSeconds(fragSecondsOk);
now.setMilliseconds(fragMillisecondsOk);
} else {
status = StatusCode; // Keep the original status, rather than our custom 4XX status.
now.setSeconds(fragSecondsInvalid);
now.setMilliseconds(fragMillisecondsInvalid);
urlWithInvalidFragCount++;
}
} else if (!isHttp2XX(status)) {
continue;
}
cache[url] = {
StatusCode: status,
LastSeen: new Date().toISOString(),
LastSeen: now.toISOString(),
};
updated = true;
updatedCount++;
}
if (updated) {
if (updatedCount) {
await writeRefcache(cache);
} else {
console.log(`No updates needed.`);
}
console.log(
`Processed ${entriesCount} URLs${
checkForFragments
? ` (${urlWithFragmentCount} with fragments, ${urlWithInvalidFragCount} are invalid)`
: ''
}`,
);
for (const [status, count] of Object.entries(statusCounts)) {
console.log(`Status ${status}: ${count}`);
}
}
function countStatuses(StatusCode, parsedUrl, lastSeenDate, statusCounts) {
let sc = StatusCode;
if (checkForFragments) {
sc += parsedUrl.hash
? ' frag ' +
(isHttp2XXForFragments(StatusCode, lastSeenDate) ? 'ok' : 'er')
: ' no frag';
}
statusCounts[sc] = (statusCounts[sc] || 0) + 1;
}
function getNumericFlagValue(flagName) {
const flagArg = process.argv.find((arg) => arg.startsWith(flagName));
if (!flagArg) return;
const valueArg = flagArg.includes('=')
? flagArg.split('=')[1]
: process.argv[process.argv.indexOf(flagName) + 1];
let value = parseInt(valueArg, 10);
// Reject missing or non-numeric values as well as negative ones.
if (Number.isNaN(value) || value < 0) {
console.error(
`ERROR: invalid value for ${flagName}: ${valueArg}. ` +
`Must be a number >= 0.`,
);
exit(1);
}
return value;
}
const _maxNumEntriesToUpdateFlag = getNumericFlagValue('--max-num-to-update');
if (_maxNumEntriesToUpdateFlag >= 0)
maxNumEntriesToUpdate = _maxNumEntriesToUpdateFlag;
checkForFragments =
process.argv.includes('--check-for-fragments') || process.argv.includes('-f');
await retry400sAndUpdateCache();
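
The "magic number" scheme described in the script's comments can be illustrated in isolation. The following is a minimal sketch, not the script's own code: the helper names `stampLastSeen` and `fragmentState` are hypothetical, and it uses the UTC accessors (`setUTCSeconds`, `getUTCSeconds`) instead of the local-time accessors used in the script, so that the result does not depend on the machine's time zone.

```javascript
// Marker values copied from the constants in the script above.
const FRAG_OK = { seconds: 12, milliseconds: 345 };
const FRAG_INVALID = { seconds: 59, milliseconds: 999 };

// Hypothetical helper: return an ISO timestamp whose seconds and
// milliseconds fields carry the given marker.
function stampLastSeen(date, marker) {
  const stamped = new Date(date.getTime()); // copy; do not mutate the input
  stamped.setUTCSeconds(marker.seconds, marker.milliseconds);
  return stamped.toISOString();
}

// Hypothetical helper: classify a LastSeen value as 'ok', 'invalid',
// or 'unchecked' by reading the marker back out.
function fragmentState(lastSeen) {
  const d = new Date(lastSeen);
  const matches = (m) =>
    d.getUTCSeconds() === m.seconds &&
    d.getUTCMilliseconds() === m.milliseconds;
  if (matches(FRAG_OK)) return 'ok';
  if (matches(FRAG_INVALID)) return 'invalid';
  return 'unchecked';
}

const base = new Date('2025-02-06T11:35:44.000Z');
console.log(stampLastSeen(base, FRAG_OK)); // 2025-02-06T11:35:12.345Z
console.log(fragmentState(stampLastSeen(base, FRAG_INVALID))); // invalid
console.log(fragmentState(base.toISOString())); // unchecked
```

One trade-off of this kind of encoding is that an ordinary timestamp can match a marker by coincidence (one chance in 60,000 per marker), which is presumably acceptable for a cache that is rechecked periodically.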

@ -1,11 +1,15 @@
#!/usr/bin/env node
import puppeteer from 'puppeteer'; // Consider using puppeteer-core
import { URL } from 'url';
const DOCS_ORACLE_URL = 'https://docs.oracle.com/';
const STATUS_OK_BUT_FRAG_NOT_FOUND = 422;
const cratesIoURL = 'https://crates.io/crates/';
let verbose = false;
function log(...args) {
export function log(...args) {
if (!verbose) return;
const lastArg = args[args.length - 1];
if (typeof lastArg === 'string' && lastArg.endsWith(' ')) {
@ -15,11 +19,67 @@ function log(...args) {
}
}
// Check for fragment and corresponding anchor ID in page.
async function checkForFragment(url, page, status) {
const parsedUrl = new URL(url);
if (parsedUrl.hash) {
let fragmentID = parsedUrl.hash.substring(1); // Remove the leading '#'
// if (url.startsWith(DOCS_ORACLE_URL)) { // Would also need for GitHub.com
fragmentID = decodeURIComponent(fragmentID);
// }
let anchorExists =
//
// Look for ID attribute in the page.
//
(await page.evaluate((id) => {
return !!document.getElementById(id);
}, fragmentID)) ||
//
// Look for named anchors
//
(await page.evaluate((name) => {
const elt = document.querySelector(`a[name="${name}"]`);
return !!elt;
}, fragmentID)) ||
//
// Github.com repo special cases
//
(url.startsWith('https://github.com/') &&
(await anchorExistsInGitHub(page, fragmentID)));
if (!anchorExists) status = STATUS_OK_BUT_FRAG_NOT_FOUND;
}
return status;
}
async function anchorExistsInGitHub(page, fragmentID) {
if (/L\d+(-L\d+)?/.test(fragmentID)) {
// Handle line references in GitHub repos.
return await page.evaluate((name) => {
// Line references have no anchor element of their own; look instead for
// a highlighted line in the rendered file view.
return !!document.querySelector('div.highlighted-line');
}, fragmentID);
}
// Handle other fragment references in GitHub repos, link references
// to files (such as README), or to headings inside of displayed markdown.
return await page.evaluate((name) => {
// Look for references to the fragment in the page, possibly with an
// `-ov-file` suffix (used as anchors of tabs in repo landing pages).
const elt = document.querySelector(
`a[href="#${name}"], a[href="#${name}-ov-file"]`,
);
return !!elt;
}, fragmentID);
}
async function getUrlHeadless(url) {
// Get the URL, headless, while trying our best to avoid triggering
// bot-rejection from some servers. Returns the HTTP status code.
log(`Headless fetch of ${url} ... `);
log(`Fetch ${url} headless ... `);
let browser;
try {
@ -62,6 +122,7 @@ async function getUrlHeadless(url) {
if (!crateNameRegex.test(title)) status = 404;
}
status = await checkForFragment(url, page, status);
log(`${status}; page title: '${title}'`);
return status;
@ -87,8 +148,10 @@ async function getUrlInBrowser(url) {
if (!response) throw new Error('No response from server.');
const status = response.status();
log(`HTTP status code: ${status}`);
let status = response.status();
const title = await page.title();
status = await checkForFragment(url, page, status);
log(`${status}; page title: '${title}'`);
return status;
} catch (error) {
@ -107,7 +170,8 @@ export async function getUrlStatus(url, _verbose = false) {
verbose = _verbose;
let status = await getUrlHeadless(url);
// If headless fetch fails, try in browser for non-404 statuses
if (!isHttp2XX(status) && status !== 404) {
if (!isHttp2XX(status) && status !== 404 && status !== 422) {
log(`\n\t retrying in browser ... `);
status = await getUrlInBrowser(url);
}
return status;
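
The status-mapping part of the fragment check can be sketched without a browser. This is a simplified illustration under stated assumptions, not the script's implementation: `fragmentIdOf` and `statusWithFragmentCheck` are hypothetical names, and the `anchorIds` set stands in for the IDs and anchor names that the real script looks up in the rendered page via Puppeteer.

```javascript
// Same value as STATUS_OK_BUT_FRAG_NOT_FOUND in the script above.
const STATUS_FRAG_NOT_FOUND = 422;

// Hypothetical helper: return the decoded fragment ID of a URL,
// or '' when the URL has no fragment.
function fragmentIdOf(url) {
  const hash = new URL(url).hash; // includes the leading '#', or is ''
  return hash ? decodeURIComponent(hash.substring(1)) : '';
}

// Hypothetical helper: keep the HTTP status when the fragment resolves
// (or when there is no fragment), otherwise report 422.
function statusWithFragmentCheck(url, status, anchorIds) {
  const id = fragmentIdOf(url);
  if (!id) return status; // nothing to check
  return anchorIds.has(id) ? status : STATUS_FRAG_NOT_FOUND;
}

const ids = new Set(['requirements', 'héllo']);
console.log(statusWithFragmentCheck('https://example.com/doc#requirements', 200, ids)); // 200
console.log(statusWithFragmentCheck('https://example.com/doc#missing', 200, ids)); // 422
console.log(statusWithFragmentCheck('https://example.com/doc', 200, ids)); // 200
console.log(fragmentIdOf('https://example.com/doc#h%C3%A9llo')); // héllo
```

Because 422 signals "page loaded, fragment missing", the caller can skip the in-browser retry for it, which matches the `status !== 422` condition in `getUrlStatus`.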

File diff suppressed because it is too large.