Since many LLM operations tend to be I/O bound, it is often useful to use threads to perform multiple operations at once. Usually, you’ll use the ThreadPoolExecutor class from the concurrent.futures module in the Python standard library, like this:
from concurrent.futures import ThreadPoolExecutor

import pinecone

indexes = [pinecone.Index(f"index{i}") for i in range(3)]
executor = ThreadPoolExecutor(max_workers=3)
for i in range(3):
    executor.submit(indexes[i].query, [1.0, 2.0, 3.0], top_k=10)
Unfortunately, this won’t work as you expect and may cause you to see “broken” traces or missing spans. The reason lies in how OpenTelemetry (which is what we use under the hood for tracing) relies on Python’s context variables to propagate the active trace. Worker threads started by the executor don’t inherit that context automatically, so you’ll need to explicitly propagate the context to them:
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

import pinecone

indexes = [pinecone.Index(f"index{i}") for i in range(3)]
executor = ThreadPoolExecutor(max_workers=3)
for i in range(3):
    # Copy context for EACH submit call
    ctx = contextvars.copy_context()
    executor.submit(
        ctx.run,
        functools.partial(indexes[i].query, [1.0, 2.0, 3.0], top_k=10),
    )
You must copy the context for each executor.submit() call. A common mistake is to copy the context once and reuse it for every submission:
# WRONG - causes race conditions and missing spans
ctx = contextvars.copy_context()  # Only copied once!
for i in range(3):
    executor.submit(ctx.run, my_function)  # All threads share same ctx
This causes race conditions when multiple threads call ctx.run() simultaneously on the same context object, resulting in missing or orphaned spans. Always copy the context inside the loop:
# CORRECT - each thread gets its own context copy
for i in range(3):
    ctx = contextvars.copy_context()  # Fresh copy for each submit
    executor.submit(ctx.run, my_function)
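If you submit work from several places, it can help to wrap the copy-per-submit pattern in a small helper so it can’t be forgotten. Below is a minimal, self-contained sketch (the helper name submit_with_context is our own, not part of any library); it uses a plain ContextVar instead of a real trace context to demonstrate that each task sees the value captured at its own submit time:

```python
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

request_id = contextvars.ContextVar("request_id")

def submit_with_context(executor, fn, *args, **kwargs):
    # Copy the current context at submit time, so each task runs
    # against its own snapshot (this is what keeps traces intact).
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, functools.partial(fn, *args, **kwargs))

def read_request_id():
    # Runs in a worker thread, inside the copied context
    return request_id.get()

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = []
    for i in range(3):
        request_id.set(i)  # set in the main thread before each submit
        futures.append(submit_with_context(executor, read_request_id))
    results = [f.result() for f in futures]

print(results)  # each task reports the value from its own context copy
```

Because the context is copied inside submit_with_context, each future observes the request_id that was current when it was submitted, even though all three tasks run concurrently on shared worker threads.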
