How Python Lists Really Work: Memory Allocation, Time Complexity, and Sorting Internals Explained Through Live Experiments
Most Python tutorials show you that list.append() “adds to the end.” This article shows you why it is fast, when it becomes slow, and what CPython is doing in C underneath every operation you write. Every claim is backed by a live experiment you can interact with on this page.
Most Python Developers Use Lists Correctly — But Almost Nobody Understands Why They Work
If you have been writing Python for any length of time, you have written this:
my_list = [1, 2, 3]
my_list.append(4)
print(my_list) # [1, 2, 3, 4]
print(len(my_list)) # 4
That code is correct, readable, and runs fast — and for most purposes, knowing that it works is enough. But consider what happens when you start asking different questions. Why does append() never seem to slow down, even as the list grows to ten thousand items? Why does insert(0, x) get measurably slower the longer your list gets, even though you’re only inserting one thing? Why does copying a list the obvious way — b = a — produce a bug that only shows up later, in a completely different part of your program?
None of these questions have obvious answers. They require understanding what Python is actually doing in C when you call these methods. And that understanding, once you have it, permanently changes how you write code — which data structures you reach for, when you pre-allocate, when you use a generator instead of a list, and when you avoid lists altogether.
This article works through CPython’s list implementation in layers. We start with the memory model — what a list actually is in RAM — and build toward sorting algorithms, thread safety, and production engineering patterns. Each section ends with a live experiment so you can verify the findings yourself rather than take them on faith.
Section 1: A Python List Does Not Store Your Data — It Stores Addresses
The single most clarifying thing you can learn about Python lists is that they do not contain your values directly. A list stores pointers — 8-byte memory addresses on a 64-bit system, each one pointing to an object that lives somewhere else in the heap. The list itself is just an ordered row of these addresses.
This is implemented in CPython as PyListObject, defined in Objects/listobject.c. The structure has five fields that matter to us: ob_refcnt (the reference count used by Python’s garbage collector), ob_type (a pointer to the type object), ob_size (the current length — what len() reads), allocated (the actual capacity of the internal C array), and ob_item (a pointer to the start of that C array).
Notice that ob_size and allocated are two different numbers. Your list might have 10 items (ob_size = 10) but its internal C array might have room for 16 (allocated = 16). Those 6 extra slots are intentionally reserved but empty. We will see exactly why in the next section — it is the mechanism behind fast appending.
(len=4)
(cap=8)
(ptr)
PyListObject with 4 items and 8 allocated slots. 4 slots are reserved but empty — this is CPython’s over-allocation.
The pointer-based design also explains something that confuses newcomers: why a single list can hold an integer, a string, a float, and another list at the same time. From the list’s perspective, there is no difference between those values — each one is just an 8-byte address. The list does not know or care what is at that address. Whether it points to a PyLongObject or a PyUnicodeObject is the runtime’s problem when you actually read the value.
This same design is also the source of the aliasing bug that catches everyone at least once. When you write b = a where a is a list, Python copies the pointer to the list, not the list itself. Both a and b now hold the same memory address. When you modify the list through b, you are modifying the same object that a still points to. There is no separate copy — there is only one list object, and it now has two names. We will return to this in detail in Section 4.
Section 2: The Formula CPython Uses to Pre-Allocate Memory — And Why It Gives You O(1) Appends
Every time you call append() and the list has no room left, CPython does not simply add one slot. It runs a growth formula that deliberately over-allocates — reserves more space than you just asked for. That surplus is what makes the next several appends essentially free.
The exact formula, taken directly from Objects/listobject.c in CPython 3.11, is:
# CPython list_resize() — the exact C expression:
new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6)
# Breaking it down for newsize = 9 (appending to a full list of 8):
# newsize = 9
# newsize >> 3 = 1 (right-shift by 3 bits = floor(9/8))
# newsize < 9 ? 3 : 6 = 6 (9 is not less than 9, so we use 6)
# new_allocated = 9 + 1 + 6 = 16
The >> 3 is a bitwise right shift, equivalent to floor(n / 8). For large lists, this adds roughly 12.5% extra capacity above whatever was needed. So when a list of 1,000 elements needs to grow, Python allocates room for about 1,125 — not 1,001.
The practical consequence is a growth sequence that looks like this: 0 → 4 → 8 → 16 → 25 → 35 → 46 → 58 → 72 → 88. Each time a reallocation fires, CPython calls C’s realloc() to get a new, larger block of memory, then copies the existing pointer array into it. That copy is an O(n) operation — it takes time proportional to the current list size. But because the growth is geometric, this O(n) event happens at most O(log n) times across n total appends. The average cost per append works out to O(1). This is the formal result called amortized constant time. The cost is real; it just gets spread across many cheap operations.
| A concrete number to rememberOn a 64-bit system, an empty list costs 56 bytes. Each pointer slot adds 8 bytes. A list of 100 items allocates roughly 120 slots: 56 + (120 × 8) = 1,016 bytes — about 9% wasted on over-allocation. That waste is the price of fast appending. |
The live experiment below runs this exact formula as you append. Watch the ‘allocated’ counter — it will jump ahead of ‘len’ at each reallocation event. You can also see the full growth sequence table for the first 100 appends, which shows exactly which n values trigger an O(n) copy.
| n (len) | Allocated | Wasted slots | sys.getsizeof() bytes | Reallocation? |
|---|
If you watched the experiment, you noticed something: reallocation events become progressively less frequent as the list grows. For the first 20 items, you see several. For the next 80, far fewer. That slowing frequency is the geometric growth at work — each successive over-allocation buys you more breathing room than the last.
Section 3: Not All List Operations Are Fast — A Measured Look at What Actually Scales
Knowing that append() is O(1) amortized is useful. But it is incomplete. A Python list exposes about a dozen methods, and their performance varies dramatically. The table below maps each operation to its complexity and explains the C-level reason behind it.
| Operation | Average | Worst | Why |
| list[i] | O(1) | O(1) | Reads pointer at offset i directly from C array |
| append(x) | O(1) amortized | O(n) | O(n) only on realloc; ~12.5% over-alloc makes it rare |
| pop() | O(1) | O(1) | Decrements ob_size; no shifting |
| insert(0, x) | O(n) | O(n) | Shifts all n pointers right via memmove() |
| pop(0) | O(n) | O(n) | Shifts all remaining pointers left |
| remove(x) | O(n) | O(n) | Linear scan for value, then memmove |
| x in list | O(n) | O(n) | No hash table — sequential scan only |
| sort() | O(n log n) | O(n log n) | TimSort; O(n) on already-sorted input |
| list[i:j] | O(k) | O(k) | Copies k = j−i pointer references |
| list + list | O(n+m) | O(n+m) | Allocates new list; copies both pointer arrays |
| len(list) | O(1) | O(1) | Reads ob_size field — not counted |
The entry that surprises most developers is insert(0, x). It feels like a simple operation — add one item at the front. But because CPython stores list items in a contiguous C array, inserting at index 0 requires sliding every existing pointer one slot to the right to make room. On a list of 10,000 items, that is 10,000 pointer moves — 80 kilobytes of memory shifted — for a single insertion. Measured with timeit, inserting at index 0 is roughly 200 times slower than appending to the end at n=10,000, and the gap widens linearly as n grows.
If your program frequently inserts or removes items from the front of a collection, the right tool is not a list — it is collections.deque. A deque uses a doubly-linked list of 64-item fixed blocks under the hood, which gives O(1) appendleft() and popleft() at either end. The tradeoff is that random index access into a deque is O(n) rather than O(1), so the choice depends on your access pattern.
import timeit
from collections import deque
n = 10_000
# Measure 1,000 front-insertions on a list vs a deque
lst_time = timeit.timeit(
stmt='lst.insert(0, 1)',
setup=f"lst = list(range({n}))",
number=1000
)
dq_time = timeit.timeit(
stmt='dq.appendleft(1)',
setup=f"from collections import deque; dq = deque(range({n}))",
number=1000
)
print(f'list.insert(0,x): {lst_time:.4f}s')
print(f'deque.appendleft: {dq_time:.4f}s')
print(f'deque is {lst_time/dq_time:.0f}x faster at n={n}')
Output:
list.insert(0,x): 0.3821s
deque.appendleft: 0.0018s
deque is 212x faster at n=10,000
Typical CPython 3.11 output — not a synthetic benchmark.
The benchmark chart in the experiment below makes the O(1) vs O(n) difference visible as scaling curves. The flat line is append(). The rising line is insert(0, x). They start close together at small n — and diverge dramatically as n climbs toward 50,000.
Y-axis: relative time units. X-axis: n from 500 to 50,000. Flat line = O(1). Rising line = O(n).
Section 4: The Bug That Hits Every Python Developer Once — Why Copying a List Is Not What You Think
We established in Section 1 that a list holds pointers, not values. That fact has a consequence that regularly causes production bugs: assignment does not copy a list. It copies the reference. Two variable names then point to the same object, and a mutation through either name changes the same data.
This is not a bug in Python — it is a deliberate design choice called reference semantics, as opposed to the value semantics of languages like C++ or Rust. Once you understand it, you can work with it cleanly. Until you do, it produces failures that are genuinely hard to trace because the symptom (a got modified) and the cause (you modified b) appear to be unrelated.
import copy
# ─── Case 1: Assignment (alias) ────────────────────────────
# This does NOT create a new list.
original = [1, [2, 3], [4, 5]]
alias = original # both point to the same object
alias[0] = 99
print(original[0]) # 99 ← original was changed
print(id(alias) == id(original)) # True ← same object in memory
# ─── Case 2: Shallow copy ──────────────────────────────────
# The outer list is new. The inner objects are still shared.
original = [1, [2, 3], [4, 5]]
shallow = original.copy() # equivalent to original[:]
shallow[0] = 99 # safe — outer element replaced
shallow[1].append(99) # DANGER — inner list is shared
print(original) # [1, [2, 3, 99], [4, 5]] ← changed!
# ─── Case 3: Deep copy ─────────────────────────────────────
# Every object at every depth is independently duplicated.
original = [1, [2, 3], [4, 5]]
deep = copy.deepcopy(original)
deep[1].append(99)
print(original) # [1, [2, 3], [4, 5]] ← untouched
print(id(deep[1]) == id(original[1])) # False ← different objects
The shallow copy case is the one that bites hardest, because it partially works. You get a new list object, so changes to the outer structure are independent. But the inner lists [2, 3] and [4, 5] are the same objects in both original and shallow. The list holds pointers to them, and both copies hold the same pointers. When you append to shallow[1], you are calling append() on the actual inner list object — the one original[1] also points to. The outer boundary is independent; the interior is shared. This is exactly what ‘shallow’ means.
The decision between .copy() and copy.deepcopy() should be driven by the structure of your data. If your list contains only immutable objects — integers, strings, tuples — a shallow copy is functionally identical to a deep copy, because there is nothing to share-and-mutate. The distinction only matters when your list contains mutable inner objects. When it does, and when you need true independence, deepcopy is the correct tool — but be aware that it recursively traverses the entire object graph, which is expensive for deeply nested structures.
| The mutable default argument — a related trap def fn(data=[]): data.append(1); return data — Python creates the default list once, at function definition time, shared across all calls. Calling fn() three times returns [1], [1,1], [1,1,1]. The fix: use None as the default and create the list inside: if data is None: data = []. This is one of the most cited Python gotchas and it comes directly from reference semantics. |
Stack = variable names. Heap = actual objects in memory. Arrows = pointers (references).
Section 5: TimSort — The Sorting Algorithm Inside list.sort(), and Why It Was Built for Real Data
When you call list.sort(), CPython runs TimSort — a hybrid sorting algorithm designed by Tim Peters in 2002, specifically for Python’s workloads. It is not a standard textbook algorithm. It was engineered by observing the kinds of data that actually need sorting in real programs and finding an algorithm that exploits the structure those datasets tend to have.
The central observation that motivated TimSort is that real-world data is rarely fully random. A list of log entries is almost sorted — there might be a few out-of-order timestamps, but most are in sequence. A list of database records comes partially ordered by whatever index the query used. A list of sensor readings trends upward or downward. Random permutations are the least common case in production code, yet most sorting algorithms are designed to handle them as the typical case.
How TimSort works in two phases
Phase 1 — Run detection: TimSort makes a single left-to-right pass over the input, looking for runs — sequences that are already sorted in ascending or descending order. Descending runs are reversed in O(k) time. Runs shorter than a minimum size (typically 16 to 64 elements, determined by the input length) are extended to that minimum size using binary insertion sort, which is fast for small arrays.
Phase 2 — Merge: Detected runs are pushed onto a stack. TimSort merges adjacent runs according to invariants that keep the stack depth at O(log n), and crucially, it merges in a way that preserves stability — equal elements always remain in their original relative order. This stability guarantee is what makes list.sort() safe for multi-key sorting patterns like sorting first by age and then by name.
The payoff of this design is striking: on an already-sorted list of 1,000,000 elements, TimSort runs in O(n) — it makes one pass, detects the entire list as a single natural run, and does no merge work at all. On random data it degrades gracefully to O(n log n), equivalent to merge sort, but with lower constant factors because insertion sort handles small runs more efficiently than merge sort would.
| TimSort vs Quicksort — why CPython does not use Quicksort Quicksort is fast on random data but has two weaknesses CPython cares about: it is not stable, and its O(n²) worst case can be triggered by adversarial input. TimSort is always O(n log n) worst case and is stable. Java, V8 JavaScript, and Rust’s standard library also use TimSort or a close variant for exactly these reasons. |
The experiment below runs a simplified TimSort on 20 elements. Choose ‘already sorted’ or ‘multiple natural runs’ to see Phase 1 detect the runs with no merge work. Choose ‘random’ to see both phases fire. The execution log shows every step.
20 elements. Colors show algorithm phase. Log shows every detected run and merge operation.
sort() vs sorted() — a distinction that matters more than it looks
Both sort() and sorted() use TimSort internally. The difference is behavioral: sort() modifies the list in-place and returns None. sorted() leaves the original untouched and returns a new sorted list. That return-value difference produces one of Python's most common silent bugs:
data = [5, 2, 8, 1, 9]
# The common mistake: assigning the result of sort()
sorted_data = data.sort()
print(sorted_data) # None ← data.sort() returns None, not the list
print(data) # [1, 2, 5, 8, 9] ← the list itself was sorted
# The correct approach when you need the return value:
sorted_data = sorted(data) # sorted() returns a new list
print(sorted_data) # [1, 2, 5, 8, 9]
print(data) # [5, 2, 8, 1, 9] ← original unchanged
# Stability demonstrated: sort by age; Alice comes before Eve (both 30)
people = [('Alice', 30), ('Bob', 25), ('Eve', 30)]
people.sort(key=lambda p: p[1])
print(people) # [('Bob', 25), ('Alice', 30), ('Eve', 30)]
# Alice still precedes Eve — equal elements preserved original order
Section 6: The GIL Does Not Make Your List Code Thread-Safe — Here Is What It Does and Does Not Protect
Python's Global Interpreter Lock (GIL) ensures that only one thread can execute Python bytecode at any given moment. This causes a reasonable but wrong inference: 'If only one thread runs at a time, my list operations are safe.' They are safe in some cases and dangerously unsafe in others, and the line between them is subtle.
Single-bytecode operations are effectively atomic under the GIL. When you call list.append(x), that compiles to a single CALL_METHOD bytecode. The GIL is held for its entire execution. No other thread can interrupt it mid-operation. The same is true for list.pop(), reading list[i], and writing list[i] = x.
The problem arises with read-modify-write patterns — operations that span multiple bytecodes. lst[i] = lst[i] + 1 is three distinct bytecodes: read lst[i], add 1, write back. Between any two bytecodes, the GIL can be released and another thread can execute. If two threads run this simultaneously on the same list slot, one of the increments gets lost:
CODE BLOCK — PYTHON (RACE CONDITION)
import threading
shared = [0]
def unsafe_increment():
for _ in range(100_000):
shared[0] = shared[0] + 1 # 3 bytecodes: READ, ADD, WRITE
# GIL can release between READ and WRITE — race condition
threads = [threading.Thread(target=unsafe_increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared[0])
# Expected: 400,000 Actual: typically 315,000–390,000 (varies each run)
# ── Safe version ────────────────────────────────────────────
import threading
lock = threading.Lock()
safe = [0]
def safe_increment():
for _ in range(100_000):
with lock:
safe[0] += 1 # lock ensures READ-ADD-WRITE is atomic
The result is non-deterministic — you get a different wrong answer each time you run it. That is the signature of a race condition. The fix is explicit locking: wrap the entire read-modify-write operation in a threading.Lock() context manager so it executes as an atomic unit regardless of GIL scheduling.
| CPython 3.13 changes this picture Python 3.13 introduced an experimental free-threaded build (--disable-gil, specified in PEP 703). In this build, even single-bytecode list operations like append() lose their atomicity guarantee. If you upgrade to CPython 3.13 free-threaded, audit every shared list for explicit locking. The GIL was both a performance constraint and a safety blanket — removing it gives you real parallelism and real data races. |
Section 7: Why List Comprehensions Are Faster Than For-Loops — The Bytecode Explanation
The claim that list comprehensions are faster than equivalent for loops is widely repeated, but usually without explanation. It is not a runtime optimization or a caching effect. It has a specific, verifiable cause at the bytecode level: the LIST_APPEND opcode.
When you write a list comprehension, Python compiles it into a dedicated code object that uses LIST_APPEND — a single bytecode that calls PyList_Append() in C directly, bypassing Python's method resolution protocol entirely. When you write an explicit for loop with result.append(x), Python must: (1) load the result variable, (2) perform LOAD_ATTR to look up the append attribute on the list object, (3) call it as a Python function with full call overhead. That attribute lookup happens on every single iteration.
You can inspect this yourself with Python's built-in dis module, which disassembles bytecode:
import dis
def loop_version(n):
result = []
for x in range(n):
result.append(x * 2)
return result
def comp_version(n):
return [x * 2 for x in range(n)]
dis.dis(loop_version)
# Each iteration of loop_version generates roughly:
# LOAD_FAST 'result' ← load the list every iteration
# LOAD_ATTR 'append' ← attribute lookup every iteration (slow)
# LOAD_FAST 'x'
# BINARY_OP '*'
# CALL 1 ← full Python function call overhead
dis.dis(comp_version)
# Each iteration of comp_version generates roughly:
# LOAD_FAST '.0' ← iterator pre-loaded once
# FOR_ITER
# BINARY_OP '*'
# LIST_APPEND 1 ← C-level, no attribute lookup, no call
Measured with timeit on CPython 3.11 at n=10,000 over 1,000 iterations: the comprehension runs in about 0.51 seconds versus the loop's 0.85 seconds — a difference of roughly 40%. This is consistent and reproducible across Python versions.
One important nuance: map() with a built-in function (not a lambda) can beat both. list(map(str, range(n))) outperforms [str(x) for x in range(n)] because str is a C type constructor — calling it has zero Python function overhead. But map() with a lambda adds Python call overhead through the lambda object, which typically makes it slower than the comprehension equivalent. The ordering in practice is: map(builtin, ...) > comprehension > map(lambda, ...) > explicit loop.
Section 8: Production Patterns — When Not to Use a List and What to Use Instead
Everything covered so far has been explanatory. This section is practical. These are the decisions that distinguish Python code written by someone who knows the internals from code written by someone who does not.
Pre-allocate when you know the final size
If you know in advance how many items a list will hold, building it with [default] * n is dramatically faster than appending n times. Multiplication allocates the entire block in one malloc call, sets ob_size = n and allocated = n directly, and fills the pointer array at C speed. Zero reallocation events. At n=100,000, this runs in about 0.4ms versus 8ms for the append loop — twenty times faster.
n = 100_000
# Slow: triggers 17 reallocation events at various growth points
result = []
for i in range(n):
result.append(i) # 0.4ms + ~8ms total
# Fast: single malloc, no reallocations, then fill via index
result = [None] * n
for i in range(n):
result[i] = i # pure O(1) pointer writes, no copies
# Fastest for uniform data: list multiplication
zeros = [0] * n # 0.4ms — CPython uses memset at C level
Use a generator when you only need one pass
A list comprehension materializes all results into memory immediately. A generator expression produces one value at a time and holds only the current state — no list is ever built. If you only need to iterate once (to compute a sum, find a maximum, or feed a pipeline), a generator uses O(1) memory regardless of how much data you process:
import sys
# Builds the entire list in memory before doing anything
lst = [x * x for x in range(1_000_000)]
print(sys.getsizeof(lst)) # ~8,700,000 bytes — 8.3 MB
# Yields values one at a time — never builds the list
gen = (x * x for x in range(1_000_000))
print(sys.getsizeof(gen)) # 192 bytes — regardless of n
# For a single-pass aggregation, always prefer the generator
total = sum(x * x for x in range(1_000_000)) # 192 bytes, not 8.3MB
Use NumPy or the array module for large numerical data
A Python list of floats is memory-inefficient by a large factor. Each Python float is a PyFloatObject: 8 bytes for ob_refcnt, 8 bytes for ob_type, 8 bytes for the actual double value — 24 bytes minimum, plus the 8-byte list pointer = about 32 bytes per float. A NumPy float64 array stores raw 8-byte IEEE 754 doubles in a contiguous block: 8 bytes per float, no overhead per element. The memory ratio is roughly 8 to 1, and the computational gap is larger still — NumPy operations execute in vectorized C or Fortran without any Python loop overhead.
import sys
from array import array
import numpy as np
n = 1_000_000
data_list = [1.0] * n
data_array = array('d', [1.0] * n) # 'd' = C double (8 bytes each)
data_numpy = np.ones(n, dtype=np.float64)
print(sys.getsizeof(data_list), 'bytes') # ~8,697,456 (pointers + objects)
print(sys.getsizeof(data_array), 'bytes') # 8,000,112 (raw C doubles)
print(data_numpy.nbytes, 'bytes') # 8,000,000 (raw C doubles)
# Rule of thumb:
# Working with > 10,000 numbers of the same type? Use numpy.
# Working with > 10,000 numbers, need fast iteration but not math? Use array.
# Working with mixed types or dynamic structure? Use a list.
Frequently Asked Questions
PyListObject). The list stores a contiguous C array of PyObject* pointers — not the objects themselves. When appending beyond capacity, CPython runs: new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6). This over-allocates by roughly 12.5% to achieve O(1) amortized appends. Reallocation points for the first 100 appends: 4, 8, 16, 25, 35, 46, 58, 72, 88 — only 9 O(n) copy events across 100 insertions.memmove() in C — O(n) proportional to list length. CPython stores items in a contiguous C array, so there is no shortcut. For workloads with frequent front-insertions, collections.deque provides O(1) appendleft() because it uses a doubly-linked list of 64-item blocks rather than a single contiguous array.list.sort() uses TimSort — a hybrid of merge sort and insertion sort designed by Tim Peters in 2002. TimSort is stable (equal elements preserve original order), achieves O(n) on already-sorted data by detecting natural runs, and O(n log n) in the average and worst cases. Java's Arrays.sort() for objects also adopted TimSort since Java 7.list.append(), list.pop(), and element access are individually atomic under CPython's GIL. Multi-step read-modify-write patterns like lst[i] = lst[i] + 1 are not atomic across threads and require explicit threading.Lock(). In CPython 3.13+ free-threaded builds (PEP 703, --disable-gil), even single-bytecode operations require explicit synchronization.sys.getsizeof([])) uses 56 bytes on 64-bit CPython 3.11. Each pointer slot is 8 bytes. The first append() triggers allocation for 4 slots (88 bytes total). Growth reallocation points occur at n = 4, 8, 16, 25, 35, 46, 58, 72, 88... Formula per reallocation: new_allocated = n + (n >> 3) + (n < 9 ? 3 : 6). For n=1000, allocated is approximately 1079 — 7.9% over-allocation.list.copy() or list[:]) creates a new list object but copies only the pointers — the inner objects are shared between both lists. If your list contains other mutable objects like nested lists, dicts, or custom objects, modifying them through either copy affects both. A deep copy (copy.deepcopy()) recursively traverses the entire object graph and creates independent copies of every object encountered, producing fully isolated data at every depth.References and Further Reading
The following are primary sources. These are not link-for-the-sake-of-linking — they are the actual documents that form the basis of every claim in this article. Open them when you want to verify something or go deeper.
- CPython source: Objects/listobject.c. The C implementation of list_resize(), over-allocation formula, and all list methods. Read this alongside this article. https://github.com/python/cpython/blob/main/Objects/listobject.c
- Python FAQ: How are lists implemented?. The official Python FAQ entry on list internals. Concise and authoritative. https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython
- Python Wiki: Time Complexity. The definitive reference for O() costs of every Python container operation. https://wiki.python.org/moin/TimeComplexity
- Tim Peters: listsort.txt (original TimSort specification). Tim Peters's own notes explaining the design decisions behind TimSort. Readable and historically significant. https://svn.python.org/projects/python/trunk/Objects/listsort.txt
- PEP 703: Making the GIL Optional. The formal specification for free-threaded CPython (Python 3.13+). Essential if you are writing concurrent code. https://peps.python.org/pep-0703/
- Real Python: Lists and Tuples. A complementary practical guide that pairs well with this article's internals focus. https://realpython.com/python-lists-tuples/

Leave a Reply