May 16, 2024

Efficient Python Data Handling with MemoryView

By Alyce Osbourne

Working with low-level buffers may not come up often in Python, but some applications require it. Whether we are writing serializers or interpreters, or working with sockets, we need ways to slice, reshape, and modify buffers without slowing the program down or inflating its memory use.

Breaking down the problem

Slicing and manipulating bytes and bytearrays typically requires creating copies, which can significantly increase the memory usage of your applications as the data size grows. Additionally, it can extend the overall runtime as the data needs to be reassembled.

Your initial thought might be to use external libraries like numpy, which is a common choice. However, in some cases, adding multiple dependencies may not be feasible or preferred.

So let’s take a look at how we might approach this in Python without any libraries.

MemoryView

A memoryview is a built-in object that allows you to access an underlying object’s buffer interface without the need to create duplicate copies of the data. This feature makes it ideal for applications that require the efficient handling of large amounts of data.

Any modifications made through the view are reflected in the original object (provided that object is mutable), which makes it easy to interpret, slice, and alter the buffer in place. A memoryview behaves somewhat like a numpy array, but with far fewer capabilities.
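As a minimal sketch of the difference, the snippet below slices the same bytearray twice: once directly, which copies, and once through a memoryview, which does not. The variable names are only illustrative.

```python
data = bytearray(b"0123456789")

# Slicing a bytearray builds a new, independent object.
copied = data[2:6]
copied[0] = ord("X")
print(data)  # bytearray(b'0123456789'), the original is untouched

# Slicing a memoryview yields another view onto the same memory.
view = memoryview(data)
window = view[2:6]
window[0] = ord("X")
print(data)  # bytearray(b'01X3456789'), the original sees the write

# .obj is the object whose buffer the view exposes.
print(window.obj is data)  # True
```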

What can be used as buffers?

The main objects you are likely to use with a memoryview are:

bytes
bytearray
array.array
ctypes arrays
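The following sketch wraps one object of each kind in a memoryview. It assumes CPython; the element type "d" and the four-byte ctypes array are arbitrary choices made for illustration. Note that a view over bytes is read-only, and that a ctypes array is cast to plain bytes ("B") here before it is indexed.

```python
import array
import ctypes

# bytes is immutable, so its view is read-only.
ro_view = memoryview(b"capybara")
print(ro_view.readonly)  # True

# bytearray is mutable, so its view accepts writes.
rw_view = memoryview(bytearray(b"capybara"))
print(rw_view.readonly)  # False

# array.array holds typed items; writes through the view land in the array.
numbers = array.array("d", [1.0, 2.0, 3.0])
num_view = memoryview(numbers)
num_view[1] = 20.0
print(numbers[1])  # 20.0

# ctypes arrays: cast the view to unsigned bytes before indexing it.
raw = (ctypes.c_ubyte * 4)()
byte_view = memoryview(raw).cast("B")
byte_view[0] = 255
print(raw[0])  # 255
```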

When might I need to use this?

There are lots of cases where you might be working with streams of data:

Web Applications
Device Interfaces
Multimedia Editing
Interpreters
GPU Rendering

While there are many libraries that help facilitate these, sometimes the application you are writing involves directly working with this data, and in high-speed or high-volume environments, making duplicates of data can come at a cost.
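To make the cost concrete, here is a small sketch of reading a stream into one preallocated buffer. An io.BytesIO object stands in for a real socket or device, and the 17-byte message with a 4-byte header is invented for the example; with a real socket, recv_into would play the role that readinto plays here.

```python
import io

# Stand-in for a socket or device: any object with readinto() would do.
stream = io.BytesIO(b"HEADpayload-bytes")

buffer = bytearray(17)  # allocated once, filled in place
view = memoryview(buffer)

# Fill the buffer in chunks; each view[...] slice is a window, not a copy.
filled = 0
while filled < len(buffer):
    n = stream.readinto(view[filled:filled + 8])
    if n == 0:  # stream exhausted
        break
    filled += n

# Split the message without copying it.
header = view[:4]
payload = view[4:filled]
print(bytes(header), bytes(payload))  # b'HEAD' b'payload-bytes'
```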

Using Memoryview with Bytearrays

Let’s see how we can use memoryview to manipulate a bytearray:

Creation: You first need to create a memoryview from the bytearray.

arr = bytearray(b"capybara")
view = memoryview(arr)

Slicing: You can slice the memoryview, which doesn’t create a copy of the sliced data.

capy = view[0:4]  # views b"capy"
bara = view[4:]   # views b"bara"

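Modifying: Changes made to the memoryview are reflected in the original bytearray. The replacement must be exactly as long as the slice it overwrites, because a memoryview cannot resize its buffer.

view[0:4] = b"Bara"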
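Putting the three steps together, the sketch below runs them end to end and also shows two limits worth knowing about: a slice assignment of the wrong length is rejected, and the bytearray cannot be resized while any view of it is still alive.

```python
arr = bytearray(b"capybara")
view = memoryview(arr)

capy = view[0:4]
bara = view[4:]

# A same-length write through the view changes the bytearray itself.
view[0:4] = b"Bara"
print(arr)          # bytearray(b'Barabara')
print(bytes(capy))  # b'Bara', the slice sees the change too

# A memoryview cannot grow or shrink its buffer.
try:
    view[0:3] = b"Bara"
except ValueError as exc:
    print("rejected:", exc)

# The bytearray cannot be resized while views are exported.
try:
    arr.extend(b"!")
except BufferError as exc:
    print("rejected:", exc)

# Once every view is released, resizing works again.
for v in (capy, bara, view):
    v.release()
arr.extend(b"!")
print(arr)  # bytearray(b'Barabara!')
```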

Shaping: You can also shape a memoryview, but its functionality is very limited compared to numpy.

reshaped = view.cast('c', (4, 2))
reshaped[(0, 0)] = b'c'
reshaped[(0, 1)] = b'a'
reshaped[(1, 0)] = b'p'
reshaped[(1, 1)] = b'y'

This lets you modify the array much as before, but using tuple indices.

Note

The memoryview does not support multi-dimensional subviews, so indexing a reshaped view with a single integer does not return a row the way a nested list would. Instead, you must supply a full tuple of indices, which is combined with the view’s strides to locate the element in the underlying buffer.

Valid

for y in range(4):
    for x in range(2):
        print(reshaped[(y, x)].decode(), end='')
    print()

Invalid

for y in range(4):
    # Raises NotImplementedError: a single index would need a sub-view
    print(reshaped[y].decode())

This is one of the ways it’s much more limited than numpy.
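If you do need whole rows, one possible workaround is to keep the flat view around and slice it yourself. The sketch below assumes the same 4 x 2 layout as above; the row helper is hypothetical and not part of memoryview.

```python
arr = bytearray(b"capybara")
flat = memoryview(arr)
rows, cols = 4, 2
grid = flat.cast("c", (rows, cols))

print(grid.shape)     # (4, 2)
print(grid.tolist())  # nested lists of single bytes, one inner list per row

# A single integer index is not supported on a multi-dimensional view.
try:
    grid[0]
except NotImplementedError as exc:
    print("not supported:", exc)


def row(y: int) -> memoryview:
    """Hypothetical helper: return row y as a slice of the flat view."""
    return flat[y * cols:(y + 1) * cols]


print([bytes(row(y)).decode() for y in range(rows)])  # ['ca', 'py', 'ba', 'ra']
```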

Buffer objects

Since Python 3.12, classes written in Python can implement the buffer protocol themselves by defining __buffer__ and, optionally, __release_buffer__. This makes it possible to write wrappers around memoryview that add customized behavior.

import inspect


class BufferedObject:
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.view = None

    def __buffer__(self, flags: inspect.BufferFlags) -> memoryview:
        if flags != inspect.BufferFlags.FULL_RO:
            raise TypeError("Only BufferFlags.FULL_RO supported")
        if self.view is not None:
            raise RuntimeError("Buffer already held")
        self.view = memoryview(self.data)
        return self.view

    def __release_buffer__(self, view: memoryview):
        self.view = None
        view.release()

    def extend(self, b: bytes) -> None:
        if self.view is not None:
            raise RuntimeError("Cannot extend held buffer")
        self.data.extend(b)

To implement a class that works with Python’s buffer protocol, define the following methods:

__buffer__: This method creates and returns a memoryview object. It receives the requested inspect.BufferFlags. Calling memoryview() on the object passes the inspect.BufferFlags.FULL_RO flag, so that is the flag to check for in this scenario.

__release_buffer__: This method is called when a buffer is no longer needed. Its argument (view in the code above) is the memoryview object that was previously returned by __buffer__. All clean-up associated with the buffer must be done in this method. If no special clean-up is needed, this method need not be implemented.
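The sketch below shows how these two methods are driven from the outside. GuardedBuffer is a hypothetical class written for this example: it follows the same pattern but leaves releasing the view to the interpreter. The demo function only runs on Python 3.12 or newer, since older versions do not call __buffer__.

```python
import inspect
import sys


class GuardedBuffer:
    """Hypothetical example: refuses to grow while a view is held."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.view = None

    def __buffer__(self, flags: int) -> memoryview:
        if flags != inspect.BufferFlags.FULL_RO:
            raise TypeError("Only BufferFlags.FULL_RO supported")
        if self.view is not None:
            raise RuntimeError("Buffer already held")
        self.view = memoryview(self.data)
        return self.view

    def __release_buffer__(self, view: memoryview) -> None:
        self.view = None

    def extend(self, extra: bytes) -> None:
        if self.view is not None:
            raise RuntimeError("Cannot extend held buffer")
        self.data.extend(extra)


def demo() -> bytes:
    buf = GuardedBuffer(b"capybara")
    # memoryview(buf) calls buf.__buffer__ behind the scenes.
    with memoryview(buf) as view:
        view[0] = ord("C")
        try:
            buf.extend(b"!")  # refused: a view is still held
        except RuntimeError as exc:
            print("refused:", exc)
    # Leaving the with block released the view, so extending works now.
    buf.extend(b"!")
    with memoryview(buf) as view:
        return view.tobytes()


if sys.version_info >= (3, 12):
    print(demo())  # b'Capybara!'
```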

You can also use collections.abc.Buffer as a type hint for any object that supports the buffer protocol.

from collections.abc import Buffer

buffer: Buffer = BufferedObject(b"capybara")

Final thoughts

As you can see, memoryview, the buffer protocol and Buffer objects provide a powerful and Pythonic tool for interacting with low-level C buffer objects without the need to duplicate data. This is especially powerful in situations where you are working with a high volume of data and external systems.
