Using Trio inside Jupyter notebook?

mehaase · July 16, 2019, 9:23pm

Jupyter has built-in support for trio so you can run coroutines inside a cell (doesn’t have to be inside an async def).

This is pretty nifty, but its usefulness is limited by structured concurrency. For example, if I have a connection object that relies on a background task (like a Trio Websocket), then that object’s lifetime has to exist entirely inside of a nursery. Jupyter does not run cells concurrently, which means I can’t create a connection object in one cell and use it in another. Here’s an example:

In the second cell, I want to create a connection. The connection needs a nursery to spawn a background task into, but where can I obtain such a nursery from? In ordinary code, I would create a new nursery:

But this nursery will not exit until all tasks finish, including the connection’s background task. This means I can’t use the connection in any other cell.

Of course, this is exactly how Trio is supposed to work! It’s structured concurrency after all. That’s why I’m posting this here instead of GitHub. Has anybody tried to do something like this and are there any tips for making it work? I’m willing to use a hacky solution, since this is just for experimenting and not for production code.

I tried various sketchy ideas but nothing worked:

I think this probably requires some support in Jupyter itself, e.g. a global nursery that is created for you. (Jupyter’s trio integration uses trio.run() on each cell, which is also a big blocker, I think…)

mehaase · July 17, 2019, 2:38pm

I looked into this a bit more, and I shouldn’t have been surprised to find that Nathaniel has already commented on this over on the IPython project:

I opened up a new issue to track this:

zthompson47 · August 19, 2019, 10:03pm

I use IPython in the terminal for most things, but I’d love to see it supported in Jupyter notebooks. I’d probably use notebooks a lot more with more Trio support.

As an alternative to a global nursery object, there could be magic to run notebook cells concurrently as tasks. You could open a nursery block which runs in the background as you edit other notebook cells:

In [1]: %task async with trio.open_nursery() as nursery:
            nursery.start_soon(ws_reverse_server, 8888)

I also saw a feature request in ipython to “Create a %with magic”, which would expose the context manager variable to the interactive shell. Extending that idea, I could see a cell with a magic background task, running a context manager with an interactive prompt:

In [2]: %task async with open_websocket_url("ws://localhost:8888") as ws:
        >>> await ws.send_message(b"Hello, world.")
        >>> await ws.get_message()
        b'.dlrow ,olleH'
        >>>

Not really sure how viable these ideas are, but it’s interesting to consider the possibilities…

I tried some approaches to getting your example code to work with the current IPython terminal and came up with a few results (one of them actually works).

The first one was:

import atexit
from functools import partial

import IPython
import trio


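def ipython_embed(nursery):
    # Runs in a worker thread. The wrapper lets the IPython session
    # schedule tasks back on the Trio thread.
    class NurseryWrapper:
        def __init__(self, nursery):
            self._nursery = nursery

        def start_soon(self, fn, *args):
            # Hop back into the Trio thread before touching the nursery.
            trio.from_thread.run_sync(self._nursery.start_soon, fn, *args)

    # Shadow the real nursery; IPython.embed() exposes this function's locals.
    nursery = NurseryWrapper(nursery)
    IPython.embed()

    # Avoid ipython-history-sqlite3-threading error
    atexit._run_exitfuncs()


async def main():
    async with trio.open_nursery() as nursery:
        # Block a worker thread on the REPL while Trio keeps running tasks.
        await trio.to_thread.run_sync(ipython_embed, nursery)
        # The REPL has exited: stop any remaining background tasks.
        nursery.cancel_scope.cancel()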


trio.run(main)

This one works for some really basic stuff (e.g. running a background task that prints and sleeps in a loop), but the nursery can’t be used with %autoawait, and it’s probably broken in many other ways.

Next, I started looking at the ipython code and got confused. I decided to write my own REPL to get a feel for what kind of patterns I’d expect to recognize in the ipython code. In particular, I was curious about integration with Python Prompt Toolkit. I made a gist with two examples that sort of work (one for prompt-toolkit 2 and one for version 3). The script for version 2 seems more stable. I think they both end up losing the ability to print to stdout after a while, so not really usable…

Anyway, the most successful attempt was just forking prompt-toolkit and making it work with Trio natively. Version 3 of prompt-toolkit is asyncio native, so I just went in with brute force and put Trio code where I saw asyncio code. The Trio REPL example script can do this (with syntax highlighting!):

>>> import trio_websocket
>>> async def tock(n=5):
...     for i in range(n):
...         print(i + 1)
...         await trio.sleep(i + 1)
...
>>> await tock(3)
1
2
3
>>> conn = await trio_websocket.connect_websocket_url(nursery, "ws://localhost:8888")
>>> nursery.start_soon(tock, 42)
1
2
3
4
>>> await conn.send_message("Hello, world.")
5
6
>>> await conn.get_message()
.dlrow ,olleH
>>>

I think I’ll continue working on this idea in the context of the Trio monitor, the maintenance of which is an open issue. I’m hoping that this work could eventually be a path towards Trio support in Jupyter notebooks.

Thanks for the inspiration to look into all this stuff! I’ve been curious, but didn’t take a serious look until you made this post.

njs · August 20, 2019, 4:54am

@zthompson47 I know there’s still a lot of work to do, but that’s SUPER AWESOME. A trio-enabled ptt would be a tremendous step towards trio REPL, trio monitor, being able to ssh into your trio server to poke around and debug it…

Carreau · August 21, 2019, 11:45am

Hey there,

I saw Nathaniel’s ping on Twitter, so a couple of notes:

I’d like to avoid magics when we can use Python constructs. There will also be issues anyway if you try to run cells concurrently, as the Jupyter protocol makes some assumptions that cells run in order.
The notebook and CLI implementations of async differ; in particular, the kernel behind a notebook has a persistent running event loop (Tornado), so I think it might be easier to just swap that for Trio at startup to get native Trio features.
Currently, terminal IPython starts and stops the event loop between each user input, so no background task can run while waiting for user input. I started working on this some time ago.

Yes, the IPython codebase is quite complex. If you need pointers, feel free to open issues on GitHub (I’m mostly watching ipython/ipython); I’ll do my best to explain it when I have time.

I would also strongly suggest doing all those experiments with Python 3.8, which gained the ability to compile top-level await, so there is no need for all that AST and source code munging.
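For reference, the Python 3.8 feature mentioned here is the `ast.PyCF_ALLOW_TOP_LEVEL_AWAIT` compile flag. The sketch below is an illustration rather than IPython’s actual implementation: `run_cell` and its `runner` parameter are hypothetical names, and `asyncio.run` is used as the default runner only to keep the example dependent on the standard library alone.

```python
import ast
import asyncio
import inspect


def run_cell(source, namespace, runner=asyncio.run):
    """Compile and run one "cell" that may contain top-level await.

    `runner` is any callable that drives a coroutine to completion.
    """
    code = compile(source, "<cell>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
    if code.co_flags & inspect.CO_COROUTINE:
        # The cell used await at top level: evaluating the code object
        # returns a coroutine instead of running the cell directly.
        runner(eval(code, namespace))
    else:
        # Plain synchronous cell.
        exec(code, namespace)


namespace = {"asyncio": asyncio}
run_cell("x = 20", namespace)
run_cell("await asyncio.sleep(0)\ny = x + 22", namespace)
print(namespace["y"])  # 42
```

A Trio-based runner would presumably wrap the coroutine in a small async function and hand that function to `trio.run()`, since `trio.run()` expects an async function rather than a coroutine object.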

I don’t have much time to hack on it these days, so PRs are welcome; I’ll do my best to review when I can.

mehaase · January 14, 2020, 5:40pm

Awesome @zthompson47!

I started messing around with this, except I’m focusing on ipykernel instead of IPython. I uploaded two notebooks to Gist.

The first notebook tries to illustrate the problem with a very minimal example of a server task that I want to stay alive even after its cell finishes executing.

The second notebook demonstrates a possible solution that makes sense to me, but I haven’t seen anybody else mention it: run Trio in a background thread and dispatch coroutines using a combination of trio.hazmat.TrioToken.run_sync_soon() and trio.hazmat.spawn_system_task(). It seems to work! This is conceptually simpler (to me, anyway) than replacing the Tornado event loop, as @Carreau mentioned above, but I’m not 100% sure that this is a sound idea.

Here are a few problems I’ve identified. (Please let me know if you think of problems I didn’t list here.)

It’s probably bad to spawn a system task for an async cell, since an uncaught exception will crash the whole event loop.
Sync and async code will run in different threads, because ipykernel only calls the Trio runner for async code. This may have unintended consequences? Thread locals, for example, would obviously break.
From the notebook’s point of view, each async cell finishes executing immediately, but as a user we want some control over when the cell finishes. Some async cells should wait until the coroutine finishes before completing, and other cells should finish while the coroutine keeps running in the background.
Most Trio code won’t work inside a synchronous cell. For example, a cell that contains just trio.hazmat.current_trio_token() will fail because it runs on a different thread (see the bullet about threads above) and will complain about not being called in an async context. (But confusingly, if you add an await expression anywhere in the cell, then it will work again because it will be scheduled on the Trio thread.)

I don’t think these are unsolvable problems though, if you all think that the core idea is worth working on.

oremanj · January 14, 2020, 9:11pm

trio-asyncio uses the same approach (run the event loop in a different thread) to deal with supporting asyncio code that wants to start and stop the loop multiple times (in other words, “Trio on top of asyncio” as opposed to “asyncio on top of Trio”). It… mostly works? A small percent of the stock asyncio tests fail when using this approach, almost certainly for thread-related reasons. But it’s not a showstopper by any means.

I’ve recently been looking into possibilities for multiplexing different event loops on the same thread by putting each one in a greenlet. I think this would be more reliable in the long run than using different threads, but it requires either a bit of support from Trio or some monkeypatching.

Example of the monkeypatching/internals-accessing approach: https://github.com/oremanj/trio-asyncio/blob/41a339748248e93d08f246998cadf76a1070320a/trio_asyncio/sync.py

Issue for the Trio-side primitive needed to do this without such trickery: https://github.com/python-trio/trio/issues/1358

mehaase · January 15, 2020, 10:30pm

My coworker and I have hacked up a kinda-working implementation of this in IPython. Here’s an example notebook that walks through what it is and how it works. The complete code is here.

We are interested in feedback, especially @njs and @Carreau.

Carreau · January 15, 2020, 11:03pm

Quick note; and I haven’t looked deep into the notebooks and examples:

The issue with threads is that users end up running code that does not like being off the main thread (indirectly, for example, OpenCV…), which is why we avoid running anything in threads if we can.
It might be possible to run ipykernel using Trio natively and basically always go through the Trio runner, but that will require some code changes in ipykernel. It would also require restarting the kernel to change the event loop, but is that really an issue?

If you feel changes need to be made (even experimental, behind a flag) in IPython, feel free to send pull-requests.

njs · January 16, 2020, 6:09am

Even regular Python code expects to run in the main thread, because only the main thread gets KeyboardInterrupts. But that doesn’t stop us from using separate threads for executing code and for running the kernel/zeromq/tornado stuff; it just means we have to put the kernel/zeromq/tornado stuff inside a child thread, instead of vice-versa.

I guess this is basically suggesting that we have separate kernels python, python-asyncio, python-trio, etc.? That could work, but I feel like people would find it much more pleasant to pick the eventloop with a %magic command, instead of having to pick when you start the kernel?

mehaase · January 16, 2020, 2:40pm

It’s not an issue for me personally. In the same way that you can currently choose between “Python 2” and “Python 3” when you select a kernel, would it make sense to have an option like “Python 3 Trio”?

Ahh, okay, this is good to know. I guess my next goal should be switching it up so that Trio is on the main thread and the rest of the kernel is on a background thread.

mehaase · January 24, 2020, 9:03pm

My coworker and I have implemented this “long-lived Trio loop on main thread” idea in a pair of PRs for IPython and ipykernel.

Looking forward to your feedback, thanks!

mehaase · April 1, 2021, 2:08am

I recently made a new package, trio-jupyter, that installs a Jupyter kernelspec with Trio-mode enabled. This simplifies the process of setting up the Trio integration I described above. It is on GitHub as mehaase/trio-jupyter (Jupyter kernelspec with Trio integration).
