Eliot, the causal logging library, now supports Trio

Python’s built-in logging and other similar systems output a stream of factoids: they’re interesting, but you can’t really tell what’s going on.

  • Why is your application slow?
  • What caused this code path to be chosen?
  • Why did this error happen?

Standard logging can’t answer these questions.

But with a better model you could understand what and why things happened in your application. You could pinpoint performance bottlenecks, you could understand what happened when, who called what.

That is what Eliot does. eliot is a Python logging system that outputs causal chains of actions : actions can spawn other actions, and eventually they either succeed or fail . The resulting logs tell you the story of what your software did: what happened, and what caused it.

The latest version of Eliot (1.8) has support for Trio (via contextvars), so you can trace logs across Trio’s coroutines. You can see an example of how this works here: https://eliot.readthedocs.io/en/stable/generating/asyncio.html#trio-example

For more about Eliot see https://eliot.readthedocs.io/en/stable/

1 Like

Sweet!

I think this bit at the top of that page might be referring just to asyncio, and didn’t get updated when trio was added?

On Python 3.7 or later, no particular care is needed. For Python 3.5 and 3.6 you will need to import either eliot (or the backport package aiocontextvars) before you create your first event loop.

It might be interesting in the example to also show what happens if you put the action inside the nursery, around the start_soon calls, and explain the difference. (Well, I’m curious what Eliot will do here anyway :-). And I’ve noticed people get confused about this in other cases, e.g. using a with semaphore to limit how many calls to start_soon can happen at a time, and not understanding why that doesn’t limit the number of tasks.)

Thanks for the feedback!

  1. Trio doesn’t need aiocontextvars, yeah.
  2. The top-level action will end as soon as the start_soon calls are scheduled and the context manager exists, because it doesn’t know to wait for the nursery to end. And since the top-level action will end before the presumed child actions start, the child actions will be independent top-level actions.

In an ideal world the Eliot support would be built-in to Trio, and then the nursery context manager would ensure the continuity of actions… :wink:

I filed an issue to cover both points in the docs.

That makes sense. I wasn’t sure if eliot had a way to link together actions beyond nesting, like “this action was kicked off by that other action”. It’s a much more common pattern in Twisted/asyncio than it is in Trio, so it seemed like maybe it would…

In asyncio, if I write:

async def f():
    with start_action("f"):
        await sleep(random.random())

async def g():
    with start_action("g"):
        asyncio.create_task(f())
        await sleep(0.5)

Then there’s a 50% change that the "f" action will be nested within the "g" action, and a 50% chance that they’ll be independent actions?

You can explicitly pass the action object around, and there’s an API variant that doesn’t automatically stop the action (https://eliot.readthedocs.io/en/stable/generating/actions.html#non-finishing-contexts), so it is possible.

For the asyncio example, the random sleep is irrelevant to nesting. The nesting is determined based on the time start_action is called. So whether or not it’s independent actions depends on how asyncio contextvars integration works—i.e. whether the started task counts as an independent context or inherits the context.

I believe that asyncio does the same thing as Trio: the task gets an independent context, but the initial contents of that context are inherited from the parent.

To check if I’m understanding right: this means that if we have this Trio program:

async def say(message, delay):
    with start_action(action_type="say", message=message):
        await trio.sleep(delay)

async def main():
    async with trio.open_nursery() as nursery:
        with start_action(action_type="main"):
            nursery.start_soon(say, "hello", 1)
            nursery.start_soon(say, "world", 2)
            await trio.sleep(0)

Then the say actions will be nested inside the main action if, and only if, the Trio scheduler decides to schedule the say tasks before it resumes the main task?

Actually I realized I could easily just try it and see :slight_smile:

And indeed, this code gives non-deterministic output:

from eliot import start_action, to_file
import trio

to_file(open("trio.log", "w"))

async def say(message, delay):
    with start_action(action_type="say", message=message):
        await trio.sleep(0)

async def main():
    async with trio.open_nursery() as nursery:
        with start_action(action_type="main"):
            nursery.start_soon(say, "hello", 1)
            nursery.start_soon(say, "world", 2)
            await trio.sleep(0)

trio.run(main)

Sometimes I get:

f42a3fad-cdb9-4a11-bc6d-db8592c828d3
└── main/1 ⇒ started 2019-04-25 22:19:27 ⧖ 0.000s
    ├── say/2/1 ⇒ started 2019-04-25 22:19:27 ⧖ 0.000s
    │   ├── message: world
    │   └── say/2/2 ⇒ succeeded 2019-04-25 22:19:27
    ├── say/4/1 ⇒ started 2019-04-25 22:19:27 ⧖ 0.000s
    │   ├── message: hello
    │   └── say/4/2 ⇒ succeeded 2019-04-25 22:19:27
    └── main/3 ⇒ succeeded 2019-04-25 22:19:27

and sometimes I get:

29695784-e8bc-4a02-9d12-5e2d3391e31f
└── main/1 ⇒ started 2019-04-25 22:19:25 ⧖ 0.000s
    └── main/2 ⇒ succeeded 2019-04-25 22:19:25

29695784-e8bc-4a02-9d12-5e2d3391e31f
└── <unnamed>
    ├── say/3/1 ⇒ started 2019-04-25 22:19:25 ⧖ 0.000s
    │   ├── message: hello
    │   └── say/3/2 ⇒ succeeded 2019-04-25 22:19:25
    └── say/4/1 ⇒ started 2019-04-25 22:19:25 ⧖ 0.000s
        ├── message: world
        └── say/4/2 ⇒ succeeded 2019-04-25 22:19:25

It’s confusing that eliot-tree is claiming there are two unrelated events that happen to have the same UUID…

(Also confusing: on my terminal eliot-tree is printing the UUID in white-on-white.)

Bonus mystery: why do I never get one of the say's inside the main, and the other attached to unnamed task? It seems to always be both or neither.

White-on-white might just be that it doesn’t look at current terminals’ colors? You can file a bug: https://github.com/jonathanj/eliottree

As far as the two trees: if you look at locations in the tree (main/1, main/2, say/3/1, say/4/1), really those should be one tree.

They are displayed different trees in eliot-tree output as an artifact of how the parser works: it see the start message and finish message for the tree for a particular UUID, there are no missing messages in between, so it assumes it’s done and outputs the finished tree. (Basically you shouldn’t really be adding child actions to a finished action, though it is possible.)

Potential solutions:

  1. Add Eliot support to Trio.
  2. Eliot provides wrappers of Trio APIs.
  3. Add buffering to the parser, where it only outputs actions after it’s parsed everything. This would have obvious memory usage problems.
  4. Documentation saying “always put start_action outside of async context managers”.
  5. API for wrapping async context managers?
nursery = trio.open_nursery()
with start_action_for_async_context_manger(nursery, action_type="myaction"):
    # ... do stuff with nursery ...

Another potential solution: an Instrument that starts an eliot action on task_spawned, and finishes the eliot action on task_exited.

Done

ahhh, that makes sense.

I’m not sure what “Add Eliot support to Trio” would actually mean? It sounds like the data in the logs is already fully accurate; it’s just challenging to make sense of it efficiently.

If you’re using contextvars to track the eliot context, then you’re going to have situations where code is executing with its “current parent action id” pointing to an action that has already finished. Copying contexts around is what contextvars is designed to do, and it does it all the time (and this is even more true for asyncio than trio – e.g. every time you register a callback in asyncio, it copies the context). And part of the motivation for that was exactly logging use cases :-). Imagine a simple logging system that assigns each incoming RPC a unique id, and then prefixes all the subsequent log messages with that id… if the RPC handler spawns a background task, you probably want the log id to carry over into those tasks!

But it does make it tricky to figure out the last moment that someone could emit a log message that refers to a given action id… contextvars don’t have any kind of deterministic lifetime management.

I guess you could make the action id that you save in the ContextVar an full-fledged object with a custom __del__ method; when that gets invoked you know that all the contexts that were holding that id are gone. But this seems like a bad idea; doing things in __del__ is precarious under the best of circumstances, and __del__ calls can be arbitrarily delayed.

Maybe there’s some kind of clever multi-pass log parsing that could reconnect these events without needing to hold the whole thing in memory?

Maybe eliot-tree should just render the existing output better, like putting on note on the second round saying “(continued from previous event)”?

I guess we could consider extending contextvars to have some concept of lifecycle?

This is fairly intrusive and potentially adds substantial overhead, plus might generate a lot of log spam (maybe restrict it to only cases where the new task is spawned from inside an existing action?). Also there’s no asyncio equivalent. So my intuition is that eliot probably wants to do something sensible even if this isn’t set up :-). But, I can imagine that some people might be interested in it as an opt-in thing, and maybe it makes sense to provide some API to enable it, like calling eliot.log_all_tasks() or something?

Broad solution may be tricky, but for Trio my current leaning is towards wrapping the API. That is, adding a eliot.trio.open_nursery() and telling users to use that instead of trio.open_nursery():

def open_nursery():
    return _EliotNursery()

class _EliotNursery:
    def __init__(self):
        self._nursery = trio.open_nursery()
    def __getattr__(self, attr):
        return getattr(self._nursery, attr)
    def start_soon(self, f, *args, **kwargs):
       action = start_action(action_type="trio:start_soon")
       async def go():
           with action:
               return await f(*args, **kwargs)
       self._nursery.start_soon(go)

    # ... similar wrappers for other nursery methods

The above could be part of Trio too, which is what I meant by “Trio could support Eliot”… or Trio could add some generic hooks to enable the above.

The above could be part of Trio too, which is what I meant by “Trio could support Eliot”… or Trio could add some generic hooks to enable the above.

Trio already has an instrumentation system: https://trio.readthedocs.io/en/latest/reference-hazmat.html#instrument-api. While I’m not sure we’d be interested in adding Eliot support specifically, changes that make instrumentation powerful enough for Eliot’s needs seem very much in-scope to me.

I think you can accomplish the goal of the EliotNursery using the existing task_spawned and task_exited instrumentation hooks, but you might need better upstream support in order to handle nursery.start() correctly.

1 Like

One concern I have with current instrumentation API is that it adds performance overhead—I wouldn’t need all the hooks, but users would pay performance hit anyway since all the hooks would be called.

We actually rewrote the Instrument API a while ago to make it possible to avoid that overhead:

https://github.com/python-trio/trio/issues/257

It’s true that the current implementation still adds unnecessary overhead for operations you haven’t hooked, but that’s just because we’re lazy and were waiting for someone to complain. It would be pretty straightforward to remove that overhead, by maintaining a seperate list of active instruments for each hook, and updating it when instruments are added and removed.

2 Likes

This Eliot issue was opened a while back when this discussion first happened. The failure uses this code:

import eliot
import trio

eliot.to_file(open('eliot.log', 'w'))


async def say(message, delay):
    with eliot.start_action(action_type="say", message=message):
        await trio.sleep(delay)

async def main():
    async with trio.open_nursery() as nursery:
        with eliot.start_action(action_type="main"):
            nursery.start_soon(say, "hello", 1)
            nursery.start_soon(say, "world", 2)

trio.run(main)

which produces suboptimal results.

0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7
└── main/1 ⇒ started 2019-04-26 13:01:13 ⧖ 0.000s
    └── main/2 ⇒ succeeded 2019-04-26 13:01:13

0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7
└── <unnamed>
    ├── say/3/1 ⇒ started 2019-04-26 13:01:13 ⧖ 2.002s
    │   ├── message: world
    │   └── say/3/2 ⇒ succeeded 2019-04-26 13:01:15
    └── say/4/1 ⇒ started 2019-04-26 13:01:13 ⧖ 1.001s
        ├── message: hello
        └── say/4/2 ⇒ succeeded 2019-04-26 13:01:14

It seems to me that the solution is just a rule that eliot.start_action() should wrap the nursery, not the other way around.

import eliot
import trio

eliot.to_file(open('eliot.log', 'w'))


async def say(message, delay):
    with eliot.start_action(action_type="say", message=message):
        await trio.sleep(delay)

async def main():
    with eliot.start_action(action_type="main"):
        async with trio.open_nursery() as nursery:
            nursery.start_soon(say, "hello", 1)
            nursery.start_soon(say, "world", 2)

trio.run(main)

produces

1f711486-612b-4fd3-98d4-78a6f9bf71c9
└── main/1 ⇒ started 2019-12-18 05:03:13 ⧖ 2.007s
    ├── say/2/1 ⇒ started 2019-12-18 05:03:13 ⧖ 1.006s
    │   ├── message: hello
    │   └── say/2/2 ⇒ succeeded 2019-12-18 05:03:14
    ├── say/3/1 ⇒ started 2019-12-18 05:03:13 ⧖ 2.006s
    │   ├── message: world
    │   └── say/3/2 ⇒ succeeded 2019-12-18 05:03:15
    └── main/4 ⇒ succeeded 2019-12-18 05:03:15

It’s the same issue as this one about trio.open_process(). The solution proposed in that thread is a linter warning, but it’s so confusing for a user that I wonder if it could/should be prevented with a runtime exception.

The eliot docs do suggest doing the start_action on the outside. I’m not sure how to catch that with a runtime exception though—there are cases where it’s fine to have start_action inside the nursery… I guess one could argue that those cases are easily restructured and it’s worth avoiding the confusion. But it would also add performance overhead to all eliot calls.