What Is a Node?
A node is the smallest execution unit in the pipeline. It performs one concrete interaction, such as a mouse move, click, keystroke, or image-recognition step, as a real action in the target application. Because each node covers a single verifiable interaction, a pipeline of nodes can drive desktop or web apps as if a person were operating them.
Responsibilities of a Node
- Consume inputs: read coordinates, text, templates, or other variables from the pipeline context.
- Execute the action: carry out the configured mouse move/click, keyboard input, or vision routine.
- Return outputs: write detected positions, logs, success/failure flags, or extracted data back to the context for downstream nodes.
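The consume, execute, return cycle above can be sketched in a few lines. This is a minimal illustration, not the product's actual API: `NodeResult`, `run_node`, and the context keys `button_pos` and `click_pos` are hypothetical names, and the pipeline context is assumed to be a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]  # assumption: the pipeline context is a plain dict


@dataclass
class NodeResult:
    """Hypothetical result record: success flag, outputs, and a log line."""
    success: bool
    outputs: Context = field(default_factory=dict)
    log: str = ""


def run_node(action: Callable[[Context], Context],
             context: Context,
             inputs: List[str]) -> NodeResult:
    """Read the named inputs, run the action, and write outputs back."""
    missing = [name for name in inputs if name not in context]
    if missing:
        return NodeResult(False, log=f"missing inputs: {missing}")
    try:
        outputs = action({name: context[name] for name in inputs})
    except Exception as exc:  # report failure instead of crashing the pipeline
        return NodeResult(False, log=f"action failed: {exc}")
    context.update(outputs)  # make outputs visible to downstream nodes
    return NodeResult(True, outputs, "ok")


def offset_click_target(inputs: Context) -> Context:
    """Hypothetical action: aim slightly inside a detected button."""
    x, y = inputs["button_pos"]
    return {"click_pos": (x + 5, y + 5)}


context: Context = {"button_pos": (100, 200)}
result = run_node(offset_click_target, context, ["button_pos"])
```

Returning a result object rather than raising keeps the success/failure flag in the data flow, so a downstream node can branch on it.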
Key Interaction Capabilities
Mouse Movement
- Supports absolute and relative coordinates, plus positions returned from image recognition.
- Lets you tune speed, path, and dwell time for hover, drag, or other human-like gestures.
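As a rough sketch of how coordinates, path, and speed might fit together, the helpers below resolve a relative target and generate a straight-line path with a per-step delay. All function names are hypothetical, and a real node may use curved or randomized paths instead of linear interpolation.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def resolve_target(current: Point, target: Point, relative: bool) -> Point:
    """Treat target as an offset from current when relative is True."""
    if relative:
        return (current[0] + target[0], current[1] + target[1])
    return target


def linear_path(start: Point, end: Point, steps: int) -> List[Point]:
    """Return the intermediate points of a straight move, ending at end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


def step_delay(duration_s: float, steps: int) -> float:
    """Spread the total move duration evenly across the steps."""
    return duration_s / steps


current = (100, 100)
target = resolve_target(current, (40, -20), relative=True)
path = linear_path(current, target, steps=4)
delay = step_delay(0.2, steps=4)
```

A driver would move the pointer to each point in `path`, sleeping `delay` seconds between steps, then hold at the final point for the configured dwell time.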
Mouse Click
- Covers single, double, and right clicks, often paired with wait logic to ensure the UI is ready.
- Can chain with image recognition or coordinate math to “find → move → click” in one flow.
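The "find, move, click" chain with wait logic could look roughly like the sketch below. The `find`, `move`, and `click` callables are placeholders for whatever backend performs the real actions; their signatures are assumptions made for this example.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def find_move_click(find: Callable[[], Optional[Point]],
                    move: Callable[[Point], None],
                    click: Callable[[Point, str, int], None],
                    timeout_s: float = 2.0,
                    poll_s: float = 0.05,
                    button: str = "left",
                    count: int = 1) -> bool:
    """Poll find() until it returns a position, then move and click there.

    Returns False if the target does not appear before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pos = find()
        if pos is not None:
            move(pos)
            click(pos, button, count)  # count=2 stands for a double click
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Usage with fake callables that only record what would happen.
events = []
answers = iter([None, None, (10, 20)])  # target appears on the third poll
clicked = find_move_click(
    find=lambda: next(answers),
    move=lambda p: events.append(("move", p)),
    click=lambda p, b, n: events.append(("click", p, b, n)),
    timeout_s=1.0,
    poll_s=0.01,
    count=2,
)
```

Polling against a deadline, rather than sleeping for a fixed time, lets the click fire as soon as the UI is ready while still bounding the wait.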
Keyboard Input
- Sends text, shortcuts (e.g., Ctrl+S), and function keys (Enter, Esc, etc.).
- Works with forms, terminals, or editors and can type character-by-character or paste entire strings.
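To illustrate how a shortcut such as Ctrl+S might be turned into key events, here is a small sketch. The `+` separator, the modifier set, and the `("down", key)` / `("up", key)` event tuples are assumptions for this example, not a documented format.

```python
from typing import List, Tuple

MODIFIERS = {"ctrl", "alt", "shift", "meta"}  # assumed modifier names


def parse_shortcut(shortcut: str) -> Tuple[List[str], str]:
    """Split 'Ctrl+S' into (['ctrl'], 's'); the last part is the main key."""
    parts = [p.strip().lower() for p in shortcut.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty shortcut")
    modifiers, key = parts[:-1], parts[-1]
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return modifiers, key


def key_events(shortcut: str) -> List[Tuple[str, str]]:
    """Press modifiers in order, tap the key, release modifiers in reverse."""
    modifiers, key = parse_shortcut(shortcut)
    events = [("down", m) for m in modifiers]
    events += [("down", key), ("up", key)]
    events += [("up", m) for m in reversed(modifiers)]
    return events


save_events = key_events("Ctrl+S")
```

Releasing modifiers in reverse order mirrors how a person lets go of the keys and avoids leaving a modifier stuck down if the sequence is interrupted late.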
Image Recognition
- Uses template matching or OCR to locate buttons, icons, or text areas when there is no DOM/control access.
- Allows similarity thresholds, region constraints, and retry counts to raise accuracy.
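The sketch below shows how a similarity threshold, a region constraint, and a retry count interact in a toy template matcher. It works on small grayscale images stored as nested lists and scores matches by mean absolute difference; a production node would more likely rely on an image library, and every name here is hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]            # grayscale pixels 0..255, row-major
Region = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def match_score(screen: Image, template: Image, top: int, left: int) -> float:
    """Similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
    total = 0
    for r, row in enumerate(template):
        for c, pixel in enumerate(row):
            total += abs(screen[top + r][left + c] - pixel)
    return 1.0 - total / (255 * len(template) * len(template[0]))


def find_template(screen: Image, template: Image, threshold: float = 0.95,
                  region: Optional[Region] = None) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the best match at or above threshold, else None."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    top0, left0, bottom, right = region or (0, 0, sh, sw)
    best, best_score = None, -1.0
    for top in range(top0, min(bottom, sh) - th + 1):
        for left in range(left0, min(right, sw) - tw + 1):
            score = match_score(screen, template, top, left)
            if score > best_score:
                best, best_score = (top, left), score
    return best if best_score >= threshold else None


def find_with_retries(capture: Callable[[], Image], template: Image,
                      retries: int = 2, **kwargs) -> Optional[Tuple[int, int]]:
    """Capture a fresh screen per attempt; retries counts extra attempts."""
    for _ in range(retries + 1):
        found = find_template(capture(), template, **kwargs)
        if found is not None:
            return found
    return None


blank = [[0] * 4 for _ in range(4)]
screen = [[0, 0, 0, 0], [0, 0, 255, 255], [0, 0, 255, 255], [0, 0, 0, 0]]
template = [[255, 255], [255, 255]]
position = find_template(screen, template)
```

Restricting `region` shrinks the search space, which speeds up matching and reduces false positives from look-alike elements elsewhere on screen.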
Common Node Types
- Mouse event nodes: bundle movement, clicks, and drags with configurable coordinates, delays, and click styles—ideal for buttons, sliders, or menus.
- Keyboard event nodes: handle text input, shortcuts, and function keys for forms, command execution, or terminal control.
- Image-recognition nodes: find element locations or capture on-screen text, making them essential when DOM hooks are unavailable.
- Composite action nodes: encapsulate “recognize → move → click → type” sequences to reduce repetitive configuration.
- Start/end nodes: clearly mark the pipeline entry and exit so runtime and status are easy to track.
- Wait nodes (TimeWait/EventWait): pause for a fixed duration or until an external signal arrives, useful for async loading or approval steps.
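The two wait styles can be approximated with the standard library, as sketched below. The function names `time_wait` and `event_wait` echo the TimeWait/EventWait node names, but how those nodes are actually implemented is not stated here, so treat this as an illustration only.

```python
import threading
import time


def time_wait(seconds: float) -> None:
    """Fixed-duration wait: pause the pipeline for the given time."""
    time.sleep(seconds)


def event_wait(signal: threading.Event, timeout_s: float) -> bool:
    """Block until the external signal is set or the timeout expires.

    Returns True if the signal arrived, False on timeout.
    """
    return signal.wait(timeout=timeout_s)


# Usage: simulate an approval that arrives from another thread.
approved = threading.Event()
threading.Timer(0.05, approved.set).start()
got_signal = event_wait(approved, timeout_s=1.0)
```

Giving the event wait a timeout keeps a missing approval or a stalled page load from blocking the pipeline indefinitely.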
What to Configure on a Node
- Runtime or environment: the dependencies, permissions, and resource quotas the node runs with.
- Input/output mapping: how data flows into the node and returns to the context under clear names.
- Observability hooks: log verbosity, metrics, or alerts for troubleshooting.
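One way to picture these settings is as a small configuration record plus two mapping helpers. The field names (`runtime`, `input_map`, `output_map`, `log_level`) and the context keys are invented for this sketch and do not reflect a documented schema.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NodeConfig:
    """Hypothetical node settings; field names are illustrative only."""
    name: str
    input_map: Dict[str, str]    # node parameter -> context key
    output_map: Dict[str, str]   # node output -> context key
    runtime: str = "default"     # environment the node runs in
    log_level: str = "INFO"      # observability: log verbosity


def bind_inputs(config: NodeConfig, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pull each node parameter from its mapped context key."""
    return {param: context[key] for param, key in config.input_map.items()}


def publish_outputs(config: NodeConfig, outputs: Dict[str, Any],
                    context: Dict[str, Any]) -> None:
    """Write each node output back under its mapped context key."""
    for output, key in config.output_map.items():
        context[key] = outputs[output]


config = NodeConfig(
    name="click_save",
    input_map={"target": "save_button_pos"},
    output_map={"clicked": "save_clicked"},
)
context = {"save_button_pos": (320, 48)}
params = bind_inputs(config, context)
publish_outputs(config, {"clicked": True}, context)
```

Keeping the mapping in configuration means the node's own logic can use short local names while the shared context keeps descriptive, collision-free keys.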
Design Tips
- Keep each node single-purpose; split large logic into multiple nodes.
- Aim for idempotency so retries do not introduce side effects and recovery stays simple.
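A common way to make a UI action idempotent is to check the current state before acting. The sketch below uses a fake checkbox to show the idea; `ensure_checked` and `FakeCheckbox` are hypothetical, and a real node would read the state via image recognition or a similar probe.

```python
from typing import Callable


def ensure_checked(read_state: Callable[[], bool],
                   click: Callable[[], None]) -> bool:
    """Click only if the checkbox is not already checked.

    Returns True when a click was sent, False when nothing was needed.
    """
    if read_state():
        return False
    click()
    return True


class FakeCheckbox:
    """Stand-in for a real UI element; a click toggles its state."""

    def __init__(self) -> None:
        self.checked = False
        self.clicks = 0

    def click(self) -> None:
        self.checked = not self.checked
        self.clicks += 1


box = FakeCheckbox()
first = ensure_checked(lambda: box.checked, box.click)   # sends a click
second = ensure_checked(lambda: box.checked, box.click)  # retry is a no-op
```

A plain "click the checkbox" node would toggle the box off again on retry; expressing the node as "ensure the desired state" makes retries safe.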