tutorial · openai · shell · security · sandbox

How to Attach a Secure Sandbox Shell to Your OpenAI Model in Minutes

Attach a secure sandboxed shell to your OpenAI model in minutes using ShellifyAI. This tutorial walks through OpenAI tool wiring, forwarding tool calls to Shellify’s execute endpoint, security best practices, and ready-to-run examples in JS and Python.

ShellifyAI Team · November 29, 2025 · 7 min read


Intro

Allowing an LLM to run shell commands unlocks powerful diagnostics and automation (filesystem inspection, running tests, building artifacts). But running arbitrary shell commands is dangerous unless you sandbox execution, set strict allow/deny rules, and capture audit logs.

ShellifyAI makes this integration fast and safe: define a local_shell tool in your OpenAI request, forward tool calls to ShellifyAI’s secure execute API, then return the results to the model. In minutes you get sandboxing, timeouts, streaming output, artifact capture, and session persistence — without writing your own container orchestration.

What you’ll learn

  • How the OpenAI shell tool flow works (quick recap)
  • How to wire ShellifyAI into that flow in minutes
  • Security best practices and common patterns
  • Ready-to-run examples (JavaScript + Python + Vercel AI SDK)

Why use ShellifyAI

ShellifyAI handles the hard parts so you don't have to build and maintain sandbox infrastructure:

  • Secure sandboxed execution in ephemeral containers
  • Session-based persistence for multi-step workflows
  • Real-time stdout/stderr streaming for responsive UIs
  • File artifact capture (uploads with signed URLs)
  • Adapter support for different agent SDKs and frameworks

Step 0 — Prerequisites

  • Create a ShellifyAI project and get your SHELLIFYAI_API_KEY from the Shellify console.
  • Have an OpenAI integration that can define a function/tool (Responses API or Agents SDK where applicable).

Set environment variables

bash
export SHELLIFYAI_API_KEY=your_api_key

Quick recap — how OpenAI’s shell tool works

OpenAI’s shell tool lets models propose commands via the Responses API in the form of function/tool calls. Your integration executes those commands and returns structured outputs with stdout, stderr and an outcome (exit or timeout).

Key points:

  • The model outputs a function/tool call with arguments (e.g., { command: "..." }).
  • Execute commands in a sandbox and return outputs including stdout, stderr and an outcome ({type: "exit", exit_code: N} or {type: "timeout"}).
  • If the model includes a max_output_length or similar parameter, honor it and copy it back in your response to avoid validation errors; the type sketch below shows the full output shape.
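For reference, here is the output shape those points describe, written as a TypeScript type. The field names follow this post's description; confirm the exact schema against the OpenAI docs for your API version.

typescript
// Sketch of the tool output the model expects back, based on the
// fields described above; verify against the OpenAI docs.
type ShellToolOutput = {
  stdout: string;
  stderr: string;
  outcome:
    | { type: "exit"; exit_code: number }
    | { type: "timeout" };
  // Echoed back when the model specifies a limit (see the last point above)
  max_output_length?: number;
};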

How ShellifyAI fits — the simple flow

  1. Define a local_shell tool in your OpenAI request.
  2. When the model calls the tool, forward the command to ShellifyAI’s API: POST https://shellifyai.com/v1/execute with your SHELLIFYAI_API_KEY in the x-api-key header.
  3. ShellifyAI executes in a sandbox and returns structured events (stdout/stderr logs, artifacts, status updates, and completed results).
  4. Map Shellify’s response back to the model via the Responses API (submit tool outputs or function outputs depending on your setup).

JavaScript (OpenAI client)

This example uses the direct POST to the execute endpoint. Note the environment variable name and the single execute endpoint — the API key encodes your project, so there's no separate projectId parameter.

typescript
import OpenAI from "openai";

const client = new OpenAI();

const tools = [{
  type: "function" as const,
  name: "local_shell",
  description: "Execute shell commands in a secure sandbox",
  parameters: {
    type: "object",
    properties: {
      command: { type: "string", description: "Shell command to execute" }
    },
    required: ["command"]
  }
}];

const response = await client.responses.create({
  model: "gpt-5.1",
  input: "Create a Python file that prints Hello World and run it",
  tools
});

for (const item of response.output) {
  if (item.type === "function_call" && item.name === "local_shell") {
    const args = JSON.parse(item.arguments || "{}");

    // Forward the command to ShellifyAI's sandboxed execute endpoint
    const result = await fetch("https://shellifyai.com/v1/execute", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": process.env.SHELLIFYAI_API_KEY!
      },
      body: JSON.stringify({
        adapterType: "local_shell",
        tool: "local_shell",
        payload: {
          command: args.command,
          // optional: sessionId, timeoutMs, workingDirectory, env, systemMessage
          sessionId: args.sessionId,
          timeoutMs: args.timeoutMs
        }
      })
    }).then(r => r.json());

    // Return the tool output to the model by continuing the response
    await client.responses.create({
      model: "gpt-5.1",
      previous_response_id: response.id,
      input: [{
        type: "function_call_output",
        call_id: item.call_id,
        output: JSON.stringify(result)
      }],
      tools
    });
  }
}

Python (OpenAI client)

python
from openai import OpenAI
import requests
import json
import os

client = OpenAI()

tools = [{
    "type": "function",
    "name": "local_shell",
    "description": "Execute shell commands in a secure sandbox",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Shell command to execute"}
        },
        "required": ["command"]
    }
}]

response = client.responses.create(
    model="gpt-5.1",
    input="Create a Python file that prints Hello World and run it",
    tools=tools
)

for item in response.output:
    if item.type == "function_call" and item.name == "local_shell":
        args = json.loads(item.arguments or "{}")

        # Forward the command to ShellifyAI's sandboxed execute endpoint
        result = requests.post(
            "https://shellifyai.com/v1/execute",
            headers={
                "Content-Type": "application/json",
                "x-api-key": os.environ["SHELLIFYAI_API_KEY"]
            },
            json={
                "adapterType": "local_shell",
                "tool": "local_shell",
                "payload": {
                    "command": args.get("command"),
                    "sessionId": args.get("sessionId"),
                    "timeoutMs": args.get("timeoutMs"),
                }
            }
        ).json()

        # Return the tool output to the model by continuing the response
        client.responses.create(
            model="gpt-5.1",
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(result)
            }],
            tools=tools
        )

Vercel AI SDK

If you use the Vercel AI SDK, use the Shellify helper from @shellifyai/shell-tool. The helper handles execute calls for you; provide your SHELLIFYAI_API_KEY and the SDK takes care of forwarding commands and streaming responses.
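If you want to see roughly what such a helper does under the hood, here is a minimal hand-rolled sketch using the AI SDK's tool() helper (v4-style parameters API). This is our own wiring against the execute endpoint, not the @shellifyai/shell-tool API:

typescript
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hand-rolled wiring against the execute endpoint; the official
// @shellifyai/shell-tool helper wraps this (and streaming) for you.
const localShell = tool({
  description: "Execute shell commands in a secure sandbox",
  parameters: z.object({
    command: z.string().describe("Shell command to execute")
  }),
  execute: async ({ command }) => {
    const res = await fetch("https://shellifyai.com/v1/execute", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": process.env.SHELLIFYAI_API_KEY!
      },
      body: JSON.stringify({
        adapterType: "local_shell",
        tool: "local_shell",
        payload: { command }
      })
    });
    return res.json();
  }
});

const { text } = await generateText({
  model: openai("gpt-5.1"),
  tools: { local_shell: localShell },
  maxSteps: 5, // let the SDK loop through tool calls automatically
  prompt: "Create a Python file that prints Hello World and run it"
});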

Handling streaming & artifacts

For responsive UI and long-running commands, enable streaming by setting the Accept header to application/jsonl or by adding ?stream=true to the execute POST. Shellify emits structured events as JSON lines: meta, status, log (stdout/stderr), artifact (filename + signed URL), and a final completed status. Use these events to progressively render output and surface artifacts (e.g., test reports, built binaries) to users without waiting for the full job to finish.

When capturing artifacts, Shellify uploads files and returns signed URLs in artifact events. Store those URLs or surface them in the chat UI so users can download outputs produced by the shell session.
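A minimal JSONL consumer might look like the sketch below. The event names (log, artifact, status) come from the description above, but the fields inside each event (data, filename, url, and so on) are assumptions; check the API reference for the exact schema.

typescript
// Minimal streaming consumer sketch. Event names come from this post;
// the field layout inside each event is an assumption, so consult the
// API reference for the real schema.
const res = await fetch("https://shellifyai.com/v1/execute?stream=true", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "application/jsonl",
    "x-api-key": process.env.SHELLIFYAI_API_KEY!
  },
  body: JSON.stringify({
    adapterType: "local_shell",
    tool: "local_shell",
    payload: { command: "npm test" }
  })
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // JSON lines: one event per newline-delimited line
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any trailing partial line

  for (const line of lines.filter(Boolean)) {
    const event = JSON.parse(line);
    switch (event.type) {
      case "log":      // progressive stdout/stderr chunks
        process.stdout.write(event.data ?? "");
        break;
      case "artifact": // filename + signed URL for download
        console.log("artifact:", event.filename, event.url);
        break;
      case "status":   // status updates, including completion
        console.log("status:", JSON.stringify(event));
        break;
    }
  }
}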

Security best practices

Shellify provides sandboxing by default, but you should still apply additional controls depending on your risk profile:

  • Allow/deny lists: Block risky commands (e.g., rm -rf, curl to external hosts) at the integration layer when possible; a guard sketch follows this list.
  • Timeouts: Respect the timeoutMs provided to Shellify and configure conservative defaults for unknown commands.
  • Session scope: Use sessionId only when you need filesystem persistence across steps; ephemeral sessions reduce blast radius.
  • Network restrictions: Disable network access in sandboxes unless explicitly necessary.
  • Least privilege: Run commands as a non-root user inside the container and restrict filesystem mounts.
  • Audit logging: Persist command, stdout/stderr, and artifact metadata for traceability and debugging.
  • System messages: Shellify appends security policies to systemMessage, so custom overrides cannot bypass sandbox rules.
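As referenced in the allow/deny item above, here is a minimal integration-layer guard you can run before forwarding a model-proposed command. The deny patterns and the 30-second default are illustrative choices, not ShellifyAI defaults:

typescript
// Illustrative integration-layer guardrails, applied before forwarding
// a model-proposed command to the execute endpoint.
const DENY_PATTERNS = [
  /rm\s+-rf\s+\//,       // recursive deletes at the root
  /curl\s+https?:\/\//,  // outbound fetches to external hosts
  /mkfs|dd\s+if=/,       // destructive disk operations
];

const DEFAULT_TIMEOUT_MS = 30_000; // conservative default for unknown commands

function guardCommand(command: string, timeoutMs?: number) {
  for (const pattern of DENY_PATTERNS) {
    if (pattern.test(command)) {
      throw new Error(`Command blocked by deny list: ${command}`);
    }
  }
  return {
    command,
    timeoutMs: timeoutMs ?? DEFAULT_TIMEOUT_MS,
  };
}

// Usage: validate before calling the execute endpoint
// const payload = guardCommand(args.command, args.timeoutMs);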

Mapping Shellify responses to OpenAI

Shellify returns a structured result containing events and a summary. OpenAI expects a tool output with stdout, stderr, and an outcome ({ "type": "exit", "exit_code": N } or { "type": "timeout" }). Map the final aggregated stdout/stderr and set the outcome based on the exit code or timeout events. Also copy any max output length values the model specified back into your returned tool output to avoid validation errors.
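Here is a sketch of that mapping. Shellify's exact result schema isn't reproduced in this post, so the events array and its field names (stream, data, exitCode) are assumptions to adapt to the real response:

typescript
// Sketch of mapping a Shellify result to the tool output OpenAI expects.
// The shape of `result` (an events array with log/status entries) is an
// assumption based on the event types described above; adjust field
// names to match the actual API response.
function toOpenAIToolOutput(result: any, maxOutputLength?: number) {
  let stdout = "";
  let stderr = "";
  let outcome: { type: "exit"; exit_code: number } | { type: "timeout" } =
    { type: "exit", exit_code: 0 };

  for (const event of result.events ?? []) {
    if (event.type === "log") {
      if (event.stream === "stderr") stderr += event.data ?? "";
      else stdout += event.data ?? "";
    } else if (event.type === "status" && event.status === "timeout") {
      outcome = { type: "timeout" };
    } else if (event.type === "status" && typeof event.exitCode === "number") {
      outcome = { type: "exit", exit_code: event.exitCode };
    }
  }

  return {
    stdout,
    stderr,
    outcome,
    // Echo back the model-specified limit to avoid validation errors
    ...(maxOutputLength != null && { max_output_length: maxOutputLength }),
  };
}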

Advanced tips & troubleshooting

  • adapterType: only override it if you need a non-default adapter for specific behavior—otherwise omit it and let the project default apply.
  • Long outputs: rely on streaming to avoid very large payloads and to keep user experience snappy.
  • Non-interactive commands: ensure tools the model calls are non-interactive (no password prompts or full-screen editors).
  • Exit codes: Capture non-zero exit codes and still return stdout/stderr so the model can reason about warnings vs fatal errors.

Wrap-up — minutes, not days

With ShellifyAI you don’t need to build sandbox infrastructure or artifact handling. Define a local_shell tool, forward model tool calls to POST https://shellifyai.com/v1/execute with your SHELLIFYAI_API_KEY, and return the structured result to the model. Follow the security checklist and you can safely attach a sandboxed shell to your model in minutes.

Next steps

  • Create a Shellify project and copy your credentials into SHELLIFYAI_API_KEY.
  • Try the JavaScript quick-start snippet above.
  • If you’re using Vercel, install @shellifyai/shell-tool and use the shellify helper for automatic execution.

Resources

For full API reference and developer documentation, see https://shellifyai.com/docs
