Llama 4 Scout

35k+ Runs

A mixture-of-experts model with 17B active parameters and 16 experts, designed to fit on a single NVIDIA H100 GPU while offering an industry-leading 10M-token context window.

Llama API Usage

POST /v1/chat/completions

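import requests

# Chat completions endpoint.
url = "https://api.akashml.com/v1/chat/completions"

payload = {
    "model": "llama-4-scout",
    "messages": [
        {
            "role": "user",
            "content": "Hello, how are you?"
        }
    ],
    "max_tokens": 150,   # upper bound on the number of generated tokens
    "temperature": 0.7   # sampling temperature
}

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY"  # replace with your API key
}

# json= serializes the payload; the timeout avoids hanging on a stalled connection.
response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
print(response.json())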
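
The request above prints the raw JSON body. A minimal sketch of pulling out just the assistant's reply follows, assuming the endpoint returns an OpenAI-style chat completion body (a `choices` list whose entries carry a `message` with `content`). That schema is an assumption based on the `/v1/chat/completions` path and is not confirmed on this page; `extract_reply` and the sample body are hypothetical and used only for illustration.

```python
from typing import Any


def extract_reply(data: dict[str, Any]) -> str:
    """Return the assistant text from a chat completion body.

    Assumes an OpenAI-style schema: data["choices"][0]["message"]["content"].
    """
    choices = data.get("choices") or []
    if not choices:
        # Error bodies usually lack "choices"; surface them instead of a KeyError.
        raise ValueError(f"no choices in response: {data!r}")
    message = choices[0].get("message") or {}
    return message.get("content") or ""


# Hypothetical response body, shaped like the assumed schema.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "I'm doing well, thanks!"}}
    ],
    "usage": {"prompt_tokens": 6, "completion_tokens": 7, "total_tokens": 13},
}

print(extract_reply(sample))
```

With a live call, the same helper would be applied as `extract_reply(response.json())`, provided the real response matches the assumed shape.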

Pricing

Price (per 1M Tokens)

Model            Input    Output
Llama 4 Scout
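
Because prices are quoted per 1M tokens, with separate input and output rates, the cost of a workload is each token count divided by one million and multiplied by the matching rate. The sketch below shows that arithmetic. `estimate_cost` is a hypothetical helper, and the prices passed to it are placeholders, not the actual Llama 4 Scout rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in the currency of the prices, which are per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Placeholder prices for illustration only; substitute the listed rates.
cost = estimate_cost(input_tokens=2_000_000, output_tokens=500_000,
                     input_price_per_m=1.00, output_price_per_m=2.00)
print(f"{cost:.2f}")  # 2M input at 1.00 plus 0.5M output at 2.00 gives 3.00
```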

Model Details

Provider