Head-to-head comparison across 25 high-difficulty extraction tasks spanning e-commerce, legal, medical, finance, and logic domains.
25 Extraction Tasks
Hand-crafted extraction tasks across 5 domains (E-commerce, Legal, Medical, Finance, Logic). Each is rated difficulty 4–5 and designed to expose ambiguous entities, nested structures, multi-step reasoning, and type coercion.
Deterministic Scoring
Every response is scored with deterministic, field-level comparison: booleans are checked exactly, numbers must match within a 5% tolerance, and strings are compared via embedding cosine similarity, with a string field marked correct when similarity is ≥ 80%. Arrays are matched element by element. Scores range from 0 to 100 per task.
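The scoring rules above can be sketched as a small Python scorer. This is an illustration under stated assumptions, not the benchmark's own code: the names `field_correct` and `score_task` are hypothetical, a character-trigram cosine stands in for the embedding model (no particular embedding API is assumed), the 5% tolerance is taken as relative to the expected value, and an array is assumed to count as correct only when it has the same length and every element matches in order.

```python
import math
from collections import Counter
from typing import Any

_MISSING = object()  # sentinel for a field absent from the model output


def _trigrams(text: str) -> Counter:
    # Character trigrams of the lowercased string, padded so short strings still yield some.
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def string_similarity(a: str, b: str) -> float:
    # Stand-in for embedding cosine similarity: cosine over trigram count vectors.
    va, vb = _trigrams(a), _trigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def field_correct(expected: Any, actual: Any) -> bool:
    if actual is _MISSING:
        return False
    if isinstance(expected, bool):  # check bool before int: bool is a subclass of int
        return actual is expected
    if isinstance(expected, (int, float)):
        if isinstance(actual, bool) or not isinstance(actual, (int, float)):
            return False
        return abs(actual - expected) <= 0.05 * abs(expected)  # 5% relative tolerance
    if isinstance(expected, str):
        return isinstance(actual, str) and string_similarity(expected, actual) >= 0.80
    if isinstance(expected, list):
        # Assumed rule: same length and every element matches its counterpart in order.
        return (isinstance(actual, list) and len(actual) == len(expected)
                and all(field_correct(e, a) for e, a in zip(expected, actual)))
    if isinstance(expected, dict):
        # Nested objects: every expected key must be present and correct.
        return isinstance(actual, dict) and all(
            field_correct(v, actual.get(k, _MISSING)) for k, v in expected.items())
    return actual == expected  # e.g. null values


def score_task(expected: dict, actual: dict) -> tuple[float, list[str]]:
    # Returns (score 0-100, names of incorrect or missing top-level fields).
    wrong = [k for k, v in expected.items() if not field_correct(v, actual.get(k, _MISSING))]
    score = 100.0 * (len(expected) - len(wrong)) / len(expected)
    return score, wrong


expected = {"brand": "Apple", "cpu_cores": 16, "memory_gb": 48,
            "color": "Space Black", "includes_warranty": True}
actual = {"brand": "Apple", "cpu_cores": 17, "memory_gb": 48.0,
          "color": "Silver", "includes_warranty": True}
print(score_task(expected, actual))  # (60.0, ['cpu_cores', 'color'])
```

Checking `bool` before numbers matters in Python because `True` is also an `int`; without that guard a boolean field could pass the numeric tolerance test.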
Consistent & Reproducible
Each model runs every task once under identical conditions (same prompts, same schema). Results are timestamped and versioned so you can track changes over time.
Each task shows the input, both model outputs, and the accuracy score.
Nike Air Max 97 OG QS "Metallic Gold" Bullet — Titanium Violet / Varsity Red — Men's US 10.5 — 2018 Retro Release
{
"brand": "Nike",
"model": "Air Max 97 OG QS",
"colorway": "Metallic Gold",
"color_1": "Titanium Violet",
"color_2": "Varsity Red",
"size": "US 10.5",
"gender": "Men's",
"year": "2018",
"edition": "Retro Release"
}
All fields correct
Incorrect/missing fields: edition
Apple MacBook Pro 14" M3 Max (16-core CPU / 40-core GPU) — 48GB Unified Memory — 1TB SSD — Space Black — AppleCare+ Bundle — Open Box Like New
{
"brand": "Apple",
"model": "MacBook Pro 14\"",
"chip": "M3 Max",
"cpu_cores": 16,
"gpu_cores": 40,
"memory_gb": 48,
"storage": "1TB SSD",
"color": "Space Black",
"condition": "Open Box Like New",
"includes_warranty": true
}
All fields correct
All fields correct
Olaplex Hair Perfector No. 3 — 100ml (3.3 fl oz) × 3 Pack — $89.97 ($29.99/each) — Sulfate Free — Salon Professional — Ships from Authorized Dealer
{
"brand": "Olaplex",
"product_name": "Hair Perfector No. 3",
"volume_ml": 100,
"volume_oz": 3.3,
"pack_quantity": 3,
"total_price_usd": 89.97,
"unit_price_usd": 29.99,
"is_sulfate_free": true,
"seller_type": "Authorized Dealer"
}
All fields correct
All fields correct
Sony WH-1000XM5 ワイヤレスノイズキャンセリングヘッドホン ブラック — Bluetooth 5.3 / LDAC — 30hr Battery — 100V-240V AC Adapter — JAN: 4548736132351 — Japan Domestic Model
{
"brand": "Sony",
"model": "WH-1000XM5",
"type": "Wireless Noise Cancelling Headphones",
"color": "Black",
"bluetooth_version": "5.3",
"audio_codec": "LDAC",
"battery_hours": 30,
"voltage_range": "100V-240V",
"jan_code": "4548736132351",
"region": "Japan Domestic"
}
Incorrect/missing fields: type, color
All fields correct
Tiffany & Co. Soleste Round Brilliant Diamond Engagement Ring — 1.52ct E/VVS2 — Platinum Setting with 18K Rose Gold Accent Band — GIA #2215847290 — Size 6.5 — Retail $42,800
{
"brand": "Tiffany & Co.",
"collection": "Soleste",
"stone_type": "Round Brilliant Diamond",
"carat_weight": 1.52,
"color_grade": "E",
"clarity_grade": "VVS2",
"primary_metal": "Platinum",
"accent_metal": "18K Rose Gold",
"gia_report_number": "2215847290",
"ring_size": 6.5,
"retail_price_usd": 42800
}
All fields correct
All fields correct
MASTER SERVICES AGREEMENT dated January 15, 2026. Section 8.2 (Limitation of Liability): Except for Vendor's indemnification obligations under Section 9.1 and breaches of Section 12 (Confidentiality), in no event shall either party's aggregate liability under this Agreement exceed the greater of (a) the total fees paid or payable by Client in the twelve (12) month period immediately preceding the event giving rise to the claim, or (b) Five Hundred Thousand Dollars ($500,000). Notwithstanding the foregoing, Vendor's aggregate liability for indemnification claims under Section 9.1 shall not exceed Two Million Dollars ($2,000,000). Section 8.3: IN NO EVENT SHALL EITHER PARTY BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES, REGARDLESS OF THE CAUSE OF ACTION OR THEORY OF LIABILITY, EVEN IF SUCH PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. The foregoing limitation shall not apply to breaches of Section 12 (Confidentiality) or willful misconduct.
{
"general_liability_cap": "$500,000",
"general_liability_basis": "Greater of 12-month fees or $500,000",
"indemnification_cap": "$2,000,000",
"indemnification_section": "9.1",
"consequential_damages_excluded": true,
"consequential_damages_exceptions": [
"Confidentiality breaches",
"Willful misconduct"
],
"liability_cap_exceptions": [
"Indemnification under Section 9.1",
"Confidentiality under Section 12"
]
}
Incorrect/missing fields: general_liability_cap, general_liability_basis
Incorrect/missing fields: general_liability_cap, general_liability_basis
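The general cap in Section 8.2 is a formula rather than a single dollar amount, which is what makes `general_liability_cap` hard to express as one number. Below is one reading of the clause as a Python function; the claim-category labels are hypothetical, the Section 8.3 consequential-damages exclusion is not modeled, and returning `None` means only that Section 8.2 itself sets no cap for that category.

```python
from typing import Optional

GENERAL_CAP_FLOOR_USD = 500_000      # Section 8.2(b)
INDEMNIFICATION_CAP_USD = 2_000_000  # Section 8.2, "Notwithstanding the foregoing" sentence


def liability_cap(claim: str, fees_prior_12_months_usd: float) -> Optional[float]:
    """Aggregate cap for a claim category, or None if Section 8.2 sets no cap for it."""
    if claim == "confidentiality":   # Section 12 breaches are carved out of the 8.2 cap
        return None
    if claim == "indemnification":   # Vendor's Section 9.1 obligations: separate $2M cap
        return INDEMNIFICATION_CAP_USD
    # All other claims: the GREATER of trailing-12-month fees or $500,000.
    return max(fees_prior_12_months_usd, GENERAL_CAP_FLOOR_USD)


print(liability_cap("general", 300_000))    # 500000
print(liability_cap("general", 1_200_000))  # 1200000
```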
Section 14 — TERMINATION. 14.1 Termination for Convenience: Either party may terminate this Agreement upon ninety (90) days' prior written notice to the other party. 14.2 Termination for Cause: Either party may terminate this Agreement immediately upon written notice if the other party: (a) materially breaches this Agreement and fails to cure such breach within thirty (30) days after receiving written notice; (b) becomes insolvent, files for bankruptcy under Title 11, or has a receiver appointed for substantially all of its assets; or (c) is acquired by a direct competitor of the non-breaching party ('Change of Control'). 14.3 Effect of Termination: Upon termination under 14.1, all prepaid but unused fees shall be refunded pro rata. Upon termination under 14.2(a), the breaching party shall forfeit any unused prepaid fees. Upon termination under 14.2(b) or 14.2(c), fees shall be handled in accordance with Section 7.4.
{
"termination_for_convenience.notice_period_days": 90,
"termination_for_convenience.available_to": "Either party",
"termination_for_cause.triggers": [
{
"type": "Material breach",
"cure_period_days": 30
},
{
"type": "Insolvency/Bankruptcy",
"cure_period_days": 0
},
{
"type": "Change of Control",
"cure_period_days": 0
}
],
"termination_effects.convenience": "Pro rata refund of prepaid unused fees",
"termination_effects.cause_breach": "Breaching party forfeits unused prepaid fees",
"termination_effects.cause_insolvency_or_change_of_control": "Per Section 7.4"
}
Incorrect/missing fields: termination_effects.cause_insolvency_or_change_of_control
Incorrect/missing fields: termination_effects.cause_insolvency_or_change_of_control
Section 6 — INTELLECTUAL PROPERTY. 6.1 Work Product: All deliverables, work product, and inventions conceived, created, or developed by Vendor in the performance of this Agreement ("Work Product") shall be considered "works made for hire" as defined by the U.S. Copyright Act. To the extent any Work Product does not qualify as a work made for hire, Vendor hereby irrevocably assigns to Client all right, title, and interest in and to such Work Product, including all intellectual property rights therein. 6.2 Pre-Existing IP: Notwithstanding Section 6.1, Vendor retains all right, title, and interest in and to any tools, frameworks, libraries, methodologies, and know-how that: (a) existed prior to this Agreement; (b) were developed independently outside the scope of this Agreement; or (c) are of general applicability and not specific to Client's business (collectively, "Pre-Existing IP"). To the extent any Pre-Existing IP is incorporated into the Work Product, Vendor grants Client a perpetual, irrevocable, worldwide, royalty-free, non-exclusive license to use, reproduce, and modify such Pre-Existing IP solely as embedded in the Work Product. 6.3 Residual Knowledge: Nothing in this Agreement shall restrict Vendor's right to use general skills, knowledge, experience, and ideas retained in the unaided memories of Vendor's personnel, provided such use does not infringe Client's patents or copyrights in the Work Product.
{
"work_product_ownership": "Assigned to Client",
"assignment_mechanism": "Work made for hire + irrevocable assignment fallback",
"pre_existing_ip_ownership": "Retained by Vendor",
"pre_existing_ip_categories": [
"Prior tools/frameworks",
"Independently developed",
"General applicability"
],
"pre_existing_ip_license_to_client.type": "Non-exclusive license",
"pre_existing_ip_license_to_client.scope": "Use, reproduce, modify as embedded in Work Product",
"pre_existing_ip_license_to_client.duration": "Perpetual",
"pre_existing_ip_license_to_client.royalty": "Royalty-free",
"pre_existing_ip_license_to_client.territory": "Worldwide",
"residual_knowledge_clause": true,
"residual_knowledge_limitation": "Must not infringe Client's patents or copyrights"
}
Incorrect/missing fields: pre_existing_ip_categories, work_product_ownership, pre_existing_ip_ownership, pre_existing_ip_license_to_client.duration, residual_knowledge_limitation
Incorrect/missing fields: pre_existing_ip_categories, work_product_ownership, pre_existing_ip_ownership, pre_existing_ip_license_to_client.type, residual_knowledge_limitation
Section 15 — FORCE MAJEURE. 15.1 Neither party shall be liable for any failure or delay in performing its obligations under this Agreement (other than payment obligations, which shall not be excused) where such failure or delay results from Force Majeure Events including, but not limited to: acts of God, fire, flood, earthquake, epidemic, pandemic, war, terrorism, civil unrest, government sanctions, embargoes, strikes or labor disputes (excluding those involving a party's own employees), power grid failures, or internet backbone outages. 15.2 The affected party must provide written notice within five (5) business days of the Force Majeure Event's commencement, including a reasonable estimate of the expected duration. The affected party shall use commercially reasonable efforts to mitigate the impact. 15.3 If a Force Majeure Event continues for more than one hundred and twenty (120) consecutive calendar days, either party may terminate this Agreement upon written notice without penalty.
{
"payment_obligations_excused": false,
"notice_period_business_days": 5,
"notice_requirements": [
"Written notice",
"Reasonable duration estimate"
],
"mitigation_standard": "Commercially reasonable efforts",
"termination_threshold_days": 120,
"termination_penalty": false,
"excluded_events": [
"Strikes involving own employees"
],
"included_events": [
"Acts of God",
"Fire",
"Flood",
"Earthquake",
"Epidemic",
"Pandemic",
"War",
"Terrorism",
"Civil unrest",
"Government sanctions",
"Embargoes",
"Strikes/labor disputes",
"Power grid failures",
"Internet backbone outages"
]
}
Incorrect/missing fields: notice_requirements
All fields correct
Section 11 — RESTRICTIVE COVENANTS. 11.1 Non-Competition: For a period of eighteen (18) months following termination of employment for any reason, Employee shall not, directly or indirectly, engage in, own, manage, operate, consult for, or be employed by any Competing Business within the Restricted Territory. "Competing Business" means any entity that derives more than twenty percent (20%) of its annual revenue from products or services substantially similar to those offered by Company. "Restricted Territory" means the United States, the United Kingdom, and any country in which Company generated more than $5,000,000 in revenue during the twelve (12) months preceding termination. 11.2 Non-Solicitation of Employees: For a period of twenty-four (24) months following termination, Employee shall not recruit, solicit, or induce any person employed by Company (or who was employed within the six (6) months preceding such solicitation). 11.3 Non-Solicitation of Customers: For a period of twelve (12) months following termination, Employee shall not solicit any customer or prospective customer with whom Employee had material contact during the last twenty-four (24) months of employment. 11.4 Non-Disparagement: For a period of thirty-six (36) months following termination, Employee shall not make any public statements that disparage or defame Company, its officers, directors, or products. This obligation is mutual — Company shall instruct its officers and directors to refrain from disparaging Employee.
{
"non_compete.duration_months": 18,
"non_compete.scope": "Engage, own, manage, operate, consult, or be employed",
"non_compete.competing_business_threshold": "20% of annual revenue from similar products/services",
"non_compete.restricted_territory": [
"United States",
"United Kingdom",
"Countries with >$5M Company revenue in prior 12 months"
],
"non_solicitation_employees.duration_months": 24,
"non_solicitation_employees.lookback_months": 6,
"non_solicitation_customers.duration_months": 12,
"non_solicitation_customers.contact_lookback_months": 24,
"non_disparagement.duration_months": 36,
"non_disparagement.is_mutual": true,
"non_disparagement.company_scope": "Officers and directors"
}
Incorrect/missing fields: non_compete.scope, non_disparagement.company_scope
Incorrect/missing fields: non_compete.scope, non_disparagement.company_scope
Patient: Female, 67yo, presents to ED with acute onset chest pain (started 2 hours ago, 8/10 severity, substernal, radiating to left arm), on a background of chronic stable angina (diagnosed 2019, managed with isosorbide mononitrate 60mg QD). History of Type 2 DM (HbA1c 7.2%, on metformin 1000mg BID + empagliflozin 25mg QD), hypertension (BP 168/94 on arrival, home meds: amlodipine 10mg + lisinopril 40mg), and previous NSTEMI (March 2023). Allergies: Aspirin (urticaria), Atorvastatin (myalgia — switched to rosuvastatin 20mg). Current vitals: HR 96 bpm, RR 22, SpO2 94% on RA, Temp 36.8°C.
{
"demographics.sex": "Female",
"demographics.age": 67,
"chief_complaint": "Acute onset chest pain",
"presenting_symptoms": [
{
"symptom": "Chest pain",
"onset": "Acute",
"duration": "2 hours",
"severity": "8/10",
"location": "Substernal",
"radiation": "Left arm"
}
],
"chronic_conditions": [
{
"condition": "Chronic stable angina",
"diagnosed": "2019",
"status": "Managed"
},
{
"condition": "Type 2 Diabetes Mellitus",
"marker": "HbA1c 7.2%"
},
{
"condition": "Hypertension",
"current_bp": "168/94"
},
{
"condition": "Previous NSTEMI",
"date": "March 2023"
}
],
"medications": [
{
"name": "Isosorbide mononitrate",
"dose": "60mg",
"frequency": "QD",
"indication": "Angina"
},
{
"name": "Metformin",
"dose": "1000mg",
"frequency": "BID",
"indication": "Diabetes"
},
{
"name": "Empagliflozin",
"dose": "25mg",
"frequency": "QD",
"indication": "Diabetes"
},
{
"name": "Amlodipine",
"dose": "10mg",
"frequency": "QD",
"indication": "Hypertension"
},
{
"name": "Lisinopril",
"dose": "40mg",
"frequency": "QD",
"indication": "Hypertension"
},
{
"name": "Rosuvastatin",
"dose": "20mg",
"frequency": "QD",
"indication": "Cholesterol"
}
],
"allergies": [
{
"drug": "Aspirin",
"reaction": "Urticaria"
},
{
"drug": "Atorvastatin",
"reaction": "Myalgia",
"note": "Switched to rosuvastatin"
}
],
"vitals.hr_bpm": 96,
"vitals.rr": 22,
"vitals.spo2_percent": 94,
"vitals.temp_celsius": 36.8
}
Incorrect/missing fields: chronic_conditions
Incorrect/missing fields: chronic_conditions, chief_complaint
COMPREHENSIVE METABOLIC PANEL — Collected: 02/15/2026 07:30. Glucose: 187 mg/dL (ref: 70-100, FLAG: HIGH). BUN: 42 mg/dL (ref: 7-20, FLAG: HIGH). Creatinine: 2.1 mg/dL (ref: 0.7-1.3, FLAG: HIGH). eGFR: 28 mL/min/1.73m² (ref: >60, FLAG: LOW — Stage 4 CKD). Sodium: 138 mEq/L (ref: 136-145). Potassium: 5.6 mEq/L (ref: 3.5-5.0, FLAG: HIGH — CRITICAL). Chloride: 102 mEq/L (ref: 98-106). CO2: 19 mEq/L (ref: 23-29, FLAG: LOW). Calcium: 8.2 mg/dL (ref: 8.5-10.5, FLAG: LOW). Total Protein: 6.8 g/dL (ref: 6.0-8.3). Albumin: 3.1 g/dL (ref: 3.5-5.5, FLAG: LOW). Bilirubin, Total: 0.9 mg/dL (ref: 0.1-1.2). ALT: 24 U/L (ref: 7-56). AST: 31 U/L (ref: 10-40). Alk Phos: 98 U/L (ref: 44-147).
{
"panel_type": "Comprehensive Metabolic Panel",
"collected": "2026-02-15T07:30",
"results": [
{
"test": "Glucose",
"value": 187,
"unit": "mg/dL",
"ref_low": 70,
"ref_high": 100,
"flag": "HIGH"
},
{
"test": "BUN",
"value": 42,
"unit": "mg/dL",
"ref_low": 7,
"ref_high": 20,
"flag": "HIGH"
},
{
"test": "Creatinine",
"value": 2.1,
"unit": "mg/dL",
"ref_low": 0.7,
"ref_high": 1.3,
"flag": "HIGH"
},
{
"test": "eGFR",
"value": 28,
"unit": "mL/min/1.73m²",
"ref_low": 60,
"ref_high": null,
"flag": "LOW",
"note": "Stage 4 CKD"
},
{
"test": "Sodium",
"value": 138,
"unit": "mEq/L",
"ref_low": 136,
"ref_high": 145,
"flag": null
},
{
"test": "Potassium",
"value": 5.6,
"unit": "mEq/L",
"ref_low": 3.5,
"ref_high": 5,
"flag": "HIGH",
"critical": true
},
{
"test": "Chloride",
"value": 102,
"unit": "mEq/L",
"ref_low": 98,
"ref_high": 106,
"flag": null
},
{
"test": "CO2",
"value": 19,
"unit": "mEq/L",
"ref_low": 23,
"ref_high": 29,
"flag": "LOW"
},
{
"test": "Calcium",
"value": 8.2,
"unit": "mg/dL",
"ref_low": 8.5,
"ref_high": 10.5,
"flag": "LOW"
},
{
"test": "Total Protein",
"value": 6.8,
"unit": "g/dL",
"ref_low": 6,
"ref_high": 8.3,
"flag": null
},
{
"test": "Albumin",
"value": 3.1,
"unit": "g/dL",
"ref_low": 3.5,
"ref_high": 5.5,
"flag": "LOW"
},
{
"test": "Bilirubin Total",
"value": 0.9,
"unit": "mg/dL",
"ref_low": 0.1,
"ref_high": 1.2,
"flag": null
},
{
"test": "ALT",
"value": 24,
"unit": "U/L",
"ref_low": 7,
"ref_high": 56,
"flag": null
},
{
"test": "AST",
"value": 31,
"unit": "U/L",
"ref_low": 10,
"ref_high": 40,
"flag": null
},
{
"test": "Alk Phos",
"value": 98,
"unit": "U/L",
"ref_low": 44,
"ref_high": 147,
"flag": null
}
],
"critical_values": [
"Potassium 5.6 mEq/L"
],
"abnormal_count": 7,
"normal_count": 8
}
Incorrect/missing fields: abnormal_count, normal_count
Incorrect/missing fields: abnormal_count, normal_count, critical_values
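Counting abnormal results takes more than reading labels: each value has to be compared against its reference range. A minimal Python sketch that recomputes the flags from the ranges transcribed from the panel (the `flag` helper is hypothetical; `None` marks an open-ended bound such as the eGFR range `>60`):

```python
from typing import Optional

# (test, value, ref_low, ref_high) transcribed from the panel; None marks an open-ended bound.
PANEL = [
    ("Glucose", 187, 70, 100),
    ("BUN", 42, 7, 20),
    ("Creatinine", 2.1, 0.7, 1.3),
    ("eGFR", 28, 60, None),
    ("Sodium", 138, 136, 145),
    ("Potassium", 5.6, 3.5, 5.0),
    ("Chloride", 102, 98, 106),
    ("CO2", 19, 23, 29),
    ("Calcium", 8.2, 8.5, 10.5),
    ("Total Protein", 6.8, 6.0, 8.3),
    ("Albumin", 3.1, 3.5, 5.5),
    ("Bilirubin, Total", 0.9, 0.1, 1.2),
    ("ALT", 24, 7, 56),
    ("AST", 31, 10, 40),
    ("Alk Phos", 98, 44, 147),
]


def flag(value: float, low: Optional[float], high: Optional[float]) -> Optional[str]:
    # Below the lower bound -> LOW, above the upper bound -> HIGH, otherwise no flag.
    if low is not None and value < low:
        return "LOW"
    if high is not None and value > high:
        return "HIGH"
    return None


flags = {name: flag(value, low, high) for name, value, low, high in PANEL}
abnormal = [name for name, f in flags.items() if f is not None]
print(len(abnormal), len(PANEL) - len(abnormal))  # 8 7
```

The recomputed flags agree with the FLAG annotations printed in the panel.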
MEDICATION RECONCILIATION — Inpatient Day 3. Current Orders: (1) Warfarin 5mg PO QD (for AFib — INR target 2.0-3.0, last INR 2.4 on 02/13). (2) Metoprolol Succinate 100mg PO QD. (3) Diltiazem ER 240mg PO QD. (4) Amiodarone 200mg PO QD (started 02/12 for rate-refractory AFib). (5) Fluconazole 400mg IV QD (Day 5 of 14 for candidemia). (6) Enoxaparin 80mg SQ Q12H (DVT prophylaxis — started before warfarin was therapeutic). (7) Acetaminophen 1000mg PO Q6H PRN. (8) Omeprazole 40mg PO QD. Home Medications NOT continued: Aspirin 81mg QD (held due to warfarin + enoxaparin = triple antithrombotic risk).
{
"active_medications": [
{
"name": "Warfarin",
"dose": "5mg",
"route": "PO",
"frequency": "QD",
"indication": "AFib"
},
{
"name": "Metoprolol Succinate",
"dose": "100mg",
"route": "PO",
"frequency": "QD"
},
{
"name": "Diltiazem ER",
"dose": "240mg",
"route": "PO",
"frequency": "QD"
},
{
"name": "Amiodarone",
"dose": "200mg",
"route": "PO",
"frequency": "QD",
"indication": "Rate-refractory AFib",
"start_date": "02/12"
},
{
"name": "Fluconazole",
"dose": "400mg",
"route": "IV",
"frequency": "QD",
"indication": "Candidemia",
"day": 5,
"total_days": 14
},
{
"name": "Enoxaparin",
"dose": "80mg",
"route": "SQ",
"frequency": "Q12H",
"indication": "DVT prophylaxis"
},
{
"name": "Acetaminophen",
"dose": "1000mg",
"route": "PO",
"frequency": "Q6H",
"prn": true
},
{
"name": "Omeprazole",
"dose": "40mg",
"route": "PO",
"frequency": "QD"
}
],
"held_medications": [
{
"name": "Aspirin",
"dose": "81mg",
"reason": "Triple antithrombotic risk with warfarin + enoxaparin"
}
],
"potential_interactions": [
{
"drugs": [
"Warfarin",
"Fluconazole"
],
"severity": "Major",
"effect": "Fluconazole inhibits CYP2C9, significantly increasing warfarin levels and bleeding risk"
},
{
"drugs": [
"Warfarin",
"Amiodarone"
],
"severity": "Major",
"effect": "Amiodarone inhibits CYP2C9/1A2/3A4, increasing warfarin effect — typical dose reduction 30-50%"
},
{
"drugs": [
"Metoprolol",
"Diltiazem"
],
"severity": "Major",
"effect": "Additive AV nodal blockade — risk of severe bradycardia or heart block"
},
{
"drugs": [
"Warfarin",
"Enoxaparin"
],
"severity": "High",
"effect": "Dual anticoagulation increases bleeding risk"
}
],
"therapeutic_duplications": [
{
"category": "Rate control",
"drugs": [
"Metoprolol",
"Diltiazem",
"Amiodarone"
],
"note": "Triple rate control — assess need for all three"
}
]
}
Incorrect/missing fields: potential_interactions, therapeutic_duplications
Incorrect/missing fields: potential_interactions, therapeutic_duplications
OPERATIVE REPORT. Date: 02/20/2026. Surgeon: Dr. Sarah Chen, MD, FACS. Assistant: Dr. James Park, MD. Anesthesia: General endotracheal (Dr. Reeves). Procedure: Laparoscopic cholecystectomy converted to open cholecystectomy. Indication: Acute cholecystitis with empyema, failed medical management. Findings: Gallbladder severely inflamed, gangrenous with empyema. Dense adhesions to duodenum and hepatic flexure of colon. Critical view of safety could NOT be obtained laparoscopically — decision to convert at 47 minutes. Common bile duct diameter 6mm, no stones, confirmed with intraoperative cholangiogram. Estimated blood loss: 350mL. Specimens: Gallbladder sent to pathology. Drain: 19-Fr Blake drain placed in Morrison's pouch. Complications: None intraoperative. Patient extubated and transferred to PACU in stable condition.
{
"date": "2026-02-20",
"surgeon.name": "Dr. Sarah Chen",
"surgeon.credentials": "MD, FACS",
"assistant.name": "Dr. James Park",
"assistant.credentials": "MD",
"anesthesiologist": "Dr. Reeves",
"anesthesia_type": "General endotracheal",
"procedure_planned": "Laparoscopic cholecystectomy",
"procedure_actual": "Open cholecystectomy",
"was_converted": true,
"conversion_time_minutes": 47,
"indication": "Acute cholecystitis with empyema",
"findings.gallbladder_status": "Gangrenous with empyema",
"findings.adhesions": "Dense, to duodenum and hepatic flexure",
"findings.critical_view_obtained": false,
"findings.cbd_diameter_mm": 6,
"findings.cbd_stones": false,
"findings.cholangiogram_performed": true,
"ebl_ml": 350,
"specimens": [
"Gallbladder"
],
"drain.type": "19-Fr Blake",
"drain.location": "Morrison's pouch",
"complications": "None",
"disposition": "PACU, stable"
}
Incorrect/missing fields: specimens, complications
Incorrect/missing fields: complications, disposition
CT CHEST WITH CONTRAST — 02/18/2026. CLINICAL INDICATION: 67F with history of NSCLC s/p right upper lobectomy (2024), new cough, rule out recurrence. TECHNIQUE: Helical CT from thoracic inlet to adrenals with 80mL Omnipaque 350 IV contrast. COMPARISON: CT Chest 08/12/2025. FINDINGS: LUNGS: New 1.8 × 1.4 cm spiculated soft tissue nodule in the right lower lobe (series 4, image 187), suspicious for recurrence. No prior correlate. Stable 4mm ground-glass nodule in the left lower lobe (previously noted, unchanged from 08/2025). Post-surgical changes in the right upper lobe with expected fibrotic bands. MEDIASTINUM: New subcarinal lymph node measuring 1.6 cm short axis (previously 0.8 cm), concerning for metastatic adenopathy. Right hilar lymphadenopathy, largest node 1.2 cm (new). Heart size normal. No pericardial effusion. PLEURA: Small right-sided pleural effusion (new). No pneumothorax. BONES: No suspicious osseous lesions. Mild degenerative changes thoracic spine. UPPER ABDOMEN: Left adrenal gland 2.1 cm nodule with enhancement (not present on prior) — cannot exclude metastasis. Right adrenal normal. IMPRESSION: 1. New 1.8 cm spiculated RLL nodule — highly suspicious for recurrence of NSCLC. 2. New mediastinal and right hilar lymphadenopathy — concerning for nodal metastasis. 3. New left adrenal nodule — metastasis cannot be excluded; recommend dedicated adrenal CT or PET-CT. 4. New small right pleural effusion — may represent malignant effusion in this context. 5. Stable LLL 4mm GGN — recommend continued surveillance. RECOMMENDATION: PET-CT recommended. Multidisciplinary tumor board review.
{
"study_type": "CT Chest with contrast",
"date": "2026-02-18",
"clinical_history": "67F, NSCLC s/p right upper lobectomy (2024), new cough",
"comparison_study": "CT Chest 08/12/2025",
"new_findings": [
{
"location": "Right lower lobe",
"description": "1.8 × 1.4 cm spiculated soft tissue nodule",
"concern": "Recurrence of NSCLC",
"severity": "Highly suspicious"
},
{
"location": "Subcarinal",
"description": "Lymph node 1.6 cm short axis (was 0.8 cm)",
"concern": "Metastatic adenopathy"
},
{
"location": "Right hilum",
"description": "Lymphadenopathy, largest 1.2 cm",
"concern": "Metastatic adenopathy"
},
{
"location": "Right pleura",
"description": "Small pleural effusion",
"concern": "Possible malignant effusion"
},
{
"location": "Left adrenal",
"description": "2.1 cm enhancing nodule",
"concern": "Cannot exclude metastasis"
}
],
"stable_findings": [
{
"location": "Left lower lobe",
"description": "4mm ground-glass nodule",
"comparison": "Unchanged from 08/2025"
}
],
"post_surgical": "Right upper lobectomy changes with fibrotic bands",
"impressions_count": 5,
"recommendations": [
"PET-CT",
"Multidisciplinary tumor board review",
"Continued surveillance of LLL GGN"
],
"overall_concern": "Likely NSCLC recurrence with possible nodal and adrenal metastasis"
}
Incorrect/missing fields: new_findings
All fields correct
Q4 2025 Earnings Call — TechCorp Inc. (Ticker: TCHK). CFO Maria Santos: "Total revenue for Q4 was $4.23 billion, up 18% year-over-year. Breaking that down: Cloud Services revenue was $2.87 billion, growing 31% and now representing 68% of total revenue. Enterprise Software was $980 million, roughly flat. Professional Services contributed $380 million, down 12% as we continue the strategic shift away from lower-margin consulting. On the cost side, GAAP gross profit was $2.71 billion, for a gross margin of 64.1%. Non-GAAP gross margin, excluding $142 million in stock-based compensation allocated to cost of revenue, was 67.4%. Operating expenses: R&D was $890 million, up from $760 million — we've been investing heavily in our AI platform. Sales & Marketing was $620 million, and G&A was $210 million. GAAP operating income was $990 million, and non-GAAP operating income was $1.38 billion. The delta is primarily SBC of $310 million, restructuring charges of $47 million related to the EMEA headcount reduction, and $33 million in acquisition-related costs. GAAP net income was $742 million, or $2.47 per diluted share on 300.4 million shares. Non-GAAP EPS was $3.84."
{
"period": "Q4 2025",
"company": "TechCorp Inc.",
"ticker": "TCHK",
"total_revenue_billions": 4.23,
"revenue_yoy_growth": "18%",
"revenue_breakdown": [
{
"segment": "Cloud Services",
"revenue_billions": 2.87,
"growth": "31%",
"pct_of_total": 68
},
{
"segment": "Enterprise Software",
"revenue_millions": 980,
"growth": "~0%"
},
{
"segment": "Professional Services",
"revenue_millions": 380,
"growth": "-12%"
}
],
"gaap_gross_profit_billions": 2.71,
"gaap_gross_margin_pct": 64.1,
"non_gaap_gross_margin_pct": 67.4,
"sbc_in_cogs_millions": 142,
"operating_expenses.rd_millions": 890,
"operating_expenses.rd_prior_millions": 760,
"operating_expenses.sales_marketing_millions": 620,
"operating_expenses.ga_millions": 210,
"gaap_operating_income_millions": 990,
"non_gaap_operating_income_millions": 1380,
"gaap_non_gaap_reconciliation": [
{
"item": "Stock-based compensation",
"amount_millions": 310
},
{
"item": "Restructuring charges",
"amount_millions": 47
},
{
"item": "Acquisition-related costs",
"amount_millions": 33
}
],
"gaap_net_income_millions": 742,
"gaap_eps": 2.47,
"non_gaap_eps": 3.84,
"diluted_shares_millions": 300.4
}
Incorrect/missing fields: revenue_breakdown
Incorrect/missing fields: revenue_breakdown
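The call's figures can be tied out with simple arithmetic. This Python sketch (variable names are illustrative; amounts in $ millions) reproduces the cloud revenue share, both gross margins, both operating income figures, and GAAP EPS. The call gives no tax detail, so the $3.84 non-GAAP EPS is not rebuilt here.

```python
# Figures in $ millions, transcribed from the call.
segments = {"Cloud Services": 2870, "Enterprise Software": 980, "Professional Services": 380}
total_revenue = 4230
gaap_gross_profit = 2710
sbc_in_cost_of_revenue = 142
opex = {"R&D": 890, "Sales & Marketing": 620, "G&A": 210}
non_gaap_addbacks = {"SBC": 310, "Restructuring": 47, "Acquisition-related": 33}
gaap_net_income = 742
diluted_shares_m = 300.4

segments_tie_out = sum(segments.values()) == total_revenue              # True
cloud_share = round(100 * segments["Cloud Services"] / total_revenue)     # 68
gaap_gm = round(100 * gaap_gross_profit / total_revenue, 1)               # 64.1
# Non-GAAP gross margin adds back only the SBC booked in cost of revenue.
non_gaap_gm = round(100 * (gaap_gross_profit + sbc_in_cost_of_revenue) / total_revenue, 1)  # 67.4
gaap_op_income = gaap_gross_profit - sum(opex.values())                   # 990
non_gaap_op_income = gaap_op_income + sum(non_gaap_addbacks.values())     # 1380
gaap_eps = round(gaap_net_income / diluted_shares_m, 2)                   # 2.47

print(segments_tie_out, cloud_share, gaap_gm, non_gaap_gm,
      gaap_op_income, non_gaap_op_income, gaap_eps)
```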
CONSOLIDATED BALANCE SHEET — December 31, 2025 (in millions). ASSETS: Cash and equivalents $3,420. Short-term investments $1,850. Accounts receivable, net $2,190 (allowance for doubtful accounts $68). Inventories $890. Prepaid expenses $340. Total Current Assets $8,690. Property and equipment, net $12,400. Goodwill $8,750. Intangible assets, net $3,200. Operating lease right-of-use assets $1,860. Other non-current assets $920. Total Assets $35,820. LIABILITIES: Accounts payable $1,640. Accrued expenses $2,180. Current portion of long-term debt $500. Deferred revenue, current $1,420. Total Current Liabilities $5,740. Long-term debt $8,200 (senior notes: $5B at 3.75% due 2030, $3.2B at 4.25% due 2033). Operating lease liabilities $1,680. Deferred tax liabilities $890. Other non-current liabilities $460. Total Liabilities $16,970. EQUITY: Common stock $30. Additional paid-in capital $12,840. Retained earnings $9,180. Treasury stock ($2,400). Accumulated other comprehensive loss ($800). Total Stockholders' Equity $18,850. Total Liabilities + Equity $35,820.
{
"as_of": "2025-12-31",
"currency": "USD millions",
"current_assets": 8690,
"total_assets": 35820,
"current_liabilities": 5740,
"total_liabilities": 16970,
"total_equity": 18850,
"cash_and_equivalents": 3420,
"total_debt": 8700,
"debt_breakdown": [
{
"instrument": "Senior notes 3.75%",
"amount": 5000,
"maturity": 2030
},
{
"instrument": "Senior notes 4.25%",
"amount": 3200,
"maturity": 2033
}
],
"computed_ratios.current_ratio": 1.51,
"computed_ratios.quick_ratio": 1.3,
"computed_ratios.debt_to_equity": 0.46,
"computed_ratios.debt_to_assets": 0.24,
"computed_ratios.working_capital_millions": 2950,
"balance_sheet_check": true
}
Incorrect/missing fields: currency
Incorrect/missing fields: currency
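The `computed_ratios` fields follow from the line items. In this Python sketch, quick assets are assumed to be cash, short-term investments, and receivables (one common definition, which excludes inventories and prepaid expenses), and total debt is taken as the current portion plus long-term debt.

```python
# $ millions, transcribed from the balance sheet.
current_assets = {"cash": 3420, "short_term_investments": 1850, "accounts_receivable": 2190,
                  "inventories": 890, "prepaid": 340}
noncurrent_assets = [12400, 8750, 3200, 1860, 920]  # PP&E, goodwill, intangibles, ROU, other
current_liabilities = {"accounts_payable": 1640, "accrued": 2180, "current_debt": 500,
                       "deferred_revenue": 1420}
noncurrent_liabilities = {"long_term_debt": 8200, "lease": 1680, "deferred_tax": 890, "other": 460}
equity_lines = [30, 12840, 9180, -2400, -800]  # treasury stock and AOCI are negative

total_current_assets = sum(current_assets.values())                                   # 8690
total_assets = total_current_assets + sum(noncurrent_assets)                           # 35820
total_current_liabilities = sum(current_liabilities.values())                          # 5740
total_liabilities = total_current_liabilities + sum(noncurrent_liabilities.values())   # 16970
total_equity = sum(equity_lines)                                                       # 18850
total_debt = current_liabilities["current_debt"] + noncurrent_liabilities["long_term_debt"]  # 8700

quick_assets = (current_assets["cash"] + current_assets["short_term_investments"]
                + current_assets["accounts_receivable"])
ratios = {
    "current_ratio": round(total_current_assets / total_current_liabilities, 2),  # 1.51
    "quick_ratio": round(quick_assets / total_current_liabilities, 2),           # 1.3
    "debt_to_equity": round(total_debt / total_equity, 2),                       # 0.46
    "debt_to_assets": round(total_debt / total_assets, 2),                       # 0.24
    "working_capital": total_current_assets - total_current_liabilities,         # 2950
}
balance_sheet_check = total_liabilities + total_equity == total_assets  # True
print(ratios, balance_sheet_check)
```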
GEOGRAPHIC REVENUE ANALYSIS — FY2025. Americas: Reported revenue $6.24B (up 22% YoY). Europe: Reported revenue €2.18B ($2.31B at avg rate 1.06 USD/EUR), up 14% in euros but only 9% in USD due to euro weakness (prior year avg rate was 1.11 USD/EUR). Asia-Pacific: Reported ¥412B ($2.78B at avg rate 148.1 JPY/USD), up 19% in yen but down 2% in USD — the yen depreciated from 128.4 to 148.1 JPY/USD. Japan alone was ¥298B ($2.01B), up 23% in local currency but flat in USD. Rest of World: $680M, up 8%. Total reported revenue: $12.01B, up 15.2% YoY. On a constant-currency basis (using FY2024 exchange rates), total revenue would have been $12.68B, representing 21.7% growth. FX headwind to reported revenue: approximately $670M or 5.6 percentage points of growth.
{
"fiscal_year": "FY2025",
"total_revenue_reported_billions": 12.01,
"total_revenue_constant_currency_billions": 12.68,
"reported_growth_pct": 15.2,
"constant_currency_growth_pct": 21.7,
"fx_headwind_millions": 670,
"fx_headwind_growth_points": 5.6,
"regions": [
{
"region": "Americas",
"reported_usd_billions": 6.24,
"yoy_reported_pct": 22
},
{
"region": "Europe",
"local_currency": "EUR",
"local_revenue_billions": 2.18,
"reported_usd_billions": 2.31,
"avg_fx_rate": 1.06,
"prior_year_fx_rate": 1.11,
"yoy_local_pct": 14,
"yoy_usd_pct": 9
},
{
"region": "Asia-Pacific",
"local_currency": "JPY",
"local_revenue_billions_jpy": 412,
"reported_usd_billions": 2.78,
"avg_fx_rate": 148.1,
"prior_year_fx_rate": 128.4,
"yoy_local_pct": 19,
"yoy_usd_pct": -2
},
{
"region": "Rest of World",
"reported_usd_millions": 680,
"yoy_reported_pct": 8
}
],
"japan_subset.local_revenue_billions_jpy": 298,
"japan_subset.reported_usd_billions": 2.01,
"japan_subset.yoy_local_pct": 23,
"japan_subset.yoy_usd_pct": 0
}
Incorrect/missing fields: regions
Incorrect/missing fields: regions
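The USD figures depend on reading each quote convention correctly: the euro rate is quoted as USD per EUR (multiply), while the yen rate is JPY per USD (divide). A Python sketch of those conversions, in billions; Japan is a subset of Asia-Pacific, so it is converted but not added to the total.

```python
# Quote conventions from the text: EUR is quoted as USD per EUR, JPY as JPY per USD.
USD_PER_EUR = 1.06
JPY_PER_USD = 148.1

europe_usd_b = 2.18 * USD_PER_EUR   # multiply: rate is USD per EUR
apac_usd_b = 412 / JPY_PER_USD      # divide: rate is JPY per USD (412 is ¥ billions)
japan_usd_b = 298 / JPY_PER_USD     # subset of Asia-Pacific, not summed below

regions_usd_b = {"Americas": 6.24, "Europe": europe_usd_b,
                 "Asia-Pacific": apac_usd_b, "Rest of World": 0.68}
total_usd_b = sum(regions_usd_b.values())
fx_headwind_m = round((12.68 - 12.01) * 1000)  # constant-currency minus reported, $ millions

print(round(europe_usd_b, 2), round(apac_usd_b, 2), round(japan_usd_b, 2),
      round(total_usd_b, 2), fx_headwind_m)  # 2.31 2.78 2.01 12.01 670
```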
CONSOLIDATED STATEMENT OF CASH FLOWS — FY2025 (in millions). OPERATING ACTIVITIES: Net income $2,890. Adjustments: Depreciation and amortization $1,420. Stock-based compensation $680. Deferred income taxes ($210). Changes in working capital: Accounts receivable ($340). Inventories $85. Accounts payable $220. Deferred revenue $190. Accrued expenses ($65). Net cash from operating activities $4,870. INVESTING ACTIVITIES: Capital expenditures ($2,180). Acquisitions, net of cash acquired ($3,400). Purchases of short-term investments ($2,800). Maturities of short-term investments $1,950. Proceeds from sale of assets $120. Net cash used in investing activities ($6,310). FINANCING ACTIVITIES: Proceeds from issuance of debt $3,000. Repayment of debt ($1,500). Share repurchases ($2,200). Dividends paid ($580). Proceeds from employee stock plans $340. Net cash used in financing activities ($940). Effect of exchange rate changes ($85). NET DECREASE IN CASH ($2,465). Cash at beginning of period $5,885. Cash at end of period $3,420.
{
"fiscal_year": "FY2025",
"operating_cash_flow": 4870,
"investing_cash_flow": -6310,
"financing_cash_flow": -940,
"fx_effect": -85,
"net_change_in_cash": -2465,
"beginning_cash": 5885,
"ending_cash": 3420,
"free_cash_flow": 2690,
"key_items.net_income": 2890,
"key_items.depreciation_amortization": 1420,
"key_items.sbc": 680,
"key_items.capex": -2180,
"key_items.acquisitions": -3400,
"key_items.share_repurchases": -2200,
"key_items.dividends": -580,
"key_items.debt_issued": 3000,
"key_items.debt_repaid": -1500,
"cash_flow_check": true
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
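The expected output sets `cash_flow_check` to true, but the task does not define the check. One plausible reading, which is our assumption, is that the three section subtotals plus the FX effect must equal the net change in cash, and that beginning cash plus the net change must equal ending cash. A Python sketch of that reconciliation, taking free cash flow as operating cash flow minus capex (which matches the expected 2690):

```python
from typing import Dict, List

# FY2025 figures in $ millions, copied from the statement above (outflows negative).
operating_items: List[int] = [2890, 1420, 680, -210, -340, 85, 220, 190, -65]
cf: Dict[str, int] = {
    "operating": 4870,
    "investing": -6310,
    "financing": -940,
    "fx_effect": -85,
    "net_change": -2465,
    "beginning_cash": 5885,
    "ending_cash": 3420,
    "capex": -2180,
}


def cash_flow_check(cf: Dict[str, int]) -> bool:
    """True when the section subtotals and the cash roll-forward both tie out."""
    sections_tie = (
        cf["operating"] + cf["investing"] + cf["financing"] + cf["fx_effect"]
        == cf["net_change"]
    )
    rolls_forward = cf["beginning_cash"] + cf["net_change"] == cf["ending_cash"]
    return sections_tie and rolls_forward


# Free cash flow taken as operating cash flow minus capital expenditures.
free_cash_flow = cf["operating"] + cf["capex"]

print(sum(operating_items) == cf["operating"], cash_flow_check(cf), free_cash_flow)
# True True 2690
```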
SERIES B CONVERTIBLE NOTE TERM SHEET — Acme AI, Inc. Principal Amount: $25,000,000. Issuance Date: January 15, 2026. Maturity Date: January 15, 2028 (24 months). Interest Rate: 6% per annum, simple interest, compounded annually, payable at conversion or maturity (not current-pay). Conversion Discount: 25% discount to the price per share in the Next Qualified Financing (minimum $50M raise). Valuation Cap: $200,000,000 pre-money. Conversion Mechanics: Notes convert at the LOWER of (a) the Valuation Cap price or (b) the Discount Price. If no Qualified Financing occurs by maturity, noteholder may elect: (i) conversion at the Valuation Cap, (ii) repayment of principal plus accrued interest, or (iii) extension for 12 months at 8% interest. Anti-Dilution: Broad-based weighted average adjustment. Most Favored Nation: If the Company issues subsequent convertible instruments with more favorable terms (lower cap or higher discount), these notes shall automatically adjust to the more favorable terms. Pro Rata Rights: Noteholders have the right to participate in the Next Qualified Financing for up to 2× their original investment. Information Rights: Monthly financial statements, annual audited financials, and board observer seat for investors holding >$5M in notes.
{
"instrument": "Series B Convertible Note",
"company": "Acme AI, Inc.",
"principal_amount": 25000000,
"issuance_date": "2026-01-15",
"maturity_date": "2028-01-15",
"term_months": 24,
"interest_rate_pct": 6,
"interest_type": "Simple, compounded annually",
"interest_payment": "At conversion or maturity (not current-pay)",
"conversion_discount_pct": 25,
"qualified_financing_minimum": 50000000,
"valuation_cap": 200000000,
"conversion_price": "Lower of cap price or discount price",
"maturity_options": [
{
"option": "Convert at valuation cap"
},
{
"option": "Repayment of principal plus accrued interest"
},
{
"option": "Extension for 12 months at 8% interest"
}
],
"anti_dilution": "Broad-based weighted average",
"most_favored_nation": true,
"mfn_triggers": [
"Lower valuation cap",
"Higher discount"
],
"pro_rata_rights.available": true,
"pro_rata_rights.participation_limit": "2× original investment",
"information_rights.monthly": "Financial statements",
"information_rights.annual": "Audited financials",
"information_rights.board_observer": true,
"information_rights.board_observer_threshold": 5000000
}
Claude Opus 4.6: Incorrect/missing fields: mfn_triggers, interest_type, conversion_price, information_rights.monthly
Grok 4: All fields correct
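The "LOWER of" conversion rule is easier to see with numbers. A Python sketch under stated assumptions: the round price and share count are hypothetical, the cap price is taken as the valuation cap divided by pre-money fully diluted shares (a common convention, but definitions vary), interest is treated as simple because the term sheet's "simple interest, compounded annually" is internally inconsistent, and accrued interest is assumed to convert along with principal:

```python
def conversion_price(round_price: float, pre_money_shares: float,
                     cap: float = 200_000_000, discount: float = 0.25) -> float:
    """Per-share conversion price: the lower of the cap price and the discount price."""
    cap_price = cap / pre_money_shares  # assumption: cap / pre-money fully diluted shares
    discount_price = round_price * (1 - discount)
    return min(cap_price, discount_price)


def accrued_simple_interest(principal: float, annual_rate: float, years: float) -> float:
    """Simple (non-compounding) interest on the principal."""
    return principal * annual_rate * years


# Hypothetical Next Qualified Financing: $4.00/share, 40M pre-money fully diluted shares.
price = conversion_price(round_price=4.00, pre_money_shares=40_000_000)
interest = accrued_simple_interest(25_000_000, 0.06, years=1.0)
shares = (25_000_000 + interest) / price  # assumption: accrued interest converts too
print(price, round(shares))  # 3.0 8833333
```

Here the discount price ($3.00) beats the cap price ($5.00); at a hypothetical $8.00 round price the cap price would win instead.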
Extract ONLY the third item from the following list IF its price is greater than $100. Format all field names in camelCase. If the condition is not met, return {"result": null, "reason": "condition_not_met"}.
Items:
1. Widget Alpha — Price: $45.00 — Category: Hardware — In Stock: Yes
2. Widget Beta — Price: $220.00 — Category: Software — In Stock: No
3. Widget Gamma — Price: $189.50 — Category: Hardware — In Stock: Yes
4. Widget Delta — Price: $67.00 — Category: Electronics — In Stock: Yes
{
"result.itemName": "Widget Gamma",
"result.price": 189.5,
"result.category": "Hardware",
"result.inStock": true,
"conditionMet": true,
"extractedIndex": 3
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
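The rule the task encodes is to pick item 3, apply a strict greater-than test, and emit camelCase keys, otherwise returning the sentinel object. A Python sketch of that logic; the items are pre-parsed by hand, and the helper names are hypothetical:

```python
from typing import Any, Dict, List

# The four listed items, pre-parsed into fields.
ITEMS: List[Dict[str, Any]] = [
    {"item_name": "Widget Alpha", "price": 45.00, "category": "Hardware", "in_stock": "Yes"},
    {"item_name": "Widget Beta", "price": 220.00, "category": "Software", "in_stock": "No"},
    {"item_name": "Widget Gamma", "price": 189.50, "category": "Hardware", "in_stock": "Yes"},
    {"item_name": "Widget Delta", "price": 67.00, "category": "Electronics", "in_stock": "Yes"},
]


def to_camel(snake: str) -> str:
    """Convert snake_case to camelCase, e.g. 'item_name' -> 'itemName'."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


def extract_if_expensive(items: List[Dict[str, Any]], threshold: float = 100.0,
                         index: int = 3) -> Dict[str, Any]:
    """Return item `index` (1-based) with camelCase keys if its price exceeds threshold."""
    item = items[index - 1]
    if item["price"] <= threshold:  # "greater than" is strict
        return {"result": None, "reason": "condition_not_met"}
    result = {to_camel(k): v for k, v in item.items()}
    result["inStock"] = result["inStock"] == "Yes"  # coerce Yes/No to a boolean
    return {"result": result, "conditionMet": True, "extractedIndex": index}


print(extract_if_expensive(ITEMS)["result"]["itemName"])  # Widget Gamma
```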
Flatten the following nested JSON into a single-level object. When key conflicts arise (same key name at different nesting levels), use dot notation with the full path. Preserve original data types. Do NOT flatten arrays — keep them as-is.
{"user": {"id": 42, "name": "Alice", "address": {"city": "Portland", "state": "OR", "zip": "97201"}}, "order": {"id": 1087, "items": [{"sku": "A1", "qty": 2}, {"sku": "B3", "qty": 1}], "total": 299.99, "address": {"city": "Seattle", "state": "WA", "zip": "98101"}}, "id": 999}
{
"user.id": 42,
"user.name": "Alice",
"user.address.city": "Portland",
"user.address.state": "OR",
"user.address.zip": "97201",
"order.id": 1087,
"order.items": [
{
"sku": "A1",
"qty": 2
},
{
"sku": "B3",
"qty": 1
}
],
"order.total": 299.99,
"order.address.city": "Seattle",
"order.address.state": "WA",
"order.address.zip": "98101",
"id": 999
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
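The expected output prefixes every nested key with its full path, not only the conflicting ones (`id`, `address`), which rules out collisions by construction. A Python sketch of a recursive flatten that reproduces it while leaving arrays untouched:

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-path keys; lists are left as-is."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # scalars and arrays keep their original types
    return flat


data = {
    "user": {"id": 42, "name": "Alice",
             "address": {"city": "Portland", "state": "OR", "zip": "97201"}},
    "order": {"id": 1087, "items": [{"sku": "A1", "qty": 2}, {"sku": "B3", "qty": 1}],
              "total": 299.99,
              "address": {"city": "Seattle", "state": "WA", "zip": "98101"}},
    "id": 999,
}
flat = flatten(data)
print(flat["user.address.zip"], flat["order.items"][1], flat["id"])
# 97201 {'sku': 'B3', 'qty': 1} 999
```

The sketch assumes no key itself contains a dot; if one did, two different paths could flatten to the same string.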
Calculate the number of BUSINESS DAYS between February 14, 2026 (Saturday) and March 6, 2026 (Friday), INCLUSIVE of both dates if they fall on business days. Exclude weekends (Saturday, Sunday) and the following US holidays: Presidents' Day (February 16, 2026 — Monday). Show your work by listing each business day.
{
"start_date": "2026-02-14",
"end_date": "2026-03-06",
"start_is_business_day": false,
"end_is_business_day": true,
"holidays_excluded": [
"2026-02-16"
],
"business_days": [
"2026-02-17",
"2026-02-18",
"2026-02-19",
"2026-02-20",
"2026-02-23",
"2026-02-24",
"2026-02-25",
"2026-02-26",
"2026-02-27",
"2026-03-02",
"2026-03-03",
"2026-03-04",
"2026-03-05",
"2026-03-06"
],
"total_business_days": 14
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
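The count comes from a day-by-day walk over the inclusive range. A Python sketch using the standard `datetime` module; the holiday set is passed in rather than looked up, since the task names only Presidents' Day:

```python
from datetime import date, timedelta
from typing import List, Set


def business_days(start: date, end: date, holidays: Set[date]) -> List[date]:
    """Weekdays from start to end (inclusive), excluding the given holidays."""
    days: List[date] = []
    current = start
    while current <= end:
        # weekday(): Monday == 0 ... Sunday == 6
        if current.weekday() < 5 and current not in holidays:
            days.append(current)
        current += timedelta(days=1)
    return days


result = business_days(date(2026, 2, 14), date(2026, 3, 6), {date(2026, 2, 16)})
print(len(result), result[0].isoformat(), result[-1].isoformat())
# 14 2026-02-17 2026-03-06
```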
Migrate the following V1 data to V2 schema. V2 rules: (1) 'full_name' splits into 'firstName' and 'lastName'. (2) 'age' is REMOVED — compute 'birthYear' from age assuming current year is 2026. (3) 'email' stays but add 'emailDomain'. (4) 'tags' array moves under a new 'metadata' object. (5) Add 'schemaVersion': 2. (6) For backward compatibility: keep 'full_name' as 'deprecated_fullName' and 'age' as 'deprecated_age'.
V1 Data: {"full_name": "Dr. Maria Santos-Garcia", "age": 34, "email": "maria.sg@tufts-medical.edu", "tags": ["cardiology", "research", "AI/ML"], "role": "Senior Fellow"}
{
"schemaVersion": 2,
"firstName": "Maria",
"lastName": "Santos-Garcia",
"email": "maria.sg@tufts-medical.edu",
"emailDomain": "tufts-medical.edu",
"birthYear": 1992,
"role": "Senior Fellow",
"metadata.tags": [
"cardiology",
"research",
"AI/ML"
],
"deprecated_fullName": "Dr. Maria Santos-Garcia",
"deprecated_age": 34
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
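The six rules map almost one-to-one onto a dictionary rewrite. A Python sketch; the honorific list used to drop "Dr." before splitting the name is our assumption (the task does not say how titles are handled), and `birthYear` is simply 2026 minus age, so it can be off by one depending on the birthday:

```python
from typing import Any, Dict

# Assumption: titles to strip before splitting; the task does not list them.
HONORIFICS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def migrate_v1_to_v2(v1: Dict[str, Any], current_year: int = 2026) -> Dict[str, Any]:
    """Apply the six V2 rules to one V1 record."""
    parts = [p for p in v1["full_name"].split() if p not in HONORIFICS]
    v2: Dict[str, Any] = {
        "schemaVersion": 2,                              # rule 5
        "firstName": parts[0],                           # rule 1
        "lastName": " ".join(parts[1:]),
        "email": v1["email"],                            # rule 3
        "emailDomain": v1["email"].rsplit("@", 1)[1],
        "birthYear": current_year - v1["age"],           # rule 2 (ignores birthday timing)
        "metadata": {"tags": list(v1["tags"])},          # rule 4
        "deprecated_fullName": v1["full_name"],          # rule 6
        "deprecated_age": v1["age"],
    }
    # Fields the rules do not mention (here, 'role') are carried over unchanged.
    handled = {"full_name", "age", "email", "tags"}
    v2.update({k: v for k, v in v1.items() if k not in handled})
    return v2


v1 = {"full_name": "Dr. Maria Santos-Garcia", "age": 34,
      "email": "maria.sg@tufts-medical.edu",
      "tags": ["cardiology", "research", "AI/ML"], "role": "Senior Fellow"}
v2 = migrate_v1_to_v2(v1)
print(v2["firstName"], v2["lastName"], v2["birthYear"])  # Maria Santos-Garcia 1992
```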
Given this org tree, compute for EACH node: (1) total_reports = count of ALL descendants (direct + indirect), (2) total_salary_below = sum of all descendant salaries, (3) max_depth = deepest level below this node (leaf = 0). Return the tree with computed fields added.
{"name": "CEO Jane", "salary": 450000, "reports": [{"name": "VP Engineering Bob", "salary": 320000, "reports": [{"name": "Dir Platform Carol", "salary": 250000, "reports": [{"name": "Senior Eng Dan", "salary": 185000, "reports": []}, {"name": "Senior Eng Eve", "salary": 180000, "reports": [{"name": "Eng Intern Frank", "salary": 65000, "reports": []}]}]}, {"name": "Dir ML Grace", "salary": 260000, "reports": [{"name": "ML Eng Hank", "salary": 195000, "reports": []}]}]}, {"name": "VP Sales Iris", "salary": 310000, "reports": [{"name": "Dir Enterprise Jack", "salary": 240000, "reports": []}]}]}
{
"name": "CEO Jane",
"salary": 450000,
"total_reports": 9,
"total_salary_below": 2005000,
"max_depth": 4,
"reports": [
{
"name": "VP Engineering Bob",
"salary": 320000,
"total_reports": 6,
"total_salary_below": 1135000,
"max_depth": 3,
"reports": [
{
"name": "Dir Platform Carol",
"salary": 250000,
"total_reports": 3,
"total_salary_below": 430000,
"max_depth": 2,
"reports": [
{
"name": "Senior Eng Dan",
"salary": 185000,
"total_reports": 0,
"total_salary_below": 0,
"max_depth": 0,
"reports": []
},
{
"name": "Senior Eng Eve",
"salary": 180000,
"total_reports": 1,
"total_salary_below": 65000,
"max_depth": 1,
"reports": [
{
"name": "Eng Intern Frank",
"salary": 65000,
"total_reports": 0,
"total_salary_below": 0,
"max_depth": 0,
"reports": []
}
]
}
]
},
{
"name": "Dir ML Grace",
"salary": 260000,
"total_reports": 1,
"total_salary_below": 195000,
"max_depth": 1,
"reports": [
{
"name": "ML Eng Hank",
"salary": 195000,
"total_reports": 0,
"total_salary_below": 0,
"max_depth": 0,
"reports": []
}
]
}
]
},
{
"name": "VP Sales Iris",
"salary": 310000,
"total_reports": 1,
"total_salary_below": 240000,
"max_depth": 1,
"reports": [
{
"name": "Dir Enterprise Jack",
"salary": 240000,
"total_reports": 0,
"total_salary_below": 0,
"max_depth": 0,
"reports": []
}
]
}
]
}
Claude Opus 4.6: All fields correct
Grok 4: All fields correct
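All three computed fields can be filled in with a single post-order traversal: each node's totals are built from its children's already-computed totals, so every node is visited once. A Python sketch; the `n()` builder is a hypothetical shorthand for the input tree:

```python
from typing import Any, Dict


def n(name: str, salary: int, *reports: Dict[str, Any]) -> Dict[str, Any]:
    """Shorthand builder for an org-tree node."""
    return {"name": name, "salary": salary, "reports": list(reports)}


def annotate(node: Dict[str, Any]) -> Dict[str, Any]:
    """Add total_reports, total_salary_below and max_depth to every node, in place."""
    total_reports, total_salary, max_depth = 0, 0, 0
    for child in node["reports"]:
        annotate(child)  # children first (post-order)
        total_reports += 1 + child["total_reports"]
        total_salary += child["salary"] + child["total_salary_below"]
        max_depth = max(max_depth, 1 + child["max_depth"])
    node["total_reports"] = total_reports
    node["total_salary_below"] = total_salary
    node["max_depth"] = max_depth  # leaf == 0
    return node


tree = n("CEO Jane", 450000,
         n("VP Engineering Bob", 320000,
           n("Dir Platform Carol", 250000,
             n("Senior Eng Dan", 185000),
             n("Senior Eng Eve", 180000, n("Eng Intern Frank", 65000))),
           n("Dir ML Grace", 260000, n("ML Eng Hank", 195000))),
         n("VP Sales Iris", 310000, n("Dir Enterprise Jack", 240000)))

annotate(tree)
print(tree["total_reports"], tree["total_salary_below"], tree["max_depth"])  # 9 2005000 4
```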
Grok 4 is 40% cheaper than Claude Opus 4.6
About as accurate (0.1-point gap)
The accuracy gap is under 2%: use Grok 4 to save 40% with virtually no quality loss.
Tasks with the largest score gap between models, showing the actual outputs and expected values side by side.
Claude Opus 4.6 scored 88/100 vs Grok 4's 98/100
Sony WH-1000XM5 ワイヤレスノイズキャンセリングヘッドホン ブラック — Bluetooth 5.3 / LDAC — 30hr Battery — 100V-240V AC Adapter — JAN: 4548736132351 — Japan Domestic Model
Claude Opus 4.6: “Incorrect/missing fields: type, color”
Grok 4: “All fields correct”
| Field | Expected | Claude Opus 4.6 | Grok 4 |
|---|---|---|---|
| brand | Sony | ✓Sony | ✓Sony |
| model | WH-1000XM5 | ✓WH-1000XM5 | ✓WH-1000XM5 |
| type | Wireless Noise Cancelling Headphones | ✗ワイヤレスノイズキャンセリングヘッドホン | ✓Wireless Noise Cancelling Headphones |
| color | Black | ✗ブラック | ✓Black |
| bluetooth_version | 5.3 | ✓5.3 | ✓5.3 |
| audio_codec | LDAC | ✓LDAC | ✓LDAC |
| battery_hours | 30 | ✓30 | ✓30 |
| voltage_range | 100V-240V | ✓100V-240V | ✓100V-240V |
| jan_code | 4548736132351 | ✓4548736132351 | ✓4548736132351 |
| region | Japan Domestic | ✓Japan | ✓Japan |
The Japanese text ワイヤレスノイズキャンセリングヘッドホン must be translated to 'Wireless Noise Cancelling Headphones', and the color 'ブラック' to 'Black'. battery_hours must be an integer. jan_code must be a string, not a number, because leading zeros matter.
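The type rules in the note can be enforced after extraction. A Python sketch: `normalize_jan` keeps the code as a 13-digit string, `ean13_checksum_ok` applies the standard EAN-13 check digit (JAN codes are EAN-13 codes) as an extra sanity check that the task does not ask for, and `parse_battery_hours` is a hypothetical helper that returns an integer:

```python
import re


def normalize_jan(raw: str) -> str:
    """Return a JAN (EAN-13) code as a 13-digit string so leading zeros survive."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 13:
        raise ValueError(f"expected 13 digits, got {len(digits)}")
    return digits


def ean13_checksum_ok(code: str) -> bool:
    """EAN-13 check: weights 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def parse_battery_hours(text: str) -> int:
    """Pull an integer hour count out of text such as '30hr Battery'."""
    match = re.search(r"(\d+)\s*hr", text)
    if match is None:
        raise ValueError("no battery life found")
    return int(match.group(1))


jan = normalize_jan("JAN: 4548736132351")
print(jan, ean13_checksum_ok(jan), parse_battery_hours("30hr Battery"))
# 4548736132351 True 30

# Why jan_code must stay a string: converting to a number drops leading zeros.
print(int("0123456789012"))  # 123456789012
```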
Grok 4 scored 92/100 vs Claude Opus 4.6's 97/100
Nike Air Max 97 OG QS "Metallic Gold" Bullet — Titanium Violet / Varsity Red — Men's US 10.5 — 2018 Retro Release
Claude Opus 4.6: “All fields correct”
Grok 4: “Incorrect/missing fields: edition”
| Field | Expected | Claude Opus 4.6 | Grok 4 |
|---|---|---|---|
| brand | Nike | ✓Nike | ✓Nike |
| model | Air Max 97 OG QS | ✓Air Max 97 OG QS | ✓Air Max 97 |
| colorway | Metallic Gold | ✓Metallic Gold | ✓Metallic Gold Bullet |
| color_1 | Titanium Violet | ✓Titanium Violet | ✓Titanium Violet |
| color_2 | Varsity Red | ✓Varsity Red | ✓Varsity Red |
| size | US 10.5 | ✓10.5 | ✓US 10.5 |
| gender | Men's | ✓Men | ✓Men's |
| year | 2018 | ✓2018 | ✓2018 |
| edition | Retro Release | ✓Retro Release | ✗OG QS |
All 9 fields must be exact. 'Titanium Violet' must be extracted as a single compound color, NOT split into material + color. 'Metallic Gold' is the colorway nickname, not color_1.
Claude Opus 4.6 scored 60/100 vs Grok 4's 65/100
MEDICATION RECONCILIATION — Inpatient Day 3. Current Orders: (1) Warfarin 5mg PO QD (for AFib — INR target 2.0-3.0, last INR 2.4 on 02/13). (2) Metoprolol Succinate 100mg PO QD. (3) Diltiazem ER 240mg PO QD. (4) Amiodarone 200mg PO QD (started 02/12 for rate-refractory AFib). (5) Fluconazole 400mg IV QD (Day 5 of 14 for candidemia). (6) Enoxaparin 80mg SQ Q12H (DVT prophylaxis — started before warfarin was therapeutic). (7) Acetaminophen 1000mg PO Q6H PRN. (8) Omeprazole 40mg PO QD. Home Medications NOT continued: Aspirin 81mg QD (held due to warfarin + enoxaparin = triple antithrombotic risk).
Claude Opus 4.6: “Incorrect/missing fields: potential_interactions, therapeutic_duplications”
Grok 4: “Incorrect/missing fields: potential_interactions, therapeutic_duplications”
| Field | Expected | Claude Opus 4.6 | Grok 4 |
|---|---|---|---|
| active_medications | [{"name":"Warfarin","dose":"5mg","route":"PO","frequency":"QD","indication":"AFib"},{"name":"Metoprolol Succinate","dose":"100mg","route":"PO","frequency":"QD"},{"name":"Diltiazem ER","dose":"240mg","route":"PO","frequency":"QD"},{"name":"Amiodarone","dose":"200mg","route":"PO","frequency":"QD","indication":"Rate-refractory AFib","start_date":"02/12"},{"name":"Fluconazole","dose":"400mg","route":"IV","frequency":"QD","indication":"Candidemia","day":5,"total_days":14},{"name":"Enoxaparin","dose":"80mg","route":"SQ","frequency":"Q12H","indication":"DVT prophylaxis"},{"name":"Acetaminophen","dose":"1000mg","route":"PO","frequency":"Q6H","prn":true},{"name":"Omeprazole","dose":"40mg","route":"PO","frequency":"QD"}] | ✓[{"name":"Warfarin","dose":"5mg","route":"PO","frequency":"QD","indication":"Atrial fibrillation (INR target 2.0-3.0, last INR 2.4 on 02/13)"},{"name":"Metoprolol Succinate","dose":"100mg","route":"PO","frequency":"QD","indication":"Rate control for atrial fibrillation"},{"name":"Diltiazem ER","dose":"240mg","route":"PO","frequency":"QD","indication":"Rate control for atrial fibrillation"},{"name":"Amiodarone","dose":"200mg","route":"PO","frequency":"QD","indication":"Rate-refractory atrial fibrillation (started 02/12)"},{"name":"Fluconazole","dose":"400mg","route":"IV","frequency":"QD","indication":"Candidemia (Day 5 of 14)"},{"name":"Enoxaparin","dose":"80mg","route":"SQ","frequency":"Q12H","indication":"DVT prophylaxis (bridging - started before warfarin was therapeutic)"},{"name":"Acetaminophen","dose":"1000mg","route":"PO","frequency":"Q6H PRN","indication":"Pain/fever management"},{"name":"Omeprazole","dose":"40mg","route":"PO","frequency":"QD","indication":"Gastric acid suppression/GI prophylaxis"}] | ✓[{"name":"Warfarin","dose":"5mg","route":"PO","frequency":"QD","indication":"for AFib"},{"name":"Metoprolol Succinate","dose":"100mg","route":"PO","frequency":"QD","indication":"for AFib rate control"},{"name":"Diltiazem ER","dose":"240mg","route":"PO","frequency":"QD","indication":"for AFib rate control"},{"name":"Amiodarone","dose":"200mg","route":"PO","frequency":"QD","indication":"for rate-refractory AFib"},{"name":"Fluconazole","dose":"400mg","route":"IV","frequency":"QD","indication":"for candidemia"},{"name":"Enoxaparin","dose":"80mg","route":"SQ","frequency":"Q12H","indication":"DVT prophylaxis"},{"name":"Acetaminophen","dose":"1000mg","route":"PO","frequency":"Q6H PRN","indication":"pain or fever"},{"name":"Omeprazole","dose":"40mg","route":"PO","frequency":"QD","indication":"acid suppression"}] |
| held_medications | [{"name":"Aspirin","dose":"81mg","reason":"Triple antithrombotic risk with warfarin + enoxaparin"}] | ✓[{"name":"Aspirin","dose":"81mg","reason":"Held due to concurrent warfarin + enoxaparin creating triple antithrombotic risk with excessive bleeding concern"}] | ✓[{"name":"Aspirin","dose":"81mg","reason":"held due to warfarin + enoxaparin = triple antithrombotic risk"}] |
| potential_interactions | [{"drugs":["Warfarin","Fluconazole"],"severity":"Major","effect":"Fluconazole inhibits CYP2C9, significantly increasing warfarin levels and bleeding risk"},{"drugs":["Warfarin","Amiodarone"],"severity":"Major","effect":"Amiodarone inhibits CYP2C9/1A2/3A4, increasing warfarin effect — typical dose reduction 30-50%"},{"drugs":["Metoprolol","Diltiazem"],"severity":"Major","effect":"Additive AV nodal blockade — risk of severe bradycardia or heart block"},{"drugs":["Warfarin","Enoxaparin"],"severity":"High","effect":"Dual anticoagulation increases bleeding risk"}] | ✗[{"drugs":["Fluconazole","Warfarin"],"severity":"Major","effect":"Fluconazole is a potent CYP2C9 inhibitor and significantly increases warfarin levels, raising INR and risk of serious bleeding. Requires close INR monitoring and likely warfarin dose reduction."},{"drugs":["Amiodarone","Warfarin"],"severity":"Major","effect":"Amiodarone inhibits CYP2C9 and CYP3A4, significantly increasing warfarin effect and INR. Combined with fluconazole, this creates compounded risk of supratherapeutic INR. Warfarin dose reduction of 30-50% typically recommended."},{"drugs":["Warfarin","Enoxaparin"],"severity":"Major","effect":"Concurrent use of two anticoagulants substantially increases bleeding risk. Enoxaparin should be discontinued once INR is consistently therapeutic (2.0-3.0)."},{"drugs":["Metoprolol","Diltiazem"],"severity":"Major","effect":"Both are AV nodal blocking agents. Combined use increases risk of severe bradycardia, heart block, and hypotension. Requires close cardiac monitoring."},{"drugs":["Amiodarone","Metoprolol"],"severity":"Moderate","effect":"Amiodarone added to metoprolol increases risk of bradycardia, AV block, and hemodynamic compromise due to additive negative chronotropic and dromotropic effects."},{"drugs":["Amiodarone","Diltiazem"],"severity":"Moderate","effect":"Both agents slow AV nodal conduction; combined use increases risk of bradycardia and AV block."},{"drugs":["Omeprazole","Warfarin"],"severity":"Moderate","effect":"Omeprazole may inhibit CYP2C19 and potentially increase warfarin levels, though clinical significance is variable. Monitor INR."},{"drugs":["Amiodarone","Fluconazole"],"severity":"Moderate","effect":"Both drugs can prolong the QT interval; concurrent use increases the risk of QT prolongation and torsades de pointes. ECG monitoring recommended."}] | ✗[{"drugs":["Warfarin","Amiodarone"],"severity":"Major","effect":"Amiodarone potentiates the anticoagulant effect of warfarin, increasing the risk of bleeding."},{"drugs":["Warfarin","Fluconazole"],"severity":"Major","effect":"Fluconazole increases the anticoagulant effect of warfarin by inhibiting its metabolism."},{"drugs":["Warfarin","Enoxaparin"],"severity":"Major","effect":"Concurrent use increases the risk of bleeding due to additive anticoagulant effects."},{"drugs":["Amiodarone","Metoprolol Succinate"],"severity":"Moderate","effect":"Additive bradycardic effects, may lead to hypotension or heart block."},{"drugs":["Amiodarone","Diltiazem ER"],"severity":"Moderate","effect":"Additive effects on AV conduction, risk of bradycardia."},{"drugs":["Metoprolol Succinate","Diltiazem ER"],"severity":"Moderate","effect":"Additive negative effects on heart rate and contractility."},{"drugs":["Warfarin","Acetaminophen"],"severity":"Moderate","effect":"High doses of acetaminophen may increase INR and bleeding risk."}] |
| therapeutic_duplications | [{"category":"Rate control","drugs":["Metoprolol","Diltiazem","Amiodarone"],"note":"Triple rate control — assess need for all three"}] | ✗[{"category":"AV Nodal Blocking Agents / Rate Control","drugs":["Metoprolol Succinate","Diltiazem ER","Amiodarone"],"note":"Three agents with AV nodal blocking and rate-controlling properties are prescribed concurrently. This significantly increases the risk of severe bradycardia and heart block. Consider whether all three are necessary or if one can be tapered/discontinued."},{"category":"Anticoagulants","drugs":["Warfarin","Enoxaparin"],"note":"Two anticoagulants are active simultaneously. Enoxaparin was started as bridging therapy before warfarin reached therapeutic INR (last INR 2.4 is now in range). Assess for discontinuation of enoxaparin if INR remains consistently therapeutic."}] | ✗[{"category":"Anticoagulants","drugs":["Warfarin","Enoxaparin"],"note":"Used for bridging therapy until warfarin is therapeutic, but increases bleeding risk."},{"category":"Rate control agents","drugs":["Metoprolol Succinate","Diltiazem ER","Amiodarone"],"note":"Multiple agents for rate control in AFib; monitor for excessive bradycardia."}] |
Must identify ALL 4 major drug interactions. The Warfarin-Fluconazole CYP2C9 interaction is frequently missed by smaller models. Metoprolol+Diltiazem dual AV nodal blockade must be flagged. Aspirin is HELD, not active. Acetaminophen must be marked PRN.
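Checking that all four expected interactions were found is complicated by surface differences: the outputs list drug pairs in a different order and use fuller names ("Metoprolol Succinate", "Diltiazem ER"). A Python sketch of an order-insensitive coverage check. Reducing each drug to its first word is a crude normalization we chose for illustration, and this is not the benchmark's own scorer:

```python
from typing import Any, Dict, FrozenSet, Iterable, List, Set


def base_name(drug: str) -> str:
    """Crude normalization (assumption): keep the first word, lowercased."""
    return drug.split()[0].lower()


def pair_key(drugs: Iterable[str]) -> FrozenSet[str]:
    """Order-insensitive key for an interaction, e.g. {'fluconazole', 'warfarin'}."""
    return frozenset(base_name(d) for d in drugs)


def missing_interactions(expected: List[Dict[str, Any]],
                         predicted: List[Dict[str, Any]]) -> Set[FrozenSet[str]]:
    """Expected drug pairs that the predicted list does not mention."""
    found = {pair_key(item["drugs"]) for item in predicted}
    return {pair_key(item["drugs"]) for item in expected} - found


expected = [
    {"drugs": ["Warfarin", "Fluconazole"]},
    {"drugs": ["Warfarin", "Amiodarone"]},
    {"drugs": ["Metoprolol", "Diltiazem"]},
    {"drugs": ["Warfarin", "Enoxaparin"]},
]
# Drug pairs from Grok 4's output in the table above (severity and effect omitted).
grok_pairs = [
    {"drugs": ["Warfarin", "Amiodarone"]},
    {"drugs": ["Warfarin", "Fluconazole"]},
    {"drugs": ["Warfarin", "Enoxaparin"]},
    {"drugs": ["Amiodarone", "Metoprolol Succinate"]},
    {"drugs": ["Amiodarone", "Diltiazem ER"]},
    {"drugs": ["Metoprolol Succinate", "Diltiazem ER"]},
    {"drugs": ["Warfarin", "Acetaminophen"]},
]

print(missing_interactions(expected, grok_pairs))  # set()
```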
Grok 4 scored 81/100 vs Claude Opus 4.6's 84/100
MASTER SERVICES AGREEMENT dated January 15, 2026. Section 8.2 (Limitation of Liability): Except for Vendor's indemnification obligations under Section 9.1 and breaches of Section 12 (Confidentiality), in no event shall either party's aggregate liability under this Agreement exceed the greater of (a) the total fees paid or payable by Client in the twelve (12) month period immediately preceding the event giving rise to the claim, or (b) Five Hundred Thousand Dollars ($500,000). Notwithstanding the foregoing, Vendor's aggregate liability for indemnification claims under Section 9.1 shall not exceed Two Million Dollars ($2,000,000). Section 8.3: IN NO EVENT SHALL EITHER PARTY BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES, REGARDLESS OF THE CAUSE OF ACTION OR THEORY OF LIABILITY, EVEN IF SUCH PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. The foregoing limitation shall not apply to breaches of Section 12 (Confidentiality) or willful misconduct.
Claude Opus 4.6: “Incorrect/missing fields: general_liability_cap, general_liability_basis”
Grok 4: “Incorrect/missing fields: general_liability_cap, general_liability_basis”
| Field | Expected | Claude Opus 4.6 | Grok 4 |
|---|---|---|---|
| general_liability_cap | $500,000 | ✗The greater of (a) total fees paid or payable by Client in the twelve (12) month period immediately preceding the event giving rise to the claim, or (b) Five Hundred Thousand Dollars ($500,000) | ✗the greater of (a) the total fees paid or payable by Client in the twelve (12) month period immediately preceding the event giving rise to the claim, or (b) Five Hundred Thousand Dollars ($500,000) |
| general_liability_basis | Greater of 12-month fees or $500,000 | ✗Trailing 12-month fees paid or payable, or $500,000 floor (whichever is greater) | ✗aggregate |
| indemnification_cap | $2,000,000 | ✓Two Million Dollars ($2,000,000) | ✓Two Million Dollars ($2,000,000) |
| indemnification_section | 9.1 | ✓Section 9.1 | ✓9.1 |
| consequential_damages_excluded | true | ✓true | ✓true |
| consequential_damages_exceptions | ["Confidentiality breaches","Willful misconduct"] | ✓["Breaches of Section 12 (Confidentiality)","Willful misconduct"] | ✓["breaches of Section 12 (Confidentiality)","willful misconduct"] |
| liability_cap_exceptions | ["Indemnification under Section 9.1","Confidentiality under Section 12"] | ✓["Vendor's indemnification obligations under Section 9.1","Breaches of Section 12 (Confidentiality)"] | ✓["Vendor's indemnification obligations under Section 9.1","breaches of Section 12 (Confidentiality)"] |
Must correctly distinguish the GENERAL liability cap ($500,000 floor, or trailing 12-month fees if greater) from the INDEMNIFICATION cap ($2,000,000). Must identify both exceptions to the consequential damages waiver. Models often conflate the two caps or miss the 'notwithstanding' override.
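The clause structure is easier to see as a function: $500,000 is a floor on the general cap rather than the cap itself, and the $2,000,000 ceiling replaces it only for Vendor's Section 9.1 indemnification obligations. A Python sketch of that reading of Section 8.2; the `claim_type` labels are hypothetical, Section 12 breaches return None because Section 8.2 states no cap for them, and none of this is legal analysis:

```python
from typing import Optional

GENERAL_CAP_FLOOR = 500_000      # Section 8.2(b)
INDEMNIFICATION_CAP = 2_000_000  # Section 8.2, "Notwithstanding the foregoing" sentence


def liability_cap(claim_type: str, trailing_12m_fees: float) -> Optional[float]:
    """Aggregate cap for a claim under this reading of Section 8.2.

    claim_type is a hypothetical label: "general", "indemnification" or "confidentiality".
    Returns None when Section 8.2 states no cap for the claim.
    """
    if claim_type == "indemnification":  # Vendor indemnification under Section 9.1
        return INDEMNIFICATION_CAP
    if claim_type == "confidentiality":  # Section 12 breaches are carved out of the general cap
        return None
    return max(trailing_12m_fees, GENERAL_CAP_FLOOR)  # greater of (a) fees or (b) $500,000


print(liability_cap("general", 300_000), liability_cap("general", 750_000))
# 500000 750000
```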
Grok 4 scored 89/100 vs Claude Opus 4.6's 92/100
Patient: Female, 67yo, presents to ED with acute onset chest pain (started 2 hours ago, 8/10 severity, substernal, radiating to left arm), on a background of chronic stable angina (diagnosed 2019, managed with isosorbide mononitrate 60mg QD). History of Type 2 DM (HbA1c 7.2%, on metformin 1000mg BID + empagliflozin 25mg QD), hypertension (BP 168/94 on arrival, home meds: amlodipine 10mg + lisinopril 40mg), and previous NSTEMI (March 2023). Allergies: Aspirin (urticaria), Atorvastatin (myalgia — switched to rosuvastatin 20mg). Current vitals: HR 96 bpm, RR 22, SpO2 94% on RA, Temp 36.8°C.
Claude Opus 4.6: “Incorrect/missing fields: chronic_conditions”
Grok 4: “Incorrect/missing fields: chronic_conditions, chief_complaint”
| Field | Expected | Claude Opus 4.6 | Grok 4 |
|---|---|---|---|
| demographics.sex | Female | ✓Female | ✓Female |
| demographics.age | 67 | ✓67 | ✓67 |
| chief_complaint | Acute onset chest pain | ✓Acute onset chest pain | ✗chest pain |
| presenting_symptoms | [{"symptom":"Chest pain","onset":"Acute","duration":"2 hours","severity":"8/10","location":"Substernal","radiation":"Left arm"}] | ✓[{"symptom":"Chest pain","onset":"2 hours ago","duration":"2 hours","severity":"8/10","location":"Substernal","radiation":"Left arm"}] | ✓[{"symptom":"chest pain","onset":"started 2 hours ago","duration":"2 hours","severity":"8/10","location":"substernal","radiation":"to left arm"}] |
| chronic_conditions | [{"condition":"Chronic stable angina","diagnosed":"2019","status":"Managed"},{"condition":"Type 2 Diabetes Mellitus","marker":"HbA1c 7.2%"},{"condition":"Hypertension","current_bp":"168/94"},{"condition":"Previous NSTEMI","date":"March 2023"}] | ✗[{"condition":"Chronic stable angina","diagnosed":"2019","status":"Managed with isosorbide mononitrate"},{"condition":"Type 2 Diabetes Mellitus","diagnosed":"Unknown","status":"HbA1c 7.2%, managed with metformin and empagliflozin"},{"condition":"Hypertension","diagnosed":"Unknown","status":"BP 168/94 on arrival, on amlodipine and lisinopril"},{"condition":"Previous NSTEMI","diagnosed":"March 2023","status":"History of prior event"}] | ✗[{"condition":"chronic stable angina","diagnosed":"2019","status":"stable"},{"condition":"Type 2 diabetes mellitus","diagnosed":"unknown","status":"managed"},{"condition":"hypertension","diagnosed":"unknown","status":"managed"},{"condition":"NSTEMI","diagnosed":"March 2023","status":"previous"}] |
| medications | [{"name":"Isosorbide mononitrate","dose":"60mg","frequency":"QD","indication":"Angina"},{"name":"Metformin","dose":"1000mg","frequency":"BID","indication":"Diabetes"},{"name":"Empagliflozin","dose":"25mg","frequency":"QD","indication":"Diabetes"},{"name":"Amlodipine","dose":"10mg","frequency":"QD","indication":"Hypertension"},{"name":"Lisinopril","dose":"40mg","frequency":"QD","indication":"Hypertension"},{"name":"Rosuvastatin","dose":"20mg","frequency":"QD","indication":"Cholesterol"}] | ✓[{"name":"Isosorbide mononitrate","dose":"60mg","frequency":"QD","indication":"Chronic stable angina"},{"name":"Metformin","dose":"1000mg","frequency":"BID","indication":"Type 2 Diabetes Mellitus"},{"name":"Empagliflozin","dose":"25mg","frequency":"QD","indication":"Type 2 Diabetes Mellitus"},{"name":"Amlodipine","dose":"10mg","frequency":"QD","indication":"Hypertension"},{"name":"Lisinopril","dose":"40mg","frequency":"QD","indication":"Hypertension"},{"name":"Rosuvastatin","dose":"20mg","frequency":"QD","indication":"Hyperlipidemia (switched from atorvastatin)"}] | ✓[{"name":"isosorbide mononitrate","dose":"60mg","frequency":"QD","indication":"angina"},{"name":"metformin","dose":"1000mg","frequency":"BID","indication":"Type 2 diabetes mellitus"},{"name":"empagliflozin","dose":"25mg","frequency":"QD","indication":"Type 2 diabetes mellitus"},{"name":"amlodipine","dose":"10mg","frequency":"QD","indication":"hypertension"},{"name":"lisinopril","dose":"40mg","frequency":"QD","indication":"hypertension"},{"name":"rosuvastatin","dose":"20mg","frequency":"QD","indication":"dyslipidemia"}] |
| allergies | [{"drug":"Aspirin","reaction":"Urticaria"},{"drug":"Atorvastatin","reaction":"Myalgia","note":"Switched to rosuvastatin"}] | ✓[{"drug":"Aspirin","reaction":"Urticaria"},{"drug":"Atorvastatin","reaction":"Myalgia"}] | ✓[{"drug":"Aspirin","reaction":"urticaria"},{"drug":"Atorvastatin","reaction":"myalgia"}] |
| vitals.hr_bpm | 96 | ✓96 | ✓96 |
| vitals.rr | 22 | ✓22 | ✓22 |
| vitals.spo2_percent | 94 | ✓94 | ✓94 |
| vitals.temp_celsius | 36.8 | ✓36.8 | ✓36.8 |
Chest pain is ACUTE (2 hours); angina is CHRONIC (2019). Models must not merge these. All 6 medications must be extracted with the correct frequency. Atorvastatin is an ALLERGY, not a current medication; rosuvastatin is its replacement. Vitals must be numeric.
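Some of these rules are mechanical enough to check automatically: the medication count, no drug appearing both as an allergy and as a current medication, and every vital sign being a number. A Python sketch of such a post-extraction validator; the function name and the record layout (mirroring the table's field names) are assumptions:

```python
from typing import Any, Dict, List


def validate_patient_record(record: Dict[str, Any], expected_med_count: int = 6) -> List[str]:
    """Return rule violations for an extracted record (an empty list means it passes)."""
    problems: List[str] = []
    meds = record.get("medications", [])
    allergy_drugs = {a["drug"].lower() for a in record.get("allergies", [])}
    if len(meds) != expected_med_count:
        problems.append(f"expected {expected_med_count} medications, got {len(meds)}")
    for med in meds:
        if med["name"].lower() in allergy_drugs:
            problems.append(f"{med['name']} is an allergy, not a current medication")
    for key, value in record.get("vitals", {}).items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"vitals.{key} must be numeric, got {value!r}")
    return problems


names = ["Isosorbide mononitrate", "Metformin", "Empagliflozin",
         "Amlodipine", "Lisinopril", "Rosuvastatin"]
good = {
    "medications": [{"name": n} for n in names],
    "allergies": [{"drug": "Aspirin"}, {"drug": "Atorvastatin"}],
    "vitals": {"hr_bpm": 96, "rr": 22, "spo2_percent": 94, "temp_celsius": 36.8},
}
# Same record with atorvastatin listed as a medication and SpO2 left as text.
bad = dict(good,
           medications=[{"name": n} for n in names[:-1] + ["Atorvastatin"]],
           vitals=dict(good["vitals"], spo2_percent="94%"))

print(validate_patient_record(good))     # []
print(len(validate_patient_record(bad)))  # 2
```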
These benchmarks use generic tasks. Your use case is unique. Upload your own dataset and compare any of 50+ models with field-level accuracy scoring.