Given the following image and a prompt, will a browser chatbot be able to identify every note in the soprano, alto, tenor, and bass parts?

Needs to be correct:
pitch class (if letters are given without sharp/flat, "natural" will be assumed regardless of key signature)
duration
order of notes
Does not need to be specified/correct:
key signature, time signature
articulations, slurs, barlines, clefs, chords
octaves of note pitches
temporal positions of notes
format/preface/conclusion text
It can use any method for specifying pitch and duration as long as it's unambiguous to map pitch to one of 12 semitone options, and map duration to a real number.
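As an illustration of the criteria above, here is a minimal sketch of how an answer could be checked. It is not the procedure that will actually be used for resolution. The names `pitch_class` and `part_matches` are hypothetical, and the sketch assumes the chatbot's answer has already been transcribed into (note name, duration) pairs, with all durations in one consistent unit.

```python
from fractions import Fraction

# Semitone offsets of the natural letters above C.
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Accepted accidental symbols (ASCII and Unicode).
ACCIDENTAL = {"#": 1, "♯": 1, "b": -1, "♭": -1}


def pitch_class(name: str) -> int:
    """Map a note name such as 'F#', 'Bb4', or 'E' to one of 12 semitones (0..11).

    Octave digits are ignored, and a bare letter is treated as natural.
    """
    name = name.strip()
    pc = NATURAL[name[0].upper()]
    for ch in name[1:]:
        if ch in ACCIDENTAL:
            pc += ACCIDENTAL[ch]
        elif not ch.isdigit():
            raise ValueError(f"unrecognized symbol {ch!r} in {name!r}")
    return pc % 12


def part_matches(answer, truth) -> bool:
    """Compare two lists of (note name, duration) pairs, in order.

    Pitch is compared by pitch class only; duration is compared as an exact number.
    """
    if len(answer) != len(truth):
        return False
    return all(
        pitch_class(a) == pitch_class(t) and Fraction(da) == Fraction(dt)
        for (a, da), (t, dt) in zip(answer, truth)
    )
```

Under this sketch, `F#4` and `Gb` count as the same pitch, and `"1/4"` and `0.25` count as the same duration, so enharmonic spellings and differing octave labels would not cause a mismatch.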
Burden of proof will be on YES-holders. I'm happy to hop on a call to witness a YES-worthy setup. Non-free tiers allowed. Must be a browser-based, US public, general-purpose chatbot. May use up to 4 prompts on up to 4 sessions. No second chances for a demonstrator within 30 days.
I'll accept suggestions for prompts and models to try (but may veto them). Otherwise, in 2027 I'll just use this prompt on the best free ChatGPT, Gemini, and Claude, filling in each of the 4 part names:
For this line of music, please list the notes/durations of the _______ part.
I won't bet on this market.
Update 2026-01-19 (PST) (AI summary of creator comment): Chatbots are allowed to call other tools and libraries that are available to US public general purpose browser chatbots.
Update 2026-01-29 (PST) (AI summary of creator comment): No installations on the user's computer allowed, but if the US public general purpose browser chatbot can install tools on its end and use those tools, all within the 4 prompt limit, this is allowed.
This is way overvalued.
The musical OCR tools I've tried need laborious manual interventions to correct symbols or re-associate notes with the correct voice.
Without musical OCR, the chatbots I've tried just sort of make up notes in the right key. This was also the state of affairs at the start of 2025 when I tried similar things. And there's little commercial incentive in 2026 to RL on sheet music.
Not only that, but this particular format, where 4 parts are compressed into a single grand staff, is somewhat uncommon and makes for confusing MusicXML representations. Even if an OCR tool works perfectly, the MusicXML output will use a technical concept of "voice" that is about whether note heads share a stem rather than whether e.g. tenor or bass ought to sing it. If I provide the correct MusicXML and task coding agents with finding SATB, they struggle to write a script that does this robustly, and their judgment isn't good enough to just wing it and pull everything from the XML in the right order (and I sympathize, MusicXML is super annoying for humans to read too).
I think there's a slim chance this resolves yes, and if it does, it's probably because we saturate visual recognition benchmarks generally.
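To make the "voice" problem concrete, here is a minimal sketch that groups the notes of a MusicXML fragment by their `<staff>` and `<voice>` values. The fragment `SAMPLE` is hand-written for illustration and is not the output of any particular OCR tool; `group_by_voice` is a hypothetical helper. It only shows the mismatch and makes no attempt to solve the SATB assignment.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written fragment: one beat of a four-part chord on a grand staff.
# Soprano (G4) and alto (E4) share a stem, so they are encoded as a chord in one voice.
SAMPLE = """<score-partwise><part id="P1"><measure number="1">
  <note><pitch><step>G</step><octave>4</octave></pitch><duration>2</duration>
        <voice>1</voice><stem>up</stem><staff>1</staff></note>
  <note><chord/><pitch><step>E</step><octave>4</octave></pitch><duration>2</duration>
        <voice>1</voice><stem>up</stem><staff>1</staff></note>
  <backup><duration>2</duration></backup>
  <note><pitch><step>C</step><octave>4</octave></pitch><duration>2</duration>
        <voice>5</voice><stem>up</stem><staff>2</staff></note>
  <backup><duration>2</duration></backup>
  <note><pitch><step>C</step><octave>3</octave></pitch><duration>2</duration>
        <voice>6</voice><stem>down</stem><staff>2</staff></note>
</measure></part></score-partwise>"""


def group_by_voice(xml_text: str) -> dict:
    """Group pitched notes by (staff, voice), preserving document order."""
    groups = defaultdict(list)
    for note in ET.fromstring(xml_text).iter("note"):
        if note.find("rest") is not None:
            continue  # skip rests; only pitched notes matter here
        key = (note.findtext("staff", "1"), note.findtext("voice", "1"))
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        duration = int(note.findtext("duration", "0"))  # in the file's divisions
        in_chord = note.find("chord") is not None
        groups[key].append((f"{step}{octave}", duration, in_chord))
    return dict(groups)


for key, notes in group_by_voice(SAMPLE).items():
    print(key, notes)
```

In this fragment there are four singers but only three (staff, voice) groups, because soprano and alto land in the same voice. Any script that treats a MusicXML voice as a vocal part would therefore have to split chords and track stems on top of this grouping.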

whenever ChatGPT opens up Python to write custom image inspection code it's always a bad sign

On the one hand, it's awesome that this is possible. On the other hand... if you have to write a custom clustering algorithm to read sheet music, you're NGMI
@CraigDemel I'll stick with generally intelligent chatbots. I understand that could be LLMs, or an interface composing an LLM with an image model and tools, or maybe something that evolves away from LLM paradigms somehow in 2026
@CraigDemel (On a personal note, I'd be curious about non-chatbot tools that could reliably get the job done too)
@kenakofer May I ask if prompting a browser-based chatbot with computer use to explicitly install, say, https://github.com/aashrafh/Mozart and convert the image to MIDI count for a YES resolution?
@Twig No installations on the user's computer allowed, but if the US public general purpose browser chatbot can install tools on its end, and use those tools, all within the 4 prompt limit, I'll allow it.