In an era where AI-driven UI automation is becoming increasingly sophisticated, Microsoft’s OmniParser v2.0 emerges as a game-changer for enhancing LLM-based UI agents. This powerful model converts raw UI screenshots into structured data, enabling AI models to meaningfully interpret and interact with digital interfaces. By leveraging a fine-tuned YOLOv8 model for interactable icon detection and a Florence-2 base model for icon descriptions, OmniParser ensures high accuracy in extracting actionable elements from a screen. Whether you’re developing an intelligent automation system, building accessibility tools, or optimizing UI testing workflows, OmniParser provides the structured insights needed to bridge the gap between visual interfaces and AI-driven decision-making.
In this guide, we’ll walk you through the steps to install OmniParser v2.0 locally or on a cloud GPU, and then launch it via Gradio so you can access and test its capabilities for your next project.
Prerequisites
The minimum system requirements for this use case are:
- GPUs: RTX 4090 or RTX A6000
- Disk Space: 100 GB
- RAM: At least 16 GB.
- Nvidia Cuda installed.
Note: The prerequisites for this are highly variable across use cases. A high-end configuration could be used for a large-scale deployment.
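As a rough pre-flight check for the requirements above, the following Python sketch reports free disk space and whether the nvidia-smi tool is on the PATH. The function name check_prerequisites and the 100 GB default are illustrative choices based on the list above, not part of OmniParser.

```python
import shutil


def check_prerequisites(path=".", min_free_gb=100):
    """Report free disk space at `path` and whether nvidia-smi is on PATH."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
        # nvidia-smi ships with the NVIDIA driver; finding it is only a hint
        # that a GPU driver is installed, not proof that CUDA works.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If disk_ok is False or nvidia_smi_found is False, it is probably worth fixing that before continuing with the installation.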
Step-by-step process to install and run Microsoft’s OmniParser-v2
For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.
Step 1: Setting up a NodeShift Account
Visit app.nodeshift.com and create an account by filling in basic details, or continue signing up with your Google/GitHub account.
If you already have an account, log in straight to your dashboard.
Step 2: Create a GPU Node
After accessing your account, you should see the dashboard. From there:
- Navigate to the menu on the left side.
- Click on the GPU Nodes option.
- Click on Start to start creating your very first GPU node.
These GPU nodes are GPU-powered virtual machines by NodeShift. These nodes are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.
Step 3: Selecting configuration for GPU (model, region, storage)
- For this tutorial, we’ll be using the RTX 4090 GPU; however, you can choose any GPU of your choice based on your needs.
- Similarly, we’ll opt for 200GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.
Step 4: Choose GPU Configuration and Authentication method
1. After selecting your required configuration options, you’ll see the available VMs in your region and according to (or very close to) your configuration. In our case, we’ll choose a 1x RTX 4090 GPU node with 12 vCPUs/96GB RAM/200 GB SSD.
2. Next, you’ll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our official documentation.
Step 5: Choose an Image
The final step would be to choose an image for the VM, which in our case is Nvidia Cuda, where we’ll deploy and run the inference of our model.
That’s it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click Create to deploy the node.
Step 6: Connect to active Compute Node using SSH
- As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see a status Running in green, meaning that our Compute node is ready to use!
- Once your GPU shows this status, navigate to the three dots on the right and click on Connect with SSH. This will open a new tab with a Jupyter Notebook session in which we can run our model.
Step 7: Set up the project environment with dependencies
1. Clone the official repository of Microsoft’s OmniParser and move into the project directory.
git clone https://github.com/microsoft/OmniParser.git && cd OmniParser
2. Once inside the project directory, create a virtual environment specifying the Python version.
Ensure you have Anaconda installed on your system to create a virtual environment with conda.
(Replace <ENVIRONMENT_NAME> with a name for the virtual environment, e.g., omni, which is the name used in the activation command in the next step.)
conda create -n <ENVIRONMENT_NAME> python==3.12
3. Activate the environment and install the model dependencies.
conda activate omni
pip install -r requirements.txt
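To confirm the install succeeded before moving on, a small Python sketch like the one below can check that key packages are importable inside the activated environment. The package list here is an assumption about what requirements.txt pulls in; adjust it to match the file in your checkout.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed list of import names; cv2 is the import name of OpenCV.
    assumed = ["torch", "ultralytics", "transformers", "gradio", "cv2"]
    for name, found in check_packages(assumed).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING suggests the pip step did not complete in this environment.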
Step 8: Download model files from Hugging Face
1. Log in to Hugging Face using huggingface-cli. Enter your Hugging Face token with READ access when prompted and press ENTER.
huggingface-cli login
2. Remove any old model files or directories (of previous versions, if any).
rm -rf weights/icon_detect weights/icon_caption weights/icon_caption_florence
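3. Download the model weights.
for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
The demo script may look for the caption weights under weights/icon_caption_florence (the upstream README renames the folder after downloading). If gradio_demo.py cannot find the caption model, rename the folder:
mv weights/icon_caption weights/icon_caption_florence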
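Since the download runs as six separate requests, it can be worth checking that every file actually arrived. The sketch below lists any expected file that is absent or empty; the file names come from the download command above, while the function names and the caption_dir parameter (for setups where the caption folder has been renamed) are hypothetical helpers.

```python
from pathlib import Path


def expected_files(caption_dir="icon_caption"):
    """Relative paths of the files fetched by the download command."""
    return [
        "icon_detect/train_args.yaml",
        "icon_detect/model.pt",
        "icon_detect/model.yaml",
        f"{caption_dir}/config.json",
        f"{caption_dir}/generation_config.json",
        f"{caption_dir}/model.safetensors",
    ]


def missing_weight_files(weights_dir="weights", caption_dir="icon_caption"):
    """Return the expected files that are absent or empty under weights_dir."""
    root = Path(weights_dir)
    missing = []
    for rel in expected_files(caption_dir):
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_weight_files()
    print("All weight files present" if not problems else f"Missing: {problems}")
```

Run it from the OmniParser project directory so that the relative weights path resolves correctly.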
Step 9: Run & access the model through Gradio interface
1. Launch the Gradio interface to access the model.
python gradio_demo.py
Troubleshooting Errors
As soon as you run the above command, you might encounter an error about a missing libGL shared library (typically an ImportError that mentions libGL.so.1).
This error usually occurs because the OpenCV library used in the Gradio file is trying to access the libgl library, which is not present in the system. To fix the error, we’ll need to install this package.
Before installing software, make sure to update the Ubuntu package source-list.
apt update
Install the libgl library:
apt install -y libgl1-mesa-glx
After the installation, run the gradio command again; this time, it should launch perfectly.
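One quick way to confirm the fix is to ask Python whether a libGL shared library can now be located. This is a minimal sketch using the standard library's ctypes.util.find_library; the helper name libgl_available is illustrative, and a True result only indicates the library is discoverable, not that OpenCV will load without other issues.

```python
import ctypes.util


def libgl_available():
    """Return True if a libGL shared library can be located on this system."""
    return ctypes.util.find_library("GL") is not None


if __name__ == "__main__":
    print("libGL found" if libgl_available() else "libGL not found")
```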
2. Forward the Gradio port over an SSH tunnel so the URL can be opened in your local browser.
Run the following command in your local terminal after replacing:
- <YOUR_SERVER_PORT> with the port allotted to your remote server (for a NodeShift server, you can find it in the deployed GPU details on the dashboard).
- <PATH_TO_SSH_KEY> with the path to the location where your SSH key is stored.
- <YOUR_SERVER_IP> with the IP address of your remote server.
ssh -L 7861:localhost:7861 -p <YOUR_SERVER_PORT> -i <PATH_TO_SSH_KEY> root@<YOUR_SERVER_IP>
3. Open the local browser and access the interface at http://localhost:7861.
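If the page does not load, it may help to check whether anything is listening on the forwarded port at all. The sketch below attempts a plain TCP connection to localhost:7861 from your local machine; the helper name port_is_open and the two-second timeout are illustrative choices.

```python
import socket


def port_is_open(host="localhost", port=7861, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("Tunnel reachable" if port_is_open() else "Nothing listening on port 7861")
```

A False result usually means either the SSH tunnel is not running or the Gradio app has not finished starting on the server.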
Step 10: Test the model
Finally, we’ll test this vision GUI agent with some test screen images given in the project directory.
OneNote Windows screenshot
1. Paste the image into the left-hand box; the results appear on the right.
2. Here’s the parsed screen image generated by the agent.
The text description generated for the above parsed screen:
icon 0: {'type': 'text', 'bbox': [0.46274101734161377, 0.005560704506933689, 0.5403856039047241, 0.025949953123927116], 'interactivity': False, 'content': 'OneNote for Windows 10.', 'source': 'box_ocr_content_ocr'}
icon 1: {'type': 'text', 'bbox': [0.8634705543518066, 0.007414272520691156, 0.8947368264198303, 0.025949953123927116], 'interactivity': False, 'content': 'Yadong Lu', 'source': 'box_ocr_content_ocr'}
icon 2: {'type': 'text', 'bbox': [0.04273058846592903, 0.037071362137794495, 0.06357477605342865, 0.056533828377723694], 'interactivity': False, 'content': 'Insert', 'source': 'box_ocr_content_ocr'}
icon 3: {'type': 'text', 'bbox': [0.0760812908411026, 0.037071362137794495, 0.09692548215389252, 0.056533828377723694], 'interactivity': False, 'content': 'Draw', 'source': 'box_ocr_content_ocr'}
icon 4: {'type': 'text', 'bbox': [0.10943199694156647, 0.03521779552102089, 0.1281917691230774, 0.056533828377723694], 'interactivity': False, 'content': 'View', 'source': 'box_ocr_content_ocr'}
icon 5: {'type': 'text', 'bbox': [0.13965606689453125, 0.03521779552102089, 0.15841583907604218, 0.056533828377723694], 'interactivity': False, 'content': 'Help', 'source': 'box_ocr_content_ocr'}
icon 6: {'type': 'text', 'bbox': [0.2261594533920288, 0.14921222627162933, 0.3116206228733063, 0.16682113707065582], 'interactivity': False, 'content': 'Monday, November 11, 2024', 'source': 'box_ocr_content_ocr'}
icon 7: {'type': 'text', 'bbox': [0.3251693546772003, 0.14921222627162933, 0.3574778437614441, 0.16682113707065582], 'interactivity': False, 'content': '11:38 AM', 'source': 'box_ocr_content_ocr'}
icon 8: {'type': 'text', 'bbox': [0.9520583748817444, 0.964782178401947, 0.97811359167099, 0.9814643263816833], 'interactivity': False, 'content': '11:38 AM', 'source': 'box_ocr_content_ocr'}
icon 9: {'type': 'text', 'bbox': [0.9458051323890686, 0.9759036302566528, 0.97811359167099, 0.9962928891181946], 'interactivity': False, 'content': '11/11/2024', 'source': 'box_ocr_content_ocr'}
icon 10: {'type': 'icon', 'bbox': [0.4977208375930786, 0.058505620807409286, 0.5912500619888306, 0.09561822563409805], 'interactivity': True, 'content': 'Heading 1 ', 'source': 'box_yolo_content_ocr'}
icon 11: {'type': 'icon', 'bbox': [0.026060258969664574, 0.27056339383125305, 0.09852797538042068, 0.3053193688392639], 'interactivity': True, 'content': 'archive ', 'source': 'box_yolo_content_ocr'}
icon 12: {'type': 'icon', 'bbox': [0.025542298331856728, 0.3031369149684906, 0.09726299345493317, 0.3389579653739929], 'interactivity': True, 'content': 'Archive daily ', 'source': 'box_yolo_content_ocr'}
icon 13: {'type': 'icon', 'bbox': [0.02589419111609459, 0.16989213228225708, 0.09855923801660538, 0.20550964772701263], 'interactivity': True, 'content': 'bert structuree ', 'source': 'box_yolo_content_ocr'}
icon 14: {'type': 'icon', 'bbox': [0.02548798732459545, 0.3375644385814667, 0.09644851088523865, 0.3720583915710449], 'interactivity': True, 'content': 'summary of c.... ', 'source': 'box_yolo_content_ocr'}
icon 15: {'type': 'icon', 'bbox': [0.32861170172691345, 0.9623737335205078, 0.4448166489601135, 0.9935060739517212], 'interactivity': True, 'content': 'Q Search ', 'source': 'box_yolo_content_ocr'}
icon 16: {'type': 'icon', 'bbox': [0.09878935664892197, 0.2703218162059784, 0.2032867819070816, 0.3053605556488037], 'interactivity': True, 'content': 'Visual webarenae ', 'source': 'box_yolo_content_ocr'}
icon 17: {'type': 'icon', 'bbox': [0.09946783632040024, 0.2037675827741623, 0.2030877321958542, 0.2388870269060135], 'interactivity': True, 'content': 'Daily note ', 'source': 'box_yolo_content_ocr'}
icon 18: {'type': 'icon', 'bbox': [0.09879940003156662, 0.16923165321350098, 0.20284965634346008, 0.20517879724502563], 'interactivity': True, 'content': 'Design doc for coherent sto.... ', 'source': 'box_yolo_content_ocr'}
icon 19: {'type': 'icon', 'bbox': [0.09939892590045929, 0.30323848128318787, 0.2032860815525055, 0.3380596339702606], 'interactivity': True, 'content': ' Self-operating computer str... ', 'source': 'box_yolo_content_ocr'}
icon 20: {'type': 'icon', 'bbox': [0.09891830384731293, 0.3373170793056488, 0.2031133770942688, 0.37204623222351074], 'interactivity': True, 'content': 'Autogen + websurfere ', 'source': 'box_yolo_content_ocr'}
icon 21: {'type': 'icon', 'bbox': [0.025040971115231514, 0.36982670426368713, 0.09750079363584518, 0.40368154644966125], 'interactivity': True, 'content': 'New Section 1 ', 'source': 'box_yolo_content_ocr'}
icon 22: {'type': 'icon', 'bbox': [0.09847281873226166, 0.40325871109962463, 0.20296218991279602, 0.4393156170845032], 'interactivity': True, 'content': 'Untitled page ', 'source': 'box_yolo_content_ocr'}
icon 23: {'type': 'icon', 'bbox': [0.10036938637495041, 0.13676734268665314, 0.20343874394893646, 0.1714395433664322], 'interactivity': True, 'content': 'kgadmin111@BAGAI-TS-26L ', 'source': 'box_yolo_content_ocr'}
icon 24: {'type': 'icon', 'bbox': [0.0694265142083168, 0.06255117058753967, 0.12788702547550201, 0.09143448621034622], 'interactivity': True, 'content': 'Calibri Light ', 'source': 'box_yolo_content_ocr'}
icon 25: {'type': 'icon', 'bbox': [0.09861336648464203, 0.3698839545249939, 0.20276053249835968, 0.40438246726989746], 'interactivity': True, 'content': 'demo ', 'source': 'box_yolo_content_ocr'}
icon 26: {'type': 'icon', 'bbox': [0.16513708233833313, 0.06534593552350998, 0.1796489953994751, 0.09025215357542038], 'interactivity': True, 'content': 'B ', 'source': 'box_yolo_content_ocr'}
icon 27: {'type': 'icon', 'bbox': [0.10069171339273453, 0.9186595678329468, 0.15105381608009338, 0.9507640600204468], 'interactivity': True, 'content': '+ Add page ', 'source': 'box_yolo_content_ocr'}
icon 28: {'type': 'icon', 'bbox': [0.12439513206481934, 0.06278569251298904, 0.15950042009353638, 0.09146451950073242], 'interactivity': True, 'content': '20 ', 'source': 'box_yolo_content_ocr'}
icon 29: {'type': 'icon', 'bbox': [0.2526192367076874, 0.06430971622467041, 0.26943251490592957, 0.09162358939647675], 'interactivity': True, 'content': 'A ', 'source': 'box_yolo_content_ocr'}
icon 30: {'type': 'icon', 'bbox': [0.027142895385622978, 0.9183162450790405, 0.08834570646286011, 0.9509704113006592], 'interactivity': True, 'content': '+ Add section ', 'source': 'box_yolo_content_ocr'}
icon 31: {'type': 'icon', 'bbox': [0.028054427355527878, 0.10501008480787277, 0.11555267125368118, 0.128859281539917], 'interactivity': True, 'content': 'j Yadong @ Microsoft ', 'source': 'box_yolo_content_ocr'}
icon 32: {'type': 'icon', 'bbox': [0.18317294120788574, 0.10565543174743652, 0.19747339189052582, 0.13121400773525238], 'interactivity': True, 'content': 'F ', 'source': 'box_yolo_content_ocr'}
icon 33: {'type': 'icon', 'bbox': [0.004209435079246759, 0.030903520062565804, 0.030500514432787895, 0.05763404443860054], 'interactivity': True, 'content': 'Home ', 'source': 'box_yolo_content_ocr'}
icon 34: {'type': 'icon', 'bbox': [0.02639380842447281, 0.2360835075378418, 0.09882020205259323, 0.27237042784690857], 'interactivity': True, 'content': 'Haring', 'source': 'box_yolo_content_yolo'}
icon 35: {'type': 'icon', 'bbox': [0.026554755866527557, 0.20414064824581146, 0.09839887917041779, 0.2387440800666809], 'interactivity': True, 'content': 'Personal and', 'source': 'box_yolo_content_yolo'}
icon 36: {'type': 'icon', 'bbox': [0.02645830065011978, 0.13565753400325775, 0.09919014573097229, 0.1721661537885666], 'interactivity': True, 'content': 'Paragraph Props', 'source': 'box_yolo_content_yolo'}
icon 37: {'type': 'icon', 'bbox': [0.4575100243091583, 0.05803237482905388, 0.49638617038726807, 0.09549494087696075], 'interactivity': True, 'content': 'a checkbox for selecting or indicating a task.', 'source': 'box_yolo_content_yolo'}
icon 38: {'type': 'icon', 'bbox': [0.1000729352235794, 0.236659973859787, 0.2030917853116989, 0.2715550363063812], 'interactivity': True, 'content': 'Likra', 'source': 'box_yolo_content_yolo'}
icon 39: {'type': 'icon', 'bbox': [0.0017765597440302372, 0.1379750669002533, 0.022271646186709404, 0.1754968762397766], 'interactivity': True, 'content': 'search functionality.', 'source': 'box_yolo_content_yolo'}
icon 40: {'type': 'icon', 'bbox': [0.0019245148869231343, 0.1797219067811966, 0.02269672602415085, 0.2191949337720871], 'interactivity': True, 'content': 'Time', 'source': 'box_yolo_content_yolo'}
icon 41: {'type': 'icon', 'bbox': [0.026080019772052765, 0.05948018282651901, 0.04336424544453621, 0.09400228410959244], 'interactivity': True, 'content': 'Redo', 'source': 'box_yolo_content_yolo'}
icon 42: {'type': 'icon', 'bbox': [0.44613826274871826, 0.9643509387969971, 0.4645717442035675, 0.9967544674873352], 'interactivity': True, 'content': 'Microsoft Edge browser', 'source': 'box_yolo_content_yolo'}
icon 43: {'type': 'icon', 'bbox': [0.0030578786972910166, 0.09759819507598877, 0.022655321285128593, 0.13626854121685028], 'interactivity': True, 'content': 'Navigator', 'source': 'box_yolo_content_yolo'}
icon 44: {'type': 'icon', 'bbox': [0.49205249547958374, 0.9594891667366028, 0.512681782245636, 0.9976874589920044], 'interactivity': True, 'content': 'OneNote.', 'source': 'box_yolo_content_yolo'}
icon 45: {'type': 'icon', 'bbox': [0.5827235579490662, 0.9629796743392944, 0.6035123467445374, 0.9983140826225281], 'interactivity': True, 'content': 'Movies & TV', 'source': 'box_yolo_content_yolo'}
icon 46: {'type': 'icon', 'bbox': [0.4691826105117798, 0.9607325196266174, 0.48815110325813293, 0.9954813718795776], 'interactivity': True, 'content': 'folder', 'source': 'box_yolo_content_yolo'}
icon 47: {'type': 'icon', 'bbox': [0.5377991199493408, 0.9608587622642517, 0.5575319528579712, 0.9975340366363525], 'interactivity': True, 'content': 'Microsoft Edge browser', 'source': 'box_yolo_content_yolo'}
icon 48: {'type': 'icon', 'bbox': [0.5160161256790161, 0.9599819779396057, 0.5358539819717407, 0.997226893901825], 'interactivity': True, 'content': 'Outlook', 'source': 'box_yolo_content_yolo'}
icon 49: {'type': 'icon', 'bbox': [0.5610001087188721, 0.9626167416572571, 0.5807430744171143, 0.9981206655502319], 'interactivity': True, 'content': 'Microsoft Excel', 'source': 'box_yolo_content_yolo'}
icon 50: {'type': 'icon', 'bbox': [0.004009242635220289, 0.058906249701976776, 0.02360786683857441, 0.0936172604560852], 'interactivity': True, 'content': 'Undo', 'source': 'box_yolo_content_yolo'}
icon 51: {'type': 'icon', 'bbox': [0.9260776042938232, 0.030561067163944244, 0.9595690965652466, 0.057952553033828735], 'interactivity': True, 'content': 'Share', 'source': 'box_yolo_content_yolo'}
icon 52: {'type': 'icon', 'bbox': [0.6523982286453247, 0.9618777632713318, 0.6737245917320251, 0.9950875639915466], 'interactivity': True, 'content': 'Teams 1', 'source': 'box_yolo_content_yolo'}
icon 53: {'type': 'icon', 'bbox': [0.04548271745443344, 0.05883805826306343, 0.06414821743965149, 0.0938357561826706], 'interactivity': True, 'content': 'Paste', 'source': 'box_yolo_content_yolo'}
icon 54: {'type': 'icon', 'bbox': [0.6770122051239014, 0.9614969491958618, 0.6959044933319092, 0.9950157999992371], 'interactivity': True, 'content': 'Microsoft Edge Security', 'source': 'box_yolo_content_yolo'}
icon 55: {'type': 'icon', 'bbox': [0.5949482321739197, 0.06163399666547775, 0.6410712599754333, 0.09141834080219269], 'interactivity': True, 'content': 'Dictate', 'source': 'box_yolo_content_yolo'}
icon 56: {'type': 'icon', 'bbox': [0.30505266785621643, 0.9644448161125183, 0.32604458928108215, 0.9923491477966309], 'interactivity': True, 'content': 'Windows', 'source': 'box_yolo_content_yolo'}
icon 57: {'type': 'icon', 'bbox': [0.607010006904602, 0.9626840353012085, 0.6252184510231018, 0.9980597496032715], 'interactivity': True, 'content': 'Toggle Terminal', 'source': 'box_yolo_content_yolo'}
icon 58: {'type': 'icon', 'bbox': [0.001695476472377777, 0.0, 0.022611843422055244, 0.028786471113562584], 'interactivity': True, 'content': 'Back', 'source': 'box_yolo_content_yolo'}
icon 59: {'type': 'icon', 'bbox': [0.9057840704917908, 0.028024809435009956, 0.923634946346283, 0.06098105013370514], 'interactivity': True, 'content': 'Notifications', 'source': 'box_yolo_content_yolo'}
icon 60: {'type': 'icon', 'bbox': [0.22655856609344482, 0.06477610766887665, 0.24330861866474152, 0.09049416333436966], 'interactivity': True, 'content': 'Pencil', 'source': 'box_yolo_content_yolo'}
icon 61: {'type': 'icon', 'bbox': [0.3212123215198517, 0.06572858989238739, 0.3364010155200958, 0.09169113636016846], 'interactivity': True, 'content': 'Ribbon display options', 'source': 'box_yolo_content_yolo'}
icon 62: {'type': 'icon', 'bbox': [0.6302938461303711, 0.9623212218284607, 0.6480580568313599, 0.9958248138427734], 'interactivity': True, 'content': 'Microsoft Edge browser', 'source': 'box_yolo_content_yolo'}
icon 63: {'type': 'icon', 'bbox': [0.34321218729019165, 0.06483219563961029, 0.3608064353466034, 0.09094417095184326], 'interactivity': True, 'content': 'Bullets', 'source': 'box_yolo_content_yolo'}
icon 64: {'type': 'icon', 'bbox': [0.18577632308006287, 0.06645506620407104, 0.1998661756515503, 0.08947295695543289], 'interactivity': True, 'content': 'Italic', 'source': 'box_yolo_content_yolo'}
icon 65: {'type': 'icon', 'bbox': [0.8872997760772705, 0.02877923659980297, 0.9043022990226746, 0.06049835681915283], 'interactivity': True, 'content': 'Help or Information', 'source': 'box_yolo_content_yolo'}
icon 66: {'type': 'icon', 'bbox': [0.28059640526771545, 0.06576665490865707, 0.2941533029079437, 0.09154114127159119], 'interactivity': True, 'content': 'Formatting Marks', 'source': 'box_yolo_content_yolo'}
icon 67: {'type': 'icon', 'bbox': [0.4386585056781769, 0.06492161005735397, 0.4536566436290741, 0.0919819176197052], 'interactivity': True, 'content': 'Ribbon display options', 'source': 'box_yolo_content_yolo'}
icon 68: {'type': 'icon', 'bbox': [0.20658007264137268, 0.06572634726762772, 0.21995002031326294, 0.08992525190114975], 'interactivity': True, 'content': 'Underline', 'source': 'box_yolo_content_yolo'}
icon 69: {'type': 'icon', 'bbox': [0.3694952130317688, 0.06402627378702164, 0.3877516984939575, 0.09025061130523682], 'interactivity': True, 'content': 'Numbering', 'source': 'box_yolo_content_yolo'}
icon 70: {'type': 'icon', 'bbox': [0.3013240098953247, 0.06482964009046555, 0.31539127230644226, 0.09175645560026169], 'interactivity': True, 'content': 'Clear Formatting', 'source': 'box_yolo_content_yolo'}
icon 71: {'type': 'icon', 'bbox': [0.9633382558822632, 0.02953159064054489, 0.9812459349632263, 0.05979171395301819], 'interactivity': True, 'content': 'Line', 'source': 'box_yolo_content_yolo'}
icon 72: {'type': 'icon', 'bbox': [0.8676645755767822, 0.028567571192979813, 0.8854563236236572, 0.05995263159275055], 'interactivity': True, 'content': 'Copy', 'source': 'box_yolo_content_yolo'}
icon 73: {'type': 'icon', 'bbox': [0.8607571125030518, 0.9633863568305969, 0.8763588666915894, 0.9924913644790649], 'interactivity': True, 'content': 'Weather forecast', 'source': 'box_yolo_content_yolo'}
icon 74: {'type': 'icon', 'bbox': [0.8278011679649353, 0.9655506014823914, 0.8420353531837463, 0.9935774207115173], 'interactivity': True, 'content': 'M0,0L9,0 4.5,5z', 'source': 'box_yolo_content_yolo'}
icon 75: {'type': 'icon', 'bbox': [0.3948214054107666, 0.0597236268222332, 0.4131030738353729, 0.09249638766050339], 'interactivity': True, 'content': 'Decrease', 'source': 'box_yolo_content_yolo'}
icon 76: {'type': 'icon', 'bbox': [0.8516672849655151, 0.0291904266923666, 0.8654069900512695, 0.06068701669573784], 'interactivity': True, 'content': 'Sync with your family', 'source': 'box_yolo_content_yolo'}
icon 77: {'type': 'icon', 'bbox': [0.023891588672995567, 0.0, 0.04188734292984009, 0.02882027067244053], 'interactivity': True, 'content': 'Forward', 'source': 'box_yolo_content_yolo'}
icon 78: {'type': 'icon', 'bbox': [0.415931761264801, 0.05966160446405411, 0.434540718793869, 0.09321374446153641], 'interactivity': True, 'content': 'Increase', 'source': 'box_yolo_content_yolo'}
icon 79: {'type': 'icon', 'bbox': [0.9586057066917419, 0.0008117363322526217, 0.9709292650222778, 0.02594202756881714], 'interactivity': True, 'content': 'Copy', 'source': 'box_yolo_content_yolo'}
icon 80: {'type': 'icon', 'bbox': [0.9264345169067383, 0.9633491635322571, 0.9395644068717957, 0.9933136105537415], 'interactivity': True, 'content': 'Send this message', 'source': 'box_yolo_content_yolo'}
icon 81: {'type': 'icon', 'bbox': [0.8791786432266235, 0.9623896479606628, 0.8976826071739197, 0.994178056716919], 'interactivity': True, 'content': 'ENG', 'source': 'box_yolo_content_yolo'}
icon 82: {'type': 'icon', 'bbox': [0.8446676731109619, 0.9659549593925476, 0.8574271202087402, 0.9910972118377686], 'interactivity': True, 'content': 'Toggle Button', 'source': 'box_yolo_content_yolo'}
icon 83: {'type': 'icon', 'bbox': [0.6416144967079163, 0.06273786723613739, 0.656349241733551, 0.09095293283462524], 'interactivity': True, 'content': 'Ribbon display options', 'source': 'box_yolo_content_yolo'}
icon 84: {'type': 'icon', 'bbox': [0.9266740083694458, 0.0, 0.9560303688049316, 0.02743523381650448], 'interactivity': True, 'content': 'Minimize', 'source': 'box_yolo_content_yolo'}
icon 85: {'type': 'icon', 'bbox': [0.9151344895362854, 0.967095673084259, 0.9250877499580383, 0.9910761117935181], 'interactivity': True, 'content': 'Speaker', 'source': 'box_yolo_content_yolo'}
icon 86: {'type': 'icon', 'bbox': [0.025372261181473732, 0.40270090103149414, 0.09915894269943237, 0.439822793006897], 'interactivity': True, 'content': 'M0,0L9,0 4.5,5z', 'source': 'box_yolo_content_yolo'}
icon 87: {'type': 'icon', 'bbox': [0.9833769202232361, 0.031947046518325806, 0.9987261295318604, 0.05786075443029404], 'interactivity': True, 'content': 'More options', 'source': 'box_yolo_content_yolo'}
icon 88: {'type': 'icon', 'bbox': [0.09817462414503098, 0.43717724084854126, 0.20354987680912018, 0.48159924149513245], 'interactivity': True, 'content': 'A blank space for text or image.', 'source': 'box_yolo_content_yolo'}
icon 89: {'type': 'icon', 'bbox': [0.9075111746788025, 0.003739792387932539, 0.9231565594673157, 0.026471557095646858], 'interactivity': True, 'content': 'Line', 'source': 'box_yolo_content_yolo'}
icon 90: {'type': 'icon', 'bbox': [0.9796047806739807, 0.9616765975952148, 0.994205892086029, 0.996170699596405], 'interactivity': True, 'content': 'Notifications', 'source': 'box_yolo_content_yolo'}
icon 91: {'type': 'icon', 'bbox': [0.9012123346328735, 0.9660810828208923, 0.9133918881416321, 0.9918850064277649], 'interactivity': True, 'content': 'Wi-Fi connectivity', 'source': 'box_yolo_content_yolo'}
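To use this output programmatically, each line can be turned into a Python dictionary. The sketch below assumes the text keeps the "icon N: {...}" shape shown above, with each payload being a valid Python literal; parse_elements is a hypothetical helper, not part of OmniParser.

```python
import ast


def parse_elements(text):
    """Parse lines shaped like "icon N: {...}" into dicts with an added "id" key."""
    elements = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("icon "):
            continue  # skip blank lines and anything that is not an element
        label, _, payload = line.partition(": ")
        element = ast.literal_eval(payload)  # safe parsing of the dict literal
        element["id"] = int(label.split()[1])
        elements.append(element)
    return elements


if __name__ == "__main__":
    sample = (
        "icon 2: {'type': 'text', 'bbox': [0.04273058846592903, 0.037071362137794495, "
        "0.06357477605342865, 0.056533828377723694], 'interactivity': False, "
        "'content': 'Insert', 'source': 'box_ocr_content_ocr'}"
    )
    for element in parse_elements(sample):
        print(element["id"], element["type"], element["content"])
```

Using ast.literal_eval rather than eval keeps the parsing limited to plain literals, which matters when the text originates from screen content.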
iOS mobile home screen screenshot
Here’s the text description generated for the above parsed screen:
icon 0: {'type': 'text', 'bbox': [0.6521739363670349, 0.8856026530265808, 0.7125604152679443, 0.90625], 'interactivity': False, 'content': '38', 'source': 'box_ocr_content_ocr'}
icon 1: {'type': 'icon', 'bbox': [0.5070614218711853, 0.2060587853193283, 0.7308241128921509, 0.3040710389614105], 'interactivity': True, 'content': 'NetEase Music ', 'source': 'box_yolo_content_ocr'}
icon 2: {'type': 'icon', 'bbox': [0.07097821682691574, 0.20554324984550476, 0.2510303854942322, 0.3037087619304657], 'interactivity': True, 'content': ' Journal ', 'source': 'box_yolo_content_ocr'}
icon 3: {'type': 'icon', 'bbox': [0.06980455666780472, 0.08810707181692123, 0.24939627945423126, 0.18644098937511444], 'interactivity': True, 'content': 'Translate ', 'source': 'box_yolo_content_ocr'}
icon 4: {'type': 'icon', 'bbox': [0.29536083340644836, 0.08851530402898788, 0.47902894020080566, 0.18603260815143585], 'interactivity': True, 'content': ' Freeform ', 'source': 'box_yolo_content_ocr'}
icon 5: {'type': 'icon', 'bbox': [0.27456995844841003, 0.20604786276817322, 0.4959258437156677, 0.30438464879989624], 'interactivity': True, 'content': 'Ticketmaster ', 'source': 'box_yolo_content_ocr'}
icon 6: {'type': 'icon', 'bbox': [0.7551403045654297, 0.08895181864500046, 0.9312956929206848, 0.18618997931480408], 'interactivity': True, 'content': 'Twitter ', 'source': 'box_yolo_content_ocr'}
icon 7: {'type': 'icon', 'bbox': [0.044235989451408386, 0.4388943612575531, 0.2747706472873688, 0.5382285118103027], 'interactivity': True, 'content': 'OffScreen ', 'source': 'box_yolo_content_ocr'}
icon 8: {'type': 'icon', 'bbox': [0.7349879145622253, 0.20427937805652618, 0.9627728462219238, 0.3029525578022003], 'interactivity': True, 'content': 'COVID-19/Coro... ', 'source': 'box_yolo_content_ocr'}
icon 9: {'type': 'icon', 'bbox': [0.5074514150619507, 0.3222920000553131, 0.7237535715103149, 0.42309141159057617], 'interactivity': True, 'content': 'ESPN ', 'source': 'box_yolo_content_ocr'}
icon 10: {'type': 'icon', 'bbox': [0.5053181052207947, 0.4392409324645996, 0.7263579964637756, 0.5426095128059387], 'interactivity': True, 'content': 'WORLD HYATT. Hyatt ', 'source': 'box_yolo_content_ocr'}
icon 11: {'type': 'icon', 'bbox': [0.7328811287879944, 0.3222074508666992, 0.944567084312439, 0.4202495515346527], 'interactivity': True, 'content': 'CookUnity ', 'source': 'box_yolo_content_ocr'}
icon 12: {'type': 'icon', 'bbox': [0.40073251724243164, 0.8146427273750305, 0.5987398624420166, 0.8507030606269836], 'interactivity': True, 'content': 'Q Search ', 'source': 'box_yolo_content_ocr'}
icon 13: {'type': 'icon', 'bbox': [0.26826566457748413, 0.43897390365600586, 0.5032870173454285, 0.5410236120223999], 'interactivity': True, 'content': 'blind ', 'source': 'box_yolo_content_ocr'}
icon 14: {'type': 'icon', 'bbox': [0.7190653681755066, 0.44008806347846985, 0.9734245538711548, 0.5399961471557617], 'interactivity': True, 'content': 'ActiveShield ', 'source': 'box_yolo_content_ocr'}
icon 15: {'type': 'icon', 'bbox': [0.8907498121261597, 0.013169443234801292, 0.9643269181251526, 0.04306703805923462], 'interactivity': True, 'content': '4 ', 'source': 'box_yolo_content_ocr'}
icon 16: {'type': 'icon', 'bbox': [0.05283404886722565, 0.013748228549957275, 0.19146451354026794, 0.040847986936569214], 'interactivity': True, 'content': '4:03 : ', 'source': 'box_yolo_content_ocr'}
icon 17: {'type': 'icon', 'bbox': [0.5294579267501831, 0.08883235603570938, 0.7137314081192017, 0.18783912062644958], 'interactivity': True, 'content': 'Xiaohongshu (Little Red Book) app.', 'source': 'box_yolo_content_yolo'}
icon 18: {'type': 'icon', 'bbox': [0.050175897777080536, 0.3223983645439148, 0.27274176478385925, 0.4228990375995636], 'interactivity': True, 'content': 'Xiaohongshu (Little Red Book)', 'source': 'box_yolo_content_yolo'}
icon 19: {'type': 'icon', 'bbox': [0.28012850880622864, 0.32221469283103943, 0.49765777587890625, 0.4219929277896881], 'interactivity': True, 'content': 'Microsoft Edge Pro', 'source': 'box_yolo_content_yolo'}
icon 20: {'type': 'icon', 'bbox': [0.5147342085838318, 0.890608012676239, 0.7010797262191772, 0.9666597247123718], 'interactivity': True, 'content': 'a messaging application.', 'source': 'box_yolo_content_yolo'}
icon 21: {'type': 'icon', 'bbox': [0.7509947419166565, 0.8924387693405151, 0.9278311729431152, 0.9683733582496643], 'interactivity': True, 'content': 'Music application.', 'source': 'box_yolo_content_yolo'}
icon 22: {'type': 'icon', 'bbox': [0.2880189120769501, 0.8922565579414368, 0.4733337163925171, 0.967495322227478], 'interactivity': True, 'content': 'a phone call application.', 'source': 'box_yolo_content_yolo'}
icon 23: {'type': 'icon', 'bbox': [0.07358679175376892, 0.8929973244667053, 0.23364505171775818, 0.973598837852478], 'interactivity': True, 'content': 'Strawberry', 'source': 'box_yolo_content_yolo'}
icon 24: {'type': 'icon', 'bbox': [0.7780632972717285, 0.01428948063403368, 0.8351246118545532, 0.04639788344502449], 'interactivity': True, 'content': 'View as more options', 'source': 'box_yolo_content_yolo'}
icon 25: {'type': 'icon', 'bbox': [0.8340836763381958, 0.014536445960402489, 0.8900679349899292, 0.04208401218056679], 'interactivity': True, 'content': 'WiFi connectivity.', 'source': 'box_yolo_content_yolo'}
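A UI agent typically needs pixel coordinates to click on an element. The sketch below assumes each bbox is a normalized [x_min, y_min, x_max, y_max] box relative to the screenshot size (the values above all fall between 0 and 1, which is consistent with that reading) and converts it to a pixel center point. The function names and the example screen size are illustrative.

```python
def click_point(bbox, width, height):
    """Convert a normalized [x_min, y_min, x_max, y_max] box to its pixel center."""
    x_min, y_min, x_max, y_max = bbox
    return (
        round((x_min + x_max) / 2 * width),
        round((y_min + y_max) / 2 * height),
    )


def interactable_targets(elements, width, height):
    """Return (content, (x, y)) pairs for elements flagged as interactive."""
    return [
        (e["content"].strip(), click_point(e["bbox"], width, height))
        for e in elements
        if e.get("interactivity")
    ]


if __name__ == "__main__":
    # One element copied from the output above; the screen size is hypothetical.
    elements = [
        {"type": "icon",
         "bbox": [0.7551403045654297, 0.08895181864500046,
                  0.9312956929206848, 0.18618997931480408],
         "interactivity": True, "content": "Twitter ",
         "source": "box_yolo_content_ocr"},
    ]
    print(interactable_targets(elements, width=1170, height=2532))
```

Filtering on the interactivity flag keeps plain OCR text out of the list of click targets.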
Conclusion
Microsoft’s OmniParser v2.0 helps developers transform UI screenshots into structured data that LLM-based UI agents can act on. Deploying and scaling such tools requires robust infrastructure, and this is where NodeShift’s cloud infrastructure comes into play. With optimized compute resources and intuitive deployment features, NodeShift helps models and agents like these run efficiently in cloud environments, allowing developers to focus on innovation rather than infrastructure management.