Weekly AI Newsletter
New AI Tools and Models | The most important AI news and releases from the last week: OpenAI / Gemini / Grok / Canva / Kling and much more
Ainsider AI: Your Weekly Dose of AI Innovation
The most important AI news and releases from the past week:
OpenAI – New Models: GPT-4.1, o3, and o4-mini
OpenAI has introduced three new models: GPT-4.1, o3, and o4-mini.
GPT-4.1: Accessible only via the API, with improvements in coding and long-context understanding.
o3: OpenAI's most advanced reasoning model, capable of "thinking in images" and autonomously using tools.
o4-mini: A faster and more economical version with similar capabilities, available in ChatGPT.
Key Features of the Models
GPT-4.1:
Coding: Significantly improved in creating and debugging code, as well as adhering to diff formats.
Longer Context: Supports up to 1 million tokens, ideal for large datasets.
Multimodality: Understands text, images, and video.
Pricing: $2.00 per million input tokens, $8.00 per million output tokens.
o3:
Reasoning: "Thinks" before responding, enhancing quality and accuracy.
Image Thinking: Analyzes images, such as diagrams and drawings.
Tools: Autonomously uses ChatGPT functions like web browsing and image generation.
Benchmark Results: 88.9% in AIME 2025 (mathematics), 69.1% in SWE-Bench Verified (coding), 82.9% in MMMU (visual reasoning).
o4-mini:
Economy: Faster and cheaper, with similar capabilities to o3.
Visual Reasoning: Interprets images and performs visual tasks.
Tools: Utilizes ChatGPT functions.
Benchmark Results: 68.1% in SWE-Bench Verified (coding).
Variants: Standard and "o4-mini-high" with a higher level of reasoning.
Google – Gemini 2.5 Flash
Gemini 2.5 Flash is Google's new AI model, combining reasoning capabilities with low cost and speed. Its key features are hybrid reasoning, a 1 million token context window, multimodality, and integration with Google tools. The model is available for free in a preview version, so its capabilities are easy to try out.
Key Features and Capabilities:
Hybrid Reasoning Model:
Google's first fully hybrid reasoning model: it can "think" before responding, which improves performance and accuracy.
Developers can toggle the "thinking" feature on or off and set a "thinking budget" (from 0 to 24576 tokens), allowing customization of quality, cost, and latency for specific tasks. The model autonomously assesses task complexity and adjusts thinking intensity if no budget is specified.
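To make the three modes concrete (automatic, off, and a fixed budget), here is a minimal sketch. The helper name and the clamping behavior are illustrative assumptions, and the `thinkingConfig` / `thinkingBudget` field names are assumed to follow the Gemini REST API's naming; only the 0 to 24576 range comes from the text above.

```python
# Hypothetical helper: the function name and the clamping policy are
# illustrative. Only the 0..24576 range is taken from the article.
MAX_THINKING_BUDGET = 24576


def build_generation_config(thinking_budget=None):
    """Return a request-config dict for a given thinking budget.

    None  -> omit the budget so the model decides how much to think.
    0     -> thinking switched off.
    other -> clamped into the 0..24576 range.
    """
    config = {}
    if thinking_budget is not None:
        clamped = max(0, min(int(thinking_budget), MAX_THINKING_BUDGET))
        # Field names are an assumption about the request shape.
        config["thinkingConfig"] = {"thinkingBudget": clamped}
    return config


print(build_generation_config())        # no budget: model chooses
print(build_generation_config(0))       # thinking off
print(build_generation_config(50_000))  # clamped to the maximum
```

A lower budget trades answer quality for lower cost and latency, so it usually makes sense to start small and raise it only for tasks that need it.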
Massive Context:
Supports 1 million tokens in the input context, enabling the processing of very large datasets like long documents, codebases, and system logs.
Multimodality:
Understands and processes various data types, including text, images, audio, and video. It can generate images and detect objects in photos (e.g., by generating bounding boxes or segmentation masks).
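Bounding boxes returned by the model still have to be mapped onto the actual image. The sketch below assumes boxes arrive as `[ymin, xmin, ymax, xmax]` normalized to a 0 to 1000 scale; that format is an assumption and should be checked against the current Gemini documentation. The function name is hypothetical.

```python
def to_pixel_box(box, image_width, image_height, scale=1000):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    The box order and the 0..1000 scale are assumptions about the model's
    output format. Returns (left, top, right, bottom) in pixels.
    """
    ymin, xmin, ymax, xmax = box
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


# Example: a box covering the middle of a 1920x1080 frame.
print(to_pixel_box([100, 250, 500, 750], 1920, 1080))
```

The resulting tuple uses the common (left, top, right, bottom) pixel convention, so it can be handed to most image libraries for cropping or drawing.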
Code Execution:
Can write and execute Python code directly, which is extremely useful for developers.
Cost Efficiency:
Priced at $0.15 per 1 million input tokens and $3.50 per 1 million output tokens with thinking enabled. Google positions the model on the "Pareto frontier" of cost versus quality, meaning it offers a strong price-to-performance ratio.
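As a quick worked example of what these prices mean per request, the sketch below estimates the cost of one call with thinking enabled. The function name and token counts are illustrative, and how thinking tokens are counted toward output is not covered here, so the caller is assumed to pass the total billed output tokens.

```python
# Prices from the article, in USD per 1 million tokens (thinking enabled).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M_THINKING = 3.50


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M_THINKING
    return input_cost + output_cost


# Example: a 200,000-token document summarized into 10,000 output tokens.
cost = estimate_cost_usd(200_000, 10_000)
print(f"${cost:.4f}")  # $0.0650
```

In this example the short output costs slightly more than the long input ($0.035 versus $0.03), which is why limiting the amount of thinking matters for cost.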
Benchmark Results:
Excels in diverse tasks, including:
Humanity's Last Exam (no tools): 18.8%.
GPQA diamond (single attempt): 84.0%.
Mathematics AIME 2025 (single attempt): 86.7%.
Visual Reasoning MMMU (single attempt): 81.7%.
Long Contexts MRCR 1M: 83.1%.
These are vendor-reported scores; they measure accuracy on each task rather than speed.
Integration with Google Workspace:
Seamlessly integrates with Google products like Gmail, Docs, and Sheets, facilitating user workflows within a familiar environment.