Autodesk

Design and make software for architecture, engineering, construction, and entertainment industries.

11,600

Building Design, Construction, Automotive, Building Product Manufacturing, 3D Animation, Architecture, Engineering, Construction Professionals, Mechanical Engineering, Mechanical CAD, Thermal Simulation, Electronic Design Automation, Printed Circuit Board Design, Mechanical, Electrical, and Plumbing (MEP), HVAC, Fabrication, Estimation, Infrastructure, Civil Engineering, Genetic Engineering (Life Sciences)

Intern, Research Foundational Models

Research intern developing spatial reasoning methods for vision-language models.

Toronto, Ontario, Canada
Internship
Entry-level

Job Highlights

Environment
Office Full-Time

About the Role

The project will investigate approaches such as reinforcement learning, test-time computation, and “thinking with images,” where models iteratively attend to visual evidence, reason over intermediate representations, and verify hypotheses through visual feedback. The goal is to advance state-of-the-art methods for spatially grounded reasoning and generate insights valuable to both the research community and Autodesk’s long-term vision for intelligent design tools. Over the internship you will define and drive a focused research project, conduct experiments, analyze results, and have opportunities to publish and present findings.

  • Define and execute a research project on geometric reasoning in vision-language models.
  • Conduct literature reviews to identify limitations and related prior work.
  • Design and implement novel training or inference strategies using reinforcement learning, test-time computation, or iterative visual reasoning.
  • Develop model architectures, training pipelines, and evaluation benchmarks for spatial tasks.
  • Run large-scale experiments, analyze results, and iterate on designs.
  • Compare approaches against strong baselines and state-of-the-art methods.
  • Collaborate with research mentors and peers, sharing progress and incorporating feedback.
  • Author a research paper for top-tier machine learning or computer vision conferences.
  • Present findings internally at Autodesk and externally at academic venues.

Key Responsibilities

  • research design
  • literature review
  • model development
  • training pipelines
  • experimentation
  • paper publication

What You Bring

We are seeking a research intern to tackle fundamental challenges in geometry, design understanding, and relative spatial reasoning for vision-language models (VLMs). Modern VLMs excel at captioning, semantic understanding, and segmentation, but they still struggle with geometric reasoning, layout understanding, and precise relative positioning, capabilities essential for design, engineering, and creation workflows. The internship will involve close collaboration with research mentors to explore new modeling and training paradigms that move beyond one-shot visual reasoning.

Candidates must be currently enrolled in a PhD program in Computer Science, Machine Learning, Computer Vision, or a closely related field, with at least one academic semester remaining after the internship. A strong publication record in top-tier ML or vision conferences and hands-on experience training VLMs and reinforcement learning algorithms are required. Proficiency with modern deep-learning frameworks such as PyTorch, TRL, or Ray, and a solid grounding in machine-learning fundamentals and experimental methodology are essential. Preferred qualifications include experience with multimodal or embodied reasoning, test-time optimization, or iterative inference methods, as well as familiarity with geometric vision and spatial reasoning benchmarks or synthetic visual datasets. Experience scaling experiments on distributed systems or large compute clusters is a plus, along with strong written and verbal communication skills.

  • Currently enrolled in a PhD program in CS, ML, CV, or a related field with at least one semester remaining.
  • Strong publication record in top-tier conferences such as ICML, NeurIPS, ICLR, CVPR, ICCV, or ECCV.
  • Hands-on experience training vision-language models and reinforcement learning algorithms.
  • Proficient in modern deep-learning frameworks (e.g., PyTorch, TRL, Ray).
  • Solid background in machine learning fundamentals and experimental research methodology.
  • Ability to work independently on open-ended research problems and communicate results clearly.
  • Experience with multimodal or embodied reasoning, test-time optimization, or iterative inference methods.
  • Familiarity with geometric vision, spatial reasoning benchmarks, or synthetic visual datasets.
  • Experience scaling experiments on distributed systems or large compute clusters.
  • Strong written and verbal communication skills.

Requirements

  • PhD
  • top conferences
  • vision-language
  • PyTorch
  • distributed
  • communication

Benefits

Salary is part of Autodesk’s competitive compensation package and is determined based on experience, education level, and geographic location. Offers are transparent and reflect market standards.
