UnifoLM-VLA
Implemented Skills
UnifoLM-VLA-0 is a Vision-Language-Action (VLA) large model in the UnifoLM series, designed for general-purpose humanoid robot manipulation. It goes beyond the limitations of conventional Vision-Language Models (VLMs) in physical interaction: through continued pre-training on robot manipulation data, the model evolves from "vision-language understanding" into an "embodied brain" equipped with physical common sense. It features spatial semantic enhancement and generalizes across 12 categories of complex manipulation tasks.