AI新工具
banner

Phi-3-MLX


介绍:

Phi-3-MLX 是面向苹果晶片的语言和视觉模型AI框架,提供多样AI任务接口。









Phi-3-MLX

Phi-3-MLX简介

Phi-3-MLX是一个多功能AI框架,结合了Phi-3-Vision多模态模型和最近更新的Phi-3-Mini-128K语言模型,特别优化用于Apple Silicon,通过MLX框架进行加速优化。该项目提供了一个易于使用的接口,适用于多种AI任务,包括高级文本生成、视觉问答和代码执行等。

主要功能
  • 支持Phi-3-Mini-128K(仅语言模型)
  • 集成Phi-3-Vision(多模态模型)
  • 针对Apple Silicon优化性能
  • 支持批量生成,处理多个提示
  • 支持多种AI任务的灵活代理系统
  • 专用工具链定制化工作流
  • 通过模型量化提升效率
  • 支持LoRA微调
  • 提供API集成功能(如图像生成、文本转语音)
快速开始

您可以通过命令行安装和启动Phi-3-MLX:

pip install phi-3-vision-mlx
phi3v

或在Python脚本中使用该库:

from phi_3_vision_mlx import generate
核心功能
视觉问答
generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')
批量文本生成
prompts = [
    "Explain the key concepts of quantum computing and provide a Rust code example demonstrating quantum superposition.",
    "Write a poem about the first snowfall of the year.",
    "Summarize the major events of the French Revolution.",
    ...
]
generate(prompts, max_tokens=100)
generate(prompts, max_tokens=100, blind_model=True)
模型和缓存量化
generate("Describe the water cycle.", quantize_model=True)
generate("Explain quantum computing.", quantize_cache=True)
LoRA微调

训练LoRA适配器:

from phi_3_vision_mlx import train_lora

train_lora(
    lora_layers=5,
    lora_rank=16,
    epochs=10,
    lr=1e-4,
    warmup=0.5,
    dataset_path="JosefAlbers/akemiH_MedQA_Reason"
)

使用LoRA生成文本:

generate("Describe the potential applications of CRISPR gene editing in medicine.",
    blind_model=True,
    quantize_model=True,
    use_adapter=True)
代理交互
多轮对话
from phi_3_vision_mlx import Agent

agent = Agent()
agent('Analyze this image and describe the architectural style:', 'https://images.metmuseum.org/CRDImages/rl/original/DP-19531-075.jpg')
agent('What historical period does this architecture likely belong to?')
agent.end()
生成反馈循环
agent('Plot a Lissajous Curve.')
agent('Modify the code to plot 3:4 frequency')
agent.end()
外部API工具使用
agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
agent('Speak "People say nothing is impossible, but I do nothing every day."')
agent.end()
定制工具链
示例1:上下文学习代理
from phi_3_vision_mlx import _load_text

def add_text(prompt):
    prompt, path = prompt.split('@')
    return f'{_load_text(path)}\n<|end|>\n<|user|>{prompt}'

toolchain = """
    prompt = add_text(prompt)
    responses = generate(prompt, images)
    """
agent = Agent(toolchain, early_stop=100)
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
示例2:检索增强编码代理
from phi_3_vision_mlx import VDB
import datasets

user_input = 'Comparison of Sortino Ratio for Bitcoin and Ethereum.'

def rag(prompt, repo_id="JosefAlbers/sharegpt_python_mlx", n_topk=1):
    ds = datasets.load_dataset(repo_id, split='train')
    vdb = VDB(ds)
    context = vdb(prompt, n_topk)[0][0]
    return f'{context}\n<|end|>\n<|user|>Plot: {prompt}'

toolchain_plot = """
    prompt = rag(prompt)
    responses = generate(prompt, images)
    files = execute(responses, step)
    """
agent = Agent(toolchain_plot, False)
_, images = agent(user_input)
示例3:多代理互动
agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)
基准测试
from phi_3_vision_mlx import benchmark
benchmark()
任务 原始模型 量化模型 量化缓存 LoRA适配器
文本生成 8.46 tps 51.69 tps 6.94 tps 8.58 tps
图像描述 7.72 tps 33.10 tps 1.75 tps 7.11 tps
批量生成 103.47 tps 182.83 tps 38.72 tps 101.02 tps

(在M1 Max 64GB上)

可关注我们的公众号:每天AI新工具

广告:私人定制视频文本提取,字幕翻译制作等,欢迎联系QQ:1752338621