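Ansible Automation Operations Guide (Continuously Updated)
Preface
What Is Ansible?
Imagine you manage 100 servers, and every one of them needs the same work done: install Nginx, edit a configuration file, start the service.
Logging in to each server one by one could take a whole day. But with a "remote control", you press one button, all 100 servers start working at the same time, and the job is done in about 10 minutes.
Ansible is that "remote control"! It lets you control thousands of servers at once and complete all kinds of operations tasks automatically.
In modern operations, automation is an essential skill. Ansible is one of the most popular automation tools. It is easy to learn and powerful, and countless companies use it for configuration management, application deployment, and day-to-day operations automation.
This document covers:
- Ansible's core concepts and how it works
- How to install and configure Ansible
- How to write Playbooks to automate tasks
- How to manage host inventories and variables
- How to organize complex tasks with Roles
- Best practices for production environments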
Core Concepts in Detail
1. What Is Ansible?
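Imagine you are an orchestra conductor. The score is the "Playbook" and the instruments are the "servers". You don't play each instrument yourself. You just wave the baton (run a command), and every instrument plays according to the score.
Ansible is that "conductor": you describe tasks in a simple language, and it executes them automatically on many servers.
Key characteristics of Ansible:
| Characteristic | Description |
|---|---|
| Agentless | No agent software to install on managed servers; Ansible connects over SSH (managed Linux nodes still need Python for most modules) |
| Idempotency | Running the same task multiple times leaves the system in the same final state; a well-written task only changes something when the current state differs from the desired one |
| YAML syntax | Tasks are written in human-readable YAML |
| Rich module library | 2000+ modules covering a wide range of operations scenarios |
| Active community | Many ready-made roles and collections on Ansible Galaxy |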
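Idempotency is easiest to see in code. The following is a minimal Python sketch, not Ansible's actual implementation. It uses a hypothetical `ensure_line` helper that behaves like a lineinfile-style task: it checks the current state first, only writes when the desired state is missing, and returns whether anything changed.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` exists in the file at `path`.

    Returns True only if the file had to be modified ("changed"),
    False if it was already in the desired state ("ok").
    """
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []
    if line in lines:
        return False  # desired state already reached: do nothing
    lines.append(line)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hosts_file = os.path.join(tmp, "hosts")
        print(ensure_line(hosts_file, "192.168.1.100 app-server"))  # first run changes the file
        print(ensure_line(hosts_file, "192.168.1.100 app-server"))  # second run changes nothing
```

Running it twice reports a change only the first time. That is the distinction Ansible's `changed` and `ok` statuses express.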
2. How Ansible Works
┌─────────────────────────────────────────────────────────────┐
│                      Ansible Workflow                       │
│                                                             │
│                    ┌────────────────────┐                   │
│                    │ Run manually:      │                   │
│                    │ ansible-playbook   │                   │
│                    └─────────┬──────────┘                   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                   Ansible Engine                    │   │
│   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │   │
│   │  │  Inventory  │  │     API     │  │   Modules   │  │   │
│   │  │ (host list) │  │ (interface) │  │  (actions)  │  │   │
│   │  └─────────────┘  └─────────────┘  └─────────────┘  │   │
│   │  ┌─────────────┐  ┌─────────────┐                   │   │
│   │  │   Plugins   │  │  Playbook   │                   │   │
│   │  │ (extensions)│  │  (script)   │                   │   │
│   │  └─────────────┘  └─────────────┘                   │   │
│   └──────────────────────────┬──────────────────────────┘   │
│                              │ SSH                          │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                    Managed Nodes                    │   │
│   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │   │
│   │  │  Server 1   │  │  Server 2   │  │  Server N   │  │   │
│   │  │   (web-1)   │  │   (web-2)   │  │   (db-1)    │  │   │
│   │  └─────────────┘  └─────────────┘  └─────────────┘  │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
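How the workflow runs:
| Step | Plain-language explanation |
|---|---|
| 1. Read the Inventory | Decide which servers to operate on (like a roster) |
| 2. Read the Playbook | Decide which tasks to run (like a script) |
| 3. Connect to servers | Connect to each server over SSH |
| 4. Execute modules | Copy the module code to the server, run it there (usually with Python), then clean up |
| 5. Return results | Collect each server's result (returned as JSON) and display a summary |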
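To make the five steps concrete, here is a toy Python model of the loop. All of the following are assumptions made for illustration:

- Hosts are plain strings.
- "Modules" are ordinary Python functions, such as the hypothetical `ping` below.
- A thread pool stands in for SSH connections, and `forks` caps the parallelism like the setting of the same name in ansible.cfg.

It illustrates the flow only and is not how Ansible is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A "module" is modeled as a function (host, args) -> result dict, loosely
# imitating the JSON result that real Ansible modules send back.
Module = Callable[[str, dict], dict]


def ping(host: str, args: dict) -> dict:
    """Hypothetical stand-in for the ping module: always succeeds."""
    return {"changed": False, "ping": "pong"}


def run_play(inventory: Dict[str, List[str]], group: str,
             tasks: List[Tuple[Module, dict]], forks: int = 10) -> Dict[str, Dict[str, int]]:
    """Toy version of steps 1-5: pick hosts, run each task on every remaining
    host (at most `forks` at a time), and tally a per-host recap."""
    hosts = inventory[group]                                  # 1. read the inventory
    recap = {h: {"ok": 0, "changed": 0, "failed": 0} for h in hosts}
    active = list(hosts)
    for module, args in tasks:                                # 2. walk the "playbook"
        with ThreadPoolExecutor(max_workers=forks) as pool:   # 3./4. "connect" and run
            results = list(pool.map(lambda h: (h, module(h, args)), active))
        for host, res in results:                             # 5. collect the results
            if res.get("failed"):
                recap[host]["failed"] += 1
                active.remove(host)   # failed hosts skip the remaining tasks
            else:
                recap[host]["ok"] += 1
                recap[host]["changed"] += int(bool(res.get("changed")))
    return recap


if __name__ == "__main__":
    def fail_on_db(host: str, args: dict) -> dict:
        return {"failed": host.startswith("db"), "changed": True}

    inventory = {"all": ["web1", "web2", "db1"]}
    print(run_play(inventory, "all", [(ping, {}), (fail_on_db, {}), (ping, {})], forks=2))
```

In this model, a host that fails a task is dropped from the remaining tasks while the others carry on. This mirrors Ansible's default of continuing the play on hosts that are still healthy.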
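3. Core Concepts Cheat Sheet
| Concept | Plain-language explanation | Analogy |
|---|---|---|
| Inventory | List of servers | Address book |
| Playbook | Task script | Play script |
| Module | An executable action | An actor's move |
| Task | A single step that calls one module | One scene in the script |
| Role | A reusable bundle of tasks, variables, files, and templates | A packaged set of scenes |
| Handler | A task that runs only when notified by a task that made a change | A cue that triggers a follow-up scene |
| Fact | Information gathered about a host | An actor's background profile |
| Template | A configuration file template | Lines with blanks to fill in |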
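4. Ansible vs. Other Tools
| Aspect | Ansible | Chef | Puppet | SaltStack |
|---|---|---|---|---|
| Architecture | Agentless | Agent | Agent | Agent/Agentless |
| Language | YAML | Ruby DSL | Puppet DSL (Ruby-based) | YAML + Jinja2 |
| Learning curve | Low | High | Medium | Medium |
| Typical scale (rough) | Thousands of nodes | Thousands of nodes or more | Thousands of nodes or more | Thousands of nodes or more |
| Execution speed (rough) | Medium | Slow | Slow | Fast |
| Community | Very active | Active | Active | Moderate |
Recommendations:
- Small and medium businesses, personal projects: choose Ansible for its simplicity
- Large enterprises with complex scenarios: SaltStack is worth considering (faster execution)
- An existing Ruby tech stack: Chef is an option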
Installing and Configuring Ansible
Installation Methods
Method 1: pip (recommended)
# 👀 Install Ansible (via the Python package manager)
pip install ansible
# 👀 Verify the installation
ansible --version
# Example output (version numbers and paths vary by system):
# ansible [core 2.x.y]
#   config file = None
#   configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
#   ansible python module location = /usr/lib/python3.10/site-packages/ansible
#   executable location = /usr/bin/ansible
Method 2: apt (Ubuntu/Debian)
# 👀 Refresh the package lists
sudo apt update
# 👀 Install Ansible
sudo apt install ansible
# 👀 Verify
ansible --version
Method 3: yum (CentOS/RHEL)
# 👀 Install the EPEL repository
sudo yum install epel-release
# 👀 Install Ansible
sudo yum install ansible
# 👀 Verify
ansible --version
Directory Structure
# 👀 Recommended Ansible project layout
.
├── ansible.cfg          # Ansible configuration file
├── inventory/           # Host inventory directory
│   ├── hosts            # Production hosts
│   └── hosts-dev        # Development hosts
├── playbooks/           # Playbook directory
│   ├── site.yml         # Main playbook
│   └── webservers.yml   # Web server playbook
├── roles/               # Roles directory
│   ├── common/          # Common role
│   └── nginx/           # Nginx role
├── plugins/             # Plugins directory
├── library/             # Custom modules
├── files/               # Static files
└── templates/           # Template files
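ansible.cfg Configuration Explained
# 👀 ansible.cfg main configuration file
[defaults]
# Location of the inventory file
inventory = ./inventory/hosts
# Default user for logging in to remote hosts
remote_user = ubuntu
# Private key file
private_key_file = ~/.ssh/id_rsa
# Whether to verify the host key fingerprint on first connection
# (False skips the check: convenient in a lab, less secure in production)
host_key_checking = False
# Number of hosts to work on in parallel
forks = 10
# Default verbosity: 0 (normal) to 4, same as -v / -vv / -vvv / -vvvv on the command line
# verbosity = 0
# By default a failure on one host does not stop the play on other hosts;
# uncomment to abort the whole run on the first error instead
# any_errors_fatal = True
# Log file (the user running Ansible needs write permission)
log_path = /var/log/ansible.log
# Roles path
roles_path = ./roles
[privilege_escalation]
# Privilege escalation settings
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
# SSH connection tuning
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s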
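Configuration precedence (highest to lowest; Ansible uses only the first file it finds and does not merge them):
| Priority | Location | Notes |
|---|---|---|
| 1 | Environment variable | ANSIBLE_CONFIG=xxx.cfg ansible ... |
| 2 | Current directory | ./ansible.cfg |
| 3 | User's home directory | ~/.ansible.cfg |
| 4 | System directory | /etc/ansible/ansible.cfg |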
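The lookup order in the table can be expressed as a small function. This is a sketch with a hypothetical `find_ansible_cfg` helper. File existence is passed in as a function, so it can be tested without touching the disk. The helper returns the first candidate that exists. It ignores extra details such as Ansible refusing to load an ansible.cfg from a world-writable current directory.

```python
import os
from typing import Callable, Mapping, Optional


def find_ansible_cfg(env: Mapping[str, str], cwd: str, home: str,
                     exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the config file Ansible would use, following the table above:
    ANSIBLE_CONFIG > ./ansible.cfg > ~/.ansible.cfg > /etc/ansible/ansible.cfg.
    Only the first hit matters; the files are never merged."""
    candidates = []
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(env["ANSIBLE_CONFIG"])
    candidates += [
        os.path.join(cwd, "ansible.cfg"),
        os.path.join(home, ".ansible.cfg"),
        "/etc/ansible/ansible.cfg",
    ]
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_ansible_cfg(os.environ, os.getcwd(), os.path.expanduser("~")))
```

In practice, the `config file = ...` line of `ansible --version` shows which file was actually picked.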
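Host Inventory Management
What Is a Host Inventory?
Imagine you need to email 100 people: you need an "address book" with everyone's email address. The host inventory is Ansible's "address book". It records information about every server you manage.
Basic Host Inventory
# 👀 inventory/hosts - basic host inventory
# 👀 Option 1: specify the IP directly (single host)
web1 ansible_host=192.168.1.10
# 👀 Option 2: specify IP and port
web2 ansible_host=192.168.1.11 ansible_port=2222
# 👀 Option 3: use a hostname (DNS name)
db1 ansible_host=db.example.com
# 👀 Option 4: specify the SSH user and key
app1 ansible_host=192.168.1.20 ansible_user=ubuntu ansible_private_key_file=~/.ssh/app_key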
Managing Host Groups
# 👀 Host group example
# 👀 Define the web server group
[webservers]
web1 ansible_host=192.168.1.10
web2 ansible_host=192.168.1.11
web3 ansible_host=192.168.1.12
# 👀 Define the database server group
[dbservers]
db1 ansible_host=192.168.1.20
db2 ansible_host=192.168.1.21
# 👀 Define a group containing all production servers (a group of groups)
[production:children]
webservers
dbservers
# 👀 Define variables (group level)
[webservers:vars]
nginx_version=1.24.0
http_port=80
[dbservers:vars]
mysql_version=8.0
datadir=/data/mysql
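Host Ranges and Patterns
# 👀 Host range examples
# 👀 Numeric range
[webservers]
web[1:5].example.com
# Expands to: web1.example.com, web2.example.com, ..., web5.example.com
[webservers_by_ip]
192.168.1.[10:14]
# Expands to: 192.168.1.10, 192.168.1.11, ..., 192.168.1.14
# Note: ranges only expand in the host name itself. A variable value such as
# ansible_host=192.168.1.[10:14] is NOT expanded per host; to map names to IPs,
# set ansible_host on each host line or in host_vars/
# 👀 Alphabetic range
[apps]
app[a:c].example.com
# Expands to: appa.example.com, appb.example.com, appc.example.com
# 👀 Regular expressions are a host *pattern* feature used on the command line
# (prefix the pattern with ~); they are not valid inside an inventory file:
# ansible '~(node|web|database)-\d+\.example\.com' -m ping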
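The range syntax can be modeled in a few lines of Python. `expand_hosts` below is a hypothetical helper. It covers the cases shown above: numeric ranges with optional zero padding and an optional step (as in `www[01:50:2]`), and alphabetic ranges. It handles only a single bracket per pattern, whereas Ansible's own parser is more complete. Range expansion applies to the host name field only. That is why a value like `ansible_host=192.168.1.[10:14]` stays a literal string.

```python
import re
import string
from typing import List

# One bracketed range: [start:end] or [start:end:step]
_RANGE = re.compile(r"\[([^\]:]+):([^\]:]+)(?::(\d+))?\]")


def expand_hosts(pattern: str) -> List[str]:
    """Expand a single inventory range such as web[1:5], db[01:03] or app[a:c].

    Simplified model: inclusive ends, optional step, zero padding when the
    numeric start has a leading zero, and only one bracket per pattern.
    """
    m = _RANGE.search(pattern)
    if not m:
        return [pattern]                     # plain host name, nothing to expand
    start, end = m.group(1), m.group(2)
    step = int(m.group(3) or 1)
    head, tail = pattern[:m.start()], pattern[m.end():]
    if start.isdigit() and end.isdigit():
        width = len(start) if start.startswith("0") else 0
        values = [str(i).zfill(width) for i in range(int(start), int(end) + 1, step)]
    else:
        letters = string.ascii_letters       # 'a'..'z' then 'A'..'Z'
        i, j = letters.index(start), letters.index(end)
        values = list(letters[i:j + 1:step])
    return [head + v + tail for v in values]


if __name__ == "__main__":
    print(expand_hosts("web[1:5].example.com"))
    print(expand_hosts("192.168.1.[10:14]"))
```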
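Dynamic Inventory
What Is a Dynamic Inventory?
Imagine your servers get IP addresses assigned dynamically, different on every boot. A dynamic inventory is a "self-updating address book": it fetches the current server list from your cloud provider automatically.
# 👀 Dynamic inventory example (AWS EC2 inventory plugin)
# 👀 Install the AWS collection (the plugin also needs the boto3 and botocore Python libraries)
ansible-galaxy collection install amazon.aws
pip install boto3 botocore
# 👀 aws_ec2 dynamic inventory config (the file name must end in aws_ec2.yml or aws_ec2.yaml)
cat > inventory/aws_ec2.yml << 'EOF'
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - cn-north-1   # China regions are a separate AWS partition and need their own credentials
filters:
  tag:Environment: production
keyed_groups:
  - key: tags['Role']
    prefix: role
  - key: instance_type
    prefix: type
EOF
# 👀 Use the dynamic inventory
ansible all -i inventory/aws_ec2.yml -m ping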
Inventory Variables
# 👀 Host-level variables
[webservers]
web1 ansible_host=192.168.1.10 nginx_workers=4
web2 ansible_host=192.168.1.11 nginx_workers=8
# 👀 Verify the inventory
ansible-inventory -i inventory/hosts --list
# Example output (abridged):
# {
#     "webservers": {
#         "hosts": ["web1", "web2"]
#     },
#     "_meta": {
#         "hostvars": {
#             "web1": {"ansible_host": "192.168.1.10", "nginx_workers": 4},
#             "web2": {"ansible_host": "192.168.1.11", "nginx_workers": 8}
#         }
#     }
# }
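The JSON above has the same shape that a custom dynamic inventory script is expected to print for `--list`. Below is a minimal Python sketch that builds this shape from in-memory data. The helper name and sample hosts are hypothetical. Real `ansible-inventory` output also lists an `ungrouped` group under `all`, which is left out here.

```python
import json
from typing import Dict, List


def inventory_to_json(groups: Dict[str, List[str]],
                      hostvars: Dict[str, dict]) -> str:
    """Build the `--list` JSON shape: one key per group with a "hosts" list,
    an "all" entry naming the groups, and per-host variables under
    "_meta" -> "hostvars" (hosts without variables get an empty dict)."""
    data: Dict[str, dict] = {name: {"hosts": sorted(hosts)} for name, hosts in groups.items()}
    data["all"] = {"children": sorted(groups)}
    data["_meta"] = {"hostvars": {host: hostvars.get(host, {})
                                  for hosts in groups.values() for host in hosts}}
    return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(inventory_to_json(
        {"webservers": ["web1", "web2"]},
        {"web1": {"ansible_host": "192.168.1.10", "nginx_workers": 4},
         "web2": {"ansible_host": "192.168.1.11", "nginx_workers": 8}},
    ))
```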
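Ad-Hoc Commands
What Is an Ad-Hoc Command?
Imagine that instead of writing a script every time, you just shout over the walkie-talkie: "Everyone, turn off the lights." That is an ad-hoc command: a one-off, temporary command.
Use cases:
- Quick tests
- One-off operations
- Simple tasks
Basic Command Format
# 👀 Basic format
ansible <hosts> -m <module> -a "<module arguments>"
# 👀 Common options
# -i <inventory>    Specify the inventory
# -m <module>       Specify the module
# -a <args>         Module arguments
# -b                Use privilege escalation (become)
# -k                Prompt for the SSH password
# -K                Prompt for the sudo (become) password
# -v                Verbose output
# --list-hosts      List the matching hosts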
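For most modules, the arguments passed with `-a` are `key=value` pairs. The sketch below illustrates the idea with Python's `shlex`. `parse_module_args` is a hypothetical helper and is deliberately simplified. Ansible's own parser also copes with Jinja2 `{{ }}` expressions. In addition, `command` and `shell` take a free-form command line rather than key=value pairs.

```python
import shlex
from typing import Dict


def parse_module_args(arg_string: str) -> Dict[str, str]:
    """Split an ad-hoc `-a` string such as "src=./a dest=/tmp/a mode=0755"
    into a dict, honoring shell-style quotes (comment='Deploy User')."""
    args: Dict[str, str] = {}
    for token in shlex.split(arg_string):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        args[key] = value
    return args


if __name__ == "__main__":
    print(parse_module_args("name=deployer comment='Deploy User' shell=/bin/bash"))
```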
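Common Command Examples
1. Test connectivity
# 👀 Test connectivity to all servers (the ping module checks SSH login and a usable Python, it is not an ICMP ping)
ansible all -m ping
# Example output:
# web1 | SUCCESS => {
#     "ansible_facts": {
#         "discovered_interpreter_python": "/usr/bin/python3"
#     },
#     "changed": false,
#     "ping": "pong"
# }
# 👀 Only list the matching hosts, without running anything
ansible webservers --list-hosts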
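2. Run shell commands
# 👀 Check server uptime
ansible all -m shell -a "uptime"
# 👀 Check memory usage
ansible all -m shell -a "free -h"
# 👀 Check disk usage
ansible all -m shell -a "df -h"
# 👀 Run multiple commands
ansible all -m shell -a "cd /tmp && ls -la"
3. Copy files
# 👀 Copy a file to the remote servers
ansible all -m copy -a "src=./file.txt dest=/tmp/file.txt"
# 👀 Copy and set permissions
ansible all -m copy -a "src=./script.sh dest=/tmp/script.sh mode=0755"
# 👀 Copy a directory (without a trailing slash on src the directory itself is copied
# into dest, giving /etc/myconfig/config; with "src=./config/" only its contents are copied)
ansible all -m copy -a "src=./config dest=/etc/myconfig directory_mode=0755"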
4. Manage services
# 👀 Start a service
ansible webservers -m service -a "name=nginx state=started"
# 👀 Stop a service
ansible webservers -m service -a "name=nginx state=stopped"
# 👀 Restart a service
ansible webservers -m service -a "name=nginx state=restarted"
# 👀 Start and enable at boot
ansible webservers -m service -a "name=nginx state=started enabled=yes"
5. Install packages
# 👀 These need root: add -b unless become = True is set in ansible.cfg
# 👀 Ubuntu/Debian
ansible all -m apt -a "name=nginx state=present"   # install
ansible all -m apt -a "name=nginx state=latest"    # upgrade to the latest version
ansible all -m apt -a "name=nginx state=absent"    # remove
# 👀 CentOS/RHEL (yum on older releases, dnf on newer ones)
ansible all -m yum -a "name=nginx state=present"
ansible all -m dnf -a "name=nginx state=present"
6. User management
# 👀 Create a user
ansible all -m user -a "name=deployer comment='Deploy User' shell=/bin/bash"
# 👀 Create a user and set a password (the module expects a hash; note that the
# plaintext password also ends up in your shell history)
ansible all -m user -a "name=deployer password={{ 'password123' | password_hash('sha512') }}"
# 👀 Delete a user
ansible all -m user -a "name=deployer state=absent"
7. File permissions
# 👀 Change file ownership
ansible all -m file -a "path=/data owner=www-data group=www-data"
# 👀 Create a directory
ansible all -m file -a "path=/data/backups state=directory mode=0755"
# 👀 Create a symbolic link (/var/www -> /data/www)
ansible all -m file -a "src=/data/www dest=/var/www state=link"
8. Gather host information
# 👀 Gather all host information (facts)
ansible all -m setup
# 👀 Memory information only
ansible all -m setup -a "filter=*memory*"
# 👀 CPU information only
ansible all -m setup -a "filter=*processor*"
# 👀 Network (IPv4) information only
ansible all -m setup -a "filter=*ipv4*"
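Playbooks
What Is a Playbook?
Imagine staging Journey to the West: the scenes go in order, the Monkey King is born → Havoc in Heaven → the journey west → returning with the scriptures. A Playbook is that kind of "script": it defines the order in which tasks run.
Key characteristics of Playbooks:
| Characteristic | Description |
|---|---|
| YAML format | Human-readable |
| Declarative | Each task describes the desired end state, not the commands to get there |
| Idempotent | Repeated runs give the same result |
| Sequential | Tasks run in the order they are defined |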
Playbook Basic Structure
# 👀 Basic Playbook example: deploy Nginx
# 1. Play header: name, target hosts, privileges, variables
- name: Deploy Nginx Web Server        # play name
  hosts: webservers                    # which servers to run on
  become: yes                          # escalate privileges (sudo)
  vars:                                # variable definitions
    nginx_version: "1.24.0"
    nginx_port: 80
  # 2. Tasks that run before the main tasks
  pre_tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
      when: ansible_os_family == "Debian"
  # 3. Main tasks
  tasks:
    # Task 1: install Nginx
    - name: Install Nginx
      apt:
        name: nginx
        state: present
      notify: Start Nginx              # trigger a handler
    # Task 2: deploy the configuration file
    - name: Copy Nginx Config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Reload Nginx
    # Task 3: create the website directory
    - name: Create web directory
      file:
        path: /var/www/html
        state: directory
        owner: www-data
        group: www-data
  # 4. Tasks that run after the main tasks (here: a final check)
  post_tasks:
    - name: Verify Nginx is running
      service:
        name: nginx
        state: started
  # 5. Handler definitions (run only when notified by a task that reported a change)
  handlers:
    - name: Start Nginx
      service:
        name: nginx
        state: started
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded
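Handler behavior is a common source of surprises. The Python sketch below models the rules on a simplified task list, represented as dicts with hypothetical `changed` and `notify` fields. The sketch covers a single section. In a real play, handlers are also flushed at the end of `pre_tasks`, of the roles/tasks section, and of `post_tasks`, and `- meta: flush_handlers` makes them run earlier.

```python
from typing import Dict, List, Sequence, Set


def run_with_handlers(tasks: Sequence[Dict], handler_order: Sequence[str]) -> List[str]:
    """Model of handler semantics within one play section.

    - `notify` only takes effect when the task reports changed;
    - each handler runs at most once, however many tasks notified it;
    - notified handlers run after the tasks, in the order the handlers are
      defined (not the order in which they were notified).
    """
    log: List[str] = []
    notified: Set[str] = set()
    for task in tasks:
        log.append(f"task: {task['name']}")
        if task.get("changed"):
            notified.update(task.get("notify", []))
    for handler in handler_order:
        if handler in notified:
            log.append(f"handler: {handler}")
    return log


if __name__ == "__main__":
    tasks = [
        {"name": "Install Nginx", "changed": False, "notify": ["Start Nginx"]},
        {"name": "Copy Nginx Config", "changed": True, "notify": ["Reload Nginx"]},
    ]
    for line in run_with_handlers(tasks, ["Start Nginx", "Reload Nginx"]):
        print(line)
```

In this example, "Start Nginx" never runs because its notifying task reported no change. Likewise, however many changed tasks notify "Reload Nginx", it runs only once.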
Playbook Full Example
# 👀 playbooks/deploy-nginx.yml
---
- name: Deploy Nginx to Web Servers
  hosts: webservers
  remote_user: ubuntu
  become: yes
  # 👀 Variable definitions
  vars:
    app_name: myapp
    app_port: 8080
    nginx_port: 80          # used by the firewall task below
    nginx_workers: 4
  # 👀 Gather environment information
  pre_tasks:
    - name: Gather OS Facts
      setup:
        filter: '*distribution*'
    - name: Show OS Info
      debug:
        msg: "Deploying to {{ ansible_distribution }} {{ ansible_distribution_version }}"
  # 👀 Main tasks
  tasks:
    # 👀 Task 1: install Nginx
    - name: Install Nginx and required packages
      apt:
        name:
          - nginx
          - python3-pip
        state: present
        update_cache: yes
      tags: install
    # 👀 Task 2: configure Nginx
    - name: Configure Nginx worker processes
      lineinfile:
        path: /etc/nginx/nginx.conf
        regexp: '^worker_processes'
        line: "worker_processes {{ nginx_workers }};"
      when: ansible_processor_vcpus is defined
      notify: Restart Nginx
      tags: config
    # 👀 Task 3: deploy the config file from a template
    - name: Deploy Nginx virtual host config
      template:
        src: vhost.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}.conf
        mode: '0644'
      notify: Restart Nginx
      tags: config
    # 👀 Task 4: enable the site
    - name: Enable Nginx site
      file:
        src: /etc/nginx/sites-available/{{ app_name }}.conf
        dest: /etc/nginx/sites-enabled/{{ app_name }}.conf
        state: link
      notify: Restart Nginx
      tags: config
    # 👀 Task 5: create the web root
    - name: Create application directory
      file:
        path: /var/www/{{ app_name }}
        state: directory
        owner: www-data
        group: www-data
        mode: '0755'
      tags: files
    # 👀 Task 6: deploy the web page
    - name: Deploy index.html
      copy:
        content: |
          <!DOCTYPE html>
          <html>
          <head><title>{{ app_name }}</title></head>
          <body>
          <h1>Welcome to {{ app_name }}</h1>
          <p>Server: {{ ansible_hostname }}</p>
          <p>OS: {{ ansible_distribution }} {{ ansible_distribution_version }}</p>
          </body>
          </html>
        dest: /var/www/{{ app_name }}/index.html
        owner: www-data
        group: www-data
        mode: '0644'
      tags: deploy
    # 👀 Task 7: test the Nginx configuration
    - name: Test Nginx configuration
      command: nginx -t
      register: nginx_test
      changed_when: false
      tags: test
    - name: Show Nginx test result
      debug:
        msg: "{{ nginx_test.stderr }}"   # nginx -t reports on stderr, not stdout
      tags: test
    # 👀 Task 8: configure the firewall
    - name: Configure UFW firewall
      ufw:
        rule: allow
        port: "{{ nginx_port }}"
        proto: tcp
      when: ansible_distribution == "Ubuntu"
      tags: firewall
  # 👀 Handler definitions
  handlers:
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded
Conditional Execution
# 👀 when condition examples
- name: Install Apache on Debian/Ubuntu
  apt:
    name: apache2
    state: present
  when: ansible_os_family == "Debian"
- name: Install Apache on RHEL/CentOS
  yum:
    name: httpd
    state: present
  when: ansible_os_family == "RedHat"
# 👀 Multiple conditions (a list means all of them must be true, i.e. AND)
- name: Install for specific version
  apt:
    name: nginx=1.24.0*     # Debian package versions carry a suffix, so use a wildcard
    state: present
  when:
    - ansible_distribution == "Ubuntu"
    - ansible_distribution_version is version('22.04', '>=')
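The `version` test compares versions component by component rather than as plain strings. This matters as soon as one component has more digits than the other, as in 9.9 versus 10.0. Below is a small Python sketch of numeric version comparison. It simplifies what Ansible's `version` test does (the real test also offers other comparison modes), and the helper names are hypothetical.

```python
import re
from typing import Tuple


def version_key(v: str) -> Tuple[int, ...]:
    """'22.04' -> (22, 4): compare numerically, not character by character."""
    return tuple(int(part) for part in re.findall(r"\d+", v))


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1, padding the shorter version with zeros ('1' == '1.0')."""
    ka, kb = version_key(a), version_key(b)
    n = max(len(ka), len(kb))
    ka += (0,) * (n - len(ka))
    kb += (0,) * (n - len(kb))
    return (ka > kb) - (ka < kb)


def version_ge(a: str, b: str) -> bool:
    return compare_versions(a, b) >= 0


if __name__ == "__main__":
    print(version_ge("22.04", "22.04"))
    # Plain string comparison gets this wrong, numeric comparison does not:
    print("10.0" >= "9.9", version_ge("10.0", "9.9"))  # False True
```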
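Loops
# 👀 loop examples
# 👀 Install multiple packages (with apt/yum you can also pass the list directly
# to name:, which installs everything in one transaction and is faster)
- name: Install multiple packages
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - mysql-server
    - php-fpm
# 👀 Create multiple users (omit leaves an option unset when the item has no value)
- name: Create multiple users
  user:
    name: "{{ item.name }}"
    shell: "{{ item.shell | default('/bin/bash') }}"
    groups: "{{ item.groups | default(omit) }}"
  loop:
    - { name: 'deployer', groups: 'www-data' }
    - { name: 'developer', shell: '/bin/zsh' }     # zsh must already be installed
    - { name: 'monitor', groups: 'monitor' }       # the monitor group must already exist
# 👀 with_items: the older syntax, equivalent to loop for a flat list
- name: Create directories
  file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  with_items:
    - /data/backups
    - /data/logs
    - /data/uploads
# 👀 with_dict: loop over a dictionary
# (the modern equivalent is loop: "{{ my_dict | dict2items }}")
- name: Configure application settings
  lineinfile:
    path: /etc/app/config.conf
    regexp: "^{{ item.key }}"
    line: "{{ item.key }} = {{ item.value }}"
  with_dict:
    app_name: myapp
    max_connections: 1000
    timeout: 30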
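Both `with_dict` and `loop` with `dict2items` turn a mapping into a list of `{key, value}` items. The Python sketch mirrors the `dict2items` filter, including its `key_name`/`value_name` options. It also shows which lines the lineinfile task above would write. The function names are hypothetical.

```python
from typing import Any, Dict, List


def dict2items(d: Dict[str, Any], key_name: str = "key",
               value_name: str = "value") -> List[Dict[str, Any]]:
    """Python counterpart of the dict2items filter: each dict entry becomes
    one {key, value} item, which is the shape `loop` iterates over."""
    return [{key_name: k, value_name: v} for k, v in d.items()]


def render_lines(settings: Dict[str, Any]) -> List[str]:
    # The line each loop iteration of the lineinfile task would ensure.
    return [f"{item['key']} = {item['value']}" for item in dict2items(settings)]


if __name__ == "__main__":
    for line in render_lines({"app_name": "myapp", "max_connections": 1000, "timeout": 30}):
        print(line)
```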
Error Handling
# 👀 ignore_errors: keep going even if the task fails
- name: Try to backup, continue even if failed
  shell: backup.sh
  register: backup_result
  ignore_errors: yes
# 👀 failed_when: custom failure condition
- name: Check disk space
  shell: df -h | grep /dev/sda1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
  changed_when: false          # a read-only check never changes anything
  failed_when: disk_usage.stdout | int > 90
# 👀 force_handlers: run notified handlers even if a later task fails
- name: Deploy application
  hosts: webservers
  force_handlers: yes
  tasks:
    - name: Install app
      apt:
        name: myapp
        state: present
      notify: Restart myapp
    - name: Fail intentionally
      shell: exit 1
    - name: This task will not execute
      debug:
        msg: "This will not run"
  handlers:
    # Still runs thanks to force_handlers, provided "Install app" reported a change
    - name: Restart myapp
      service:
        name: myapp
        state: restarted
# 👀 block/rescue: catch errors
- name: Deploy with rollback
  block:
    - name: Backup current version
      shell: backup.sh
    - name: Deploy new version
      shell: deploy.sh
    - name: Verify deployment
      shell: verify.sh
  rescue:
    - name: Rollback on failure
      shell: rollback.sh
    - name: Notify failure
      debug:
        msg: "Deployment failed, rolled back"
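block/rescue maps naturally onto try/except, with the `always` section corresponding to `finally`. Below is a minimal Python model for a single host. `run_block` is hypothetical, and a task is modeled as a (name, callable) pair whose callable raises to signal failure.

```python
from typing import Callable, List, Sequence, Tuple

# A task is (name, action); an action that raises counts as a failed task.
Task = Tuple[str, Callable[[], None]]


def run_block(block: Sequence[Task], rescue: Sequence[Task] = (),
              always: Sequence[Task] = ()) -> Tuple[List[str], bool]:
    """Model of block/rescue/always for one host.

    The block stops at its first failing task; rescue then runs, and if it
    completes the host counts as recovered. `always` runs either way.
    Returns (names of tasks that ran, host_ok).
    """
    log: List[str] = []
    ok = True
    try:
        for name, action in block:
            log.append(name)
            action()
    except Exception:
        ok = False
        try:
            for name, action in rescue:
                log.append(name)
                action()
            ok = bool(rescue)      # no rescue section: the failure stands
        except Exception:
            ok = False             # the rescue itself failed
    finally:
        for name, action in always:
            log.append(name)
            action()
    return log, ok


if __name__ == "__main__":
    def deploy() -> None:
        raise RuntimeError("deploy.sh failed")

    def noop() -> None:
        pass

    print(run_block([("Backup", noop), ("Deploy", deploy), ("Verify", noop)],
                    rescue=[("Rollback", noop), ("Notify failure", noop)]))
```

In this run, "Verify" is skipped once "Deploy" fails, the rescue tasks run, and the host ends up counted as recovered. That matches the rollback pattern in the playbook above.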
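Task Delegation
# 👀 delegate_to: run a task on a different host
# 👀 Render the config on the control node, then push it to the remote hosts
# (one file per host, so hosts running in parallel don't overwrite each other's file)
- name: Generate configuration locally
  template:
    src: app.conf.j2
    dest: "/tmp/app-{{ inventory_hostname }}.conf"
  delegate_to: localhost
  become: no               # no sudo needed on the control node
- name: Deploy generated config
  copy:
    src: "/tmp/app-{{ inventory_hostname }}.conf"
    dest: /etc/app.conf
# 👀 Run a task on one specific server
- name: Reload load balancer
  service:
    name: haproxy
    state: reloaded
  delegate_to: lb01
# 👀 Run only once for the whole play (on the control node), not once per host
- name: Notify monitoring system once
  debug:
    msg: "Deployment completed"
  delegate_to: localhost
  run_once: true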
Tag Management
# 👀 Tag tasks
tasks:
  - name: Install packages
    apt:
      name: "{{ packages }}"
    tags: [install, packages]
  - name: Configure application
    template:
      src: app.conf.j2
      dest: /etc/app.conf
    tags: [config, template]
  - name: Start services
    service:
      name: "{{ service_name }}"
      state: started
    tags: [service, start]
# 👀 Run by tag
ansible-playbook site.yml --tags "install,config"   # run only the install and config tags
ansible-playbook site.yml --skip-tags "service"     # skip the service tag
ansible-playbook site.yml --list-tags               # list all tags
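The selection rules behind `--tags` and `--skip-tags` fit in a few lines. The sketch below is a simplified Python model (`should_run` is hypothetical) that also covers Ansible's two special tags. `always` runs regardless of `--tags` unless it is explicitly skipped. `never` runs only when one of the task's tags is explicitly requested.

```python
from typing import AbstractSet, Iterable


def should_run(task_tags: Iterable[str], only: AbstractSet[str] = frozenset(),
               skip: AbstractSet[str] = frozenset()) -> bool:
    """Simplified model of --tags / --skip-tags selection.

    - a task with any tag listed in `skip` never runs (this includes 'always');
    - a task tagged 'never' runs only when one of its tags is requested;
    - a task tagged 'always' runs whatever --tags says;
    - with no --tags (or --tags all) every other task runs;
    - otherwise the task needs at least one requested tag.
    """
    tags = set(task_tags)
    if tags & skip:
        return False
    if "never" in tags:
        return bool(tags & only)
    if not only or "all" in only or "always" in tags:
        return True
    return bool(tags & only)


if __name__ == "__main__":
    requested = {"install", "config"}
    for task_tags in (["install", "packages"], ["config", "template"], ["service", "start"]):
        print(task_tags, should_run(task_tags, only=requested))
```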
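Variables and Templates
Variable Basics
What Is a Variable?
Imagine writing many letters with almost identical content but different recipients. A variable is a "placeholder": you write one letter template, and the recipient's name is filled in automatically.
# 👀 Variable definition example
vars:
  app_name: myapp
  app_version: "1.0.0"
  app_port: 8080
  enabled: true
# 👀 Use variables in a task
- name: Print app info
  debug:
    msg: "{{ app_name }} v{{ app_version }} is running on port {{ app_port }}"
Variable Sources
# 👀 1. Defined in the Playbook
vars:
  http_port: 80
# 👀 2. Defined in the inventory
# [webservers]
# web1 http_port=80
# 👀 3. Passed on the command line (extra vars have the highest precedence)
# ansible-playbook site.yml --extra-vars "http_port=8080"
# 👀 4. Defined in a file
- name: Load variables from file
  include_vars:
    file: vars/app.yml
# 👀 5. Host facts
# ansible_hostname, ansible_default_ipv4.address, ansible_os_family, etc.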
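Because the same variable can be defined in several places, Ansible resolves conflicts with a fixed precedence order, which has more than twenty levels in the official documentation. The sketch keeps only five of those levels, from lowest to highest:

1. Role defaults
2. Inventory group vars
3. Inventory host vars
4. Play vars
5. Extra vars

It uses Python's `ChainMap`, which returns the value from the first mapping that contains a key. This is a simplification, and the helper name is hypothetical.

```python
from collections import ChainMap
from typing import Any, Dict


def resolve_vars(role_defaults: Dict[str, Any], group_vars: Dict[str, Any],
                 host_vars: Dict[str, Any], play_vars: Dict[str, Any],
                 extra_vars: Dict[str, Any]) -> ChainMap:
    """Simplified precedence, lowest -> highest:
    role defaults < inventory group vars < inventory host vars < play vars < extra vars.
    ChainMap searches its maps left to right, so the highest level goes first."""
    return ChainMap(extra_vars, play_vars, host_vars, group_vars, role_defaults)


if __name__ == "__main__":
    v = resolve_vars(
        role_defaults={"http_port": 80, "workers": 2},
        group_vars={"http_port": 8000},
        host_vars={},
        play_vars={},
        extra_vars={"http_port": 8080},   # like --extra-vars "http_port=8080"
    )
    print(v["http_port"], v["workers"])
    # Analogous to the | default(...) filter for a variable nobody defined:
    print(v.get("log_level", "INFO"))
```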
Jinja2 Templates
# 👀 Template file: config.conf.j2
# 👀 Basic variables
app_name = {{ app_name }}
app_version = {{ app_version }}
port = {{ port }}
# 👀 Conditionals
{% if enable_cache %}
cache_enabled = true
cache_size = {{ cache_size }}
{% else %}
cache_enabled = false
{% endif %}
# 👀 Loops
{% for user in allowed_users %}
allow_user = {{ user }}
{% endfor %}
# 👀 Filters
app_name_upper = {{ app_name | upper }}
# Note: this fact differs on every run, so the template task will always report "changed"
date = {{ ansible_date_time.iso8601 }}
# 👀 Default values
log_level = {{ log_level | default('INFO') }}
# 👀 Arithmetic (| int guards against the value arriving as a string, e.g. from --extra-vars)
max_connections = {{ (workers | int) * 50 }}
Separating Variable Files
# 👀 vars/app.yml
---
app_name: myapp
app_version: "2.0.0"
app_port: 8080
database:
  host: localhost
  port: 3306
  name: myapp_db
# 👀 vars/secrets.yml (sensitive variables: encrypt this file with
# ansible-vault encrypt vars/secrets.yml and run playbooks with --ask-vault-pass)
---
db_password: "your_secure_password"
api_key: "sk-xxxxx"
# 👀 Reference them in a Playbook
- name: Deploy application
  hosts: webservers
  vars_files:
    - vars/app.yml
    - vars/secrets.yml
  tasks:
    - name: Display app info
      debug:
        msg: "Deploying {{ app_name }} v{{ app_version }}"
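Roles
What Is a Role?
Imagine renovating many houses, where each one needs an electrician/plumber, a carpenter, and a painter. Hiring a separate crew for every house is a hassle.
A Role is like a "renovation package": it bundles related tasks (configure Nginx, install PHP, configure the firewall) into one reusable unit.
Role directory structure:
# 👀 roles/nginx directory layout
roles/nginx/
├── defaults/          # Default variables (lowest precedence)
│   └── main.yml
├── files/             # Static files
│   ├── nginx.conf
│   └── mime.types
├── handlers/          # Handlers
│   └── main.yml
├── meta/              # Role metadata and dependencies
│   └── main.yml
├── tasks/             # Tasks
│   └── main.yml
├── templates/         # Template files
│   └── vhost.conf.j2
├── tests/             # Tests
│   ├── inventory
│   └── test.yml
└── vars/              # Variables (high precedence)
    └── main.yml
Creating a Role
# 👀 Create a role skeleton with ansible-galaxy
ansible-galaxy role init --init-path roles nginx
# 👀 Inspect the generated structure
tree roles/nginx/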
Role Example: an Nginx Role
# 👀 roles/nginx/defaults/main.yml
---
nginx_port: 80
nginx_server_name: localhost
nginx_workers: 4
nginx_keepalive_timeout: 65
app_root: /var/www/html
# 👀 roles/nginx/tasks/main.yml
---
- name: Install Nginx
  apt:
    name: nginx
    state: present
    update_cache: yes
  notify: Start Nginx
- name: Configure Nginx workers
  lineinfile:
    path: /etc/nginx/nginx.conf
    regexp: '^worker_processes'
    line: "worker_processes {{ nginx_workers }};"
  notify: Reload Nginx
- name: Deploy virtual host config
  template:
    src: vhost.conf.j2
    dest: /etc/nginx/sites-available/{{ nginx_server_name }}.conf
  notify: Reload Nginx
- name: Enable site
  file:
    src: /etc/nginx/sites-available/{{ nginx_server_name }}.conf
    dest: /etc/nginx/sites-enabled/{{ nginx_server_name }}.conf
    state: link
  notify: Reload Nginx
- name: Create document root
  file:
    path: "{{ app_root }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
# 👀 roles/nginx/handlers/main.yml
---
- name: Start Nginx
  service:
    name: nginx
    state: started
    enabled: yes
- name: Restart Nginx
  service:
    name: nginx
    state: restarted
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded
# 👀 roles/nginx/templates/vhost.conf.j2
server {
    listen {{ nginx_port }};
    server_name {{ nginx_server_name }};
    root {{ app_root }};
    index index.html;
    location / {
        try_files $uri $uri/ =404;
    }
    client_max_body_size 10M;
    keepalive_timeout {{ nginx_keepalive_timeout }};
}
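Using Roles
# 👀 site.yml using roles
---
- name: Deploy Web Infrastructure
  hosts: webservers
  become: yes
  # 👀 Apply roles (with conditions, variables, and tags)
  roles:
    - role: nginx
      when: "'webservers' in group_names"   # group_names lists the groups this host belongs to
    - role: php-fpm
      vars:
        php_version: "8.1"
    - role: firewall
      tags: firewall
# 👀 Apply several roles as a simple list (a separate play, because one play
# cannot contain two roles: keys)
- name: Deploy with a simple role list
  hosts: webservers
  become: yes
  roles:
    - common
    - nginx
    - mysql
    - app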
Role Dependencies
# 👀 roles/nginx/meta/main.yml
# 👀 Depend on other roles (dependencies run before this role)
dependencies:
  - role: common
    vars:
      timezone: Asia/Shanghai
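Role dependencies declared in `meta/main.yml` run before the role that needs them. By default, Ansible runs a given role only once per play when it is pulled in repeatedly with the same parameters; `allow_duplicates` changes that. The following depth-first Python sketch shows the resulting order. `role_run_order` is hypothetical and ignores role parameters entirely. It also rejects circular dependencies.

```python
from typing import Dict, List, Set


def role_run_order(roles: List[str], deps: Dict[str, List[str]]) -> List[str]:
    """Return the order in which roles would execute: dependencies first
    (depth-first), each role only once."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(role: str) -> None:
        if role in order:
            return                      # already ran once in this play
        if role in visiting:
            raise ValueError(f"circular dependency at {role!r}")
        visiting.add(role)
        for dep in deps.get(role, []):
            visit(dep)
        visiting.discard(role)
        order.append(role)

    for role in roles:
        visit(role)
    return order


if __name__ == "__main__":
    meta = {"nginx": ["common"], "app": ["common", "nginx"]}
    print(role_run_order(["nginx", "mysql", "app"], meta))
```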
Common Modules in Detail
Module Cheat Sheet
| Module | Purpose | Example |
|---|---|---|
| apt/yum | Install packages | apt: name=nginx state=present |
| service | Manage services | service: name=nginx state=started |
| shell | Run shell commands (pipes, redirects) | shell: uptime >> /tmp/uptime.log |
| copy | Copy files | copy: src=file.txt dest=/tmp/ |
| template | Render a Jinja2 template and copy it | template: src=conf.j2 dest=/etc/conf |
| file | Manage files and directories | file: path=/tmp state=directory |
| user | Manage users | user: name=deployer shell=/bin/bash |
| group | Manage groups | group: name=www state=present |
| lineinfile | Ensure a line in a file | lineinfile: path=file line="text" |
| command | Run a command (no shell features; creates= skips it if the file exists) | command: /usr/bin/foo creates=/tmp/bar |
| cron | Scheduled jobs | cron: name="backup" minute=0 job="/backup.sh" |
| debug | Debug output | debug: msg="{{ variable }}" |
| wait_for | Wait for a condition | wait_for: port=3306 state=started |
Core Modules in Detail
1. apt module (Debian/Ubuntu)
# 👀 Install a package
- name: Install Nginx
  apt:
    name: nginx
    state: present
# 👀 Install several packages
- name: Install LAMP stack
  apt:
    name:
      - nginx
      - mysql-server
      - php-fpm
    state: present
    update_cache: yes
# 👀 Install a specific version (Debian package versions carry a suffix, so use a wildcard)
- name: Install specific version
  apt:
    name: nginx=1.24.0*
    state: present
# 👀 Remove a package
- name: Remove Apache
  apt:
    name: apache2
    state: absent
2. service module
# 👀 Start a service
- name: Start Nginx
  service:
    name: nginx
    state: started
    enabled: yes
# 👀 Restart a service
- name: Restart MySQL
  service:
    name: mysql
    state: restarted
# 👀 Reload the configuration
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded
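3. copy module
# 👀 Copy a file
- name: Copy config file
  copy:
    src: myapp.conf
    dest: /etc/myapp.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes              # keep a backup of the original file
# 👀 Copy a directory (trailing slash on src: copy only its contents)
- name: Copy directory
  copy:
    src: /local/configs/
    dest: /etc/myapp/
    owner: root
    group: root
    mode: '0644'             # applies to the copied files
    directory_mode: '0755'   # applies to the copied directories
4. template module
# 👀 Use a template (validate checks the rendered file before it replaces the real one)
- name: Deploy Nginx config from template
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: '/usr/sbin/nginx -t -c %s'   # %s is the rendered temporary file
  notify: Reload Nginx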
5. lineinfile module
# 👀 Ensure a line exists
- name: Set timezone
  lineinfile:
    path: /etc/timezone
    line: Asia/Shanghai
    state: present
# 👀 Replace the matching line
- name: Configure max connections
  lineinfile:
    path: /etc/nginx/nginx.conf
    regexp: '^worker_connections'
    line: "worker_connections {{ max_connections }};"
# 👀 Delete matching lines
- name: Remove debug line
  lineinfile:
    path: /etc/app.conf
    regexp: '^debug'
    state: absent
# 👀 Append to the end of the file (the default when the line is missing and no insertafter/insertbefore is given)
- name: Add line to file
  lineinfile:
    path: /etc/hosts
    line: "192.168.1.100 app-server"
    state: present
6. user module
# 👀 Create a user
- name: Create deploy user
  user:
    name: deployer
    comment: "Deployment User"
    shell: /bin/bash
    groups: sudo
    append: yes
# 👀 Create a system account (no login)
- name: Create service account
  user:
    name: myapp
    system: yes
    shell: /usr/sbin/nologin
    create_home: no
# 👀 Set a password (password_hash uses a random salt by default, so this task
# reports "changed" on every run; pass a fixed salt or use update_password: on_create if that matters)
- name: Set user password
  user:
    name: deployer
    password: "{{ 'secret123' | password_hash('sha512') }}"
# 👀 Delete a user
- name: Remove user
  user:
    name: olduser
    state: absent
    remove: yes              # also remove the home directory
Real-World Examples
Example 1: Batch-Deploying a Web Application
# 👀 playbooks/deploy-app.yml
# (assumes vars/app.yml also defines app_user, app_dir, and local_app_dir)
---
- name: Deploy Web Application
  hosts: webservers
  become: yes
  vars_files:
    - vars/app.yml
  tasks:
    # 👀 1. Check the environment
    - name: Check Python availability
      command: python3 --version
      register: python_version
      changed_when: false
    # 👀 2. Create the application user
    - name: Create app user
      user:
        name: "{{ app_user }}"
        shell: /bin/bash
        home: "{{ app_dir }}"
        create_home: yes
    # 👀 3. Create the directory structure
    - name: Create app directories
      file:
        path: "{{ item }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0755'
      loop:
        - "{{ app_dir }}"
        - "{{ app_dir }}/logs"
        - "{{ app_dir }}/tmp"
    # 👀 4. Install system dependencies
    - name: Install dependencies
      apt:
        name:
          - python3
          - python3-pip
          - python3-venv
          - nginx
        state: present
        update_cache: yes
    # 👀 5. Deploy the application code (synchronize is part of ansible.posix and
    #    needs rsync on both the control node and the target)
    - name: Deploy application code
      synchronize:
        src: "{{ local_app_dir }}/"
        dest: "{{ app_dir }}/current"
        delete: yes
        rsync_opts:
          - "--exclude=.git"
          - "--exclude=venv"
      become_user: "{{ app_user }}"
    # 👀 6. Install Python dependencies
    - name: Install Python dependencies
      pip:
        requirements: "{{ app_dir }}/current/requirements.txt"
        virtualenv: "{{ app_dir }}/venv"
        virtualenv_command: python3 -m venv
      become_user: "{{ app_user }}"
    # 👀 7. Configure Nginx
    - name: Configure Nginx
      template:
        src: nginx-app.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}
      notify: Reload Nginx
    - name: Enable Nginx site
      file:
        src: /etc/nginx/sites-available/{{ app_name }}
        dest: /etc/nginx/sites-enabled/{{ app_name }}
        state: link
      notify: Reload Nginx
    # 👀 8. Configure the systemd service
    - name: Deploy systemd service
      template:
        src: app.service.j2
        dest: /etc/systemd/system/{{ app_name }}.service
      notify:
        - Reload systemd
        - Restart app
    - name: Start app service
      systemd:
        name: "{{ app_name }}"
        state: started
        enabled: yes
        daemon_reload: yes
  handlers:
    # Notified handlers run in the order they are listed here, so systemd is reloaded first
    - name: Reload systemd
      systemd:
        daemon_reload: yes
    - name: Restart app
      systemd:
        name: "{{ app_name }}"
        state: restarted
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded
Example 2: Configuring MySQL Primary/Replica Replication
# 👀 playbooks/mysql-replication.yml
# Assumes: inventory groups dbservers_master (one host) and dbservers_slave, both children
# of dbservers; my.cnf.j2 gives every server a unique server-id and enables the binary log
# on the primary; mysql_root_password and vault_replication_password come from a vault file;
# tasks after "Set root password" also need login credentials (login_user/login_password or ~/.my.cnf)
---
- name: Setup MySQL Replication
  hosts: dbservers
  become: yes
  vars:
    mysql_port: 3306
    replication_user: repl
    replication_password: "{{ vault_replication_password }}"
    mysql_master: "{{ groups['dbservers_master'][0] }}"
  tasks:
    - name: Install MySQL
      apt:
        name:
          - mysql-server
          - python3-mysqldb
        state: present
        update_cache: yes
    - name: Configure MySQL
      template:
        src: my.cnf.j2
        dest: /etc/mysql/my.cnf
      notify: Restart MySQL
    # Apply the new my.cnf (server-id, binlog) before configuring replication
    - meta: flush_handlers
    - name: Set root password
      mysql_user:
        name: root
        host: "{{ item }}"
        password: "{{ mysql_root_password }}"
        check_implicit_admin: yes
      loop:
        - localhost
        - 127.0.0.1
        - "{{ ansible_fqdn }}"
    - name: Create replication user
      mysql_user:
        name: "{{ replication_user }}"
        host: "%"
        password: "{{ replication_password }}"
        priv: "*.*:REPLICATION SLAVE"
        state: present
      when: "'dbservers_master' in group_names"
    - name: Get master status
      mysql_query:
        query: SHOW MASTER STATUS
      register: master_status
      when: "'dbservers_master' in group_names"
    # mysql_query returns rows under query_result (one list per query, one dict per row)
    - name: Configure slave replication
      mysql_query:
        query: >
          CHANGE MASTER TO
          MASTER_HOST='{{ hostvars[mysql_master].ansible_host | default(mysql_master) }}',
          MASTER_USER='{{ replication_user }}',
          MASTER_PASSWORD='{{ replication_password }}',
          MASTER_LOG_FILE='{{ hostvars[mysql_master].master_status.query_result[0][0].File }}',
          MASTER_LOG_POS={{ hostvars[mysql_master].master_status.query_result[0][0].Position }}
      when: "'dbservers_slave' in group_names"
    - name: Start slave
      mysql_query:
        query: START SLAVE
      when: "'dbservers_slave' in group_names"
  handlers:
    - name: Restart MySQL
      service:
        name: mysql
        state: restarted
Example 3: Initializing Multiple Servers
# 👀 playbooks/server-init.yml
---
- name: Initialize New Servers
  hosts: newservers
  become: yes
  vars:
    admin_users:
      - name: admin
        shell: /bin/bash
      - name: deploy
        shell: /bin/bash
    ntp_server: pool.ntp.org
    timezone: Asia/Shanghai
  tasks:
    # 👀 1. Update the system
    - name: Update apt cache and upgrade
      apt:
        upgrade: yes
        update_cache: yes
        autoremove: yes
      when: ansible_os_family == "Debian"
    - name: Update yum packages
      yum:
        name: '*'
        state: latest
      when: ansible_os_family == "RedHat"
    # 👀 2. Install base packages
    - name: Install common packages
      apt:
        name:
          - vim
          - curl
          - wget
          - git
          - htop
          - net-tools
          - unzip
        state: present
      when: ansible_os_family == "Debian"
    # 👀 3. Set the timezone
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"
    # 👀 4. Configure NTP (the template uses ntp_server; assumes the ntp package is installed)
    - name: Configure NTP
      template:
        src: ntp.conf.j2
        dest: /etc/ntp.conf
      when: ansible_os_family == "Debian"
    # 👀 5. Create admin users (the sudo group exists on Debian/Ubuntu; RHEL uses wheel)
    - name: Create admin users
      user:
        name: "{{ item.name }}"
        shell: "{{ item.shell }}"
        groups: sudo
        append: yes
      loop: "{{ admin_users }}"
    # 👀 6. Harden SSH (make sure the admin users can log in with SSH keys first,
    #    otherwise you may lock yourself out)
    - name: Configure SSH daemon
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
        validate: '/usr/sbin/sshd -t -f %s'
      loop:
        - { regexp: '^PermitRootLogin', line: 'PermitRootLogin no' }
        - { regexp: '^PasswordAuthentication', line: 'PasswordAuthentication no' }
      notify: Restart SSH
    # 👀 7. Configure the firewall
    - name: Configure UFW
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 22
        - 80
        - 443
      when: ansible_distribution == "Ubuntu"
    - name: Enable UFW (rules have no effect while UFW is inactive)
      ufw:
        state: enabled
      when: ansible_distribution == "Ubuntu"
  handlers:
    - name: Restart SSH
      service:
        name: sshd
        state: restarted
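Troubleshooting Common Problems
Problem 1: SSH connection failure
Symptom: Ansible reports "UNREACHABLE"
Troubleshooting steps:
# 👀 1. Test the SSH connection manually
ssh -i ~/.ssh/key.pem ubuntu@192.168.1.10
# 👀 2. Check SSH key permissions
ls -la ~/.ssh/
# 👀 3. Test the Ansible connection with maximum verbosity
ansible all -m ping -vvvv
# 👀 4. Check the host inventory
ansible-inventory -i inventory/hosts --list
Solutions:
| Possible cause | Fix |
|---|---|
| Wrong SSH key permissions | chmod 600 ~/.ssh/id_rsa |
| Wrong IP in the inventory | Confirm the IP address is correct |
| SSH port is not 22 | Add ansible_port=2222 |
| Wrong user name | Add ansible_user=ubuntu |
Problem 2: Insufficient privileges
Symptom: Error "FAILED! => {..., 'msg': 'Missing sudo password'}"
Troubleshooting steps:
# 👀 1. Check whether sudo works without a password (exits non-zero if a password is required)
ansible all -m command -a "sudo -n true"
# 👀 2. Check the ansible.cfg settings
grep -A 5 "\[privilege_escalation\]" ansible.cfg
Solutions:
# 👀 Option 1: configure ansible.cfg to prompt for the sudo password
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = True
# 👀 Option 2: prompt for the sudo password on the command line
ansible-playbook site.yml -K
# 👀 Option 3: passwordless sudo (check the file's syntax, and weigh the security trade-off)
echo "ubuntu ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/ubuntu
sudo chmod 440 /etc/sudoers.d/ubuntu
sudo visudo -cf /etc/sudoers.d/ubuntu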
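Problem 3: Playbook run fails
Symptom: A task fails and the cause is unclear
Troubleshooting steps:
# 👀 1. Use verbose mode
ansible-playbook site.yml -vvvv
# 👀 2. Test a single task
ansible-playbook site.yml --tags "install" --start-at-task="Install packages"
# 👀 3. Check the syntax
ansible-playbook site.yml --syntax-check
# 👀 4. List tasks (without running them)
ansible-playbook site.yml --list-tasks
# 👀 5. Dry run in check mode (add --diff to see file changes; modules without
# check-mode support, such as shell and command, are skipped)
ansible-playbook site.yml --check
Problem 4: Tasks run slowly
Symptom: The Playbook takes too long
Tuning:
# 👀 ansible.cfg tuning
[defaults]
# Increase parallelism
forks = 20
# Don't gather facts automatically (only if your plays don't need them)
gathering = explicit
[ssh_connection]
# Enable SSH pipelining (sudoers must not enforce requiretty)
pipelining = True
# Reuse SSH connections (StrictHostKeyChecking=no skips host key checks: less secure)
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
# 👀 Use asynchronous execution
- name: Long running task
  command: /tmp/long_task.sh
  async: 3600          # maximum run time (seconds)
  poll: 0              # don't wait for completion; move on immediately
  register: job
- name: Check job status
  async_status:
    jid: "{{ job.ansible_job_id }}"
  register: result
  until: result.finished
  retries: 100
  delay: 30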
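The until/retries/delay combination is a polling loop. With `retries: 100` and `delay: 30`, the check waits roughly 100 × 30 s = 3000 s (about 50 minutes), which fits inside `async: 3600`. Below is a minimal Python model of that loop. It uses a hypothetical `poll_until` helper with an injectable `sleep`, so it can be tested instantly. The exact number of attempts Ansible makes may differ slightly between ansible-core versions.

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], dict], retries: int, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Mimic `until: result.finished` with retries/delay: call `check`,
    stop as soon as it reports finished, wait `delay` seconds between
    attempts, and give up after `retries` attempts."""
    result: Optional[dict] = None
    for attempt in range(1, retries + 1):
        result = check()
        if result.get("finished"):
            return result
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"still not finished after {retries} attempts: {result}")


if __name__ == "__main__":
    # Fake async_status results: the job finishes on the third poll.
    states = iter([{"finished": 0}, {"finished": 0}, {"finished": 1, "rc": 0}])
    print(poll_until(lambda: next(states), retries=100, delay=30, sleep=lambda s: None))
```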
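Problem 5: Undefined variable
Symptom: An error such as "The task includes an option with an undefined variable. The error was: 'my_var' is undefined"
Troubleshooting steps:
# 👀 1. Dump each host's facts to files under /tmp/facts
ansible all -m setup --tree /tmp/facts
# 👀 2. Inspect all variables of each host
ansible webservers -m debug -a "var=hostvars[inventory_hostname]"
# 👀 3. Add a default value
- name: Set variable with default
  set_fact:
    my_var: "{{ my_var | default('default_value') }}"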
Summary
Ansible Architecture Recap
┌─────────────────────────────────────────────────────────────┐
│                Ansible Automation Operations                │
│                                                             │
│     ┌─────────────┐     ┌─────────────┐                     │
│     │  Inventory  │────▶│  Playbook   │                     │
│     │ (host list) │     │   (YAML)    │                     │
│     └─────────────┘     └──────┬──────┘                     │
│                                │                            │
│                         ┌──────▼──────┐                     │
│                         │   Modules   │                     │
│                         │   (2000+)   │                     │
│                         └──────┬──────┘                     │
│                                │                            │
│                         ┌──────▼──────┐                     │
│                         │ Connection  │                     │
│                         │ (SSH/local) │                     │
│                         └─────────────┘                     │
└─────────────────────────────────────────────────────────────┘
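Key Takeaways
| Concept | Description |
|---|---|
| Inventory | Defines the hosts to manage |
| Playbook | Defines the tasks to run |
| Module | Performs a concrete operation |
| Role | Packages related tasks, variables, and files for reuse |
| Handler | A task triggered by notifications from tasks that made a change |
| Variable | A configurable value |
| Template | A configuration file with variables |
Golden Rules
- Inventory before playbook: confirm the inventory is correct before running tasks
- Use --check mode: do a dry run before touching production
- Use tags wisely: run only the parts you need
- Organize code with roles: reuse complex logic through roles
- Idempotency first: write every task with repeated runs in mind
- Handle errors thoroughly: use block/rescue for exceptions
Quick Command Reference
# 👀 Basics
ansible all -m ping                              # test connectivity
ansible-playbook site.yml                        # run a playbook
ansible-playbook site.yml --check                # dry run
ansible-playbook site.yml -t "install"           # run only the install tag
# 👀 Management
ansible-inventory -i hosts --list                # list hosts
ansible-galaxy role install geerlingguy.nginx    # install a Galaxy role (namespace.role_name)
ansible-vault encrypt secrets.yml                # encrypt a sensitive file
# 👀 Debugging
ansible-playbook site.yml -vvvv                  # verbose output
ansible-playbook site.yml --list-tasks           # list tasks
ansible-playbook site.yml --syntax-check         # check syntax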
This guide is updated continuously. Questions and suggestions are welcome!