Ansible Automation and Operations Guide (continuously updated)


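Preface

What is Ansible?

Imagine you have to manage 100 servers, and every one of them needs the same work: install Nginx, edit a configuration file, start the service.

Logging in to each server one at a time could take a whole day. But with a "remote control", you press one button, all 100 servers start working at the same time, and the job is done in about ten minutes.

Ansible is that "remote control"! It lets you control thousands of servers at once and carry out all kinds of operations tasks automatically.

Automation is an essential skill in modern operations. Ansible is one of the most popular automation tools: it is easy to learn yet powerful, and it is widely used for configuration management, application deployment and day-to-day operations automation.

This guide covers:

  • Ansible's core concepts and how it works
  • How to install and configure Ansible
  • How to write playbooks that automate tasks
  • How to manage inventories and variables
  • How to organize complex work with roles
  • Best practices for production environments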

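Contents

  1. Core Concepts
  2. Ansible Installation and Configuration
  3. Inventory Management
  4. Ad-Hoc Commands
  5. Playbooks
  6. Variables and Templates
  7. Roles
  8. Common Modules
  9. Practical Examples
  10. Troubleshooting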

Core Concepts

1. What is Ansible?


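Imagine you are the conductor of an orchestra. The score is the "playbook" and the instruments are the "servers". You don't play every instrument yourself; you just wave the baton (run a command), and every instrument plays from the score.

Ansible is that "conductor": you describe tasks in simple language, and it carries them out automatically on many servers.

Key features of Ansible:

| Feature | Description |
|---|---|
| Agentless | Nothing extra to install on managed servers: Ansible connects over SSH (most modules do need Python on the managed host) |
| Idempotent | Running the same operation several times leaves the system in the same state |
| YAML syntax | Tasks are written in human-readable YAML |
| Large module library | 2000+ modules covering most operations scenarios |
| Active community | Many ready-made roles available on Ansible Galaxy |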
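Idempotency is easiest to see side by side. The bash sketch below is not Ansible code; it only mimics what a module such as lineinfile does: check the current state first, change it only when needed, and report "changed" or "ok". The helper names ensure_line and append_line are hypothetical.

```shell
#!/usr/bin/env bash
# ensure_line FILE LINE: append LINE to FILE only if that exact line is missing.
# Prints "changed" when the file was modified and "ok" when it already matched,
# mirroring the status Ansible reports for each task.
ensure_line() {
  local file=$1 line=$2
  # -x: match the whole line, -F: treat LINE as a fixed string, -q: no output
  if [[ -f $file ]] && grep -qxF -- "$line" "$file"; then
    echo "ok"
  else
    printf '%s\n' "$line" >> "$file"
    echo "changed"
  fi
}

# Non-idempotent counterpart: every run appends another copy of the line.
append_line() {
  printf '%s\n' "$2" >> "$1"
}
```

Running ensure_line /tmp/hosts "192.168.1.100 app-server" twice prints changed and then ok, and the file holds a single copy of the line; running append_line twice leaves two copies.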

2. How Ansible Works

┌─────────────────────────────────────────────────────────────┐
│                      Ansible workflow                       │
│                                                             │
│  ┌──────────────────┐                                       │
│  │ Manual run       │                                       │
│  │ ansible-playbook │                                       │
│  └────────┬─────────┘                                       │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────────────────────────────────┐            │
│  │               Ansible Engine                │            │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐│            │
│  │  │ Inventory │  │    API    │  │  Modules  ││            │
│  │  └───────────┘  └───────────┘  └───────────┘│            │
│  │  ┌───────────┐  ┌───────────┐               │            │
│  │  │  Plugins  │  │ Playbook  │               │            │
│  │  └───────────┘  └───────────┘               │            │
│  └─────────────────────────────────────────────┘            │
│           │                                                 │
│           │ SSH                                             │
│           ▼                                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Managed nodes                     │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐       │  │
│  │  │ Server 1   │  │ Server 2   │  │ Server N   │       │  │
│  │  │  (web-1)   │  │  (web-2)   │  │  (db-1)    │       │  │
│  │  └────────────┘  └────────────┘  └────────────┘       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

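How the flow works:

| Step | In plain words |
|---|---|
| 1. Read the inventory | Decide which servers to work on (like a roster) |
| 2. Read the playbook | Decide which tasks to run (like a script) |
| 3. Connect to the servers | Open an SSH connection to each server |
| 4. Run the modules | Copy each Ansible module to the server, run it there and clean up afterwards |
| 5. Return the results | Collect the results from every server and show a summary |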

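3. Core concepts cheat sheet

| Concept | In plain words | Analogy |
|---|---|---|
| Inventory | The list of servers | Address book |
| Playbook | A script of tasks | Stage script |
| Module | An executable action | An actor's move |
| Task | A single step | One scene in the script |
| Role | A packaged bundle of tasks | A reusable set of scenes |
| Handler | A task that runs only when notified | A cue that triggers an extra scene |
| Fact | Information about a host | An actor's background file |
| Template | A configuration file with placeholders | Lines with blanks to fill in |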

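4. Ansible vs. other tools

| Aspect | Ansible | Chef | Puppet | SaltStack |
|---|---|---|---|---|
| Architecture | Agentless | Agent | Agent | Agent/Agentless |
| Language | YAML | Ruby | Ruby DSL | YAML |
| Learning curve | | | | |
| Scale | Thousands of hosts | Hundreds | Hundreds | Thousands |
| Execution speed | | | | |
| Community | Very active | Active | Active | Moderate |

How to choose:

  • Small and mid-sized companies, personal projects: choose Ansible; it is simple and easy to use
  • Large enterprises, complex scenarios: consider SaltStack (faster execution)
  • Teams already on a Ruby stack: Chef is an option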

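Ansible Installation and Configuration

Installation methods

Option 1: pip (recommended)

# 👀 Install Ansible with pip (the Python package manager)
pip install ansible

# 👀 Verify the installation
ansible --version

# Example output (varies by version; recent releases print "ansible [core 2.x.y]" on the first line):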
# ansible 2.10.x
#   config file = None
#   configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
#   ansible python module location = /usr/lib/python3.10/site-packages/ansible
#   executable location = /usr/bin/ansible

Option 2: apt (Ubuntu/Debian)

# 👀 Refresh the package index
sudo apt update

# 👀 Install Ansible
sudo apt install ansible

# 👀 Verify
ansible --version

Option 3: yum (CentOS/RHEL)

# 👀 Enable the EPEL repository
sudo yum install epel-release

# 👀 Install Ansible
sudo yum install ansible

# 👀 Verify
ansible --version

Directory layout

# 👀 Recommended Ansible project layout
.
├── ansible.cfg          # Ansible configuration file
├── inventory/           # Inventory directory
│   ├── hosts            # Production hosts
│   └── hosts-dev        # Development hosts
├── playbooks/           # Playbooks
│   ├── site.yml         # Main playbook
│   └── webservers.yml   # Web server playbook
├── roles/               # Roles
│   ├── common/          # Common role
│   └── nginx/           # Nginx role
├── plugins/             # Plugins
├── library/             # Custom modules
├── files/               # Static files
└── templates/           # Template files
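If you want to start from this layout, the bash sketch below creates it in one step. The function name init_ansible_project is hypothetical, and the files it creates are empty placeholders; the roles/common and roles/nginx directories can later be filled with ansible-galaxy role init.

```shell
#!/usr/bin/env bash
# init_ansible_project [DIR]: create the project skeleton shown above under DIR
# (default: current directory). Existing files and directories are left alone.
init_ansible_project() {
  local root=${1:-.}
  # Brace expansion creates every directory in a single mkdir call
  mkdir -p "$root"/{inventory,playbooks,roles/common,roles/nginx,plugins,library,files,templates}
  # Empty placeholder files; touch does not truncate files that already exist
  touch "$root"/ansible.cfg \
        "$root"/inventory/{hosts,hosts-dev} \
        "$root"/playbooks/{site.yml,webservers.yml}
}
```

For example, init_ansible_project ~/ansible-demo followed by tree ~/ansible-demo should show the same tree as above.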

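ansible.cfg explained

# 👀 ansible.cfg: main configuration file

[defaults]
# Location of the inventory
inventory = ./inventory/hosts

# Default user for SSH connections to managed hosts
remote_user = ubuntu

# Private key used for SSH
private_key_file = ~/.ssh/id_rsa

# Skip host key (fingerprint) verification, so the first connection does not prompt.
# Convenient in labs; keep checking enabled (True) in production.
host_key_checking = False

# Number of hosts processed in parallel
forks = 10

# Default verbosity, 0 to 4 (the same as -v ... -vvvv on the command line)
# verbosity = 0

# There is no global "continue on failure" option here;
# use ignore_errors on individual tasks instead

# Log file (must be writable by the user running Ansible)
log_path = /var/log/ansible.log

# Where to look for roles
roles_path = ./roles

[privilege_escalation]
# Privilege escalation (become)
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
# SSH tuning: pipelining cuts down the SSH operations per task,
# ControlPersist keeps connections open for reuse for 60 seconds
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s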

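Configuration precedence (highest to lowest). Ansible uses the first file it finds and ignores the rest; settings are not merged across files:

| Priority | Location | Notes |
|---|---|---|
| 1 | Environment variable | ANSIBLE_CONFIG=xxx.cfg ansible ... |
| 2 | Current directory | ./ansible.cfg |
| 3 | Home directory | ~/.ansible.cfg |
| 4 | System directory | /etc/ansible/ansible.cfg |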
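The lookup order above fits in a few lines of bash. This is a sketch for understanding, not Ansible's own code: the function name find_ansible_cfg is hypothetical, and it skips details such as Ansible ignoring ./ansible.cfg in a world-writable directory or ANSIBLE_CONFIG pointing at a directory. To see the file Ansible actually picked, check the "config file =" line of ansible --version.

```shell
#!/usr/bin/env bash
# find_ansible_cfg: print the first ansible.cfg found in Ansible's search order.
# Simplified: it ignores the world-writable-directory rule and directories
# passed through ANSIBLE_CONFIG.
find_ansible_cfg() {
  local candidate
  for candidate in "${ANSIBLE_CONFIG:-}" "./ansible.cfg" "$HOME/.ansible.cfg" "/etc/ansible/ansible.cfg"; do
    # An unset ANSIBLE_CONFIG yields an empty candidate, which is skipped
    if [[ -n $candidate && -f $candidate ]]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no ansible.cfg found (built-in defaults apply)"
  return 1
}
```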

Inventory Management

What is an inventory?


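Imagine you need to email 100 people: you need an "address book" that records everyone's address. The inventory is Ansible's "address book": it records every server Ansible manages.

Basic inventory

# 👀 inventory/hosts: a basic inventory

# 👀 Form 1: an alias plus an IP address (single host)
web1 ansible_host=192.168.1.10

# 👀 Form 2: IP address and SSH port
web2 ansible_host=192.168.1.11 ansible_port=2222

# 👀 Form 3: a DNS hostname
db1 ansible_host=db.example.com

# 👀 Form 4: SSH user and key file
app1 ansible_host=192.168.1.20 ansible_user=ubuntu ansible_private_key_file=~/.ssh/app_key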

Host groups

# 👀 Host group examples

# 👀 Web server group
[webservers]
web1 ansible_host=192.168.1.10
web2 ansible_host=192.168.1.11
web3 ansible_host=192.168.1.12

# 👀 Database server group
[dbservers]
db1 ansible_host=192.168.1.20
db2 ansible_host=192.168.1.21

# 👀 Parent group that contains all production groups
[production:children]
webservers
dbservers

# 👀 Group-level variables
[webservers:vars]
nginx_version=1.24.0
http_port=80

[dbservers:vars]
mysql_version=8.0
datadir=/data/mysql

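Host ranges and patterns

# 👀 Host range examples (ranges are inclusive and expand the host name itself)

# 👀 Numeric range
[webservers]
web[1:5].example.com
# Expands to web1.example.com, web2.example.com, ..., web5.example.com

# 👀 Ranges work on IP addresses too; a range cannot pair each alias with its
#    own ansible_host, so list hosts individually when they need one
[webservers_by_ip]
192.168.1.[10:14]
# Expands to 192.168.1.10, 192.168.1.11, ..., 192.168.1.14

# 👀 Alphabetic range
[apps]
app[a:c].example.com
# Expands to appa.example.com, appb.example.com, appc.example.com

# 👀 Regular expressions (prefix ~) are host patterns for the command line or a
#    play's hosts: line; they are not valid inside the inventory file
# ansible '~(node|web|database)-\d+\.example\.com' -m ping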
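To check how a range pattern expands, the bash sketch below mimics the expansion for a single [start:end] range: numeric (keeping zero padding such as [01:10]) or a single lowercase letter range. It is an illustration only; expand_host_range is a hypothetical helper, and ansible-inventory --graph shows what Ansible really produces.

```shell
#!/usr/bin/env bash
# expand_host_range PATTERN: print the host names a single [start:end] range
# expands to, one per line. Handles numeric ranges (keeping zero padding,
# e.g. db[01:03]) and single lowercase letter ranges (e.g. app[a:c]).
expand_host_range() {
  local pattern=$1
  local re='^(.*)\[([0-9]+|[a-z]):([0-9]+|[a-z])](.*)$'
  if [[ ! $pattern =~ $re ]]; then
    echo "$pattern"            # no range: the name is used as-is
    return
  fi
  local prefix=${BASH_REMATCH[1]} start=${BASH_REMATCH[2]}
  local end=${BASH_REMATCH[3]} suffix=${BASH_REMATCH[4]} i
  if [[ $start =~ ^[0-9]+$ && $end =~ ^[0-9]+$ ]]; then
    # 10# forces base 10 so "08" is not read as octal; the width of the
    # start value decides the zero padding
    for ((i = 10#$start; i <= 10#$end; i++)); do
      printf '%s%0*d%s\n' "$prefix" "${#start}" "$i" "$suffix"
    done
  else
    # Letter range: locate both letters in the alphabet and walk between them
    local letters=abcdefghijklmnopqrstuvwxyz
    local before_start=${letters%%"$start"*} before_end=${letters%%"$end"*}
    for ((i = ${#before_start}; i <= ${#before_end}; i++)); do
      printf '%s%s%s\n' "$prefix" "${letters:i:1}" "$suffix"
    done
  fi
}
```

For example, expand_host_range 'web[1:5].example.com' prints web1.example.com through web5.example.com, one per line.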

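Dynamic inventory

What is a dynamic inventory?

Imagine your servers receive IP addresses dynamically, and the addresses change every time a machine starts. A dynamic inventory is a "self-updating address book": it pulls the current server list from your cloud provider automatically.

# 👀 Dynamic inventory example (AWS EC2 inventory plugin)

# 👀 Install the amazon.aws collection (the plugin also needs the boto3 and botocore Python packages)
ansible-galaxy collection install amazon.aws

# 👀 aws_ec2 inventory configuration (the file name must end in aws_ec2.yml or aws_ec2.yaml)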
cat > inventory/aws_ec2.yml << 'EOF'
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - cn-north-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags['Role']
    prefix: role
  - key: instance_type
    prefix: type
EOF

# 👀 Use the dynamic inventory
ansible all -i inventory/aws_ec2.yml -m ping

Inventory variables

# 👀 Host-level variables

[webservers]
web1 ansible_host=192.168.1.10 nginx_workers=4
web2 ansible_host=192.168.1.11 nginx_workers=8

# 👀 Verify the inventory
ansible-inventory -i inventory/hosts --list

# Example output (abridged):
# {
#     "webservers": {
#         "hosts": ["web1", "web2"]
#     },
#     "_meta": {
#         "hostvars": {
#             "web1": {"ansible_host": "192.168.1.10", "nginx_workers": 4},
#             "web2": {"ansible_host": "192.168.1.11", "nginx_workers": 8}
#         }
#     }
# }

Ad-Hoc Commands

What is an ad-hoc command?


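Imagine that instead of writing a script every time, you simply call over the radio: "Everyone, turn off the lights." That is an ad-hoc command: a one-off, temporary command.

Use cases:

  • Quick tests
  • One-off operations
  • Simple tasks

Basic syntax

# 👀 Basic syntax
ansible <hosts> -m <module> -a "<module arguments>"

# 👀 Common options
# -i <inventory>  Inventory to use
# -m <module>     Module to run
# -a <args>       Module arguments
# -b              Run with privilege escalation (become)
# -k              Prompt for the SSH password
# -K              Prompt for the sudo (become) password
# -v              Verbose output
# --list-hosts    List matching hosts

Common examples

1. Test connectivity

# 👀 Test connectivity to all servers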
ansible all -m ping

# Example output:
# web1 | SUCCESS => {
#     "ansible_facts": {
#         "discovered_interpreter_python": "/usr/bin/python3"
#     },
#     "changed": false,
#     "ping": "pong"
# }

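# 👀 Only list the matching hosts; nothing is executed
ansible webservers --list-hosts

2. Run shell commands

# 👀 Show uptime
ansible all -m shell -a "uptime"

# 👀 Show memory usage
ansible all -m shell -a "free -h"

# 👀 Show disk usage
ansible all -m shell -a "df -h"

# 👀 Chain several commands (operators such as && need the shell module; the command module does not support them)
ansible all -m shell -a "cd /tmp && ls -la"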

3. Copy files

# 👀 Copy a file to the remote servers
ansible all -m copy -a "src=./file.txt dest=/tmp/file.txt"

# 👀 Copy and set permissions
ansible all -m copy -a "src=./script.sh dest=/tmp/script.sh mode=0755"

# 👀 Copy a directory (src has no trailing slash, so the config directory itself is created inside dest)
ansible all -m copy -a "src=./config dest=/etc/myconfig directory_mode=0755"

4. Manage services

# 👀 Start a service
ansible webservers -m service -a "name=nginx state=started"

# 👀 Stop a service
ansible webservers -m service -a "name=nginx state=stopped"

# 👀 Restart a service
ansible webservers -m service -a "name=nginx state=restarted"

# 👀 Start now and enable at boot
ansible webservers -m service -a "name=nginx state=started enabled=yes"

5. Install packages

# 👀 Ubuntu/Debian
ansible all -m apt -a "name=nginx state=present"        # install
ansible all -m apt -a "name=nginx state=latest"         # upgrade to the latest version
ansible all -m apt -a "name=nginx state=absent"         # remove

# 👀 CentOS/RHEL (yum on older releases, dnf on 8 and later)
ansible all -m yum -a "name=nginx state=present"
ansible all -m dnf -a "name=nginx state=present"

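6. User management

# 👀 Create a user
ansible all -m user -a "name=deployer comment='Deploy User' shell=/bin/bash"

# 👀 Create a user with a password (hashed with SHA-512; the plain-text password stays in your shell history, so only use this for demos)
ansible all -m user -a "name=deployer password={{ 'password123' | password_hash('sha512') }}"

# 👀 Delete a user
ansible all -m user -a "name=deployer state=absent"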
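One way to keep the plain-text password off the ansible command line is to hash it on the control node first. The bash sketch below assumes OpenSSL 1.1.1 or later, whose openssl passwd -6 produces the same SHA-512 crypt format ($6$...) as password_hash('sha512'). The helper name make_sha512_hash and the salt are hypothetical. A fixed salt keeps the hash identical between runs, so the user module does not report "changed" on every run.

```shell
#!/usr/bin/env bash
# make_sha512_hash SALT: read one password line from stdin and print its
# SHA-512 crypt hash ($6$SALT$...). Reading from stdin keeps the password
# out of the process list.
make_sha512_hash() {
  local salt=$1
  openssl passwd -6 -salt "$salt" -stdin
}

# Example: prompt without echo, hash, then pass only the hash to Ansible.
# (Commented out because it needs a terminal and an inventory.)
# read -rs -p "Password: " pw; echo
# hash=$(printf '%s\n' "$pw" | make_sha512_hash "s4ltDeployer")
# ansible all -m user -a "name=deployer password=$hash"
```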

7. Files and permissions

# 👀 Change the owner of a file or directory
ansible all -m file -a "path=/data owner=www-data group=www-data"

# 👀 Create a directory
ansible all -m file -a "path=/data/backups state=directory mode=0755"

# 👀 Create a symbolic link (/var/www -> /data/www)
ansible all -m file -a "src=/data/www dest=/var/www state=link"

8. Gather host information

# 👀 Gather all host facts
ansible all -m setup

# 👀 Memory facts only
ansible all -m setup -a "filter=*memory*"

# 👀 CPU facts only
ansible all -m setup -a "filter=*processor*"

# 👀 Network (IPv4) facts only
ansible all -m setup -a "filter=*ipv4*"

Playbooks

What is a playbook?


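Imagine you are staging Journey to the West, where the scenes must be played in order: the Monkey King is born → havoc in Heaven → the journey to the West → returning with the scriptures. A playbook is that kind of script: it defines the order in which tasks run.

Key characteristics of playbooks:

| Characteristic | Description |
|---|---|
| YAML format | Human-readable |
| Declarative | Each task describes the desired end state, not the steps to get there |
| Idempotent | Repeated runs produce the same result |
| Ordered | Tasks run in the order they are defined |

Playbook structure

# 👀 Basic playbook example: deploy Nginx

# 1. Play header: what this play does and where it runs
- name: Deploy Nginx Web Server          # play name
  hosts: webservers                      # which hosts to run on
  become: yes                            # run with privilege escalation
  vars:                                  # variables
    nginx_version: "1.24.0"
    nginx_port: 80

  # 2. pre_tasks: run before the main tasks
  pre_tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
      when: ansible_os_family == "Debian"

  # 3. Main tasks
  tasks:
    # Task 1: install Nginx
    - name: Install Nginx
      apt:
        name: nginx
        state: present
      notify: Start Nginx                # notify (trigger) a handler

    # Task 2: deploy the configuration file
    - name: Copy Nginx Config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Reload Nginx

    # Task 3: create the web root
    - name: Create web directory
      file:
        path: /var/www/html
        state: directory
        owner: www-data
        group: www-data

  # 4. post_tasks: run after the main tasks and their handlers
  post_tasks:
    - name: Verify Nginx is running
      service:
        name: nginx
        state: started

  # 5. Handlers: run only when notified
  handlers:
    - name: Start Nginx
      service:
        name: nginx
        state: started

    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded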
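Handler behaviour is a common source of confusion. The bash sketch below models the rules as the Ansible documentation describes them: a task notifies a handler only when it reports "changed", repeated notifications collapse into a single run, and notified handlers run after the tasks in the section, in the order they are defined under handlers: rather than the order in which they were notified. All names are hypothetical, it needs bash 4+ for associative arrays, and it is a model, not Ansible's implementation.

```shell
#!/usr/bin/env bash
# Handlers in definition order, as they would appear under "handlers:"
HANDLERS=("Start Nginx" "Reload Nginx")
declare -A NOTIFIED=()

# notify NAME: queue a handler; queuing it again changes nothing
notify() { NOTIFIED["$1"]=1; }

# run_task CHANGED NAME: a finished task notifies NAME only if CHANGED is "yes"
run_task() {
  if [[ $1 == yes ]]; then
    notify "$2"
  fi
}

# flush_handlers: run each queued handler once, in definition order, then clear the queue
flush_handlers() {
  local h
  for h in "${HANDLERS[@]}"; do
    if [[ -n ${NOTIFIED[$h]:-} ]]; then
      echo "RUNNING HANDLER [$h]"
    fi
  done
  NOTIFIED=()
}
```

Notifying Reload Nginx twice and Start Nginx once, then calling flush_handlers, prints RUNNING HANDLER [Start Nginx] followed by RUNNING HANDLER [Reload Nginx]: one run each, in definition order.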

Complete playbook example

# 👀 playbooks/deploy-nginx.yml
---
- name: Deploy Nginx to Web Servers
  hosts: webservers
  remote_user: ubuntu
  become: yes
    
  # 👀 Variables
  vars:
    app_name: myapp
    app_port: 8080
    nginx_port: 80          # port opened by the firewall task
    nginx_workers: 4

  # 👀 Gather environment information
  pre_tasks:
    - name: Gather OS Facts
      setup:
        filter: '*distribution*'
    
    - name: Show OS Info
      debug:
        msg: "Deploying to {{ ansible_distribution }} {{ ansible_distribution_version }}"
    
  # 👀 Main tasks
  tasks:
    
    # 👀 Task 1: install Nginx
    - name: Install Nginx and required packages
      apt:
        name:
          - nginx
          - python3-pip
        state: present
        update_cache: yes
      tags: install
    
    # 👀 Task 2: set the number of Nginx worker processes
    - name: Configure Nginx worker processes
      lineinfile:
        path: /etc/nginx/nginx.conf
        regexp: '^worker_processes'
        line: "worker_processes {{ nginx_workers }};"
      when: ansible_processor_vcpus is defined
      notify: Restart Nginx
      tags: config
    
    # 👀 Task 3: deploy the virtual host template
    - name: Deploy Nginx virtual host config
      template:
        src: vhost.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}.conf
        mode: '0644'
      notify: Restart Nginx
      tags: config
    
    # 👀 Task 4: enable the site
    - name: Enable Nginx site
      file:
        src: /etc/nginx/sites-available/{{ app_name }}.conf
        dest: /etc/nginx/sites-enabled/{{ app_name }}.conf
        state: link
      notify: Restart Nginx
      tags: config
    
    # 👀 Task 5: create the web root
    - name: Create application directory
      file:
        path: /var/www/{{ app_name }}
        state: directory
        owner: www-data
        group: www-data
        mode: '0755'
      tags: files
    
    # 👀 Task 6: deploy the web page
    - name: Deploy index.html
      copy:
        content: |
          <!DOCTYPE html>
          <html>
          <head><title>{{ app_name }}</title></head>
          <body>
          <h1>Welcome to {{ app_name }}</h1>
          <p>Server: {{ ansible_hostname }}</p>
          <p>OS: {{ ansible_distribution }} {{ ansible_distribution_version }}</p>
          </body>
          </html>
        dest: /var/www/{{ app_name }}/index.html
        owner: www-data
        group: www-data
        mode: '0644'
      tags: deploy
    
    # 👀 Task 7: test the Nginx configuration
    - name: Test Nginx configuration
      command: nginx -t
      register: nginx_test
      changed_when: false
      tags: test
    
    - name: Show Nginx test result
      debug:
        msg: "{{ nginx_test.stdout }}"
      tags: test

    # 👀 Task 8: configure the firewall
    - name: Configure UFW firewall
      ufw:
        rule: allow
        port: "{{ nginx_port }}"
        proto: tcp
      when: ansible_distribution == "Ubuntu"
      tags: firewall

  # 👀 Handlers
  handlers:
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
    
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded

Conditionals

# 👀 when examples

- name: Install Apache on Debian/Ubuntu
  apt:
    name: apache2
    state: present
  when: ansible_os_family == "Debian"

- name: Install Apache on RHEL/CentOS
  yum:
    name: httpd
    state: present
  when: ansible_os_family == "RedHat"

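# 👀 Multiple conditions (a list means every condition must be true)
- name: Install for specific version
  apt:
    name: nginx=1.24.0*      # the version must exist in the configured repositories; * matches the distro suffix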
    state: present
  when:
    - ansible_os_family == "Debian"
    - ansible_distribution_version is version('22.04', '>=')
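The version test compares version strings numerically rather than as plain text, so 9.10 counts as older than 10.0. Outside Ansible the same check can be sketched in bash with GNU sort -V (version sort); the helper name version_ge is hypothetical, and Ansible's own version test supports more comparison styles than this.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A >= version B.
# sort -V (GNU coreutils) orders version strings component by component,
# so if B sorts first (or A equals B), A is at least B.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}
```

For example, version_ge "$(lsb_release -rs)" 22.04 && echo "22.04 or newer" mirrors the second condition above.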

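Loops

# 👀 loop examples

# 👀 Install several packages (apt also accepts a list in name:, which installs them in a single transaction)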
- name: Install multiple packages
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - mysql-server
    - php-fpm

# 👀 Create several users
- name: Create multiple users
  user:
    name: "{{ item.name }}"
    shell: "{{ item.shell | default('/bin/bash') }}"
    groups: "{{ item.groups | default(omit) }}"   # omit: leave supplementary groups untouched when none are given
  loop:
    - { name: 'deployer', groups: 'www-data' }
    - { name: 'developer', shell: '/bin/zsh' }
    - { name: 'monitor', groups: 'monitor' }

# 👀 with_items (older syntax; loop is the recommended form today)
- name: Create directories
  file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  with_items:
    - /data/backups
    - /data/logs
    - /data/uploads

# 👀 with_dict: loop over a dictionary (item.key / item.value)
- name: Configure application settings
  lineinfile:
    path: /etc/app/config.conf
    regexp: "^{{ item.key }}"
    line: "{{ item.key }} = {{ item.value }}"
  with_dict:
    app_name: myapp
    max_connections: 1000
    timeout: 30

Error handling

# 👀 ignore_errors: keep going even if this task fails

- name: Try to backup, continue even if failed
  shell: backup.sh
  register: backup_result
  ignore_errors: yes

# 👀 failed_when: define your own failure condition

- name: Check disk space
  shell: df -h | grep /dev/sda1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
  changed_when: false                      # read-only check: never report "changed"
  failed_when: disk_usage.stdout | int > 90

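# 👀 force_handlers: run handlers that were already notified even if a later task fails

- name: Deploy application
  hosts: webservers
  force_handlers: yes
  tasks:
    - name: Install app
      apt:
        name: myapp
        state: present
      notify: Restart myapp            # queued only if this task reports "changed"

    - name: Fail intentionally
      shell: exit 1

    - name: This task will not execute
      debug:
        msg: "This will not run"

  # Without force_handlers: yes, the failed host would skip this handler
  handlers:
    - name: Restart myapp
      service:
        name: myapp
        state: restarted

# 👀 block/rescue: catch errors (rescue runs only if a task in the block fails)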

- name: Deploy with rollback
  block:
    - name: Backup current version
      shell: backup.sh
    
    - name: Deploy new version
      shell: deploy.sh
    
    - name: Verify deployment
      shell: verify.sh
  rescue:
    - name: Rollback on failure
      shell: rollback.sh
    - name: Notify failure
      debug:
        msg: "Deployment failed, rolled back"
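In shell terms, block/rescue behaves roughly like "run these steps, stop at the first failure, and only then run the recovery steps". The bash sketch below models that control flow; backup_step, deploy_step, verify_step and rollback_step are hypothetical stand-ins for backup.sh, deploy.sh, verify.sh and rollback.sh, and FAIL_DEPLOY simulates a failed deployment. As in Ansible, a rescue that succeeds counts as recovered, so the function still returns success.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for backup.sh, deploy.sh, verify.sh and rollback.sh
backup_step()   { echo "backup"; }
deploy_step()   { echo "deploy"; [[ ${FAIL_DEPLOY:-no} != yes ]]; }   # fails when FAIL_DEPLOY=yes
verify_step()   { echo "verify"; }
rollback_step() { echo "rollback"; }

deploy_with_rollback() {
  # "block": run the steps in order and stop at the first failure
  if backup_step && deploy_step && verify_step; then
    return 0
  fi
  # "rescue": runs only when a step above failed; if it succeeds, the
  # failure counts as handled (Ansible reports the host as "rescued")
  rollback_step || return 1
  echo "Deployment failed, rolled back"
}
```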

Task delegation

# 👀 delegate_to: run a task on a different host

# 👀 Render the configuration on the control node, then push it to the remote hosts
- name: Generate configuration locally
  template:
    src: app.conf.j2
    dest: /tmp/app.conf
  delegate_to: localhost
  changed_when: true

- name: Deploy generated config
  copy:
    src: /tmp/app.conf
    dest: /etc/app.conf

# 👀 Run a task on one specific server (here the load balancer lb01)
- name: Reload load balancer
  service:
    name: haproxy
    state: reloaded
  delegate_to: lb01

# 👀 Run only once, on the control node (run_once)
- name: Notify monitoring system once
  debug:
    msg: "Deployment completed"
  delegate_to: localhost
  run_once: true

Tags

# 👀 Tagging tasks

tasks:
  - name: Install packages
    apt:
      name: "{{ packages }}"
    tags: [install, packages]
  
  - name: Configure application
    template:
      src: app.conf.j2
      dest: /etc/app.conf
    tags: [config, template]

  - name: Start services
    service:
      name: "{{ service_name }}"
      state: started
    tags: [service, start]

# 👀 Run by tag
ansible-playbook site.yml --tags "install,config"    # run only tasks tagged install or config
ansible-playbook site.yml --skip-tags "service"      # skip tasks tagged service
ansible-playbook site.yml --list-tags                # list all tags

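Variables and Templates

Variable basics

What is a variable?

Imagine writing many letters with the same content but different recipients. A variable works like a placeholder: you write one template letter, and the recipient's name is filled in automatically.

# 👀 Defining variables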

vars:
  app_name: myapp
  app_version: "1.0.0"
  app_port: 8080
  enabled: true

# 👀 Using variables in a task
- name: Print app info
  debug:
    msg: "{{ app_name }} v{{ app_version }} is running on port {{ app_port }}"

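Where variables come from

# 👀 1. Defined in the playbook
vars:
  http_port: 80

# 👀 2. Defined in the inventory
# [webservers]
# web1 http_port=80

# 👀 3. Passed on the command line (extra vars take precedence over every other source)
# ansible-playbook site.yml --extra-vars "http_port=8080"

# 👀 4. Loaded from a file
- name: Load variables from file
  include_vars:
    file: vars/app.yml

# 👀 5. Facts gathered from the host
# ansible_hostname, ansible_default_ipv4.address, ansible_os_family, etc.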

Jinja2 templates

{# 👀 Template file: config.conf.j2 (Jinja2 comments like this one are not written to the output) #}

{# 👀 Plain variables #}
app_name = {{ app_name }}
app_version = {{ app_version }}
port = {{ port }}

{# 👀 Conditionals #}
{% if enable_cache %}
cache_enabled = true
cache_size = {{ cache_size }}
{% else %}
cache_enabled = false
{% endif %}

{# 👀 Loops #}
{% for user in allowed_users %}
allow_user = {{ user }}
{% endfor %}

{# 👀 Filters #}
version = {{ app_version | upper }}
{# ansible_date_time changes on every run, so the template task will always report "changed" #}
date = {{ ansible_date_time.iso8601 }}

{# 👀 Default values #}
log_level = {{ log_level | default('INFO') }}

{# 👀 Arithmetic #}
max_connections = {{ workers * 50 }}

Splitting variables into files

# 👀 vars/app.yml
---
app_name: myapp
app_version: "2.0.0"
app_port: 8080
database:
  host: localhost
  port: 3306
  name: myapp_db

# 👀 vars/secrets.yml (sensitive values; encrypt it with: ansible-vault encrypt vars/secrets.yml)
---
db_password: "your_secure_password"
api_key: "sk-xxxxx"

# 👀 Referencing both files in a playbook (add --ask-vault-pass once secrets.yml is encrypted)
- name: Deploy application
  hosts: webservers
  vars_files:
    - vars/app.yml
    - vars/secrets.yml

  tasks:
    - name: Display app info
      debug:
        msg: "Deploying {{ app_name }} v{{ app_version }}"

Roles

What is a role?


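Imagine you are renovating many apartments, and every one needs an electrician, a carpenter and a painter. Hiring workers separately for each apartment is tedious.

A role is like a "renovation package": it bundles related tasks (configure Nginx, install PHP, set up the firewall) into a reusable unit.

Role directory layout:

# 👀 roles/nginx layout
roles/nginx/
├── defaults/              # Default variables (lowest precedence)
│   └── main.yml
├── files/                 # Static files
│   ├── nginx.conf
│   └── mime.types
├── handlers/              # Handlers
│   └── main.yml
├── meta/                  # Role metadata, including dependencies
│   └── main.yml
├── tasks/                 # Tasks
│   └── main.yml
├── templates/             # Templates
│   └── vhost.conf.j2
├── tests/                 # Tests
│   ├── inventory
│   └── test.yml
└── vars/                  # Role variables (high precedence)
    └── main.yml

Creating a role

# 👀 Create a role skeleton with ansible-galaxy (generates roles/nginx)
ansible-galaxy role init --init-path roles nginx

# 👀 Inspect the generated layout
tree roles/nginx/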

Role example: an Nginx role

# 👀 roles/nginx/defaults/main.yml
---
nginx_port: 80
nginx_server_name: localhost
nginx_workers: 4
nginx_keepalive_timeout: 65
app_root: /var/www/html

# 👀 roles/nginx/tasks/main.yml
---
- name: Install Nginx
  apt:
    name: nginx
    state: present
    update_cache: yes
  notify: Start Nginx

- name: Configure Nginx workers
  lineinfile:
    path: /etc/nginx/nginx.conf
    regexp: '^worker_processes'
    line: "worker_processes {{ nginx_workers }};"
  notify: Reload Nginx

- name: Deploy virtual host config
  template:
    src: vhost.conf.j2
    dest: /etc/nginx/sites-available/{{ nginx_server_name }}.conf
  notify: Reload Nginx

- name: Enable site
  file:
    src: /etc/nginx/sites-available/{{ nginx_server_name }}.conf
    dest: /etc/nginx/sites-enabled/{{ nginx_server_name }}.conf
    state: link
  notify: Reload Nginx

- name: Create document root
  file:
    path: "{{ app_root }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'

# 👀 roles/nginx/handlers/main.yml
---
- name: Start Nginx
  service:
    name: nginx
    state: started
    enabled: yes

- name: Restart Nginx
  service:
    name: nginx
    state: restarted

- name: Reload Nginx
  service:
    name: nginx
    state: reloaded

# 👀 roles/nginx/templates/vhost.conf.j2
server {
    listen {{ nginx_port }};
    server_name {{ nginx_server_name }};
    
    root {{ app_root }};
    index index.html;
    
    location / {
        try_files $uri $uri/ =404;
    }
    
    client_max_body_size 10M;
    
    keepalive_timeout {{ nginx_keepalive_timeout }};
}

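Using roles

# 👀 site.yml: using roles

---
- name: Deploy Web Infrastructure
  hosts: webservers
  become: yes

  # 👀 Apply roles (they run in the order listed)
  roles:
    - role: nginx
      when: "'webservers' in group_names"   # group_names lists the groups this host belongs to

    - role: php-fpm
      vars:
        php_version: "8.1"

    - role: firewall
      tags: firewall

# 👀 Short form: a second play that applies several roles by name
- name: Deploy the full stack
  hosts: all
  become: yes
  roles:
    - common
    - nginx
    - mysql
    - app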

Role dependencies

# 👀 roles/nginx/meta/main.yml

# 👀 Roles this role depends on (they run before it)
dependencies:
  - role: common
    vars:
      timezone: Asia/Shanghai

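Common Modules

Module cheat sheet

| Module | Purpose | Example |
|---|---|---|
| apt/yum | Install packages | apt: name=nginx state=present |
| service | Manage services | service: name=nginx state=started |
| shell | Run shell commands | shell: uptime >> /tmp/uptime.log |
| copy | Copy files | copy: src=file.txt dest=/tmp/ |
| template | Render a template to a file | template: src=conf.j2 dest=/etc/conf |
| file | Manage files and directories | file: path=/tmp state=directory |
| user | Manage users | user: name=deployer shell=/bin/bash |
| group | Manage groups | group: name=www state=present |
| lineinfile | Edit lines in a file | lineinfile: path=file line="text" |
| command | Run a command (no shell features) | command: /usr/bin/foo creates=/tmp/bar |
| cron | Scheduled jobs | cron: name="backup" minute=0 job="/backup.sh" |
| debug | Print debug output | debug: msg="{{ variable }}" |
| wait_for | Wait for a condition | wait_for: port=3306 state=started |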

Core modules in detail

1. apt module (Debian/Ubuntu)

# 👀 Install a package
- name: Install Nginx
  apt:
    name: nginx
    state: present

# 👀 Install several packages
- name: Install LAMP stack
  apt:
    name:
      - nginx
      - mysql-server
      - php-fpm
    state: present
    update_cache: yes

# 👀 Install a specific version (it must exist in the configured repositories; * matches the distro suffix)
- name: Install specific version
  apt:
    name: nginx=1.24.0*
    state: present

# 👀 Remove a package
- name: Remove Apache
  apt:
    name: apache2
    state: absent

2. service module

# 👀 Start a service (and enable it at boot)
- name: Start Nginx
  service:
    name: nginx
    state: started
    enabled: yes

# 👀 Restart a service
- name: Restart MySQL
  service:
    name: mysql
    state: restarted

# 👀 Reload the configuration
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded

3. copy module

# 👀 Copy a file
- name: Copy config file
  copy:
    src: myapp.conf
    dest: /etc/myapp.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes          # keep a backup of the original file

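# 👀 Copy a directory (src ends with /, so only its contents are copied)
- name: Copy directory
  copy:
    src: /local/configs/
    dest: /etc/myapp/
    owner: root
    group: root
    mode: '0644'             # mode of the copied files
    directory_mode: '0755'   # mode of directories created during the copy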

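4. template module

# 👀 Render a template; validate checks the rendered temporary file (%s) before it replaces dest
- name: Deploy Nginx config from template
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: '/usr/sbin/nginx -t -c %s'  # refuse to install a config Nginx cannot parse
  notify: Reload Nginx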

5. lineinfile module

# 👀 Make sure a line is present
- name: Set timezone
  lineinfile:
    path: /etc/timezone
    line: Asia/Shanghai
    state: present

# 👀 Replace the line that matches regexp
- name: Configure max connections
  lineinfile:
    path: /etc/nginx/nginx.conf
    regexp: '^worker_connections'
    line: "worker_connections {{ max_connections }};"

# 👀 Delete lines that match regexp
- name: Remove debug line
  lineinfile:
    path: /etc/app.conf
    regexp: '^debug'
    state: absent

# 👀 Append a line (added at the end of the file if it is missing)
- name: Add line to file
  lineinfile:
    path: /etc/hosts
    line: "192.168.1.100 app-server"
    state: present

6. user module

# 👀 Create a user
- name: Create deploy user
  user:
    name: deployer
    comment: "Deployment User"
    shell: /bin/bash
    groups: sudo
    append: yes

# 👀 Create a system account (no login shell, no home directory)
- name: Create service account
  user:
    name: myapp
    system: yes
    shell: /usr/sbin/nologin
    create_home: no

# 👀 Set a password
- name: Set user password
  user:
    name: deployer
    password: "{{ 'secret123' | password_hash('sha512') }}"

# 👀 Delete a user
- name: Remove user
  user:
    name: olduser
    state: absent
    remove: yes          # also delete the home directory

Practical Examples

Case 1: Deploy a web application to many servers

# 👀 playbooks/deploy-app.yml
---
- name: Deploy Web Application
  hosts: webservers
  become: yes
  vars_files:
    - vars/app.yml          # must define app_name, app_user, app_dir and local_app_dir

  tasks:
    # 👀 1. Check the environment
    - name: Check Python availability
      shell: python3 --version
      register: python_version
      changed_when: false

    # 👀 2. Create the application user
    - name: Create app user
      user:
        name: "{{ app_user }}"
        shell: /bin/bash
        home: "{{ app_dir }}"
        create_home: yes

    # 👀 3. Create the directory layout
    - name: Create app directories
      file:
        path: "{{ item }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0755'
      loop:
        - "{{ app_dir }}"
        - "{{ app_dir }}/logs"
        - "{{ app_dir }}/tmp"

    # 👀 4. Install system dependencies
    - name: Install dependencies
      apt:
        name:
          - python3
          - python3-pip
          - python3-venv
          - nginx
        state: present
        update_cache: yes

    # 👀 5. Deploy the application code (synchronize needs rsync on both sides)
    - name: Deploy application code
      synchronize:
        src: "{{ local_app_dir }}/"
        dest: "{{ app_dir }}/current"
        delete: yes
        rsync_opts:
          - "--exclude=.git"
          - "--exclude=venv"
      become_user: "{{ app_user }}"

    # 👀 6. Install Python dependencies
    - name: Install Python dependencies
      pip:
        requirements: "{{ app_dir }}/current/requirements.txt"
        virtualenv: "{{ app_dir }}/venv"
        virtualenv_command: python3 -m venv
      become_user: "{{ app_user }}"

    # 👀 7. Configure Nginx
    - name: Configure Nginx
      template:
        src: nginx-app.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}
      notify: Reload Nginx

    - name: Enable Nginx site
      file:
        src: /etc/nginx/sites-available/{{ app_name }}
        dest: /etc/nginx/sites-enabled/{{ app_name }}
        state: link
      notify: Reload Nginx

    # 👀 8. Configure the systemd service
    - name: Deploy systemd service
      template:
        src: app.service.j2
        dest: /etc/systemd/system/{{ app_name }}.service
      notify:
        - Reload systemd
        - Restart app

    - name: Start app service
      systemd:
        name: "{{ app_name }}"
        state: started
        enabled: yes
        daemon_reload: yes

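  # 👀 Handlers run in the order they are defined here, not the order in which
  #    they were notified, so systemd is reloaded before the app is restarted
  handlers:
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded

    - name: Reload systemd
      systemd:
        daemon_reload: yes

    - name: Restart app
      systemd:
        name: "{{ app_name }}"
        state: restarted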

Case 2: MySQL master/replica replication

# 👀 playbooks/mysql-replication.yml
---
- name: Setup MySQL Replication
  hosts: dbservers
  become: yes
  vars:
    mysql_port: 3306
    replication_user: repl
    replication_password: "{{ vault_replication_password }}"

  tasks:
    - name: Install MySQL
      apt:
        name:
          - mysql-server
          - python3-mysqldb
        state: present
        update_cache: yes

    - name: Configure MySQL
      template:
        src: my.cnf.j2
        dest: /etc/mysql/my.cnf
      notify: Restart MySQL

    # 👀 Once root has a password, later mysql_* tasks need credentials,
    #    e.g. login_user/login_password or a /root/.my.cnf file
    - name: Set root password
      mysql_user:
        name: root
        host: "{{ item }}"
        password: "{{ mysql_root_password }}"
        check_implicit_admin: yes      # first try logging in as root without a password
      loop:
        - localhost
        - 127.0.0.1
        - "{{ ansible_fqdn }}"

    # 👀 The master and the replicas are told apart by inventory groups
    #    (dbservers_master and dbservers_slave, both children of dbservers)
    - name: Create replication user
      mysql_user:
        name: "{{ replication_user }}"
        host: "%"
        password: "{{ replication_password }}"
        priv: "*.*:REPLICATION SLAVE"
        state: present
      when: "'dbservers_master' in group_names"

    - name: Get master status
      mysql_query:
        query: SHOW MASTER STATUS
      register: master_status
      when: "'dbservers_master' in group_names"

    # 👀 mysql_query returns rows in query_result: one list per query, one dict per row
    - name: Configure slave replication
      mysql_query:
        query: >
          CHANGE MASTER TO
          MASTER_HOST='{{ groups['dbservers_master'][0] }}',
          MASTER_USER='{{ replication_user }}',
          MASTER_PASSWORD='{{ replication_password }}',
          MASTER_LOG_FILE='{{ hostvars[groups['dbservers_master'][0]].master_status.query_result[0][0].File }}',
          MASTER_LOG_POS={{ hostvars[groups['dbservers_master'][0]].master_status.query_result[0][0].Position }};
      when: "'dbservers_slave' in group_names"

    - name: Start slave
      mysql_query:
        query: START SLAVE
      when: "'dbservers_slave' in group_names"

  handlers:
    - name: Restart MySQL
      service:
        name: mysql
        state: restarted

Case 3: Initialize multiple servers

# 👀 playbooks/server-init.yml
---
- name: Initialize New Servers
  hosts: newservers
  become: yes
  vars:
    admin_users:
      - name: admin
        shell: /bin/bash
      - name: deploy
        shell: /bin/bash
    ntp_server: pool.ntp.org
    timezone: Asia/Shanghai

  tasks:
    # 👀 1. Update the system
    - name: Update apt cache and upgrade
      apt:
        upgrade: yes
        update_cache: yes
        autoremove: yes
      when: ansible_os_family == "Debian"

    - name: Update yum packages
      yum:
        name: '*'
        state: latest
      when: ansible_os_family == "RedHat"

    # 👀 2. Install base packages
    - name: Install common packages
      apt:
        name:
          - vim
          - curl
          - wget
          - git
          - htop
          - net-tools
          - unzip
        state: present
      when: ansible_os_family == "Debian"

    # 👀 3. Set the timezone
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    # 👀 4. Configure NTP
    - name: Install and configure NTP
      template:
        src: ntp.conf.j2
        dest: /etc/ntp.conf
      when: ansible_os_family == "Debian"

    # 👀 5. Create admin users
    - name: Create admin users
      user:
        name: "{{ item.name }}"
        shell: "{{ item.shell }}"
        groups: sudo
        append: yes
      loop: "{{ admin_users }}"

    # 👀 6. Harden SSH (make sure key-based login works before disabling passwords)
    - name: Configure SSH daemon
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
        validate: '/usr/sbin/sshd -t -f %s'   # refuse to write a config sshd cannot parse
      loop:
        - { regexp: '^PermitRootLogin', line: 'PermitRootLogin no' }
        - { regexp: '^PasswordAuthentication', line: 'PasswordAuthentication no' }
      notify: Restart SSH

    # 👀 7. Configure the firewall (adds allow rules only; enabling UFW is a separate step)
    - name: Configure UFW
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 22
        - 80
        - 443
      when: ansible_distribution == "Ubuntu"

  handlers:
    - name: Restart SSH
      service:
        name: sshd
        state: restarted

常见问题排查

问题 1:SSH 连接失败

现象: 执行 Ansible 时报错 “UNREACHABLE”

排查步骤:

# 👀 1. 手动测试 SSH 连接
ssh -i ~/.ssh/key.pem ubuntu@192.168.1.10

# 2. Check the SSH key permissions
ls -la ~/.ssh/

# 3. Test the Ansible connection with maximum verbosity
ansible all -m ping -vvvv

# 4. Check the inventory
ansible-inventory -i inventory/hosts --list

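Solutions:

| Possible cause | Fix |
|---|---|
| SSH private key permissions too open | chmod 600 ~/.ssh/id_rsa (or whichever key you use) |
| Wrong IP address in the inventory | Verify the host address |
| SSH listens on a port other than 22 | Set ansible_port=2222 (use the actual port) |
| Wrong remote user name | Set ansible_user=ubuntu (use the actual user) |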
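You can also record these fixes per host in the inventory, so every ad-hoc command and playbook picks them up. The following is a minimal sketch in YAML inventory format. The host alias web1 and the file path are hypothetical. The address, user, key path and port 2222 are taken from the examples above.

```yaml
# inventory/hosts.yml (hypothetical path)
all:
  children:
    webservers:
      hosts:
        web1:
          ansible_host: 192.168.1.10                    # real address of the host
          ansible_port: 2222                            # SSH port, if it is not 22
          ansible_user: ubuntu                          # remote login user
          ansible_ssh_private_key_file: ~/.ssh/key.pem  # private key used to log in
```

Verify the result with: ansible -i inventory/hosts.yml webservers -m ping -vvvv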

Problem 2: Insufficient privileges

Symptom: the task fails with FAILED! => {…, "msg": "Missing sudo password"}

Diagnosis:

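# 1. Check the remote user's sudo rights (-n makes sudo fail instead of waiting for a password)
ansible all -m command -a "sudo -n -l"
# Test privilege escalation directly; prompts for the sudo password and should print "root"
ansible all -b -K -m command -a "whoami"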

# 2. Check the [privilege_escalation] section of ansible.cfg
grep -A 5 "\[privilege_escalation\]" ansible.cfg

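Solutions:

# Option 1: enable become in ansible.cfg and prompt for the sudo password
[privilege_escalation]
become = True
become_method = sudo
become_user = root
# set to False only when sudo is passwordless (Option 3)
become_ask_pass = True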

# Option 2: pass -K (--ask-become-pass) on the command line
ansible-playbook site.yml -K

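# Option 3: passwordless sudo (run on each managed host)
echo "ubuntu ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/ubuntu
sudo chmod 440 /etc/sudoers.d/ubuntu
sudo visudo -cf /etc/sudoers.d/ubuntu    # validate the syntax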
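Option 3 can also be rolled out with Ansible itself instead of logging in to each host. Below is a sketch assuming the remote user is ubuntu, as above. The play name is illustrative. Run it once with -K, because sudo still needs a password at that point. The validate parameter makes Ansible check the file with visudo before putting it in place.

```yaml
# Hypothetical play: grant passwordless sudo to the "ubuntu" user.
# Run once with: ansible-playbook sudo.yml -K
- name: Configure passwordless sudo
  hosts: all
  become: true
  tasks:
    - name: Install a sudoers drop-in file
      ansible.builtin.copy:
        content: "ubuntu ALL=(ALL) NOPASSWD: ALL\n"
        dest: /etc/sudoers.d/ubuntu
        owner: root
        group: root
        mode: "0440"
        validate: /usr/sbin/visudo -cf %s   # refuse to install a broken sudoers file
```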

Problem 3: Playbook fails

Symptom: a task fails, but the cause is not obvious.

Diagnosis:

# 1. Run with maximum verbosity
ansible-playbook site.yml -vvvv

# 2. Run only tasks tagged "install", starting from a specific task
ansible-playbook site.yml --tags "install" --start-at-task="Install packages"

# 3. Check the syntax
ansible-playbook site.yml --syntax-check

# 4. List the tasks without running them
ansible-playbook site.yml --list-tasks

# 5. Dry run in check mode; --diff also shows what would change inside files
ansible-playbook site.yml --check --diff
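If verbose output is still not enough, capture the failing task's full result and print it. Below is a sketch using register and debug. It assumes a webservers group and uses a hypothetical nginx -t syntax check as the failing command.

```yaml
# Hypothetical play: capture a command's full result to see why it fails
- name: Inspect a failing command
  hosts: webservers
  become: true
  tasks:
    - name: Test the nginx configuration
      ansible.builtin.command: nginx -t
      register: nginx_test
      ignore_errors: true        # keep going so the result can be printed
      changed_when: false        # a syntax check never changes anything

    - name: Show rc, stdout and stderr
      ansible.builtin.debug:
        var: nginx_test
```

Another option is the debugger keyword. Setting debugger: on_failed on a play or task opens Ansible's interactive task debugger when a task fails.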

Problem 4: Tasks run slowly

Symptom: the playbook takes too long to finish.

Optimizations:

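# ansible.cfg tuning

[defaults]
# Run on more hosts in parallel (default: 5)
forks = 20

# Skip fact gathering unless a play sets gather_facts: true
gathering = explicit

[ssh_connection]
# Pipelining: fewer SSH operations per task (requires "requiretty" to be disabled in sudoers)
pipelining = True
# Reuse SSH connections; StrictHostKeyChecking=no skips host key verification (convenient but insecure)
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no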

# Run long tasks asynchronously (tasks inside a playbook)
- name: Long running task
  command: /tmp/long_task.sh
  async: 3600      # 最大执行时间(秒)
  poll: 0          # 不等待完成
  register: job

- name: Check job status
  async_status:
    jid: "{{ job.ansible_job_id }}"
  register: result
  until: result.finished
  retries: 100
  delay: 30
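Two play-level keywords also help with speed. gather_facts: false skips fact gathering for a single play. strategy: free lets each host move through its tasks without waiting for the slowest host. Below is a minimal sketch with a hypothetical play, assuming a webservers group. The free strategy only makes sense when tasks on different hosts do not depend on each other.

```yaml
# Hypothetical play: speed-related play keywords
- name: Fast check across web servers
  hosts: webservers
  gather_facts: false      # no task below needs facts
  strategy: free           # hosts do not wait for each other between tasks
  tasks:
    - name: Read uptime
      ansible.builtin.command: uptime
      changed_when: false  # read-only command
```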

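Problem 5: Undefined variable

Symptom: the run fails with an error containing "'my_var' is undefined" (my_var stands for the missing variable).

Diagnosis: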

# 1. Dump the gathered facts of every host into /tmp/facts (one file per host)
ansible all -m setup --tree /tmp/facts

# 2. Inspect a specific variable on each host (my_var is a placeholder)
ansible webservers -m debug -a "var=my_var"

# 3. Fall back to a default value
- name: Set variable with default
  set_fact:
    my_var: "{{ my_var | default('default_value') }}"
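Sometimes a default value is not appropriate. In that case it can help to fail early with a clear message. Below is a sketch using the assert module. The variable app_port and the group_vars path are hypothetical.

```yaml
# Hypothetical task: stop with a clear message if a required variable is missing
- name: Check required variables
  ansible.builtin.assert:
    that:
      - app_port is defined        # checked first, so an undefined variable fails here
      - app_port | int > 0
    fail_msg: "app_port must be set, e.g. in group_vars/webservers.yml"
    quiet: true                    # print only on failure
```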

Summary

Ansible architecture recap

┌─────────────────────────────────────────────────────────────┐
│                 Ansible automation workflow                 │
│                                                             │
│  ┌─────────────┐     ┌─────────────┐                        │
│  │  Inventory  │────▶│  Playbook   │                        │
│  │   (hosts)   │     │   (YAML)    │                        │
│  └─────────────┘     └──────┬──────┘                        │
│                             │                               │
│                      ┌──────▼──────┐                        │
│                      │   Modules   │                        │
│                      │   (2000+)   │                        │
│                      └──────┬──────┘                        │
│                             │                               │
│                      ┌──────▼──────┐                        │
│                      │ Connection  │                        │
│                      │ (SSH/local) │                        │
│                      └─────────────┘                        │
└─────────────────────────────────────────────────────────────┘

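Key concepts

| Concept | Description |
|---|---|
| Inventory | Defines the hosts to manage |
| Playbook | Defines the tasks to run on them |
| Module | Performs one specific action, e.g. installing a package or copying a file |
| Role | Reusable bundle of tasks, handlers, variables and templates for complex jobs |
| Handler | Task that runs only when notified by a task that made a change |
| Variable | Configurable value |
| Template | Jinja2 configuration file rendered with variables |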
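To see how these pieces fit together, here is a sketch of a single play that uses most of them. Roles are left out. The webservers group, the site.conf.j2 template and the site_domain variable are all assumptions made for illustration.

```yaml
# Hypothetical site.yml; assumes a "webservers" inventory group
# and a site.conf.j2 template next to the playbook
- name: Core concepts in one play
  hosts: webservers                  # Inventory: which hosts to manage
  become: true
  vars:
    site_domain: example.com         # Variable: consumed by the template
  tasks:
    - name: Install nginx            # Module: ansible.builtin.package does the work
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Render the site config   # Template: Jinja2 file filled with variables
      ansible.builtin.template:
        src: site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
        mode: "0644"
      notify: Reload nginx           # Handler: runs only if this task changed the file

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```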

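Golden rules

  1. Inventory before playbook - confirm the inventory is correct before running any tasks
  2. Use --check mode - dry-run before touching production (tasks that do not support check mode are skipped)
  3. Use tags deliberately - run only the tasks you need
  4. Organize code with roles - package complex tasks as reusable roles
  5. Idempotency first - write tasks so that running them repeatedly yields the same result
  6. Handle errors properly - use block/rescue to deal with failures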
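Rule 6 can be illustrated with a sketch of block/rescue/always. The sketch rolls back an nginx configuration change if validation fails. The playbook assumes a webservers group and a hypothetical nginx.conf.j2 template.

```yaml
# Hypothetical play: roll back an nginx config change if validation fails
- name: Deploy nginx config with rollback
  hosts: webservers
  become: true
  tasks:
    - name: Update the config safely
      block:
        - name: Back up the current config
          ansible.builtin.copy:
            src: /etc/nginx/nginx.conf
            dest: /etc/nginx/nginx.conf.bak
            remote_src: true
            mode: "0644"

        - name: Deploy the new config
          ansible.builtin.template:
            src: nginx.conf.j2
            dest: /etc/nginx/nginx.conf
            mode: "0644"

        - name: Validate the new config
          ansible.builtin.command: nginx -t
          changed_when: false
      rescue:
        # runs only if a task in the block failed
        - name: Restore the backup
          ansible.builtin.copy:
            src: /etc/nginx/nginx.conf.bak
            dest: /etc/nginx/nginx.conf
            remote_src: true
            mode: "0644"
      always:
        # runs whether the block succeeded or not
        - name: Report completion
          ansible.builtin.debug:
            msg: "nginx config step finished on {{ inventory_hostname }}"
```

By default a successful rescue marks the host as recovered, and the play continues. To still report the run as failed after rolling back, add an ansible.builtin.fail task at the end of the rescue section.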

常用命令速查

# Basic commands
ansible all -m ping                            # Test connectivity
ansible-playbook site.yml                      # Run a playbook
ansible-playbook site.yml --check              # Dry run
ansible-playbook site.yml -t "install"         # Run only tasks tagged install

# Management
ansible-inventory -i hosts --list              # List hosts
ansible-galaxy role install geerlingguy.nginx  # Install a Galaxy role (namespace.role)
ansible-vault encrypt secrets.yml              # Encrypt a sensitive file

# Debugging
ansible-playbook site.yml -vvvv                # Verbose output
ansible-playbook site.yml --list-tasks         # List tasks
ansible-playbook site.yml --syntax-check       # Check syntax
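After secrets.yml has been encrypted with ansible-vault, a play loads it like any other vars file. Below is a minimal sketch. It assumes secrets.yml defines a hypothetical db_password variable, and it checks that the value is present without printing it.

```yaml
# Hypothetical play: read variables from the vault-encrypted secrets.yml
- name: Use encrypted variables
  hosts: webservers
  vars_files:
    - secrets.yml                      # encrypted with: ansible-vault encrypt secrets.yml
  tasks:
    - name: Check the secret is available (value is not printed)
      ansible.builtin.assert:
        that: db_password | length > 0
        quiet: true
```

Run it with: ansible-playbook site.yml --ask-vault-pass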

To be continued. Questions and suggestions are welcome!
