Background

The Jenkins instance in one project's test environment suddenly could no longer run shell scripts. The error was:

$ /bin/sh -xe /tmp/jenkins3171686575384136508.sh
FATAL: command execution failed
java.io.IOException: error=13, Permission denied
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
Caused: java.io.IOException: Cannot run program "/bin/sh" (in directory "/data/jenkins/workspace/yc-notice-service"): error=13, Permission denied
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at hudson.Proc$LocalProc.<init>(Proc.java:249)
	at hudson.Proc$LocalProc.<init>(Proc.java:218)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:935)
	at hudson.Launcher$ProcStarter.start(Launcher.java:454)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.build(MavenModuleSetBuild.java:945)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:683)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1819)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:543)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

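In the test environment the problem went away after changing the Jenkins working directory. That is actually a fairly large change: the data in the original working directory has to be migrated and the working-directory configuration has to be changed. Later the production environment hit the same problem, so this post records how it was solved there.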

Troubleshooting

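The error message points to a permission problem. Jenkins runs a shell build step by writing the script content to a temporary file and then executing that file with /bin/sh. My first guess was that /bin/sh lacked execute permission, but after logging into the server I confirmed that the jenkins user can log in and does have execute permission on /bin/sh.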

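Another guess was that the temporary file /tmp/jenkins3171686575384136508.sh lacked execute permission. Verifying this took quite a while, because Jenkins deletes the file as soon as the step finishes, so it could never be caught in /tmp. Only later did I realize that /bin/sh -xe file does not require the script to be executable: the script is not run directly, /bin/sh only reads it.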
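The claim that /bin/sh -xe file does not need the execute bit can be checked with a small Java sketch. It is a minimal example assuming a Linux/Unix host with /bin/sh and a POSIX file system; the class and method names are made up for illustration. It writes a script with mode rw------- and runs it once through /bin/sh and once directly. The first run should succeed, while the direct run should fail with an IOException carrying error=13. Note that in the Jenkins log above the program that cannot be started is /bin/sh itself, not the script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class ShVsDirectExec {
    // Write a tiny script to a temp file with mode rw------- (no execute bit at all).
    static Path writeScript() throws IOException {
        Path script = Files.createTempFile("demo", ".sh");
        Files.write(script, "echo hello\n".getBytes(StandardCharsets.UTF_8));
        Files.setPosixFilePermissions(script, PosixFilePermissions.fromString("rw-------"));
        return script;
    }

    // Start a command, let it share our stdout/stderr, and return its exit code.
    static int run(String... cmd) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(cmd).inheritIO().start();
        return proc.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path script = writeScript();
        try {
            // /bin/sh only has to READ the file, so the missing x bit does not matter.
            System.out.println("via /bin/sh: exit " + run("/bin/sh", "-xe", script.toString()));
            try {
                // exec'ing the file itself needs an x bit, so this is expected to fail.
                run(script.toString());
            } catch (IOException e) {
                System.out.println("direct exec: " + e.getMessage());
            }
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

In this hypothetical example the direct run is expected to print a message of the form `Cannot run program "...": error=13, Permission denied`, which is why a missing execute bit on the Jenkins temp script could be ruled out.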

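Jenkins is a Java web application (on this server it runs directly from jenkins.war, as the arthas process list below shows), so at this point the tool to reach for is arthas. I have introduced arthas in a previous article, so I won't repeat that here. The basic plan was: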

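  • Use the stack trace to find the jar that contains the code
  • Read the source code from that jar
  • Write a demo modeled on that source and see whether it shows the same problem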

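A demo seemed worthwhile because the stack trace is shallow and the command is ultimately executed by java.lang.ProcessBuilder.start. In short, nothing complicated, so a demo could be written and tested in a short time.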

arthas

  • Upload the arthas package, run arthas-boot.jar, select the Jenkins process and attach to it
$ java -jar arthas-boot.jar 
[INFO] arthas-boot version: 3.1.4
[INFO] Found existing java process, please choose one and hit RETURN.
[1]: 115504 /usr/lib/jenkins/jenkins.war
1
[INFO] arthas home: /home/mpaas/arthas
[INFO] Try to attach process 115504
[INFO] Attach process 115504 success.
[INFO] arthas-client connect 127.0.0.1 3658
                                                                                
wiki      https://alibaba.github.io/arthas                                      
tutorials https://alibaba.github.io/arthas/arthas-tutorials                     
version   3.1.4                                                                 
pid       115504                                                                
time      2020-04-15 22:32:15
  • Following the stack trace, inspect the class hudson.tasks.CommandInterpreter
[arthas@115504]$ sc -d hudson.tasks.CommandInterpreter
 class-info        hudson.tasks.BatchFile                                                                         
 code-source       /var/cache/jenkins/war/WEB-INF/lib/jenkins-core-2.138.4.jar                                      
 name              hudson.tasks.BatchFile                                                                    
 isInterface       false                                                                                            
 isAnnotation      false                                                                                            
 isEnum            false                                                                                           
 isAnonymousClass  false                                                                                    
 isArray           false 
 .....

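This gives the location of the code: /var/cache/jenkins/war/WEB-INF/lib/jenkins-core-2.138.4.jar. (sc matches subclasses by default, which is why the output lists hudson.tasks.BatchFile, a subclass of CommandInterpreter from the same jar.) Download the jar and decompile it locally with jd-gui.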

arthas also ships with a decompiler: running jad hudson.tasks.CommandInterpreter at the arthas prompt prints the source, but a local GUI tool is more convenient for browsing.

Preparing the demo

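The decompiled CommandInterpreter shows that Jenkins creates the temporary script file at this point. Reading further, the actual file-creation logic is implemented in the class hudson.FilePath: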

public String invoke(File dir, VirtualChannel channel) throws IOException {
 if (!this.inThisDirectory) {
     dir = new File(System.getProperty("java.io.tmpdir"));
 } else {
     FilePath.this.mkdirs(dir);
 }
 File f;
 try {
     f = FilePath.this.creating(File.createTempFile(this.prefix, this.suffix, dir));
 } catch (IOException var16) {
     throw new IOException("Failed to create a temporary directory in " + dir, var16);
 }
 Writer w = new FileWriter(FilePath.this.writing(f));
 Throwable var5 = null;
 try {
     w.write(this.contents);
.....

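It first checks the variable inThisDirectory. If it is not set, the system temporary directory (java.io.tmpdir) is used as the Jenkins temporary directory; otherwise the current directory is used. In practice inThisDirectory is hard-coded to false on this code path, so this cannot be configured.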
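To make the consequence concrete, here is a minimal, hypothetical re-implementation of the decompiled logic. It is not the Jenkins API: the class name and method signature are invented, and the `creating`/`writing` access-check hooks of the real FilePath are left out. It shows that with inThisDirectory = false the workspace argument is ignored and the file lands wherever java.io.tmpdir points, which is why changing that system property is enough to move the Jenkins temp scripts.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class TempScriptSketch {
    // Hypothetical helper mirroring the decompiled FilePath logic; not a Jenkins API.
    static File createTextTempFile(File dir, String prefix, String suffix,
                                   String contents, boolean inThisDirectory) throws IOException {
        if (!inThisDirectory) {
            // The workspace is ignored: the JVM temp directory is used instead.
            dir = new File(System.getProperty("java.io.tmpdir"));
        } else if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Failed to create " + dir);
        }
        File f;
        try {
            f = File.createTempFile(prefix, suffix, dir);
        } catch (IOException e) {
            // Same wording as the decompiled code.
            throw new IOException("Failed to create a temporary directory in " + dir, e);
        }
        try (Writer w = new FileWriter(f)) {
            w.write(contents);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // inThisDirectory = false, as on the CommandInterpreter code path
        File script = createTextTempFile(new File("workspace-demo"), "jenkins", ".sh",
                "echo hi\n", false);
        // Prints a path under java.io.tmpdir (/tmp by default on Linux)
        System.out.println(script.getAbsolutePath());
        script.delete();
    }
}
```

Running it with `-Djava.io.tmpdir=<some dir>` should move the printed path accordingly, which is the same mechanism the fix at the end of this post relies on.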

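Following the stack trace further leads to where the command is finally executed: hudson.Proc$LocalProc, which calls java.lang.ProcessBuilder.start.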

Based on this logic I prepared a demo that does essentially the same thing:

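import java.io.File;
import java.io.IOException;
import java.io.InputStream;
/*
 * @Description:
 * @author: jianfeng.zheng
 * @since: 2020/4/15 5:43 PM
 * @history: 1.2020/4/15 created by jianfeng.zheng
 */
public class Main {
    public static void main(String[] cmd) {
        // The program arguments are the command to run, e.g. /bin/sh -xe /tmp/test.sh
        if (cmd.length == 0) {
            System.err.println("usage: java -jar test-1.0-SNAPSHOT.jar <command> [args...]");
            return;
        }
        ProcessBuilder builder = new ProcessBuilder(cmd);
        StringBuilder buf = new StringBuilder();
        for (String s : cmd) {
            buf.append(s);
            buf.append(" ");
        }
        // Echo the command line first
        System.out.println(buf);
        // Like Jenkins, run the command with the job workspace as working directory
        builder.directory(new File("/data/jenkins/workspace/test-permission"));
        try {
            // Fork and exec; error=13 is thrown from here as an IOException
            Process proc = builder.start();
            // Copy the child's stdout (stderr, including the -x trace, is not read)
            InputStream procInputStream = proc.getInputStream();
            byte[] bt = new byte[1024];
            do {
                int size = procInputStream.read(bt);
                if (size <= 0) {
                    break;
                }
                // print, not println: a chunk is not necessarily a whole line
                System.out.print(new String(bt, 0, size));
            } while (true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}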

The core of the code creates a ProcessBuilder object:

ProcessBuilder builder = new ProcessBuilder(cmd);

It then sets the working directory:

builder.directory(new File("/data/jenkins/workspace/test-permission"));

Finally it runs the command and reads its output:

Process proc = builder.start();
InputStream procInputStream = proc.getInputStream();
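The demo only reads stdout and never waits for the exit code. A sketch of a slightly more complete runner, using only standard ProcessBuilder calls, merges stderr into stdout with redirectErrorStream(true) (so the sh -x trace is captured too) and collects the exit status with waitFor(). The class name RunInDir and its Result type are illustrative, not part of the original demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RunInDir {
    // Outcome of one command: exit code plus merged stdout/stderr.
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(File workDir, String... cmd) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(cmd);
        builder.directory(workDir);        // working directory, as Jenkins sets the workspace
        builder.redirectErrorStream(true); // fold stderr (e.g. the sh -x trace) into stdout
        Process proc = builder.start();    // an exec failure such as error=13 surfaces here
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = proc.getInputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        int code = proc.waitFor();         // wait for the child and collect its exit status
        return new Result(code, new String(out.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java RunInDir <workdir> <command> [args...]");
            return;
        }
        Result r = run(new File(args[0]), Arrays.copyOfRange(args, 1, args.length));
        System.out.print(r.output);
        System.out.println("exit code: " + r.exitCode);
    }
}
```

Usage, with the paths from this post: `java RunInDir /data/jenkins/workspace/test-permission /bin/sh -xe /tmp/test.sh`.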

Testing

Package the demo, upload it to the server, and prepare a very simple script test.sh with the following content:

echo 'current directory:'
pwd

Run the demo:

[mpaas@harborprd arthas]$ java -jar test-1.0-SNAPSHOT.jar /bin/sh -xe /home/mpaas/arthas/test.sh
/bin/sh -xe /home/mpaas/arthas/test.sh 
current directory:
/home/mpaas/arthas

The program runs normally. Next, copy the script to /tmp and run it again:

$ java -jar test-1.0-SNAPSHOT.jar /bin/sh  /tmp/test.sh
/bin/sh /tmp/test.sh 
java.io.IOException: Cannot run program "/bin/sh" (in directory "/data/jenkins/workspace/test-permission"): error=13, Permission denied
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at Main.main(Main.java:23)
Caused by: java.io.IOException: error=13, Permission denied
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 1 more

The error is the same as in Jenkins, so the problem can be narrowed down to the /tmp directory.

Solution

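I checked the permissions of /tmp, and also created and ran a script under /tmp as the jenkins user; both were fine. Rebooting the server did not fix the problem either, so the root cause remains unknown. The fix, however, is simple: change the Jenkins temporary directory.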

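  • Edit the Jenkins configuration file /etc/sysconfig/jenkins, find the JENKINS_JAVA_OPTIONS parameter and add the temporary-directory option (the directory must already exist and be writable by the user Jenkins runs as)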
JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Djava.io.tmpdir=/data/jenkins/tmp"
  • Restart Jenkins
# systemctl stop jenkins
# systemctl start jenkins
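To check that a candidate temporary directory actually works for this flow, a small smoke test can mimic what Jenkins does: create a script in java.io.tmpdir with File.createTempFile, run /bin/sh -xe on it, then delete it. This is a sketch, not a Jenkins tool; the class name is invented. Compiled with javac and run as the jenkins user, for example with `java -Djava.io.tmpdir=/data/jenkins/tmp TmpDirSmokeTest`, it should print the script output and exit code 0 if the directory is usable, and an error=13 IOException would reproduce the original failure.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class TmpDirSmokeTest {
    // Mirror the Jenkins flow: write a script into java.io.tmpdir, run /bin/sh -xe on it,
    // delete it, and return the exit code.
    static int check() throws IOException, InterruptedException {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath());
        if (!tmp.isDirectory() || !tmp.canWrite()) {
            throw new IOException(tmp + " does not exist or is not writable");
        }
        File script = File.createTempFile("jenkins", ".sh", tmp);
        try {
            Files.write(script.toPath(), "echo tmpdir ok\n".getBytes(StandardCharsets.UTF_8));
            Process proc = new ProcessBuilder("/bin/sh", "-xe", script.getAbsolutePath())
                    .inheritIO()   // show the script output and the -x trace directly
                    .start();      // an error=13 here would reproduce the original failure
            return proc.waitFor();
        } finally {
            script.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit code: " + check());
    }
}
```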

Problem solved.
