Erlang port: clever use of environment variables


October 15th, 2011 Yu Feng

Original article. When reposting, please credit: reposted from 系统技术非业余研究 (Systems Technology Non-Amateur Research).

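Erlang interacts with the outside world mainly through ports, and cooperation with external programs in particular usually happens over pipes.
There are basically two ways to invoke an external program: 1. os:cmd 2. erlang:open_port. Each has its pros and cons. Let's look at the documentation first: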

The os:cmd documentation (see here):

cmd(Command) -> string()

Types:
Command = atom() | io_lib:chars()

Executes Command in a command shell of the target OS, captures the standard output of the command and returns this result as a string. This function is a replacement of the previous unix:cmd/1; on a Unix platform they are equivalent.

Examples:

LsOut = os:cmd("ls"), % on unix platform
DirOut = os:cmd("dir"), % on Win32 platform

Note that in some cases, standard output of a command when called from another program (for example, os:cmd/1) may differ, compared to the standard output of the command when called directly from an OS command shell.
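A minimal sketch of what os:cmd/1 gives back, assuming a Unix-like system where `echo` and `false` are available to the shell (oscmd_demo is a hypothetical module name): the whole standard output comes back as one string, and the exit code is not reported at all.

```erlang
-module(oscmd_demo).  %% hypothetical module name
-export([run/0]).

%% os:cmd/1 runs the command through the OS shell and returns everything
%% it wrote to stdout as a single string. The exit code is discarded:
%% "false" exits with status 1, yet os:cmd/1 simply returns "".
%% Assumes a Unix-like OS.
run() ->
    Hello = os:cmd("echo hello"),
    Failed = os:cmd("false"),
    {Hello, Failed}.
```

This is why os:cmd/1 suits one-shot calls whose output is all you need; anything that must inspect the exit status needs open_port/2.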

The open_port documentation (see here):

open_port(PortName, PortSettings) -> port()

Types:
PortName = {spawn, Command} | {spawn_driver, Command} | {spawn_executable, FileName} | {fd, In, Out}
Command = string()
FileName = [ FileNameChar ] | binary()
FileNameChar = integer() (1..255 or any Unicode codepoint, see description)
In = Out = integer()
PortSettings = [Opt]
Opt = {packet, N} | stream | {line, L} | {cd, Dir} | {env, Env} | {args, [ ArgString ]} | {arg0, ArgString} | exit_status | use_stdio | nouse_stdio | stderr_to_stdout | in | out | binary | eof
N = 1 | 2 | 4
L = integer()
Dir = string()
ArgString = [ FileNameChar ] | binary()
Env = [{Name, Val}]
Name = string()
Val = string() | false

Returns a port identifier as the result of opening a new Erlang port. A port can be seen as an external Erlang process. PortName is one of the following:

{spawn, Command}

Starts an external program. Command is the name of the external program which will be run. Command runs outside the Erlang work space unless an Erlang driver with the name Command is found. If found, that driver will be started. A driver runs in the Erlang workspace, which means that it is linked with the Erlang runtime system.

When starting external programs on Solaris, the system call vfork is used in preference to fork for performance reasons, although it has a history of being less robust. If there are problems with using vfork, setting the environment variable ERL_NO_VFORK to any value will cause fork to be used instead.

For external programs, the PATH is searched (or an equivalent method is used to find programs, depending on operating system). This is done by invoking the shell on certain platforms. The first space separated token of the command will be considered as the name of the executable (or driver). This (among other things) makes this option unsuitable for running programs having spaces in file or directory names. Use {spawn_executable, Command} instead if spaces in executable file names are desired.
{spawn_driver, Command}

Works like {spawn, Command}, but demands the first (space separated) token of the command to be the name of a loaded driver. If no driver with that name is loaded, a badarg error is raised.
{spawn_executable, Command}

Works like {spawn, Command}, but only runs external executables. The Command in its whole is used as the name of the executable, including any spaces. If arguments are to be passed, the args and arg0 PortSettings can be used.

The shell is not usually invoked to start the program, it’s executed directly. Neither is the PATH (or equivalent) searched. To find a program in the PATH to execute, use os:find_executable/1.

Only if a shell script or .bat file is executed, the appropriate command interpreter will implicitly be invoked, but there will still be no command argument expansion or implicit PATH search.

The name of the executable as well as the arguments given in args and arg0 is subject to Unicode file name translation if the system is running in Unicode file name mode. To avoid translation or to force, e.g., UTF-8, supply the executable and/or arguments as a binary in the correct encoding. See the file module, the file:native_name_encoding/0 function and the stdlib users guide for details.
Note

The characters in the name (if given as a list) can only be > 255 if the Erlang VM is started in Unicode file name translation mode, otherwise the name of the executable is limited to the ISO-latin-1 character set.

If the Command cannot be run, an error exception, with the posix error code as the reason, is raised. The error reason may differ between operating systems. Typically the error enoent is raised when one tries to run a program that is not found and eacces is raised when the given file is not executable.
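A small sketch of that failure mode, assuming a Unix-like system on which the path /no/such/program does not exist (enoent_demo is a hypothetical module name). Because the error is raised as an exception rather than returned, a caller that wants to recover has to catch it:

```erlang
-module(enoent_demo).  %% hypothetical module name
-export([run/0]).

%% Assumption: /no/such/program does not exist on this machine.
%% spawn_executable does not go through the shell, so a missing file
%% surfaces directly as an error exception with a POSIX reason.
run() ->
    try open_port({spawn_executable, "/no/such/program"}, []) of
        Port ->
            port_close(Port),
            opened
    catch
        error:Reason -> {error, Reason}
    end.
```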
{fd, In, Out}

Allows an Erlang process to access any currently opened file descriptors used by Erlang. The file descriptor In can be used for standard input, and the file descriptor Out for standard output. It is only used for various servers in the Erlang operating system (shell and user). Hence, its use is very limited.

PortSettings is a list of settings for the port. Valid settings are:

{packet, N}

Messages are preceded by their length, sent in N bytes, with the most significant byte first. Valid values for N are 1, 2, or 4.
stream

Output messages are sent without packet lengths. A user-defined protocol must be used between the Erlang process and the external object.
{line, L}

Messages are delivered on a per line basis. Each line (delimited by the OS-dependent newline sequence) is delivered in one single message. The message data format is {Flag, Line}, where Flag is either eol or noeol and Line is the actual data delivered (without the newline sequence).

L specifies the maximum line length in bytes. Lines longer than this will be delivered in more than one message, with the Flag set to noeol for all but the last message. If end of file is encountered anywhere else than immediately following a newline sequence, the last line will also be delivered with the Flag set to noeol. In all other cases, lines are delivered with Flag set to eol.

The {packet, N} and {line, L} settings are mutually exclusive.
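To see the eol/noeol flags in practice, here is a minimal sketch assuming /bin/sh and its printf are available (line_demo is a hypothetical module name). With {line, 4}, the 6-byte line "abcdef" should arrive as two messages:

```erlang
-module(line_demo).  %% hypothetical module name
-export([run/0]).

%% Assumes /bin/sh exists. {line, 4} caps each message at 4 bytes, so
%% "abcdef\n" is split: the first chunk is flagged noeol, the final
%% chunk (which ends at the newline) is flagged eol.
run() ->
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", "printf 'abcdef\\n'"]},
                      {line, 4}, exit_status]),
    collect(Port, []).

%% Gather {Flag, Chunk} pairs until the program reports its exit status.
collect(Port, Acc) ->
    receive
        {Port, {data, {Flag, Chunk}}} -> collect(Port, [{Flag, Chunk} | Acc]);
        {Port, {exit_status, _}} -> lists:reverse(Acc)
    end.
```

A reader of line-mode output therefore has to keep concatenating noeol chunks until it sees an eol chunk, which is what the `sh_loop/3` function in the rebar wrapper does.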
{cd, Dir}

This is only valid for {spawn, Command} and {spawn_executable, Command}. The external program starts using Dir as its working directory. Dir must be a string. Not available on VxWorks.
{env, Env}

This is only valid for {spawn, Command} and {spawn_executable, Command}. The environment of the started process is extended using the environment specifications in Env.

Env should be a list of tuples {Name, Val}, where Name is the name of an environment variable, and Val is the value it is to have in the spawned port process. Both Name and Val must be strings. The one exception is Val being the atom false (in analogy with os:getenv/1), which removes the environment variable. Not available on VxWorks.
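A minimal sketch of the {env, Env} option, assuming /bin/sh exists and that X is not already set in the VM's own environment (env_demo and the variables X and Y are illustrative choices, not from the docs). X is added for the child only, and Y, although set in the VM, is removed for the child by the atom false:

```erlang
-module(env_demo).  %% hypothetical module name
-export([run/0]).

%% Assumes /bin/sh exists and X is not set in the VM beforehand.
run() ->
    true = os:putenv("Y", "from_vm"),   % Y exists in the VM itself
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", "echo \"X=$X Y=${Y:-unset}\""]},
                      %% set X and unset Y for this child only
                      {env, [{"X", "yufeng"}, {"Y", false}]},
                      {line, 1024}, exit_status]),
    Line = receive {Port, {data, {eol, L}}} -> L end,
    receive {Port, {exit_status, 0}} -> ok end,
    %% The VM's own environment is unchanged by the env option.
    {Line, os:getenv("X"), os:getenv("Y")}.
```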
{args, [ string() ]}

This option is only valid for {spawn_executable, Command} and specifies arguments to the executable. Each argument is given as a separate string and (on Unix) eventually ends up as one element each in the argument vector. On other platforms, similar behavior is mimicked.

The arguments are not expanded by the shell prior to being supplied to the executable, most notably this means that file wildcard expansion will not happen. Use filelib:wildcard/1 to expand wildcards for the arguments. Note that even if the program is a Unix shell script, meaning that the shell will ultimately be invoked, wildcard expansion will not happen and the script will be provided with the untouched arguments. On Windows®, wildcard expansion is always up to the program itself, which is why this isn’t an issue.

Note also that the actual executable name (a.k.a. argv[0]) should not be given in this list. The proper executable name will automatically be used as argv[0] where applicable.
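A short sketch of how args reach the program untouched, assuming a Unix-like system with `sh` on the PATH (args_demo is a hypothetical module name). Since spawn_executable does not search the PATH, os:find_executable/1 locates `sh` first; the script prints each argv entry followed by `|`:

```erlang
-module(args_demo).  %% hypothetical module name
-export([run/0]).

%% Assumes "sh" can be found on the PATH. "sh" after the script becomes
%% $0 inside it; "a b" and "*" become $1 and $2 exactly as written,
%% with no word splitting and no wildcard expansion by Erlang.
run() ->
    Sh = os:find_executable("sh"),   % spawn_executable skips PATH search
    Port = open_port({spawn_executable, Sh},
                     [{args, ["-c", "printf '%s|' \"$@\"; echo",
                              "sh", "a b", "*"]},
                      {line, 1024}, exit_status]),
    Line = receive {Port, {data, {eol, L}}} -> L end,
    receive {Port, {exit_status, 0}} -> ok end,
    Line.
```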

When the Erlang VM is running in Unicode file name mode, the arguments can contain any Unicode characters and will be translated into whatever is appropriate on the underlying OS, which means UTF-8 for all platforms except Windows, which has other (more transparent) ways of dealing with Unicode arguments to programs. To avoid Unicode translation of arguments, they can be supplied as binaries in whatever encoding is deemed appropriate.
Note

The characters in the arguments (if given as a list of characters) can only be > 255 if the Erlang VM is started in Unicode file name mode, otherwise the arguments are limited to the ISO-latin-1 character set.

If one, for any reason, wants to explicitly set the program name in the argument vector, the arg0 option can be used.
{arg0, string()}

This option is only valid for {spawn_executable, Command} and explicitly specifies the program name argument when running an executable. This might in some circumstances, on some operating systems, be desirable. How the program responds to this is highly system dependent and no specific effect is guaranteed.

The unicode file name translation rules of the args option apply to this option as well.
exit_status

This is only valid for {spawn, Command} where Command refers to an external program, and for {spawn_executable, Command}.

When the external process connected to the port exits, a message of the form {Port,{exit_status,Status}} is sent to the connected process, where Status is the exit status of the external process. If the program aborts, on Unix the same convention is used as the shells do (i.e., 128+signal).

If the eof option has been given as well, the eof message and the exit_status message appear in an unspecified order.

If the port program closes its stdout without exiting, the exit_status option will not work.
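A minimal sketch of reading the exit status, assuming /bin/sh exists (exit_demo is a hypothetical module name). The second call has the shell kill itself with signal 9, which per the text above should be reported as 128 + 9:

```erlang
-module(exit_demo).  %% hypothetical module name
-export([run/1]).

%% Assumes /bin/sh exists. Runs Script and returns the status delivered
%% in {Port, {exit_status, Status}}.
run(Script) ->
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", Script]}, exit_status]),
    wait(Port).

wait(Port) ->
    receive
        {Port, {data, _}} -> wait(Port);          % ignore any output
        {Port, {exit_status, Status}} -> Status
    end.
```

Usage: `exit_demo:run("exit 3")` yields 3, and `exit_demo:run("kill -9 $$")` yields 137.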
use_stdio

This is only valid for {spawn, Command} and {spawn_executable, Command}. It allows the standard input and output (file descriptors 0 and 1) of the spawned (UNIX) process for communication with Erlang.
nouse_stdio

The opposite of use_stdio. Uses file descriptors 3 and 4 for communication with Erlang.
stderr_to_stdout

Affects ports to external programs. The executed program gets its standard error file redirected to its standard output file. stderr_to_stdout and nouse_stdio are mutually exclusive.
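A small sketch of the difference stderr_to_stdout makes, assuming /bin/sh exists (stderr_demo is a hypothetical module name). Without the option, the line written to fd 2 goes to the VM's own stderr and never reaches the port owner:

```erlang
-module(stderr_demo).  %% hypothetical module name
-export([run/1]).

%% Assumes /bin/sh exists. ExtraOpts is [] or [stderr_to_stdout].
run(ExtraOpts) ->
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", "echo out; echo err 1>&2"]},
                      {line, 1024}, exit_status | ExtraOpts]),
    collect(Port, []).

%% Collect complete lines until the program exits.
collect(Port, Acc) ->
    receive
        {Port, {data, {eol, Line}}} -> collect(Port, [Line | Acc]);
        {Port, {exit_status, _}} -> lists:reverse(Acc)
    end.
```

This is why the rebar wrapper always adds stderr_to_stdout: error messages from the external tool end up in the captured output instead of being lost.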
overlapped_io

Affects ports to external programs on Windows® only. The standard input and standard output handles of the port program will, if this option is supplied, be opened with the flag FILE_FLAG_OVERLAPPED, so that the port program can (and has to) do overlapped I/O on its standard handles. This is not normally the case for simple port programs, but an option of value for the experienced Windows programmer. On all other platforms, this option is silently discarded.
in

The port can only be used for input.
out

The port can only be used for output.
binary

All IO from the port consists of binary data objects as opposed to lists of bytes.
eof

The port will not be closed at the end of the file and produce an exit signal. Instead, it will remain open and a {Port, eof} message will be sent to the process holding the port.
hide

When running on Windows, suppress creation of a new console window when spawning the port program. (This option has no effect on other platforms.)

The default is stream for all types of port and use_stdio for spawned ports.

Failure: If the port cannot be opened, the exit reason is badarg, system_limit, or the Posix error code which most closely describes the error, or einval if no Posix code is appropriate:

badarg

Bad input arguments to open_port.
system_limit

All available ports in the Erlang emulator are in use.
enomem

There was not enough memory to create the port.
eagain

There are no more available operating system processes.
enametoolong

The external command given was too long.
emfile

There are no more available file descriptors (for the operating system process that the Erlang emulator runs in).
enfile

The file table is full (for the entire operating system).
eacces

The Command given in {spawn_executable, Command} does not point out an executable file.
enoent

The Command given in {spawn_executable, Command} does not point out an existing file.

During use of a port opened using {spawn, Name}, {spawn_driver, Name} or {spawn_executable, Name}, errors arising when sending messages to it are reported to the owning process using signals of the form {'EXIT', Port, PosixCode}. See file(3) for possible values of PosixCode.

The maximum number of ports that can be open at the same time is 1024 by default, but can be configured by the environment variable ERL_MAX_PORTS.

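From the documentation above we can see that the first method is mainly for one-shot interactions, while the second can sustain all kinds of long-running interactions over standard input and output.
Interaction means information flows in both directions. From the external program back to Erlang, information travels basically through standard output or the process exit code. To feed information to the external program there are three ways: 1. command-line arguments 2. environment variables 3. stdin. Of these, the documentation says the least about environment variables.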
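A minimal sketch that feeds one value through each of the three channels, assuming /bin/sh exists (channels_demo, the variable X and the literal values are illustrative, not from the original). The child echoes all three back on stdout, and the exit code comes back as the other return path:

```erlang
-module(channels_demo).  %% hypothetical module name
-export([run/0]).

%% Assumes /bin/sh exists. $1 comes from args, $X from env, and IN is
%% read from stdin, which we write with port_command/2.
run() ->
    Script = "read IN; echo \"arg=$1 env=$X stdin=$IN\"",
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", Script, "sh", "from_args"]},
                      {env, [{"X", "from_env"}]},
                      {line, 1024}, exit_status]),
    true = port_command(Port, "from_stdin\n"),
    Line = receive {Port, {data, {eol, L}}} -> L end,
    Status = receive {Port, {exit_status, S}} -> S end,
    {Line, Status}.
```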

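In practice, open_port turns out to have many options that need setting, which easily confuses ordinary users, and the resulting code tends not to be thorough. The rebar project calls external programs heavily, so it wraps this functionality up. The wrapper is pleasant to use, and I recommend it.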

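The wrapper lives mainly in src/rebar_utils.erl (see the source here). I trimmed it slightly to remove unrelated code. Below is a demonstration of what the port env option does: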

$ cat  rebar_utils.erl
-module(rebar_utils).
-export([sh/2]).

%%
%% Options = [Option] -- defaults to [use_stdout, abort_on_error]
%% Option = ErrorOption | OutputOption | {cd, string()} | {env, Env}
%% ErrorOption = return_on_error | abort_on_error | {abort_on_error, string()}
%% OutputOption = use_stdout | {use_stdout, bool()}
%% Env = [{string(), Val}]
%% Val = string() | false
%%
sh(Command0, Options0) ->

    DefaultOptions = [use_stdout, abort_on_error],
    Options = [expand_sh_flag(V)
               || V <- proplists:compact(Options0 ++ DefaultOptions)],

    ErrorHandler = proplists:get_value(error_handler, Options),
    OutputHandler = proplists:get_value(output_handler, Options),

    Command = patch_on_windows(Command0, proplists:get_value(env, Options, [])),
    PortSettings = proplists:get_all_values(port_settings, Options) ++
        [exit_status, {line, 16384}, use_stdio, stderr_to_stdout, hide],
    Port = open_port({spawn, Command}, PortSettings),

    case sh_loop(Port, OutputHandler, []) of
        {ok, _Output} = Ok ->
            Ok;
        {error, {_Rc, _Output}=Err} ->
            ErrorHandler(Command, Err)
    end.

%% We use a bash shell to execute on windows if available. Otherwise we do the
%% shell variable substitution ourselves and hope that the command doesn't use
%% any shell magic. Also the port doesn't seem to close from time to time
%% (mingw).
patch_on_windows(Cmd, Env) ->
    case os:type() of
        {win32,nt} ->
            case find_executable("bash") of
                false -> Cmd;
                Bash ->
                    Bash ++ " -c \"" ++ Cmd ++ "; echo _port_cmd_status_ $?\" "
            end;
        _ ->
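            lists:foldl(fun({_Key, false}, Acc) ->
                                %% Val = false unsets the variable; leave the
                                %% reference for the shell to expand (to "").
                                Acc;
                           ({Key, Value}, Acc) ->
                                expand_env_variable(Acc, Key, Value)
                        end, Cmd, Env)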
    end.

find_executable(Name) ->
    case os:find_executable(Name) of
        false -> false;
        Path ->
            "\"" ++ filename:nativename(Path) ++ "\""
    end.

%%
%% Given env. variable FOO we want to expand all references to
%% it in InStr. References can have two forms: $FOO and ${FOO}
%% The end of form $FOO is delimited with whitespace or eol
%%
expand_env_variable(InStr, VarName, RawVarValue) ->
    ReOpts = [global, {return, list}],
    VarValue = re:replace(RawVarValue, "\\\\", "\\\\\\\\", ReOpts),
    R1 = re:replace(InStr, "\\\$" ++ VarName ++ "\\s", VarValue ++ " ",
                    [global]),
    R2 = re:replace(R1, "\\\$" ++ VarName ++ "\$", VarValue),
    re:replace(R2, "\\\${" ++ VarName ++ "}", VarValue, ReOpts).


%% ====================================================================
%% Internal functions
%% ====================================================================
log_msg_and_abort(_Message) ->
    fun(_Command, {_Rc, _Output}) ->
	    halt(1)
    end.

log_and_abort(_Command, {_Rc, _Output}) ->
    halt(1).


expand_sh_flag(return_on_error) ->
    {error_handler,
     fun(_Command, Err) ->
             {error, Err}
     end};
expand_sh_flag({abort_on_error, Message}) ->
    {error_handler,
     log_msg_and_abort(Message)};
expand_sh_flag(abort_on_error) ->
    {error_handler,
     fun log_and_abort/2};
expand_sh_flag(use_stdout) ->
    {output_handler,
     fun(Line, Acc) ->
             [Line | Acc]
     end};
expand_sh_flag({use_stdout, false}) ->
    {output_handler,
     fun(Line, Acc) ->
             [Line | Acc]
     end};
expand_sh_flag({cd, _CdArg} = Cd) ->
    {port_settings, Cd};
expand_sh_flag({env, _EnvArg} = Env) ->
    {port_settings, Env}.

sh_loop(Port, Fun, Acc) ->
    receive
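        {Port, {data, {_, "_port_cmd_status_ " ++ Status}}} ->
            (catch erlang:port_close(Port)), % sigh () for indentation
            %% Acc is built in reverse; on failure return {Rc, Output} so
            %% that the {error, {_Rc, _Output}} clause in sh/2 matches.
            Output = lists:flatten(lists:reverse(Acc)),
            case list_to_integer(Status) of
                0  -> {ok, Output};
                Rc -> {error, {Rc, Output}}
            end;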
        {Port, {data, {eol, Line}}} ->
            sh_loop(Port, Fun, Fun(Line ++ "\n", Acc));
        {Port, {data, {noeol, Line}}} ->
            sh_loop(Port, Fun, Fun(Line, Acc));
        {Port, {exit_status, 0}} ->
            {ok, lists:flatten(lists:reverse(Acc))};
        {Port, {exit_status, Rc}} ->
            {error, {Rc, lists:flatten(lists:reverse(Acc))}}
    end.


$ cat test.sh
#!/bin/bash
echo "arg $1"
echo "env ${X}"

$ erlc rebar_utils.erl
$ erl 
Erlang R14B04 (erts-5.8.5) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5  (abort with ^G)
1> rebar_utils:sh("./test.sh yufeng",[{env,[{"X", "yufeng"}]}]).
{ok,"arg yufeng\nenv yufeng\n"}
2> os:putenv("X", "yufeng").
true
3> os:cmd("./test.sh yufeng").
"arg yufeng\nenv yufeng\n"

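The session above sets X in two different ways. A minimal sketch contrasting them, assuming a Unix system with /bin/sh and that Z is not already set in the VM (scope_demo and Z are illustrative): the {env, ...} option is scoped to a single child, while os:putenv/2 changes the VM's own environment, which every later os:cmd/1 or open_port/2 child inherits.

```erlang
-module(scope_demo).  %% hypothetical module name
-export([run/0]).

%% Assumes /bin/sh exists and Z is unset in the VM at start.
run() ->
    Port = open_port({spawn_executable, "/bin/sh"},
                     [{args, ["-c", "echo ${Z:-unset}"]},
                      {env, [{"Z", "scoped"}]},        % this child only
                      {line, 1024}, exit_status]),
    FromPort = receive {Port, {data, {eol, L}}} -> L end,
    receive {Port, {exit_status, _}} -> ok end,
    Before = os:cmd("echo ${Z:-unset}"),   % later children do not see Z
    true = os:putenv("Z", "global"),       % now the whole VM has Z
    After = os:cmd("echo ${Z:-unset}"),
    {FromPort, Before, After}.
```

The scoped form is what rebar_utils:sh/2 relies on; the global form is closer to setting the variable before starting the node.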
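Summary: mature projects contain plenty of good stuff, and knowing how to borrow it when it's handy is a skill in its own right.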

Have fun!


Categories: Erlang探索 (Erlang exploration) Tags: env, open_port, os:cmd, spawn

Comments (1)

DennyZhang

October 4th, 2012 at 11:37 | #1


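Thanks for sharing, 霸爷. A couple of personal opinions:
If it is the VM node's env that needs changing, I think setting it when the shell starts is enough.
If the env differs from one situation to another, wouldn't setting it manually be more straightforward?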


