Thursday, October 11, 2018

Replacing bash with python3

iterating through directory

from pathlib import Path
# recursively find and delete every .pdb file under the current directory
for f in Path(".").glob("**/*.pdb"):
    f.unlink()

reading file

from pathlib import Path
try:
    # open file `version` and read all its contents as a string
    last_ver = Path("version").read_text()
except FileNotFoundError:
    print("error: file 'version' not found")

writing file

# open prepare.bat for writing (truncating it) and write "hello"
with open("prepare.bat", mode="w") as f:
    f.write("hello\n")

directory of current script

import os
DIR = os.path.dirname(os.path.abspath(__file__))

read all lines

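with open("TestClient.log", encoding="utf-8") as f:
    # print each line prefixed with its 1-based line number;
    # end="" because every line already ends with "\n"
    for c, l in enumerate(f, start=1):
        print(f"{c}:{l}", end="")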

starting process

import subprocess
# run git and raise CalledProcessError if the return code is non-zero
subprocess.run(["git", "fetch", "origin", "master"], check=True)
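You can see what check=True does by running a child process that fails on purpose. The sketch below uses the running interpreter (sys.executable) as a stand-in for git so that it runs anywhere. The exit code 3 is arbitrary. If the CalledProcessError is not caught, it stops the script, roughly like `set -e` in bash.

```python
import subprocess
import sys

code = 0
try:
    # the child exits with status 3, so check=True raises CalledProcessError
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as e:
    code = e.returncode
    print(f"command failed with exit code {code}")
```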

current python exe path

import sys
# absolute path of the interpreter running this script
print(sys.executable)

Command Line

command line arguments

import argparse
import sys

# defined before parse_args() runs, otherwise the call below raises NameError
def boolean_string(s):
    # accept only "true"/"false" (case-insensitive)
    if s.upper() not in {'FALSE', 'TRUE'}:
        raise ValueError('Not a valid boolean string')
    return s.upper() == 'TRUE'

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote_ip", required=True)
    # using type=bool would make --sign True for any non-empty string, even "False"
    parser.add_argument("--sign", default=False, required=False, type=boolean_string)
    parser.add_argument("--check_in", default=False, required=False, type=boolean_string)
    try:
        return parser.parse_args()
    except SystemExit as e:
        # argparse has already printed usage/error; exit with 1 on bad arguments
        sys.exit(0 if e.code == 0 else 1)

args = parse_args()
microDir = args.remote_ip
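The reason for boolean_string is easy to demonstrate. argparse simply calls the type function on the raw string, and bool() of any non-empty string, including "False", is True. So that the sketch runs on its own, it repeats boolean_string and feeds both parsers an explicit argument list instead of sys.argv.

```python
import argparse

def boolean_string(s):
    # only "true"/"false" (any case) are accepted
    if s.upper() not in {"FALSE", "TRUE"}:
        raise ValueError("Not a valid boolean string")
    return s.upper() == "TRUE"

# naive: argparse calls bool("False"), which is True for any non-empty string
naive = argparse.ArgumentParser()
naive.add_argument("--sign", default=False, type=bool)

# strict: the string is checked and converted explicitly
strict = argparse.ArgumentParser()
strict.add_argument("--sign", default=False, type=boolean_string)

argv = ["--sign", "False"]
print(naive.parse_args(argv).sign)   # True
print(strict.parse_args(argv).sign)  # False
print(strict.parse_args([]).sign)    # False (the default)
```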

Environment variable

list environment variables

import os
for k, v in os.environ.items():
    # on Windows, all keys k are upper-case
    print(f"{k} = {v}")

read environment variables

import os
# print environment variable PATH
print(os.environ["PATH"])
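Indexing os.environ raises KeyError when the variable is not set. If a fallback is acceptable, os.environ.get() is the closer match to bash's ${NAME:-default}. Note that bash also substitutes the default when the value is empty, which get() does not. MY_UNSET_VAR is a hypothetical name assumed to be unset.

```python
import os

# returns "default" instead of raising KeyError when the variable is missing
value = os.environ.get("MY_UNSET_VAR", "default")
print(value)
```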

write environment variables

import os
# set MY_ENV for this process and for child processes it starts afterwards
os.environ["MY_ENV"] = "hello"
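Child processes see the change because they inherit a copy of the environment. The change never propagates back to the shell that started the script. A quick check, using the running interpreter as the child process:

```python
import os
import subprocess
import sys

# set MY_ENV in this process; children started afterwards inherit a copy
os.environ["MY_ENV"] = "hello"

# start a child Python interpreter and have it print the value it sees
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MY_ENV'])"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # hello
```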

regular expression

match and extract

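import re
ip = "192.168.0.2"
# raw string keeps "\d" intact; "\." matches a literal dot (a bare "." matches any char)
m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", ip)
v = [0, 0, 0, 0]
if m:
    # groups 1..4 hold the four octets as strings
    v = [int(g) for g in m.groups()]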
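The regular expression above still accepts out-of-range octets such as "999". If the goal is only to split a valid IPv4 address, the standard library's ipaddress module also handles validation. This is an alternative sketch, not a change to the regex technique:

```python
import ipaddress

ip = "192.168.0.2"
try:
    # IPv4Address raises AddressValueError for strings like "999.1.1.1"
    addr = ipaddress.IPv4Address(ip)
    # .packed is the 4-byte big-endian form; list() yields the octets as ints
    v = list(addr.packed)
except ipaddress.AddressValueError:
    v = [0, 0, 0, 0]
print(v)  # [192, 168, 0, 2]
```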

Python tips

grammar & built-in library


# list attributes of an obj
dir(obj)

# get help document of something
help("os")

# chaining two generators
import itertools
from pathlib import Path
p = Path(".")
for f in itertools.chain(p.rglob("*.exe"), p.rglob("*.dll")):
    print(f)

# formatted printing
name = "shawn"
msg = f"hello, {name}"

debug
– use the Python shell (e.g. IDLE) to try out single-line scripts
– use Ctrl+N (Menu: File -> New File) in the IDLE shell to try out multi-line scripts

Who's making Windows FileSystem slow

The story

One of our customers reported that updating our product was extremely slow.
When updating, our product first downloads a patch file, then applies the patch to existing files or creates new files.
After investigating, I found that the time was being spent in the patch-applying stage. But that logic is extremely simple and performs no time-consuming operations such as sleeping or network access, so I didn’t think our software was causing the problem.
So I guessed the problem must be in his OS: either it was infected by a virus or some malware had been installed.
Obviously, he was not satisfied with my hypothesis, and neither was I. I had to prove it.

File System Filter Drivers

I googled around with keywords like “windows filesystem hooks” and fortunately found a Windows technology called File System Filter Drivers.
There are two types of drivers that can affect the file system:

file system minifilter drivers
legacy file system filter drivers
Both types can be identified with the fltmc command.

For example, here is the output of running this command on my computer:

C:\Windows\system32>fltmc

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
luafv                                   1       135000         0
FileInfo                                5        45000         0

Some filters that are provided by Microsoft, and thus can be considered safe, are:

WdFilter.sys - Windows Defender
storqosflt.sys - Storage QoS Filter Driver
luafv.sys - UAC File Virtualization
npsvctrig.sys - Named Pipe Service Trigger Provider
FileCrypt.sys - Windows sandboxing and encryption
FileInfo.sys - FileInfo Filter Driver (SuperFetch / ReadyBoost)
wcifs.sys - Windows Container Isolation File System Filter
Wof.sys - Windows Overlay Filter (used by WIMBoot, Windows Image File Boot)
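Checking the list by eye works, but the comparison can also be scripted. The following Python sketch assumes two things: fltmc output has the column layout shown above, and the script runs from an elevated prompt. KNOWN_MICROSOFT is only the list above. A filter missing from it is merely worth a closer look, not proof of malware, since antivirus, backup and sync tools also install filters. The function names are hypothetical.

```python
import subprocess

# the Microsoft filters listed above, lower-cased and without ".sys"
KNOWN_MICROSOFT = {
    "wdfilter", "storqosflt", "luafv", "npsvctrig",
    "filecrypt", "fileinfo", "wcifs", "wof",
}


def parse_fltmc(output):
    """Return the filter names (first column) from fltmc's table output."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        # skip blank lines, the column header and the dashed separator
        if not line or line.startswith(("Filter Name", "-")):
            continue
        names.append(line.split()[0])
    return names


def unknown_filters(names):
    """Names not on the known-Microsoft list: candidates for a closer look."""
    return [n for n in names if n.lower() not in KNOWN_MICROSOFT]


def run_fltmc():
    # assumption: run on Windows from an elevated (administrator) prompt
    result = subprocess.run(["fltmc"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a Windows machine, print(unknown_filters(parse_fltmc(run_fltmc()))) prints the loaded filters that are not on the list.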

On the customer’s computer there were two suspicious filters named ‘tenmon’ and ‘tqsomething’ (sorry, I can’t remember the exact name). I looked them up in the Autoruns tool and found that they were provided by http://www.qq.com/.
After I deleted them in Autoruns, the problem went away immediately and the customer was happy.

Geometric Sequence and Arithmetic Sequence

I learned about geometric and arithmetic sequences in senior high school. As a programmer, I find them very useful when calculating the time complexity of an algorithm. But the theorems related to them are easily forgotten, so I tried to prove them in order to remember them better.
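The proofs themselves aren't included here. For reference, this is the standard textbook derivation of both sum formulas (not necessarily the same as the author's own proofs), followed by two sums that show up constantly in complexity analysis. Here a_1 is the first term, d the common difference, q the common ratio, and S_n the sum of the first n terms.

```latex
% Arithmetic sequence: a_k = a_1 + (k-1)d
\begin{align*}
S_n  &= a_1 + a_2 + \dots + a_n \\
S_n  &= a_n + a_{n-1} + \dots + a_1 \\
2S_n &= n(a_1 + a_n)
  && \text{each pair } a_k + a_{n+1-k} = 2a_1 + (n-1)d = a_1 + a_n \\
S_n  &= \frac{n(a_1 + a_n)}{2}
\end{align*}

% Geometric sequence: a_k = a_1 q^{k-1}, with q \neq 1
\begin{align*}
S_n        &= a_1 + a_1 q + \dots + a_1 q^{n-1} \\
qS_n       &= a_1 q + a_1 q^2 + \dots + a_1 q^{n} \\
S_n - qS_n &= a_1 - a_1 q^{n} && \text{all middle terms cancel} \\
S_n        &= \frac{a_1(1 - q^{n})}{1 - q}
\end{align*}

% Two sums that appear in complexity analysis
\begin{align*}
1 + 2 + \dots + n &= \frac{n(n+1)}{2} = O(n^2) \\
1 + 2 + 4 + \dots + 2^k &= \frac{1 - 2^{k+1}}{1 - 2} = 2^{k+1} - 1 < 2 \cdot 2^k
\end{align*}
```

The first sum is the cost of a nested loop whose inner loop runs 1, 2, ..., n times. The second is why repeatedly doubling a buffer's capacity up to n = 2^k elements costs only O(n) copying overall.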

A GC friendly Java Object Pool

To maintain pooled objects, most object pool libraries need to allocate small bookkeeping objects whenever objects are allocated from or returned to the pool. But if the objects we want to pool are themselves very small, those libraries are of little help.
In a latency-critical application (an FPS game), we used Java to implement the server. To avoid GC as much as possible, we pool everything we can, even objects as small as float[3].
The source code is below in case it’s helpful to others:
https://github.com/shawn11ZX/zerogc-pool
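Internally, this library uses a singly linked list to store pooled objects. When an object is allocated from the pool, its link node is cached for reuse rather than discarded.
To make it thread-safe, it uses AtomicStampedReference<T> to set the links between nodes. AtomicStampedReference pairs each reference with a stamp, so a compare-and-set can detect the ABA problem.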
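The library's real code is at the link above. What follows is only a rough, single-threaded Python sketch of the idea as described here; the class and method names are hypothetical, not the library's API. The pool keeps two singly linked stacks: nodes that hold free objects, and empty nodes recycled from earlier allocations. Once enough spare nodes exist, alloc() and free() only relink existing nodes. In Python this merely illustrates the bookkeeping. The library's AtomicStampedReference-based thread safety is not reproduced.

```python
class _Node:
    """A link in a singly linked stack (hypothetical, not the library's class)."""
    __slots__ = ("value", "next")

    def __init__(self):
        self.value = None
        self.next = None


class NodeCachingPool:
    """Single-threaded pool that recycles its own link nodes."""

    def __init__(self, factory):
        self._factory = factory  # creates a new object when the pool is empty
        self._free = None        # stack of nodes that hold pooled objects
        self._spare = None       # stack of empty nodes kept for later free() calls

    def alloc(self):
        node = self._free
        if node is None:
            return self._factory()
        self._free = node.next
        obj, node.value = node.value, None
        # cache the emptied node instead of letting it become garbage
        node.next = self._spare
        self._spare = node
        return obj

    def free(self, obj):
        node = self._spare
        if node is None:
            node = _Node()       # only happens until enough spare nodes exist
        else:
            self._spare = node.next
        node.value = obj
        node.next = self._free
        self._free = node


pool = NodeCachingPool(lambda: [0.0, 0.0, 0.0])  # stand-in for float[3]
vec = pool.alloc()
pool.free(vec)
print(pool.alloc() is vec)  # True: the pooled object is reused
```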

Tips of developing Windows C++ applications

Recently, I developed a Windows C++ application, which is a mini game client framework that has the following features:

- Contains a launcher program that is very small (~2 MB), with all other modules downloaded when needed.
- Supports dynamic web content, especially Flash content.
- Supports Windows XP SP3 and above, as there are many XP users in China.
- Supports downloading and launching applications on demand, such as Unity standalone games.

Although it's already live and has been downloaded by millions of users, I encountered some problems during development that I would like to share:

Why C++ not C#

We use C++ as our development language, even though it is far less productive and maintainable than a .NET language such as C#.

Why?

We couldn't choose .NET. The main reason is that we wanted the application to be as small and as portable as possible. Using C# means users on Windows XP would first have to install .NET (~200 MB) before running our app, which would surely lose us a lot of XP users. According to a survey made in January 2017, the install rate of Windows XP in China was still as high as 17.79%.

Targeting Windows XP

For our application to run on Windows XP, we don't need to install an older version of Visual Studio such as VS 2008. Instead, we can use the latest Visual Studio and select an XP platform toolset (one whose name ends in _xp, such as v141_xp).

Before that, however, we have to install the "Windows XP support for C++" component.

Provided all the libraries we depend on have been built with the same options, the resulting executable can run on Windows XP SP3. According to Configuring Programs for Windows XP, it can't run on 32-bit XP SP2 or earlier.

C++ runtime support

Along with the Windows XP platform toolset, the C Runtime Library (CRT), C++ Standard Library, Active Template Library (ATL), Concurrency Runtime Library (ConCRT), Parallel Patterns Library (PPL), Microsoft Foundation Class Library (MFC), and C++ AMP (C++ Accelerated Massive Parallelism) library include runtime support for Windows XP and Windows Server 2003. For these operating systems, the minimum supported versions are Windows XP Service Pack 3 (SP3) for x86, Windows XP Service Pack 2 (SP2) for x64, and Windows Server 2003 Service Pack 2 (SP2) for both x86 and x64.

Third Party Libraries

Life is painful if we can't utilize third party libraries to help us when writing code.

For Java developers, there are tons of publicly available libraries. And if we use project management tools such as Maven or Gradle, we can include these libraries with a few lines of configuration, and they will be downloaded automatically.

For C++ programmers, life is not so easy. There are multiple dimensions that affect the compatibility of a binary library:

platform toolset
runtime library: Multi-threaded / Multi-threaded DLL / Multi-threaded Debug / Multi-threaded Debug DLL
release or debug
32 or 64 bits
static or dynamic
Linux or Windows...

And most libraries don't release binary distributions; instead, they ask us to build from source code.

So including even one extra library requires a lot of work, as we have to:

setup the building environment, which may require installing extra tools or libs recursively.
tune the building variables to suit our requirements.
build it
copy the include headers and binaries to our project
modify our project makefile to use these header files and binaries.

Sometimes we may get stuck at the very first step. One of my colleagues once spent a whole week trying to build an old version of CEF, but failed. Finally, after a lot of searching on Google, I found it bundled in one version of Unreal Engine (uploaded here). That totally saved his life.

So I include as few libraries as possible. Here are some libraries I used in this project:

Poco, which I depend on a lot and which has relatively good documentation.
Duilib, for building UI. It's buggy and poorly documented, but easy to set up and very lightweight.
CEF. I use the very old version 3.2357.1291, which supports NPAPI.

I didn't use Boost, as I was intimidated by both its library size and the resulting exe size.

Character set

One decision we have to make before writing any code is choosing the character set of our project.

There are two options in Visual Studio:

Use Multi-Byte Character Set (or MBCS)
Use Unicode Character Set

To make that decision, the following issues have to be considered:

Effects of character set option

Many Windows APIs come in two versions, xxxxA and xxxxW, e.g. MoveFileExA and MoveFileExW. The former takes char * strings, while the latter takes wchar_t * strings.
The character set option in Visual Studio only affects which version the unsuffixed name maps to. For example, if MBCS is chosen, MoveFileEx is defined as MoveFileExA, so calling MoveFileEx ends up calling MoveFileExA. We can still call MoveFileExW directly even if we choose MBCS.

Meaning of char (char * and std::string)

On Windows, char strings passed to system APIs, including the standard library, are interpreted as MBCS (the current ANSI code page).
On Linux, char strings passed to system APIs, including the standard library, are conventionally UTF-8.
Different libraries interpret char differently, e.g.:

Poco interprets it as UTF-8 by default.
CEF interprets it as UTF-8 too.

Meaning of wchar_t ( wchar_t * and std::wstring)

By contrast, on Windows both the OS and most libraries interpret wchar_t as UTF-16.
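To make the three interpretations concrete, this small Python sketch encodes the same two Chinese characters the way each would see them. It assumes a Simplified Chinese Windows, whose ANSI code page is 936 (GBK); other locales use other code pages.

```python
text = "中文"  # U+4E2D U+6587

# char * on Simplified Chinese Windows (MBCS, code page 936 / GBK)
mbcs = text.encode("gbk")
# char * as Poco and CEF interpret it
utf8 = text.encode("utf-8")
# wchar_t * on Windows: UTF-16, little-endian in memory
utf16 = text.encode("utf-16-le")

print(mbcs.hex())   # d6d0cec4
print(utf8.hex())   # e4b8ade69687
print(utf16.hex())  # 2d4e8765
```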

There is an article discussing this, which suggests the following five rules:

First rule: Use UTF-8 as the internal representation for text.
Second rule: In Visual Studio, avoid any non-ASCII characters in source code.
Third rule: Translate between UTF-8 and UTF-16 when calling Win32 functions.
Fourth rule: Use wide-character versions of standard C and C++ functions that take file paths.
Fifth rule: Be careful with third-party libraries.

Personally, I don't quite agree with it. For one thing, if we follow rule 1, rule 5 will easily be broken without notice. Because the standard library treats char as MBCS, and so do the libraries built on it, it's very dangerous to pass a UTF-8 string to any third-party library. The compiler doesn't even give a warning.
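The failure is easy to reproduce. Bytes produced as UTF-8 but read by code that assumes the ANSI code page come out garbled, and nothing fails loudly. This Python sketch again assumes code page 936 (GBK) as the MBCS encoding.

```python
text = "中文"
utf8_bytes = text.encode("utf-8")  # what "use UTF-8 internally" passes around

# a library that assumes the ANSI code page decodes the bytes wrongly...
misread = utf8_bytes.decode("gbk", errors="replace")
# ...while decoding with the encoding that was actually used round-trips fine
correct = utf8_bytes.decode("utf-8")

print(misread == text)  # False
print(correct == text)  # True
```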

So for Windows programs, I prefer to use wchar_t, even though those L prefixes make the code uglier.

Build files

When you have to split your project into modules with dependencies between them, build 32/64-bit and debug/release versions, or add third-party libraries easily and consistently, you can't rely on Visual Studio alone to maintain your project. It's simply too much work and too error-prone.

However, unlike the Java community, where there are many options for project management, there is little to choose from for C++ projects.

When the project began, I used VS directly to manage it. Gradually, that became more and more painful, so I turned to CMake.

CMake is a powerful but hard-to-learn build tool for C++. You will need to find examples to learn it and to write production-level scripts. The official documentation is abundant and detailed, but hard to learn from.

Naming Convention

There are many naming conventions in C++, which makes our code a mess when we introduce libraries with different conventions. This also happened in this project:

Poco in Pascal Casing
CEF in Pascal Casing, while the Chromium code it depends on uses underscore case
Duilib in Pascal Casing

The one I chose is Pascal Casing.

Conclusion

So, with all these problems, we can see that it's complicated to develop a C++ application that is compatible with various Windows platforms and languages. If I had a choice, I wouldn't choose C++ in the first place.

Reference Counting in C#

There are situations where we may need reference counting (RC) in a GC-managed environment like C#. Implementing a reference counter is easy, but as C# has no deterministic destructors, it’s hard to keep the counter correct.
To ward off mistakes, we can summarize some rules for using reference counters.
If a reference-counted (RCed) object is a field member of another object, or is contained in some collection field of another object, we call that other object the ‘owner’.
In the following code snippets, A is the owner of RcObj (if RcObj is reference counted).

class A {
    RcObj _b;
}

class A {
    List<RcObj> _list;
}

With the owner defined, we can define the following rules:

Rule 1: The initial reference counter is 1 for newly allocated objects.
Rule 2: When owning a reference counted object, we should increase its reference counter.
Rule 3: When losing the ownership, we should decrease the reference counter.

We further define that:

Rule 4: When an owner’s method (not a get property) returns RCed objects, it should increase their RCs before returning them. In other words, it shares ownership.
Rule 5: When an owner’s get property returns RCed objects, it should not increase RCs.
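To see how the five rules interact, here is a toy Python model. RcObj, Owner, own(), release(), get_obj() and drop() are hypothetical stand-ins for the C# types in this post, and "clean up" is just a flag. Owner holds one object through a field, either creating it (Rule 1) or taking ownership of an existing one (Rule 2).

```python
class RcObj:
    """Toy reference-counted object."""

    def __init__(self):
        self.ref_count = 1        # Rule 1: a new object starts at 1
        self.cleaned_up = False

    def own(self):
        self.ref_count += 1

    def release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.cleaned_up = True  # stand-in for freeing the real resource


class Owner:
    """Owns one RcObj through a field."""

    def __init__(self, obj=None):
        if obj is None:
            self._obj = RcObj()   # freshly created: the initial count is ours
        else:
            obj.own()             # Rule 2: taking ownership (+1)
            self._obj = obj

    @property
    def obj(self):
        return self._obj          # Rule 5: a get property leaves the RC alone

    def get_obj(self):
        self._obj.own()           # Rule 4: a method shares ownership (+1)
        return self._obj

    def drop(self):
        self._obj.release()       # Rule 3: losing ownership (-1)
        self._obj = None


owner_a = Owner()                 # count 1
got = owner_a.get_obj()           # count 2: owner_a and the caller
got.release()                     # count 1: the caller did not keep it
owner_b = Owner(owner_a.obj)      # property read keeps 1; owner_b owns it: 2
owner_a.drop()                    # count 1
owner_b.drop()                    # count 0: cleaned up
print(got.ref_count, got.cleaned_up)  # 0 True
```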

From the above 5 rules, we can infer that:

Inference 1: After getting a RCed object from a method, we should release it unless we return it to the caller or own it.

class A {
    C c;
    List<RcObj> list;

    void Foo1() {
        RcObj obj = c.GetRcObj();
        obj.Release(); // decrease reference counter
    }
    RcObj Foo2() {
        RcObj obj = c.GetRcObj();
        return obj;
        // no need to decrease reference counter
    }

    void Foo3() {
        RcObj obj = c.GetRcObj();
        list.Add(obj);
        // no need to decrease reference counter
    }

    RcObj Foo4() {
        RcObj obj = c.GetRcObj();
        list.Add(obj);
        obj.Own(); //increase reference counter
        return obj;
    }

}

class C {
    RcObj GetRcObj() {
        return new RcObj();
    }
    // or
    RcObj _obj;
    RcObj GetRcObj() {
        _obj.Own(); // increase RC before method return
        return _obj;
    }
}

Inference 2: After getting a RCed object from a get property, we should not change its RC, unless we return it to the caller or own it.

class A {
    C c;
    void Foo1() {
        RcObj obj = c.RcObj;
        Bar(obj);
        // don't change its RC in this method (Bar may change RC)
    }

    RcObj Foo2() {
        RcObj obj = c.RcObj;
        obj.Own(); // increase RC, since we return it to the caller
        return obj;
    }
}

class C {
    RcObj _obj;
    RcObj RcObj {
        get {
            return _obj;
        }
    }
}

Inference 3: We should not change the RC of a RCed object passed in as a method parameter, unless we want to own it.

class A {
    void Foo1(RcObj obj) {
        obj.xxx();
        obj.yyy();
        // don't change RC
    }

    RcObj _obj;
    void Foo2(RcObj obj) {
        obj.xxx();
        obj.yyy();

        this._obj = obj;
        obj.Own(); // increase RC according to Rule 2
    }

    RcObj Foo3(RcObj obj) { // bad practice
        obj.xxx();
        obj.yyy();
        obj.Own(); // increase RC according to Rule 4
        return obj; 
    }
}

With these rules and inferences, we can write a static code analysis tool to help us eliminate RC-related bugs.

Basic Reference Counter Implementation:

    public interface IRefCounter
    {
        void Own();
        void Release();
        int RefCount {get;}
    }

    public sealed class RefCounter : AbstractRefCounter
    {
        private readonly Action _handler;
        public RefCounter(Action handler)
        {
            _handler = handler;
        }

        protected override void OnCleanUp()
        {
            _handler?.Invoke();
        }


    }

    public abstract class AbstractRefCounter : IRefCounter
    {
        private int _refCount;

        protected AbstractRefCounter()
        {
            Own();
        }

        public void Own()
        {
            _refCount++;
        }

        public void Release()
        {
            _refCount--;
            if (_refCount == 0)
            {
                OnCleanUp();
            }
        }

        protected abstract void OnCleanUp();

        public int RefCount
        {
            get {
                return _refCount;
            }
        }

    }

Sample usage:


    public class RefCountedObj : AbstractRefCounter
    {
        protected override void OnCleanUp()
        {
        }
    }

    public class RefCountedObjWithDelegate : IRefCounter
    {
        private RefCounter _refCounter;

        public RefCountedObjWithDelegate()
        {
            _refCounter = new RefCounter(OnCleanUp);
        }

        public void OnCleanUp()
        {

        }
        public void Own()
        {
            _refCounter.Own();
        }

        public void Release()
        {
            _refCounter.Release();
        }

        public int RefCount
        {
            get { return _refCounter.RefCount; }
        }
    }

For more concrete rules, see StaticAnalysisRules.md

For source code, see ObjectPool

Code Skater


Thursday, October 11, 2018

Replacing bash with python3

Command Line

Environment variable

regular expression

Python tips

Friday, September 14, 2018

Who's making Windows FileSystem slow

The story

File System Filter Drivers

Wednesday, September 12, 2018

Geometric Sequence and Arithmetic Sequence

A GC friendly Java Object Pool

Wednesday, August 29, 2018

Tips of developing Windows C++ applications

Why C++ not C#

Targeting Windows XP

C++ runtime support

Third Party Libraries

Character set

Build files

Naming Convention

Conclusion

Wednesday, February 14, 2018

Reference Counting in C#
