Thursday, May 02, 2019

Windows Form and Data Binding

When building desktop applications for Windows, there are plenty of options, such as:
  • WPF
  • QT
  • Chromium + JS, such as AppJS, Electron
  • Windows Form
If I get to choose, Windows Form would be my last option. But often there is no such choice, for example when we are adding features to an existing Winform application, or when the whole team is familiar with Winform and not receptive to new technologies.
I have developed WPF applications before, and I know that WPF applications tend to have much cleaner code than Winform ones. The key feature of WPF is that we can build the UI with templates (XAML), which in turn bind cleanly to models (the MVVM pattern).
But even with Winform we can keep the code clean by using DataBinding. Yet I have seen many Winform projects that barely adopt DataBinding at all. If your project is in that situation, you should definitely try it.
I found two sources of information very useful for understanding Winform DataBinding.
Also, I have uploaded a demo application to GitHub: https://github.com/shawn11ZX/WinformDataBindingExample
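To give a feel for how clean it can be, here is a minimal sketch of two-way binding in Winform (the Person model and textBoxName control are hypothetical names for illustration, not taken from the demo project):

using System.ComponentModel;
using System.Windows.Forms;

// The model implements INotifyPropertyChanged so bound controls refresh
// automatically when a property changes.
public class Person : INotifyPropertyChanged
{
    private string _name;
    public string Name
    {
        get { return _name; }
        set
        {
            if (_name == value) return;
            _name = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Name)));
        }
    }
    public event PropertyChangedEventHandler PropertyChanged;
}

public class MainForm : Form
{
    public MainForm()
    {
        var person = new Person { Name = "Shawn" };
        var textBoxName = new TextBox();
        Controls.Add(textBoxName);

        // Two-way binding: typing in the TextBox updates person.Name, and
        // setting person.Name in code updates the TextBox.
        textBoxName.DataBindings.Add(
            "Text", person, nameof(Person.Name),
            false, DataSourceUpdateMode.OnPropertyChanged);
    }
}

With this in place, the UI code never reads or writes control values directly; it only touches the model, which is what keeps Winform code clean.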

Friday, February 01, 2019

Remembering Design Patterns

There are 23 design patterns described in the classic GoF book. I have found an easier way to remember them:

First, I divide them into three categories:

  • Creational Patterns
  • Structural Patterns
  • Behavioral Patterns

Creational Patterns

Their purpose is to parameterize a system by the classes of the objects it creates.

  • Create a group of objects: Abstract Factory
  • Create a single object. This can further be categorized by how clients use this pattern.
    • Use by composition:
      • Create by clone: Prototype
      • Create once: Singleton
      • Create complex object: Builder
    • Use by inheritance: Factory Method (see the sketch below)
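As a reminder of the "use by inheritance" case, here is a tiny Factory Method sketch (the class names are mine, purely for illustration):

// Factory Method: the base class defines the algorithm, subclasses decide
// which concrete product gets created.
public interface IEnemy { void Attack(); }

public abstract class Level
{
    // the factory method that subclasses override
    protected abstract IEnemy CreateEnemy();

    public void Spawn()
    {
        IEnemy enemy = CreateEnemy(); // base class stays product-agnostic
        enemy.Attack();
    }
}

public class DesertLevel : Level
{
    private class Scorpion : IEnemy
    {
        public void Attack() { /* ... */ }
    }

    protected override IEnemy CreateEnemy() { return new Scorpion(); }
}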

Structural Patterns

They are used to compose larger structures from smaller ones.

  • One interface (Don’t change or add other interfaces)
    • Group together calls: Composite
    • Intercept calls: Proxy
    • Add state or behavior to calls: Decorator
  • Two/Many interfaces
    • Adapt interface A to B: Adapter
    • Two interfaces at both sides of a bridge can evolve independently: Bridge
    • A single interface to hide the complexity of a number of interfaces: Facade
  • Structural & Creational pattern: Flyweight, to cache static states as singletons, and passing dynamic states when calling

Behavioral Patterns

They try to encapsulate algorithms or allocate responsibilities.

  • encapsulate general algorithm by composition: Strategy
  • encapsulate general algorithm by inheritance: Template Method
  • encapsulate iteration algorithm: Iterator
  • encapsulate state based algorithms: State
  • encapsulate snapshot algorithm: Memento
  • encapsulate event notification: Observer
  • encapsulate nodes (e.g. AST nodes) visit algorithms: Visitor
  • encapsulate the Abstract Syntax Tree parsing algorithm: Interpreter
  • decouple interaction between related peer classes: Mediator
  • encapsulate how I was called (see the sketch after this list)
    • Command, implement a Command interface and submit it to an Executor; the executor will call me later.
    • Chain of Responsibility, implement a Handler interface and chain into an existing list of handlers. The upstream handler will call me, and I have to call the downstream handler.
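For the "how I was called" pair, a minimal Command sketch might look like this (Executor is a stand-in name, not from the GoF book):

using System.Collections.Generic;

public interface ICommand { void Execute(); }

// Clients submit commands; the executor decides when to call them back.
public class Executor
{
    private readonly Queue<ICommand> _queue = new Queue<ICommand>();

    public void Submit(ICommand cmd) { _queue.Enqueue(cmd); }

    public void RunAll()
    {
        while (_queue.Count > 0)
            _queue.Dequeue().Execute();
    }
}

public class SaveCommand : ICommand
{
    public void Execute() { /* perform the save */ }
}

Chain of Responsibility differs only in who calls whom: each handler holds a reference to the next handler and forwards the call downstream when it can't fully handle the request itself.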

Thursday, October 11, 2018

Replacing bash with python3

  • iterating through a directory
from pathlib import Path
# recursively find and delete all *.pdb files under the current directory
for f in Path(".").glob("**/*.pdb"):
    f.unlink()
  • reading a file
from pathlib import Path
try:
    # open the file `version` and read all of its contents
    last_ver = Path("version").read_text()
except FileNotFoundError as e:
    print("error")
  • writing a file
# open prepare.bat and write the content hello
with open("prepare.bat", mode="w") as f:
    f.write("hello\n")
  • directory of current script
import os
DIR = os.path.dirname(os.path.abspath(__file__))
  • read all lines
with open("TestClient.log", encoding="utf-8") as f:
    c = 0
    for line in f:
        c = c + 1
        # each line keeps its trailing newline, so suppress print's own
        print(f"{c}:{line}", end="")
  • starting a process
import subprocess
# run git and raise an exception if the return code is non-zero
subprocess.run(["git", "fetch", "origin", "master"], check=True)
  • current python exe path
import sys
sys.executable

Command Line

  • command line arguments
import argparse

def boolean_string(s):
    # using type=bool would make the flag True for any non-empty string,
    # so parse the text explicitly instead
    if s.upper() not in {'FALSE', 'TRUE'}:
        raise ValueError('Not a valid boolean string')
    return s.upper() == 'TRUE'

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote_ip", required=True)
    parser.add_argument("--sign", default=False, required=False, type=boolean_string)
    parser.add_argument("--check_in", default=False, required=False, type=boolean_string)
    # argparse exits with a usage message on bad arguments
    return parser.parse_args()

args = parse_args()
remote_ip = args.remote_ip

Environment variable

  • list environment variables
import os
for k, v in os.environ.items():
    # on Windows, the keys are all upper-cased
    print(f"{k} = {v}")
  • read environment variables
import os
# print environment variable PATH
print(os.environ["PATH"])
  • write environment variables
import os
# set MY_ENV for this process and any child processes created afterwards
os.environ["MY_ENV"] = "hello"

regular expression

  • match and extract
import re
ip = "192.168.0.2"
# four capture groups, one per octet; the dots must be escaped
m = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", ip)
v = [0, 0, 0, 0]
if m:
    for i in range(0, 4):
        v[i] = m[i + 1]

Python tips

  • grammar & buildin library

# list attributes of an obj
dir(obj)

# get help document of something
help("os")

# chaining two generators
import itertools
from pathlib import Path
p = Path(".")  
for f in itertools.chain(p.rglob("*.exe"), p.rglob("*.dll")):  
    print(f)

# formatted printing
name = "shawn"
msg = f"hello, {name}"

  • debug
    – use the Python shell to try out single-line scripts
    – use Ctrl+N (Menu: File -> New File) in IDLE to try out multi-line scripts

Friday, September 14, 2018

Who's making Windows FileSystem slow

The story

One of our customers reported that it was extremely slow to update our product.
When updating, our product first downloads a patch file, then applies the patch to existing files or creates new files.
After investigation, I found that it was the patch-applying stage that was wasting the time. But as that logic is extremely simple, and no time-consuming operation such as sleeping or network access is performed, I didn't think our software was causing the problem.
So I guessed it must be a problem with his OS: either it was infected by a virus, or some malware had been installed.
Obviously, he was not satisfied with my hypothesis, and neither was I. I had to prove it.

File System Filter Drivers

I googled around with keywords like "windows filesystem hooks" and fortunately found some Windows technologies related to File System Filter Drivers.
There are two types of program that can affect the filesystem:
  • file system minifilter drivers
  • legacy file system filter drivers
Both kinds can be listed with the command fltmc.
For example, here is the output of running this command on my computer:
C:\Windows\system32>fltmc

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
luafv                                   1         135000        0
FileInfo                                5          45000        0
Some filters are provided by Microsoft and can thus be considered safe:
WdFilter.sys – Windows Defender  
storqosflt.sys - Storage QoS Filter Driver  
luafv.sys – UAC File Virtualization  
npsvctrig.sys – Named Pipe Service Trigger Provider  
FileCrypt.sys - Windows sandboxing and encryption  
FileInfo.sys – FileInfo Filter Driver (SuperFetch / ReadyBoost)  
wcifs.sys - File System Filter  
Wof.sys – Windows Image File Boot
On the customer's computer there were two suspicious filters named 'tenmon' and 'tqsomething' (sorry, I can't remember the exact name). I searched for their names in the Autoruns tool and found that they were provided by http://www.qq.com/.
After deleting them with Autoruns, the problem went away immediately, and the customer is now happy.

Wednesday, September 12, 2018

Geometric Sequence and Arithmetic Sequence

I learned geometric sequences and arithmetic sequences in senior high school. As a programmer, I find them very useful when calculating the time complexity of an algorithm. But the theorems related to them are easily forgotten, so I tried to prove them in order to memorize them better.
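For reference, here are the two standard derivations in brief. For an arithmetic sequence, write the sum forwards and backwards and add the two lines; every column sums to $a_1 + a_n$:

$$2S_n = (a_1 + a_n) + (a_2 + a_{n-1}) + \dots = n(a_1 + a_n) \;\Rightarrow\; S_n = \frac{n(a_1 + a_n)}{2}$$

For a geometric sequence with ratio $r \neq 1$, subtract $rS_n$ from $S_n$; all the middle terms cancel:

$$S_n - rS_n = a_1 - a_1 r^n \;\Rightarrow\; S_n = a_1 \, \frac{1 - r^n}{1 - r}$$

These are exactly the sums that appear in complexity analysis: a doubly nested loop does $1 + 2 + \dots + n = n(n+1)/2 = O(n^2)$ work, and repeated doubling gives $1 + 2 + 4 + \dots + 2^k = 2^{k+1} - 1$.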

A GC friendly Java Object Pool

To maintain pooled objects, most object pool libraries need to allocate small bookkeeping objects when allocating/freeing pooled objects. But if the objects we are going to pool are themselves very small, this defeats the purpose and makes those libraries useless.
In a latency-critical application (an FPS game), we used Java to implement the server. To avoid GC as much as possible, we pooled everything we could, even objects as small as float[3].
The source code is below in case it's helpful to others:
https://github.com/shawn11ZX/zerogc-pool
Internally this lib uses a singly linked list to hold the pooled objects. When an object is allocated from the pool, its list node is cached for later reuse, so neither allocation nor release allocates anything.
To make it thread safe, it uses AtomicStampedReference<T> to update the links between nodes.
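The library itself is Java, but to illustrate the node-caching idea, here is a C# sketch (simplified to use a lock; the real lib is lock-free via AtomicStampedReference):

// Sketch of a pool that caches its own list nodes, so that neither
// Allocate nor Free allocates anything once the pool is warm.
public sealed class NodeCachingPool<T> where T : class, new()
{
    private sealed class Node { public T Item; public Node Next; }

    private readonly object _gate = new object();
    private Node _items;     // stack of nodes holding pooled objects
    private Node _freeNodes; // stack of empty nodes kept for reuse

    public T Allocate()
    {
        lock (_gate)
        {
            Node n = _items;
            if (n == null) return new T(); // pool empty: fall back to new
            _items = n.Next;
            T item = n.Item;
            n.Item = null;
            n.Next = _freeNodes; // cache the node instead of discarding it
            _freeNodes = n;
            return item;
        }
    }

    public void Free(T item)
    {
        lock (_gate)
        {
            Node n = _freeNodes;
            if (n != null) _freeNodes = n.Next;
            else n = new Node(); // only allocates until the pool is warm
            n.Item = item;
            n.Next = _items;
            _items = n;
        }
    }
}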

Wednesday, August 29, 2018

Tips of developing Windows C++ applications

Recently, I developed a Windows C++ application, a mini game-client framework with the following features:

- Contains a launcher program that is very small (~2 MB), with all other modules downloaded on demand.
- Supports dynamic content on web sites, especially Flash content.
- Supports Windows XP SP3 and above, as there are many XP users in China.
- Supports downloading & launching applications as required, such as Unity standalone games.

Although it's already online and has been downloaded by millions of users, there are some problems I encountered during development that I would like to share:

Why C++ not C#

We use C++ as our development language, which is far less efficient in productivity and maintainability than .NET languages such as C#.

Why?

We can't choose .NET. The main reason is that we want the application to be as small and as portable as possible. Using C# means users would first have to install .NET (~200 MB) to run our app on Windows XP, and that would surely lose us a lot of XP users. According to a survey from Jan 2017, the installed share of Windows XP in China was still as high as 17.79%.

Targeting Windows XP

For our application to be able to run on Windows XP, we don't need to install an older version of Visual Studio such as VS 2008. We can instead use the latest Visual Studio and select an XP-targeting platform toolset.



Before that, however, we have to install the "Windows XP support for C++" component first.

Provided all our dependent libraries have been built with similar options, our resulting executable can now run on Windows XP SP3. It can't run on XP SP2 or earlier, according to Configuring Programs for Windows XP.

C++ runtime support

Along with the Windows XP platform toolset, the C Runtime Library (CRT), C++ Standard Library, Active Template Library (ATL), Concurrency Runtime Library (ConCRT), Parallel Patterns Library (PPL), Microsoft Foundation Class Library (MFC), and C++ AMP (C++ Accelerated Massive Programming) library include runtime support for Windows XP and Windows Server 2003. For these operating systems, the minimum supported versions are Windows XP Service Pack 3 (SP3) for x86, Windows XP Service Pack 2 (SP2) for x64, and Windows Server 2003 Service Pack 2 (SP2) for both x86 and x64.


Third Party Libraries

Life is painful if we can't leverage third-party libraries to help us when writing code.

For Java developers, there are tons of publicly available libraries. And if we are using a project-management tool such as Maven or Gradle, we can include these libraries with a few lines of configuration, and they will be downloaded automatically.


For C++ programmers, life is not so easy. There are multiple dimensions that affect the compatibility of a binary library:
  • platform toolset
  • runtime library: Multi-threaded / Multi-threaded DLL / Multi-threaded Debug / Multi-threaded Debug DLL
  • release or debug
  • 32-bit or 64-bit
  • static or dynamic linking
  • Linux or Windows...
And most libraries don't ship binary distributions; instead they ask us to build from source code.

So, including one extra library requires a lot of work, as we have to:
  1. set up the build environment, which may require installing extra tools or libs recursively.
  2. tune the build variables to suit our requirements.
  3. build it.
  4. copy the include headers and binaries into our project.
  5. modify our project build files to use these header files and binaries.


Sometimes we may get stuck at the very first step. One of my colleagues once spent a whole week trying to build an old version of CEF, and failed. Finally, after a lot of googling, I found it is contained in one version of Unreal Engine (uploaded here). That totally saved his life.

So I include as few libraries as possible. Here are the libraries I used in this project:
  • Poco, which I depend on a lot and which has relatively good documentation.
  • Duilib, for building the UI. It's buggy and poorly documented, but easy to set up and very lightweight.
  • CEF; I use the very old version 3.2357.1291, which still supports NPAPI.
I didn't use Boost, as I was intimidated by both its library size and the resulting exe size.

Character set

One decision we have to make before writing any code is choosing the character set of our project.

There are two options in Visual studio:
  • Use Multi-Byte Character Set (or MBCS)
  • Use Unicode Character Set


To make that decision, the following issues have to be considered:

Effects of the character set option
  • There are two versions of each Windows API: xxxxA and xxxxW, e.g. MoveFileExA and MoveFileExW. The former takes a char *, while the latter takes a wchar_t *.
  • The character set option in Visual Studio only affects which variant the un-suffixed API names resolve to. For example, if MBCS is chosen, MoveFileEx is defined as MoveFileExA, so calling MoveFileEx ends up calling MoveFileExA. We can, however, call MoveFileExW directly even if we choose MBCS.
Meaning of char (char * and std::string)
  • On Windows, char in system APIs, including the standard library, means MBCS
  • On Linux, char in system APIs, including the standard library, means UTF-8
  • Different libraries assign different meanings to char, e.g.:
    • Poco interprets it as UTF-8 by default.
    • CEF uses UTF-8 too
Meaning of wchar_t (wchar_t * and std::wstring)
  • By contrast, Windows and most libraries interpret wchar_t as UTF-16.
There is an article discussing this, which suggests the following five rules:
  1. First rule: Use UTF-8 as the internal representation for text.
  2. Second rule: In Visual Studio, avoid any non-ASCII characters in source code.
  3. Third rule: Translate between UTF-8 and UTF-16 when calling Win32 functions.
  4. Fourth rule: Use wide-character versions of standard C and C++ functions that take file paths.
  5. Fifth rule: Be careful with third-party libraries.
Personally, I don't quite agree with it. For one thing, if we follow rule 1, rule 5 is easily broken without notice: the stdlib treats char as MBCS, and so do all libs that depend on it, so it's very dangerous to pass a UTF-8 string to a third-party library, and not even a compiler warning is given.

So for Windows programs, I prefer wchar_t, even though it makes the code uglier with all those L prefixes.

Build files

When you have to split your project into modules with dependencies between them, or when you need to build 32/64-bit and debug/release versions, or when you need to add third-party libraries easily and consistently, you can't rely on Visual Studio to maintain your project. It's simply too much work and too error prone.

However, unlike the Java community, where there are many options for project management, there is little to choose from for C++ projects.

When the project began, I used VS directly to manage my projects. Gradually, it became more and more painful, so I turned to CMake.

CMake is a powerful but hard-to-learn build tool for C++. You will need to find examples to learn it and to write production-level scripts. The official documentation is abundant and detailed, but hard to learn from.

Naming Convention

There are many naming conventions in C++, which makes our code a mess if we introduce libraries with different conventions. This also happened in this project:
  • Poco in Pascal casing
  • CEF in Pascal casing, while the Chromium it depends on uses underscore case
  • Duilib in Pascal casing

The one I chose is Pascal casing.

Conclusion

With all these problems, we can see that it's complicated to develop a C++ application compatible across the various Windows platforms and languages. Given the choice, I wouldn't pick C++ in the first place.

Wednesday, February 14, 2018

Reference Counting in C#

There are situations where we may need reference counting (RC) in a GC-managed environment like C#. Implementing a reference counter is easy, but as there is no deterministic destructor in C#, it's hard to maintain a correct count.
To ward off mistakes, we can summarize some rules for using reference counters.
If a reference counted (RCed) object is a field member of another object, or is contained in some collection field of another object, we call that other object the 'owner'.
In the following code snippets, A is the owner of RcObj (if RcObj is reference counted):
class A {
    RcObj _b;
}

class A {
    List<RcObj> _list;
}
With the owner defined, we can define the following rules:
  • Rule 1: The initial reference counter is 1 for newly allocated objects.
  • Rule 2: When owning a reference counted object, we should increase its reference counter.
  • Rule 3: When losing the ownership, we should decrease the reference counter.
We further define that:
  • Rule 4: When an owner's method (not a get property) returns RCed objects, it should increase their RCs before returning them. In other words, it shares the ownership.
  • Rule 5: When an owner's get property returns RCed objects, it should not increase their RCs.
From the above 5 rules, we can infer that:
  • Inference 1: After getting an RCed object from a method, we should release it unless we return it to the caller or own it.
class A {
    C c;
    List<RcObj> list;

    void Foo1() {
        RcObj obj = c.GetRcObj();
        obj.Release(); // decrease reference counter
    }
    RcObj Foo2() {
        RcObj obj = c.GetRcObj();
        return obj;
        // no need to decrease reference counter
    }

    void Foo3() {
        RcObj obj = c.GetRcObj();
        list.Add(obj);
        // no need to decrease reference counter
    }

    RcObj Foo4() {
        RcObj obj = c.GetRcObj();
        list.Add(obj);
        obj.Own(); //increase reference counter
        return obj;
    }

}

class C {
    RcObj GetRcObj() {
        return new RcObj();
    }
    // or
    RcObj _obj;
    RcObj GetRcObj() {
        _obj.Own(); // increase RC before the method returns
        return _obj;
    }
}
  • Inference 2: After getting an RCed object from a get property, we should not change its RC, unless we return it to the caller or own it.
class A {
    C c;
    void Foo1() {
        RcObj obj = c.RcObj;
        Bar(obj);
        // don't change its RC in this method (Bar may change it)
    }

    RcObj Foo2() {
        RcObj obj = c.RcObj;
        obj.Own(); // increase RC
        return obj;
    }
}

class C {
    RcObj _obj;
    RcObj RcObj {
        get {
            return _obj;
        }
    }
}
  • Inference 3: We should not change the RC of an RCed object passed in as a method parameter, unless we want to own it.
class A {
    void Foo1(RcObj obj) {
        obj.xxx();
        obj.yyy();
        // don't change RC
    }

    RcObj _obj;
    void Foo2(RcObj obj) {
        obj.xxx();
        obj.yyy();

        this._obj = obj;
        obj.Own(); // increase RC according to Rule 2
    }

    RcObj Foo3(RcObj obj) { // bad practice
        obj.xxx();
        obj.yyy();
        obj.Own(); // increase RC according to Rule 4
        return obj; 
    }
}
With these rules and inferences, we can write a static code analysis tool to help us eliminate RC-related bugs.

Basic Reference Counter Implementation:
    public interface IRefCounter
    {
        void Own();
        void Release();
        int RefCount {get;}
    }

    public sealed class RefCounter : AbstractRefCounter
    {
        private readonly Action _handler;
        public RefCounter(Action handler)
        {
            _handler = handler;
        }

        protected override void OnCleanUp()
        {
            _handler?.Invoke();
        }


    }

    public abstract class AbstractRefCounter : IRefCounter
    {
        private int _refCount;

        protected AbstractRefCounter()
        {
            Own();
        }

        public void Own()
        {
            // note: ++ is not atomic; use Interlocked.Increment if shared across threads
            _refCount++;
        }

        public void Release()
        {
            _refCount--;
            if (_refCount == 0)
            {
                OnCleanUp();
            }
        }

        protected abstract void OnCleanUp();

        public int RefCount
        {
            get {
                return _refCount;
            }
        }

    }
Sample usage:

    public class RefCountedObj : AbstractRefCounter
    {
        protected override void OnCleanUp()
        {
        }
    }

    public class RefCountedObjWithDelegate : IRefCounter
    {
        private RefCounter _refCounter;

        public RefCountedObjWithDelegate()
        {
            _refCounter = new RefCounter(OnCleanUp);
        }

        public void OnCleanUp()
        {

        }
        public void Own()
        {
            _refCounter.Own();
        }

        public void Release()
        {
            _refCounter.Release();
        }

        public int RefCount
        {
            get { return _refCounter.RefCount; }
        }
    }

For more concrete rules, see StaticAnalysisRules.md
For source code, see ObjectPool

Saturday, November 18, 2017

Coordinate System Conversion Between Unity And A Java Game Server

Requirement from game

For the purpose of FPS attack prediction, I had to synchronize hitboxes/BoxColliders between the Unity client and a Java game server.
To be more specific:
  • Each game character is exported to a DAE file for the server and to an FBX file for the client. These files contain geometry, joint, and animation information.
  • Hitbox information is manually defined, stored as an XML file, and shared between client & server. Each hitbox is attached to a joint/bone of the corresponding character.
  • When a player shoots a gun, both client and server need to calculate the hit information. The server does so because it's the arbitrator. The client does so to predict the result and respond immediately to the player.
The problem becomes complicated because:
  • For some reason we can't ask Unity to do the ray tracing. However, there is a library shared between client and server that can calculate whether a bullet (a ray) has hit a character, and if so, which bone was hit.
  • Client and server have different coordinate systems.
  • When a character is imported into Unity, its coordinate system is also changed.

Defining Problem

However, if we think about it, the problem is actually as simple as the following:
For each hitbox, given the transform (denoted as $M_u$) of the joint it is attached to in Unity's coordinate system, we need to find a matrix (denoted as $M_s$) that puts it in the same state in the server's coordinate system.
In the game, a character is rotated and translated after being animated. Suppose a point $P$ has the following hierarchy:
- Translate Node
    - Rotate Node 
        - Character Pivot Node
            - Joint1 Node
                - Joint2 Node
                    - Joint3 Node
                        - P
Then in Unity, $M_u$ can be defined as:

$$M_u = T \cdot R \cdot U \cdot J_1 J_2 J_3$$

Where:
  • $T$ corresponds to the translation
  • $R$ corresponds to the rotation
  • $U$ corresponds to the extra rotation introduced when imported by Unity
  • $J_1 J_2 J_3$ corresponds to the joint transforms
Given $M_u$, we can transform the point from its local space coordinate $P_{local}$ to its world space coordinate $P_{world}$:

$$P_{world} = M_u \cdot P_{local}$$

Equally, on the server we have $M_s$:

$$M_s = T' \cdot R' \cdot J'_1 J'_2 J'_3$$

And we can use it to transform $P'_{local}$:

$$P'_{world} = M_s \cdot P'_{local}$$

Note that there is a $U$ on the Unity side, but none on the server side. This will be explained later.
We can easily convert between $T$ and $T'$, and between $R$ and $R'$, because we know how the two coordinate systems are defined. The unknown part is the relation between the $J_i$ and the $J'_i$, and that is determined by what Unity does when importing FBX files.

Coordinate Systems

The following image shows two coordinate systems:
Server’s on the left, Unity’s on the right. We can see that Unity’s left handed while server’s right handed. A point in server will be in Unity. The transform matrix would be:


What unity do when importing

After importing an FBX file, we can observe the following changes to a character:
  • If it was originally facing the +X axis, it now faces the -X axis. This effectively changes its handedness.
  • Its pivot is rotated around the +X axis by +90 degrees. This step makes its head point up again. This is where the above-mentioned $U$ comes from.
Given a point $P$, if we change the character from the binding pose to another pose, say run, we first transform $P$ from its model space to the joint's local space:

$$P_{local} = J_{bind}^{-1} \cdot P_{bind}$$

Where:
- $P_{bind}$ is $P$'s coordinate in the character's binding pose.
- $J_{bind}$ transforms a point in joint1's local space to joint1's parent's local space, so $J_{bind}^{-1}$ has the reverse effect.
Then we transform $P_{local}$ to a new model space coordinate:

$$P_{run} = J_{run} \cdot P_{local} = J_{run} \cdot J_{bind}^{-1} \cdot P_{bind}$$

But Unity negates every model's x axis, so $P$ becomes $P'$. Suppose there is a matrix $N$ that does this transform, that is: $P' = N \cdot P$.
So we have:

$$P'_{run} = J'_{run} \cdot J'^{-1}_{bind} \cdot P'_{bind}$$

Replacing $P'_{run}$ with $N \cdot P_{run}$ and $P'_{bind}$ with $N \cdot P_{bind}$, we have:

$$N \cdot J_{run} \cdot J_{bind}^{-1} \cdot P_{bind} = J'_{run} \cdot J'^{-1}_{bind} \cdot N \cdot P_{bind}$$

For both of the above equations to be true, we can let:

$$J' = N \cdot J \cdot N^{-1}$$

That way, the following equation also holds:

$$J'_{posX} = N \cdot J_{posX} \cdot N^{-1}$$

Here, $J_{posX}$ stands for a matrix that transforms a point from jointX's local coordinates to its parent's local coordinates at posture posX.
Having the above information, we can now finish our task. Below is a code snippet of what I have written in our game:
public static Matrix3D Unity2SSJJ(Matrix4x4 result)
{
    // copy Unity's Matrix4x4 (result.mRC = row R, column C) into a
    // row-major double array for Matrix3D
    double[] data = new double[16];
    data[4 * 0 + 0] = result.m00;
    data[4 * 1 + 0] = result.m10;
    data[4 * 2 + 0] = result.m20;
    data[4 * 3 + 0] = result.m30;

    data[4 * 0 + 1] = result.m01;
    data[4 * 1 + 1] = result.m11;
    data[4 * 2 + 1] = result.m21;
    data[4 * 3 + 1] = result.m31;

    data[4 * 0 + 2] = result.m02;
    data[4 * 1 + 2] = result.m12;
    data[4 * 2 + 2] = result.m22;
    data[4 * 3 + 2] = result.m32;

    data[4 * 0 + 3] = result.m03;
    data[4 * 1 + 3] = result.m13;
    data[4 * 2 + 3] = result.m23;
    data[4 * 3 + 3] = result.m33;
    return new Matrix3D(data);
}

public static Matrix3D TransformToMatrix3D(Transform boneTransform, Matrix4x4 pivotWorldToLocal, PlayerEntity player)
{
    // N = diag(-1, 1, 1), the x-axis negation Unity applies on import;
    // note N == N^-1, which is why it appears on both sides (J' = N * J * N^-1)
    Matrix4x4 mImportNegX = Matrix4x4.TRS(Vector3.zero, Quaternion.identity, new Vector3(-1, 1, 1));
    Matrix3D m = Unity2SSJJ(mImportNegX * pivotWorldToLocal * boneTransform.localToWorldMatrix * mImportNegX);
    // re-apply the character's rotation and translation in server space
    m.AppendRotation(player.orientation.ViewYaw, Vector3D.Z_AXIS);
    m.AppendTranslation(player.position.Origin.x, player.position.Origin.y, player.position.Origin.z);
    return m;
}

Saturday, November 04, 2017

Sharing code between CSharp and Java

There are many ways we can share code between C# and Java, for example:

  • Write the shared code in a lightweight scripting language and run its VM in both C# and Java. Lua is one example.
  • Write the shared code in a language that can be translated to both C# and Java, such as Haxe.
  • Write the shared code in Java and use tools to translate it to C#.
Sometimes we already have existing code in Java, so we are left with only the third option. Besides, even when there is no existing code, Java-to-C# is an attractive option thanks to Java's rich & friendly development environment.

Recently, I had to share some common logic between Java and C#. To be more specific, there is a Java-based game server connected to by an existing Flash client; what we want is to add another Unity client while preserving both the Java and AS3 logic. After a lot of discussion, we decided to adopt the following strategy:

  • First, refactor and extract the shared logic in the Java server into a standalone module, which shouldn't depend on any Java library except Java's built-in library.
  • Second, translate the extracted module from Java to C# using a tool such as sharpen.
We chose sharpen as our translation tool because:
  • first of all, it's open source and can be modified if necessary
  • then, there are already some successful applications of it, such as ngit.
There are lots of sharpen versions on the internet. I used the mono version, with some multi-dimensional array fixes from anders9ustafsson's version, and pushed the final result to my fork.

The documentation is very hard to find, so I made a backup in sharpen-doc.

Here are some other resources I found useful:
  • Build help: https://github.com/mono/sharpen
  • Example configuration: https://github.com/ydanila/sharpen_imazen_config
  • Sharpen C# utility classes: https://github.com/mono/ngit/tree/master/Sharpen/Sharpen
  • Another sharpen blog: http://pauldb-blog.tumblr.com/post/14916717048/a-guide-to-sharpen-a-great-tool-for-converting
When running sharpen, do set the -Dfile.encoding=UTF-8 option, or some files won't compile successfully.

Command line Arguments:

  • -pascalCase: Convert Java identifiers to Pascal case
  • -pascalCase+: Convert Java identifiers and package names (namespaces) to Pascal case
  • -cp: Adds a new entry to the classpath
  • -srcFolder: Adds a new source folder for sharpening
  • -nativeTypeSystem: Maps java classes to .NET classes with similar functionality. For example: java.lang.Class - System.Type
  • -nativeInterfaces: Adds an "I" in front of the interface name
  • -organizeUsings: Adds "using" for the types used
  • -fullyQualify: Converts to a fully-qualified name. For example: -fullyQualify File
  • -namespaceMapping: Maps a java package name to a .NET namespace. For example: -namespaceMapping com.db4o Db4objects.Db4o
  • -methodMapping: Maps a java method name to a .NET method (can be a method in another class). For example: -methodMapping java.util.Date.getTime Sharpen.Runtime.ToJavaMilliseconds
  • -typeMapping: Maps a java class to a .NET type. For example: -typeMapping com.db4o.Db4o Db4objects.Db4o.Db4oFactory
  • -propertyMapping: Maps a java method to a .NET property. For example: -propertyMapping com.db4odoc.structured.Car.getPilot Pilot
  • -runtimeTypeName: Name of the runtime class. The runtime class provides implementations for methods that don't have a direct mapping or that are simpler to map at the language level than at the sharpen level, for instance String.substring, String.valueOf, Exception.printStackTrace, etc. For a complete list of all the methods that can be mapped to the runtime class, see the Configuration#runtimeMethod call hierarchy.
  • -header: Header comment to be added to all converted files. For example: -header config/copyright_comment.txt
  • -xmldoc: Specifies an xml-overlay file, which overrides javadoc documentation for specific classes. For example: -xmldoc config/sharpen/ApiOverlay.xml
  • -eventMapping: Converts the methods to an event.
  • -eventAddMapping: Marks the method as an event subscription method. Invocations to the method in the form .method() will be replaced by the C# event subscription idiom: +=
  • -conditionalCompilation: Adds a condition on when to translate the Java code. For example: -conditionalCompilation com.db4o.db4ounit.common.cs !SILVERLIGHT
  • -configurationClass: Changes the configuration class. The default is 'sharpen.core.DefaultConfiguration'


Annotation Reference:

  • @sharpen.enum: Marks a java class to be processed as a .NET enum
  • @sharpen.rename: Specifies a different name for the converted type; takes a single name argument. For example: @sharpen.rename Db4oFactory
  • @sharpen.private: Specifies that the element must be declared private in the converted file, though it may be non-private in the java source:
/*
* @sharpen.private
*/
public List4 _first;
  • @sharpen.internal: Specifies that the element must be declared internal in the converted file:
/**
 * @sharpen.internal
*/
public abstract int size();
  • @sharpen.protected: Specifies that the element must be declared protected in the converted file:
/**
 * @sharpen.protected
*/
public abstract int size();
  • @sharpen.new: Adds the C# 'new' modifier to the translated code.
  • @sharpen.event: Links an event to its arguments. For example, the Java:
/**
* @sharpen.event com.db4o.events.QueryEventArgs
*/
public Event4 queryStarted();
is converted to:
public delegate void QueryEventHandler(
  object sender,
  Db4objects.Db4o.Events.QueryEventArgs args);
.......
event Db4objects.Db4o.Events.QueryEventHandler QueryStarted;
  • @sharpen.event.add: Marks the method as an event subscription method. Invocations to the method in the form .method() will be replaced by the C# event subscription idiom: +=
  • @sharpen.event.onAdd: Valid for event declarations only (SHARPEN_EVENT). Configures the method to be invoked whenever a new event handler is subscribed to the event.
  • @sharpen.if: Wraps the element in an #if/#endif declaration.
  • @sharpen.property: Converts a java method to a property:
/**
 * @sharpen.property
*/
public abstract int size();
  • @sharpen.indexer: Marks an element as an indexer property
  • @sharpen.ignore: Skips the element while converting
  • @sharpen.ignore.extends: Ignores the extends clause in the Java class definition
  • @sharpen.ignore.implements: Ignores the implements clause in the Java class definition
  • @sharpen.extends: Adds an extends clause to the converted class definition. For example, the Java:
/**
* @sharpen.extends System.Collections.IList
*/
public interface ObjectSet {...
converts to:
public interface IObjectSet : System.Collections.IList
  • @sharpen.partial: Marks the converted class as partial
  • @sharpen.remove: Marks a method invocation that should be removed
  • @sharpen.remove.first: Removes the first line of the method/constructor when converting to C#:
/**
* @sharpen.remove.first
*/
public void doSomething(){
    System.out.println("Java");
    NextMethod();
}
converts to:
public void DoSomething(){
    NextMethod();
}
  • @sharpen.struct: Marks a class to be converted as a C# struct
  • @sharpen.unwrap: When a method is marked with this annotation, all calls to it are removed. This is useful for removing conversion methods when they aren't required in C#:
/*
* @sharpen.unwrap
*/
public Iterable toIterable(Object[] array){
   return Arrays.asList(array);
}
public void doSomething(Object[] objs){
  Iterable iterable = toIterable(objs);
  // do something with the iterable
}
is converted to:
public IEnumerable ToIterable(object[] array){
    return Arrays.AsList(array);
}
public void doSomething(object[] objs){
   Iterable iterable = objs;
   // do something with the iterable
}
  • @sharpen.attribute: Adds an attribute to the converted code:
/*
* @sharpen.attribute TheAttribute
*/
public void doSomething(){}
will be converted to:
[TheAttribute]
public void DoSomething(){}
  • @sharpen.macro: Adds a replace-pattern macro to your code.



Refactoring Legacy Systems

In his book Working Effectively with Legacy Code (part of the Robert C. Martin Series), Michael Feathers defines legacy systems as systems with no unit tests.

I have a lot of experience working with and refactoring legacy systems. Usually, the moment the word "refactoring" pops into our heads is when the system is too messy, not merely lacking unit tests:

  • The whole system is monolithic. It can only be divided into modules conceptually, not physically; that is, you can never successfully compile part of it separately.
  • Module boundaries are blurred. No clear responsibility is defined and followed.
  • Dependencies are complex. Singletons are defined and referenced everywhere. Not only is everything depending on them, they themselves depend on other singletons.
  • ...
In my experience, to successfully refactor a legacy system, I usually go through the following steps:

Refactor the build scripts to allow building modules separately. I usually refactor module by module: that is, by extracting a cohesive collection of responsibilities into a sub-module one at a time. The sub-modules may depend on other refactored modules, but they should never depend on the monolithic chunk they were originally extracted from. Building modules separately is a good sign of dependency decoupling.

Set up a testing environment. The most significant obstacle in refactoring is not that we don't know the right thing to do, but that we are afraid of introducing bugs. Feathers suggests writing characterization tests that pin down existing behavior first. After that, we can be more confident when modifying code.

Define and refactor out modules one at a time. I define modules by their responsibilities; cohesive responsibilities can be put into one module. Then the real job of code modification begins. Both Feathers' book and Martin Fowler and Kent Beck's Refactoring give lots of techniques showing how to do this.

Write demo programs if unit testing is hard. To be honest, some modules are very hard to unit test, such as 3D-rendering-related modules. But we can always write demo programs that use the module we have just refactored to show that its functionality is complete and correct.

Some more advice:
  • Choosing a good IDE makes a big difference. To be more specific, I strongly recommend IntelliJ IDEA, which I consider a must-have tool when refactoring.
  • Decoupling dependencies is the most important job. Even if it introduces some ugly code during the process, you should finish decoupling first. After that, further refactoring is very easy, and that's when you can clean up the ugly code introduced earlier.
  • Read Feathers' book before you do any refactoring job. It's very practical.

Finding memory leaks in Actionscript programs

It is true that Adobe Flash is dying. In its official blog post FLASH & THE FUTURE OF INTERACTIVE CONTENT, Adobe said it will "stop updating and distributing the Flash Player at the end of 2020". But until that happens, we still need to find a way to support our existing AS3 game, because millions of players are playing it.

In my experience, AS3 is not a nice programming language. More sadly, it doesn't have very friendly development tools. Flash Builder gets buggy when the project structure becomes complicated.

And there is no good tool for finding memory leaks in AS3 programs, neither Scout nor Flash Builder. Yet AS3 is a language that can easily end up with memory leak bugs, due to its event registration mechanism.

In the end, we had to spend a few weeks building our own AS3 memory profiling tool.

It has three parts:

  • An apparat-based code injection tool named "apparat profiler", which captures all object instantiations.
  • An AS3 library named "profiler.swc" that records each object when its instantiation is captured by the "apparat profiler". It is also responsible for sending out the recorded objects when requested.
  • An MAT-based tool with which we can receive and analyze the recorded objects.



There are conditions we need to be careful about:
  • Not all objects are created via AS3 code, so some can't be captured. The workaround is to run through all recorded objects and see whether they hold references to any objects that haven't been captured yet.
  • The whole process is slow. But that's acceptable, since finding a memory leak without it is far harder and more time consuming.

Drawing Git Branch Graphs with gitgraphjs

Recently I had to write a PPT to share with colleagues about using Beyond Compare to assist merging. To explain things clearly, I wanted to draw lots of git branch graphs. Using a tool such as Microsoft Visio is one option, but I wanted an easier way.

After some searching, I found this amazing tool: http://gitgraphjs.com/. With a few lines of JavaScript code, it can easily draw the following graph:

It satisfies almost all my needs, except that I wanted to add identifying notes to the nodes, so that I could refer to them clearly when presenting.

After some code modification, I successfully added a number to each node, as in the following picture.

The final code is checked in to https://github.com/shawn11ZX/gitgraph.js and the PPT is shared on SlideShare.


Sunday, February 12, 2017

Reading Notes: How to Read a Book

How to Read a Book was written by the American philosopher Mortimer J. Adler. It can be classified into the category of practical books.
In a nutshell, this book tries to help improve people's ability to read books. It states that before reading any book, there are four questions each reader must ask:
  • What is the book about as a whole?
  • What is being said in detail, and how?
  • Is the book true, in whole or in part?
  • How is the book related to you?
By dividing people's reading ability into the following four distinct but highly related levels, the author helps us find ways to answer the above four questions.
  • elementary reading
  • inspectional reading
  • analytical reading
  • syntopical reading
The book consists of four parts, with each part having several chapters.

1. PART ONE: THE DIMENSIONS OF READING.

Given the same book, different readers can have very different gains. The difference is mainly caused by the reader's active attitude and reading skills. The more effort we put into reading, the more understanding we can obtain from it.
There are four levels of reading, and each lower level is included in the higher ones. That is, the higher the level one chooses, the more effort one has to pay, and the more comprehension one can finally get from the reading activity.

1.1. elementary reading

The first and lowest level of reading is elementary reading. The purpose of this level is to answer the question: what do the sentences mean? This level can be further divided into four stages:
  1. reading readiness, corresponding to pre-school experiences.
  2. word mastery
  3. vocabulary growth and utilization of context
  4. functional literacy.
At the final stage, children can read traffic signs or picture captions fairly easily. They are considered mature readers and can read almost any kind of book.

1.2. inspectional reading

The second level of reading is inspectional reading, which answers the question: what is the book about as a whole? It can be divided into two steps:
Step one is systematic skimming, or pre-reading. By going through the title, preface, table of contents, index, publisher's blurb, and some intentionally chosen chapters or paragraphs, one can get a brief impression of the book and be able to classify it. We should consider ourselves detectives looking for clues to the book's general theme or idea.
Step two is superficial reading. The trick is to read from front to back with as few pauses as possible, without looking things up in dictionaries, encyclopedias, or the like. As the author says, trying to understand every detail at this stage can easily destroy your interest and curiosity in the book. Even if you are not going to read it again, reading a whole book without total understanding is better than never reading it through.

1.3. Demanding Readers

To be good at reading, one needs to be active. To be active, the trick is to first ask the four questions mentioned above and then find the answers as you read.
During reading, we can also make marks on the book to:
  • keep us awake
  • help us clearly express our understanding and ideas
  • help us remember author’s ideas.
The person who says he knows what he thinks but cannot express it usually does not know what he thinks.

2. PART TWO: Analytical reading

Analytical reading contains three phases, each of which emphasizes several rules.
  • Phase 1 answers the question "What is the book about as a whole?"
    • Rule #1: Classify the book
    • Rule #2: Summarize the content of the book in one sentence
    • Rule #3: List the outline of the book
    • Rule #4: Find out the questions the author tries to answer in the book
  • Phase 2 answers the question "What is being said in detail, and how?"
    • Rule #5: For keywords, find out the exact definitions
    • Rule #6: Find the key propositions that lead to the main idea of the book
    • Rule #7: Figure out how these propositions are argued in the book
    • Rule #8: Of the questions in Rule #4, make clear which are answered and which are not. Of the unanswered ones, which does the author recognize as unanswered?
  • Phase 3 answers the third question, by making remarks about the book
    • Rule #9: Fully understand the book before making any judgement
    • Rule #10: Don't be contentious
    • Rule #11: Differentiate between knowledge and opinion before remarking
If you want to make criticisms:
  • Rule #12: show that the author's knowledge is inadequate
  • Rule #13: show that the author's knowledge is wrong
  • Rule #14: show that the author is illogical
  • Rule #15: show that the author's analysis or reasoning is incomplete

2.1. Classifying a Book

The first step of analytical reading is to find the category of the book you are about to read, and the earlier, the better.
Books can first be classified into two categories: expository or fiction. The purpose of an expository book is to convey knowledge.
We may or may not find hints about a book's category in its title.
Expository books can be further split into two kinds: practical and theoretical. The former focuses on HOW, while the latter concentrates on WHAT.
Examples of practical books include:
  • engineering
  • medical
  • cooking
Not only do practical books tell us HOW, they usually spend a lot of effort persuading us to follow their instructions. After all, a practical book that is not followed by its readers is effectively meaningless.
Common words in practical books include: should, ought, good, or bad.
Examples of theoretical books include:
  • history, which describes true events that happened at specific locations during specific time periods.
  • science, which is about general truths, most of which can't be easily verified in our daily life but rather rest on accurate observations or experiments.
  • philosophy, which usually can't drift too far away from our daily experiences.

2.2. Questions of the author

At the beginning of writing, writers usually have one or more questions, which they answer in the book.
For practical books, questions may be like:
  • What ends should be sought?
  • What means should be chosen to a given end?
  • Under given conditions, what's better? What's right? What's worse?
For theoretical books, these are example questions:
  • Does something exist?
  • What kind of thing is it?
  • What’s the effect?

2.3. Keywords

No language is perfect. Every word can have several meanings, and every meaning can be conveyed by more than one word. That's why we need to reach a common definition of the book's keywords with the author.
Personally, that's why I sometimes find it easier to read the original version of a book, even if it's written in a foreign language, than to read a translated version in my native language: during translation, if the translators are a little careless, the meanings of keywords get mistranslated.

3. PART THREE: syntopical reading

Syntopical reading is usually applied to reading social science books. Since it's relatively hard to find a single authoritative book on a given subject, we need to read and compare several books to understand the subject.

Thursday, October 20, 2016

Covariant and Invariant in Java

1. About covariant and invariant

Arrays are reified, while generics are implemented by erasure. This means that arrays know and enforce their element types at runtime.
Below is the clearest explanation I have ever found, cited from stackoverflow:

1.1. Covariance, Invariance and Contravariance explained in plain English?

At heart, these terms describe how the subtype relation is affected by type transformations. That is, if A and B are types, f is a type transformation, and ≤ the subtype relation (i.e. A ≤ B means that A is a subtype of B), we have
  • f is covariant if A ≤ B implies that f(A) ≤ f(B)
  • f is contravariant if A ≤ B implies that f(B) ≤ f(A)
  • f is invariant if neither of the above holds
Let's consider an example. Let f(A) = List<A>, where List is declared by
class List<T> { ... }
Is f covariant, contravariant, or invariant? Covariant would mean that a List<String> is a subtype of List<Object>, contravariant that a List<Object> is a subtype of List<String>, and invariant that neither is a subtype of the other, i.e. List<String> and List<Object> are inconvertible types. In Java, the latter is true; we say (somewhat informally) that generics are invariant.
Another example. Let f(A) = A[]. Is f covariant, contravariant, or invariant? That is, is String[] a subtype of Object[], Object[] a subtype of String[], or is neither a subtype of the other? (Answer: In Java, arrays are covariant)
This was still rather abstract. To make it more concrete, let’s look at which operations in Java are defined in terms of the subtype relation. The simplest example is assignment. The statement
x = y;
will compile only if typeof(y) ≤ typeof(x). That is, we have just learned that the statements
ArrayList<String> strings = new ArrayList<Object>();
ArrayList<Object> objects = new ArrayList<String>();
will not compile in Java, but
Object[] objects = new String[1];
will.
Another example where the subtype relation matters is a method invocation expression:
result = method(a);
Informally speaking, this statement is evaluated by assigning the value of a to the method’s first parameter, then executing the body of the method, and then assigning the methods return value to result. Like the plain assignment in the last example, the "right hand side" must be a subtype of the "left hand side", i.e. this statement can only be valid if typeof(a) ≤ typeof(parameter(method)) and returntype(method) ≤ typeof(result). That is, if method is declared by:
Number[] method(ArrayList<Number> list) { ... }
none of the following expressions will compile:
Integer[] result = method(new ArrayList<Number>());
Number[] result = method(new ArrayList<Integer>());
Object[] result = method(new ArrayList<Object>());
but
Number[] result = method(new ArrayList<Number>());
Object[] result = method(new ArrayList<Number>());
will.
Another example where subtyping matters is overriding. Consider:
Super sup = new Sub();
Number n = sup.method(1);
where
class Super {
    Number method(Number n) { ... }
}

class Sub extends Super {
    @Override
    Number method(Number n) { ... }
}
Informally, the runtime will rewrite this to:
class Super {
    Number method(Number n) {
        if (this instanceof Sub) {
            return ((Sub) this).method(n);  // *
        } else {
            ...
        }
    }
}
For the marked line to compile, the method parameter of the overriding method must be a supertype of the method parameter of the overridden method, and the return type a subtype of the overridden method's. Formally speaking, f(A) = parametertype(method asdeclaredin(A)) must at least be contravariant, and f(A) = returntype(method asdeclaredin(A)) must at least be covariant.
Note the "at least" above. Those are minimum requirements any reasonable statically type safe object oriented programming language will enforce, but a programming language may elect to be more strict. In the case of Java 1.4, parameter types and method return types must be identical (except for type erasure) when overriding methods, i.e. parametertype(method asdeclaredin(A)) = parametertype(method asdeclaredin(B)) when overriding. Since Java 1.5, covariant return types are permitted when overriding, i.e. the following will compile in Java 1.5, but not in Java 1.4:
class Collection {
    Iterator iterator() { ... }
}

class List extends Collection {
    @Override
    ListIterator iterator() { ... }
}
I hope I covered everything - or rather, scratched the surface. Still I hope it will help to understand the abstract, but important concept of type variance.