diff --git a/.gitignore b/.gitignore
index a8e229d..40ad8ee 100755
--- a/.gitignore
+++ b/.gitignore
@@ -59,3 +59,4 @@ data/
 /pyenv
 /python/pysol.cpp
 /log
+!requirements.txt
diff --git a/README.md b/README.md
index 9854cfd..5d8c511 100644
--- a/README.md
+++ b/README.md
@@ -39,9 +39,13 @@ To get started, please read the ``Quick Start'' section first.

 Table of Contents
 =================
-- Installation
-- Quick Start
-- Additional Information
++ [Installation](#installation)
+    + [Install from source](#install-from-source)
+    + [Known Issues of Python Wrappers](#known-issues-of-python-wrappers)
++ [Quick Start](#quick-start)
++ [Comparison of Online Learning Algorithms](#comparison-of-online-learning-algorithms)
++ [License and Citation](#license-and-citation)
++ [Additional Information](#additional-information)

 Installation
 ======================
@@ -62,7 +66,7 @@ Both the python scripts and C++ executables & Libraries are dependent on the sam
 SOL features a very simple installation procedure. The project is managed by
 `CMake` for C++ and `setuptools` for python.

-###Getting the code
+### Getting the code

 There exists a `CMakeLists.txt` in the root directory.
 The latest version of SOL is always available via 'github' by invoking one
@@ -74,7 +78,7 @@ of the following:
     ## For HTTP-based Git interaction
     $ git clone https://github.com/LIBOL/SOL.git

-###Build C++ Executables and Dynamic Libraries
+### Build C++ Executables and Dynamic Libraries

 1. Prerequisites

@@ -130,6 +134,7 @@ We highly recommend users to install python packages in a virtual enviroment.

 + Build and install the python scripts

+        $ pip install -r requirements.txt
         $ python setup.py build
         $ python setup.py install

@@ -244,11 +249,11 @@ and LIBLINEAR.
 To quikly get a comparison on the small dataset ``a1a`` as provided in the data folder:

     $ cd experiments
-    $ python experiment.py --shufle 10 a1a ../data/a1a ../data/a1a.t
+    $ python experiment.py --repeat 10 a1a ../data/a1a ../data/a1a.t

 The script will conduct cross validation to select best parameters for each
-algorithm. Then the script will shuffle the training 10 times. For each
-shuffled data, the script will train and test for each algorithm. The final
+algorithm. Then the script will repeat the training 10 times. For each
+repetition, the script will train and test each algorithm. The final
 output is the average of all results. And a final table report will be shown as follows.

     algorithm train train test test
@@ -273,7 +278,7 @@ output is the average of all results. And a final table report will be shown as
 There will also be three pdf figures displaying the update number, training error rate, and test error rate over model sparsity.

 Users can also compare on the multi-class dataset
-[``mnist``](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist) with the follow command (Note that we only shuffle the training data once in this example, so the standard deviation is zero):
+[``mnist``](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist) with the following command (Note that the training is run only once in this example, so the standard deviation is zero):

     $ python experiment.py mnist ../data/mnist.scale ../data/mnist.scale.t

@@ -301,7 +306,7 @@ The output is:

 The tables and figures in our paper description are obtained with the following command:

-    $ python experiment.py --shuffle 10 rcv1 ../data/rcv1_train ../data/rcv1_test
+    $ python experiment.py --repeat 10 rcv1 ../data/rcv1_train ../data/rcv1_test


 License and Citation
diff --git a/python/pysol.pxd b/python/pysol.pxd
index e2f522d..ca95b90 100644
--- a/python/pysol.pxd
+++ b/python/pysol.pxd
@@ -25,3 +25,10 @@ cdef extern from "sol/c_api.h":
     int sol_convert_data(const char* src_path, const char* src_type, const char* dst_path, const char* dst_type, bint binarize, float binarize_thresh)
     int sol_shuffle_data(const char* src_path, const char* src_type, const char* dst_path, const char* dst_type)
     int sol_split_data(const char* src_path, const char* src_type, int fold, const char* output_prefix, const char* dst_type, bint shuffle)
+
+cdef class SOL:
+    cdef void* _c_model
+    cdef void* _c_data_iter
+    cdef const char* algo
+    cdef int class_num
+    cdef bint verbose
diff --git a/python/pysol.pyx b/python/pysol.pyx
index a191277..c3732c2 100644
--- a/python/pysol.pyx
+++ b/python/pysol.pyx
@@ -43,12 +43,6 @@ cdef void inspect_iteration(void* user_context,
     handler(data_num, iter_num, update_num, err_rate)

 cdef class SOL:
-    cdef void* _c_model
-    cdef void* _c_data_iter
-    cdef const char* algo
-    cdef int class_num
-    cdef bint verbose
-
     def __cinit__(self, const char* algo = NULL, int class_num = -1, int batch_size=256, int buf_size = 2, verbose=False, **params):
         """Create a new Handle for SOL C Library

diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..e26b29e
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,26 @@
+asn1crypto==0.24.0
+backports.functools-lru-cache==1.6.1
+cryptography==2.1.4
+cycler==0.10.0
+Cython==0.29.21
+enum34==1.1.6
+idna==2.6
+ipaddress==1.0.17
+keyring==10.6.0
+keyrings.alt==3.0
+kiwisolver==1.1.0
+matplotlib==2.2.5
+mercurial==4.5.3
+numpy==1.16.6
+pycrypto==2.6.1
+pygobject==3.26.1
+pyparsing==2.4.7
+python-dateutil==2.8.1
+pytz==2020.4
+pyxdg==0.25
+PyYAML==3.12
+scikit-learn==0.20.4
+scipy==1.2.3
+SecretStorage==2.3.1
+six==1.11.0
+subprocess32==3.5.4
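The table of contents this patch adds to `README.md` links to sections through in-page anchors that GitHub derives from the heading text. A broken anchor silently jumps nowhere, so a quick local check can help. The Python 3 script below is a minimal sketch of such a check. It assumes GitHub's usual slug rule (lowercase the heading, drop punctuation other than hyphens and underscores, turn each space into a hyphen) and ignores the `-1`, `-2` suffixes GitHub appends to duplicate headings. `github_slug`, `markdown_headings`, and `broken_toc_links` are hypothetical helpers, not part of SOL. Following CommonMark, the sketch only treats a `#` run as a heading when a space follows it, so a line such as `###Getting the code` is not recognized as a heading.

```python
import re
from typing import List


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading anchor (assumption: duplicate-heading
    suffixes such as -1, -2 are not handled)."""
    slug = heading.strip().lower()
    # Keep word characters, hyphens and spaces; drop all other punctuation.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each remaining space becomes a hyphen.
    return slug.replace(" ", "-")


def markdown_headings(text: str) -> List[str]:
    """Collect ATX headings (`### Title`) and setext headings
    (`Title` directly above a line of `=` or `-`)."""
    lines = text.splitlines()
    headings = []
    for i, line in enumerate(lines):
        # ATX: 1-6 '#' characters followed by at least one space (CommonMark).
        atx = re.match(r"^#{1,6}\s+(.*?)\s*#*\s*$", line)
        if atx:
            headings.append(atx.group(1))
        # Setext: a non-blank line directly above a run of '=' or '-'.
        elif (i + 1 < len(lines) and line.strip()
              and re.fullmatch(r"=+|-+", lines[i + 1].strip())):
            headings.append(line.strip())
    return headings


def broken_toc_links(text: str) -> List[str]:
    """Return in-page anchors such as `](#name)` that match no heading slug."""
    known = {github_slug(h) for h in markdown_headings(text)}
    anchors = re.findall(r"\]\(#([^)]+)\)", text)
    return [a for a in anchors if a not in known]


if __name__ == "__main__":
    # Hypothetical sample: the second link misspells the heading's slug.
    sample = (
        "+ [Installation](#installation)\n"
        "+ [License and Citation](#licence-and-citation)\n"
        "\n"
        "Installation\n"
        "============\n"
        "\n"
        "License and Citation\n"
        "====================\n"
    )
    print(broken_toc_links(sample))  # → ['licence-and-citation']
```

To check the real file, pass the contents of `README.md` to `broken_toc_links`; an empty list means every table-of-contents link matches a heading under the assumptions above.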