Open source project name: HoloClean/holoclean
Open source project URL: https://github.com/HoloClean/holoclean
Open source language: Python 99.5%
Open source project introduction:

HoloClean: A Machine Learning System for Data Enrichment

HoloClean is built on top of PyTorch and PostgreSQL.

HoloClean is a statistical inference engine to impute, clean, and enrich data. As a weakly supervised machine learning system, HoloClean leverages available quality rules, value correlations, reference data, and multiple other signals to build a probabilistic model that accurately captures the data generation process, and uses the model in a variety of data curation tasks. HoloClean lets data practitioners and scientists save the enormous time they spend building piecemeal cleaning solutions and instead communicate their domain knowledge declaratively, enabling accurate analytics, predictions, and insights from noisy, incomplete, and erroneous data.

Installation

HoloClean was tested on Python versions 2.7, 3.6, and 3.7. It requires PostgreSQL version 9.4 or higher.

1. Install and configure PostgreSQL

We describe how to install PostgreSQL and configure it for HoloClean (creating a database, a user, and setting the required permissions).

Option 1: Native installation of PostgreSQL

A native installation of PostgreSQL runs faster than Docker containers. We explain how to install PostgreSQL, then how to configure it for HoloClean use.

a. Installing PostgreSQL

On Ubuntu, install PostgreSQL through the system package manager.
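For macOS, you can find the installation instructions at https://www.postgresql.org/download/macosx/

b. Setting up PostgreSQL for HoloClean

By default, HoloClean needs a database named holo and a user named holocleanuser with permissions on it. From the psql console, create them as follows:
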
CREATE DATABASE holo;
CREATE USER holocleanuser;
ALTER USER holocleanuser WITH PASSWORD 'abcd1234';
GRANT ALL PRIVILEGES ON DATABASE holo TO holocleanuser;
\c holo
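ALTER SCHEMA public OWNER TO holocleanuser;

You can connect to the holo database as holocleanuser from the psql console.

HoloClean currently populates the database holo. To clear it, drop and recreate the database:

DROP DATABASE holo;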
CREATE DATABASE holo;

Option 2: Using Docker

If you are familiar with Docker, an easy way to start using HoloClean is to start a PostgreSQL Docker container.

To start a PostgreSQL Docker container, run the following command:

docker run --name pghc \
-e POSTGRES_DB=holo -e POSTGRES_USER=holocleanuser -e POSTGRES_PASSWORD=abcd1234 \
-p 5432:5432 \
-d postgres:11

which starts a backend server and creates a database with the required permissions. You can then manage the container by its name, pghc. Note the port number, which may conflict with existing PostgreSQL servers. Read more about this image in the documentation for the official postgres Docker image.

2. Setting up HoloClean

HoloClean runs on Python 2.7 or 3.6+. We recommend running it from within a virtual environment.

Creating a virtual environment for HoloClean

Option 1: Conda Virtual Environment

First, download Anaconda (not miniconda) from the Anaconda download page. Follow the steps for your OS and framework.

Second, create a conda environment (python 2.7 or 3.6+). For example, to create a Python 3.6 conda environment, run:

$ conda create -n hc36 python=3.6

Upon starting/restarting your terminal session, you will need to activate your conda environment by running

$ conda activate hc36

Option 2: Set up a virtual environment using pip and Virtualenv

If you are familiar with virtualenv, you can use it to create a virtual environment.

For Python 3.6, create a new environment with your preferred virtualenv wrapper, for example:
Either follow the virtualenv installation instructions or install it via pip:

$ pip install virtualenv

Then, create a directory for the new environment:

$ mkdir -p hc36
$ virtualenv --python=python3.6 hc36

where python3.6 refers to a Python 3.6 executable on your system.

Activate the environment:

$ source hc36/bin/activate

Install the required python packages

Note: make sure that the environment is activated throughout the installation process. When you are done, deactivate it.
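Whether an environment is currently active can also be checked from Python itself. The following is a minimal sketch, not part of HoloClean: the function name active_environment is hypothetical, and it assumes that virtualenv changes sys.prefix relative to the base interpreter prefix while conda exports the CONDA_DEFAULT_ENV variable.

```python
import os
import sys


def active_environment(prefix, base_prefix, environ):
    """Return a short label for the active environment, or None if there is none.

    Hypothetical helper for illustration; it is not part of HoloClean.
    """
    # Inside a virtualenv, the interpreter prefix differs from the base prefix.
    if prefix != base_prefix:
        return "virtualenv"
    # Conda exports the name of the active environment in CONDA_DEFAULT_ENV.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return "conda:" + conda_env
    return None


if __name__ == "__main__":
    # Older virtualenv releases set sys.real_prefix instead of sys.base_prefix.
    base = getattr(sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix))
    label = active_environment(sys.prefix, base, os.environ)
    print(label or "no virtual environment is active")
```

Running the script before the pip step confirms that packages will be installed into the environment rather than into the system interpreter.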
In the project root directory, run the following to install the required packages. Note that this command installs the packages within the activated virtual environment.

$ pip install -r requirements.txt

Note for macOS users: you may need to install the XCode developer tools.

Running HoloClean

See the code in the examples directory.

To run the example script, run the following:

$ cd examples
$ ./start_example.sh

Notice that the script sets up the Python path environment to run HoloClean.
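Before running the example script, it can help to confirm that the interpreter and database settings match what this README describes. The following is a minimal sketch, not part of HoloClean: the function names python_supported, postgres_dsn, and port_open are hypothetical, the defaults are the database name, user, password, and port used in the setup steps above, and localhost is an assumed host.

```python
from __future__ import print_function

import socket
import sys


def python_supported(version):
    """True for the versions this README names: 2.7, or 3.6 and later."""
    major, minor = version[0], version[1]
    return (major, minor) == (2, 7) or (major, minor) >= (3, 6)


def postgres_dsn(dbname="holo", user="holocleanuser",
                 password="abcd1234", host="localhost", port=5432):
    """Build a libpq-style keyword/value connection string.

    Values containing spaces or quotes would need escaping; the defaults do not.
    """
    return "dbname={} user={} password={} host={} port={}".format(
        dbname, user, password, host, port)


def port_open(host="localhost", port=5432, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:  # alias of OSError on Python 3
        return False
    conn.close()
    return True


if __name__ == "__main__":
    print("Python supported:", python_supported(sys.version_info))
    print("PostgreSQL port reachable:", port_open())
    print("Connection string:", postgres_dsn())
```

The port check only shows that something is listening on the port; it does not verify the credentials or that the holo database exists.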