auto rtl has been added

summery
remove problems
2023-10-21 15:37:32 +03:00 · 2023-08-12 18:19:37 +03:00 · 2023-08-12 18:13:10 +03:00 · 2023-08-12 17:46:38 +03:00 · 2023-08-12 15:41:13 +03:00 · 2023-08-10 18:09:19 +03:00
16 changed files with 178199 additions and 2666 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,6 @@
-venv/*
+venv/*
+logs/*
+.vscode/*
+.ipynb_checkpoints/*
+__pycache__/*
+*.csv
--- a/.ipynb_checkpoints/project_notebook-checkpoint.ipynb
+++ b/.ipynb_checkpoints/project_notebook-checkpoint.ipynb
--- a/2023-05-15_21-45-30.log
+++ b/2023-05-15_21-45-30.log
--- a/README.md
+++ b/README.md
@@ -1,3 +1,48 @@
 # DH

-This is the project for course {ENTERCOUSENUMBER} of Dr. Renana Keidar
+This is the project for course 33503 of Dr. Renana Keidar
+
+Project, By Benny Saret
+
+# Progress Report
+
+## Goals
+The goal of the project is to create a way to find affinity, or intertextuality, between Akkadian texts from different periods, genres, and geographic regions. [Intertextuality](https://www.merriam-webster.com/dictionary/intertextually) is a term describing a system of affinity and connection between a given text and other texts that serve it as source material, as a point of dialogue, or as a target of argument. This affinity can be seen in similar terminology, similar imagery, embedded quotations, and more.
+
+## Source Data
+All source data were taken from the ORACC project, [The Open Richly Annotated Cuneiform Corpus](http://oracc.museum.upenn.edu/ "ORACC"). This project is the largest and most comprehensive collection of cuneiform texts that is open and accessible to the general public and to researchers of all kinds. The data come in JSON, TEI, XML, and HTML formats and are updated continuously.
+The project contains not only texts in Akkadian but also texts in Urartian and Sumerian, as well as texts in the mixed languages of frontier regions.
+
+## Method
+
+### Data Collection
+<style>
+ul{
+    align: right;
+    direction: rtl;
+}
+li{
+    align: right;
+    direction: rtl;
+}
+</style>
+The first stage of the project was collecting the data from ORACC. The sub-steps of the collection were:
+1. Setting up a database to store the collected information. The database chosen was PostgreSQL, a relational database that implements the SQL language.
+1. Creating tables to hold the data. For this purpose the following tables were created:
+    - Genre: a table named genre stores the genre of each text, keyed by the text's code. [Genre](https://dh.saret.tk/dh/api/ggenre)
+    - Project: a table named project stores the names of all projects and sub-projects. This table was needed mainly during the text-scraping stage. [Project](https://dh.saret.tk/dh/api/gprojects)
+    - Transliteration: a table named new holds the transliteration split into Akkadian, together with the text's identifier, so that the two can be joined later. [New](https://dh.saret.tk/dh/api/gnew)
+    - Translation: a further table, named raw_texts, holds all the translations of the texts. [Jsons](https://dh.saret.tk/dh/api/gjson)
+    - All the links can be seen at [Links](https://dh.saret.tk/dh/api/links)
+1. Writing Python code that downloads all the information and inserts it into the database.
+
+### Data Processing
+The next stage, after data collection, is processing. This stage was relatively challenging. After months in which I tried running several simple models such as Word2Vec, TF-IDF, Doc2Vec, and others, the results were strange: each text matched only itself, with a similarity of 1, and every other pair had a similarity of 0.
+
+After several months of attempts and abandonments, I turned for help to the MDLI group's forum on Facebook, where I was advised to go back to simple models and was sent several links from medium ([TF-IDF Vectorizer scikit-learn](https://medium.com/@cmukesh8688/tf-idf-vectorizer-scikit-learn-dbc0244a911a) and [Understanding TF-IDF and Cosine Similarity for Recommendation Engine](https://medium.com/geekculture/understanding-tf-idf-and-cosine-similarity-for-recommendation-engine-64d8b51aa9f9)), and I made progress with the model. However, although I managed to get results, I was unable to produce a graph from these vectors.
+
+### Demonstration of Results
+Two texts found to have a similarity of about 87% are, for example, [P394767](http://oracc.iaas.upenn.edu/btto/P394767/html) and [P395011](http://oracc.iaas.upenn.edu/btto/P395011/html). After a brief look at these texts, they looked similar to my eyes as well. And indeed, both texts come from the same canonical list, known as "House most high". ORACC has no mention that P394767 belongs to that list, yet the model found the similarity and surfaced it on its own.
+
+# Summary
+In the end, the model managed to produce good results, but they are not yet sufficient. Further work is therefore needed on the model, and in particular on the data fed into it. Work is also needed on the graph itself, in particular on presenting it to the user in a convenient and friendly way. The model and the method can serve as a step forward for future research, for its further development, and for use toward a better understanding of Akkadian.
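The processing stage described in the README (TF-IDF vectors compared by cosine similarity) can be illustrated with a minimal sketch. It is written in plain Python so the arithmetic is visible; the notebook in this change set uses scikit-learn's `TfidfVectorizer` instead. The function names, the whitespace tokenization, and the toy documents below are assumptions made for illustration and do not come from the repository.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {token: weight} dict per document (smoothed IDF, unit length)."""
    tokenized = [doc.split() for doc in docs]  # assumption: whitespace tokens
    n_docs = len(tokenized)
    # Document frequency: the number of documents each token appears in.
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
        weights = {
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    return sum(weight * v.get(tok, 0.0) for tok, weight in u.items())


# Invented toy documents, not corpus data.
docs = [
    "house god great uruk",
    "house god great nippur",
    "king strong world",
]
vecs = tfidf_vectors(docs)
same = cosine(vecs[0], vecs[0])      # a text compared with itself
related = cosine(vecs[0], vecs[1])   # three of four tokens shared
unrelated = cosine(vecs[0], vecs[2])  # no tokens shared
print(same, related, unrelated)
```

A result in which every text matches only itself means that no two documents share a token after tokenization. With scikit-learn's default settings, tokens are lowercased, split at punctuation, and dropped when they are a single character long, which may matter for hyphenated transliterations; whether that explains the early results reported in the README is not established by this change set.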
--- a/pycache/scrapping.cpython-39.pyc
+++ b/pycache/scrapping.cpython-39.pyc
--- a/data.jsonl
+++ b/data.jsonl
--- a/datat.ipynb
+++ b/datat.ipynb
@@ -0,0 +1,147 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sklearn\n",
+    "import sklearn.model_selection\n",
+    "from sklearn.metrics.pairwise import cosine_similarity\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer\n",
+    "import pandas as pd\n",
+    "import scipy\n",
+    "import numpy as np\n",
+    "\n",
+    "df_eng = pd.read_csv('raw_texts.csv')\n",
+    "df_akk = pd.read_csv('new.csv')\n",
+    "# akk_raw_train, akk_raw_test = sklearn.model_selection.train_test_split(df_akk, test_size=0.2, random_state=0)\n",
+    "# eng_raw_train, eng_raw_test = sklearn.model_selection.train_test_split(df_eng, test_size=0.2, random_state=0)\n",
+    "tf_vectorizer = TfidfVectorizer(analyzer='word')\n",
+    "# tf_vectorizer.fit(akk_raw_train['Text'].to_list())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tf_vectorizer = TfidfVectorizer(analyzer='word')\n",
+    "save_vect = tf_vectorizer.fit_transform(df_akk['Text'].dropna().to_list())\n",
+    "# save_vect = tf_vectorizer.fit_transform(['The sun in the sky is bright', 'We can see the shining sun, the bright sun.'])\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tfidf_tokens = tf_vectorizer.get_feature_names_out()\n",
+    "df_tfidfvect = pd.DataFrame(data=save_vect.toarray(), columns=tfidf_tokens)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_mat = tf_vectorizer.transform(df_akk['Text'].dropna().to_list())\n",
+    "cc = cosine_similarity(save_vect,save_vect)\n",
+    "bool_similarity = cc > 0.5\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "abcd = np.where((cc > 0.5)&( cc< 1))\n",
+    "abcd[0].tofile(\"data.csv\", sep = \",\", format = \"%d\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Using matplotlib backend: <object object at 0x00000212CB626CA0>\n"
+     ]
+    }
+   ],
+   "source": [
+    "%matplotlib\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "f = sns.scatterplot(bool_similarity)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Project                                              P394767\n",
+       "Text       x x x BAD₃-ku-ri-gal-zi x E₂ 44 ša₂ BAD₃-{d}su...\n",
+       "Genre                                                lexical\n",
+       "Name: 4, dtype: object"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df_akk.iloc[4,:]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
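The notebook above calls `np.where((cc > 0.5) & (cc < 1))`, which returns one array of row indices and one of column indices, but only `abcd[0]` is written to `data.csv`, so the partner of each match appears to be lost. The `cc < 1` test also relies on the diagonal being exactly 1, which floating-point rounding may not guarantee. A minimal sketch of an alternative follows, assuming the goal is an edge list that a graph tool could read; the function name, the identifiers, and the matrix values are hypothetical.

```python
def similarity_edges(matrix, ids, threshold=0.5):
    """Return (id_a, id_b, score) for every pair above the threshold, once per pair."""
    edges = []
    size = len(matrix)
    for i in range(size):
        # Start at i + 1: the upper triangle skips self-pairs and mirrored duplicates.
        for j in range(i + 1, size):
            score = matrix[i][j]
            if score > threshold:
                edges.append((ids[i], ids[j], score))
    # Strongest links first.
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Invented symmetric similarity matrix for three hypothetical texts.
ids = ["text_a", "text_b", "text_c"]
cc = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.6],
    [0.1, 0.6, 1.0],
]
edges = similarity_edges(cc, ids)
for id_a, id_b, score in edges:
    print(id_a, id_b, score)
```

The same indexing works on a NumPy array such as the one returned by `cosine_similarity`. Each tuple can be treated as a weighted edge, which is one possible route to the graph that the README reports as still missing.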
--- a/1
+++ b/1
@@ -0,0 +1 @@
+C:/Users/Saret/WaitForIt/oracc/
--- a/missing_list.txt
+++ b/missing_list.txt
--- a/10
+++ b/10
@@ -1,7 +1,7 @@
 adsd
 aemw
 akklove
-amgg
+amgg####
 ario
 armep
 arrim
@@ -12,14 +12,14 @@ blms
 btmao
 btto
 cams
-caspo
+#caspo##############
 ccpo
 cdli
 ckst
 cmawro
 contrib
-contrib/amarna
-contrib/lambert
+#contrib/amarna
+#contrib/lambert
 ctij
 dcclt
 dccmt
@@ -51,4 +51,4 @@ saao
 suhu
 tcma
 tsae
-xcat
+xcat
--- a/project_notebook.ipynb
+++ b/project_notebook.ipynb
--- a/report.html
+++ b/report.html
@@ -0,0 +1,15 @@
+<html>
+<head>
+    <title>Progress Report</title>
+    <style>
+        
+    </style>
+</head>
+<body dir="rtl">
+    <h1>Progress Report
+        <h2>Goals
+            <p>The goal of the project is to create a way to find affinity or intertextuality between different Akkadian texts in order to </p>
+        </h2>
+    </h1>
+</body>
+</html>
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,116 @@
+aiofiles==22.1.0
+aiosqlite==0.18.0
+anyio==3.6.2
+argon2-cffi==21.3.0
+argon2-cffi-bindings==21.2.0
+arrow==1.2.3
+asttokens==2.2.1
+attrs==22.2.0
+autopep8==2.0.2
+Babel==2.12.1
+backcall==0.2.0
+bcrypt==4.0.1
+beautifulsoup4==4.12.2
+bleach==6.0.0
+bs4==0.0.1
+certifi==2022.12.7
+cffi==1.15.1
+charset-normalizer==3.1.0
+colorama==0.4.6
+comm==0.1.3
+cryptography==40.0.1
+debugpy==1.6.7
+decorator==5.1.1
+defusedxml==0.7.1
+entrypoints==0.4
+executing==1.2.0
+fastjsonschema==2.16.3
+fqdn==1.5.1
+idna==3.4
+importlib-metadata==6.3.0
+ipykernel==6.22.0
+ipython==8.12.0
+ipython-genutils==0.2.0
+isoduration==20.11.0
+jedi==0.18.2
+Jinja2==3.1.2
+json5==0.9.11
+jsonpointer==2.3
+jsonschema==4.17.3
+jupyter-contrib-core==0.4.2
+jupyter-contrib-nbextensions==0.7.0
+jupyter-events==0.6.3
+jupyter-highlight-selected-word==0.2.0
+jupyter-kite==2.0.2
+jupyter-latex-envs==1.4.6
+jupyter-nbextensions-configurator==0.6.1
+jupyter-ydoc==0.2.3
+jupyter_client==8.1.0
+jupyter_core==5.3.0
+jupyter_server==2.5.0
+jupyter_server_fileid==0.9.0
+jupyter_server_terminals==0.4.4
+jupyter_server_ydoc==0.8.0
+jupyterlab==3.6.3
+jupyterlab-execute-time==2.3.1
+jupyterlab-pygments==0.2.2
+jupyterlab_server==2.22.0
+lxml==4.9.2
+MarkupSafe==2.1.2
+matplotlib-inline==0.1.6
+mistune==2.0.5
+nbclassic==0.5.5
+nbclient==0.7.3
+nbconvert==7.3.1
+nbformat==5.8.0
+nest-asyncio==1.5.6
+notebook==6.5.4
+notebook_shim==0.2.2
+numpy==1.24.2
+packaging==23.0
+pandas==2.0.0
+pandocfilters==1.5.0
+paramiko==3.1.0
+parso==0.8.3
+pickleshare==0.7.5
+platformdirs==3.2.0
+prometheus-client==0.16.0
+prompt-toolkit==3.0.38
+psutil==5.9.4
+psycopg2==2.9.6
+pure-eval==0.2.2
+pycodestyle==2.10.0
+pycparser==2.21
+Pygments==2.15.0
+PyNaCl==1.5.0
+pyrsistent==0.19.3
+python-dateutil==2.8.2
+python-json-logger==2.0.7
+pytz==2023.3
+PyYAML==6.0
+pyzmq==25.0.2
+requests==2.28.2
+rfc3339-validator==0.1.4
+rfc3986-validator==0.1.1
+Send2Trash==1.8.0
+six==1.16.0
+sniffio==1.3.0
+soupsieve==2.4
+sshtunnel==0.4.0
+stack-data==0.6.2
+terminado==0.17.1
+tinycss2==1.2.1
+tomli==2.0.1
+tornado==6.2
+traitlets==5.9.0
+typing_extensions==4.5.0
+tzdata==2023.3
+uri-template==1.2.0
+urllib3==1.26.15
+wcwidth==0.2.6
+webcolors==1.13
+webencodings==0.5.1
+websocket-client==1.5.1
+y-py==0.5.9
+ypy-websocket==0.8.2
+zipp==3.15.0
--- a/scrape.log
+++ b/scrape.log
--- a/scrape.py
+++ b/scrape.py
@@ -6,4 +6,4 @@ import psycopg2
 #     conn = psycopg2.connect("dbname='dh' user='dh' host='dh.saret.tk' password='qwerty'")
 #     return conn

-def 
+# def 
--- a/scrapping.py
+++ b/scrapping.py
@@ -1,7 +1,7 @@
 import json
 from typing import Dict, List
 import requests
-from bs4 import BeautifulSoup
+from bs4 import BeautifulSoup, ResultSet
 import os
 from pathlib import Path
 import re
@@ -9,7 +9,10 @@ import glob
 import logging
 # parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 # os.chdir(parent_dir)
-logging.basicConfig(filename='scrape.log', level=logging.INFO)
+import datetime
+logging.basicConfig(
+    level=logging.INFO, filename=f'{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}.log', filemode='w',
+    format='%(name)s - %(levelname)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
 JSONS_DIR = "jsons_unzipped"

 SUB_PROJECTS_IN_NEO_ASS = ["01", "05", "06", "07", "09", "11", "12", "14", "15", "16"]
@@ -29,9 +32,25 @@ def _load_json_from_path(json_path: str) -> Dict:
            return json.load(json_file)


-def get_raw_english_texts_of_project(project_dirname: str) -> List[Dict]:
+def _download_data_from_website(url: str) -> ResultSet:
+    try:
+        res = requests.get(url)
+        soup = BeautifulSoup(res.text, "html.parser")
+        return soup.find_all("span", {"class": "cell"})
+    except Exception as e:
+        logging.error(e)
+        return list()
+
+
+def _clean_raw_text(results: ResultSet) -> str:
+    return " ".join(["".join([content if isinstance(content, str) else content.text
+                              for content in result.contents]) for result in results]).replace('\n', ' ')
+
+
+def get_raw_english_texts_of_project(project_dirname: str, oracc_site: str = 'oracc.museum.upenn.edu') -> List[Dict]:
    raw_jsons = list()
-    all_paths = glob.glob(f'jsons_unzipped/{project_dirname}/**/corpusjson/*.json', recursive=True)
+    all_paths = glob.glob(f'jsons_unzipped/{project_dirname}/**/corpusjson/*.json', recursive=True) + glob.glob(
+        f'jsons_unzipped/{project_dirname}/corpusjson/*.json', recursive=True)
    # path = Path(os.path.join(JSONS_DIR, project_dirname, 'catalogue.json'))
    # if not os.path.isfile(path):
    #     return raw_jsons
@@ -40,24 +59,22 @@ def get_raw_english_texts_of_project(project_dirname: str) -> List[Dict]:
    # for member in d.get('members').values():
    for filename in all_paths:
        cur_json = _load_json_from_path(filename)
-        project_name = cur_json['project']
-
+        try:
+            project_name = cur_json['project']
+        except TypeError:
+            logging.error(f"Error in {filename}")
+            continue
        # # Skip in case we are in saa project and the current sub project is not in neo-assyrian
        # if project_dirname == "saao" and project_name[-2:] not in SUB_PROJECTS_IN_NEO_ASS:  # TODO: validate
        #     continue

        # id_text = member.get('id_text', "") + member.get('id_composite', "")
        # html_dir = "/".join(path.parts[1:-1])
-        url = f"http://oracc.iaas.upenn.edu/{project_name}/{cur_json['textid']}/html"
+        url = f"http://{oracc_site}/{project_name}/{cur_json['textid']}/html"
        # print(url)
        logging.info(url)
        try:
-            res = requests.get(url)
-            soup = BeautifulSoup(res.text, "html.parser")
-            results = soup.find_all("span", {"class": "cell"})
-            raw_text = " ".join(["".join([content if isinstance(content, str) else content.text
-                                        for content in result.contents]) for result in results])
-            raw_text = raw_text.replace('\n', ' ')
+            raw_text = _clean_raw_text(_download_data_from_website(url))
            if raw_text:
                raw_jsons.append({
                    "id_text": cur_json['textid'],
@@ -125,13 +142,13 @@ def get_raw_akk_texts_of_project(project_dirname: str) -> List[Dict]:
    :return: A list of jsons containing the raw texts of the given project and basic metadata.
    """
    raw_jsons = list()
-    all_paths = glob.glob(f'jsons_unzipped/{project_dirname}/**/corpusjson/*.json', recursive=True)
+    all_paths = glob.glob(f'jsons_unzipped/{project_dirname}/**/corpusjson/*.json', recursive=True)+glob.glob(
+        f'jsons_unzipped/{project_dirname}/corpusjson/*.json', recursive=True)

    for filename in all_paths:
        cur_json = _load_json_from_path(filename)

        try:
-            project_name = cur_json['project']
            sents_dicts = cur_json['cdl'][0]['cdl'][-1]['cdl']
        except Exception as e:
            print(f"In file {filename} failed because of {e}")
@@ -140,8 +157,7 @@ def get_raw_akk_texts_of_project(project_dirname: str) -> List[Dict]:
        raw_text = get_raw_akk_text_from_json(sents_dicts)
        raw_jsons.append({
            "id_text": cur_json['textid'],
-            "project_name": project_name,
-            "raw_text": raw_text,
+            "raw_text": raw_text
        })

    # if not texts_jsons or not texts_jsons.get('members'):
@@ -195,7 +211,7 @@ def _get_raw_text(json_dict: dict) -> str:
            raw_texts.extend(_get_raw_text(d['cdl']).split())
        elif _is_word(d):  # If node represents a word
            if previous_ref != d.get('ref'):  # If encountered new instance:
-                cur_text = d['frag'] if d.get('frag') else d['f']['form']
+                cur_text = d['f']['norm'] if d['f'].get('norm') else d['f']['form']
                raw_texts.append(cur_text + _get_addition(d))
                previous_ref = d.get('ref')
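Both text-collection functions in `scrapping.py` now concatenate two `glob.glob` results. As far as Python's `glob` documentation describes it, `**` with `recursive=True` matches zero or more directories, so the first pattern may already cover `corpusjson` directly under the project, and the concatenation could list some files twice. A minimal sketch of a deduplicated variant follows; the helper name `corpus_json_paths`, its `root` parameter, and the demo file names are assumptions for illustration.

```python
import glob
import os
import tempfile


def corpus_json_paths(root, project_dirname):
    """Collect corpusjson files at any depth under the project, each path once."""
    base = os.path.join(root, project_dirname)
    nested = glob.glob(os.path.join(base, "**", "corpusjson", "*.json"), recursive=True)
    direct = glob.glob(os.path.join(base, "corpusjson", "*.json"))
    # Normalise, then use a set so a path matched by both patterns is kept once;
    # sorting makes the processing order reproducible between runs.
    return sorted({os.path.normpath(path) for path in nested + direct})


# Throwaway directory tree with invented file names.
with tempfile.TemporaryDirectory() as demo_root:
    for rel in ("proj/corpusjson/P000001.json", "proj/sub/corpusjson/P000002.json"):
        target = os.path.join(demo_root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as handle:
            handle.write("{}")
    paths = corpus_json_paths(demo_root, "proj")

names = [os.path.basename(path) for path in paths]
print(names)
```

Deduplicating at this point would keep each text from being downloaded and appended twice, whichever of the two patterns matched it.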
Author	SHA1	Message	Date
Benny Saret	8b8e15b082	auto rtl has been added	2023-10-21 15:37:32 +03:00
1kamma	8352a0a097	summery	2023-08-12 18:19:37 +03:00
Benny Saret	0e26118247	remove problems	2023-08-12 18:13:10 +03:00
1kamma	f8e1c4d062	demostration	2023-08-12 17:46:38 +03:00
Benny Saret	46152eadbf	update of processing	2023-08-12 15:41:13 +03:00
Benny Saret	01525451c7	added processing	2023-08-10 18:09:19 +03:00
Benny Saret	be4e16ed35	update the project data collecting and the steps for it	2023-08-09 18:54:42 +03:00
Benny Saret	5f91215acd	update the project data collecting and the steps for it	2023-08-09 17:20:43 +03:00
server	1e4f87368e	update the report progress	2023-08-09 00:37:42 +03:00
server	aad15a2a5a	readme	2023-08-08 18:53:05 +03:00
server	ee5983a7c5	updated source data, data grab	2023-08-08 18:36:53 +03:00
server	acc006df1b	Merge branch 'master' of https://git.saret.tk/saret/DH	2023-08-08 17:11:29 +03:00
server	31d2007bcb	no raw data	2023-08-08 17:06:12 +03:00
server	0b66f6cf1d	no raw data	2023-08-08 17:06:12 +03:00
server	e7e18c3300	updated goals	2023-08-08 16:59:57 +03:00
server	cc8dfeea0d	updated goals	2023-08-08 16:59:57 +03:00
server	8f0dd858e2	readme update	2023-08-08 16:28:08 +03:00
server	0448b6c447	readme update	2023-08-08 16:28:08 +03:00
server	1b6d0d2129	readme update	2023-08-08 16:27:47 +03:00
server	78e9e7502a	readme update	2023-08-08 16:27:47 +03:00
server	afe0eaf41d	readme update	2023-08-08 16:26:18 +03:00
server	adad325c44	readme update	2023-08-08 16:26:18 +03:00
server	0118af822c	readme update	2023-08-08 16:26:01 +03:00
server	f784ad9999	readme update	2023-08-08 16:26:01 +03:00
server	bab0735bf7	readme up	2023-08-08 16:25:04 +03:00
server	5af637c650	readme up	2023-08-08 16:25:04 +03:00
server	1545cdac8d	starting the report	2023-08-08 16:08:55 +03:00
server	fcdbfe86fa	starting the report	2023-08-08 16:08:55 +03:00
server	201626c66a	update report	2023-08-08 15:57:25 +03:00
server	5233079481	update report	2023-08-08 15:57:25 +03:00
1kamma	2735fb9ea2	failed scraping	2023-06-27 11:53:56 +03:00
1kamma	09aa16dcc8	failed scraping	2023-06-27 11:53:56 +03:00
1kamma	98d3d5994f	boolean similarity	2023-06-26 23:21:34 +03:00
1kamma	03f1d663d0	boolean similarity	2023-06-26 23:21:34 +03:00
1kamma	826a100f24	update	2023-06-26 23:12:28 +03:00
1kamma	db8244d902	update	2023-06-26 23:12:28 +03:00
server	26bbbe7d8c	updates from server	2023-04-19 06:59:45 +03:00
1kamma	89ec3a3578	update the progress	2023-04-17 02:51:50 +03:00
1kamma	a9e93bd99f	requirements	2023-04-16 20:24:47 +03:00
1kamma	4aaeb48ffb	remove irelevant files	2023-04-13 14:03:15 +03:00
1kamma	218a3d8135	finished scrapping all the data	2023-04-12 22:05:16 +03:00
1kamma	df548fa29d	update folders	2023-04-12 12:34:56 +03:00
1kamma	f592236971	failing in saao	2023-04-12 12:29:22 +03:00
1kamma	faab62e768	almost done with scraping	2023-04-12 12:06:49 +03:00
1kamma	15c9b56fd0	updates in oracc link; reading project that failed	2023-04-12 11:23:35 +03:00
1kamma	5b071fbac3	solve problems with some projects	2023-04-12 09:57:50 +03:00
1kamma	1236d91d9a	update exceptions in projects	2023-04-12 08:24:03 +03:00
1kamma	52f7b9abf0	more updates	2023-04-12 02:22:13 +03:00
1kamma	700c927cd8	updated log	2023-04-12 02:19:45 +03:00
1kamma	6380a9fa20	t1	2023-04-12 01:43:03 +03:00
1kamma	d66576cc06	trying later	2023-04-12 01:37:57 +03:00
1kamma	1d00aab2ab	update readme	2023-04-12 01:02:46 +03:00
1kamma	e31ca85485	writing data	2023-04-12 01:00:18 +03:00