6 changes: 4 additions & 2 deletions README.md
@@ -50,6 +50,7 @@ Can be used to perform:
* <a href="https://github.com/flairNLP/flair" target="_blank"><code>flair</code></a> - Required if you want to use Flair mentions extractor and for TARS linker and TARS Mentions Extractor.
* <a href="https://github.com/facebookresearch/BLINK" target="_blank"><code>blink</code></a> - Required if you want to use Blink for linking to Wikipedia pages.
* <a href="https://github.com/urchade/GLiNER" target="_blank"><code>gliner</code></a> - Required if you want to use GLiNER Linker or GLiNER Mentions Extractor.
* <a href="https://github.com/SapienzaNLP/relik" target="_blank"><code>relik</code></a> - Required if you want to use Relik Linker.

## Installation

@@ -90,7 +91,7 @@ The linguistic approach relies on the idea that mentions will usually be a synta
### Linker
The **linker** will link the detected entities to an existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.

-Again, there are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.
+Again, there are 6 **linkers** available currently, 4 of them are *end-to-end* and 2 are not.

| Linker Name | end-to-end | Source Code | Paper |
|:-----------:|:----------:|----------------------------------------------------------|--------------------------------------------------------------------|
@@ -99,6 +100,7 @@ Again, there are 5 **linkers** available currently, 3 of them are *end-to-end* a
| SMXM | &check; | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
| TARS | &check; | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
| GLINER | &check; | [Source Code](https://github.com/urchade/GLiNER) | [Paper](https://arxiv.org/abs/2311.08526) |
| RELIK | &check; | [Source Code](https://github.com/SapienzaNLP/relik) | [Paper](https://arxiv.org/abs/2408.00103) |

### Relations Extractor
The **relations extractor** will extract relations among different entities *previously* extracted by a **linker**.
@@ -241,7 +243,7 @@ from zshot import PipelineConfig
from zshot.linker import LinkerTARS
from zshot.evaluation.dataset import load_ontonotes_zs
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
-from zshot.evaluation.metrics.seqeval.seqeval import Seqeval
+from zshot.evaluation.metrics._seqeval._seqeval import Seqeval

ontonotes_zs = load_ontonotes_zs('validation')

12 changes: 11 additions & 1 deletion docs/entity_linking.md
@@ -2,6 +2,16 @@

The **linker** will link the detected entities to an existing set of labels. Some of the **linkers**, however, are *end-to-end*, i.e. they don't need the **mentions extractor**, as they detect and link the entities at the same time.
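
The difference between end-to-end and non-end-to-end linkers can be illustrated with a small toy sketch. Everything below is hypothetical and is not Zshot's API: the two linker classes, the `capitalized_words` extractor and the `run` helper are invented for illustration, and only the `is_end2end` name mirrors the property that Zshot linkers expose.

```python
from typing import Callable, List, Optional, Tuple

Mention = Tuple[int, int]      # (start, end) character offsets
Span = Tuple[int, int, str]    # (start, end, label)


class EndToEndLinker:
    """Toy linker that detects and labels spans in one step."""
    is_end2end = True

    def link(self, text: str, mentions: Optional[List[Mention]] = None) -> List[Span]:
        start = text.find("IBM")
        return [(start, start + 3, "company")] if start >= 0 else []


class MentionLinker:
    """Toy linker that only labels mentions found by an extractor."""
    is_end2end = False

    def link(self, text: str, mentions: Optional[List[Mention]] = None) -> List[Span]:
        return [(s, e, "company") for s, e in (mentions or []) if text[s:e] == "IBM"]


def capitalized_words(text: str) -> List[Mention]:
    """Toy mentions extractor: every word starting with an uppercase letter."""
    mentions, pos = [], 0
    for word in text.split(" "):
        if word[:1].isupper():
            mentions.append((pos, pos + len(word)))
        pos += len(word) + 1
    return mentions


def run(text: str, linker, extractor: Callable[[str], List[Mention]]) -> List[Span]:
    """Call the extractor only when the linker is not end-to-end."""
    mentions = None if linker.is_end2end else extractor(text)
    return linker.link(text, mentions)


text = "Researchers at IBM built it"
print(run(text, EndToEndLinker(), capitalized_words))  # [(15, 18, 'company')]
print(run(text, MentionLinker(), capitalized_words))   # [(15, 18, 'company')]
```

Both toy linkers return the same span here; the point is only that the end-to-end one never needs the extractor's output.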

-There are 5 **linkers** available currently, 3 of them are *end-to-end* and 2 are not.
+There are 6 **linkers** available currently, 4 of them are *end-to-end* and 2 are not.

| Linker Name | end-to-end | Source Code | Paper |
|:----------------------------------------------------:|:----------:|----------------------------------------------------------|--------------------------------------------------------------------|
| [Blink](https://ibm.github.io/zshot/blink_linker/) | X | [Source Code](https://github.com/facebookresearch/BLINK) | [Paper](https://arxiv.org/pdf/1911.03814.pdf) |
| [GENRE](https://ibm.github.io/zshot/genre_linker/) | X | [Source Code](https://github.com/facebookresearch/GENRE) | [Paper](https://arxiv.org/pdf/2010.00904.pdf) |
| [SMXM](https://ibm.github.io/zshot/smxm_linker/) | &check; | [Source Code](https://github.com/Raldir/Zero-shot-NERC) | [Paper](https://aclanthology.org/2021.acl-long.120/) |
| [TARS](https://ibm.github.io/zshot/tars_linker/) | &check; | [Source Code](https://github.com/flairNLP/flair) | [Paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf) |
| [GLINER](https://ibm.github.io/zshot/gliner_linker/) | &check; | [Source Code](https://github.com/urchade/GLiNER) | [Paper](https://arxiv.org/abs/2311.08526) |
| [RELIK](https://ibm.github.io/zshot/relik_linker/) | &check; | [Source Code](https://github.com/SapienzaNLP/relik) | [Paper](https://arxiv.org/abs/2408.00103) |


::: zshot.Linker
13 changes: 13 additions & 0 deletions docs/relik_linker.md
@@ -0,0 +1,13 @@
# ReLiK Linker
ReLiK is a lightweight and fast model for Entity Linking and Relation Extraction. It has two main components: a retriever and a reader. The retriever retrieves relevant documents from a large collection, and the reader extracts entities and relations from the retrieved documents. A pre-trained ReLiK pipeline can be loaded with the `from_pretrained` method.

In **Zshot**, we created a linker that uses ReLiK. It works both with and without user-provided entities, and it can use entity descriptions when they are available.

This is an *end-to-end* model, so there is no need to run a **mentions extractor** beforehand.

The ReLiK **linker** will use the **entities** specified in the `zshot.PipelineConfig`, if any.
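
How configured entities could be turned into candidates for the model can be sketched without loading anything. This is a minimal illustration under stated assumptions, not the Zshot implementation: `Entity` and `Candidate` are hypothetical stand-ins for Zshot's entity data model and ReLiK's document class, and `build_candidates` is a name invented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Zshot entity (name + description)."""
    name: str
    description: str = ""


@dataclass
class Candidate:
    """Hypothetical stand-in for a ReLiK candidate document."""
    text: str
    id: int
    metadata: Dict[str, str] = field(default_factory=dict)


def build_candidates(entities: List[Entity]) -> Optional[List[Candidate]]:
    """Return one candidate per entity, or None when no entities are given.

    None signals "no user-provided candidates", in which case the linker
    would rely on the model's own retriever instead.
    """
    if not entities:
        return None
    return [
        Candidate(text=ent.name, id=i, metadata={"definition": ent.description})
        for i, ent in enumerate(entities)
    ]


entities = [
    Entity(name="company", description="A business organization"),
    Entity(name="location", description="A geographic place"),
]
candidates = build_candidates(entities)
print([c.text for c in candidates])  # ['company', 'location']
print(build_candidates([]))          # None
```

The entity description travels along as metadata, which is how descriptions can reach the model when entities are provided.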

- [Paper](https://arxiv.org/abs/2408.00103)
- [Original Source Code](https://github.com/SapienzaNLP/relik)

::: zshot.linker.LinkerRelik
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -22,6 +22,7 @@ nav:
    - regen.md
    - smxm_linker.md
    - tars_linker.md
    - relik_linker.md
    - gliner_linker.md
  - Relations Extractor:
    - relation_extractor.md
1 change: 1 addition & 0 deletions requirements/test.txt
@@ -7,4 +7,5 @@ gliner>=0.2.9
flake8>=4.0.1
coverage>=6.4.1
pydantic==1.9.2
relik==1.0.5
IPython
1 change: 1 addition & 0 deletions zshot/linker/__init__.py
@@ -4,4 +4,5 @@
from zshot.linker.linker_smxm import LinkerSMXM # noqa: F401
from zshot.linker.linker_tars import LinkerTARS # noqa: F401
from zshot.linker.linker_ensemble import LinkerEnsemble # noqa: F401
from zshot.linker.linker_relik import LinkerRelik # noqa: F401
from zshot.linker.linker_gliner import LinkerGLINER # noqa: F401
80 changes: 80 additions & 0 deletions zshot/linker/linker_relik.py
@@ -0,0 +1,80 @@
import contextlib
import logging
import pkgutil
from typing import Iterator, List, Optional, Union

from relik import Relik
from relik.inference.data.objects import RelikOutput
from relik.retriever.indexers.document import Document
from spacy.tokens import Doc

from zshot.config import MODELS_CACHE_PATH
from zshot.linker.linker import Linker
from zshot.utils.data_models import Span

logging.getLogger("relik").setLevel(logging.ERROR)

MODEL_NAME = "sapienzanlp/relik-entity-linking-large"


class LinkerRelik(Linker):
    """ Relik linker """

    def __init__(self, model_name=MODEL_NAME):
        super().__init__()

        if not pkgutil.find_loader("relik"):
            raise Exception("relik module not installed. You need to install relik in order to use the relik Linker. "
                            "Install it with: pip install relik")

        self.model_name = model_name
        self.model = None
        # self.device = {
        #     "retriever_device": self.device,
        #     "index_device": self.device,
        #     "reader_device": self.device
        # }

    @property
    def is_end2end(self) -> bool:
        """ relik is end2end """
        return True

    def load_models(self):
        """ Load relik model """
        # Remove RELIK print
        with contextlib.redirect_stdout(None):
            if self.model is None:
                if self._entities:
                    # Entities are passed as candidates at prediction time, so no retriever is loaded
                    self.model = Relik.from_pretrained(self.model_name,
                                                       cache_dir=MODELS_CACHE_PATH,
                                                       retriever=None, device=self.device)
                else:
                    # No entities given: keep the retriever, with its index on CPU
                    self.model = Relik.from_pretrained(self.model_name,
                                                       cache_dir=MODELS_CACHE_PATH, device=self.device,
                                                       index_device='cpu')

    def predict(self, docs: Iterator[Doc], batch_size: Optional[Union[int, None]] = None) -> List[List[Span]]:
        """
        Perform the entity prediction
        :param docs: A list of spacy Document
        :param batch_size: The batch size
        :return: List Spans for each Document in docs
        """
        candidates = None
        if self._entities:
            candidates = [
                Document(text=ent.name, id=i, metadata={'definition': ent.description})
                for i, ent in enumerate(self._entities)
            ]

        sentences = [doc.text for doc in docs]

        self.load_models()
        span_annotations = []
        for sent in sentences:
            relik_out: RelikOutput = self.model(sent, candidates=candidates)
            span_annotations.append([Span(start=relik_span.start, end=relik_span.end, label=relik_span.label)
                                     for relik_span in relik_out.spans])

        return span_annotations
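
The constructor's dependency check relies on `pkgutil.find_loader`, which is deprecated as of Python 3.12 in favour of `importlib.util.find_spec`. A minimal sketch of an equivalent check is shown below; `is_installed` and `require_module` are hypothetical helper names and are not part of Zshot.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_module(module_name: str, hint: str) -> None:
    """Raise ImportError with an install hint when the module is missing."""
    if not is_installed(module_name):
        raise ImportError(f"{module_name} module not installed. {hint}")


# "json" ships with the standard library, so this call does not raise.
require_module("json", "It ships with Python.")
print(is_installed("json"))
```

Note that `linker_relik.py` imports `relik` at module level, so a missing package would already raise an import error before the check in the constructor runs; a check like this is most useful when the heavy import is deferred.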
6 changes: 3 additions & 3 deletions zshot/tests/linker/test_gliner_linker.py
@@ -13,7 +13,7 @@

@pytest.fixture(scope="module", autouse=True)
def teardown():
-    logger.warning("Starting smxm tests")
+    logger.warning("Starting gliner tests")
    yield True
    gc.collect()

@@ -25,7 +25,7 @@ def test_gliner_download():
    del linker.model, linker


-def test_smxm_linker():
+def test_gliner_linker():
    nlp = spacy.blank("en")
    gliner_config = PipelineConfig(
        linker=LinkerGLINER(),
@@ -43,7 +43,7 @@ def test_smxm_linker():
    del doc, nlp, gliner_config


-def test_smxm_linker_no_entities():
+def test_gliner_linker_no_entities():
    nlp = spacy.blank("en")
    gliner_config = PipelineConfig(
        linker=LinkerGLINER(),
60 changes: 60 additions & 0 deletions zshot/tests/linker/test_relik_linker.py
@@ -0,0 +1,60 @@
import gc
import logging

import pytest
import spacy

from zshot import PipelineConfig, Linker
from zshot.linker import LinkerRelik
from zshot.tests.config import EX_DOCS, EX_ENTITIES

logger = logging.getLogger(__name__)


@pytest.fixture(scope="module", autouse=True)
def teardown():
    logger.warning("Starting relik tests")
    yield True
    gc.collect()


@pytest.mark.skip(reason="Too expensive to run on every commit")
def test_relik_download():
    linker = LinkerRelik()
    linker.load_models()
    assert isinstance(linker, Linker)
    del linker.model, linker


@pytest.mark.skip(reason="Too expensive to run on every commit")
def test_relik_linker():
    nlp = spacy.blank("en")
    relik_config = PipelineConfig(
        linker=LinkerRelik(),
        entities=EX_ENTITIES
    )
    nlp.add_pipe("zshot", config=relik_config, last=True)
    assert "zshot" in nlp.pipe_names

    doc = nlp(EX_DOCS[1])
    assert len(doc.ents) > 0
    del nlp.get_pipe('zshot').linker.model, nlp.get_pipe('zshot').linker
    nlp.remove_pipe('zshot')
    del doc, nlp, relik_config


@pytest.mark.skip(reason="Too expensive to run on every commit")
def test_relik_linker_no_entities():
    nlp = spacy.blank("en")
    relik_config = PipelineConfig(
        linker=LinkerRelik(),
        entities=[]
    )
    nlp.add_pipe("zshot", config=relik_config, last=True)
    assert "zshot" in nlp.pipe_names

    doc = nlp(EX_DOCS[1])
    assert len(doc.ents) == 0
    del nlp.get_pipe('zshot').linker.model, nlp.get_pipe('zshot').linker
    nlp.remove_pipe('zshot')
    del doc, nlp, relik_config
4 changes: 4 additions & 0 deletions zshot/utils/download_models.py
@@ -21,6 +21,10 @@ def load_all():
        LinkerGLINER().load_models()
    except RuntimeError:
        pass
    # try:
    #     LinkerRelik().load_models()
    # except RuntimeError:
    #     pass
    try:
        RelationsExtractorZSRC().load_models()
    except RuntimeError: