Bean Validation errors when calling the file redetect API endpoint #8821

@matthew-a-dunlap

During development of CORE2, I've been using pyDataverse to handle our Dataverse interactions.

One aspect of this is uploading files. We ran into #8344, which causes the MIME type not to be set. Because we want to support older installations, I'm shooting for a solution that doesn't require the fix pushed by @landreev (though I'm glad it exists!).
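A minimal sketch of how a client could spot an affected file from the upload response alone. It assumes the JSON shape shown in the upload_file_response log lines at the end of this issue (data.files[].dataFile with id, filename and contentType). The names UploadedFile, parse_upload_response and looks_mistyped are hypothetical, and the extension check via Python's mimetypes module is an illustrative heuristic, not something Dataverse or pyDataverse does.

```python
import json
import mimetypes
from typing import List, NamedTuple


class UploadedFile(NamedTuple):
    """The fields of one uploaded file that matter for MIME-type checks."""
    file_id: int
    filename: str
    content_type: str


def parse_upload_response(body: bytes) -> List[UploadedFile]:
    """Extract id, filename and contentType from an add-file response body.

    Assumes the shape {"status": "OK", "data": {"files": [{"dataFile": {...}}]}}.
    """
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"upload failed: {payload}")
    files = []
    for entry in payload["data"]["files"]:
        data_file = entry["dataFile"]
        files.append(
            UploadedFile(data_file["id"], data_file["filename"], data_file["contentType"])
        )
    return files


def looks_mistyped(uploaded: UploadedFile) -> bool:
    """Heuristic (not Dataverse behaviour): stored type disagrees with the extension."""
    guessed, _encoding = mimetypes.guess_type(uploaded.filename)
    # Unknown extensions (guessed is None) are not flagged; leave those to redetect.
    return guessed is not None and guessed != uploaded.content_type
```

With the PDF from the logs, which was stored as text/plain, looks_mistyped returns True. Extensions the local mimetypes table does not know are never flagged, so this can only narrow down which files to send to redetect, not replace it.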

The solution I've tried is to call the redetect endpoint to get the correct file type. This works, and no errors are returned in the response... BUT concerning Bean Validation messages are now appearing in our logs. Note that this is on our S3-based test Dataverse instance running 5.3:
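A minimal sketch of the redetect call using only the standard library. The path and the dryRun parameter are taken from the request URLs in the logs below (POST /api/v1/files/{id}/redetect?dryRun=false); the base URL and the helper name build_redetect_request are hypothetical. Unlike the logged requests, which pass key= in the query string, this sends the API token in the X-Dataverse-key header, which keeps it out of URLs and access logs. Defaulting to dry_run=True is a design choice: it lets you preview oldContentType/newContentType before committing a change.

```python
import urllib.parse
import urllib.request


def build_redetect_request(
    base_url: str, file_id: int, api_token: str, dry_run: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST to the Dataverse file redetect endpoint."""
    query = urllib.parse.urlencode({"dryRun": "true" if dry_run else "false"})
    url = f"{base_url.rstrip('/')}/api/v1/files/{file_id}/redetect?{query}"
    # Send the token as a header instead of a ?key= query parameter.
    return urllib.request.Request(
        url,
        data=b"",  # empty request body
        headers={"X-Dataverse-key": api_token},
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen and read the JSON body from the response; running once with dry_run=True and then again with dry_run=False only for files whose type would change keeps writes to the database to a minimum.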

file_redetect_error.log

Does anyone at IQSS have insight into what might be causing this? It may be a pyDataverse issue, but the calls seem pretty straightforward. Specifically, we are concerned that all these warnings indicate that something is corrupting the metadata in our database.

Thanks much!

P.S. In case it helps, here are the responses from a few of our calls to the pyDataverse upload_datafile and redetect_file_type functions:

2022-06-29 19:18:08 DEBUG    [dataverse:059] Manuscript 32, upload_file_response {'_content': b'{"status":"OK","data":{"files":[{"description":"","label":"LagodnyJonesKochEnns_MainAnalysis.dta","restricted":false,"version":1,"datasetVersionId":32164,"dataFile":{"id":7519737,"persistentId":"","pidURL":"","filename":"LagodnyJonesKochEnns_MainAnalysis.dta","contentType":"application/x-stata-14","filesize":317240,"description":"","storageIdentifier":"s3://dataverse-awstest-dev:181b0e63abf-550fb746720c","rootDataFileId":-1,"md5":"9009bf1fb8fa1a8a1388c8feec250857","checksum":{"type":"MD5","value":"9009bf1fb8fa1a8a1388c8feec250857"},"creationDate":"2022-06-29"}}]}}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Date': 'Wed, 29 Jun 2022 19:18:06 GMT', 'Server': 'Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Content-Type, X-Dataverse-Key', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '570', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}, 'raw': <urllib3.response.HTTPResponse object at 0x7f9968364790>, 'url': 'https://dataverse-awstest.irss.unc.edu/api/v1/datasets/:persistentId/add?persistentId=doi:10.33563/FK2/B4PIIQ&User-Agent=pydataverse&key=feac0f49-c19a-42da-abb0-88ec3778e824', 'encoding': 'UTF-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(seconds=1, microseconds=747831), 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7f99a9132c40>}
2022-06-29 19:18:09 DEBUG    [dataverse:064] Manuscript 32, redetect_response {'_content': b'{"status":"OK","data":{"dryRun":false,"oldContentType":"application/x-stata-14","newContentType":"application/x-stata-14"}}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Date': 'Wed, 29 Jun 2022 19:18:08 GMT', 'Server': 'Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Content-Type, X-Dataverse-Key', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '123', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}, 'raw': <urllib3.response.HTTPResponse object at 0x7f99988da130>, 'url': 'https://dataverse-awstest.irss.unc.edu/api/v1/files/7519737/redetect?dryRun=false&User-Agent=pydataverse&key=feac0f49-c19a-42da-abb0-88ec3778e824', 'encoding': 'UTF-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(seconds=1, microseconds=63350), 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7f99b97d5250>}
2022-06-29 19:18:10 DEBUG    [dataverse:059] Manuscript 32, upload_file_response {'_content': b'{"status":"OK","data":{"files":[{"description":"","label":"LagodnyJonesKochEnns_StatePolicyMood_Codebook.pdf","restricted":false,"version":1,"datasetVersionId":32164,"dataFile":{"id":7519738,"persistentId":"","pidURL":"","filename":"LagodnyJonesKochEnns_StatePolicyMood_Codebook.pdf","contentType":"text/plain","filesize":96596,"description":"","storageIdentifier":"s3://dataverse-awstest-dev:181b0e643cf-6953673c070b","rootDataFileId":-1,"md5":"c21ceab2fd4bdd1d34065d2da91d6651","checksum":{"type":"MD5","value":"c21ceab2fd4bdd1d34065d2da91d6651"},"creationDate":"2022-06-29"}}]}}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Date': 'Wed, 29 Jun 2022 19:18:09 GMT', 'Server': 'Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Content-Type, X-Dataverse-Key', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '581', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}, 'raw': <urllib3.response.HTTPResponse object at 0x7f9968388700>, 'url': 'https://dataverse-awstest.irss.unc.edu/api/v1/datasets/:persistentId/add?persistentId=doi:10.33563/FK2/B4PIIQ&User-Agent=pydataverse&key=feac0f49-c19a-42da-abb0-88ec3778e824', 'encoding': 'UTF-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(microseconds=968581), 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7f995823e2e0>}
2022-06-29 19:18:11 DEBUG    [dataverse:064] Manuscript 32, redetect_response {'_content': b'{"status":"OK","data":{"dryRun":false,"oldContentType":"text/plain","newContentType":"application/pdf"}}', '_content_consumed': True, '_next': None, 'status_code': 200, 'headers': {'Date': 'Wed, 29 Jun 2022 19:18:10 GMT', 'Server': 'Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Content-Type, X-Dataverse-Key', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '104', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}, 'raw': <urllib3.response.HTTPResponse object at 0x7f99988daf10>, 'url': 'https://dataverse-awstest.irss.unc.edu/api/v1/files/7519738/redetect?dryRun=false&User-Agent=pydataverse&key=feac0f49-c19a-42da-abb0-88ec3778e824', 'encoding': 'UTF-8', 'history': [], 'reason': 'OK', 'cookies': <RequestsCookieJar[]>, 'elapsed': datetime.timedelta(microseconds=949062), 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7f995827cee0>}
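The redetect responses above report the before and after content types directly. A minimal sketch for turning those bodies into something a client can log or act on, assuming the data object always contains dryRun, oldContentType and newContentType as in these examples; RedetectResult and parse_redetect_response are hypothetical names.

```python
import json
from typing import NamedTuple


class RedetectResult(NamedTuple):
    dry_run: bool
    old_type: str
    new_type: str

    @property
    def changed(self) -> bool:
        """True when redetection produced a different content type."""
        return self.old_type != self.new_type


def parse_redetect_response(body: bytes) -> RedetectResult:
    """Parse the JSON body returned by POST /api/v1/files/{id}/redetect."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"redetect failed: {payload}")
    data = payload["data"]
    return RedetectResult(data["dryRun"], data["oldContentType"], data["newContentType"])
```

Applied to the two redetect bodies above, the PDF comes back as changed (text/plain to application/pdf) and the Stata file as unchanged.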
